# Semiconductor Process Anomaly Detection

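1. Problem statement and data overview
2. Defining the problem-solving process
3. Data preprocessing and EDA (early exploration of the data's characteristics)
4. Feature Selection (choosing meaningful variables)
5. Anomaly detection modeling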

In [1]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Problem Statement and Data Overview

> Scenario

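Company A is a global semiconductor manufacturer. Among semiconductor components, the wafer is the core material of integrated circuits. To improve semiconductor performance, Company A recently changed its wafer design and is now producing with the new design. Because defective products have been appearing since the design change, the company wants to use anomaly detection modeling to catch abnormal wafers in advance.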

> Data Overview

- Wafer process data
- Data specification

|Column|Description|
|:---|:---|
|feature_1 ~ feature_n|Wafer characteristic data|
|Class|Anomaly flag (0.0 = normal, 1.0 = abnormal)|


In [2]:
import pandas as pd
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

df_train = pd.read_csv("chapter03_df_train.csv")
df_test = pd.read_csv("chapter03_df_test.csv")

df = pd.concat([df_train, df_test], axis=0)
df.head()

# Normal wafers have Class == 0.0
# Abnormal wafers have Class == 1.0

Unnamed: 0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,feature_7,feature_8,feature_9,feature_10,feature_11,feature_12,feature_13,feature_14,feature_15,feature_16,feature_17,feature_18,feature_19,feature_20,feature_21,feature_22,feature_23,feature_24,feature_25,feature_26,feature_27,feature_28,feature_29,feature_30,feature_31,feature_32,feature_33,feature_34,feature_35,feature_36,feature_37,feature_38,feature_39,feature_40,feature_41,feature_42,feature_43,feature_44,feature_45,feature_46,feature_47,feature_48,feature_49,feature_50,...,feature_1510,feature_1511,feature_1512,feature_1513,feature_1514,feature_1515,feature_1516,feature_1517,feature_1518,feature_1519,feature_1520,feature_1521,feature_1522,feature_1523,feature_1524,feature_1525,feature_1526,feature_1527,feature_1528,feature_1529,feature_1530,feature_1531,feature_1532,feature_1533,feature_1534,feature_1535,feature_1536,feature_1537,feature_1538,feature_1539,feature_1540,feature_1541,feature_1542,feature_1543,feature_1544,feature_1545,feature_1546,feature_1547,feature_1548,feature_1549,feature_1550,feature_1551,feature_1552,feature_1553,feature_1554,feature_1555,feature_1556,feature_1557,feature_1558,Class
0,100.0,160.0,1.6,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0
1,20.0,83.0,4.15,1.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0.0
2,99.0,150.0,1.5151,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0
3,40.0,40.0,1.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0
4,12.0,234.0,19.5,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0


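# Defining the Problem-Solving Process

> Problem definition
- Wafer defects have been occurring since the recent design change

> Expected benefits
- Detect defective wafers early so they can be handled before the semiconductor is completed
- Lower defect and scrap costs

> Solution approach
- Find defective wafers through anomaly detection modeling before the finished semiconductor is assembled
- Data preprocessing and EDA
- Feature Selection
- Anomaly detection modeling

> Performance measurement (KPI)
- Compare the wafer defect rate before and after the model is put to use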
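The KPI comparison can be sketched as a simple before/after defect-rate calculation. This is only an illustration: the helper `defect_rate` and all counts below are hypothetical placeholders, not figures from Company A.

```python
def defect_rate(n_defective: int, n_total: int) -> float:
    """Share of inspected wafers that turned out to be defective."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_defective / n_total


# Hypothetical counts for two comparable production periods
rate_before = defect_rate(n_defective=30, n_total=1000)  # before using the model
rate_after = defect_rate(n_defective=12, n_total=1000)   # after using the model

# Absolute change in percentage points and relative change versus the baseline
absolute_change = rate_after - rate_before
relative_change = absolute_change / rate_before

print(f"before: {rate_before:.2%}, after: {rate_after:.2%}")
print(f"relative change: {relative_change:.1%}")
```

For the comparison to be meaningful, the two periods should cover similar production volumes and the same inspection criteria.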

> Field application
- Build a system for collecting wafer process data
- Feed process data into the model
- Extract and inspect abnormal wafers

### Key Code
1. Data preprocessing and EDA

.isna(), .dropna(), .isnull().sum().sum(), .loc, .value_counts()
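A minimal sketch of how these pandas calls might fit together during preprocessing and EDA. The frame `toy` is a small hypothetical stand-in with made-up values, not the actual wafer dataset, and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the wafer data (not the real dataset)
toy = pd.DataFrame({
    "feature_1": [100.0, 20.0, np.nan, 40.0, 12.0],
    "feature_2": [160.0, 83.0, 150.0, 40.0, 234.0],
    "Class": [0.0, 0.0, 0.0, 1.0, 0.0],
})

# Total number of missing cells across the whole frame
n_missing = int(toy.isnull().sum().sum())

# Boolean mask marking rows that contain at least one missing value
rows_with_na = toy.isna().any(axis=1)

# Drop rows that contain any missing value
clean = toy.dropna()

# Class balance: how many normal (0.0) versus abnormal (1.0) rows remain
class_counts = clean["Class"].value_counts()

# .loc with a boolean condition selects only the abnormal rows
abnormal = clean.loc[clean["Class"] == 1.0]

print(n_missing, len(clean), len(abnormal))
```

Checking `value_counts()` early is useful in anomaly detection because the abnormal class is usually a small minority.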

2. Feature Selection

StandardScaler().fit_transform(X), np.digitize(df_[col], bins)
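The two calls above can be sketched as follows. This section does not show how the notebook uses the bins, so the bin edges and the toy frame `df_` below are hypothetical; the sketch only illustrates what each call returns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a few continuous wafer features
df_ = pd.DataFrame({
    "feature_1": [100.0, 20.0, 99.0, 40.0, 12.0],
    "feature_2": [160.0, 83.0, 150.0, 40.0, 234.0],
})

# Standardize each column to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(df_)

# Assumed bin edges; np.digitize returns, for each value, the index i
# such that bins[i-1] <= value < bins[i] (values past the last edge get len(bins))
bins = np.array([0.0, 25.0, 50.0, 75.0])
binned = np.digitize(df_["feature_1"].to_numpy(), bins)

print(X_scaled.round(2))
print(binned)
```

Binning a continuous feature this way turns it into a small number of ordinal groups, which makes it easier to compare the abnormal rate across value ranges.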

3. Modeling

.iloc, pca.explained_variance_ratio_, np.cumsum, .dist_
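A rough sketch of how these pieces might combine in the modeling step. This section does not say which estimator exposes `.dist_`; the sketch assumes scikit-learn's `EllipticEnvelope`, whose fitted `dist_` attribute holds Mahalanobis distances of the training observations. The random data, the number of components, and the `contamination` value are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.covariance import EllipticEnvelope
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 200 samples, 10 features
rng = np.random.RandomState(0)
X = pd.DataFrame(
    rng.normal(size=(200, 10)),
    columns=[f"feature_{i}" for i in range(1, 11)],
)

# Project onto principal components
pca = PCA(n_components=5, random_state=0)
X_pca = pd.DataFrame(pca.fit_transform(X))

# Cumulative share of variance explained by the first k components
cum_var = np.cumsum(pca.explained_variance_ratio_)

# .iloc selects columns by position: keep the first two components
X_top = X_pca.iloc[:, :2].to_numpy()

# Assumed detector: fit a robust Gaussian envelope and read distances
detector = EllipticEnvelope(contamination=0.05, random_state=0).fit(X_top)
distances = detector.dist_          # one distance per training sample
labels = detector.predict(X_top)    # +1 = inlier, -1 = outlier

print(cum_var.round(3))
print(int((labels == -1).sum()), "samples flagged as outliers")
```

Plotting `cum_var` against the component count is a common way to decide how many components to keep before fitting the detector.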
