## 불균형 데이터셋에 대한 웨이퍼 불량식별을 위한 CNN
### 키워드
- 데이터 전처리
    - Data Augmentaion
    - 불량 클래스
        - Center
        - Donut
        - Local
        - Edge-Loc
        - Edge-Ring
        - Scratch
        - Random
        - Near-Full
        - None
- 모델 구성
    - Batch Normalization
    - Spatical Dropout
    - Regularization

### 데이터 확인사항
- waferMap 사이즈를 확인하여 추후 개발할 신경망 모델의 224x224 사이즈에 맞게 resizing 작업이 필요할 것으로 예상됨
- augmentaion 작업과 resizing 작업이 한 번에 해결될 수 있도록 전처리하는 것이 좋을 듯

In [1]:
import numpy as np
import pandas as pd
import cv2
import matplotlib.pyplot as plt
import random
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint

In [2]:
wm811k = pd.read_pickle('./data/LSWMD.pkl')

In [3]:
wm811k.head()

Unnamed: 0,waferMap,dieSize,lotName,waferIndex,trianTestLabel,failureType
0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,1.0,[[Training]],[[none]]
1,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,2.0,[[Training]],[[none]]
2,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,3.0,[[Training]],[[none]]
3,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,4.0,[[Training]],[[none]]
4,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,5.0,[[Training]],[[none]]


In [4]:
wm811k.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 811457 entries, 0 to 811456
Data columns (total 6 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   waferMap        811457 non-null  object 
 1   dieSize         811457 non-null  float64
 2   lotName         811457 non-null  object 
 3   waferIndex      811457 non-null  float64
 4   trianTestLabel  811457 non-null  object 
 5   failureType     811457 non-null  object 
dtypes: float64(2), object(4)
memory usage: 37.1+ MB


In [5]:
# 불필요 컬럼 제거
wm811k = wm811k.drop(['waferIndex'], axis = 1)
wm811k.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType
0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]]
1,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]]
2,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]]
3,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]]
4,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]]


In [6]:
# wafermap size 확인 및 컬럼 추가
def find_dim(x):
    dim0=np.size(x,axis=0)
    dim1=np.size(x,axis=1)
    return dim0,dim1
wm811k['waferMapDim']=wm811k['waferMap'].apply(find_dim)
wm811k.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim
0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)"
1,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)"
2,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)"
3,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)"
4,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)"


In [7]:
# 불량 클래스 확인 및 학습/검증/테스트 데이터 셋 확인
wm811k['failureNum']=wm811k['failureType']
wm811k['trainTestNum']=wm811k['trianTestLabel']
mapping_type={'Center':0,'Donut':1,'Edge-Loc':2,'Edge-Ring':3,'Loc':4,'Random':5,'Scratch':6,'Near-full':7,'none':8}
mapping_traintest={'Training':0,'Test':1}
wm811k=wm811k.replace({'failureNum':mapping_type, 'trainTestNum':mapping_traintest})
wm811k.head()

  op = lambda x: operator.eq(x, b)


Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim,failureNum,trainTestNum
0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0
1,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0
2,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0
3,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0
4,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0


In [8]:
wm811k['trianTestLabel'].apply(lambda x: str(x)).value_counts()

[]                638507
[['Test']]        118595
[['Training']]     54355
Name: trianTestLabel, dtype: int64

In [9]:
wm811k['trainTestNum'].apply(lambda x: str(x)).value_counts()

[]    638507
1     118595
0      54355
Name: trainTestNum, dtype: int64

In [10]:
wm811k_train = wm811k.query("trainTestNum == 0")

In [11]:
wm811k_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54355 entries, 0 to 791476
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   waferMap        54355 non-null  object 
 1   dieSize         54355 non-null  float64
 2   lotName         54355 non-null  object 
 3   trianTestLabel  54355 non-null  object 
 4   failureType     54355 non-null  object 
 5   waferMapDim     54355 non-null  object 
 6   failureNum      54355 non-null  object 
 7   trainTestNum    54355 non-null  object 
dtypes: float64(1), object(7)
memory usage: 3.7+ MB


### wm811k_test set
- 10,000개 augmentation된 데이터 train: validation : test = 65: 20: 15
- 최종 모델 확인 이후 wm811k_test 데이터 활용 모델 성능 확인하는 것이 좋을 듯

In [12]:
wm811k_test = wm811k.query("trainTestNum == 1")

In [13]:
wm811k_test.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 118595 entries, 639663 to 811454
Data columns (total 8 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   waferMap        118595 non-null  object 
 1   dieSize         118595 non-null  float64
 2   lotName         118595 non-null  object 
 3   trianTestLabel  118595 non-null  object 
 4   failureType     118595 non-null  object 
 5   waferMapDim     118595 non-null  object 
 6   failureNum      118595 non-null  object 
 7   trainTestNum    118595 non-null  object 
dtypes: float64(1), object(7)
memory usage: 8.1+ MB


In [14]:
wm811k_test['failureNum'].apply(lambda x: str(x)).value_counts()

8    110701
2      2772
4      1973
3      1126
0       832
6       693
5       257
1       146
7        95
Name: failureNum, dtype: int64

### waferMap size 확인
- data print 결과 빈 부분은 0, 정상 pixel은 1, 불량 pixel은 2로 표현되어있는 듯
    - input shape 맞출 때 0으로 padding 주듯이 채우면 될 듯
- 예상 전처리 과정 : 기존 데이터에 data augmentaion 적용 $ \rightarrow $ 변형 데이터에 224x224 size zero_padding $ \rightarrow $ input data

In [None]:
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)

In [None]:
# 데이터 형태 출력
# for i in range(len(wm811k.iloc[0]['waferMap'])):
#     print(wm811k.iloc[0]['waferMap'][i])

In [15]:
# waferMap size 확인
wm811k['waferMapDim'].value_counts()

(32, 29)    108687
(25, 27)     64083
(49, 39)     39323
(26, 26)     30078
(30, 34)     29513
             ...  
(24, 71)         1
(61, 55)         1
(54, 69)         1
(18, 4)          1
(32, 71)         1
Name: waferMapDim, Length: 632, dtype: int64

In [16]:
# 학습 데이터 내 불량 클래스 개수 확인
wm811k_train['failureNum'].value_counts()

8    36730
3     8554
0     3462
2     2417
4     1620
5      609
6      500
1      409
7       54
Name: failureNum, dtype: int64

In [17]:
wm811k_train.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim,failureNum,trainTestNum
0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0
1,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0
2,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0
3,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0
4,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1683.0,lot1,[[Training]],[[none]],"(45, 48)",8,0


### Augmentation 구현
- 기법 별로 함수 작성
- 비율 개수만큼 sampling
- sampling 데이터에 작성한 기법별 함수 적용
- waferMapDim 확인 / waferMap 이미지 부분 잘리거나 한 부분 없게끔 적용되게 확인 : 미리 padding 주기?
- data concat

- 10도 회전
    - 참고 : https://076923.github.io/posts/Python-opencv-6/

In [None]:
def test_rotation_10_degree(data_img):
    height, width = data_img.shape
    # positive for anti-clockwise and negative for clockwise
    rotation_10_degree_img = cv2.getRotationMatrix2D((width/2, height/2), 10, 1) # 중심점, 각도, 배율
    dst = cv2.warpAffine(data_img, rotation_10_degree_img, (width,height))
    return dst

In [None]:
def test_rotation_minus_10_degree(data_img):
    height, width = data_img.shape
    # positive for anti-clockwise and negative for clockwise
    rotation_10_degree_img = cv2.getRotationMatrix2D((width/2, height/2), -10, 1) # 중심점, 각도, 배율
    dst = cv2.warpAffine(data_img, rotation_10_degree_img, (width,height))
    return dst

In [None]:
# 회전 변환 다른 라이브러리 활용하는 방법 참고
# from scipy.ndimage.interpolation import rotate
# rotated = rotate(test_wafermap.iloc[3]["waferMap"], angle=10,reshape=False)
# plt.imshow(rotated)
# plt.show()

- 좌우 대칭 및 너비 이동(horizontal flipping and width shift)
    - cv2.flip : https://crmn.tistory.com/54
        - flip_img = cv2.flip(data_img, 1) # 1:좌우반전, 0:상하반전 
    - translate : https://m.blog.naver.com/PostView.naver?isHttpsRedirect=true&blogId=vps32&logNo=221762189533
        - width shift
        - height shift

In [None]:
def test_translate(data_img):
    height, width = data_img.shape
    # random ~ 범위 지정하여 함수 실행 시 마다 무작위로 평행이동 정도 부여하게끔 수정 필요
    # 224x224 size 안에서 이동하도록 코딩 필요
    translate_matrix = np.float32([[1,0,10], [0,1,5]]) # 세로 10, 가로 5 만큼 평행 이동 
    dst = cv2.warpAffine(data_img, translate_matrix, (width,height))
    return dst

- 전단 범위(shearing range)
    - https://www.thepythoncode.com/article/image-transformations-using-opencv-in-python
    - https://stackoverflow.com/questions/57881430/how-could-i-implement-a-centered-shear-an-image-with-opencv

In [None]:
def test_shearing(data_img):
    height, width = data_img.shape
    # random ~ 범위 지정하여 함수 실행 시 마다 x축, y축 shearing 정도 부여하게끔 수정 필요
    shearing_matrix = np.float32([[1, 0.5, 0],    # shearing applied to y-axis
             	                  [0, 1  , 0],    # M = np.float32([[1,   0, 0],
            	                  [0, 0  , 1]])   #             	[0.5, 1, 0],
                                                  #             	[0,   0, 1]])  
    # apply a perspective transformation to the image                
    dst = cv2.warpPerspective(data_img, shearing_matrix, (int(width*1.5),int(height*1.5)))                      
    # 변형 이미지 중심 보정?
    return dst

- 채널이동 및 확대 축소(channel shift and zooming)
    - 단일 채널 이미지라 channel shift는 적용 안 될 것 같음
    - zoom : https://076923.github.io/posts/Python-opencv-7/
    - cv2.resize 함수 참고
        - 참고 : https://seokii.tistory.com/14

In [None]:
def test_zoom(data_img):
    height, width = data_img.shape
    # 2배 확대 이미지 / 가로 세로 값이 조건에 맞으면 dstsize로 값 부여하여 세밀 조정 가능
    # 224x224 size 안에서 2배 확대 가능하도록 코딩 필요
    dst = cv2.pyrUp(data_img, dstsize=(width * 2, height * 2), borderType=cv2.BORDER_DEFAULT)
    return dst

- test

In [None]:
test_wafermap = wm811k_train.query("failureNum == 6").sample(n=5)

In [None]:
test_wafermap.head()

In [None]:
test_wafermap["waferMap_augmentation"] = test_wafermap["waferMap"].apply(lambda x: test_rotation_10_degree(x))
# test_wafermap["waferMap_augmentation"] = test_wafermap["waferMap"].apply(lambda x: test_translate(x))
# test_wafermap["waferMap_augmentation"] = test_wafermap["waferMap"].apply(lambda x: test_shearing(x))
# test_wafermap["waferMap_augmentation"] = test_wafermap["waferMap"].apply(lambda x: test_zoom(x))

In [None]:
test_wafermap.head()

In [None]:
plt.imshow(test_wafermap.iloc[3]["waferMap"])
plt.show()

In [None]:
plt.imshow(test_wafermap.iloc[3]["waferMap_augmentation"])
plt.show()

In [None]:
# augmentated data dim
test_wafermap['waferMap_augmentation_Dim']=test_wafermap['waferMap_augmentation'].apply(find_dim)
test_wafermap.head()

### Input size에 맞게 zero-padding
- 참고 : https://webnautes.tistory.com/1652

In [18]:
def zero_padding(data_img, set_size):
    height, width = data_img.shape
    
    if max(height, width) > set_size:
        return data_img
    
    delta_width = set_size - width
    delta_height = set_size - height
    top, bottom = delta_height//2, delta_height-(delta_height//2)
    left, right = delta_width//2, delta_width-(delta_width//2)
    
    padded_img = cv2.copyMakeBorder(data_img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=[0,0,0])
    return padded_img

### pad 이후 전처리 / 전처리 이후 pad 비교

In [None]:
# 원본
test_origin = test_wafermap.iloc[2]["waferMap"]
plt.imshow(test_origin)
plt.show()

In [None]:
pad_origin = zero_padding(test_origin, 224)
plt.imshow(pad_origin)
plt.show()

- 회전의 경우 pad 이후 회전과 회전 이후 pad 가 비슷해보임

In [None]:
# 10도 회전
degree_10 = test_wafermap.iloc[2]["waferMap_augmentation"]
plt.imshow(degree_10)
plt.show()

pad_10_degree = zero_padding(degree_10, 224)
plt.imshow(pad_10_degree)
plt.show()

degree_10_pad_origin = test_rotation_10_degree(pad_origin)
plt.imshow(degree_10_pad_origin)
plt.show()

- 평행이동 : 이동 이후 wafer 모양이 잘리는 경우를 감안하면 pad 이후 이동하는 것이 나아보임

In [None]:
translated =  test_translate(test_origin)
plt.imshow(translated)
plt.show()

pad_translated = zero_padding(translated, 224)
plt.imshow(pad_translated)
plt.show()

translate_pad_origin = test_translate(pad_origin)
plt.imshow(translate_pad_origin)
plt.show()

In [None]:
translate_pad_origin.shape

- shearing : 잘리지 않을 범위 내에서 shearing 이후 pad 하는 것이 나을 듯
    - shearing 정도에 따라 translate와 마찬가지로 wafer 모양이 잘릴 가능성이 있음
    - pad 이후 shearing 시 shape이 변함

In [None]:
sheared = test_shearing(test_origin)
plt.imshow(sheared)
plt.show()

pad_sheared = zero_padding(sheared, 224)
plt.imshow(pad_sheared)
plt.show()

sheared_pad_origin = test_shearing(pad_origin)
plt.imshow(sheared_pad_origin)
plt.show()

In [None]:
sheared_pad_origin.shape

- zooming : pad 이후 zoom할 경우 448x448 사이즈 이미지가 생성되므로 zoom 이후 pad 적용

In [None]:
zoomed = zoom(test_origin)
plt.imshow(zoomed)
plt.show()

pad_zoomed = zero_padding(zoomed, 224)
plt.imshow(pad_zoomed)
plt.show()

zoom_pad_origin = test_zoom(pad_origin)
plt.imshow(zoom_pad_origin)
plt.show()

### augmentaion 함수별 random 요소 추가 및 padding 함수 추가
- rotation : 회전 변환 이후 padding
- translate : padding 이후 평행이동, dsize ~ 224로 출력
- shearing : shearing 정도에 random 요소 추가, dsize ~ 224로 출력 

In [19]:
def rotation_10_degree(data_img):
    height, width = data_img.shape
    # positive for anti-clockwise and negative for clockwise
    rotation_10_degree_img = cv2.getRotationMatrix2D((width/2, height/2), 10, 1) # 중심점, 각도, 배율
    dst = cv2.warpAffine(data_img, rotation_10_degree_img, (width,height))
    padded_dst = zero_padding(dst, 224)
    return padded_dst

In [20]:
def rotation_minus_10_degree(data_img):
    height, width = data_img.shape
    # positive for anti-clockwise and negative for clockwise
    rotation_10_degree_img = cv2.getRotationMatrix2D((width/2, height/2), -10, 1) # 중심점, 각도, 배율
    dst = cv2.warpAffine(data_img, rotation_10_degree_img, (width,height))
    padded_dst = zero_padding(dst, 224)
    return dst

In [21]:
def translate(data_img):
    padded_img = zero_padding(data_img, 224)
    x_translate = random.randrange(-20, 21)
    y_translate = random.randrange(-20, 21)
    translate_matrix = np.float32([[1, 0, y_translate],  # 세로 평행이동
                                   [0, 1, x_translate]]) # 가로 평행이동 
    dst = cv2.warpAffine(padded_img, translate_matrix, (224,224))  # (src, matrix, dsize)
    return dst

In [31]:
def filping(data_img):
    padded_img = zero_padding(data_img, 224)
    dst = cv2.flip(padded_img, random.choice([0, 1])) # 0:상하 반전, 1:좌우 반전 
    return dst

In [22]:
def shearing(data_img):    
    x_shearing = random.random()
    y_shearing = random.random()
    shearing_matrix = np.float32([[1,          x_shearing, 0],
             	                  [y_shearing, 1,          0],
            	                  [0,          0  ,        1]])           
    dst = cv2.warpPerspective(data_img, shearing_matrix, (224, 224)) # warpPerspective(src, matrix, dsize~(width, height))
    return dst

In [59]:
def resizing(data_img):
    scale_list = [0.2, 0.5, 0.7, 1.2, 1.3, 1.5]
    dst = cv2.resize(data_img, dsize=(0,0), fx=random.choice(scale_list), fy=random.choice(scale_list), interpolation=cv2.INTER_LINEAR)    
    padded_img = zero_padding(dst, 224)
    return padded_img

### Data-Augmentaion - 클래스별로 10,000개 / 논문 Augmentation 기법 적용
- 10도 회전 : 20%
- 좌우 대칭 및 너비 이동(horizontal flipping and width shift) : 20%
- 높이 이동(height shfit) : 15%
- 전단 범위(shearing range) : 10%
- 채널이동 및 확대 축소(channel shift and zooming) : 10%
- 75%밖에 안 되는 듯 : 증량 비율 수정

### wm811k_new_train_class_XXX 만들어서 concat하기
- CNN-WDI input shape 244x244에 맞게 zero-padding 줘야함

In [24]:
# None(8): 36,730 중 10,000 개 Sampling
# 참고 : https://rfriend.tistory.com/602
wm811k_new_train_class_None = wm811k_train.query("failureNum == 8").sample(n=10000, random_state=2022)
wm811k_new_train_class_None.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 749336 to 244652
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   waferMap        10000 non-null  object 
 1   dieSize         10000 non-null  float64
 2   lotName         10000 non-null  object 
 3   trianTestLabel  10000 non-null  object 
 4   failureType     10000 non-null  object 
 5   waferMapDim     10000 non-null  object 
 6   failureNum      10000 non-null  object 
 7   trainTestNum    10000 non-null  object 
dtypes: float64(1), object(7)
memory usage: 703.1+ KB


In [25]:
# Edge-Ring(3) 현 보유 8,554개 / 1,446개 augmentation 필요
wm811k_new_train_class_Edge_Ring = wm811k_train.query("failureNum == 3")
wm811k_new_train_class_Edge_Ring["waferMap_augmentation"] = wm811k_new_train_class_Edge_Ring['waferMap']
wm811k_new_train_class_Edge_Ring_10['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Ring_10['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Edge_Ring.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8554 entries, 100 to 786313
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   waferMap        8554 non-null   object 
 1   dieSize         8554 non-null   float64
 2   lotName         8554 non-null   object 
 3   trianTestLabel  8554 non-null   object 
 4   failureType     8554 non-null   object 
 5   waferMapDim     8554 non-null   object 
 6   failureNum      8554 non-null   object 
 7   trainTestNum    8554 non-null   object 
dtypes: float64(1), object(7)
memory usage: 601.5+ KB


In [27]:
# +10도 회전 : 10% - 144개
wm811k_new_train_class_Edge_Ring_10 = wm811k_new_train_class_Edge_Ring.sample(n=144, random_state=2022)
wm811k_new_train_class_Edge_Ring_10["waferMap_augmentation"] = wm811k_new_train_class_Edge_Ring_10["waferMap"].apply(lambda x: rotation_10_degree(x))
wm811k_new_train_class_Edge_Ring_10['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Ring_10['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Edge_Ring_10.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim,failureNum,trainTestNum,waferMap_augmentation,waferMap_augmentation_Dim
359915,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2126.0,lot21560,[[Training]],[[Edge-Ring]],"(53, 52)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
216064,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1291.0,lot13709,[[Training]],[[Edge-Ring]],"(41, 41)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
186650,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",3036.0,lot11856,[[Training]],[[Edge-Ring]],"(63, 62)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228243,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2085.0,lot14325,[[Training]],[[Edge-Ring]],"(55, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228321,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2072.0,lot14328,[[Training]],[[Edge-Ring]],"(56, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"


In [28]:
# -10도 회전 : 10% - 144개
wm811k_new_train_class_Edge_Ring_minus_10 = wm811k_new_train_class_Edge_Ring.sample(n=144, random_state=2022)
wm811k_new_train_class_Edge_Ring_minus_10["waferMap_augmentation"] = wm811k_new_train_class_Edge_Ring_minus_10["waferMap"].apply(lambda x: rotation_minus_10_degree(x))
wm811k_new_train_class_Edge_Ring_minus_10['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Ring_minus_10['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Edge_Ring_minus_10.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim,failureNum,trainTestNum,waferMap_augmentation,waferMap_augmentation_Dim
359915,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2126.0,lot21560,[[Training]],[[Edge-Ring]],"(53, 52)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(53, 52)"
216064,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1291.0,lot13709,[[Training]],[[Edge-Ring]],"(41, 41)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(41, 41)"
186650,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",3036.0,lot11856,[[Training]],[[Edge-Ring]],"(63, 62)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(63, 62)"
228243,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2085.0,lot14325,[[Training]],[[Edge-Ring]],"(55, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(55, 48)"
228321,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2072.0,lot14328,[[Training]],[[Edge-Ring]],"(56, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(56, 48)"


In [32]:
# 좌우 대칭 : 20%  - 288개
wm811k_new_train_class_Edge_Ring_flip = wm811k_new_train_class_Edge_Ring.sample(n=288, random_state=2022)
wm811k_new_train_class_Edge_Ring_flip["waferMap_augmentation"] = wm811k_new_train_class_Edge_Ring_flip["waferMap"].apply(lambda x: filping(x))
wm811k_new_train_class_Edge_Ring_flip['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Ring_flip['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Edge_Ring_flip.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim,failureNum,trainTestNum,waferMap_augmentation,waferMap_augmentation_Dim
359915,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2126.0,lot21560,[[Training]],[[Edge-Ring]],"(53, 52)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
216064,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1291.0,lot13709,[[Training]],[[Edge-Ring]],"(41, 41)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
186650,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",3036.0,lot11856,[[Training]],[[Edge-Ring]],"(63, 62)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228243,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2085.0,lot14325,[[Training]],[[Edge-Ring]],"(55, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228321,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2072.0,lot14328,[[Training]],[[Edge-Ring]],"(56, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"


In [33]:
# 평행 이동 : 30% - 432개
wm811k_new_train_class_Edge_Ring_translate = wm811k_new_train_class_Edge_Ring.sample(n=432, random_state=2022)
wm811k_new_train_class_Edge_Ring_translate["waferMap_augmentation"] = wm811k_new_train_class_Edge_Ring_translate["waferMap"].apply(lambda x: translate(x))
wm811k_new_train_class_Edge_Ring_translate['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Ring_translate['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Edge_Ring_translate.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim,failureNum,trainTestNum,waferMap_augmentation,waferMap_augmentation_Dim
359915,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2126.0,lot21560,[[Training]],[[Edge-Ring]],"(53, 52)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
216064,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1291.0,lot13709,[[Training]],[[Edge-Ring]],"(41, 41)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
186650,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",3036.0,lot11856,[[Training]],[[Edge-Ring]],"(63, 62)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228243,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2085.0,lot14325,[[Training]],[[Edge-Ring]],"(55, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228321,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2072.0,lot14328,[[Training]],[[Edge-Ring]],"(56, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"


In [34]:
# 전단 범위(shearing range) : 10% - 144개
wm811k_new_train_class_Edge_Ring_shearing = wm811k_new_train_class_Edge_Ring.sample(n=144, random_state=2022)
wm811k_new_train_class_Edge_Ring_shearing["waferMap_augmentation"] = wm811k_new_train_class_Edge_Ring_shearing["waferMap"].apply(lambda x: shearing(x))
wm811k_new_train_class_Edge_Ring_shearing['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Ring_shearing['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Edge_Ring_shearing.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim,failureNum,trainTestNum,waferMap_augmentation,waferMap_augmentation_Dim
359915,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2126.0,lot21560,[[Training]],[[Edge-Ring]],"(53, 52)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
216064,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1291.0,lot13709,[[Training]],[[Edge-Ring]],"(41, 41)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
186650,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",3036.0,lot11856,[[Training]],[[Edge-Ring]],"(63, 62)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228243,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2085.0,lot14325,[[Training]],[[Edge-Ring]],"(55, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228321,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2072.0,lot14328,[[Training]],[[Edge-Ring]],"(56, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"


In [60]:
# 확대 : 20% - 294개
wm811k_new_train_class_Edge_Ring_resize = wm811k_new_train_class_Edge_Ring.sample(n=294, random_state=2022)
wm811k_new_train_class_Edge_Ring_resize["waferMap_augmentation"] = wm811k_new_train_class_Edge_Ring_resize["waferMap"].apply(lambda x: resizing(x))
wm811k_new_train_class_Edge_Ring_resize['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Ring_resize['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Edge_Ring_resize.head()

Unnamed: 0,waferMap,dieSize,lotName,trianTestLabel,failureType,waferMapDim,failureNum,trainTestNum,waferMap_augmentation,waferMap_augmentation_Dim
359915,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2126.0,lot21560,[[Training]],[[Edge-Ring]],"(53, 52)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
216064,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",1291.0,lot13709,[[Training]],[[Edge-Ring]],"(41, 41)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
186650,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",3036.0,lot11856,[[Training]],[[Edge-Ring]],"(63, 62)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228243,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2085.0,lot14325,[[Training]],[[Edge-Ring]],"(55, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"
228321,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...",2072.0,lot14328,[[Training]],[[Edge-Ring]],"(56, 48)",3,0,"[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...","(224, 224)"


In [61]:
# concat
wm811k_new_train_class_Edge_Ring_augmentation = pd.concat([wm811k_new_train_class_Edge_Ring,
                                                           wm811k_new_train_class_Edge_Ring_10,
                                                           wm811k_new_train_class_Edge_Ring_minus_10,
                                                           wm811k_new_train_class_Edge_Ring_flip,
                                                           wm811k_new_train_class_Edge_Ring_translate,
                                                           wm811k_new_train_class_Edge_Ring_shearing,
                                                           wm811k_new_train_class_Edge_Ring_resize
                                                          ])
wm811k_new_train_class_Edge_Ring_augmentation.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 100 to 199660
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   waferMap                   10000 non-null  object 
 1   dieSize                    10000 non-null  float64
 2   lotName                    10000 non-null  object 
 3   trianTestLabel             10000 non-null  object 
 4   failureType                10000 non-null  object 
 5   waferMapDim                10000 non-null  object 
 6   failureNum                 10000 non-null  object 
 7   trainTestNum               10000 non-null  object 
 8   waferMap_augmentation      1446 non-null   object 
 9   waferMap_augmentation_Dim  1446 non-null   object 
dtypes: float64(1), object(9)
memory usage: 859.4+ KB


In [63]:
# Center(0) 현 보유 3,462개 / 6,538개 augmentation 필요
wm811k_new_train_class_Center = wm811k_train.query("failureNum == 0")
wm811k_new_train_class_Center["waferMap_augmentation"] = wm811k_new_train_class_Center['waferMap']
wm811k_new_train_class_Center['waferMap_augmentation_Dim']=wm811k_new_train_class_Center['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Center.info()

# +10도 회전 : 10% - 654개
wm811k_new_train_class_Center_10 = wm811k_new_train_class_Center.sample(n=654, random_state=2022)
wm811k_new_train_class_Center_10["waferMap_augmentation"] = wm811k_new_train_class_Center_10["waferMap"].apply(lambda x: rotation_10_degree(x))
wm811k_new_train_class_Center_10['waferMap_augmentation_Dim']=wm811k_new_train_class_Center_10['waferMap_augmentation'].apply(find_dim)

# -10도 회전 : 10% - 654개
wm811k_new_train_class_Center_minus_10 = wm811k_new_train_class_Center.sample(n=654, random_state=2022)
wm811k_new_train_class_Center_minus_10["waferMap_augmentation"] = wm811k_new_train_class_Center_minus_10["waferMap"].apply(lambda x: rotation_minus_10_degree(x))
wm811k_new_train_class_Center_minus_10['waferMap_augmentation_Dim']=wm811k_new_train_class_Center_minus_10['waferMap_augmentation'].apply(find_dim)

# 좌우 대칭 : 20%  - 1308개
wm811k_new_train_class_Center_flip = wm811k_new_train_class_Center.sample(n=1308, random_state=2022)
wm811k_new_train_class_Center_flip["waferMap_augmentation"] = wm811k_new_train_class_Center_flip["waferMap"].apply(lambda x: filping(x))
wm811k_new_train_class_Center_flip['waferMap_augmentation_Dim']=wm811k_new_train_class_Center_flip['waferMap_augmentation'].apply(find_dim)

# 평행 이동 : 30%  - 1962개
wm811k_new_train_class_Center_translate = wm811k_new_train_class_Center.sample(n=1962, random_state=2022)
wm811k_new_train_class_Center_translate["waferMap_augmentation"] = wm811k_new_train_class_Center_translate["waferMap"].apply(lambda x: translate(x))
wm811k_new_train_class_Center_translate['waferMap_augmentation_Dim']=wm811k_new_train_class_Center_translate['waferMap_augmentation'].apply(find_dim)

# 전단 범위(shearing range) : 10% - 652개
wm811k_new_train_class_Center_shearing = wm811k_new_train_class_Center.sample(n=652, random_state=2022)
wm811k_new_train_class_Center_shearing["waferMap_augmentation"] = wm811k_new_train_class_Center_shearing["waferMap"].apply(lambda x: shearing(x))
wm811k_new_train_class_Center_shearing['waferMap_augmentation_Dim']=wm811k_new_train_class_Center_shearing['waferMap_augmentation'].apply(find_dim)

# 확대 : 20% - 1308개
wm811k_new_train_class_Center_resize = wm811k_new_train_class_Center.sample(n=1308, random_state=2022)
wm811k_new_train_class_Center_resize["waferMap_augmentation"] = wm811k_new_train_class_Center_resize["waferMap"].apply(lambda x: resizing(x))
wm811k_new_train_class_Center_resize['waferMap_augmentation_Dim']=wm811k_new_train_class_Center_resize['waferMap_augmentation'].apply(find_dim)

# concat
wm811k_new_train_class_Center_augmentation = pd.concat([wm811k_new_train_class_Center,
                                                        wm811k_new_train_class_Center_10,
                                                        wm811k_new_train_class_Center_minus_10,
                                                        wm811k_new_train_class_Center_flip,
                                                        wm811k_new_train_class_Center_translate,
                                                        wm811k_new_train_class_Center_shearing,
                                                        wm811k_new_train_class_Center_resize
                                                       ])
wm811k_new_train_class_Center_augmentation.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3462 entries, 44 to 785245
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   waferMap        3462 non-null   object 
 1   dieSize         3462 non-null   float64
 2   lotName         3462 non-null   object 
 3   trianTestLabel  3462 non-null   object 
 4   failureType     3462 non-null   object 
 5   waferMapDim     3462 non-null   object 
 6   failureNum      3462 non-null   object 
 7   trainTestNum    3462 non-null   object 
dtypes: float64(1), object(7)
memory usage: 243.4+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 44 to 360911
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   waferMap                   10000 non-null  object 
 1   dieSize                    10000 non-null  float64
 2   lotName                    10000 non-null  obje

In [74]:
# Edge-Loc(2) 현 보유 2,417개 / 7,583개 augmentation 필요
wm811k_new_train_class_Edge_Loc = wm811k_train.query("failureNum == 2")
wm811k_new_train_class_Edge_Loc["waferMap_augmentation"] = wm811k_new_train_class_Edge_Loc['waferMap']
wm811k_new_train_class_Edge_Loc['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Loc['waferMap_augmentation'].apply(find_dim)
wm811k_new_train_class_Edge_Loc.info()

# +10도 회전 : 10% - 758개
wm811k_new_train_class_Edge_Loc_10 = wm811k_new_train_class_Edge_Loc.sample(n=758, random_state=2022)
wm811k_new_train_class_Edge_Loc_10["waferMap_augmentation"] = wm811k_new_train_class_Edge_Loc_10["waferMap"].apply(lambda x: rotation_10_degree(x))
wm811k_new_train_class_Edge_Loc_10['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Loc_10['waferMap_augmentation'].apply(find_dim)

# -10도 회전 : 10% - 758개
wm811k_new_train_class_Edge_Loc_minus_10 = wm811k_new_train_class_Edge_Loc.sample(n=758, random_state=2022)
wm811k_new_train_class_Edge_Loc_minus_10["waferMap_augmentation"] = wm811k_new_train_class_Edge_Loc_minus_10["waferMap"].apply(lambda x: rotation_minus_10_degree(x))
wm811k_new_train_class_Edge_Loc_minus_10['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Loc_minus_10['waferMap_augmentation'].apply(find_dim)

# 좌우 대칭 : 20% - 1517개
wm811k_new_train_class_Edge_Loc_flip = wm811k_new_train_class_Edge_Loc.sample(n=1517, random_state=2022)
wm811k_new_train_class_Edge_Loc_flip["waferMap_augmentation"] = wm811k_new_train_class_Edge_Loc_flip["waferMap"].apply(lambda x: filping(x))
wm811k_new_train_class_Edge_Loc_flip['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Loc_flip['waferMap_augmentation'].apply(find_dim)

# 평행 이동 : 30% - 2275개
wm811k_new_train_class_Edge_Loc_translate = wm811k_new_train_class_Edge_Loc.sample(n=2275, random_state=2022)
wm811k_new_train_class_Edge_Loc_translate["waferMap_augmentation"] = wm811k_new_train_class_Edge_Loc_translate["waferMap"].apply(lambda x: translate(x))
wm811k_new_train_class_Edge_Loc_translate['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Loc_translate['waferMap_augmentation'].apply(find_dim)

# 전단 범위(shearing range) : 10% - 758개
wm811k_new_train_class_Edge_Loc_shearing = wm811k_new_train_class_Edge_Loc.sample(n=758, random_state=2022)
wm811k_new_train_class_Edge_Loc_shearing["waferMap_augmentation"] = wm811k_new_train_class_Edge_Loc_shearing["waferMap"].apply(lambda x: shearing(x))
wm811k_new_train_class_Edge_Loc_shearing['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Loc_shearing['waferMap_augmentation'].apply(find_dim)

# 확대 : 20% - 1517개
wm811k_new_train_class_Edge_Loc_resize = wm811k_new_train_class_Edge_Loc.sample(n=1517, random_state=2022)
wm811k_new_train_class_Edge_Loc_resize["waferMap_augmentation"] = wm811k_new_train_class_Edge_Loc_resize["waferMap"].apply(lambda x: resizing(x))
wm811k_new_train_class_Edge_Loc_resize['waferMap_augmentation_Dim']=wm811k_new_train_class_Edge_Loc_resize['waferMap_augmentation'].apply(find_dim)

# concat
wm811k_new_train_class_Edge_Loc_augmentation = pd.concat([wm811k_new_train_class_Edge_Loc,
                                                          wm811k_new_train_class_Edge_Loc_10,
                                                          wm811k_new_train_class_Edge_Loc_minus_10,
                                                          wm811k_new_train_class_Edge_Loc_flip,
                                                          wm811k_new_train_class_Edge_Loc_translate,
                                                          wm811k_new_train_class_Edge_Loc_shearing,
                                                          wm811k_new_train_class_Edge_Loc_resize
                                                         ])
wm811k_new_train_class_Edge_Loc_augmentation.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2417 entries, 36 to 791230
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   waferMap        2417 non-null   object 
 1   dieSize         2417 non-null   float64
 2   lotName         2417 non-null   object 
 3   trianTestLabel  2417 non-null   object 
 4   failureType     2417 non-null   object 
 5   waferMapDim     2417 non-null   object 
 6   failureNum      2417 non-null   object 
 7   trainTestNum    2417 non-null   object 
dtypes: float64(1), object(7)
memory usage: 169.9+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 36 to 417018
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   waferMap                   10000 non-null  object 
 1   dieSize                    10000 non-null  float64
 2   lotName                    10000 non-null  obje

In [73]:
wm811k_new_train_class_Edge_Loc_resize.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1517 entries, 265906 to 417018
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   waferMap                   1517 non-null   object 
 1   dieSize                    1517 non-null   float64
 2   lotName                    1517 non-null   object 
 3   trianTestLabel             1517 non-null   object 
 4   failureType                1517 non-null   object 
 5   waferMapDim                1517 non-null   object 
 6   failureNum                 1517 non-null   object 
 7   trainTestNum               1517 non-null   object 
 8   waferMap_augmentation      1517 non-null   object 
 9   waferMap_augmentation_Dim  1517 non-null   object 
dtypes: float64(1), object(9)
memory usage: 130.4+ KB


In [None]:
# Loc(4) 현 보유 1,620개 / 8,380개 augmentation 필요
# +10도 회전 : 10% - 838개
# -10도 회전 : 10% - 838개
# 좌우 대칭 : 20%  - "1676개"
# 너비 이동 : 20%  - "1676개"
# 높이 이동: 20%   - "1676개"
# 전단 범위(shearing range) : 10% - 838개
# 확대 : 10% - 838개

In [None]:
wm811k_new_train_class_Loc = wm811k_train.query("failureNum == 4")
wm811k_new_train_class_Loc.info()

In [None]:
# Random(5) 현 보유 609개 / 9,391개 augmentation 필요
# +10도 회전 : 10%
# -10도 회전 : 10%
# 좌우 대칭 : 20% 
# 너비 이동 : 20%
# 높이 이동: 20%
# 전단 범위(shearing range) : 10%
# 확대 : 10%

In [None]:
wm811k_new_train_class_Random = wm811k_train.query("failureNum == 5")
wm811k_new_train_class_Random.info()

In [None]:
# Scratch(6) 현 보유 500개 / 9,500개 augmentation 필요
# +10도 회전 : 10%
# -10도 회전 : 10%
# 좌우 대칭 : 20% 
# 너비 이동 : 20%
# 높이 이동: 20%
# 전단 범위(shearing range) : 10%
# 확대 : 10%

In [None]:
wm811k_new_train_class_Scratch = wm811k_train.query("failureNum == 6")
wm811k_new_train_class_Scratch.info()

In [None]:
# Donut(1) 현 보유 409개 / 9,591개 augmentation 필요
# +10도 회전 : 10%
# -10도 회전 : 10%
# 좌우 대칭 : 20% 
# 너비 이동 : 20%
# 높이 이동: 20%
# 전단 범위(shearing range) : 10%
# 확대 : 10%

In [None]:
wm811k_new_train_class_Donut = wm811k_train.query("failureNum == 1")
wm811k_new_train_class_Donut.info()

In [None]:
# Near-full(7) 현 보유 54개 / 9,946개 augmentation 필요
# +10도 회전 : 10%
# -10도 회전 : 10%
# 좌우 대칭 : 20% 
# 너비 이동 : 20%
# 높이 이동: 20%
# 전단 범위(shearing range) : 10%
# 확대 : 10%

In [None]:
wm811k_new_train_class_Near_Full = wm811k_train.query("failureNum == 7")
wm811k_new_train_class_Near_Full.info()

- augmentation 데이터 파일 저장
    - pkl 파일 저장 참고 : https://seing.tistory.com/95

## CNN-WDI Modeling

In [None]:
# CNN-WDI model
# layer별 padding option check ~ valid, same
input_shape = (224,224,1) # 학습 data shape 확인
class_num = 9
KERNEL_SIZE = 3

input_layer = layers.Input(shape=input_shape) # Input 224x224
x = layers.Conv2D(16, kernel_size=KERNEL_SIZE, padding='valid')(input_layer) # 16, 3x3
x = layers.Activation(activation='relu')(x) # ReLU
x = layers.BatchNormalization()(x) # BatchNormalization
x = layers.MaxPool2D(pool_size=(2,2))(x) # Max-pooling
x = layers.Conv2D(16, kernel_size=KERNEL_SIZE, padding='same')(x) # 16, 3x3
x = layers.Activation(activation='relu')(x) # ReLU
x = layers.BatchNormalization()(x) # BatchNormalization

x = layers.Conv2D(32, kernel_size=KERNEL_SIZE, padding='same')(x) # 32, 3x3
x = layers.Activation(activation='relu')(x) # ReLU
x = layers.BatchNormalization()(x) # BatchNormalization
x = layers.MaxPool2D(pool_size=(2,2))(x) # Max-pooling
x = layers.Conv2D(32, kernel_size=KERNEL_SIZE, padding='same')(x) # 32, 3x3
x = layers.Activation(activation='relu')(x) # ReLU
x = layers.BatchNormalization()(x) # BatchNormalization

x = layers.Conv2D(64, kernel_size=KERNEL_SIZE, padding='same')(x) # 64, 3x3
x = layers.Activation(activation='relu')(x) # ReLU
x = layers.BatchNormalization()(x) # BatchNormalization
x = layers.MaxPool2D(pool_size=(2,2))(x) # Max-pooling
x = layers.Conv2D(64, kernel_size=KERNEL_SIZE, padding='same')(x) # 64, 3x3
x = layers.Activation(activation='relu')(x) # ReLU
x = layers.BatchNormalization()(x) # BatchNormalization

x = layers.Conv2D(128, kernel_size=KERNEL_SIZE, padding='same')(x) # 128, 3x3
x = layers.Activation(activation='relu')(x) # ReLU
x = layers.BatchNormalization()(x) # BatchNormalization
x = layers.MaxPool2D(pool_size=(2,2))(x) # Max-pooling
x = layers.Conv2D(128, kernel_size=KERNEL_SIZE, padding='same')(x) # 128, 3x3
x = layers.Activation(activation='relu')(x) # ReLU
x = layers.BatchNormalization()(x) # BatchNormalization

x = layers.SpatialDropout2D(0.2)(x) # Spartial Dropout 0.2
x = layers.MaxPool2D(pool_size=(2,2))(x) # Max-pooling
x = layers.Flatten()(x) # 4608 확인
x = layers.Dense(512, activation='relu')(x)
output_layer = layers.Dense(class_num, activation='softmax')(x)

CNN_WDI = Model(input_layer, output_layer)

In [None]:
CNN_WDI.summary()

In [None]:
# learning_rate 설정 ~ compile? fit?
# X, y 설정
# loss_function, metrics 설정 확인
# checkpointer patience 값 확인

# train / validation / test set 설정
# X_train, X_valid = train_test_split(df, test_size=TEST_SIZE, random_state=2021)

# model 설정
LEARNING_RATE = 0.001
BATCH_SIZE = 100
EPOCH = 20  # 논문 그래프에 20 epoch 까지 있음

CNN_WDI.compile(loss='categorical_crossentropy', optimizer=optimizers.Adam(LEARNING_RATE), metrics=['categorical_crossentropy'], learning_rate = LEARNING_RATE)
checkpointer = ModelCheckpoint(filepath="{0}.h5".format(model_id), verbose=1, save_best_only=True)

# 학습 실행
hist = CNN_WDI.fit(X_train, X_test, epochs=EPOCH, batch_size=BATCH_SIZE, shuffle=True,
                   validation_data=(X_valid, X_valid), callbacks=[checkpointer])

In [None]:
# predict 시에도 test 데이터 padding 하는 과정 필요