# Semiconductor Thin-Film Thickness Analysis Competition

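## Background
    Demand for high-performance semiconductors has grown recently, and 3D processes that stack semiconductor layers vertically are being actively studied. In processes that stack tens to hundreds of thin-film layers, film defects degrade thickness accuracy and uniformity. This deforms the device structure and is a major cause of performance loss. To prevent it, film thickness must be measured both quickly and accurately.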

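    Optical spectrum analysis is widely used to measure thin-film thickness. However, analyzing an optical spectrum requires experts with substantial domain knowledge, and the analysis itself consumes considerable computing resources. To address this through big-data analysis, we are holding a competition on algorithms for analyzing the thickness of semiconductor devices.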

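    Evaluation: Submissions in this competition are scored by mean absolute error (MAE).
    a) Provisional ranking: scored on 50% of the test data while the competition is running.
    b) Final ranking: scored on the remaining test data that was not used for the provisional ranking. Published after the competition ends.
    The final ranking is scored on the file each participant selects, so participants must choose the submission file they want to have scored.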
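
    As an illustration of the metric, the following is a minimal sketch of MAE computed over all four layer columns and all rows. The exact aggregation used by the competition scorer is not stated above, so the uniform averaging and the function name `mean_absolute_error` are assumptions, and the thickness values are made up.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Mean of |y_true - y_pred| over every row and every layer column.

    Assumption: the scorer averages uniformly over all entries.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred)))

# Two made-up samples with four layer thicknesses each (illustrative only).
y_true = np.array([[10, 10, 10, 20], [10, 10, 10, 30]])
y_pred = np.array([[12, 10, 9, 20], [10, 14, 10, 29]])

mae = mean_absolute_error(y_true, y_pred)
print(mae)
```

    Lower is better, and a perfect prediction gives an MAE of 0.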

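## Data
    Background material
    A semiconductor thin film is a thin layer of semiconductor material, and its type and thickness are among the key factors that determine the characteristics of a semiconductor device. Reflectance measurement is widely used to measure film thickness. Reflectance is defined as the ratio of reflected light intensity to incident light intensity (reflectance = reflected light / incident light). It varies with the wavelength of light, and the distribution of reflectance over wavelength is called the reflectance spectrum.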
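
    The ratio above can be sketched in a few lines. The intensity values below are made up for illustration, and the function name `reflectance` is hypothetical. The competition files already contain reflectance values, so this step is not needed to use the data.

```python
import numpy as np

def reflectance(reflected, incident):
    """Element-wise ratio of reflected to incident intensity per wavelength."""
    reflected = np.asarray(reflected, dtype=float)
    incident = np.asarray(incident, dtype=float)
    if np.any(incident <= 0):
        raise ValueError("incident intensity must be positive")
    return reflected / incident

# Made-up intensities at three wavelengths (arbitrary units).
incident = np.array([100.0, 100.0, 200.0])
reflected = np.array([25.0, 50.0, 50.0])

# One reflectance value per wavelength forms the reflectance spectrum.
spectrum = reflectance(reflected, incident)
print(spectrum)
```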

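    Structure
    The device analyzed in this competition has a five-layer structure: silicon nitride (layer_1) / silicon dioxide (layer_2) / silicon nitride (layer_3) / silicon dioxide (layer_4) / silicon (substrate). The goal is to predict the thicknesses of layer_1 through layer_4, excluding the silicon substrate. The train.csv file contains the thickness of each layer together with the reflectance spectrum.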
    
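    Data description
    The train.csv file provides the thicknesses of the four film layers and the reflectance spectrum as a function of wavelength. In the header, layer_1 to layer_4 are the thicknesses of the corresponding layers, and 0 to 225 are the reflectance values at the corresponding wavelengths. The header names 0 to 225 stand for wavelengths but have been anonymized, so they differ from the actual wavelength values.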
    
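    train.csv
    layer_1~4: thickness of the corresponding film layer
    0~225: reflectance spectrum; the light wavelengths are provided in anonymized form.
    
    test.csv
    id: ID of the spectrum
    0~225: reflectance spectrum; the light wavelengths are provided in anonymized form.
    
    sample_submission.csv
    id: ID of the spectrum
    layer_1~4: predictions for layer_1~4, which are not present in test.csv, to be submitted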
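
    The column layout described above can be sketched as follows. The helper names `split_train` and `make_submission` are hypothetical, the frames are small synthetic stand-ins for the real files, and the constant-mean prediction is only a placeholder, not a recommended model. The sketch assumes the spectrum headers are read as the strings "0" to "225", which is how `pd.read_csv` parses a header row.

```python
import numpy as np
import pandas as pd

LAYER_COLS = [f"layer_{i}" for i in range(1, 5)]  # layer_1 .. layer_4
SPECTRUM_COLS = [str(i) for i in range(226)]      # "0" .. "225"

def split_train(train):
    """Return (X, y): the reflectance spectra and the four layer thicknesses."""
    return train[SPECTRUM_COLS].to_numpy(), train[LAYER_COLS].to_numpy()

def make_submission(test, predictions):
    """Build a frame with the columns id, layer_1 .. layer_4."""
    submission = pd.DataFrame(predictions, columns=LAYER_COLS)
    submission.insert(0, "id", test["id"].to_numpy())
    return submission

# Synthetic stand-ins with the same column names as train.csv and test.csv.
rng = np.random.default_rng(0)
demo_train = pd.concat(
    [pd.DataFrame(np.full((3, 4), 10), columns=LAYER_COLS),
     pd.DataFrame(rng.random((3, 226)), columns=SPECTRUM_COLS)],
    axis=1,
)
demo_test = pd.concat(
    [pd.DataFrame({"id": [0, 1]}),
     pd.DataFrame(rng.random((2, 226)), columns=SPECTRUM_COLS)],
    axis=1,
)

X, y = split_train(demo_train)

# Placeholder prediction: the mean training thickness for every test row.
baseline = np.tile(y.mean(axis=0), (len(demo_test), 1))
submission = make_submission(demo_test, baseline)
print(submission)
```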

In [10]:
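import numpy as np
import pandas as pd
import warnings

# Suppress all warnings to keep the notebook output short.
warnings.filterwarnings(action='ignore')

# Load the competition files; adjust these local Windows paths to your setup.
train = pd.read_csv(r'D:\test1\train.csv')
test = pd.read_csv(r'D:\test1\test.csv')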

train.head()

,layer_1,layer_2,layer_3,layer_4,0,1,2,3,4,5,...,216,217,218,219,220,221,222,223,224,225
0,10,10,10,10,0.254551,0.258823,0.254659,0.252085,0.247678,0.253614,...,0.35475,0.369223,0.388184,0.408496,0.414564,0.429403,0.419225,0.44325,0.433414,0.465502
1,10,10,10,20,0.205062,0.225544,0.217758,0.202169,0.199633,0.20738,...,0.557203,0.573656,0.587998,0.612754,0.627825,0.633393,0.637706,0.625981,0.653231,0.637853
2,10,10,10,30,0.189196,0.165869,0.177655,0.156822,0.175094,0.177755,...,0.699864,0.708688,0.721982,0.713464,0.74303,0.741709,0.747743,0.746037,0.737356,0.750391
3,10,10,10,40,0.131003,0.120076,0.138975,0.117931,0.130566,0.131262,...,0.764786,0.763788,0.770017,0.787571,0.778866,0.776969,0.774712,0.801526,0.805305,0.784057
4,10,10,10,50,0.091033,0.086893,0.108125,0.080405,0.105917,0.077083,...,0.786677,0.802271,0.806557,0.799614,0.789333,0.804087,0.787763,0.794948,0.819105,0.801781


In [8]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 810000 entries, 0 to 809999
Columns: 230 entries, layer_1 to 225
dtypes: float64(226), int64(4)
memory usage: 1.4 GB
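
The reported footprint follows from 810000 rows × 230 columns × 8 bytes = 1,490,400,000 bytes, about 1.39 GiB, which pandas displays as 1.4 GB. If memory is tight, one option is to store the float columns as float32, sketched below on a synthetic frame. The helper name `downcast_floats` is hypothetical, and whether float32 precision is sufficient for this task is an assumption that should be checked.

```python
import numpy as np
import pandas as pd

def downcast_floats(df):
    """Return a copy with every float64 column converted to float32."""
    float_cols = df.select_dtypes(include="float64").columns
    return df.astype({col: np.float32 for col in float_cols})

# Synthetic stand-in for the 226 spectrum columns of train.csv.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((1000, 226)),
                    columns=[str(i) for i in range(226)])

before = int(demo.memory_usage(index=False).sum())
small = downcast_floats(demo)
after = int(small.memory_usage(index=False).sum())
print(before, after)
```

Integer columns such as layer_1 to layer_4 are left unchanged by this helper.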


In [9]:
train.describe()

,layer_1,layer_2,layer_3,layer_4,0,1,2,3,4,5,...,216,217,218,219,220,221,222,223,224,225
count,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,...,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0,810000.0
mean,155.0,155.0,155.0,155.0,0.292653,0.292893,0.293125,0.293363,0.293666,0.293994,...,0.600336,0.606206,0.612238,0.618456,0.623942,0.625395,0.6271,0.628997,0.631166,0.633594
std,86.554468,86.554468,86.554468,86.554468,0.181642,0.181857,0.182055,0.182197,0.182361,0.182529,...,0.199727,0.198644,0.197473,0.196177,0.195028,0.194909,0.19473,0.194493,0.194146,0.193725
min,10.0,10.0,10.0,10.0,-0.014902,-0.014798,-0.014897,-0.014709,-0.014903,-0.014662,...,-0.011992,-0.008661,-0.01143,-0.009827,-0.007632,-0.007411,-0.007073,-0.007101,-0.005519,-0.006074
25%,80.0,80.0,80.0,80.0,0.135139,0.13518,0.135258,0.135478,0.135585,0.135705,...,0.469345,0.47697,0.484727,0.492739,0.500232,0.50165,0.503811,0.506252,0.509036,0.512067
50%,155.0,155.0,155.0,155.0,0.28651,0.286874,0.287194,0.287553,0.28783,0.288151,...,0.643685,0.649886,0.656258,0.66286,0.668727,0.670287,0.672145,0.674283,0.676692,0.679339
75%,230.0,230.0,230.0,230.0,0.435696,0.435956,0.436112,0.436326,0.436634,0.437142,...,0.760737,0.765462,0.770333,0.775263,0.779555,0.780846,0.782387,0.783979,0.785774,0.787759
max,300.0,300.0,300.0,300.0,0.748205,0.753103,0.749494,0.747389,0.748827,0.750392,...,0.935423,0.934867,0.938873,0.937817,0.942214,0.940367,0.940387,0.941548,0.942411,0.943648
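
The thickness statistics above (810000 rows, minimum 10, maximum 300, mean 155, quartiles 80/155/230) are consistent with every layer taking the 30 values 10, 20, ..., 300 in all 30^4 = 810000 combinations. This is an inference from the summary output, not something the data description states. The sketch below builds such a hypothetical grid so its statistics can be compared with the table.

```python
import numpy as np
import pandas as pd

LAYER_COLS = ["layer_1", "layer_2", "layer_3", "layer_4"]

# Assumption: each layer takes the 30 values 10, 20, ..., 300.
values = np.arange(10, 301, 10)

# Cartesian product of the four axes; with indexing="ij" the last layer
# varies fastest, matching the row order shown by train.head().
mesh = np.meshgrid(values, values, values, values, indexing="ij")
grid = pd.DataFrame(np.stack(mesh, axis=-1).reshape(-1, 4), columns=LAYER_COLS)

stats = grid.describe()
print(len(grid))
print(stats)
```

With the real file loaded, comparing `train[LAYER_COLS].drop_duplicates()` against this grid would confirm or refute the assumption.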


In [11]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Columns: 227 entries, id to 225
dtypes: float64(226), int64(1)
memory usage: 17.3 MB


In [12]:
test.describe()

,id,0,1,2,3,4,5,6,7,8,...,216,217,218,219,220,221,222,223,224,225
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,...,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,4999.5,0.29862,0.298523,0.298266,0.298122,0.29784,0.297781,0.29768,0.297628,0.297761,...,0.601752,0.607483,0.613193,0.618859,0.623699,0.624553,0.625722,0.627125,0.628886,0.631377
std,2886.89568,0.182323,0.182546,0.182835,0.183146,0.183721,0.183892,0.184473,0.18529,0.18556,...,0.20011,0.198799,0.198037,0.19715,0.196688,0.197329,0.197532,0.197626,0.197022,0.196373
min,0.0,-0.014062,-0.014153,-0.013073,-0.013437,-0.013738,-0.013458,-0.013132,-0.014418,-0.013239,...,-0.000326,-1.8e-05,-0.001248,-0.006506,0.007479,0.007074,0.003891,0.000749,0.010466,0.001458
25%,2499.75,0.143776,0.142043,0.141642,0.140117,0.139429,0.13831,0.138377,0.13658,0.136497,...,0.470463,0.475866,0.48642,0.493853,0.497811,0.50031,0.504392,0.502767,0.504647,0.509042
50%,4999.5,0.292133,0.291344,0.291543,0.29132,0.291687,0.291404,0.2906,0.290009,0.29285,...,0.644588,0.651759,0.65802,0.663477,0.669354,0.670082,0.672555,0.674006,0.675324,0.677706
75%,7499.25,0.440783,0.441728,0.441065,0.442691,0.443197,0.443525,0.443969,0.44548,0.446458,...,0.762421,0.767317,0.770799,0.775913,0.780854,0.78297,0.78346,0.784899,0.785543,0.787843
max,9999.0,0.738145,0.735195,0.730482,0.72711,0.739487,0.74053,0.739367,0.746795,0.739782,...,0.922739,0.92225,0.924994,0.920316,0.930631,0.928891,0.935706,0.935085,0.937052,0.940716
