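# 1. How to Build the Dataset

The Korea National Health and Nutrition Examination Survey (KNHANES) data may be used only after agreeing to the terms of use of the Korea Disease Control and Prevention Agency (KDCA), so the raw data cannot be provided here.
Instead, the data processing code is provided; run it in the following order.

1. Download the data (11 years, 2010 to 2020)
2. Build the dataset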

## [Build step 1] Download the data

From the KNHANES site ( https://knhanes.kdca.go.kr/knhanes/sub03/sub03_02_05.do ),
download the 11 '**SAS**' files of the '**Basic DB**' (기본DB) for 2010 through 2020.

![Data download](fig1_download.png)

After downloading, place<br>
preprocessing.ipynb (Jupyter notebook),<br>
meta_data20.xlsx (metadata Excel file), and<br>
the 11 DB files in the same working folder.
>preprocessing.ipynb<br>
meta_data20.xlsx<br>
hn10_all.sas7bdat<br>
hn11_all.sas7bdat<br>
hn12_all.sas7bdat<br>
hn13_all.sas7bdat<br>
hn14_all.sas7bdat<br>
hn15_all.sas7bdat<br>
hn16_all.sas7bdat<br>
hn17_all.sas7bdat<br>
hn18_all.sas7bdat<br>
hn19_all.sas7bdat<br>
hn20_all.sas7bdat
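
Before running the notebook, it can help to confirm that all 13 files listed above are in the working folder. The following is a minimal sketch under that assumption; `EXPECTED_FILES` and `find_missing_files` are hypothetical helper names and are not part of preprocessing.ipynb.

```python
from pathlib import Path

# File names exactly as listed in this section: the notebook, the metadata
# workbook, and one SAS file per survey year from 2010 (hn10) to 2020 (hn20).
EXPECTED_FILES = (
    ["preprocessing.ipynb", "meta_data20.xlsx"]
    + [f"hn{yy}_all.sas7bdat" for yy in range(10, 21)]
)


def find_missing_files(folder="."):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# Check the current working folder; an empty list means nothing is missing.
print(find_missing_files("."))
```

An empty list means every expected file was found; otherwise the list names the files that still need to be downloaded or moved.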

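## [Build step 2] Run the processing code

In the working folder that contains the DB files and the processing notebook,
run all cells of the dataset-building Jupyter notebook (preprocessing.ipynb).
This produces the final file nationalhealth_2010to2022.csv (the name is hard-coded in the last cell; the data cover 2010 to 2020).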

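# 2. Dataset Processing Code

The detailed dataset processing code is given below.

(Refer to this code if you later need to check how the dataset was processed.)

* 2.1. Merge the datasets
* 2.2. Add disease variables
* 2.3. Select analysis variables
* 2.4. Handle special values (not applicable, unknown)
* 2.5. Remove rows with missing values (NaN)
* 2.6. Save the dataset file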

## [Processing code 1] Merge the datasets

Read the 11 DB files<br>
and merge them into a single DataFrame, `df_merged`.

In [20]:
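from IPython.display import Image
import glob
import os

import pandas as pd

# Adjust this path to the folder that holds the .sas7bdat files.
# Reverse sorting puts hn20 (2020) first, so dic_upper2original is defined
# on the first pass and the 2020 columns lead the inner join.
files = sorted(glob.glob("/data/national_nutrition/*_all.sas7bdat"), reverse=True)
df_merged = None
for file in files:
    # File names look like hn20_all.sas7bdat; the two characters before '_' are the year.
    year = int(os.path.basename(file).split('_')[0][-2:])
    if not (10 <= year <= 20):
        continue
    df = pd.read_sas(file)
    if year == 20:
        # Remember the 2020 spelling of each column name.
        dic_upper2original = {colID.upper(): colID for colID in df.columns}
    # Upper-case the names so that columns match across years regardless of case.
    df.columns = [colID.upper() for colID in df.columns]
    if df_merged is None:
        df_merged = df
        continue
    # join='inner' keeps only the columns shared by all years read so far.
    df_merged = pd.concat([df_merged, df], axis=0, join='inner')
# Restore the 2020 spelling of the column names.
df_merged.columns = df_merged.columns.map(dic_upper2original)
df_merged = df_merged.reset_index(drop=True)
df_merged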

Unnamed: 0,mod_d,ID,ID_fam,year,region,town_t,apt_t,psu,sex,age,...,N_FE,N_NA,N_K,N_CAROT,N_RETIN,N_B1,N_B2,N_NIAC,N_VITC,LF_SAFE
0,b'2022.02.28.',b'A801169401',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',1.0,39.0,...,12.995169,7067.605901,3052.902340,1136.872433,442.876949,1.269509,1.811096,9.785905,47.834947,1.0
1,b'2022.02.28.',b'A801169402',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',2.0,39.0,...,11.842457,2728.359541,1696.700298,646.615445,124.496672,0.849576,0.782538,6.590455,73.954247,1.0
2,b'2022.02.28.',b'A801169403',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',1.0,10.0,...,8.666752,2733.815739,2311.557661,679.688212,227.509319,1.081528,1.355003,8.778320,56.426013,1.0
3,b'2022.02.28.',b'A801169404',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',1.0,7.0,...,9.266191,2616.965216,1488.056137,402.774821,378.553678,0.980508,1.915447,4.749639,9.440222,1.0
4,b'2022.02.28.',b'A801169405',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',1.0,4.0,...,6.579997,2073.267874,1537.678134,463.220334,164.123683,0.733760,1.398546,4.047234,64.740298,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91299,b'2022.03.08',b'P311960702',b'P3119607',2010.0,16.0,1.0,1.0,b'P311',1.0,6.0,...,6.251979,1290.204722,1472.190079,182.547766,148.865131,1.211490,0.857966,10.361169,52.886191,2.0
91300,b'2022.03.08',b'P311960703',b'P3119607',2010.0,16.0,1.0,1.0,b'P311',2.0,13.0,...,,,,,,,,,,
91301,b'2022.03.08',b'P311960704',b'P3119607',2010.0,16.0,1.0,1.0,b'P311',1.0,47.0,...,,,,,,,,,,
91302,b'2022.03.08',b'P311960902',b'P3119609',2010.0,16.0,1.0,1.0,b'P311',2.0,20.0,...,15.897780,3782.087331,3136.324143,1349.265389,56.079971,1.205168,0.971106,18.673687,125.990819,2.0
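
To make the merge step easier to follow, here is a small self-contained sketch of the same idea on toy data: column names are compared in upper case, `pd.concat(..., join="inner")` keeps only the shared columns, and the first frame's spelling is restored afterwards. The two toy frames and the helper name `merge_years` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for two survey years; names and values are illustrative only.
df_2020 = pd.DataFrame({"ID": ["a1"], "age": [39.0], "HE_BMI": [23.1], "LF_SAFE": [1.0]})
df_2019 = pd.DataFrame({"id": ["b1"], "AGE": [52.0], "HE_BMI": [27.4]})


def merge_years(frames):
    """Concatenate frames on their shared columns, ignoring column-name case.

    The first frame's spelling of each column name is restored at the end.
    """
    upper2original = {col.upper(): col for col in frames[0].columns}
    normalised = [frame.rename(columns=str.upper) for frame in frames]
    merged = pd.concat(normalised, axis=0, join="inner", ignore_index=True)
    merged.columns = merged.columns.map(upper2original)
    return merged


merged = merge_years([df_2020, df_2019])
print(merged)  # LF_SAFE is dropped because the second frame does not have it
```

A consequence of the inner join is that any variable missing from even one survey year disappears from the merged table, so the merged column set is the intersection across all years.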


## [Processing code 2] Add disease variables

Add 13 disease variables to the merged dataset (`data` in the code).


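>**Disease variables** (column names stay in Korean, as in the code)<br>
    1. Obesity (비만)<br>
    2. Hypertension (고혈압)<br>
    3. Diabetes (당뇨병)<br>
    4. Hypercholesterolemia (고콜레스테롤혈증)<br>
    5. Hypertriglyceridemia (고중성지방혈증)<br>
    6. Hepatitis B (B형간염)<br>
    7. Anemia (빈혈)<br>
    8. Stroke (뇌졸중)<br>
    9. Angina or myocardial infarction (협심증또는심근경색증)<br>
    10. Asthma (천식)<br>
    11. Atopic dermatitis (아토피피부염)<br>
    12. Osteoarthritis (골관절염)<br>
    13. Depression (우울증)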
    
The criteria for each disease follow the statistics published by the KDCA (https://knhanes.kdca.go.kr/knhanes/sub04/sub04_04_05.do), shown below.

![Disease statistics](fig2_disease_statistics.png)

In [21]:
data=df_merged
import numpy as np

# 1. Obesity (비만): age >= 19, HE_BMI present, HE_dprg missing; positive if HE_BMI >= 25
index1 = data[data["age"] >= 19].index
index2 = data.dropna(subset=["HE_BMI"]).index
index3 = data[data["HE_dprg"].isnull()].index
intersection_index = list(set(index1) & set(index2) & set(index3))
data.loc[:,"비만"] = np.nan
data.loc[intersection_index, "비만"] = (data.loc[intersection_index]["HE_BMI"] >= 25).astype(int)

# 2. Hypertension (고혈압): age >= 19, all six blood-pressure readings present,
# DI1_2 in the listed codes; HE_HP 3 -> 1, HE_HP 1 or 2 -> 0
index1 = data[data["age"] >= 19].index
index2 = data.dropna(subset=["HE_sbp1", "HE_sbp2", "HE_sbp3", "HE_dbp1", "HE_dbp2", "HE_dbp3"]).index
index3 = data[data["DI1_2"].isin([1, 2, 3, 4, 5, 8])].index
intersection_index = list(set(index1) & set(index2) & set(index3))
data.loc[:, "고혈압"] = np.nan
data.loc[intersection_index, "고혈압"] = data.loc[intersection_index]["HE_HP"].map({1: 0, 2: 0, 3: 1})

# 3. Diabetes (당뇨병)
# Eligible rows: HE_dprg missing, HE_glu and HE_HbA1c present, HE_fst >= 8,
# and DE1_dg, DE1_31, DE1_32 each in {0, 1, 8}
index1 = data[data["HE_dprg"].isnull()].index
index2 = data.dropna(subset=["HE_glu"]).index
index3 = data[data["HE_fst"] >= 8].index
index4 = data[(data["DE1_dg"].isin([0, 1, 8])) & (data["DE1_31"].isin([0, 1, 8])) & (data["DE1_32"].isin([0, 1, 8]))].index
index5 = data.dropna(subset=["HE_HbA1c"]).index
intersection_index = list(set(index1) & set(index2) & set(index3) & set(index4) & set(index5))

# Positive if any one holds: HE_glu >= 126, DE1_31 == 1, DE1_32 == 1, DE1_dg == 1, HE_HbA1c >= 6.5
index6 = data[(data["HE_glu"] >= 126)].index
index7 = data[(data["DE1_31"] == 1)].index
index8 = data[(data["DE1_32"] == 1)].index
index9 = data[(data["DE1_dg"] == 1)].index
index10 = data[(data["HE_HbA1c"] >= 6.5)].index
union_index = list(set(index6) | set(index7) | set(index8) | set(index9) | set(index10))

# Eligible rows that meet a criterion are cases (1); the other eligible rows are non-cases (0)
diabetes_index = list(set(intersection_index) & set(union_index))
complement_index = list(set(intersection_index) - set(diabetes_index))

data.loc[:, "당뇨병"] = np.nan
data.loc[diabetes_index, "당뇨병"] = 1
data.loc[complement_index, "당뇨병"] = 0

# 4. Hypercholesterolemia (고콜레스테롤혈증): age >= 19, HE_fst >= 8, HE_chol and DI2_2 present,
# DI2_2 in the listed codes; positive if HE_chol >= 240 or DI2_2 == 1
index1 = data[(data["age"] >= 19) & (data["HE_fst"] >= 8)].index
index2 = data.dropna(subset=["HE_chol", "DI2_2"]).index
index3 = data[data["DI2_2"].isin([1, 2, 3, 4, 5, 8])].index
intersection_index = list(set(index1) & set(index2) & set(index3))
data.loc[:, "고콜레스테롤혈증"] = np.nan
data.loc[intersection_index, "고콜레스테롤혈증"] = ((data.loc[intersection_index]["HE_chol"] >= 240) | (data.loc[intersection_index]["DI2_2"] == 1)).astype(int)

# 5. Hypertriglyceridemia (고중성지방혈증): age >= 19, HE_fst >= 12, HE_TG present; positive if HE_TG >= 200
index1 = data[(data["age"] >= 19) & (data["HE_fst"] >= 12)].index
index2 = data.dropna(subset=["HE_TG"]).index
intersection_index = list(set(index1) & set(index2))
data.loc[:, "고중성지방혈증"] = np.nan
data.loc[intersection_index, "고중성지방혈증"] = (data.loc[intersection_index]["HE_TG"] >= 200).astype(int)

# 6. Hepatitis B (B형간염): age >= 10, HE_hepaB is 0 or 1; the value is copied from HE_hepaB
index1 = data[data["age"] >= 10].index
index2 = data.dropna(subset=["HE_hepaB"]).index
index3 = data[data["HE_hepaB"].isin([0, 1])].index
intersection_index = list(set(index1) & set(index2) & set(index3))
data.loc[:, "B형간염"] = np.nan
data.loc[intersection_index, "B형간염"] = data.loc[intersection_index]["HE_hepaB"]

# 7. Anemia (빈혈): age >= 10, HE_anem is 0 or 1; the value is copied from HE_anem
index1 = data[data["age"] >= 10].index
index2 = data.dropna(subset=["HE_anem"]).index
index3 = data[data["HE_anem"].isin([0, 1])].index
intersection_index = list(set(index1) & set(index2) & set(index3))
data.loc[:, "빈혈"] = np.nan
data.loc[intersection_index, "빈혈"] = data.loc[intersection_index]["HE_anem"]

# 8. Stroke (뇌졸중): age >= 30, DI3_dg present; the value is copied from DI3_dg
index1 = data[data["age"] >= 30].index
index2 = data.dropna(subset=["DI3_dg"]).index
intersection_index = list(set(index1) & set(index2))
data.loc[:, "뇌졸중"] = np.nan
data.loc[intersection_index, "뇌졸중"] = data.loc[intersection_index]["DI3_dg"]

# 9. Angina or myocardial infarction (협심증또는심근경색증): age >= 30, DI4_dg present; the value is copied from DI4_dg
index1 = data[data["age"] >= 30].index
index2 = data.dropna(subset=["DI4_dg"]).index
intersection_index = list(set(index1) & set(index2))
data.loc[:, "협심증또는심근경색증"] = np.nan
data.loc[intersection_index, "협심증또는심근경색증"] = data.loc[intersection_index]["DI4_dg"]

# 10. Asthma (천식): age >= 19, DJ4_dg present; the value is copied from DJ4_dg
index1 = data[data["age"] >= 19].index
index2 = data.dropna(subset=["DJ4_dg"]).index
intersection_index = list(set(index1) & set(index2))
data.loc[:, "천식"] = np.nan
data.loc[intersection_index, "천식"] = data.loc[intersection_index]["DJ4_dg"]

# 11. Atopic dermatitis (아토피피부염): age >= 19, DL1_dg present; the value is copied from DL1_dg
index1 = data[data["age"] >= 19].index
index2 = data.dropna(subset=["DL1_dg"]).index
intersection_index = list(set(index1) & set(index2))
data.loc[:, "아토피피부염"] = np.nan
data.loc[intersection_index, "아토피피부염"] = data.loc[intersection_index]["DL1_dg"]

# 12. Osteoarthritis (골관절염): age >= 30, DM2_dg present; the value is copied from DM2_dg
index1 = data[data["age"] >= 30].index
index2 = data.dropna(subset=["DM2_dg"]).index
intersection_index = list(set(index1) & set(index2))
data.loc[:, "골관절염"] = np.nan
data.loc[intersection_index, "골관절염"] = data.loc[intersection_index]["DM2_dg"]

# 13. Depression (우울증): age >= 19, DF2_dg present; the value is copied from DF2_dg
index1 = data[data["age"] >= 19].index
index2 = data.dropna(subset=["DF2_dg"]).index
intersection_index = list(set(index1) & set(index2))
data.loc[:, "우울증"] = np.nan
data.loc[intersection_index, "우울증"] = data.loc[intersection_index]["DF2_dg"]
data

Unnamed: 0,mod_d,ID,ID_fam,year,region,town_t,apt_t,psu,sex,age,...,고콜레스테롤혈증,고중성지방혈증,B형간염,빈혈,뇌졸중,협심증또는심근경색증,천식,아토피피부염,골관절염,우울증
0,b'2022.02.28.',b'A801169401',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',1.0,39.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,b'2022.02.28.',b'A801169402',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',2.0,39.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,b'2022.02.28.',b'A801169403',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',1.0,10.0,...,,,0.0,0.0,,,,,,
3,b'2022.02.28.',b'A801169404',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',1.0,7.0,...,,,,,,,,,,
4,b'2022.02.28.',b'A801169405',b'A8011694',2020.0,1.0,1.0,2.0,b'A801',1.0,4.0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91299,b'2022.03.08',b'P311960702',b'P3119607',2010.0,16.0,1.0,1.0,b'P311',1.0,6.0,...,,,,,,,,,,
91300,b'2022.03.08',b'P311960703',b'P3119607',2010.0,16.0,1.0,1.0,b'P311',2.0,13.0,...,,,0.0,0.0,,,,,,
91301,b'2022.03.08',b'P311960704',b'P3119607',2010.0,16.0,1.0,1.0,b'P311',1.0,47.0,...,0.0,0.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,0.0
91302,b'2022.03.08',b'P311960902',b'P3119609',2010.0,16.0,1.0,1.0,b'P311',2.0,20.0,...,0.0,0.0,0.0,1.0,,,8.0,8.0,,8.0
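
Each rule in the cell above follows the same pattern: build the set of eligible rows, label those rows, and leave every other row as NaN. As a hedged illustration, the obesity rule can be written with boolean masks instead of index-set intersections. The toy frame and the helper name `label_obesity` are assumptions made for this sketch; only the column names and thresholds come from the cell above.

```python
import numpy as np
import pandas as pd


def label_obesity(frame):
    """Return 1.0/0.0 for eligible rows and NaN for all other rows.

    Eligibility mirrors the obesity rule above: age >= 19, HE_BMI present,
    and HE_dprg missing.
    """
    eligible = (
        (frame["age"] >= 19)
        & frame["HE_BMI"].notna()
        & frame["HE_dprg"].isna()
    )
    label = (frame["HE_BMI"] >= 25).astype(float)
    return label.where(eligible)  # rows outside the eligible group become NaN


# Toy data: an adult with high BMI, a child, an adult with normal BMI,
# and an adult whose BMI was not measured.
toy = pd.DataFrame({
    "age":     [39.0, 10.0, 47.0, 52.0],
    "HE_BMI":  [27.3, 18.0, 22.0, np.nan],
    "HE_dprg": [np.nan, np.nan, np.nan, np.nan],
})
toy["비만"] = label_obesity(toy)
print(toy)
```

The mask form keeps the row alignment explicit and avoids building Python sets of index labels, which may matter on a table with roughly 90,000 rows.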


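## [Processing code 3] Select analysis variables

Select the 122 variables to analyze from the dataset.

The selected variables cover 1) demographic information, 2) blood test information, 3) nutrition survey information, and 4) disease information.

Variable IDs and detailed descriptions are given in the 'Raw Data Usage Guide' (원시자료 이용지침서) file from the KDCA (https://knhanes.kdca.go.kr/knhanes/sub04/sub04_04_05.do) and in meta_data20.xlsx, which summarizes that file.

![Metadata](fig3_metadata.png)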

In [22]:
variables="""ID
ID_fam
year
region
town_t
sex
age
incm
ho_incm
incm5
ho_incm5
edu
occp
cfam
genertn
allownc
marri_1
marri_2
fam_rela
tins
D_1_1
educ
EC1_1
EC_wht_23
EC_wht_5
EC_pedu_1
EC_pedu_2
BD1_11
BD2_1
BD2_31
dr_month
BP6_10
BP7
mh_stress
BS3_1
BE3_31
BE5_1
LW_mt
LW_mt_a1
LW_br
HE_fst
HE_HPdr
HE_DMdr
HE_mens
HE_prg
HE_HPfh1
HE_HPfh2
HE_HPfh3
HE_HLfh1
HE_HLfh2
HE_HLfh3
HE_IHDfh1
HE_IHDfh2
HE_IHDfh3
HE_STRfh1
HE_STRfh2
HE_STRfh3
HE_DMfh1
HE_DMfh2
HE_DMfh3
HE_rPLS
HE_sbp
HE_dbp
HE_ht
HE_wt
HE_wc
HE_BMI
HE_glu
HE_HbA1c
HE_chol
HE_HDL_st2
HE_TG
HE_ast
HE_alt
HE_hepaB
HE_HB
HE_HCT
HE_BUN
HE_crea
HE_WBC
HE_RBC
HE_Bplt
HE_Uph
HE_Unitr
HE_Usg
HE_Upro
HE_Uglu
HE_Uket
HE_Ubil
HE_Ubld
HE_Uro
HE_Ucrea
N_INTK
N_EN
N_WATER
N_PROT
N_FAT
N_CHO
N_CA
N_PHOS
N_FE
N_NA
N_K
N_CAROT
N_RETIN
N_B1
N_B2
N_NIAC
N_VITC
비만
고혈압
당뇨병
고콜레스테롤혈증
고중성지방혈증
B형간염
빈혈
뇌졸중
협심증또는심근경색증
천식
아토피피부염
골관절염
우울증
""".strip().split()
df_merged_selected = data[variables].copy()   # copy, so the in-place edits in the next cell do not act on a slice of data
df_merged_selected

Unnamed: 0,ID,ID_fam,year,region,town_t,sex,age,incm,ho_incm,incm5,...,고콜레스테롤혈증,고중성지방혈증,B형간염,빈혈,뇌졸중,협심증또는심근경색증,천식,아토피피부염,골관절염,우울증
0,b'A801169401',b'A8011694',2020.0,1.0,1.0,1.0,39.0,1.0,2.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,b'A801169402',b'A8011694',2020.0,1.0,1.0,2.0,39.0,1.0,2.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,b'A801169403',b'A8011694',2020.0,1.0,1.0,1.0,10.0,1.0,2.0,1.0,...,,,0.0,0.0,,,,,,
3,b'A801169404',b'A8011694',2020.0,1.0,1.0,1.0,7.0,1.0,2.0,1.0,...,,,,,,,,,,
4,b'A801169405',b'A8011694',2020.0,1.0,1.0,1.0,4.0,1.0,2.0,1.0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91299,b'P311960702',b'P3119607',2010.0,16.0,1.0,1.0,6.0,1.0,1.0,1.0,...,,,,,,,,,,
91300,b'P311960703',b'P3119607',2010.0,16.0,1.0,2.0,13.0,1.0,1.0,1.0,...,,,0.0,0.0,,,,,,
91301,b'P311960704',b'P3119607',2010.0,16.0,1.0,1.0,47.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,0.0
91302,b'P311960902',b'P3119609',2010.0,16.0,1.0,2.0,20.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,,,8.0,8.0,,8.0


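## [Processing code 4] Handle special values (not applicable, unknown)

In the raw data, 'not applicable' is coded as 8, 88, 888, etc., and 'unknown' as 9, 99, 999, etc.; the code used differs by variable.

These codes are unified to -1 for 'not applicable' and -2 for 'unknown'.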

In [23]:
df = df_merged_selected
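df.loc[df['sex']==1,'LW_mt'] = 8   # men (sex=1): set childbirth experience (LW_mt) to 'not applicable' (8)
df.loc[df['sex']==1,'LW_mt_a1'] = 888   # men (sex=1): set age at first childbirth (LW_mt_a1) to 'not applicable'; this column is in lst_888_999, so 888 (not 8) is what gets recoded to -1 below
df.loc[df['sex']==1,'LW_br'] = 8   # men (sex=1): set breastfeeding experience (LW_br) to 'not applicable' (8)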

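# Each list name gives the special codes its columns use: 8/88/888 = not applicable, 9/99/999 = unknown
lst_8 = ["HE_HPdr", "HE_DMdr", "HE_mens", "HE_prg"]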
lst_9 = ["cfam", "genertn", "marri_1", "D_1_1", "HE_HPfh1", "HE_HPfh2", "HE_HLfh1", "HE_HLfh2", 
         "HE_IHDfh1", "HE_IHDfh2", "HE_STRfh1", "HE_STRfh2", "HE_DMfh1", "HE_DMfh2"]
lst_99 = ["allownc", "fam_rela", "tins"]
lst_8_9 = ["EC1_1", "BD1_11", "BD2_1", "BD2_31", "BP6_10", "BP7", "BS3_1", "BE5_1", "LW_mt", 
           "LW_br", "HE_HPfh3", "HE_HLfh3", "HE_IHDfh3", "HE_STRfh3", "HE_DMfh3"]
lst_88_99 = ["educ", "EC_wht_5", "EC_pedu_1", "EC_pedu_2", "BE3_31"]
lst_88_9_99 = ["marri_2"]
lst_888_999 = ["EC_wht_23", "LW_mt_a1"]

for col in df.columns:
    if col in lst_8:
        df.loc[df[col] == 8, col] = -1
    elif col in lst_9:
        df.loc[df[col] == 9, col] = -2
    elif col in lst_99:
        df.loc[df[col] == 99, col] = -2
    elif col in lst_8_9:
        df.loc[df[col] == 8, col] = -1
        df.loc[df[col] == 9, col] = -2
    elif col in lst_88_99:
        df.loc[df[col] == 88, col] = -1
        df.loc[df[col] == 99, col] = -2
    elif col in lst_88_9_99:
        df.loc[df[col] == 88, col] = -1
        df.loc[df[col] == 9, col] = -2
        df.loc[df[col] == 99, col] = -2
    elif col in lst_888_999:
        df.loc[df[col] == 888, col] = -1
        df.loc[df[col] == 999, col] = -2
    else:
        pass
    
df

Unnamed: 0,ID,ID_fam,year,region,town_t,sex,age,incm,ho_incm,incm5,...,고콜레스테롤혈증,고중성지방혈증,B형간염,빈혈,뇌졸중,협심증또는심근경색증,천식,아토피피부염,골관절염,우울증
0,b'A801169401',b'A8011694',2020.0,1.0,1.0,1.0,39.0,1.0,2.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,b'A801169402',b'A8011694',2020.0,1.0,1.0,2.0,39.0,1.0,2.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,b'A801169403',b'A8011694',2020.0,1.0,1.0,1.0,10.0,1.0,2.0,1.0,...,,,0.0,0.0,,,,,,
3,b'A801169404',b'A8011694',2020.0,1.0,1.0,1.0,7.0,1.0,2.0,1.0,...,,,,,,,,,,
4,b'A801169405',b'A8011694',2020.0,1.0,1.0,1.0,4.0,1.0,2.0,1.0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91299,b'P311960702',b'P3119607',2010.0,16.0,1.0,1.0,6.0,1.0,1.0,1.0,...,,,,,,,,,,
91300,b'P311960703',b'P3119607',2010.0,16.0,1.0,2.0,13.0,1.0,1.0,1.0,...,,,0.0,0.0,,,,,,
91301,b'P311960704',b'P3119607',2010.0,16.0,1.0,1.0,47.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,0.0
91302,b'P311960902',b'P3119609',2010.0,16.0,1.0,2.0,20.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,,,8.0,8.0,,8.0
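
The recoding above can also be expressed as a per-column lookup table, which keeps each variable's codes next to its name. The sketch below is one possible formulation, not the notebook's implementation: `RECODE` and `unify_special_codes` are hypothetical names, the toy values are made up, and only three columns are shown, with codes taken from the lists in the cell above.

```python
import pandas as pd

# Per-column maps: raw survey code -> unified code
# (-1 = not applicable, -2 = unknown).
RECODE = {
    "HE_prg":  {8: -1},                  # from lst_8
    "BD1_11":  {8: -1, 9: -2},           # from lst_8_9
    "marri_2": {88: -1, 9: -2, 99: -2},  # from lst_88_9_99
}


def unify_special_codes(frame, recode=RECODE):
    """Return a copy of `frame` with special codes replaced column by column."""
    out = frame.copy()
    for col, mapping in recode.items():
        if col in out.columns:
            out[col] = out[col].replace(mapping)
    return out


toy = pd.DataFrame({
    "HE_prg":  [8.0, 0.0, 1.0],
    "BD1_11":  [9.0, 8.0, 3.0],
    "marri_2": [88.0, 99.0, 1.0],
})
print(unify_special_codes(toy))
```

Because the function returns a copy, the original frame is left untouched, which makes it easy to compare the data before and after recoding.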


## [Processing code 5] Remove missing values (NaN)

Select respondents aged 30 or older,
then drop every row that contains a missing value.

In [28]:
df2 = df.loc[df["age"] >= 30]   # keep respondents aged 30 or older
df2 = df2.dropna()
df2 = df2.reset_index(drop=True, inplace=False)
df2

Unnamed: 0,ID,ID_fam,year,region,town_t,sex,age,incm,ho_incm,incm5,...,고콜레스테롤혈증,고중성지방혈증,B형간염,빈혈,뇌졸중,협심증또는심근경색증,천식,아토피피부염,골관절염,우울증
0,b'A801169401',b'A8011694',2020.0,1.0,1.0,1.0,39.0,1.0,2.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,b'A801169402',b'A8011694',2020.0,1.0,1.0,2.0,39.0,1.0,2.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,b'A801172802',b'A8011728',2020.0,1.0,1.0,2.0,58.0,4.0,4.0,5.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,b'A801177902',b'A8011779',2020.0,1.0,1.0,2.0,53.0,4.0,4.0,5.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,b'A801179602',b'A8011796',2020.0,1.0,1.0,2.0,53.0,3.0,3.0,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32363,b'P310460201',b'P3104602',2010.0,16.0,2.0,2.0,74.0,2.0,1.0,2.0,...,0.0,0.0,0.0,1.0,8.0,8.0,8.0,8.0,1.0,0.0
32364,b'P310482401',b'P3104824',2010.0,16.0,2.0,1.0,77.0,2.0,1.0,2.0,...,0.0,0.0,0.0,0.0,8.0,8.0,0.0,8.0,8.0,8.0
32365,b'P310660901',b'P3106609',2010.0,16.0,2.0,1.0,75.0,1.0,1.0,1.0,...,0.0,1.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0
32366,b'P311520701',b'P3115207',2010.0,16.0,1.0,1.0,70.0,4.0,4.0,5.0,...,1.0,1.0,0.0,0.0,8.0,8.0,8.0,8.0,8.0,8.0
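
`dropna()` removes a row if any of the 122 columns is missing, and the output above still shows the value 8.0 in some disease columns (for example 뇌졸중), because step 4 recodes only the listed survey variables. The following sketch shows two quick checks that could be run before and after this step; the helper names and the toy frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def missing_share(frame, top=5):
    """Share of missing values per column, largest first."""
    return frame.isna().mean().sort_values(ascending=False).head(top)


def unexpected_values(frame, columns, allowed=(0, 1)):
    """Map each column to the sorted values it holds outside `allowed`."""
    found = {}
    for col in columns:
        extra = sorted(set(frame[col].dropna().unique()) - set(allowed))
        if extra:
            found[col] = extra
    return found


# Toy data with one missing lab value and leftover code 8 in a disease column.
toy = pd.DataFrame({
    "age":   [39.0, 47.0, 70.0, 58.0],
    "HE_TG": [110.0, np.nan, 240.0, 95.0],
    "뇌졸중": [0.0, 8.0, 8.0, 1.0],
    "빈혈":   [0.0, 0.0, 1.0, np.nan],
})
print(missing_share(toy))
print(unexpected_values(toy, ["뇌졸중", "빈혈"]))
```

The first check shows which columns drive the row loss in `dropna()`; the second lists disease columns that contain values other than 0 and 1, so a decision about how to treat them can be made explicitly.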


## [Processing code 6] Save the result file

Save the processed dataset to a CSV file.

In [25]:
df2.to_csv('nationalhealth_2010to2022.csv',index=False)
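
The outputs above show `ID` and `ID_fam` as bytes values such as `b'A801169401'`, which is how `pd.read_sas` returns character columns when no `encoding` argument is given. Written out as they are, such values would likely appear in the CSV as the literal text `b'...'`. One possible way to decode them before saving is sketched below; the helper name `decode_bytes_columns` and the choice of UTF-8 are assumptions, and the toy frame only mimics the ID format.

```python
import pandas as pd


def decode_bytes_columns(frame, encoding="utf-8"):
    """Return a copy in which bytes-valued object columns are decoded to str."""
    out = frame.copy()
    for col in out.columns:
        values = out[col].dropna()
        is_bytes = (
            out[col].dtype == object
            and len(values) > 0
            and isinstance(values.iloc[0], bytes)
        )
        if is_bytes:
            out[col] = out[col].str.decode(encoding)
    return out


toy = pd.DataFrame({"ID": [b"A801169401", b"A801169402"], "age": [39.0, 39.0]})
clean = decode_bytes_columns(toy)
print(clean["ID"].tolist())
```

Applied to the real data, this would be called on `df2` just before `to_csv`; alternatively, passing an `encoding` to `pd.read_sas` in the first cell avoids the bytes values from the start.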