- [Overview of MIMIC-III, a US intensive care unit dataset](https://baeseongsu.github.io/posts/mimiciii/)
- [MIMIC-III base table reference](https://mimic.mit.edu/docs/iii/tables/)
- [MIMIC-III schema reference](https://mit-lcp.github.io/mimic-schema-spy/)
- [Site for looking up an ICD9_CODE](http://www.icd9data.com/2012/Volume1/460-519/480-488/482/482.41.htm)

In [1]:
import pandas as pd
import numpy as np

# Show every expression result in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# Suppress warning messages
import warnings
warnings.filterwarnings('ignore')

# Extracting pneumonia patients

In [2]:
# patient = pd.read_csv('/data/MIMIC_III/PATIENTS.csv')
lab = pd.read_csv('/data/MIMIC_III/LABEVENTS.csv')
icd = pd.read_csv('/data/MIMIC_III/D_ICD_DIAGNOSES.csv') # diagnosis dictionary
patient_icd = pd.read_csv('/data/MIMIC_III/DIAGNOSES_ICD.csv') # diagnoses per patient

In [3]:
# 1. Extract pneumonia-related diagnoses -> 95 codes
pneu_list = icd[(icd['SHORT_TITLE'].str.contains('pneum'))|icd['SHORT_TITLE'].str.contains('Pneum')]['ICD9_CODE']

# 2. Extract patients with a pneumonia-related diagnosis -> 14159 patients
all_pneu_patient = patient_icd[patient_icd['ICD9_CODE'].isin(pneu_list)]

# 3. Extract patients with one of the three most frequent codes (top3_pneu)
top3_pneu = all_pneu_patient['ICD9_CODE'].value_counts()[:3].index
top3_pneu_patient = all_pneu_patient[all_pneu_patient['ICD9_CODE'].isin(top3_pneu)]

# Check the exact diagnosis names for ICD9_CODE 486, 5070, 48241
# -> 486 : pneumonia, organism unspecified
# -> 5070 : pneumonitis due to inhalation of food or vomitus
# -> 48241 : methicillin-susceptible pneumonia due to Staphylococcus aureus
icd[icd['ICD9_CODE'].isin(top3_pneu)]

# 4. A single patient can have several hospital records in top3_pneu_patient
# -> number of unique patients : 7807
환자list = top3_pneu_patient['SUBJECT_ID'].unique()

# 5. Extract the LABEVENTS rows that belong to the top3_pneu_patient cohort
환자lab = lab[lab['SUBJECT_ID'].isin(환자list)].reset_index(drop=True)

# 6. Fill missing values in the 'FLAG' column with the string 'nan', then check the distribution
# Without this, value_counts() does not count the NaN entries
환자lab['FLAG'] = 환자lab['FLAG'].fillna('nan')
환자lab['FLAG'].value_counts()

# 7. The cohort has 8 more pneumonia patients than appear in LABEVENTS
# Outpatients have no 'HADM_ID' in LABEVENTS
환자lab_list = 환자lab['SUBJECT_ID'].unique()
외래환자idx = list(set(환자list) - set(환자lab['SUBJECT_ID'].unique()))
외래환자idx

Unnamed: 0,ROW_ID,ICD9_CODE,SHORT_TITLE,LONG_TITLE
5129,5509,48241,Meth sus pneum d/t Staph,Methicillin susceptible pneumonia due to Staph...
5147,5528,486,"Pneumonia, organism NOS","Pneumonia, organism unspecified"
5407,5136,5070,Food/vomit pneumonitis,Pneumonitis due to inhalation of food or vomitus


nan         5931403
abnormal    3406609
delta         18337
Name: FLAG, dtype: int64

[60961, 48968, 17674, 9388, 19097, 93114, 58012, 95230]
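
Step 6 above fills missing `FLAG` values with the string `'nan'` so that they show up in the counts. As a small side sketch on made-up data (the `flags` series below is an assumption, not MIMIC-III content), pandas can also count missing values directly with `value_counts(dropna=False)`, which leaves the column itself untouched:

```python
import numpy as np
import pandas as pd

# Toy FLAG column with missing values (illustrative data only)
flags = pd.Series(['abnormal', np.nan, 'delta', np.nan, 'abnormal', np.nan])

# Default behaviour: NaN entries are silently excluded from the counts
counts_default = flags.value_counts()

# dropna=False keeps NaN as its own category without modifying the column
counts_with_nan = flags.value_counts(dropna=False)

# The number of missing values can also be read off directly
n_missing = int(flags.isna().sum())

print(counts_default)
print(counts_with_nan)
print(n_missing)
```

Whether to fill or to count in place is a matter of taste; filling with a string is convenient when the column is filtered by value afterwards, as the LAB section does with `'abnormal'`.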

# Extracting ITEMIDs from LAB, PRE, and PRO to build the X data
- `X` : (7727, 10, 4068) = (PATIENTS, TIMEPOINTS, FEATURES)


- LAB : `SUBJECT_ID`, `CHARTTIME`, `ITEMID`, `FLAG`
- PRO : `SUBJECT_ID`, `STARTTIME`, `ENDTIME`, `ITEMID`
- PRE : `SUBJECT_ID`, `STARTDATE`, `ENDDATE`, `NDC`


- Unlike LAB, PRO and PRE need extra preprocessing based on their start and end dates
- Final columns = [`SUBJECT_ID`, `ITEMID`, `CHARTTIME`, `TYPE`]
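
The extra preprocessing for PRE and PRO amounts to turning each (start, end) interval into one row per day. The following is a minimal sketch of that idea on made-up rows, using `pd.date_range` and `DataFrame.explode` instead of an explicit loop. The `events` frame, its values, and the `daily` name are assumptions for illustration only.

```python
import pandas as pd

# Toy prescription-like rows (illustrative values, not MIMIC-III data)
events = pd.DataFrame({
    'SUBJECT_ID': [1, 1, 2],
    'STARTDATE': ['2149-11-09', '2149-11-10', '2181-02-09'],
    'ENDDATE': ['2149-11-11', None, '2181-02-09'],
    'ITEMID': [100, 200, 300],
})
events['STARTDATE'] = pd.to_datetime(events['STARTDATE'])
# A missing end date is treated as a one-day interval
events['ENDDATE'] = pd.to_datetime(events['ENDDATE']).fillna(events['STARTDATE'])

# One list of 'YYYY-MM-DD' strings per row, covering start..end inclusive
events['CHARTTIME'] = pd.Series(
    [pd.date_range(s, e, freq='D').strftime('%Y-%m-%d').tolist()
     for s, e in zip(events['STARTDATE'], events['ENDDATE'])],
    index=events.index,
)

# explode() turns each list element into its own row
daily = (events.explode('CHARTTIME')[['SUBJECT_ID', 'CHARTTIME', 'ITEMID']]
         .drop_duplicates()
         .reset_index(drop=True))
daily['TYPE'] = 'PRE'
print(daily)
```

The result has the same four columns as the final layout listed above, so frames built this way can be combined with the LAB rows.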

In [4]:
# Helper that keeps only the date part of a datetime column
import datetime as dt
def date_only(df, x):
    df[x] = pd.to_datetime(df[x])
    df[x] = df[x].dt.date

## LAB

In [5]:
lab = pd.read_csv('/data/MIMIC_III/LABEVENTS.csv')

In [6]:
lab2 = lab[lab['SUBJECT_ID'].isin(환자lab_list)]
lab2 = lab2[['SUBJECT_ID','CHARTTIME', 'ITEMID', 'FLAG']]

# Keep only rows whose FLAG is 'abnormal'
lab2 = lab2[lab2['FLAG']=='abnormal']
lab2 = lab2.drop(columns = ['FLAG'])
date_only(lab2, 'CHARTTIME')
lab2 = lab2.sort_values(by = 'SUBJECT_ID')
lab2['TYPE'] = 'LAB'

lab2 = lab2.reset_index(drop=True)

### Issue
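- 1) Duplicate rows in lab2 were not removed after applying date_only
    - Removing duplicates: 3406609 rows -> 2667170 rows
    - In the end this does not matter, because duplicates are overwritten when x is built (it only costs unnecessary computation)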

In [7]:
lab2 = lab2.drop_duplicates()
lab2

Unnamed: 0,SUBJECT_ID,CHARTTIME,ITEMID,TYPE
0,9,2149-11-14,50821,LAB
1,9,2149-11-13,50910,LAB
2,9,2149-11-13,50893,LAB
3,9,2149-11-13,50882,LAB
4,9,2149-11-13,50818,LAB
...,...,...,...,...
3406603,99985,2181-02-05,50931,LAB
3406604,99985,2181-02-05,51006,LAB
3406605,99985,2181-02-05,51221,LAB
3406607,99985,2181-02-01,51244,LAB


## PRE

In [8]:
pre = pd.read_csv('/data/MIMIC_III/PRESCRIPTIONS.csv')

In [9]:
pre2 = pre[pre['SUBJECT_ID'].isin(환자lab_list)]
pre2 = pre2[['SUBJECT_ID', 'STARTDATE', 'ENDDATE', 'NDC']]
date_only(pre2, 'STARTDATE')
date_only(pre2, 'ENDDATE')

# Drop rows where both STARTDATE and ENDDATE are null
both_null = pre2[(pre2['STARTDATE'].isnull())&(pre2['ENDDATE'].isnull())].index
pre2 = pre2.drop(index=both_null)

# Drop rows where NDC is null
ndc_null = pre2[(pre2['NDC'].isnull())].index
pre2 = pre2.drop(index=ndc_null)

# Where ENDDATE is null, fill it with STARTDATE
end_null = pre2[pre2['ENDDATE'].isnull()]
end_null['ENDDATE'] = end_null['STARTDATE']
pre2.loc[end_null.index] = end_null

# Where STARTDATE is null, fill it with ENDDATE
start_null = pre2[pre2['STARTDATE'].isnull()]
start_null['STARTDATE'] = start_null['ENDDATE']
pre2.loc[start_null.index] = start_null

pre2 = pre2.sort_values(by = ['SUBJECT_ID','STARTDATE'])
pre2 = pre2.reset_index(drop = True)

pre2

Unnamed: 0,SUBJECT_ID,STARTDATE,ENDDATE,NDC
0,9,2149-11-09,2149-11-09,0.0
1,9,2149-11-09,2149-11-09,85036207.0
2,9,2149-11-09,2149-11-09,456066270.0
3,9,2149-11-09,2149-11-09,456066270.0
4,9,2149-11-09,2149-11-09,338001702.0
...,...,...,...,...
1327244,99985,2181-02-09,2181-02-12,0.0
1327245,99985,2181-02-09,2181-02-12,338101948.0
1327246,99985,2181-02-09,2181-02-12,8084199.0
1327247,99985,2181-02-09,2181-02-12,0.0


### Issue
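- 1) Duplicate rows in pre2 were not removed before creating the CHARTTIME column
    - Removing duplicates: 1327249 rows -> 1161151 rows


- 2) Removing duplicates after creating the CHARTTIME column
    - 5052211 rows -> 4086549 rows


- As a result, a single deduplication after creating the CHARTTIME column is enough
    - Skipping the earlier one only means unnecessary computation while building CHARTTIME


- Need to check whether the original code performs this deduplication later on
    - In the end this does not matter, because duplicates are overwritten when x is built (it only costs unnecessary computation)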
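
The claim that one deduplication after creating CHARTTIME is enough can be checked on a toy example. The sketch below is an illustration under assumptions: `expand` is a hypothetical helper that mimics the per-day expansion, and the three rows in `toy` are made up.

```python
import pandas as pd
from datetime import date, timedelta

def expand(df):
    """Expand each (start, end) interval into one row per day."""
    rows = []
    for sub, start, end, item in df.itertuples(index=False):
        for i in range((end - start).days + 1):
            rows.append((sub, (start + timedelta(days=i)).isoformat(), item))
    return pd.DataFrame(rows, columns=['SUBJECT_ID', 'CHARTTIME', 'ITEMID'])

# Two identical prescriptions plus one that overlaps them (illustrative data)
toy = pd.DataFrame({
    'SUBJECT_ID': [9, 9, 9],
    'STARTDATE': [date(2149, 11, 9), date(2149, 11, 9), date(2149, 11, 10)],
    'ENDDATE': [date(2149, 11, 10), date(2149, 11, 10), date(2149, 11, 11)],
    'NDC': [456066270, 456066270, 456066270],
})

# Deduplicate only after the expansion
dedup_after_only = expand(toy).drop_duplicates().reset_index(drop=True)

# Deduplicate before and after the expansion
dedup_both = expand(toy.drop_duplicates()).drop_duplicates().reset_index(drop=True)

print(len(expand(toy)), len(expand(toy.drop_duplicates())))
print(dedup_after_only.equals(dedup_both))
```

Both paths end with the same rows; the early deduplication only shrinks the intermediate frame that the expansion has to walk through.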

In [10]:
from datetime import datetime, timedelta
from tqdm import tqdm

def date_range(start, end):
    dates = [(start + timedelta(days=i)).strftime('%Y-%m-%d') for i in range((end-start).days+1)]
    return dates

lst_time = []
lst_itemid = []
lst_subid = []

pre2_list = pre2.values.tolist()
for idx, row in enumerate(tqdm(pre2_list)):
    sub, start, end, itemid = row[0], row[1], row[2], row[3]
    
    # CHARTTIME
    day_list = date_range(start, end)    
    lst_time.extend(day_list)
    
    # ITEMID 
    lst_itemid.extend([itemid] * len(day_list))
    
    # SUBJECT_ID
    lst_subid.extend([sub] * len(day_list))

100%|██████████| 1327249/1327249 [00:13<00:00, 98633.78it/s] 


In [11]:
pre2sub = pd.DataFrame(lst_subid)
pre2time = pd.DataFrame(lst_time)
pre2item = pd.DataFrame(lst_itemid)

pre2 = pd.concat([pre2sub, pre2time, pre2item], axis = 1)
pre2.columns = ['SUBJECT_ID', 'CHARTTIME', 'ITEMID']
pre2 = pre2.drop_duplicates()
pre2['TYPE'] = 'PRE'

pre2

Unnamed: 0,SUBJECT_ID,CHARTTIME,ITEMID,TYPE
0,9,2149-11-09,0.0,PRE
1,9,2149-11-09,85036207.0,PRE
2,9,2149-11-09,456066270.0,PRE
4,9,2149-11-09,338001702.0,PRE
5,9,2149-11-09,74302401.0,PRE
...,...,...,...,...
5650360,99985,2181-02-10,8084199.0,PRE
5650361,99985,2181-02-11,8084199.0,PRE
5650362,99985,2181-02-12,8084199.0,PRE
5650367,99985,2181-02-11,781305714.0,PRE


## PRO

In [12]:
pro = pd.read_csv('/data/MIMIC_III/PROCEDUREEVENTS_MV.csv')

In [13]:
pro2 = pro[pro['SUBJECT_ID'].isin(환자lab_list)]
pro2 = pro2[['SUBJECT_ID', 'STARTTIME', 'ENDTIME', 'ITEMID']]
date_only(pro2, 'STARTTIME')
date_only(pro2, 'ENDTIME')

# Drop rows where both STARTTIME and ENDTIME are null
both_null = pro2[(pro2['STARTTIME'].isnull())&(pro2['ENDTIME'].isnull())].index
pro2 = pro2.drop(index=both_null)

# Drop rows where ITEMID is null
item_null = pro2[(pro2['ITEMID'].isnull())].index
pro2 = pro2.drop(index=item_null)

# Where ENDTIME is null, fill it with STARTTIME
end_null = pro2[pro2['ENDTIME'].isnull()]
end_null['ENDTIME'] = end_null['STARTTIME']
pro2.loc[end_null.index] = end_null

# Where STARTTIME is null, fill it with ENDTIME
start_null = pro2[pro2['STARTTIME'].isnull()]
start_null['STARTTIME'] = start_null['ENDTIME']
pro2.loc[start_null.index] = start_null

pro2 = pro2.sort_values(by = ['SUBJECT_ID','STARTTIME'])
pro2 = pro2.reset_index(drop = True)

### Issue
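- 1) Duplicate rows in pro2 were not removed after applying date_only
    - Removing duplicates: 79678 rows -> 71052 rows


- In the end this does not matter, because duplicates are overwritten when x is built (it only costs unnecessary computation)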

In [14]:
pro2 = pro2.drop_duplicates()
pro2

Unnamed: 0,SUBJECT_ID,STARTTIME,ENDTIME,ITEMID
0,36,2134-05-12,2134-05-15,224275
1,36,2134-05-12,2134-05-12,225402
2,36,2134-05-12,2134-05-12,221214
4,36,2134-05-12,2134-05-12,225432
5,36,2134-05-12,2134-05-12,224385
...,...,...,...,...
79672,99985,2181-02-03,2181-02-03,225454
79673,99985,2181-02-04,2181-02-04,225814
79674,99985,2181-02-05,2181-02-05,225459
79675,99985,2181-02-07,2181-02-07,227194


In [15]:
from datetime import datetime, timedelta
from tqdm import tqdm

def date_range(start, end):
    dates = [(start + timedelta(days=i)).strftime('%Y-%m-%d') for i in range((end-start).days+1)]
    return dates

lst_time = []
lst_itemid = []
lst_subid = []

pro2_list = pro2.values.tolist()
for idx, row in enumerate(tqdm(pro2_list)):
    sub, start, end, itemid = row[0], row[1], row[2], row[3]
    
    # CHARTTIME
    day_list = date_range(start, end)    
    lst_time.extend(day_list)
    
    # ITEMID 
    lst_itemid.extend([itemid] * len(day_list))
    
    # SUBJECT_ID
    lst_subid.extend([sub] * len(day_list))

100%|██████████| 71052/71052 [00:00<00:00, 162183.35it/s]


In [16]:
pro2sub = pd.DataFrame(lst_subid)
pro2time = pd.DataFrame(lst_time)
pro2item = pd.DataFrame(lst_itemid)

pro2 = pd.concat([pro2sub, pro2time, pro2item], axis = 1)
pro2.columns = ['SUBJECT_ID', 'CHARTTIME', 'ITEMID']
pro2 = pro2.drop_duplicates()
pro2['TYPE'] = 'PRO'

pro2

Unnamed: 0,SUBJECT_ID,CHARTTIME,ITEMID,TYPE
0,36,2134-05-12,224275,PRO
1,36,2134-05-13,224275,PRO
2,36,2134-05-14,224275,PRO
3,36,2134-05-15,224275,PRO
4,36,2134-05-12,225402,PRO
...,...,...,...,...
152377,99985,2181-02-04,225814,PRO
152378,99985,2181-02-05,225459,PRO
152379,99985,2181-02-07,227194,PRO
152380,99985,2181-02-08,224264,PRO


# `X` : (7799, 10, 4069)
- total_data - 6895982 rows
- With the earlier code, which did not deduplicate LAB, PRE, and PRO, this was 9221920 rows

In [17]:
m1 = pd.merge(lab2, pre2, how = 'outer')
total_data = pd.merge(m1, pro2, how = 'outer')
total_data = total_data.sort_values(['SUBJECT_ID','CHARTTIME']).reset_index(drop=True)
total_data = total_data.astype({'ITEMID':'int'})
total_data

# total_data.to_csv('total_data.csv', index = False)

Unnamed: 0,SUBJECT_ID,CHARTTIME,ITEMID,TYPE
0,9,2149-11-09,50822,LAB
1,9,2149-11-09,50821,LAB
2,9,2149-11-09,50813,LAB
3,9,2149-11-09,50809,LAB
4,9,2149-11-09,50808,LAB
...,...,...,...,...
6895977,99985,2181-03-06,51256,LAB
6895978,99985,2182-03-14,51279,LAB
6895979,99985,2182-03-14,51222,LAB
6895980,99985,2182-03-14,51221,LAB


In [3]:
total_data = pd.read_csv('total_data.csv')

# Helper that keeps only the date part of a datetime column
import datetime as dt
def date_only(df, x):
    df[x] = pd.to_datetime(df[x])
    df[x] = df[x].dt.date

dic_sub2idx = {}
for i, j in enumerate(total_data['SUBJECT_ID'].unique()):
    dic_sub2idx[j] = i

dic_item2idx = {}
for i, j in enumerate(total_data['ITEMID'].sort_values().unique()):
    dic_item2idx[j] = i

admission = pd.read_csv('/data/MIMIC_III/ADMISSIONS.csv')
admission = admission[admission['SUBJECT_ID'].isin(total_data['SUBJECT_ID'].unique())]

# Drop the time part of the DISCHTIME column
date_only(admission, 'DISCHTIME')
dic_sub2final_date = dict(admission.groupby('SUBJECT_ID')['DISCHTIME'].max())

In [4]:
from datetime import timedelta
from tqdm import tqdm
import datetime

x = np.zeros((7799,10,4069))

# Stream total_data.csv line by line and fill the (subject, day, item) tensor
with open('total_data.csv','r') as IF:
    IF.readline()  # skip the header row
    for line in tqdm(IF):
        ss = line.strip('\n').split(',')
        sub, item, charttime = int(ss[0]), int(ss[2]), datetime.date.fromisoformat(ss[1])
        subidx = dic_sub2idx[sub]
        itemidx = dic_item2idx[item]
        finaldate = dic_sub2final_date[sub]
        # 10 days before the last discharge date maps to index 0, 1 day before to index 9
        dateidx = -(finaldate - charttime).days + 10

        # Skip events outside the 10-day window
        if (dateidx < 0) or (dateidx > 9):
            continue
        x[subidx, dateidx, itemidx] = 1
print(x.sum())

6895982it [00:11, 585769.68it/s]


2042722.0
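
The mapping from a chart date to a position on the 10-step time axis can be isolated into a small function. This is a sketch under assumptions: `day_index` and `N_DAYS` are hypothetical names, and the dates are made up. It mirrors the arithmetic `-(finaldate - charttime).days + 10` used in the cell above.

```python
from datetime import date

N_DAYS = 10  # length of the time axis of x

def day_index(final_date, chart_date, n_days=N_DAYS):
    """Return the time index of chart_date, or None if it falls outside the window.

    One day before final_date maps to n_days - 1, n_days days before maps to 0.
    """
    idx = n_days - (final_date - chart_date).days
    return idx if 0 <= idx < n_days else None

discharge = date(2149, 11, 20)
print(day_index(discharge, date(2149, 11, 19)))  # one day before discharge
print(day_index(discharge, date(2149, 11, 10)))  # ten days before discharge
print(day_index(discharge, date(2149, 11, 20)))  # the discharge date itself
print(day_index(discharge, date(2149, 11, 9)))   # eleven days before discharge
```

Note that under this arithmetic the discharge date itself falls outside the window, as do events recorded after it.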


# `X` : (7727, 10, 4068)
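- When the 3D data is built from total_data, only D-1 to D-10 relative to the discharge date are considered, so some SUBJECT_IDs can end up with no data at all
- A total of 72 sub_ids have a value of 0 for every timepoint and item_id
- Excluding these 72 sub_ids and then removing the item_ids that apply to none of the remaining subjects gives `X` : (7727, 10, 4068).


- Strictly, both step 1 (removing all-zero subjects) and step 2 (removing all-zero features) should be applied, but additionally running step 2 made almost no difference to the LSTM model's predictive performance -> keeping (7727, 10, 4068) is ok
    - See the x_(7727, 10, 3595) test below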
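
The two filtering steps can also be written with boolean masks. The following is a minimal sketch on a tiny made-up tensor (its shape and contents are assumptions), using `ndarray.any` over several axes at once:

```python
import numpy as np

# Toy tensor: 4 subjects, 3 timepoints, 5 features (illustrative only)
x = np.zeros((4, 3, 5))
x[0, 0, 1] = 1
x[2, 1, 1] = 1
x[2, 2, 4] = 1

# Step 1: keep subjects that have at least one non-zero entry
keep_subjects = x.any(axis=(1, 2))
x_sub = x[keep_subjects]

# Step 2: keep features that are non-zero for at least one remaining subject
keep_features = x_sub.any(axis=(0, 1))
x_final = x_sub[:, :, keep_features]

print(x_sub.shape, x_final.shape, x_final.sum())
```

Because only all-zero slices are removed, the total sum of the tensor is unchanged by either step, which is a quick sanity check after filtering.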

In [5]:
# 1. Remove sub_ids whose data is all zeros (7799 -> 7727 subjects)
sub_sum = x.sum(axis=1).sum(axis=1)
sub_sum[sub_sum > 0] = 1
sub_sum = pd.DataFrame(sub_sum)
zero_index = sub_sum[sub_sum[0]==0].index
new_x = np.delete(x, zero_index, axis = 0)
new_x.shape
new_x.sum()

(7727, 10, 4069)

2042722.0

In [6]:
# 2. Remove features whose data is all zeros (4069 -> 3595 features)
f_sum = new_x.sum(axis=0).sum(axis=0)
f_sum[f_sum > 0] = 1
f_sum = pd.DataFrame(f_sum)
zero_index2 = f_sum[f_sum[0]==0].index
final_x = np.delete(new_x, zero_index2, axis = 2)
final_x.shape
final_x.sum()

(7727, 10, 3595)

2042722.0

In [38]:
# x save
# np.save('x_(7727,10,3595).npy', final_x)
# np.save('x_(7727,10,4069).npy', new_x)

# `x` : (7727, 10, 3595) test

- x_(7727, 10, 3595) - Single LSTM
    - `accuracy` : 0.754, `precision` : 0.787, `recall` : 0.810, `f1` : 0.798, `roc_auc` : 0.740


- x_(7727, 10, 4068) - Single LSTM
    - `accuracy` : 0.755, `precision` : 0.799, `recall` : 0.792, `f1` : 0.795, `roc_auc` : 0.746
    
    
- x_(7799, 10, 4069) - Single LSTM
    - `accuracy` : 0.747, `precision` : 0.799, `recall` : 0.796, `f1` : 0.798, `roc_auc` : 0.730

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
import warnings 
warnings.filterwarnings(action='ignore')

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for i in range(len(gpus)):
            tf.config.experimental.set_memory_growth(gpus[i], True)
    except RuntimeError as e:
        # Memory growth must be set at program startup
        print(e)

# Show every expression result in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [3]:
import random  
seed_num = 42
random.seed(seed_num)

x = np.load('/project/LSH/x_(7799,10,4069).npy')
y = np.load('/project/LSH/y_(7799,1).npy')

idx = list(range(len(x)))
random.shuffle(idx)

i = round(x.shape[0]*0.8)
X_train, y_train = x[idx[:i],:,:], y[idx[:i]]
X_test, y_test = x[idx[i:],:,:], y[idx[i:]]

X_train.shape, y_train.shape, X_test.shape, y_test.shape

((6239, 10, 4069), (6239,), (1560, 10, 4069), (1560,))

In [4]:
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, GRU, Dropout, LSTM, InputLayer
from sklearn.ensemble import VotingClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score
from sklearn import metrics 
from tensorflow import keras
import random  
from tensorflow.keras.callbacks import EarlyStopping

# LSTM
def get_model():
    lstm = Sequential()
    lstm.add(InputLayer(input_shape=(x.shape[1],x.shape[2])))
    lstm.add(LSTM(units=128, activation='hard_sigmoid', return_sequences=True))
    lstm.add(LSTM(units=64, activation='hard_sigmoid', return_sequences=True))
    lstm.add(Dropout(0.2))
    lstm.add(LSTM(units=64, activation='hard_sigmoid', return_sequences=True))
    lstm.add(LSTM(units=32, activation='hard_sigmoid', return_sequences=False))
    lstm.add(Dropout(0.2))
    lstm.add(Dense(units=1, activation='sigmoid'))

    lstm.compile(optimizer= keras.optimizers.Adam(learning_rate = 0.01), 
                 loss = "binary_crossentropy", metrics=['acc'])
    return lstm

with tf.device('/device:GPU:0'):
    print("Single LSTM Start")
#     tf.random.set_seed(0)
    model = get_model()
    
    early_stop = EarlyStopping(monitor='val_loss', patience=30, verbose=1, restore_best_weights=False)
    model.fit(X_train, y_train, epochs=300, batch_size=1024, validation_split=0.25, callbacks=[early_stop])
    preds = model.predict(X_test)

    preds[preds>0.5]=1
    preds[preds<=0.5]=0
    precision = precision_score(y_test, preds)
    recall = recall_score(y_test, preds)
    f1 = f1_score(y_test, preds)
    roc_auc = roc_auc_score(y_test, preds)
    acc = accuracy_score(y_test, preds)

    print(f'accuracy : {acc:.3f}, precision : {precision:.3f}, recall : {recall:.3f}, f1 : {f1:.3f}, roc_auc : {roc_auc:.3f}')

Single LSTM Start
Epoch 1/300
...
Epoch 40/300
Epoch 00040: early stopping


<tensorflow.python.keras.callbacks.History at 0x7faad8252130>

accuracy : 0.747, precision : 0.799, recall : 0.796, f1 : 0.798, roc_auc : 0.730
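
The cell above thresholds `preds` at 0.5 before computing `roc_auc`. ROC AUC is a ranking metric, so it is usually computed from the raw probabilities, and after thresholding it reduces to the average of the true positive rate and the true negative rate. The sketch below illustrates the difference on made-up scores. `auc_pairwise` is a hypothetical NumPy-only helper based on the pairwise definition of AUC, not the scikit-learn implementation, and the toy labels and probabilities are assumptions.

```python
import numpy as np

def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Toy labels and predicted probabilities (illustrative only)
y_true = np.array([0, 0, 1, 1, 1, 0])
probs = np.array([0.1, 0.6, 0.55, 0.8, 0.9, 0.3])
labels = (probs > 0.5).astype(float)

auc_from_probs = auc_pairwise(y_true, probs)
auc_from_labels = auc_pairwise(y_true, labels)

# After thresholding, AUC equals (TPR + TNR) / 2
tpr = labels[y_true == 1].mean()
tnr = 1 - labels[y_true == 0].mean()

print(auc_from_probs, auc_from_labels, (tpr + tnr) / 2)
```

In the notebook, the equivalent change would be to keep a copy of the probabilities returned by `model.predict` for the AUC and use the thresholded array only for accuracy, precision, recall, and F1. The reported `roc_auc` values were computed on thresholded predictions and are left as they are.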
