## Model reproduction
### Paper: "Predictive modeling in urgent care: a comparative study of machine learning approaches"
**Task**: binary classification (in-hospital mortality)  
**Models**: logistic regression, random forest  
**Features**: W48 = w2v (embedding of historical diagnoses) + x48 (descriptive statistics of the time series)
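
As a rough illustration of how a W48-style feature vector could be assembled, the sketch below concatenates a diagnosis embedding with descriptive statistics of each time series. The helper names `describe_series` and `build_features`, the choice of statistics (min, max, mean, standard deviation), and the toy values are all assumptions made for illustration; the notebook does not state which statistics x48 contains or how the w2v embedding is aggregated.

```python
import statistics

def describe_series(values):
    """Summarize one time series with a few descriptive statistics (assumed set)."""
    return [min(values), max(values), statistics.mean(values), statistics.pstdev(values)]

def build_features(embedding, series_list):
    """Concatenate an embedding with the statistics of each series."""
    features = list(embedding)
    for series in series_list:
        features.extend(describe_series(series))
    return features

# Toy inputs (hypothetical): a 3-dimensional embedding and two measured series.
embedding = [0.1, -0.2, 0.05]
heart_rate = [80, 85, 90, 95]
temperature = [36.6, 37.0, 37.4]

features = build_features(embedding, [heart_rate, temperature])
print(len(features))  # 3 embedding values + 4 statistics per series
```

The resulting flat vector is what a model such as logistic regression or a random forest would consume as one row of its input matrix.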

In [42]:
import pandas as pd

def raw_to_df(path):
    """Load pickled per-fold scores and return a DataFrame with one row per fold."""
    # The pickle is expected to be indexable by fold number, starting at 1.
    result = pd.read_pickle(path)
    keys = ['tr_auc', 'te_auc', 'f1_score', 'sen', 'spec']
    # Build one row per fold instead of preallocating a fixed list of 5 entries.
    rows = [[result[i + 1][k] for k in keys] for i in range(len(result))]
    return pd.DataFrame(rows, columns=["train_auc", "test_auc", "F1", "Sn", "Sp"])

In [43]:
result_lr = raw_to_df('./results/try1_LR/lr-mort-x48-w2v/scores/raw_stats')
result_rf = raw_to_df('./results/try2_RF/rf-mort-x48-w2v/scores/raw_stats')

#### Logistic regression

In [44]:
result_lr.describe()

| | train_auc | test_auc | F1 | Sn | Sp |
|---|---|---|---|---|---|
| count | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
| mean | 0.799982 | 0.798273 | 0.490097 | 0.791432 | 0.805115 |
| std | 0.005154 | 0.022988 | 0.026431 | 0.038751 | 0.00772 |
| min | 0.793615 | 0.77368 | 0.46233 | 0.749621 | 0.797739 |
| 25% | 0.796195 | 0.785897 | 0.476323 | 0.767477 | 0.798109 |
| 50% | 0.801231 | 0.78828 | 0.477541 | 0.778452 | 0.804317 |
| 75% | 0.802181 | 0.813005 | 0.50683 | 0.816388 | 0.809622 |
| max | 0.806688 | 0.830505 | 0.527462 | 0.84522 | 0.815789 |


#### Random Forest

In [45]:
result_rf.describe()

| | train_auc | test_auc | F1 | Sn | Sp |
|---|---|---|---|---|---|
| count | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
| mean | 0.903169 | 0.816402 | 0.488737 | 0.856715 | 0.776089 |
| std | 0.003008 | 0.015643 | 0.023032 | 0.019958 | 0.0178 |
| min | 0.899265 | 0.791164 | 0.455993 | 0.825493 | 0.756835 |
| 25% | 0.901943 | 0.812236 | 0.481197 | 0.854325 | 0.765673 |
| 50% | 0.903202 | 0.821286 | 0.485895 | 0.855842 | 0.770148 |
| 75% | 0.903921 | 0.8286 | 0.505059 | 0.871017 | 0.786184 |
| max | 0.907514 | 0.828723 | 0.515539 | 0.8769 | 0.801604 |
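
One quick way to read the two summaries side by side is the gap between mean train AUC and mean test AUC. The sketch below uses the means reported in the two `describe()` outputs; `auc_gap` is a hypothetical helper name, and the gap is only a rough indicator of overfitting, not a formal test.

```python
# Mean AUC values copied from the describe() outputs of the two models.
mean_auc = {
    "LR": {"train": 0.799982, "test": 0.798273},
    "RF": {"train": 0.903169, "test": 0.816402},
}

def auc_gap(scores):
    """Return train AUC minus test AUC, a rough indicator of overfitting."""
    return scores["train"] - scores["test"]

gaps = {name: round(auc_gap(scores), 6) for name, scores in mean_auc.items()}
for name, gap in gaps.items():
    print(f"{name}: train-test AUC gap = {gap:.6f}")
```

The random forest shows a gap of roughly 0.087 against roughly 0.002 for logistic regression, so it fits the training folds much more closely. Its mean test AUC is still higher (0.816 vs 0.798), but that difference is of the same order as the fold-to-fold standard deviations, so it should be read with caution.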


AUC, F1, Sn (sensitivity), Sp (specificity):
![table](./results/tabela.png)
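
For readers less familiar with these metrics, the sketch below computes Sn, Sp, and F1 from confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical; the class balance of the actual data is not given in this notebook, so the 10% positive rate here is purely an assumption for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision, and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)   # share of positives correctly detected
    specificity = tn / (tn + fp)   # share of negatives correctly rejected
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"Sn": sensitivity, "Sp": specificity, "precision": precision, "F1": f1}

# Hypothetical counts: 100 positive and 900 negative cases (10% positives).
m = binary_metrics(tp=80, fp=180, tn=720, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, Sn and Sp are both 0.8 while F1 is only about 0.44, because the many false positives from the large negative class pull precision down. This shows how an F1 near 0.49 can coexist with Sn and Sp around 0.8 when the positive class is a minority.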

### "Best model"

In [46]:
lr = pd.read_pickle('./results/try1_LR/lr-mort-x48-w2v/best_model')
lr.get_params()

{'C': 0.001,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'l1_ratio': None,
 'max_iter': 100,
 'multi_class': 'auto',
 'n_jobs': None,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'lbfgs',
 'tol': 0.0001,
 'verbose': 1,
 'warm_start': False}
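
For reference, `C` is the inverse of the regularization strength. Assuming this is a scikit-learn `LogisticRegression` (the parameter names match that estimator, but the notebook does not name the class), the L2-penalized objective for labels $y_i \in \{-1, 1\}$, weights $w$, and intercept $c$ is:

$$
\min_{w,\,c} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i \left(x_i^{T} w + c\right)\right)\right)
$$

With `C = 0.001`, the data-fit term carries very little weight relative to the penalty, so the coefficients are shrunk strongly toward zero.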

In [47]:
rf = pd.read_pickle('./results/try2_RF/rf-mort-x48-w2v/best_model')
rf.get_params()

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 450,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 1,
 'warm_start': False}