## Business problem

#### Stakeholder: NASA

- Wants a prediction of whether a star is variable, for use when no true classification is available.

#### True business problem:  

- Predict whether a star is variable or not using other information about the star.

#### Deliverables: Inference or Prediction?

- Prediction
    - If a star has not yet been classified as variable or not, our prediction can be used in the absence of a true classification.

#### Context:

- **False negative**: the model predicts that a star is not variable, but the star is variable.
    - **Outcome**: A variable star is overlooked.
- **False positive**: the model predicts that a star is variable, but the star is not variable.
    - **Outcome**: Resources are spent observing a star that is not variable.
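
The two error types above correspond to the off-diagonal cells of a confusion matrix. A minimal sketch on made-up labels (the arrays `y_true` and `y_pred` are illustrative and not taken from the star data), assuming 1 means variable and 0 means not variable:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only: 1 = variable, 0 = not variable.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 0, 1, 0])

# With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print(f"False negatives (variable stars overlooked): {fn}")
print(f"False positives (non-variable stars flagged): {fp}")
```

Which of the two counts matters more depends on how costly an overlooked variable star is compared with a wasted observation.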

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.pipeline import Pipeline
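# Note: plot_confusion_matrix and plot_roc_curve were removed in scikit-learn 1.2;
# newer versions provide ConfusionMatrixDisplay and RocCurveDisplay instead.
from sklearn.metrics import (
    accuracy_score, f1_score, recall_score, roc_auc_score, precision_score,
    confusion_matrix, plot_confusion_matrix, plot_roc_curve,
)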
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import StackingRegressor
from sklearn.ensemble import AdaBoostClassifier
import xgboost as xgb

pd.set_option('display.max_columns', None)

In [25]:
df = pd.read_csv("../../data/UsableData.csv")
holdoutdf = pd.read_csv("../../data/UntestableData.csv")
df = df.drop(columns=['Unnamed: 0'])
df['HvarType'] = df['HvarType'].mask(df['HvarType'] != 'C', other='V')

mask = {
    'C': 0,
    'V': 1
}
df['Target'] = df['HvarType'].map(mask)

x = df.drop(columns=['HvarType','Target'])
y = df['Target']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.25, random_state=42)

In [51]:
pd.set_option('display.max_rows', None)

In [67]:
nullCounts = df.isna().sum()

In [68]:
nullsHoldout = holdoutdf.isna().sum()

In [75]:
dfnullperc = nullCounts / len(df)

In [76]:
holdoutnullperc = nullsHoldout / len(holdoutdf)

In [77]:
holdoutnullperc - dfnullperc

(V-I)red      0.000000
---           0.000000
AstroRef      0.142837
B-V          -0.004779
BD            0.062602
BTmag         0.008277
CCDM          0.141226
CPD          -0.062782
Catalog       0.000000
Chart        -0.014734
CoD          -0.024194
CombMag       0.111837
DE:RA        -0.001901
DEdeg        -0.001901
DEdms         0.000000
F1           -0.001901
F2            0.000222
HD            0.005415
HIP           0.000000
HPmin        -0.012636
Hpmag         0.000265
Hpmax        -0.012636
Hpscat       -0.012636
HvarType      1.000000
MultFlag      0.137054
Ncomp        -0.001901
Notes         0.039276
Nsys          0.141226
Period        0.035478
Plx          -0.001901
Plx:DE       -0.001901
Plx:RA       -0.001901
Proxy         0.113430
Qual          0.155798
RAdeg        -0.001901
RAhms         0.000000
Source        0.033705
SpType        0.000820
Survey        0.027230
Target             NaN
Unnamed: 0         NaN
V-I          -0.004908
VTmag         0.008322
VarFlag    

In [28]:
y_train.value_counts(normalize=True)

0    0.650421
1    0.349579
Name: Target, dtype: float64

Guessing "not variable" every time would be correct about 65% of the time on the training set, so a useful model has to beat that baseline accuracy.
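
The 65% figure can be reproduced with a majority-class baseline. The sketch below uses scikit-learn's `DummyClassifier` on a synthetic label vector with a 65/35 split (`X_toy` and `y_toy` are stand-ins, not the real `x_train` and `y_train`). It also shows that such a baseline has zero recall for variable stars.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic stand-in for y_train with a 65/35 class split.
y_toy = np.array([0] * 65 + [1] * 35)
X_toy = np.zeros((len(y_toy), 1))  # DummyClassifier ignores the feature values

# Always predict the most frequent class (here: 0, "not variable").
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_toy, y_toy)
baseline_preds = baseline.predict(X_toy)

baseline_accuracy = accuracy_score(y_toy, baseline_preds)
baseline_recall = recall_score(y_toy, baseline_preds, zero_division=0)

print(f"Baseline accuracy: {baseline_accuracy:.2f}")
print(f"Baseline recall:   {baseline_recall:.2f}")
```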

In [29]:
numCols = [col for col in x_train.columns if x_train[col].dtype != 'O']
numCols

['HIP',
 'Vmag',
 'VarFlag',
 'RAdeg',
 'DEdeg',
 'Plx',
 'pmRA',
 'pmDE',
 'e_RAdeg',
 'e_DEdeg',
 'e_Plx',
 'e_pmRA',
 'e_pmDE',
 'DE:RA',
 'Plx:RA',
 'Plx:DE',
 'pmRA:RA',
 'pmRA:DE',
 'pmRA:Plx',
 'pmDE:RA',
 'pmDE:DE',
 'pmDE:Plx',
 'pmDE:pmRA',
 'F1',
 'F2',
 '---',
 'BTmag',
 'e_BTmag',
 'VTmag',
 'e_VTmag',
 'B-V',
 'e_B-V',
 'V-I',
 'e_V-I',
 'Hpmag',
 'e_Hpmag',
 'Hpscat',
 'o_Hpmag',
 'Hpmax',
 'HPmin',
 'Period',
 'moreVar',
 'Nsys',
 'Ncomp',
 'theta',
 'rho',
 'e_rho',
 'dHp',
 'e_dHp',
 'HD',
 '(V-I)red']

In [30]:
x_train[numCols].isna().sum()

HIP              0
Vmag             0
VarFlag      45034
RAdeg          165
DEdeg          165
Plx            165
pmRA           165
pmDE           165
e_RAdeg        165
e_DEdeg        165
e_Plx          165
e_pmRA         165
e_pmDE         165
DE:RA          165
Plx:RA         165
Plx:DE         165
pmRA:RA        165
pmRA:DE        165
pmRA:Plx       165
pmDE:RA        165
pmDE:DE        165
pmDE:Plx       165
pmDE:pmRA      165
F1             165
F2             858
---              0
BTmag         1376
e_BTmag       1376
VTmag         1348
e_VTmag       1348
B-V            689
e_B-V          689
V-I            689
e_V-I          689
Hpmag            1
e_Hpmag          1
Hpscat         707
o_Hpmag        707
Hpmax          707
HPmin          707
Period       51831
moreVar      47530
Nsys         41951
Ncomp          165
theta        46366
rho          46366
e_rho        46366
dHp          46366
e_dHp        46366
HD            8717
(V-I)red         0
dtype: int64

### Simple models: model each numeric variable with a logistic regression

In [31]:
preprocessor = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median"))
])

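Run a simple model for each numeric column and print its metrics. If all of those metrics are above 0, add the column to a list for later use. Precision, recall, and F1 are above 0 only when the model produces at least one true positive, so this rule mainly drops columns whose model predicts "not variable" for every star.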
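
One way to organize this per-column screening is a helper that returns the metrics as a dictionary, so the selection rule is applied in one place. This is only a sketch: `screen_column` is a hypothetical name, and the data are synthetic rather than the star catalogue.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def screen_column(col, X_tr, X_te, y_tr, y_te):
    """Fit a one-feature logistic regression and return train/test metrics."""
    model = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("model", LogisticRegression(max_iter=5000, random_state=42)),
    ])
    model.fit(X_tr[[col]], y_tr)

    metrics = {}
    for split, X_part, y_part in (("train", X_tr, y_tr), ("test", X_te, y_te)):
        preds = model.predict(X_part[[col]])
        probs = model.predict_proba(X_part[[col]])[:, 1]
        metrics[f"{split}_accuracy"] = accuracy_score(y_part, preds)
        metrics[f"{split}_recall"] = recall_score(y_part, preds, zero_division=0)
        metrics[f"{split}_precision"] = precision_score(y_part, preds, zero_division=0)
        metrics[f"{split}_f1"] = f1_score(y_part, preds, zero_division=0)
        metrics[f"{split}_rocauc"] = roc_auc_score(y_part, probs)
    return metrics


# Synthetic data for illustration only (not the star catalogue).
rng = np.random.default_rng(42)
n = 400
toy_y = rng.integers(0, 2, size=n)
toy_x = pd.DataFrame({
    "signal": 2.0 * toy_y + rng.normal(size=n),  # related to the target
    "noise": rng.normal(size=n),                 # unrelated to the target
})
toy_x.iloc[::10, 0] = np.nan  # missing values for the imputer to fill

X_tr, X_te, y_tr, y_te = train_test_split(toy_x, toy_y, test_size=.25, random_state=42)

kept = []
for col in toy_x.columns:
    metrics = screen_column(col, X_tr, X_te, y_tr, y_te)
    # Same rule as in the text: keep the column only if every metric is above 0.
    if all(value > 0 for value in metrics.values()):
        kept.append(col)
    print(col, {name: round(float(value), 3) for name, value in metrics.items()})
```

Returning a dictionary keeps the threshold logic separate from the printing, which makes it easier to change the rule later.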

In [32]:
len(numCols)

51

In [33]:
used_num_cols = []
for col in numCols:
    lg = Pipeline(steps=[
        ('preprocessor',preprocessor),
        ('model',LogisticRegression(max_iter=5000, random_state=42))
    ])
    lg.fit(x_train[[col]], y_train)
    
    train_preds = lg.predict(x_train[[col]])
    test_preds = lg.predict(x_test[[col]])
    train_probs = lg.predict_proba(x_train[[col]])[:,1]
    test_probs = lg.predict_proba(x_test[[col]])[:,1]
    
    train_accuracy = accuracy_score(y_train, train_preds)
    test_accuracy = accuracy_score(y_test, test_preds)
    train_recall = recall_score(y_train, train_preds, zero_division=0)
    test_recall = recall_score(y_test, test_preds, zero_division=0)
    train_precision = precision_score(y_train, train_preds, zero_division=0)
    test_precision = precision_score(y_test, test_preds, zero_division=0)
    train_f1 = f1_score(y_train, train_preds, zero_division=0)
    test_f1 = f1_score(y_test, test_preds, zero_division=0)
    train_rocauc = roc_auc_score(y_train, train_probs)
    test_rocauc = roc_auc_score(y_test, test_probs)
    
    print(col)
    all_metrics = (train_accuracy, test_accuracy, train_recall, test_recall,
                   train_precision, test_precision, train_f1, test_f1,
                   train_rocauc, test_rocauc)
    if all(metric > 0 for metric in all_metrics):
        used_num_cols.append(col)
        print(f'{col} added to used column list')
    print(F'\nTrain Accuracy:\t\t{train_accuracy}')
    print(F'Test Accuracy:\t\t{test_accuracy}')
    print(F"\nTrain Recall:\t\t{train_recall}")
    print(F'Test Recall:\t\t{test_recall}')
    print(F"\nTrain Precision:\t{train_precision}")
    print(F'Test Precision:\t\t{test_precision}')
    print(F"\nTrain f1:\t\t{train_f1}")
    print(F'Test f1:\t\t{test_f1}')
    print(F"\nTrain ROC-AUC:\t\t{train_rocauc}")
    print(F'Test ROC-AUC:\t\t{test_rocauc}')
    print("\nTrain Matrix:\n")
    print(confusion_matrix(y_train, train_preds))
    print("\nTest Matrix:\n")
    print(confusion_matrix(y_test, test_preds))
    print('\n')
    print("****"*20)
    print('\n')

HIP

Train Accuracy:		0.6504207312532578
Test Accuracy:		0.6486094046688261

Train Recall:		0.0
Test Recall:		0.0

Train Precision:	0.0
Test Precision:		0.0

Train f1:		0.0
Test f1:		0.0

Train ROC-AUC:		0.49096253710321636
Test ROC-AUC:		0.49019535851846385

Train Matrix:

[[34938     0]
 [18778     0]]

Test Matrix:

[[11614     0]
 [ 6292     0]]


********************************************************************************


Vmag
Vmag added to used column list

Train Accuracy:		0.6512770869014819
Test Accuracy:		0.6497263487099296

Train Recall:		0.0025029289594205987
Test Recall:		0.003178639542275906

Train Precision:	0.9791666666666666
Test Precision:		1.0

Train f1:		0.004993094656326358
Test f1:		0.0063371356147021544

Train ROC-AUC:		0.5400225289000143
Test ROC-AUC:		0.5422443083631775

Train Matrix:

[[34937     1]
 [18731    47]]

Test Matrix:

[[11614     0]
 [ 6272    20]]


********************************************************************************


VarFlag
Var

pmRA:DE

Train Accuracy:		0.6504207312532578
Test Accuracy:		0.6486094046688261

Train Recall:		0.0
Test Recall:		0.0

Train Precision:	0.0
Test Precision:		0.0

Train f1:		0.0
Test f1:		0.0

Train ROC-AUC:		0.504168333648942
Test ROC-AUC:		0.5049197411305447

Train Matrix:

[[34938     0]
 [18778     0]]

Test Matrix:

[[11614     0]
 [ 6292     0]]


********************************************************************************


pmRA:Plx

Train Accuracy:		0.6504207312532578
Test Accuracy:		0.6486094046688261

Train Recall:		0.0
Test Recall:		0.0

Train Precision:	0.0
Test Precision:		0.0

Train f1:		0.0
Test f1:		0.0

Train ROC-AUC:		0.506560926108621
Test ROC-AUC:		0.4997325292785709

Train Matrix:

[[34938     0]
 [18778     0]]

Test Matrix:

[[11614     0]
 [ 6292     0]]


********************************************************************************


pmDE:RA

Train Accuracy:		0.6504207312532578
Test Accuracy:		0.6486094046688261

Train Recall:		0.0
Test Recall:		0.0

Trai

Hpmag
Hpmag added to used column list

Train Accuracy:		0.6524871546652766
Test Accuracy:		0.650787445548978

Train Recall:		0.006390456917669613
Test Recall:		0.006357279084551812

Train Precision:	0.9302325581395349
Test Precision:		0.975609756097561

Train f1:		0.01269371132384831
Test f1:		0.012632243802305383

Train ROC-AUC:		0.5471327566484631
Test ROC-AUC:		0.5489584864858829

Train Matrix:

[[34929     9]
 [18658   120]]

Test Matrix:

[[11613     1]
 [ 6252    40]]


********************************************************************************


e_Hpmag
e_Hpmag added to used column list

Train Accuracy:		0.6827388487601459
Test Accuracy:		0.6813358650731598

Train Recall:		0.0925551176909149
Test Recall:		0.09345200254291164

Train Precision:	0.9988505747126437
Test Precision:		0.9966101694915255

Train f1:		0.16941222341358805
Test f1:		0.1708805579773322

Train ROC-AUC:		0.6707624275605396
Test ROC-AUC:		0.6723953999332853

Train Matrix:

[[34936     2]
 [17040  1738]]

T

(V-I)red
(V-I)red added to used column list

Train Accuracy:		0.6904460495941619
Test Accuracy:		0.6897687925834916

Train Recall:		0.11950154436042178
Test Recall:		0.12285441830896376

Train Precision:	0.9597946963216424
Test Precision:		0.9555006180469716

Train f1:		0.2125402538359538
Test f1:		0.2177158146739896

Train ROC-AUC:		0.5266268946781988
Test ROC-AUC:		0.5299715821852115

Train Matrix:

[[34844    94]
 [16534  2244]]

Test Matrix:

[[11578    36]
 [ 5519   773]]


********************************************************************************




### Kitchen-sink model: all numeric columns whose metrics are all above 0

In [34]:
len(used_num_cols)

32

In [35]:
sink_model = Pipeline(steps=[
    ('preprocessor',preprocessor),
    ('logreg', LogisticRegression(max_iter=5000, random_state=42))
])

sink_model.fit(x_train[used_num_cols], y_train)

train_preds = sink_model.predict(x_train[used_num_cols])
test_preds = sink_model.predict(x_test[used_num_cols])
train_probs = sink_model.predict_proba(x_train[used_num_cols])[:,1]
test_probs = sink_model.predict_proba(x_test[used_num_cols])[:,1]
    
train_accuracy = accuracy_score(y_train, train_preds)
test_accuracy = accuracy_score(y_test, test_preds)
train_recall = recall_score(y_train, train_preds, zero_division=0)
test_recall = recall_score(y_test, test_preds, zero_division=0)
train_precision = precision_score(y_train, train_preds, zero_division=0)
test_precision = precision_score(y_test, test_preds, zero_division=0)
train_f1 = f1_score(y_train, train_preds, zero_division=0)
test_f1 = f1_score(y_test, test_preds, zero_division=0)
train_rocauc = roc_auc_score(y_train, train_probs)
test_rocauc = roc_auc_score(y_test, test_probs)

print(F'\nTrain Accuracy:\t\t{train_accuracy}')
print(F'Test Accuracy:\t\t{test_accuracy}')
print(F"\nTrain Recall:\t\t{train_recall}")
print(F'Test Recall:\t\t{test_recall}')
print(F"\nTrain Precision:\t{train_precision}")
print(F'Test Precision:\t\t{test_precision}')
print(F"\nTrain f1:\t\t{train_f1}")
print(F'Test f1:\t\t{test_f1}')
print(F"\nTrain ROC-AUC:\t\t{train_rocauc}")
print(F'Test ROC-AUC:\t\t{test_rocauc}')


Train Accuracy:		0.926130017127113
Test Accuracy:		0.9261141516810008

Train Recall:		0.8259665566087975
Test Recall:		0.829942784488239

Train Precision:	0.9568167797655768
Test Precision:		0.9537899543378996

Train f1:		0.8865896878929918
Test f1:		0.8875669244497323

Train ROC-AUC:		0.9438095233391877
Test ROC-AUC:		0.9473591674383821


In [50]:
x_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 53716 entries, 9619 to 15795
Data columns (total 77 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Catalog    53716 non-null  object 
 1   HIP        53716 non-null  int64  
 2   Proxy      7346 non-null   object 
 3   RAhms      53716 non-null  object 
 4   DEdms      53716 non-null  object 
 5   Vmag       53716 non-null  float64
 6   VarFlag    8682 non-null   float64
 7   r_Vmag     53716 non-null  object 
 8   RAdeg      53551 non-null  float64
 9   DEdeg      53551 non-null  float64
 10  AstroRef   9232 non-null   object 
 11  Plx        53551 non-null  float64
 12  pmRA       53551 non-null  float64
 13  pmDE       53551 non-null  float64
 14  e_RAdeg    53551 non-null  float64
 15  e_DEdeg    53551 non-null  float64
 16  e_Plx      53551 non-null  float64
 17  e_pmRA     53551 non-null  float64
 18  e_pmDE     53551 non-null  float64
 19  DE:RA      53551 non-null  float64
 20  Plx

### Simple models: model each numeric variable with a decision tree

Repeat the per-column screening with a decision tree. This time a column is kept only if every train metric is above 0.9 and every test metric is above 0.

In [48]:
used_num_cols = []
for col in numCols:
    dt = Pipeline(steps=[
        ('preprocessor',preprocessor),
        ('model',DecisionTreeClassifier(random_state=42))
    ])
    dt.fit(x_train[[col]], y_train)
    
    train_preds = dt.predict(x_train[[col]])
    test_preds = dt.predict(x_test[[col]])
    train_probs = dt.predict_proba(x_train[[col]])[:,1]
    test_probs = dt.predict_proba(x_test[[col]])[:,1]
    
    train_accuracy = accuracy_score(y_train, train_preds)
    test_accuracy = accuracy_score(y_test, test_preds)
    train_recall = recall_score(y_train, train_preds, zero_division=0)
    test_recall = recall_score(y_test, test_preds, zero_division=0)
    train_precision = precision_score(y_train, train_preds, zero_division=0)
    test_precision = precision_score(y_test, test_preds, zero_division=0)
    train_f1 = f1_score(y_train, train_preds, zero_division=0)
    test_f1 = f1_score(y_test, test_preds, zero_division=0)
    train_rocauc = roc_auc_score(y_train, train_probs)
    test_rocauc = roc_auc_score(y_test, test_probs)
    
    print(col)
    train_metrics = (train_accuracy, train_recall, train_precision, train_f1, train_rocauc)
    test_metrics = (test_accuracy, test_recall, test_precision, test_f1, test_rocauc)
    if all(m > .9 for m in train_metrics) and all(m > 0 for m in test_metrics):
        used_num_cols.append(col)
        print(f'{col} added to used column list')
    print(F'\nTrain Accuracy:\t\t{train_accuracy}')
    print(F'Test Accuracy:\t\t{test_accuracy}')
    print(F"\nTrain Recall:\t\t{train_recall}")
    print(F'Test Recall:\t\t{test_recall}')
    print(F"\nTrain Precision:\t{train_precision}")
    print(F'Test Precision:\t\t{test_precision}')
    print(F"\nTrain f1:\t\t{train_f1}")
    print(F'Test f1:\t\t{test_f1}')
    print(F"\nTrain ROC-AUC:\t\t{train_rocauc}")
    print(F'Test ROC-AUC:\t\t{test_rocauc}')
    print("\nTrain Matrix:\n")
    print(confusion_matrix(y_train, train_preds))
    print("\nTest Matrix:\n")
    print(confusion_matrix(y_test, test_preds))
    print('\n')
    print("****"*20)
    print('\n')

HIP
HIP added to used column list

Train Accuracy:		1.0
Test Accuracy:		0.552663911538032

Train Recall:		1.0
Test Recall:		0.35966306420851873

Train Precision:	1.0
Test Precision:		0.36242793081358105

Train f1:		1.0
Test f1:		0.36104020421186983

Train ROC-AUC:		1.0
Test ROC-AUC:		0.5084435520801506

Train Matrix:

[[34938     0]
 [    0 18778]]

Test Matrix:

[[7633 3981]
 [4029 2263]]


********************************************************************************


Vmag

Train Accuracy:		0.6706567875493336
Test Accuracy:		0.6564280129565508

Train Recall:		0.10757269144743849
Test Recall:		0.08868404322949777

Train Precision:	0.6840501185235354
Test Precision:		0.5717213114754098

Train f1:		0.1859095301642814
Test f1:		0.15354980737479362

Train ROC-AUC:		0.6073401476867797
Test ROC-AUC:		0.5652492809881229

Train Matrix:

[[34005   933]
 [16758  2020]]

Test Matrix:

[[11196   418]
 [ 5734   558]]


****************************************************************************

pmRA:RA

Train Accuracy:		0.6508861419316405
Test Accuracy:		0.6480509326482743

Train Recall:		0.002662690382362339
Test Recall:		0.0012714558169103624

Train Precision:	0.6666666666666666
Test Precision:		0.3076923076923077

Train f1:		0.005304195618734419
Test f1:		0.0025324469768914218

Train ROC-AUC:		0.5388425702091658
Test ROC-AUC:		0.512784123409818

Train Matrix:

[[34913    25]
 [18728    50]]

Test Matrix:

[[11596    18]
 [ 6284     8]]


********************************************************************************


pmRA:DE

Train Accuracy:		0.6508861419316405
Test Accuracy:		0.6481067798503295

Train Recall:		0.002183406113537118
Test Recall:		0.0007946598855689765

Train Precision:	0.7192982456140351
Test Precision:		0.2631578947368421

Train f1:		0.004353597026811787
Test f1:		0.001584534938995405

Train ROC-AUC:		0.5363080064638155
Test ROC-AUC:		0.5164439858245923

Train Matrix:

[[34922    16]
 [18737    41]]

Test Matrix:

[[11600    14]
 [ 6287     5]]


*******

V-I

Train Accuracy:		0.7150197334127634
Test Accuracy:		0.7111024237685692

Train Recall:		0.25998508893385874
Test Recall:		0.25762873490146215

Train Precision:	0.7756593581188433
Test Precision:		0.7635421573245408

Train f1:		0.38943841735800894
Test f1:		0.3852644087938205

Train ROC-AUC:		0.6878535746913322
Test ROC-AUC:		0.6770665686599827

Train Matrix:

[[33526  1412]
 [13896  4882]]

Test Matrix:

[[11112   502]
 [ 4671  1621]]


********************************************************************************


e_V-I

Train Accuracy:		0.6699307468910567
Test Accuracy:		0.6701664246621244

Train Recall:		0.13185642773458303
Test Recall:		0.13429752066115702

Train Precision:	0.6342213114754098
Test Precision:		0.6480061349693251

Train f1:		0.21832289921523676
Test f1:		0.2224855186940495

Train ROC-AUC:		0.5831557482398975
Test ROC-AUC:		0.578468476237822

Train Matrix:

[[33510  1428]
 [16302  2476]]

Test Matrix:

[[11155   459]
 [ 5447   845]]


**************************

e_dHp

Train Accuracy:		0.7583773922108868
Test Accuracy:		0.7553892549983245

Train Recall:		0.33454041964000425
Test Recall:		0.3350286077558805

Train Precision:	0.9286031042128603
Test Precision:		0.9149305555555556

Train f1:		0.4918764436440512
Test f1:		0.4904606793857609

Train ROC-AUC:		0.6634938292558121
Test ROC-AUC:		0.6621233500988734

Train Matrix:

[[34455   483]
 [12496  6282]]

Test Matrix:

[[11418   196]
 [ 4184  2108]]


********************************************************************************


HD

Train Accuracy:		0.9359781070816889
Test Accuracy:		0.5681335865073159

Train Recall:		0.8168601555011183
Test Recall:		0.29211697393515573

Train Precision:	1.0
Test Precision:		0.3591948407269885

Train f1:		0.8991998124102354
Test f1:		0.32220177053203614

Train ROC-AUC:		0.9861641301861928
Test ROC-AUC:		0.5183535711826411

Train Matrix:

[[34938     0]
 [ 3439 15339]]

Test Matrix:

[[8335 3279]
 [4454 1838]]


************************************************

In [49]:
len(used_num_cols)

4
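
In the output above, `HIP` reaches a train accuracy of 1.0 but a test accuracy of about 0.55, which is the pattern expected when an unpruned tree memorizes an identifier-like column. A possible alternative, sketched here on synthetic data, is to screen columns with `cross_val_score` (already imported) so that selection relies on held-out folds rather than training metrics. The column names `row_id` and `signal` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only (not the star catalogue).
rng = np.random.default_rng(42)
n = 1000
toy_y = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "row_id": rng.permutation(n),                     # unique identifier, unrelated to the target
    "signal": toy_y + rng.normal(scale=0.5, size=n),  # noisy but informative feature
})

tree = DecisionTreeClassifier(random_state=42)

results = {}
for col in toy.columns:
    # Training accuracy: an unpruned tree can memorize columns with unique values.
    train_acc = tree.fit(toy[[col]], toy_y).score(toy[[col]], toy_y)
    # Cross-validated accuracy: each fold is scored on rows the tree has not seen.
    cv_acc = cross_val_score(tree, toy[[col]], toy_y, cv=5, scoring="accuracy").mean()
    results[col] = (train_acc, cv_acc)
    print(f"{col}: train accuracy = {train_acc:.3f}, 5-fold CV accuracy = {cv_acc:.3f}")
```

In this toy setup both columns are fit perfectly on the training data, so only the cross-validated score separates the informative feature from the identifier.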

In [47]:
sink_model = Pipeline(steps=[
    ('preprocessor',preprocessor),
    ('tree', DecisionTreeClassifier(random_state=42, max_depth=8))
])

sink_model.fit(x_train[used_num_cols], y_train)

train_preds = sink_model.predict(x_train[used_num_cols])
test_preds = sink_model.predict(x_test[used_num_cols])
train_probs = sink_model.predict_proba(x_train[used_num_cols])[:,1]
test_probs = sink_model.predict_proba(x_test[used_num_cols])[:,1]
    
train_accuracy = accuracy_score(y_train, train_preds)
test_accuracy = accuracy_score(y_test, test_preds)
train_recall = recall_score(y_train, train_preds, zero_division=0)
test_recall = recall_score(y_test, test_preds, zero_division=0)
train_precision = precision_score(y_train, train_preds, zero_division=0)
test_precision = precision_score(y_test, test_preds, zero_division=0)
train_f1 = f1_score(y_train, train_preds, zero_division=0)
test_f1 = f1_score(y_test, test_preds, zero_division=0)
train_rocauc = roc_auc_score(y_train, train_probs)
test_rocauc = roc_auc_score(y_test, test_probs)

print(F'\nTrain Accuracy:\t\t{train_accuracy}')
print(F'Test Accuracy:\t\t{test_accuracy}')
print(F"\nTrain Recall:\t\t{train_recall}")
print(F'Test Recall:\t\t{test_recall}')
print(F"\nTrain Precision:\t{train_precision}")
print(F'Test Precision:\t\t{test_precision}')
print(F"\nTrain f1:\t\t{train_f1}")
print(F'Test f1:\t\t{test_f1}')
print(F"\nTrain ROC-AUC:\t\t{train_rocauc}")
print(F'Test ROC-AUC:\t\t{test_rocauc}')


Train Accuracy:		0.6668590364137315
Test Accuracy:		0.6571540265832682

Train Recall:		0.0859516455426563
Test Recall:		0.07803560076287348

Train Precision:	0.688272921108742
Test Precision:		0.5922798552472859

Train f1:		0.15281920181792358
Test f1:		0.13790198005898047

Train ROC-AUC:		0.6071116888824578
Test ROC-AUC:		0.5766171595519405
