# NDS Challenge
##### Team 31 - Team Cheems
---

<img src="https://s3.amazonaws.com/cdn.wp.m4ecmx/wp-content/uploads/2018/10/18141821/RoboDatosFraude750.jpg" width="700px" height="400px" style="float:left"/>


## Problem statement
---
E-commerce has grown exponentially in recent years, and the pandemic has led more people to try it. As a result, e-commerce fraud is increasing, which in turn erodes users' confidence to keep buying through these services.
<br>
Several kinds of solutions are being deployed to detect this fraud, including graph-based analysis, machine learning models and deep learning. Our team considers this problem worth solving: as long as e-commerce keeps expanding, many companies will keep allocating part of their revenue to fighting these attacks.

## Data collection
---
The dataset was obtained from Kaggle; more details are available [here](https://www.kaggle.com/kartik2112/fraud-detection).

## EDA
---

In [None]:
# load the libraries used in this notebook

# linear algebra
import numpy as np
# data processing
import pandas as pd
# visualization
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

import plotly.express as px

In [None]:
# read the training CSV file from its Google Drive path
df = pd.read_csv('/content/drive/MyDrive/hack/fraudTrain.csv', index_col=0,
                 parse_dates=['trans_date_trans_time', 'dob'])
df.head(3)

| | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-01 00:00:18 | 2703186189652095 | fraud_Rippin, Kub and Mann | misc_net | 4.97 | Jennifer | Banks | F | 561 Perry Cove | Moravian Falls | NC | 28654 | 36.0788 | -81.1781 | 3495 | Psychologist, counselling | 1988-03-09 | 0b242abb623afc578575680df30655b9 | 1325376018 | 36.011293 | -82.048315 | 0 |
| 1 | 2019-01-01 00:00:44 | 630423337322 | fraud_Heller, Gutmann and Zieme | grocery_pos | 107.23 | Stephanie | Gill | F | 43039 Riley Greens Suite 393 | Orient | WA | 99160 | 48.8878 | -118.2105 | 149 | Special educational needs teacher | 1978-06-21 | 1f76529f8574734946361c461b024d99 | 1325376044 | 49.159047 | -118.186462 | 0 |
| 2 | 2019-01-01 00:00:51 | 38859492057661 | fraud_Lind-Buckridge | entertainment | 220.11 | Edward | Sanchez | M | 594 White Dale Suite 530 | Malad City | ID | 83252 | 42.1808 | -112.262 | 4154 | Nature conservation officer | 1962-01-19 | a1a22d70485983eac12b5b88dad1cf95 | 1325376051 | 43.150704 | -112.154481 | 0 |


In [None]:
df['is_fraud'].value_counts().values

array([1289169,    7506])
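These counts show a strong class imbalance. A quick arithmetic check using only the two numbers printed above gives the fraud rate and the accuracy a trivial model would reach by always answering "not fraud"; it is a minimal sketch, but it explains why accuracy alone says little about a fraud detector on this data.

```python
# class counts printed by df['is_fraud'].value_counts() above
n_legit, n_fraud = 1289169, 7506
n_total = n_legit + n_fraud

# share of fraudulent transactions
fraud_rate = n_fraud / n_total
# accuracy of a trivial classifier that always predicts 0 ("not fraud")
baseline_accuracy = n_legit / n_total

print(f"total rows: {n_total}")
print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-'not fraud' accuracy: {baseline_accuracy:.4%}")
```

The fraud rate agrees with the mean of `is_fraud` reported by `describe()` further below.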

### Data dictionary

| Variable | Description |
|:---:|:---:|
|trans_date_trans_time|Date and time of the transaction|
|cc_num|Customer's credit card number|
|merchant|Merchant name|
|category|Merchant category (line of business)|
|amt|Transaction amount|
|first|Cardholder's first name|
|last|Cardholder's last name|
|gender|Cardholder's gender|
|street|Cardholder's street address|
|city|Cardholder's city|
|state|Cardholder's state|
|zip|Cardholder's ZIP code|
|lat|Cardholder's latitude|
|long|Cardholder's longitude|
|city_pop|Population of the cardholder's city|
|job|Cardholder's occupation|
|dob|Cardholder's date of birth|
|trans_num|Transaction number|
|unix_time|Seconds elapsed since midnight UTC on January 1, 1970|
|merch_lat|Merchant latitude|
|merch_long|Merchant longitude|
|is_fraud|Whether the transaction was fraudulent (1) or not (0)|





In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1296675 entries, 0 to 1296674
Data columns (total 22 columns):
 #   Column                 Non-Null Count    Dtype         
---  ------                 --------------    -----         
 0   trans_date_trans_time  1296675 non-null  datetime64[ns]
 1   cc_num                 1296675 non-null  int64         
 2   merchant               1296675 non-null  object        
 3   category               1296675 non-null  object        
 4   amt                    1296675 non-null  float64       
 5   first                  1296675 non-null  object        
 6   last                   1296675 non-null  object        
 7   gender                 1296675 non-null  object        
 8   street                 1296675 non-null  object        
 9   city                   1296675 non-null  object        
 10  state                  1296675 non-null  object        
 11  zip                    1296675 non-null  int64         
 12  lat                    12966

No record contains a null value. This follows from the nature of the dataset: as explained earlier in our research, it was generated with a credit card transaction simulator. To learn more, see the [creator's repository](https://github.com/namebrandon/Sparkov_Data_Generation).
There are 1,296,675 records and 22 fields.
There are 5 float variables, 5 integer, 10 string (object) and 2 datetime.

In [None]:
df.describe(include='all')

| | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1296675 | 1296675.0 | 1296675 | 1296675 | 1296675.0 | 1296675 | 1296675 | 1296675 | 1296675 | 1296675 | 1296675 | 1296675.0 | 1296675.0 | 1296675.0 | 1296675.0 | 1296675 | 1296675 | 1296675 | 1296675.0 | 1296675.0 | 1296675.0 | 1296675.0 |
| unique | 1274791 | | 693 | 14 | | 352 | 481 | 2 | 983 | 894 | 51 | | | | | 494 | 968 | 1296675 | | | | |
| top | 2019-04-22 16:02:01 | | fraud_Kilback LLC | gas_transport | | Christopher | Smith | F | 864 Reynolds Plains | Birmingham | TX | | | | | Film/video editor | 1977-03-23 00:00:00 | e9054e61f635001c4c99861959fd856b | | | | |
| freq | 4 | | 4403 | 131659 | | 26669 | 28794 | 709863 | 3123 | 5617 | 94876 | | | | | 9779 | 5636 | 1 | | | | |
| first | 2019-01-01 00:00:18 | | | | | | | | | | | | | | | | 1924-10-30 00:00:00 | | | | | |
| last | 2020-06-21 12:13:37 | | | | | | | | | | | | | | | | 2005-01-29 00:00:00 | | | | | |
| mean | | 4.17192e+17 | | | 70.35104 | | | | | | | 48800.67 | 38.53762 | -90.22634 | 88824.44 | | | | 1349244000.0 | 38.53734 | -90.22646 | 0.005788652 |
| std | | 1.308806e+18 | | | 160.316 | | | | | | | 26893.22 | 5.075808 | 13.75908 | 301956.4 | | | | 12841280.0 | 5.109788 | 13.77109 | 0.07586269 |
| min | | 60416210000.0 | | | 1.0 | | | | | | | 1257.0 | 20.0271 | -165.6723 | 23.0 | | | | 1325376000.0 | 19.02779 | -166.6712 | 0.0 |
| 25% | | 180042900000000.0 | | | 9.65 | | | | | | | 26237.0 | 34.6205 | -96.798 | 743.0 | | | | 1338751000.0 | 34.73357 | -96.89728 | 0.0 |


The table above contains a lot of information worth considering. For the categorical columns, the relevant statistics are the number of unique values (`unique`), the mode of the column (`top`) and how often that mode occurs (`freq`).

For the dates, apart from the first and last values (the transactions span 2019-01-01 to 2020-06-21), this summary yields little useful information. Likewise, there is little to learn from the numeric variables, since most of them consist largely of unique values.
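To make the distinction concrete, here is a minimal sketch on a made-up four-row frame (illustrative values, not the dataset): `describe(include="all")` fills `count`/`unique`/`top`/`freq` for object columns and leaves the numeric statistics empty (NaN), while numeric columns get the moments and quantiles instead.

```python
import pandas as pd

# toy frame with one categorical and one numeric column (values are illustrative)
toy = pd.DataFrame({
    "category": ["gas_transport", "gas_transport", "grocery_pos", "misc_net"],
    "amt": [4.97, 107.23, 220.11, 45.0],
})

summary = toy.describe(include="all")
# object column: number of distinct values, the mode and its frequency
print(summary.loc[["count", "unique", "top", "freq"], "category"])
# numeric column: the usual summary statistics
print(summary.loc[["mean", "std", "min", "max"], "amt"])
```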

### Data visualization
---
For an interactive experience with the visualizations, we created this [dashboard]().
<br>
<br>

**Anomaly detection algorithms.**

* Isolation Forest
* Local Outlier Factor
* One-Class SVM
* DBSCAN
* Covariance Elliptic Envelope
* BRM
* Autoencoder
* Prediction network


https://ff12.fastforwardlabs.com/#how-to-decide-on-a-modeling-approach%3F

https://machinelearningmastery.com/model-based-outlier-detection-and-removal-in-python/

https://scikit-learn.org/stable/modules/outlier_detection.html

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
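The benchmark later relies on a convention described in the scikit-learn outlier detection guide linked above: for detectors such as `IsolationForest` and `EllipticEnvelope`, `predict` returns -1 for outliers and +1 for inliers, and the notebook turns that into fraud labels with `< 0`. A minimal sketch on synthetic 2-D data with one obvious outlier appended (the helper `to_fraud_labels` is a hypothetical name, not part of the notebook):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def to_fraud_labels(sklearn_pred):
    # -1 (outlier) -> 1 (suspected fraud), +1 (inlier) -> 0 (normal)
    return (np.asarray(sklearn_pred) < 0).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))        # synthetic "normal" points
X = np.vstack([X, [[8.0, 8.0]]])     # one far-away point appended last

clf = IsolationForest(random_state=0).fit(X)
labels = to_fraud_labels(clf.predict(X))
print(labels[-1], labels.sum())
```

The same `< 0` trick also works for DBSCAN's `fit_predict`, where noise points receive the label -1.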



# Bank dataset



**Libraries**

In [None]:
import pandas as pd
import numpy as np
import sys
from progressbar import progressbar
from datetime import date

from sklearn import metrics
from sklearn.ensemble import IsolationForest
from sklearn.covariance import  EllipticEnvelope
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.model_selection import train_test_split

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
import tensorflow.keras.backend as K


## Data preprocessing

In [None]:
path = '/content/drive/MyDrive/Cuarto Semestre/HackMX/'
features = ["merchant", "category", "amt", "gender", "lat", "long", "city_pop", "dob", "merch_lat", "merch_long", "is_fraud"]
all_data = pd.read_csv(path+"fraudData.csv")[features]

In [None]:
all_data.shape

(1852394, 11)

In [None]:
all_data.head()

| | merchant | category | amt | gender | lat | long | city_pop | dob | merch_lat | merch_long | is_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fraud_Rippin, Kub and Mann | misc_net | 4.97 | F | 36.0788 | -81.1781 | 3495 | 1988-03-09 | 36.011293 | -82.048315 | 0 |
| 1 | fraud_Heller, Gutmann and Zieme | grocery_pos | 107.23 | F | 48.8878 | -118.2105 | 149 | 1978-06-21 | 49.159047 | -118.186462 | 0 |
| 2 | fraud_Lind-Buckridge | entertainment | 220.11 | M | 42.1808 | -112.262 | 4154 | 1962-01-19 | 43.150704 | -112.154481 | 0 |
| 3 | fraud_Kutch, Hermiston and Farrell | gas_transport | 45.0 | M | 46.2306 | -112.1138 | 1939 | 1967-01-12 | 47.034331 | -112.561071 | 0 |
| 4 | fraud_Keeling-Crist | misc_pos | 41.96 | M | 38.4207 | -79.4629 | 99 | 1986-03-28 | 38.674999 | -78.632459 | 0 |


In [None]:
all_data_new = pd.DataFrame(None)

# merchant: label-encode the names, then min-max scale the codes to [0, 1]
merchEnc = LabelEncoder()
all_data.merchant = merchEnc.fit_transform(all_data.merchant)
all_data_new["merchant"] = (all_data.merchant-all_data.merchant.min())/(all_data.merchant.max()-all_data.merchant.min())

# category: one-hot encode (one 0/1 column per category)
all_data_new = pd.concat([all_data_new,pd.get_dummies(all_data.category, dtype=int)], axis =1)
# amt: min-max scale to [0, 1]
all_data_new["amt"] = (all_data.amt-all_data.amt.min())/(all_data.amt.max()-all_data.amt.min())
# gender: 1 for "M", 0 otherwise
all_data_new["gender"] = all_data.gender.apply(lambda x: 1 if x =="M" else 0)
# cardholder coordinates are copied unscaled
all_data_new = pd.concat([all_data_new, all_data[["lat","long"]]], axis =1)
# city_pop: min-max scale to [0, 1]
all_data_new["city_pop"] = (all_data.city_pop-all_data.city_pop.min())/(all_data.city_pop.max()-all_data.city_pop.min())


def calculate_age(born):
    # age in whole years as of today; subtracts 1 if this year's birthday has not happened yet
    today = date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

# age is derived from dob relative to the day the notebook runs
all_data["age"] = pd.to_datetime(all_data.dob).apply(calculate_age)

all_data_new["age"] = (all_data.age-all_data.age.min())/(all_data.age.max()-all_data.age.min())
# merchant coordinates (unscaled) and the target column
all_data_new = pd.concat([all_data_new, all_data[["merch_lat","merch_long", "is_fraud"]]], axis =1)
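Two small pieces of logic in this cell are easy to check in isolation: the tuple comparison in `calculate_age`, which is `True` (counted as 1) while this year's birthday is still ahead, and the min-max formula `(x - min) / (max - min)` used for `merchant`, `amt`, `city_pop` and `age`. A minimal sketch with a fixed reference date instead of `date.today()`, so the result does not depend on when it runs (`calculate_age_at` and `min_max` are hypothetical helpers, not part of the notebook):

```python
from datetime import date

def calculate_age_at(born: date, today: date) -> int:
    # same rule as calculate_age, with the reference date passed in;
    # the tuple comparison is True (== 1) when the birthday has not arrived yet this year
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

def min_max(values):
    # (x - min) / (max - min): the smallest value maps to 0, the largest to 1
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(calculate_age_at(date(1988, 3, 9), date(2021, 3, 8)))  # day before the birthday: 32
print(calculate_age_at(date(1988, 3, 9), date(2021, 3, 9)))  # on the birthday: 33
print(min_max([10, 15, 20]))                                 # [0.0, 0.5, 1.0]
```

Because the notebook uses `date.today()`, the raw ages shift by a year over time; after min-max scaling the shift mostly cancels, since every age moves by the same amount.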

In [None]:
all_data_new

| | merchant | entertainment | food_dining | gas_transport | grocery_net | grocery_pos | health_fitness | home | kids_pets | misc_net | misc_pos | personal_care | shopping_net | shopping_pos | travel | amt | gender | lat | long | city_pop | age | merch_lat | merch_long | is_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.742775 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.000137 | 0 | 36.0788 | -81.1781 | 0.001194 | 0.2125 | 36.011293 | -82.048315 | 0 |
| 1 | 0.348266 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.003670 | 0 | 48.8878 | -118.2105 | 0.000043 | 0.3250 | 49.159047 | -118.186462 | 0 |
| 2 | 0.563584 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.007569 | 1 | 42.1808 | -112.2620 | 0.001421 | 0.5375 | 43.150704 | -112.154481 | 0 |
| 3 | 0.520231 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.001520 | 1 | 46.2306 | -112.1138 | 0.000659 | 0.4750 | 47.034331 | -112.561071 | 0 |
| 4 | 0.429191 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.001415 | 1 | 38.4207 | -79.4629 | 0.000026 | 0.2375 | 38.674999 | -78.632459 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1852389 | 0.732659 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.001477 | 1 | 40.4931 | -91.8912 | 0.000171 | 0.4875 | 39.946837 | -91.333331 | 0 |
| 1852390 | 0.381503 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.003829 | 1 | 29.0393 | -95.4401 | 0.009879 | 0.0625 | 29.661049 | -96.186633 | 0 |
| 1852391 | 0.716763 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.002967 | 0 | 46.1966 | -118.9017 | 0.001260 | 0.2875 | 46.658340 | -119.715054 | 0 |
| 1852392 | 0.108382 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.000241 | 1 | 44.6255 | -116.4493 | 0.000036 | 0.4875 | 44.470525 | -117.080888 | 0 |


In [None]:
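# hold out 15% of the rows as the test set, then 30% of the remainder as validation;
# stratify keeps the fraud ratio the same in all three sets
X_train, X_test, y_train, y_test = train_test_split(all_data_new.drop("is_fraud", axis = 1), all_data_new.is_fraud, test_size = 0.15, stratify = all_data_new.is_fraud)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.3, stratify = y_train)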

In [None]:
print(X_train.shape)
print(X_val.shape)
print(X_test.shape)

(1102173, 23)
(472361, 23)
(277860, 23)
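The shapes above follow from the two chained splits: roughly 59.5% train, 25.5% validation and 15% test. A sketch of the arithmetic, assuming scikit-learn's rounding rule for a float `test_size` (the test count is rounded up and the rest goes to train); `split_sizes` is a hypothetical helper, and its output reproduces the three row counts printed above:

```python
import math

def split_sizes(n_samples: int, test_size: float):
    # float test_size: n_test = ceil(test_size * n), n_train = n - n_test
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_rest, n_test = split_sizes(1852394, 0.15)  # first split: 15% held out for test
n_train, n_val = split_sizes(n_rest, 0.3)    # second split: 30% of the rest for validation
print(n_train, n_val, n_test)
```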


In [None]:
pd.concat([X_train, y_train], axis = 1).to_csv(path+"fraudTrain_treated.csv", index = False)
pd.concat([X_test, y_test], axis = 1).to_csv(path+"fraudTest_treated.csv", index = False)
pd.concat([X_val, y_val], axis = 1).to_csv(path+"fraudVal_treated.csv", index = False)

## Benchmark

In [None]:
###########################################################
#                 LOAD DATA
from imblearn.over_sampling import SMOTE

train = pd.read_csv(path+"fraudTrain_treated.csv")
feature_cols = train.columns.drop("is_fraud")

# oversample only the minority (fraud) class of the training set until both
# classes have the same size; validation and test keep their original balance
smote = SMOTE(sampling_strategy= "minority")
X_res, y_res = smote.fit_resample(train[feature_cols], train.is_fraud)
X_train = pd.DataFrame(X_res, columns = feature_cols)
y_train = pd.Series(y_res, name = "is_fraud").astype("float32")

test = pd.read_csv(path+"fraudTest_treated.csv")
X_test = test.drop("is_fraud", axis = 1)
y_test = test.is_fraud.astype("float32")

val = pd.read_csv(path+"fraudVal_treated.csv")
X_val = val.drop("is_fraud", axis = 1)
y_val = val.is_fraud.astype("float32")

# alternative split, not used:
# X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size = 0.20, stratify = y_train)
# X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.3, stratify = y_train)
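SMOTE creates synthetic minority samples instead of duplicating existing ones. A minimal sketch of the interpolation step it is built on, assuming a minority sample and one of its minority-class nearest neighbours are already known (the neighbour search is omitted, and `smote_sample` is a hypothetical name; this is not imbalanced-learn's implementation): the new point lies at a random position on the segment between the two.

```python
import numpy as np

def smote_sample(x_i, x_neighbor, rng):
    # random position lam in [0, 1) along the segment from x_i to its neighbour
    lam = rng.uniform(0.0, 1.0)
    return x_i + lam * (x_neighbor - x_i)

rng = np.random.default_rng(0)
x_i = np.array([0.2, 0.4])    # a fraud row (toy values)
x_nn = np.array([0.6, 0.0])   # one of its fraud-class nearest neighbours (toy values)
x_new = smote_sample(x_i, x_nn, rng)
print(x_new)
```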

In [None]:
###########################################################
#                 FUNCTIONS

def AE_model_train_test(model,metrics_dict, X_train, y_train, X_val, y_val, X_test, y_test):

  # train the autoencoder to reconstruct its own input
  model.fit(X_train, X_train, epochs = 10, batch_size =256)

  # reconstruction error on the validation set (mean absolute error per row)
  preds_AE = model.predict(X_val, batch_size = 256)
  errors = abs(X_val-preds_AE).mean(axis = 1)

  # sweep 100 thresholds and keep the one with the best validation F1
  thres = np.linspace(errors.min(),errors.max(), 100)
  res = []
  for th in thres:
    temp_preds = (errors>th).astype("int")
    res.append(metrics.f1_score(y_val, temp_preds))

  selected_th = thres[np.argmax(res)]
  best_f1 = max(res)

  # test rows whose reconstruction error exceeds the threshold are flagged as fraud
  test_preds_AE = model.predict(X_test, batch_size = 256)
  test_preds_AE = abs(X_test-test_preds_AE).mean(axis = 1)
  test_preds_AE = (test_preds_AE>selected_th).astype("int")


  print(f"\nAE Results:\nTH: {selected_th}, best_val_f1: {best_f1}, n_components: {X_train.shape[1]}\nTest:")
  for metric in metrics_dict:
    temp_res = metrics_dict[metric](y_test,test_preds_AE)
    print(f"{metric}: {temp_res}")

  report = metrics.classification_report(y_test,test_preds_AE )
  print(report)

  return report, test_preds_AE, selected_th

def regular_model_train_test(model, metrics_dict, X_train,y_train, X_test, y_test, name):

  if name == "DBSCAN":
    # DBSCAN has no separate predict(): cluster the test set directly and
    # flag noise points (label -1) as fraud
    pred = (model.fit_predict(X_test)<0).astype("int")
  elif name == "NN_clf":
    model.fit(X_train, y_train, epochs = 10, batch_size =256)
    # pick the probability threshold with the best F1 on the validation set
    # (X_val and y_val are read from the notebook's global scope, not from the arguments)
    pred_val = model.predict(X_val, batch_size = 256).flatten()
    thres = np.linspace(min(pred_val),max(pred_val), 100)
    res = []
    for th in thres:
      res.append(metrics.f1_score(y_val, pred_val>th))
    selected_th = thres[np.argmax(res)]

    pred = (model.predict(X_test, batch_size = 256).flatten()>selected_th).astype("int")

  else:
    # IsolationForest / EllipticEnvelope: predict() returns -1 for outliers and 1 for inliers
    model.fit(X_train)
    pred = (model.predict(X_test)<0).astype("int")


  print(f"\n{name} results\nTest:")
  for metric in metrics_dict:
    temp_res = metrics_dict[metric](y_test, pred)
    print(f"{metric}: {temp_res}")
  report = metrics.classification_report(y_test,pred )
  print(report)

  return report, pred
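Both `AE_model_train_test` (on reconstruction error) and the `NN_clf` branch (on predicted probability) choose their decision threshold the same way: 100 evenly spaced candidates between the minimum and maximum score on the validation set, keeping the one with the highest F1. A self-contained sketch of that search on toy scores (`best_f1_threshold` is a hypothetical helper; `zero_division=0` only silences the warning for thresholds that flag nothing):

```python
import numpy as np
from sklearn import metrics

def best_f1_threshold(scores, y_true, n_thresholds=100):
    # evenly spaced candidates; predict 1 when score > threshold; keep the best F1
    scores = np.asarray(scores, dtype=float)
    thresholds = np.linspace(scores.min(), scores.max(), n_thresholds)
    f1s = [metrics.f1_score(y_true, (scores > th).astype(int), zero_division=0)
           for th in thresholds]
    best = int(np.argmax(f1s))
    return thresholds[best], f1s[best]

# toy reconstruction errors: the two largest belong to the fraud rows
errors = [0.10, 0.20, 0.90, 1.00]
labels = [0, 0, 1, 1]
th, f1 = best_f1_threshold(errors, labels)
print(th, f1)
```

Selecting the threshold on validation data and only then applying it to the test set, as the notebook does, keeps the test metrics free of threshold tuning.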


In [None]:
# fit PCA on the training set and keep the smallest number of components
# whose cumulative explained variance exceeds 90%
pca = PCA(n_components = 10)
pca.fit(X_train)
acc_var = 0
for n_components,comp in enumerate(pca.explained_variance_ratio_):
  acc_var += comp
  if acc_var>0.90:
    break

# n_components is a 0-based index, hence the +1
X_train_pca = pca.transform(X_train)[:, :n_components+1]
X_test_pca = pca.transform(X_test)[:,:n_components+1]
X_val_pca = pca.transform(X_val)[:,:n_components+1]
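The loop above keeps the smallest number of principal components whose cumulative explained variance exceeds 90% (the autoencoder run below reports `n_components: 2`). A vectorized sketch of the same rule with NumPy, including the loop's fallback of keeping every component when the threshold is never exceeded; `n_components_for` is a hypothetical name and the variance ratios in the example are made up:

```python
import numpy as np

def n_components_for(explained_variance_ratio, threshold=0.90):
    # smallest k whose first k components explain more than `threshold` of the variance
    cumulative = np.cumsum(explained_variance_ratio)
    above = np.nonzero(cumulative > threshold)[0]
    return int(above[0]) + 1 if above.size else len(cumulative)

print(n_components_for([0.60, 0.35, 0.05]))  # 0.60, 0.95 -> 2
print(n_components_for([0.50, 0.20, 0.10]))  # never above 0.90 -> 3 (all)
```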

In [None]:
seed= 1416

# autoencoder over the PCA components: 128-64-32-64-128 hidden units, linear output
AE = Sequential()
AE.add(Dense(128, input_dim = n_components+1, activation="relu", kernel_initializer = "normal"))
AE.add(Dense(64, activation = "relu", kernel_initializer = "normal"))
AE.add(Dense(32, activation = "relu", kernel_initializer = "normal"))
AE.add(Dense(64, activation = "relu", kernel_initializer = "normal"))
AE.add(Dense(128, activation = "relu", kernel_initializer = "normal"))
AE.add(Dense(n_components+1, activation = "linear", kernel_initializer = "normal"))

AE.compile(loss= "mae", optimizer = "adam")

# number of (SMOTE-resampled) training rows, used to scale the custom loss
m = X_train.shape[0]

def log_loss_penalized(y_true, y_pred):
    # binary cross-entropy with the fraud term (y_true == 1) weighted 10x;
    # 1e-8 keeps the log away from zero. K.sum adds up the batch, and the
    # total is divided by the full training size m rather than by the batch size
    loss = -(1/m)*K.sum( 10*y_true * K.log(K.abs(y_pred+1e-8))+ (1-y_true)*K.log(K.abs(1-y_pred+ 1e-8)))
    return loss

# feed-forward classifier on all features; the sigmoid output is the fraud probability
NN_clf = Sequential()

NN_clf.add(Dense(128, input_dim = X_train.shape[1], activation="relu", kernel_initializer = "normal"))
NN_clf.add(Dense(64, activation = "relu", kernel_initializer = "normal"))
NN_clf.add(Dense(32, activation = "relu", kernel_initializer = "normal"))
NN_clf.add(Dense(64, activation = "relu", kernel_initializer = "normal"))
NN_clf.add(Dense(128, activation = "relu", kernel_initializer = "normal"))
NN_clf.add(Dense(1, activation = "sigmoid", kernel_initializer = "normal"))

NN_clf.compile(loss= log_loss_penalized, optimizer = "adam")
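`log_loss_penalized` is a binary cross-entropy in which the fraud term is multiplied by 10, so a missed fraud costs ten times as much as a false alarm of the same size. A NumPy sketch of the same formula to check that behaviour outside TensorFlow; `log_loss_penalized_np` and its `pos_weight`/`eps` parameters are hypothetical names for the constants hard-coded in the Keras version:

```python
import numpy as np

def log_loss_penalized_np(y_true, y_pred, m, pos_weight=10.0, eps=1e-8):
    # weighted binary cross-entropy summed over the rows and divided by m
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    terms = (pos_weight * y_true * np.log(np.abs(y_pred + eps))
             + (1 - y_true) * np.log(np.abs(1 - y_pred + eps)))
    return -terms.sum() / m

# a fraud scored 0.1 versus a legitimate row scored 0.9: equally wrong, 10x the loss
missed_fraud = log_loss_penalized_np([1], [0.1], m=1)
false_alarm = log_loss_penalized_np([0], [0.9], m=1)
print(missed_fraud, false_alarm, missed_fraud / false_alarm)
```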


models = {"NN_clf": NN_clf,
          "Autoencoder":AE,
          "IsolationForest":IsolationForest(random_state = seed),
          "DBSCAN":DBSCAN(),
          "CovElipticEnv":EllipticEnvelope(random_state =seed)}

metrics_dict = {"accuracy": metrics.accuracy_score,
           "ROC AUC": metrics.roc_auc_score,
           "precision": metrics.precision_score,
           "recall": metrics.recall_score, 
           "f1 score": metrics.f1_score}


results = {"NN_clf": None,
          "Autoencoder":None,
          "IsolationForest":None,
          "DBSCAN": None,
          "CovElipticEnv":None}

reg_pred = []
for model in progressbar(models):
  print()
  # BRMiner is not benchmarked: BRMiner_model_train_test is not defined in this notebook
  # if model == "BRMiner":
  #   temp_res,pred_BR,_ = BRMiner_model_train_test(models[model], metrics_dict,  X_train, y_train, X_val, y_val, X_test, y_test)
  if model == "Autoencoder":
    # the autoencoder works on the PCA-reduced features
    temp_res,pred_AE,_ = AE_model_train_test(models[model], metrics_dict, X_train_pca, y_train, X_val_pca, y_val, X_test_pca, y_test)
  else:
    temp_res,pred= regular_model_train_test(models[model], metrics_dict, X_train, y_train,  X_test, y_test, model)
    reg_pred.append(pred)

  results[model] = temp_res




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

NN_clf results
Test:
accuracy: 0.9927769380263441
ROC AUC: 0.6006633404071414
precision: 0.2571676802780191
recall: 0.20441988950276244
f1 score: 0.2277799153520585



              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00    276412
         1.0       0.26      0.20      0.23      1448

    accuracy                           0.99    277860
   macro avg       0.63      0.60      0.61    277860
weighted avg       0.99      0.99      0.99    277860


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

AE Results:
TH: 0.0910762089450937, best_roc_auc_score: 0.011137803013758463, n_components: 2
Test:
accuracy: 0.8285071618800834
ROC AUC: 0.5026409102418417
precision: 0.005374156942511509
recall: 0.17334254143646408
f1 score: 0.010425103316511952



              precision    recall  f1-score   support

         0.0       0.99      0.83      0.91    276412
         1.0       0.01      0.17      0.01      1448

    accuracy                           0.83    277860
   macro avg       0.50      0.50      0.46    277860
weighted avg       0.99      0.83      0.90    277860



IsolationForest results
Test:
accuracy: 0.9869502627222342
ROC AUC: 0.5201048770739618
precision: 0.030198446937014668
recall: 0.04834254143646409
f1 score: 0.03717472118959108



              precision    recall  f1-score   support

         0.0       0.99      0.99      0.99    276412
         1.0       0.03      0.05      0.04      1448

    accuracy                           0.99    277860
   macro avg       0.51      0.52      0.52    277860
weighted avg       0.99      0.99      0.99    277860



DBSCAN results
Test:
accuracy: 0.5824803858058015
ROC AUC: 0.5747752444245491
precision: 0.007065039670929212
recall: 0.5669889502762431
f1 score: 0.013956176585581452



              precision    recall  f1-score   support

         0.0       1.00      0.58      0.74    276412
         1.0       0.01      0.57      0.01      1448

    accuracy                           0.58    277860
   macro avg       0.50      0.57      0.37    277860
weighted avg       0.99      0.58      0.73    277860



CovElipticEnv results
Test:
accuracy: 0.9013280069099546
ROC AUC: 0.7216379017213714
precision: 0.028402280899284496
recall: 0.5400552486187845
f1 score: 0.053966391773920847




              precision    recall  f1-score   support

         0.0       1.00      0.90      0.95    276412
         1.0       0.03      0.54      0.05      1448

    accuracy                           0.90    277860
   macro avg       0.51      0.72      0.50    277860
weighted avg       0.99      0.90      0.94    277860
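Two observations help read these numbers. `metrics_dict` passes hard 0/1 predictions to `roc_auc_score`, so each "ROC AUC" comes from a single operating point; for binary predictions it equals balanced accuracy, the mean of the recall on each class (for NN_clf this matches the 0.60 macro-average recall in its report). And F1 is the harmonic mean of precision and recall. A small sketch checking both identities, using made-up toy labels and the precision and recall printed for NN_clf:

```python
import numpy as np
from sklearn import metrics

# toy hard predictions (made up): 2 of 4 frauds caught, 1 false alarm among 6 legit rows
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

auc = metrics.roc_auc_score(y_true, y_pred)
balanced = metrics.balanced_accuracy_score(y_true, y_pred)
print(auc, balanced)  # both equal (0.5 + 5/6) / 2

# F1 recomputed from the NN_clf precision and recall printed above
precision, recall = 0.2571676802780191, 0.20441988950276244
f1 = 2 * precision * recall / (precision + recall)
print(f1)
```

Ranking-based ROC AUC would require passing scores (reconstruction errors or probabilities) instead of thresholded labels.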



In [None]:
print("ALL REPORTS")
for model in results:
  print(f"Model: {model}")
  print(results[model])

ALL REPORTS
Model: NN_clf
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00    276412
         1.0       0.26      0.20      0.23      1448

    accuracy                           0.99    277860
   macro avg       0.63      0.60      0.61    277860
weighted avg       0.99      0.99      0.99    277860

Model: Autoencoder
              precision    recall  f1-score   support

         0.0       0.99      0.83      0.91    276412
         1.0       0.01      0.17      0.01      1448

    accuracy                           0.83    277860
   macro avg       0.50      0.50      0.46    277860
weighted avg       0.99      0.83      0.90    277860

Model: IsolationForest
              precision    recall  f1-score   support

         0.0       0.99      0.99      0.99    276412
         1.0       0.03      0.05      0.04      1448

    accuracy                           0.99    277860
   macro avg       0.51      0.52      0.52    277860
weight