# DataWig Example Demo
- Link to article: *Coming Soon*
- DataWig: imputation for tables. DataWig trains machine learning models to impute missing values in tabular data.
- Data source: https://archive.ics.uci.edu/ml/datasets/heart+disease
___

## (1) Initial Setup

In [1]:
# Import dependencies
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score, classification_report, matthews_corrcoef, mean_squared_error

from datawig import SimpleImputer, Imputer
from datawig.utils import random_split
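# Import only the encoders and featurizers used below (avoids wildcard imports)
from datawig.column_encoders import NumericalEncoder, CategoricalEncoder
from datawig.mxnet_input_symbols import NumericalFeaturizer, EmbeddingFeaturizer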

In [2]:
# Read input data
df = pd.read_csv('../data/Heart.csv')
df.head()

Unnamed: 0,Age,Sex,ChestPain,RestBP,Chol,Fbs,RestECG,MaxHR,ExAng,Oldpeak,Slope,Ca,Thal,AHD
0,63,1,typical,145,233,1,2,150,0,2.3,3,0.0,fixed,No
1,67,1,asymptomatic,160,286,0,2,108,1,1.5,2,3.0,normal,Yes
2,67,1,asymptomatic,120,229,0,2,129,1,2.6,2,2.0,reversable,Yes
3,37,1,nonanginal,130,250,0,0,187,0,3.5,3,0.0,normal,No
4,41,0,nontypical,130,204,0,2,172,0,1.4,1,0.0,normal,No


#### Data Dictionary
- Age: The person’s age in years  
- Sex: The person’s sex (1 = male, 0 = female)  
- ChestPain: Chest pain type (stored as a string in this file)  
    - `asymptomatic`: asymptomatic
    - `nontypical`: atypical angina
    - `nonanginal`: non-anginal pain  
    - `typical`: typical angina  
- RestBP: The person’s resting blood pressure (mmHg on admission)  
- Chol: The person’s cholesterol measurement in mg/dl  
- Fbs: The person’s fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)  
- RestECG: Resting electrocardiographic results (coding as given in the UCI documentation)  
    - Value 0: normal  
    - Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)  
    - Value 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria
- MaxHR: The person’s maximum heart rate achieved
- ExAng: Exercise induced angina (1 = yes; 0 = no)
- Oldpeak: ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot)
- Slope: The slope of the peak exercise ST segment (coded 1 to 3 in this file, as in the UCI documentation)
    - 1: upsloping
    - 2: flat
    - 3: downsloping
- Ca: The number of major vessels (0–3)
- Thal: Result for a blood disorder called thalassemia (stored as a string in this file)
    - NULL: dropped from the dataset previously
    - `fixed`: fixed defect (no blood flow in some part of the heart)
    - `normal`: normal blood flow
    - `reversable`: reversible defect (a blood flow is observed but it is not normal)
- AHD: Heart disease diagnosis (`Yes` / `No` in this file). This is the dataset's target variable.

In [3]:
df.dtypes

Age            int64
Sex            int64
ChestPain     object
RestBP         int64
Chol           int64
Fbs            int64
RestECG        int64
MaxHR          int64
ExAng          int64
Oldpeak      float64
Slope          int64
Ca           float64
Thal          object
AHD           object
dtype: object

In [4]:
df.shape

(303, 14)

___
## (2) Data Processing
- Train test split and data masking (to simulate missing values)

In [5]:
# Perform train-test split (Default is 80/20 split)
df_train, df_test = random_split(df, split_ratios=[0.8, 0.2])
df_train.shape

(242, 14)

In [6]:
# Randomly hide a fraction of the cells in the test dataframe
hide_proportion = 0.25 # 25% hidden
df_test_missing = df_test.mask(np.random.rand(*df_test.shape) > (1 - hide_proportion))

In [7]:
df_test_missing.sample(5)

Unnamed: 0,Age,Sex,ChestPain,RestBP,Chol,Fbs,RestECG,MaxHR,ExAng,Oldpeak,Slope,Ca,Thal,AHD
182,,,typical,,,0.0,,178.0,,,1.0,2.0,normal,No
216,46.0,,nontypical,105.0,204.0,0.0,0.0,172.0,,0.0,1.0,0.0,normal,No
177,56.0,1.0,asymptomatic,132.0,184.0,0.0,2.0,105.0,1.0,,2.0,1.0,,
290,,,nonanginal,,212.0,0.0,2.0,,0.0,0.8,2.0,0.0,reversable,Yes
50,,0.0,nontypical,105.0,198.0,0.0,0.0,168.0,0.0,0.0,,1.0,normal,No
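
The masking step above hides each cell independently with probability `hide_proportion`, so the realized share of missing cells is only approximately 25% and varies by column. A minimal sketch on a synthetic frame (the toy data, seed, and variable names are assumptions, not part of the notebook) shows one way to check what was actually hidden:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df_test (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 100, size=(100, 10)),
                   columns=[f"c{i}" for i in range(10)])

hide_proportion = 0.25
# Boolean matrix: True marks a cell to hide; each cell is drawn independently
hide = rng.random(toy.shape) < hide_proportion
toy_missing = toy.mask(hide)

# Realized share of hidden cells, overall and per column
overall_missing = toy_missing.isna().to_numpy().mean()
per_column_missing = toy_missing.isna().mean()
print(f"overall: {overall_missing:.3f}")
print(per_column_missing.round(2))
```

Because every column is masked, the input columns passed to the imputer also contain missing values at prediction time, not only the column being imputed.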


___
## (3) Simple Imputer with Hyperparameter Optimization (HPO)
- `SimpleImputer` is the easiest way to train and apply an imputation model with DataWig. It is straightforward to call from a Python script and uses default encoders and featurizers that usually yield good results on a variety of datasets.
- Objectives: 
    - Numerical imputation: Predict missing values in `MaxHR` column
    - Categorical imputation: Predict missing values in `ChestPain` column
- DataWig also enables hyperparameter optimization to find the best model on a particular dataset.

### (i) Numerical Imputation - Default HPO

In [8]:
# Define columns with useful info for to-be-imputed column
input_cols = ['Age', 'Sex', 'RestBP', 'Chol', 'Fbs', 'ExAng', 'RestECG']

# Define column to be imputed
output_col_num = 'MaxHR' 

In [9]:
# Initialize SimpleImputer model for numerical imputation
imputer_num = SimpleImputer(
            input_columns=input_cols,
            output_column=output_col_num,  # Column to be imputed
            output_path='../artifacts/imputer_model_num'  # Store model data and metrics
            )

In [10]:
# Fit the imputer with the default hyperparameter search (random search)
imputer_num.fit_hpo(train_df=df_train)

2022-08-22 17:21:55,919 [INFO]  
2022-08-22 17:21:55,948 [INFO]  Epoch[0] Batch [0-7]	Speed: 10183.00 samples/sec	cross-entropy=16.275757	MaxHR-accuracy=0.000000
2022-08-22 17:21:55,958 [INFO]  Epoch[0] Train-cross-entropy=16.985390
2022-08-22 17:21:55,959 [INFO]  Epoch[0] Train-MaxHR-accuracy=0.000000
2022-08-22 17:21:55,960 [INFO]  Epoch[0] Time cost=0.027
2022-08-22 17:21:55,997 [INFO]  Saved checkpoint to "../artifacts/imputer_model_num0\model-0000.params"
2022-08-22 17:21:56,001 [INFO]  Epoch[0] Validation-cross-entropy=10.823458
2022-08-22 17:21:56,002 [INFO]  Epoch[0] Validation-MaxHR-accuracy=0.000000
2022-08-22 17:21:56,015 [INFO]  Epoch[1] Batch [0-7]	Speed: 11200.81 samples/sec	cross-entropy=13.016701	MaxHR-accuracy=0.000000
2022-08-22 17:21:56,027 [INFO]  Epoch[1] Train-cross-entropy=13.192816
2022-08-22 17:21:56,028 [INFO]  Epoch[1] Train-MaxHR-accuracy=0.000000
2022-08-22 17:21:56,029 [INFO]  Epoch[1] Time cost=0.026
2022-08-22 17:21:56,058 [INFO]  Saved checkpoint to "..

<datawig.simple_imputer.SimpleImputer at 0x2adffc90fc8>

In [11]:
# Impute missing values and return original dataframe with predictions
predictions_num = imputer_num.predict(df_test_missing)
predictions_num.head()

Unnamed: 0,Age,Sex,ChestPain,RestBP,Chol,Fbs,RestECG,MaxHR,ExAng,Oldpeak,Slope,Ca,Thal,AHD,MaxHR_imputed
85,44.0,,nonanginal,140.0,235.0,0.0,2.0,,0.0,0.0,1.0,,,No,164.152615
134,43.0,0.0,nonanginal,122.0,213.0,0.0,0.0,165.0,0.0,0.2,2.0,,normal,No,167.516667
143,64.0,1.0,nonanginal,125.0,309.0,,,,1.0,1.8,2.0,0.0,reversable,Yes,128.50675
253,51.0,,nonanginal,120.0,295.0,0.0,2.0,157.0,,,,,normal,,152.829997
205,45.0,1.0,asymptomatic,142.0,309.0,0.0,,147.0,1.0,0.0,2.0,,reversable,,144.024412


In [12]:
# Evaluate performance (compare actual and predicted) with MSE
cols_num = pd.concat([df_test[[output_col_num]], 
                      predictions_num[[f'{output_col_num}_imputed']]], axis=1)
cols_num.head()

Unnamed: 0,MaxHR,MaxHR_imputed
85,180,164.152615
134,165,167.516667
143,131,128.50675
253,157,152.829997
205,147,144.024412


In [13]:
# Calculate MSE (test set)
mse_datawig = mean_squared_error(df_test[output_col_num],
                                 predictions_num[f'{output_col_num}_imputed'])
mse_datawig

349.8234049693482
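
An MSE of about 349.8 corresponds to a root mean squared error of roughly 18.7 beats per minute. Note that this score covers every test row, including rows whose `MaxHR` was never hidden. It can also be informative to score only the rows where the value was actually masked. The sketch below does this by hand for the five rows displayed above; the helper name `mse` and the hard-coded lists are illustrative assumptions, and in the notebook the same row selection could be obtained with `df_test_missing[output_col_num].isna()`.

```python
import math

# Five rows shown in the tables above (index 85, 134, 143, 253, 205)
actual = [180, 165, 131, 157, 147]              # MaxHR in df_test
observed = [None, 165.0, None, 157.0, 147.0]    # MaxHR in df_test_missing (None = hidden)
imputed = [164.152615, 167.516667, 128.50675, 152.829997, 144.024412]

def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Score all rows, as the notebook cell does
mse_all = mse(actual, imputed)

# Score only the rows whose MaxHR was actually hidden
hidden = [i for i, value in enumerate(observed) if value is None]
mse_hidden = mse([actual[i] for i in hidden], [imputed[i] for i in hidden])

# RMSE is expressed in the same unit as MaxHR
rmse_all = math.sqrt(mse_all)
print(mse_all, mse_hidden, rmse_all)
```

Five rows are far too few to draw conclusions from; the point is only that the two scores can differ noticeably.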

### (ii) Numerical Imputation - Custom HPO

In [14]:
# Initialize SimpleImputer model for numerical imputation
imputer_num = SimpleImputer(
            input_columns=input_cols,
            output_column=output_col_num,  # Column to be imputed
            output_path='../artifacts/imputer_model_num',  # Store model data and metrics
)

In [15]:
# Fit the imputer with custom hyperparameter candidates (random search)
imputer_num.fit_hpo(train_df=df_train,
                    learning_rate_candidates=[1e-2, 1e-3, 1e-4, 1e-5],
                    numeric_latent_dim_candidates=[10, 20, 50, 100],
                    numeric_hidden_layers_candidates=[0, 1, 2],
                    final_fc_hidden_units=[[100], [150]],
                   )

<datawig.simple_imputer.SimpleImputer at 0x2adffd4b388>

In [16]:
# Impute missing values and return original dataframe with predictions
predictions_num = imputer_num.predict(df_test_missing)

# Evaluate performance (compare actual and predicted) with MSE
cols_num = pd.concat([df_test[[output_col_num]], 
                      predictions_num[[f'{output_col_num}_imputed']]], axis=1)

# Calculate MSE (test set)
mse_datawig = mean_squared_error(df_test[output_col_num],
                                 predictions_num[f'{output_col_num}_imputed'])
mse_datawig

342.4216738580532
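
The candidate lists passed to `fit_hpo` above define a search space of 4 × 4 × 3 × 2 = 96 combinations. As a rough illustration of what a random search over such a space involves (a generic sketch, not DataWig's internal implementation; the evaluation budget and seed are arbitrary assumptions), one can enumerate the grid and sample a handful of configurations:

```python
import itertools
import random

# Candidate values copied from the fit_hpo call above
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4, 1e-5],
    "numeric_latent_dim": [10, 20, 50, 100],
    "numeric_hidden_layers": [0, 1, 2],
    "final_fc_hidden_units": [[100], [150]],
}

# Full Cartesian product of all candidate values
names = list(search_space)
grid = [dict(zip(names, values))
        for values in itertools.product(*search_space.values())]

# Random search: evaluate only a random subset of the grid
random.seed(0)   # arbitrary seed, for reproducibility of the sketch
num_evals = 10   # illustrative budget
sampled_configs = random.sample(grid, k=num_evals)

print(len(grid))
for config in sampled_configs[:3]:
    print(config)
```

With a small budget, only a fraction of the space is explored, so differences as small as the one between the two MSE values above may partly reflect which configurations happened to be sampled.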

___
### (iii) Categorical Imputation

In [17]:
# Define column to be imputed
output_col_cat = 'ChestPain' 

In [18]:
# Initialize SimpleImputer model for categorical imputation
imputer_cat = SimpleImputer(
                input_columns=input_cols,
                output_column=output_col_cat,  # Column to be imputed
                output_path='../artifacts/imputer_model_cat'  # Store model data and metrics
                )

In [19]:
# Fit the imputer with the default hyperparameter search (random search)
imputer_cat.fit_hpo(train_df=df_train)



<datawig.simple_imputer.SimpleImputer at 0x2adffd4d348>

In [20]:
# Impute missing values and return original dataframe with predictions
predictions_cat = imputer_cat.predict(df_test_missing)
predictions_cat.head()

Unnamed: 0,Age,Sex,ChestPain,RestBP,Chol,Fbs,RestECG,MaxHR,ExAng,Oldpeak,Slope,Ca,Thal,AHD,ChestPain_imputed,ChestPain_imputed_proba
85,44.0,,nonanginal,140.0,235.0,0.0,2.0,,0.0,0.0,1.0,,,No,nonanginal,0.403463
134,43.0,0.0,nonanginal,122.0,213.0,0.0,0.0,165.0,0.0,0.2,2.0,,normal,No,nonanginal,0.568032
143,64.0,1.0,nonanginal,125.0,309.0,,,,1.0,1.8,2.0,0.0,reversable,Yes,asymptomatic,0.763901
253,51.0,,nonanginal,120.0,295.0,0.0,2.0,157.0,,,,,normal,,asymptomatic,0.421679
205,45.0,1.0,asymptomatic,142.0,309.0,0.0,,147.0,1.0,0.0,2.0,,reversable,,asymptomatic,0.732073


In [21]:
# Compare actual and predicted labels side by side
cols_cat = pd.concat([df_test[[output_col_cat]], 
                      predictions_cat[[f'{output_col_cat}_imputed']]], axis=1)
cols_cat.head()

Unnamed: 0,ChestPain,ChestPain_imputed
85,nonanginal,nonanginal
134,nonanginal,nonanginal
143,nonanginal,asymptomatic
253,nonanginal,asymptomatic
205,asymptomatic,asymptomatic


In [22]:
# Calculate F1 score (test set)
f1_datawig = f1_score(df_test[output_col_cat],
                      predictions_cat[f'{output_col_cat}_imputed'],
                      average='macro')
f1_datawig

0.2865128020194735
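
With `average='macro'`, the F1 score is computed per class and then averaged without weighting by class frequency, so rare chest-pain types count as much as common ones. The following hand-rolled sketch (the `macro_f1` helper is hypothetical and only mirrors the standard definition) applies this to the five rows displayed above:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the denominator is 0
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# ChestPain vs. ChestPain_imputed for rows 85, 134, 143, 253, 205
y_true = ["nonanginal", "nonanginal", "nonanginal", "nonanginal", "asymptomatic"]
y_pred = ["nonanginal", "nonanginal", "asymptomatic", "asymptomatic", "asymptomatic"]

score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

A class that is never predicted correctly contributes an F1 of 0 to the average, which is one way a macro score can end up well below plain accuracy.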

In [23]:
# Calculate MCC - classification metric (test set)
mcc_datawig = matthews_corrcoef(df_test[output_col_cat],
                                predictions_cat[f'{output_col_cat}_imputed'])
mcc_datawig

0.22355364030295422

___
## (4) Imputer - Flexible Specifications
Imputer is the backbone of the SimpleImputer and is responsible for running the preprocessing code, creating the model, executing training, and making predictions. Using the Imputer enables more flexibility with specifying model parameters, such as using particular encoders and featurizers rather than the default ones that SimpleImputer uses.

In [24]:
input_cols

['Age', 'Sex', 'RestBP', 'Chol', 'Fbs', 'ExAng', 'RestECG']

In [25]:
data_encoder_cols = [NumericalEncoder('Age'),
                     CategoricalEncoder('Sex'),
                     NumericalEncoder('RestBP'),
                     NumericalEncoder('Chol'),
                     CategoricalEncoder('Fbs'),
                     CategoricalEncoder('ExAng'),
                     CategoricalEncoder('RestECG')]

# To-be-imputed column label
label_encoder_cols = [NumericalEncoder('MaxHR')]

data_featurizer_cols = [NumericalFeaturizer('Age'),
                        EmbeddingFeaturizer('Sex'),
                        NumericalFeaturizer('RestBP'),
                        NumericalFeaturizer('Chol'),
                        EmbeddingFeaturizer('Fbs'),
                        EmbeddingFeaturizer('ExAng'),
                        EmbeddingFeaturizer('RestECG')]

imputer = Imputer(
            data_featurizers = data_featurizer_cols,
            data_encoders = data_encoder_cols,
            label_encoders = label_encoder_cols,
            output_path = '../artifacts/imputer_model'
)

In [26]:
imputer.fit(train_df=df_train)

<datawig.imputer.Imputer at 0x2adfff6a208>

In [27]:
predictions, metrics = imputer.transform_and_compute_metrics(df_test_missing)

In [28]:
metrics

{'MaxHR': 22323.12896550879}

In [29]:
predictions

{'MaxHR': array([[161.67606726],
        [160.279462  ],
        [140.63259111],
        [149.94994551],
        [162.33983454],
        [147.51295859],
        [152.32582515],
        [158.42997912],
        [147.84389749],
        [145.20492449],
        [141.18047495],
        [155.36163496],
        [151.35035181],
        [147.00561099],
        [164.07373295],
        [151.97403232],
        [144.14041524],
        [146.14595012],
        [151.46443752],
        [150.86146928],
        [144.57072496],
        [140.86410713],
        [151.46350626],
        [147.01838804],
        [143.11294232],
        [148.50352057],
        [149.21930599],
        [146.66622991],
        [153.22922616],
        [144.15657572],
        [143.868475  ],
        [159.97056481],
        [159.80240895],
        [138.26811372],
        [143.46699868],
        [151.16638941],
        [143.85203239],
        [151.73689768],
        [149.81239587],
        [163.38838949],
        [155.11682712],
       

In [30]:
prob_dict_topk = imputer.predict_proba_top_k(df_test_missing, top_k=5)
prob_dict_topk

{'MaxHR': array([[161.67606726],
        [160.279462  ],
        [140.63259111],
        [149.94994551],
        [162.33983454],
        [147.51295859],
        [152.32582515],
        [158.42997912],
        [147.84389749],
        [145.20492449],
        [141.18047495],
        [155.36163496],
        [151.35035181],
        [147.00561099],
        [164.07373295],
        [151.97403232],
        [144.14041524],
        [146.14595012],
        [151.46443752],
        [150.86146928],
        [144.57072496],
        [140.86410713],
        [151.46350626],
        [147.01838804],
        [143.11294232],
        [148.50352057],
        [149.21930599],
        [146.66622991],
        [153.22922616],
        [144.15657572],
        [143.868475  ],
        [159.97056481],
        [159.80240895],
        [138.26811372],
        [143.46699868],
        [151.16638941],
        [143.85203239],
        [151.73689768],
        [149.81239587],
        [163.38838949],
        [155.11682712],
       

___
## (5) Label Shift Detection and Correction

In [51]:
# Set up SimpleImputer
imputer_cat = SimpleImputer(
                input_columns=input_cols,
                output_column=output_col_cat,  # Column to be imputed
                output_path='../artifacts/imputer_model_cat'  # Store model data and metrics
                )

imputer_cat.fit(train_df=df_train)

<datawig.simple_imputer.SimpleImputer at 0x2ad90891248>

In [52]:
# Detect shift (obtain weights for labels)
weights = imputer_cat.check_for_label_shift(df_test)


	The estimated label marginals are [('asymptomatic', 4.214792323529591), ('nonanginal', -0.6248284561591907), ('nontypical', -1.5197777450465078), ('typical', -1.0701861223238924)]
	Marginals in the training data are [('asymptomatic', 0.0328715205041643), ('nonanginal', 0.00866391721107109), ('nontypical', 0.8944515865631408), ('typical', 0.06401297572162376)]
	Reweighing factors for empirical risk minimization{'asymptomatic': 128.22018144842565, 'nonanginal': 0, 'nontypical': 0, 'typical': 0}
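
The printed reweighing factors are consistent with dividing each estimated label marginal by the corresponding training marginal and clipping negative results to zero. The sketch below reproduces that arithmetic from the numbers in the output; it is a reading of the printed values, not a copy of DataWig's internal code.

```python
# Values copied from the check_for_label_shift output above
estimated = {
    "asymptomatic": 4.214792323529591,
    "nonanginal": -0.6248284561591907,
    "nontypical": -1.5197777450465078,
    "typical": -1.0701861223238924,
}
training = {
    "asymptomatic": 0.0328715205041643,
    "nonanginal": 0.00866391721107109,
    "nontypical": 0.8944515865631408,
    "typical": 0.06401297572162376,
}

# Ratio of estimated to training marginal, with negative ratios clipped to 0
weights = {label: max(0.0, estimated[label] / training[label])
           for label in estimated}

for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")
```

Negative estimated marginals are not valid probabilities, which suggests the shift estimate may be unreliable on a test set this small. With three of the four weights at zero, refitting with these factors effectively trains on a single class, so the resulting model should be checked carefully before use.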


In [53]:
# Fix shift with reweighing factors
imputer_cat.fit(train_df=df_train, class_weights=weights)

  allow_missing=allow_missing, force_init=force_init)
INFO:root:Epoch[0] Batch [0-7]	Speed: 27995.35 samples/sec	cross-entropy=18.939670	ChestPain-accuracy=0.390625
INFO:root:Epoch[0] Train-cross-entropy=11.272403
INFO:root:Epoch[0] Train-ChestPain-accuracy=0.486607
INFO:root:Epoch[0] Time cost=0.013
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0000.params"
INFO:root:Epoch[0] Validation-cross-entropy=0.182258
INFO:root:Epoch[0] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[1] Batch [0-7]	Speed: 28008.71 samples/sec	cross-entropy=0.218769	ChestPain-accuracy=0.406250
INFO:root:Epoch[1] Train-cross-entropy=0.150407
INFO:root:Epoch[1] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[1] Time cost=0.012
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0001.params"
INFO:root:Epoch[1] Validation-cross-entropy=0.021090
INFO:root:Epoch[1] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[2] Batch [0-7]	Speed: 18664.31 samples/sec	cross-ent

INFO:root:Epoch[17] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[18] Batch [0-7]	Speed: 22403.76 samples/sec	cross-entropy=0.008683	ChestPain-accuracy=0.406250
INFO:root:Epoch[18] Train-cross-entropy=0.008766
INFO:root:Epoch[18] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[18] Time cost=0.012
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0018.params"
INFO:root:Epoch[18] Validation-cross-entropy=0.004190
INFO:root:Epoch[18] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[19] Batch [0-7]	Speed: 18665.79 samples/sec	cross-entropy=0.008347	ChestPain-accuracy=0.406250
INFO:root:Epoch[19] Train-cross-entropy=0.008413
INFO:root:Epoch[19] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[19] Time cost=0.014
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0019.params"
INFO:root:Epoch[19] Validation-cross-entropy=0.004009
INFO:root:Epoch[19] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[20] Batch [0-7]	Speed: 22403.76 sampl

INFO:root:Epoch[36] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[36] Time cost=0.013
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0036.params"
INFO:root:Epoch[36] Validation-cross-entropy=0.002110
INFO:root:Epoch[36] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[37] Batch [0-7]	Speed: 18668.76 samples/sec	cross-entropy=0.004466	ChestPain-accuracy=0.406250
INFO:root:Epoch[37] Train-cross-entropy=0.004407
INFO:root:Epoch[37] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[37] Time cost=0.013
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0037.params"
INFO:root:Epoch[37] Validation-cross-entropy=0.002041
INFO:root:Epoch[37] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[38] Batch [0-7]	Speed: 18665.05 samples/sec	cross-entropy=0.004330	ChestPain-accuracy=0.406250
INFO:root:Epoch[38] Train-cross-entropy=0.004270
INFO:root:Epoch[38] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[38] Time cost=0.014
INFO:root:Saved checkp

INFO:root:Epoch[54] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[55] Batch [0-7]	Speed: 28008.71 samples/sec	cross-entropy=0.002704	ChestPain-accuracy=0.406250
INFO:root:Epoch[55] Train-cross-entropy=0.002642
INFO:root:Epoch[55] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[55] Time cost=0.009
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0055.params"
INFO:root:Epoch[55] Validation-cross-entropy=0.001205
INFO:root:Epoch[55] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[56] Batch [0-7]	Speed: 22404.83 samples/sec	cross-entropy=0.002638	ChestPain-accuracy=0.406250
INFO:root:Epoch[56] Train-cross-entropy=0.002577
INFO:root:Epoch[56] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[56] Time cost=0.012
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0056.params"
INFO:root:Epoch[56] Validation-cross-entropy=0.001175
INFO:root:Epoch[56] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[57] Batch [0-7]	Speed: 18666.54 sampl

INFO:root:Epoch[73] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[73] Time cost=0.011
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0073.params"
INFO:root:Epoch[73] Validation-cross-entropy=0.000785
INFO:root:Epoch[73] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[74] Batch [0-7]	Speed: 18676.18 samples/sec	cross-entropy=0.001747	ChestPain-accuracy=0.406250
INFO:root:Epoch[74] Train-cross-entropy=0.001699
INFO:root:Epoch[74] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[74] Time cost=0.013
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0074.params"
INFO:root:Epoch[74] Validation-cross-entropy=0.000768
INFO:root:Epoch[74] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[75] Batch [0-7]	Speed: 22400.56 samples/sec	cross-entropy=0.001712	ChestPain-accuracy=0.406250
INFO:root:Epoch[75] Train-cross-entropy=0.001664
INFO:root:Epoch[75] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[75] Time cost=0.012
INFO:root:Saved checkp

INFO:root:Epoch[91] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[92] Batch [0-7]	Speed: 18666.54 samples/sec	cross-entropy=0.001227	ChestPain-accuracy=0.406250
INFO:root:Epoch[92] Train-cross-entropy=0.001189
INFO:root:Epoch[92] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[92] Time cost=0.014
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0092.params"
INFO:root:Epoch[92] Validation-cross-entropy=0.000536
INFO:root:Epoch[92] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[93] Batch [0-7]	Speed: 22401.62 samples/sec	cross-entropy=0.001204	ChestPain-accuracy=0.406250
INFO:root:Epoch[93] Train-cross-entropy=0.001166
INFO:root:Epoch[93] Train-ChestPain-accuracy=0.495536
INFO:root:Epoch[93] Time cost=0.013
INFO:root:Saved checkpoint to "../artifacts/imputer_model_cat\model-0093.params"
INFO:root:Epoch[93] Validation-cross-entropy=0.000525
INFO:root:Epoch[93] Validation-ChestPain-accuracy=0.343750
INFO:root:Epoch[94] Batch [0-7]	Speed: 18667.28 sampl

<datawig.simple_imputer.SimpleImputer at 0x2ad90891248>