# Deep Learning Project (continued)

## Predicting County Suitability for Various CO2 Removal Technologies in Virginia

**Motivation:**

Carbon removal is a broad set of approaches, some natural and some engineered, used to remove CO2 from the atmosphere. The approaches range from simple ones, such as reforestation, to complex ones, such as growing bioenergy crops to produce bioenergy with carbon capture. We will develop various neural networks to predict how suitable the land in each Virginia county is for different CO2 removal techniques. This will help researchers working to reduce net carbon emissions by identifying available land that is suitable for reforestation, enhanced weathering (EW), and biochar.


**Technical plan:**

We will feed predictor variables specific to each greenhouse gas removal (GGR) technique into unsupervised clustering to assign a ‘suitability level’ to each county in VA. The notebook below uses KMeans with the elbow method to choose the number of clusters and Ward hierarchical clustering to assign the labels. Next, these labeled counties will be used to train a feed-forward neural network, a simple ANN, and a deep & cross network (DCN). The networks will output the ‘suitability level’ for implementing each CO2 removal technique: reforestation, EW, and biochar. We will build our own models and also use prebuilt ones such as scikit-learn's MLPClassifier.
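Cluster IDs returned by KMeans or hierarchical clustering are arbitrary: cluster 3 is not necessarily "more suitable" than cluster 0. One way to turn them into ordered suitability levels is to rank the clusters by the mean of a suitability-related variable. The sketch below assumes a single score per county (for example scaled forest cover for reforestation); which variable or weighted combination should define suitability for each technique is an assumption not settled in this notebook, and `rank_clusters` is a hypothetical helper.

```python
from statistics import mean


def rank_clusters(labels, scores):
    """Map arbitrary cluster IDs to ordered suitability levels.

    labels: cluster ID per county (e.g. the output of fit_predict)
    scores: one suitability-related value per county (hypothetical choice)
    Returns one level per county: 0 = cluster with the lowest mean score,
    k-1 = cluster with the highest mean score.
    """
    by_cluster = {}
    for label, score in zip(labels, scores):
        by_cluster.setdefault(label, []).append(score)
    # Order cluster IDs by their mean score, lowest first
    ordered = sorted(by_cluster, key=lambda c: mean(by_cluster[c]))
    level_of = {cluster: level for level, cluster in enumerate(ordered)}
    return [level_of[label] for label in labels]


# Hypothetical labels and scores, for illustration only
labels = [0, 2, 0, 1, 2, 1]
forest = [0.9, 0.1, 0.8, 0.5, 0.2, 0.4]
print(rank_clusters(labels, forest))  # [2, 0, 2, 1, 0, 1]
```

Here cluster 0 has the highest mean score (0.85), so it becomes level 2. The per-cluster mean tables computed later (`df_rfcluster`, `df_ewcluster`, `df_biocluster`) hold the information needed to choose such an ordering.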


# Code for Hierarchical Clustering to obtain labels (suitability category)
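The clustering below uses Ward linkage. At each step, Ward merges the pair of clusters whose merge increases the total within-cluster sum of squares the least. For clusters $A$ and $B$ with sizes $n_A, n_B$ and centroids $\mu_A, \mu_B$, that increase is:

$$
\Delta(A,B) = \sum_{x \in A \cup B} \lVert x - \mu_{A \cup B} \rVert^2 - \sum_{x \in A} \lVert x - \mu_A \rVert^2 - \sum_{x \in B} \lVert x - \mu_B \rVert^2 = \frac{n_A n_B}{n_A + n_B} \lVert \mu_A - \mu_B \rVert^2
$$

For example, two single counties whose scaled feature vectors are $0.4$ apart have $\Delta = \tfrac{1 \cdot 1}{2}(0.4)^2 = 0.08$. Because the criterion is defined with squared Euclidean distances, scikit-learn's `AgglomerativeClustering` only accepts the Euclidean metric when `linkage='ward'`.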

In [1]:
import random

import numpy as np
import pandas as pd
import tensorflow as tf
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.model_selection import train_test_split
# To get model performance
from sklearn.metrics import accuracy_score, classification_report

from deeptables.models import deeptable
from tensorflow.keras.optimizers import RMSprop

# Set seeds for the whole notebook: Python, NumPy, PyTorch and
# TensorFlow each keep their own random state
SEED = 10
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
tf.random.set_seed(SEED)


In [2]:
# Load the scaled county-level dataset, indexed by FIPS code
df = pd.read_csv('../final-dataset/final-scaled-df.csv', index_col='FIPS')

BATCH_SIZE = 64

# Hold out 6 counties of interest (Accomack, Fauquier, Greensville,
# Hanover, Rockingham, Wise); they get predictions but no labels
HOLDOUT_FIPS = [51001, 51061, 51081, 51085, 51165, 51195]
dataset_val = df.loc[HOLDOUT_FIPS]

# All remaining counties are clustered, then split into train/test
dataset_train = df.drop(HOLDOUT_FIPS)

# All candidate predictors, then the subset used for each technique
cols = ['Income', 'DSCI', 'PQ1', 'PQ2', 'PQ3', 'PQ4', 'TQ1', 'TQ2', 'TQ3', 'TQ4', 'Forest',
        'Agriculture', 'Biomass', 'Power', 'Water Availability', 'Watershed',
        'Log_Bio', 'Log_Water', 'Log_Power']
cols_rf = ['PQ1', 'PQ2', 'PQ3', 'PQ4', 'TQ1', 'TQ2', 'TQ3', 'TQ4', 'Forest', 'Water Availability', 'Log_Water']
cols_ew = ['PQ1', 'PQ2', 'PQ3', 'PQ4', 'TQ1', 'TQ2', 'TQ3', 'TQ4', 'Income', 'Power', 'Log_Power', 'Agriculture']
cols_bio = ['PQ1', 'PQ2', 'PQ3', 'PQ4', 'TQ1', 'TQ2', 'TQ3', 'TQ4', 'Income', 'Biomass', 'Log_Bio', 'Agriculture']

In [3]:
# Elbow method: record the within-cluster sum of squares (inertia) for k = 1..9
k_rng = range(1, 10)
sse = []
for k in k_rng:
    km = KMeans(n_clusters=k, n_init=10, random_state=10)
    km.fit(df[['PQ1', 'PQ2', 'PQ3', 'PQ4', 'TQ1', 'TQ2', 'TQ3', 'TQ4', 'Forest', 'Log_Water']])
    sse.append(km.inertia_)

# Create 4 clusters (k chosen from the elbow curve) with Ward linkage.
# Ward always uses Euclidean distance, so no affinity/metric argument is
# needed (`affinity` is deprecated in newer scikit-learn releases).
# Cluster IDs are arbitrary and are not yet ordered by suitability.
hc = AgglomerativeClustering(n_clusters=4, linkage='ward')

# REFORESTATION: cluster labels and mean characteristics per cluster
y_hc = hc.fit_predict(dataset_train[cols_rf])
dataset_train['Reforest'] = y_hc
dataset_train.sort_values('Reforest', inplace=True, ascending=True)
df_rfcluster = dataset_train.groupby('Reforest').mean()

# ENHANCED WEATHERING: cluster labels and mean characteristics per cluster
y_hc = hc.fit_predict(dataset_train[cols_ew])
dataset_train['EW'] = y_hc
dataset_train.sort_values('EW', inplace=True, ascending=True)
df_ewcluster = dataset_train.groupby('EW').mean()

# BIOCHAR: cluster labels and mean characteristics per cluster
y_hc = hc.fit_predict(dataset_train[cols_bio])
dataset_train['Biochar'] = y_hc
dataset_train.sort_values('Biochar', inplace=True, ascending=True)
df_biocluster = dataset_train.groupby('Biochar').mean()
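The elbow loop above stores the SSE values in `sse` but leaves the choice of k to visual inspection. A simple numeric stand-in, assumed here and not the method used in this notebook, is to pick the k where the SSE curve bends most sharply, i.e. where the second difference is largest. The SSE values below are made up for illustration and are not from this dataset; `elbow_k` is a hypothetical helper.

```python
def elbow_k(ks, sse):
    """Pick k at the 'elbow' of an SSE curve.

    Heuristic (an assumption): choose the k whose second difference
    sse[i-1] - 2*sse[i] + sse[i+1] is largest, i.e. where the drop in
    SSE slows down most abruptly. Needs at least three k values.
    """
    if len(sse) < 3:
        raise ValueError("need at least three k values")
    best_i = max(
        range(1, len(sse) - 1),
        key=lambda i: sse[i - 1] - 2 * sse[i] + sse[i + 1],
    )
    return ks[best_i]


# Made-up SSE curve for k = 1..9, for illustration only
ks = list(range(1, 10))
sse_demo = [1000, 700, 450, 250, 220, 200, 185, 175, 168]
print(elbow_k(ks, sse_demo))  # 4
```

In the notebook this would be called as `elbow_k(list(k_rng), sse)`. Second-difference rules are sensitive to noise and to the range of k tried, so it is worth plotting the curve as well.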

In [4]:
# Features: all candidate predictors; targets: the three cluster labels
X = dataset_train[cols]
y = dataset_train[['Reforest', 'EW', 'Biochar']]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=1)

X_train_rf = X_train[cols_rf]
X_train_ew = X_train[cols_ew]
X_train_bio = X_train[cols_bio]

X_test_rf = X_test[cols_rf]
X_test_ew = X_test[cols_ew]
X_test_bio = X_test[cols_bio]

y_train_rf = y_train[['Reforest']]
y_train_ew = y_train[['EW']]
y_train_bio = y_train[['Biochar']]

y_test_rf = y_test[['Reforest']]
y_test_ew = y_test[['EW']]
y_test_bio = y_test[['Biochar']]

dataset_val_rf = dataset_val[cols_rf]
dataset_val_ew = dataset_val[cols_ew]
dataset_val_bio = dataset_val[cols_bio]

In [5]:
# Make data into tensor objects
X_train_rf_tensor = torch.tensor(X_train_rf.values, dtype=torch.float32)
X_train_ew_tensor = torch.tensor(X_train_ew.values, dtype=torch.float32)
X_train_bio_tensor = torch.tensor(X_train_bio.values, dtype=torch.float32)

X_test_rf_tensor = torch.tensor(X_test_rf.values, dtype=torch.float32)
X_test_ew_tensor = torch.tensor(X_test_ew.values, dtype=torch.float32)
X_test_bio_tensor = torch.tensor(X_test_bio.values, dtype=torch.float32)

# Class labels as integers: class-index targets for nn.CrossEntropyLoss must be long
y_train_rf_tensor = torch.tensor(y_train_rf['Reforest'].values, dtype=torch.long)
y_train_ew_tensor = torch.tensor(y_train_ew['EW'].values, dtype=torch.long)
y_train_bio_tensor = torch.tensor(y_train_bio['Biochar'].values, dtype=torch.long)

y_test_rf_tensor = torch.tensor(y_test_rf['Reforest'].values, dtype=torch.long)
y_test_ew_tensor = torch.tensor(y_test_ew['EW'].values, dtype=torch.long)
y_test_bio_tensor = torch.tensor(y_test_bio['Biochar'].values, dtype=torch.long)

# Hold-out counties: features only, no labels
X_val_rf_tensor = torch.tensor(dataset_val_rf.values, dtype=torch.float32)
X_val_ew_tensor = torch.tensor(dataset_val_ew.values, dtype=torch.float32)
X_val_bio_tensor = torch.tensor(dataset_val_bio.values, dtype=torch.float32)

#train data
training_data_rf = TensorDataset(X_train_rf_tensor, y_train_rf_tensor)
train_dataloader_rf = DataLoader(training_data_rf, batch_size=64)

training_data_ew = TensorDataset(X_train_ew_tensor, y_train_ew_tensor)
train_dataloader_ew = DataLoader(training_data_ew, batch_size=64)

training_data_bio = TensorDataset(X_train_bio_tensor, y_train_bio_tensor)
train_dataloader_bio = DataLoader(training_data_bio, batch_size=64)


#test data
test_data_rf = TensorDataset(X_test_rf_tensor,  y_test_rf_tensor)
test_dataloader_rf = DataLoader(test_data_rf, batch_size=64)

test_data_ew = TensorDataset(X_test_ew_tensor,  y_test_ew_tensor)
test_dataloader_ew = DataLoader(test_data_ew, batch_size=64)

test_data_bio = TensorDataset(X_test_bio_tensor,  y_test_bio_tensor)
test_dataloader_bio = DataLoader(test_data_bio, batch_size=64)


#validate data
val_data_rf = TensorDataset(X_val_rf_tensor)
val_dataloader_rf = DataLoader(val_data_rf, batch_size=64)

val_data_ew = TensorDataset(X_val_ew_tensor)
val_dataloader_ew = DataLoader(val_data_ew, batch_size=64)

val_data_bio = TensorDataset(X_val_bio_tensor)
val_dataloader_bio = DataLoader(val_data_bio, batch_size=64)

# Supervised Neural Net for suitability prediction
## DeepTables

In [72]:
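# Constants
LEARNING_RATE = 0.0001
# Two hidden layers of 128 and 64 units, each with no dropout and no batch normalization
DNN_PARAMS = {'hidden_units': ((128, 0, False), (64, 0, False)), 'dnn_activation': 'relu'}
EPOCHS = 20
ESP = 5  # early-stopping patience, in epochs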
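`ESP` is passed as `earlystopping_patience`. According to the training logs below, DeepTables injects a Keras `EarlyStopping` callback that monitors `val_accuracy` with `patience:5` and `mode:max`. The sketch below mimics that rule in plain Python on a list of per-epoch validation scores; it ignores Keras options such as `min_delta` and `restore_best_weights`, and both the function name and the scores are made up for illustration.

```python
def early_stopping_epoch(val_scores, patience):
    """Return the 1-based epoch at which training stops, or None.

    Patience-based early stopping on a metric to maximize (e.g.
    val_accuracy): stop once `patience` consecutive epochs pass without
    strictly beating the best score seen so far.
    """
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation accuracies: best value reached at epoch 3
scores = [0.30, 0.35, 0.41, 0.41, 0.35, 0.41, 0.38, 0.40]
print(early_stopping_epoch(scores, patience=5))  # 8
```

Under this rule, a run that reports "Epoch 8: early stopping" with patience 5 last improved its validation accuracy at epoch 3, which suggests the biochar model below stopped learning very early.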

In [79]:
conf = deeptable.ModelConfig(
    dnn_params=DNN_PARAMS,
    nets=['dcn_nets'],
    optimizer=RMSprop(learning_rate=LEARNING_RATE),
    earlystopping_patience=ESP)
dt_rf = deeptable.DeepTable(config=conf)
dt_rf.fit(X_train_rf_tensor, y_train_rf['Reforest'], epochs=EPOCHS)
score_rf = dt_rf.evaluate(X_test_rf_tensor, y_test_rf['Reforest'])
preds_rf = dt_rf.predict(X_test_rf_tensor)
score_rf

04-17 16:24:00 I deeptables.m.deeptable.py 337 - X.Shape=torch.Size([67, 11]), y.Shape=(67,), batch_size=128, config=ModelConfig(name='conf-1', nets=['dcn_nets'], categorical_columns='auto', exclude_columns=[], task='auto', pos_label=None, metrics=['accuracy'], auto_categorize=False, cat_exponent=0.5, cat_remain_numeric=True, auto_encode_label=True, auto_imputation=True, auto_discrete=False, auto_discard_unique=True, apply_gbm_features=False, gbm_params={}, gbm_feature_type='embedding', fixed_embedding_dim=True, embeddings_output_dim=4, embeddings_initializer='uniform', embeddings_regularizer=None, embeddings_activity_regularizer=None, dense_dropout=0, embedding_dropout=0.3, stacking_op='add', output_use_bias=True, apply_class_weight=False, optimizer=<keras.optimizers.optimizer_experimental.rmsprop.RMSprop object at 0x7f4917b6f450>, loss='auto', dnn_params={'hidden_units': ((128, 0, False), (64, 0, False)), 'dnn_activation': 'relu'}, autoint_params={'num_attention': 3, 'num_heads': 1, 


04-17 16:24:00 I hypernets.t.toolbox.py 346 - 4 class detected, inferred as a [multiclass classification] task


04-17 16:24:00 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10'],
      dtype='object')


04-17 16:24:00 I deeptables.m.preprocessor.py 261 - Preparing features...
04-17 16:24:00 I deeptables.m.preprocessor.py 336 - Preparing features taken 0.007425069808959961s
04-17 16:24:00 I deeptables.m.preprocessor.py 341 - Data imputation...
04-17 16:24:00 I deeptables.m.preprocessor.py 383 - Imputation taken 0.013078689575195312s
04-17 16:24:00 I deeptables.m.preprocessor.py 388 - Categorical encoding...
04-17 16:24:00 I deeptables.m.preprocessor.py 393 - Categorical encoding taken 0.11405777931213379s
04-17 16:24:00 I deeptables.m.preprocessor.py 196 - fit_transform taken 0.15387868881225586s
04-17 16:24:00 I deeptables.m.deeptable.py 353 - Training...
04-17 16:24:00 I deeptables.m.deeptable.py 752 - Injected a callback [EarlyStopping]. monitor:val_accuracy, patience:5, mode:max
04-17 16:24:00 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=128, shuffle=True, drop_remainder=True
04-17 16:24:00 I deeptables.u.dataset_generator.py 25



04-17 16:24:00 I deeptables.m.deepmodel.py 287 - >>>>>>>>>>>>>>>>>>>>>> Model Desc <<<<<<<<<<<<<<<<<<<<<<< 
---------------------------------------------------------
inputs:
---------------------------------------------------------
['all_categorical_vars: (11)']
---------------------------------------------------------
embeddings:
---------------------------------------------------------
input_dims: [69, 69, 69, 68, 69, 69, 69, 69, 69, 69, 69]
output_dims: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
dropout: 0.3
---------------------------------------------------------
dense: dropout: 0
batch_normalization: False
---------------------------------------------------------
concat_embed_dense: shape: (None, 44)
---------------------------------------------------------
nets: ['dcn_nets']
---------------------------------------------------------
dcn-widecross: input_shape (None, 44), output_shape (None, 44)
dcn-dnn2: input_shape (None, 44), output_shape (None, 64)
dcn: input_shape (None, 44), output_s

04-17 16:24:03 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10'],
      dtype='object')


04-17 16:24:03 I deeptables.m.preprocessor.py 249 - transform_X taken 0.038516998291015625s
04-17 16:24:03 I deeptables.m.preprocessor.py 230 - Transform [y]...
04-17 16:24:03 I deeptables.m.preprocessor.py 236 - transform_y taken 0.0002837181091308594s
04-17 16:24:03 I deeptables.m.deepmodel.py 158 - Performing evaluation...
04-17 16:24:03 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=256, shuffle=False, drop_remainder=False
04-17 16:24:03 I deeptables.m.deeptable.py 685 - Perform prediction...
04-17 16:24:03 I deeptables.m.preprocessor.py 242 - Transform [X]...


04-17 16:24:03 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10'],
      dtype='object')


04-17 16:24:03 I deeptables.m.preprocessor.py 249 - transform_X taken 0.03861689567565918s
04-17 16:24:03 I deeptables.m.deepmodel.py 130 - Performing predictions...
04-17 16:24:03 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=128, shuffle=False, drop_remainder=False
04-17 16:24:03 I deeptables.m.deeptable.py 559 - predict_proba taken 0.30714917182922363s
04-17 16:24:03 I deeptables.m.deeptable.py 594 - Reverse indicators to labels.


{'loss': 1.3612148761749268, 'accuracy': 0.4117647111415863}
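In the model description logged above, DeepTables lists `all_categorical_vars: (11)` and builds an embedding for every input with `input_dims` of 68 to 69, close to the 67 training rows. This suggests each scaled numeric feature was treated as a categorical variable with roughly one category per distinct value, possibly because the features were passed as a torch tensor rather than a float pandas DataFrame (an unverified guess). Values in the test counties that never appeared in training would then have no learned embedding, which may help explain the mostly constant predictions printed further down. A quick check for this situation is to count distinct values per column; the sketch below does so in plain Python, and both the helper name and the 0.5 threshold are hypothetical.

```python
def high_cardinality_columns(rows, threshold=0.5):
    """Flag columns whose distinct-value count is a large share of the rows.

    rows: list of equal-length feature rows (e.g. X_train_rf.values.tolist())
    threshold: hypothetical cut-off; a column with more than
               threshold * n_rows distinct values is flagged.
    Returns (column_index, n_distinct) pairs for the flagged columns.
    """
    n_rows = len(rows)
    flagged = []
    for j, column in enumerate(zip(*rows)):
        n_distinct = len(set(column))
        if n_distinct > threshold * n_rows:
            flagged.append((j, n_distinct))
    return flagged


# Toy data: column 0 is continuous, column 1 is a binary flag
rows = [
    [0.12, 1.0],
    [0.57, 0.0],
    [0.33, 1.0],
    [0.98, 0.0],
]
print(high_cardinality_columns(rows))  # [(0, 4)]
```

Columns flagged this way should almost certainly be modelled as continuous. Passing `X_train_rf` (a float DataFrame) to `fit` instead of `X_train_rf_tensor` is one plausible fix, but how DeepTables infers column types for each input type has not been verified here.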

In [77]:
conf = deeptable.ModelConfig(
    dnn_params=DNN_PARAMS,
    nets=['dcn_nets'],
    optimizer=RMSprop(learning_rate=LEARNING_RATE),
    earlystopping_patience=ESP)
dt_ew = deeptable.DeepTable(config=conf)
dt_ew.fit(X_train_ew_tensor, y_train_ew['EW'], epochs=EPOCHS)
score_ew = dt_ew.evaluate(X_test_ew_tensor, y_test_ew['EW'])
preds_ew = dt_ew.predict(X_test_ew_tensor)
score_ew

04-17 16:23:47 I deeptables.m.deeptable.py 337 - X.Shape=torch.Size([67, 12]), y.Shape=(67,), batch_size=128, config=ModelConfig(name='conf-1', nets=['dcn_nets'], categorical_columns='auto', exclude_columns=[], task='auto', pos_label=None, metrics=['accuracy'], auto_categorize=False, cat_exponent=0.5, cat_remain_numeric=True, auto_encode_label=True, auto_imputation=True, auto_discrete=False, auto_discard_unique=True, apply_gbm_features=False, gbm_params={}, gbm_feature_type='embedding', fixed_embedding_dim=True, embeddings_output_dim=4, embeddings_initializer='uniform', embeddings_regularizer=None, embeddings_activity_regularizer=None, dense_dropout=0, embedding_dropout=0.3, stacking_op='add', output_use_bias=True, apply_class_weight=False, optimizer=<keras.optimizers.optimizer_experimental.rmsprop.RMSprop object at 0x7f4932cdc5d0>, loss='auto', dnn_params={'hidden_units': ((128, 0, False), (64, 0, False)), 'dnn_activation': 'relu'}, autoint_params={'num_attention': 3, 'num_heads': 1, 


04-17 16:23:47 I hypernets.t.toolbox.py 346 - 4 class detected, inferred as a [multiclass classification] task


04-17 16:23:47 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10', 'x_11'],
      dtype='object')


04-17 16:23:47 I deeptables.m.preprocessor.py 261 - Preparing features...
04-17 16:23:47 I deeptables.m.preprocessor.py 336 - Preparing features taken 0.008162260055541992s
04-17 16:23:47 I deeptables.m.preprocessor.py 341 - Data imputation...
04-17 16:23:47 I deeptables.m.preprocessor.py 383 - Imputation taken 0.01774454116821289s
04-17 16:23:47 I deeptables.m.preprocessor.py 388 - Categorical encoding...
04-17 16:23:47 I deeptables.m.preprocessor.py 393 - Categorical encoding taken 0.11747145652770996s
04-17 16:23:47 I deeptables.m.preprocessor.py 196 - fit_transform taken 0.15785646438598633s
04-17 16:23:47 I deeptables.m.deeptable.py 353 - Training...
04-17 16:23:47 I deeptables.m.deeptable.py 752 - Injected a callback [EarlyStopping]. monitor:val_accuracy, patience:5, mode:max
04-17 16:23:47 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=128, shuffle=True, drop_remainder=True
04-17 16:23:47 I deeptables.u.dataset_generator.py 250



04-17 16:23:47 I deeptables.m.deepmodel.py 287 - >>>>>>>>>>>>>>>>>>>>>> Model Desc <<<<<<<<<<<<<<<<<<<<<<< 
---------------------------------------------------------
inputs:
---------------------------------------------------------
['all_categorical_vars: (12)']
---------------------------------------------------------
embeddings:
---------------------------------------------------------
input_dims: [69, 69, 69, 69, 69, 69, 69, 69, 69, 40, 52, 69]
output_dims: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
dropout: 0.3
---------------------------------------------------------
dense: dropout: 0
batch_normalization: False
---------------------------------------------------------
concat_embed_dense: shape: (None, 48)
---------------------------------------------------------
nets: ['dcn_nets']
---------------------------------------------------------
dcn-widecross: input_shape (None, 48), output_shape (None, 48)
dcn-dnn2: input_shape (None, 48), output_shape (None, 64)
dcn: input_shape (None, 48), o

04-17 16:23:50 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10', 'x_11'],
      dtype='object')


04-17 16:23:50 I deeptables.m.preprocessor.py 249 - transform_X taken 0.03966379165649414s
04-17 16:23:50 I deeptables.m.preprocessor.py 230 - Transform [y]...
04-17 16:23:50 I deeptables.m.preprocessor.py 236 - transform_y taken 0.0002162456512451172s
04-17 16:23:50 I deeptables.m.deepmodel.py 158 - Performing evaluation...
04-17 16:23:50 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=256, shuffle=False, drop_remainder=False
04-17 16:23:50 I deeptables.m.deeptable.py 685 - Perform prediction...
04-17 16:23:50 I deeptables.m.preprocessor.py 242 - Transform [X]...


04-17 16:23:50 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10', 'x_11'],
      dtype='object')


04-17 16:23:50 I deeptables.m.preprocessor.py 249 - transform_X taken 0.04038500785827637s
04-17 16:23:50 I deeptables.m.deepmodel.py 130 - Performing predictions...
04-17 16:23:50 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=128, shuffle=False, drop_remainder=False
04-17 16:23:51 I deeptables.m.deeptable.py 559 - predict_proba taken 0.2461109161376953s
04-17 16:23:51 I deeptables.m.deeptable.py 594 - Reverse indicators to labels.


{'loss': 1.3759139776229858, 'accuracy': 0.47058823704719543}

In [75]:
conf = deeptable.ModelConfig(
    dnn_params=DNN_PARAMS,
    nets=['dcn_nets'],
    optimizer=RMSprop(learning_rate=LEARNING_RATE),
    earlystopping_patience=ESP)
dt_bio = deeptable.DeepTable(config=conf)
dt_bio.fit(X_train_bio_tensor, y_train_bio['Biochar'], epochs=EPOCHS)
score_bio = dt_bio.evaluate(X_test_bio_tensor, y_test_bio['Biochar'])
preds_bio = dt_bio.predict(X_test_bio_tensor)
score_bio

04-17 16:23:14 I deeptables.m.deeptable.py 337 - X.Shape=torch.Size([67, 12]), y.Shape=(67,), batch_size=128, config=ModelConfig(name='conf-1', nets=['dcn_nets'], categorical_columns='auto', exclude_columns=[], task='auto', pos_label=None, metrics=['accuracy'], auto_categorize=False, cat_exponent=0.5, cat_remain_numeric=True, auto_encode_label=True, auto_imputation=True, auto_discrete=False, auto_discard_unique=True, apply_gbm_features=False, gbm_params={}, gbm_feature_type='embedding', fixed_embedding_dim=True, embeddings_output_dim=4, embeddings_initializer='uniform', embeddings_regularizer=None, embeddings_activity_regularizer=None, dense_dropout=0, embedding_dropout=0.3, stacking_op='add', output_use_bias=True, apply_class_weight=False, optimizer=<keras.optimizers.optimizer_experimental.rmsprop.RMSprop object at 0x7f493331d690>, loss='auto', dnn_params={'hidden_units': ((128, 0, False), (64, 0, False)), 'dnn_activation': 'relu'}, autoint_params={'num_attention': 3, 'num_heads': 1, 


04-17 16:23:14 I hypernets.t.toolbox.py 346 - 4 class detected, inferred as a [multiclass classification] task


04-17 16:23:14 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10', 'x_11'],
      dtype='object')


04-17 16:23:14 I deeptables.m.preprocessor.py 261 - Preparing features...
04-17 16:23:14 I deeptables.m.preprocessor.py 336 - Preparing features taken 0.00841379165649414s
04-17 16:23:14 I deeptables.m.preprocessor.py 341 - Data imputation...
04-17 16:23:14 I deeptables.m.preprocessor.py 383 - Imputation taken 0.015813350677490234s
04-17 16:23:14 I deeptables.m.preprocessor.py 388 - Categorical encoding...
04-17 16:23:15 I deeptables.m.preprocessor.py 393 - Categorical encoding taken 0.16371631622314453s
04-17 16:23:15 I deeptables.m.preprocessor.py 196 - fit_transform taken 0.22676992416381836s
04-17 16:23:15 I deeptables.m.deeptable.py 353 - Training...
04-17 16:23:15 I deeptables.m.deeptable.py 752 - Injected a callback [EarlyStopping]. monitor:val_accuracy, patience:5, mode:max
04-17 16:23:15 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=128, shuffle=True, drop_remainder=True
04-17 16:23:15 I deeptables.u.dataset_generator.py 250



Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 8: early stopping
04-17 16:23:18 I deeptables.m.deepmodel.py 122 - Training finished.
04-17 16:23:18 I deeptables.m.deeptable.py 369 - Training finished.
04-17 16:23:18 I deeptables.m.deeptable.py 704 - Model has been saved to:dt_output/dt_20240417162314_dcn_nets/dcn_nets.h5
04-17 16:23:18 I deeptables.m.preprocessor.py 242 - Transform [X]...


04-17 16:23:18 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10', 'x_11'],
      dtype='object')


04-17 16:23:18 I deeptables.m.preprocessor.py 249 - transform_X taken 0.040585994720458984s
04-17 16:23:18 I deeptables.m.preprocessor.py 230 - Transform [y]...
04-17 16:23:18 I deeptables.m.preprocessor.py 236 - transform_y taken 0.00028204917907714844s
04-17 16:23:18 I deeptables.m.deepmodel.py 158 - Performing evaluation...
04-17 16:23:18 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=256, shuffle=False, drop_remainder=False
04-17 16:23:18 I deeptables.m.deeptable.py 685 - Perform prediction...
04-17 16:23:18 I deeptables.m.preprocessor.py 242 - Transform [X]...


04-17 16:23:18 W deeptables.m.preprocessor.py 154 - Column index of X has been converted: Index(['x_0', 'x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9',
       'x_10', 'x_11'],
      dtype='object')


04-17 16:23:18 I deeptables.m.preprocessor.py 249 - transform_X taken 0.041500091552734375s
04-17 16:23:18 I deeptables.m.deepmodel.py 130 - Performing predictions...
04-17 16:23:18 I deeptables.u.dataset_generator.py 250 - create dataset generator with _TFDGForPandas, batch_size=128, shuffle=False, drop_remainder=False
04-17 16:23:18 I deeptables.m.deeptable.py 559 - predict_proba taken 0.24778962135314941s
04-17 16:23:18 I deeptables.m.deeptable.py 594 - Reverse indicators to labels.


{'loss': 1.3834924697875977, 'accuracy': 0.4117647111415863}

In [80]:
print('rf:', score_rf, preds_rf)
print('ew:', score_ew, preds_ew)
print('bio:', score_bio, preds_bio)

rf: {'loss': 1.3612148761749268, 'accuracy': 0.4117647111415863} [1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1]
ew: {'loss': 1.3759139776229858, 'accuracy': 0.47058823704719543} [1 1 1 3 1 1 1 1 1 1 3 3 3 1 3 1 1]
bio: {'loss': 1.3834924697875977, 'accuracy': 0.4117647111415863} [3 1 3 3 3 3 1 3 3 3 1 3 3 3 3 3 3]


In [81]:
print('average accuracy is',(score_rf['accuracy']+score_ew['accuracy']+score_bio['accuracy'])/3) 

average accuracy is 0.4313725531101227
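The test set is small. The 67 training rows in the logs plus the 17 test predictions give 84 labeled counties, and `train_size=0.8` yields floor(0.8 × 84) = 67 for training and 17 for testing. Accuracy therefore moves in steps of 1/17 ≈ 0.059: 0.4118 is 7/17, 0.4706 is 8/17, and the reported average 0.4314 is 22/51, so the three models differ by at most one county. A useful reference point is the accuracy of always predicting the most common training label. The sketch below computes it exactly with fractions; the helper names and the demo labels are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def accuracy(y_true, y_pred):
    """Exact accuracy as a fraction, so the 1/n step size stays visible."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return Fraction(correct, len(y_true))


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))


# Hypothetical labels, for illustration only
y_train_demo = [1, 1, 1, 0, 2, 3, 1]
y_test_demo = [1, 0, 1, 3]
print(majority_baseline(y_train_demo, y_test_demo))  # 1/2
```

In the notebook, `majority_baseline(y_train_rf['Reforest'].tolist(), y_test_rf['Reforest'].tolist())` would give the baseline for reforestation. The reforestation model predicts class 1 for 16 of 17 counties, so its accuracy may be close to such a baseline if class 1 is the most common label.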
