In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder

# Goal of This Notebook

The goal of this notebook is to replicate one of the experiments in [[1]](https://arxiv.org/pdf/1806.06988.pdf) and verify the results. This notebook was prepared mainly for learning purposes.

# Brief Introduction on Deep Neural Decision Trees

Tree algorithms are really useful when it comes to tabular data. They yield successful results (I'm sure you noticed this from the winning models of many competitions). Besides that, they are interpretable, which is really important for almost all real-life applications.

Neural networks are particularly handy when the data is perceptual. Signal, semantic, image and audio data can all be processed with NNs.

Currently, researchers are trying to combine the two and create better models. Some examples are:

* **TensorFlow Decision Forests Applications:** TensorFlow supports ensemble trees and gradient boosted trees. One can combine deep learning architectures with these and get sweet results. You can look at my previous [work](https://www.kaggle.com/code/egemenuurdalg/modeling-tree-algorithms-for-nlp-tasks), where I used different tree algorithms with different preprocessing layers for an NLP classification task. I also compared the results with the performance of sequential cells and BERT.

* **Composed Decision Forest and Neural Network:** The aim is to combine multiple decision forests and neural nets to improve predictive performance [[2]](https://www.tensorflow.org/decision_forests/tutorials/model_composition_colab).

* **Deep Neural Decision Trees:** The goal is to construct a tree whose splits are based on the outputs of multilayer perceptrons. While a Neural Decision Tree is designed mainly for interpretability, an ensemble of Neural Decision Trees can be used for higher predictive performance.


# Methodology

* Since the number of predictors is less than 12, I used Neural Decision Trees instead of Neural Decision Forests. (There is an example on the [Keras website](https://keras.io/examples/structured_data/deep_neural_decision_forests/) if you are curious about that.)

* I used the Titanic dataset to train the Neural Net, Decision Tree Classifier and Neural Decision Tree models.

* The categorical features have just a couple of unique values, so I used [One Hot Encoding](https://en.wikipedia.org/wiki/One-hot) to preprocess them.

* I didn't use any normalization for the neural nets because it did not improve the predictive performance.

* Since the goal is binary classification, I used [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) for the outputs of Deep Neural Trees.

* The paper does not specify the tree depth or the number of epochs. I found these parameters by trial and error, guided by the predictive performance, together with early stopping.

* I set the other parameters as in the paper.
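
To make the early-stopping part of the list above a bit more concrete, here is a minimal pure-Python sketch of patience-based stopping. The function name `early_stopping_epoch` and the validation losses are made up for illustration; this is a simplified version of the idea, not the Keras `EarlyStopping` implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs. Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran through every epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, only for illustration.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.73, 0.74, 0.9]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(f"stopped at epoch {stop_epoch}, best epoch was {best_epoch}")
```

With restoring of the best weights, the model that gets evaluated is the one from `best_epoch`, not the one from the last epoch that ran.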

# Results

Except for the decision tree, the test accuracies (in %) obtained in this notebook are similar to the ones reported in the paper.
           
* DNDT (Paper): 80.4
* DT (Paper): 79.0
* NN (Paper): 76.9
* DNDT (Notebook): 81.5
* DT (Notebook): 73.89
* NN (Notebook): 78.3
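
A quick way to see how far each model is from the paper is to subtract the two sets of numbers. The snippet below only uses the accuracies listed above; the variable names are my own.

```python
# Accuracies (%) copied from the list above.
paper = {"DNDT": 80.4, "DT": 79.0, "NN": 76.9}
notebook = {"DNDT": 81.5, "DT": 73.89, "NN": 78.3}

# Positive gap: the notebook scored higher than the paper.
gaps = {name: round(notebook[name] - paper[name], 2) for name in paper}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} percentage points")
```

DNDT and NN land within about 1.5 points of the paper, while the decision tree is roughly 5 points lower.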

# Conclusion

* It is possible to build a successful tree-based model that is trained in a similar way to neural nets.

In [2]:
titanic_file = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
df = pd.read_csv(titanic_file)
df.head()

Downloading data from https://storage.googleapis.com/tf-datasets/titanic/train.csv


Unnamed: 0,survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,0,male,22.0,1,0,7.25,Third,unknown,Southampton,n
1,1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
2,1,female,26.0,0,0,7.925,Third,unknown,Southampton,y
3,1,female,35.0,1,0,53.1,First,C,Southampton,n
4,0,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y


In [3]:
# Cast the object columns to the pandas string dtype.
object_columns = df.select_dtypes(include = 'object').columns
df[object_columns] = df[object_columns].astype('string')

TARGET_FEATURE_NAME = "survived"
CATEGORICAL_FEATURES = list(df.select_dtypes(include = 'string').columns)
NUMERIC_FEATURES =  list(df.select_dtypes(exclude = 'string').columns)[1:]
FEATURE_NAMES = list(df.columns)

CATEGORICAL_FEATURES_WITH_VOCABULARY = {
    'sex': sorted(list(df['sex'].unique())),
    'class': sorted(list(df['class'].unique())),
    'deck': sorted(list(df['deck'].unique())),
    'embark_town': sorted(list(df['embark_town'].unique())),
    'alone': sorted(list(df['alone'].unique())),
}

In [4]:
train,test = train_test_split(df,random_state = 42)
train.to_csv('train_dataset.csv',index = False, header = False)
test.to_csv('test_dataset.csv',index = False, header = False)
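
`train_test_split` is called here without `test_size`, so scikit-learn falls back to its default and holds out 25% of the rows. The sketch below mirrors that arithmetic in plain Python. The helper name `default_split_sizes` is hypothetical, and the row count of 627 is my assumption about this CSV, since the notebook never prints `len(df)`.

```python
import math


def default_split_sizes(n_samples, test_size=0.25):
    """Sizes of the train/test parts for a fractional test_size.

    The test part is rounded up and the train part gets the remaining rows.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# ASSUMPTION: the Titanic training CSV has 627 rows.
n_train, n_test = default_split_sizes(627)
print(n_train, n_test)
```

Under that assumption the test set is small, so a single test example changes the accuracy by roughly 0.6 percentage points.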

In [5]:
BATCH_SIZE = 64

train_ds = tf.data.experimental.make_csv_dataset("./train_dataset.csv",
                                                batch_size = BATCH_SIZE,
                                                column_names = FEATURE_NAMES,
                                                label_name = TARGET_FEATURE_NAME,
                                                num_epochs = 2,
                                                header = False,
                                                shuffle = True
                                                )
test_ds = tf.data.experimental.make_csv_dataset("./test_dataset.csv",
                                                batch_size = BATCH_SIZE,
                                                column_names = FEATURE_NAMES,
                                                label_name = TARGET_FEATURE_NAME,
                                                num_epochs = 1,
                                                header = False,
                                                )

2022-07-12 09:46:22.301021: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.


In [6]:
def create_inputs():
    inputs = {}
    for feature_name in FEATURE_NAMES:
        if feature_name in NUMERIC_FEATURES:
            inputs[feature_name] = layers.Input(shape = (), dtype = tf.float32, name = feature_name)
        elif feature_name in CATEGORICAL_FEATURES:
            inputs[feature_name] = layers.Input(shape = (), dtype = tf.string, name = feature_name)
    return inputs

def encode_features(inputs):
    encoded_features = []
    for feature_name in FEATURE_NAMES:
        if feature_name in NUMERIC_FEATURES:
            encoded_feature = tf.expand_dims(inputs[feature_name],-1)
        elif feature_name in CATEGORICAL_FEATURES:
            vocab = CATEGORICAL_FEATURES_WITH_VOCABULARY[feature_name]
            lookup = layers.StringLookup(vocabulary = vocab, output_mode = "one_hot")
            encoded_feature = lookup(inputs[feature_name])
        else:
            continue
        encoded_features.append(encoded_feature)
    
    return encoded_features
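
The width of the concatenated feature vector matters later, because it is passed as `num_features` to the Neural Decision Tree. As a sketch of where that number comes from: each numeric feature contributes one column, and each `StringLookup` layer in `one_hot` mode contributes one column per vocabulary entry plus one out-of-vocabulary slot (its default). The helper `encoded_width` is hypothetical, and the vocabulary sizes below are my assumption about this dataset, since the notebook does not print them.

```python
def encoded_width(num_numeric, vocab_sizes, num_oov_indices=1):
    """Number of columns after one-hot encoding and concatenation."""
    categorical = sum(size + num_oov_indices for size in vocab_sizes.values())
    return num_numeric + categorical


# ASSUMPTION: number of unique values per categorical column.
assumed_vocab_sizes = {
    "sex": 2,
    "class": 3,
    "deck": 8,
    "embark_town": 4,
    "alone": 2,
}

# Numeric columns: age, n_siblings_spouses, parch, fare.
width = encoded_width(num_numeric=4, vocab_sizes=assumed_vocab_sizes)
print(width)
```

With these assumed sizes the result matches the `num_features` value of 28 used for the Neural Decision Tree in this notebook.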

# 1. Neural Networks

In [7]:
inputs = create_inputs()
encoded_features = encode_features(inputs)
encoded_features = layers.concatenate(encoded_features)
x = layers.Dense(50, activation = 'relu')(encoded_features)
x = layers.Dense(50, activation = 'relu')(x)
output = layers.Dense(1,activation ='sigmoid')(x)

nn_model = keras.Model(inputs,output)

nn_model.compile(
    optimizer = 'adam',
    loss = 'binary_crossentropy',
    metrics = ['binary_accuracy']
)

In [9]:
callback = tf.keras.callbacks.EarlyStopping(patience = 5, restore_best_weights = True)
nn_model.fit(train_ds,validation_data = test_ds,epochs = 20, callbacks = [callback])

2022-07-12 09:46:57.158535: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20


<keras.callbacks.History at 0x7f4cc4546790>

In [10]:
nn_model.evaluate(test_ds)



[0.4626370966434479, 0.7834395170211792]

# 2. Neural Decision Trees 

In [11]:
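class NeuralDecisionTree(keras.Model):
    def __init__(self, depth, num_features, used_features_rate, num_classes):
        super().__init__()
        self.depth = depth
        self.num_leaves = 2 ** depth
        self.num_classes = num_classes
        self.step_counter = 1

        # Randomly pick the subset of input features this tree is allowed to use
        # and store it as a one-hot mask of shape [num_used_features, num_features].
        num_used_features = int(num_features * used_features_rate)
        one_hot = np.eye(num_features)
        sampled_feature_indices = np.random.choice(
            np.arange(num_features), num_used_features, replace = False
        )
        self.used_features_mask = one_hot[sampled_feature_indices]

        # Trainable class scores stored in each leaf.
        self.pi = tf.Variable(
            initial_value = tf.random_normal_initializer()(
                shape = [self.num_leaves, self.num_classes]
            ),
            dtype = tf.float32,
            trainable = True,
        )

        # Dense layer that outputs the routing probabilities of the tree nodes.
        self.decision_fn = layers.Dense(
            units = self.num_leaves, activation = "sigmoid", name = "decision"
        )

    def call(self, features):
        batch_size = tf.shape(features)[0]

        # Keep only the sampled features: [batch_size, num_used_features].
        features = tf.matmul(features, self.used_features_mask, transpose_b = True)

        # Routing probabilities: [batch_size, num_leaves, 1].
        decisions = tf.expand_dims(self.decision_fn(features), axis = 2)
        # Append the complement so that the last axis holds (p, 1 - p).
        decisions = layers.concatenate([decisions, 1 - decisions], axis = 2)

        # mu holds the probability of reaching each node of the current level.
        mu = tf.ones([batch_size, 1, 1])

        begin_idx = 1
        end_idx = 2

        # Walk the tree level by level (nodes are stored in breadth-first order).
        for level in range(self.depth):
            mu = tf.reshape(mu, [batch_size, -1, 1])
            mu = tf.tile(mu, (1, 1, 2))
            level_decisions = decisions[:, begin_idx:end_idx, :]
            mu = mu * level_decisions
            begin_idx = end_idx
            end_idx = begin_idx + 2 ** (level + 1)

        # Probability of reaching each leaf: [batch_size, num_leaves].
        mu = tf.reshape(mu, [batch_size, self.num_leaves])

        # Squash the leaf scores with a sigmoid and mix them by the leaf probabilities.
        probabilities = keras.activations.sigmoid(self.pi)
        outputs = tf.matmul(mu, probabilities)

        return outputs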
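
To make the routing loop in `call` easier to follow, here is a small pure-Python sketch of the same idea for a single example. Every internal node sends the example to one child with probability `d` and to the other with probability `1 - d`, and the probability of reaching a leaf is the product of the decisions along its path. The helper `leaf_probabilities` and the decision values are made up for illustration; the node indexing (1-based, breadth-first) follows the slicing used in the class above.

```python
def leaf_probabilities(decisions, depth):
    """Probability of reaching each leaf of a soft binary tree.

    decisions[i] is the probability of taking the first branch at node i,
    with nodes numbered from 1 in breadth-first order (index 0 is unused).
    """
    mu = [1.0]  # the root is reached with probability 1
    begin = 1
    for level in range(depth):
        width = 2 ** level  # number of nodes on this level
        next_mu = []
        for j in range(width):
            d = decisions[begin + j]
            next_mu.append(mu[j] * d)          # first child
            next_mu.append(mu[j] * (1.0 - d))  # second child
        mu = next_mu
        begin += width
    return mu


# Hypothetical routing probabilities for a depth-2 tree (3 internal nodes).
example_decisions = [None, 0.9, 0.5, 0.2]
leaves = leaf_probabilities(example_decisions, depth=2)
print(leaves)
print(sum(leaves))
```

Because every node splits its incoming probability into `d` and `1 - d`, the leaf probabilities always sum to 1, so the model output is a weighted average of the leaf values.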

In [12]:
neural_decision_params = {
    'depth' : 10,
    'num_features' : 28,
    'num_classes' : 2,
    'used_features_rate' : 1.0,
}

inputs = create_inputs()
encoded_features = encode_features(inputs)
features = layers.concatenate(encoded_features)
outputs = NeuralDecisionTree(**neural_decision_params)(features)
neural_decision_tree_model = tf.keras.Model(inputs,outputs)

neural_decision_tree_model.compile(
    optimizer = 'adam',
    loss = 'binary_crossentropy',
    metrics = ["binary_accuracy"])

In [13]:
neural_decision_tree_model.fit(
    train_ds,epochs = 30,
    validation_data = test_ds,
    callbacks = [callback]

)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7f4cbc0cbf10>

In [17]:
neural_decision_tree_model.evaluate(test_ds)



[0.5732039213180542, 0.8152866363525391]

# 3. Decision Trees

In [14]:
# pandas preprocessing
df.sex = LabelEncoder().fit_transform(df.sex)
df.alone = LabelEncoder().fit_transform(df.alone)
df = pd.get_dummies(df,columns = ['class','deck','embark_town'])
train,test = train_test_split(df,random_state = 42)

In [15]:
tree_model = DecisionTreeClassifier(criterion="gini", splitter="best",random_state = 42)
tree_model.fit(train.drop('survived',axis = 1),train.survived)
print('Accuracy score tree model: ',np.round(accuracy_score(test.survived,tree_model.predict(test.drop('survived',axis = 1)))*100,2))

Accuracy score tree model:  73.89


