# Decision Trees

Decision Trees (DTs) are machine learning algorithms that recursively split a dataset into smaller groups based on feature values, until every group contains samples of a single class or a stopping condition such as a maximum depth is reached.

At each step of the recursion, the DT picks the feature that best splits the current subset of the data. It does this by evaluating a *splitting criterion*. This notebook investigates two such criteria: **Gini Impurity** and **Entropy (Information Gain)**.
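
To make the two criteria concrete, here is a minimal sketch in plain Python. The helper names (`gini_impurity`, `entropy`, `impurity_decrease`) and the toy labels are illustrative assumptions, not part of scikit-learn or of this notebook's dataset. The sketch scores one candidate split by how much it lowers the weighted impurity of the child groups relative to the parent.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)) over the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Toy labels (hypothetical): 0 = normal, 1 = suspicious.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

print("Gini decrease:   ", impurity_decrease(parent, left, right, gini_impurity))
print("Information gain:", impurity_decrease(parent, left, right, entropy))
```

With entropy as the impurity measure, the decrease is the information gain. A split that separates the classes perfectly leaves both children with zero impurity, so the decrease equals the parent's impurity.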

# Approach

1. Loading Data
2. Feature Selection
3. Splitting Data
4. Building and Training a DT Model
5. Testing the Model
6. Evaluating Model
7. Visualizing the DT

With pandas and scikit-learn, most of the steps above take a single function call.

# Dataset

## What is Shill Bidding?

Shill bidding is when a seller uses a fraudulent account to bid on their own auction and artificially raise its price. Here we work with the Shill Bidding Dataset to learn a decision tree that classifies bidding behavior as normal or suspicious.

![about-dataset](https://github.com/gsethi2409/first-order-model/blob/master/Screenshot%20from%202021-01-03%2009-48-18.png?raw=true)

## Features

* Record ID
* Auction ID
* Bidder ID
* Bidder Tendency
* Bidding Ratio
* Successive Outbidding
* Last Bidding
* Auction Bids
* Auction Starting Price
* Early Bidding
* Winning Ratio
* Auction Duration

For classification, we use all features except the first three IDs.

We will use the following libraries.
* scikit-learn
* numpy
* pandas
* matplotlib

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

## Loading Data

In [None]:
dataset = pd.read_csv("/kaggle/input/shill-bidding-dataset/Shill Bidding Dataset.csv", delimiter=",")

## Feature Selection

In [None]:
X = dataset[['Bidder_Tendency', 'Bidding_Ratio', 'Successive_Outbidding', 'Last_Bidding', 'Auction_Bids', 'Starting_Price_Average', 'Early_Bidding', 'Winning_Ratio', 'Auction_Duration']].values
y = dataset['Class']

## Splitting Data

In [None]:
X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size=0.3, random_state=3)

## Building and Training a DT Model

In [None]:
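def SBDecisionTreeClassifier(heuristic="gini", tree_depth = None):
    # heuristic: splitting criterion ("gini" or "entropy")
    # tree_depth: maximum depth of the tree; None grows it until the leaves are pure
    decision_tree_clfr = DecisionTreeClassifier(criterion = heuristic, max_depth = tree_depth)
    # Fit on the training split created above
    decision_tree_clfr.fit(X_trainset, y_trainset)
    return decision_tree_clfr

# Default model: Gini impurity, no depth limit
decision_tree = SBDecisionTreeClassifier()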

## Testing the Model

In [None]:
predTree = decision_tree.predict(X_testset)

## Evaluating Model

In [None]:
def evaluate_model(decision_tree, predTree):
    print("Decision Tree's Accuracy: ", metrics.accuracy_score(y_testset, predTree))
    print("Depth of Decision Tree: ", decision_tree.tree_.max_depth)
    
evaluate_model(decision_tree, predTree)

Let us now compare the two splitting criteria, Gini impurity and entropy.

In [None]:
print("USING GINI")
heuristic = "gini"
decision_tree_gini = SBDecisionTreeClassifier(heuristic)
predTree_gini = decision_tree_gini.predict(X_testset)
evaluate_model(decision_tree_gini, predTree_gini)

print("USING ENTROPY")
heuristic = "entropy"
decision_tree_entropy = SBDecisionTreeClassifier(heuristic)
predTree_entropy = decision_tree_entropy.predict(X_testset)
evaluate_model(decision_tree_entropy, predTree_entropy)
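
Accuracy alone can hide how a model treats each class, especially if the two classes are unevenly represented. The sketch below shows how `confusion_matrix` and `classification_report` (both imported earlier) could be used to compare the two criteria per class. To stay self-contained it runs on synthetic data from `make_classification`. The sample count, class balance, and `_demo` variable names are assumptions for illustration, not properties of the Shill Bidding Dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 9 features, arbitrary 90/10 class balance.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=9, n_informative=5,
    weights=[0.9, 0.1], random_state=3,
)
# stratify keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3, stratify=y_demo
)

results = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=3)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    results[criterion] = {
        "confusion": confusion_matrix(y_te, pred),  # rows: true class, columns: predicted
        "depth": clf.get_depth(),
    }
    print(f"--- {criterion} (depth {clf.get_depth()}) ---")
    print(results[criterion]["confusion"])
    print(classification_report(y_te, pred))
```

On the notebook's own data, the same two calls could be applied directly to `y_testset` and `predTree_gini` or `predTree_entropy`.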

## Visualization

In [None]:
!conda install -c conda-forge pydotplus -y
!conda install -c conda-forge python-graphviz -y

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import pydotplus
from io import StringIO  # standard library; replaces the third-party six.StringIO
from sklearn import tree

fileName_g = "decision-tree.png"
dot_data = StringIO()
# Columns 3 to 11 are assumed to be the nine feature columns, in the same order as X
featureNames = list(dataset.columns[3:12])
# One label per class, in sorted order. str(np.unique(...)) would produce the single
# string "[0 1]", whose characters would then be used as class names.
classNames = [str(c) for c in np.unique(y_trainset)]

# export_graphviz writes the trained decision tree classifier in DOT format
tree.export_graphviz(decision_tree_gini, feature_names = featureNames, out_file = dot_data, class_names = classNames, filled = True, special_characters = True, rotate = False)

# Convert the DOT data into a graph using pydotplus
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())

# Write the graph to a PNG file
graph.write_png(fileName_g)

# Display the tree image
img_g = mpimg.imread(fileName_g)
plt.figure(figsize=(100, 200))
plt.imshow(img_g, interpolation='nearest')
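
If Graphviz or pydotplus is not available, scikit-learn's `export_text` offers a dependency-free way to inspect a fitted tree as indented text. The sketch below is a stand-in example on synthetic data. The `_demo` names, the feature names, and the depth limit of 2 are assumptions chosen to keep the output short.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Small synthetic dataset (assumption) so the example runs on its own.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=3
)
demo_feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]

# A shallow tree keeps the printed rules readable.
demo_tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=3)
demo_tree.fit(X_demo, y_demo)

# One line per node: split conditions on internal nodes, predicted class on leaves.
tree_text = export_text(demo_tree, feature_names=demo_feature_names)
print(tree_text)
```

For the notebook's model, the equivalent call would pass `decision_tree_gini` and the list of feature column names.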