![cart](https://i.ibb.co/qmJPYVf/decision-RTree.png)

- **ML Part 1** - Logistic Regression
- **ML Part 2** - K-Nearest Neighbors (KNN)
- **ML Part 3** - Support Vector Machine (SVM)
- **ML Part 4** - Artificial Neural Network (NN)
- **ML Part 5 - Classification and Regression Tree (CART)**
- **ML Part 6** - Random Forests
- **ML Part 7** - Gradient Boosting Machines (GBM)
- **ML Part 8** - XGBoost
- **ML Part 9** - LightGBM
- **ML Part 10** - CatBoost

---
A decision tree is a tree-based algorithm used to solve regression and classification problems. It takes the form of an inverted tree that branches from a root node, where the classes are still mixed, down to leaf nodes that are as pure as possible, and the output is derived at the leaves. Regression trees are used when the dependent variable has continuous values, and classification trees when it has discrete values.

## Basic Theory
The decision tree is built from the independent variables, with each node holding a condition on a feature. The condition at a node decides which child node a sample moves to next. Once a leaf node is reached, an output is predicted. Choosing the right sequence of conditions is what makes the tree effective. Entropy / information gain is used as the criterion for selecting the condition at each node. A recursive, greedy algorithm is used to derive the tree structure.

![](https://miro.medium.com/max/690/1*xzF10JmR3K0rnZ8jtIHI_g.png)
In the above diagram, we can see a tree with internal nodes (conditions) and leaf nodes (reject / accept offer) with labels.
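
To make the entropy / information gain criterion more concrete, here is a minimal sketch in plain Python. The helper names `entropy` and `information_gain` and the toy labels are my own assumptions for illustration, not part of any library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_children

# Hypothetical toy labels: 1 = accept offer, 0 = reject offer
parent = [1, 1, 1, 0, 0, 0]

# A condition that separates the classes completely
perfect_split = information_gain(parent, [1, 1, 1], [0, 0, 0])

# A condition that leaves both children as mixed as the parent
useless_split = information_gain(parent, [1, 0], [1, 1, 0, 0])

print("gain of perfect split:", perfect_split)
print("gain of useless split:", useless_split)
```

A greedy tree builder would evaluate candidate conditions like these at every node and keep the one with the largest gain, then repeat the same step on each child.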


## Algorithm for Choosing Conditions
For CART (classification and regression trees), we use the Gini index as the splitting criterion for classification. It measures how mixed the classes of the data points in a node are.
![](https://miro.medium.com/max/415/0*asbVp_8lwEsbfpOv.png)
Under the usual impurity form of the index (1 minus the sum of squared class proportions), the split with the lowest weighted Gini impurity is chosen as the next condition at each stage of building the decision tree. The Gini impurity is 0 when a node contains a single class and reaches its maximum when the classes are evenly mixed.
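
The sketch below shows how a split could be chosen with this criterion, assuming the common impurity definition `1 - sum(p_k^2)` and picking the threshold with the lowest weighted impurity. The functions `gini_impurity` and `best_threshold` and the toy data are hypothetical and only for illustration; a real CART implementation is more elaborate.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2); 0 means the node is pure."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try every midpoint between sorted unique values.

    Returns (threshold, weighted impurity) of the best split found.
    """
    best = (None, float("inf"))
    unique = sorted(set(values))
    n = len(labels)
    for lo, hi in zip(unique, unique[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical toy data: one numeric feature and a binary label
ages = [25, 32, 47, 51, 62, 70]
died = [0, 0, 0, 1, 1, 1]

print("impurity before splitting:", gini_impurity(died))
print("best (threshold, weighted impurity):", best_threshold(ages, died))
```

In this toy example the midpoint between 47 and 51 separates the two classes completely, so both children are pure.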


## Advantages
- Little preprocessing is needed; for example, features do not have to be scaled.
- There are no assumptions about the distribution of the data.
- Handles collinearity efficiently.
- Decision trees can provide a clear explanation for a prediction.

## Disadvantages
- If we keep growing the tree to achieve high purity, we may overfit the model. Decision tree pruning can be used to solve this problem.
- Sensitive to outliers.
- The tree can get very complex when trained on complex data sets.
- It can lose valuable information when handling continuous variables, because they are reduced to threshold splits.
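
As a rough illustration of the pruning mentioned above, scikit-learn offers cost-complexity pruning through the `ccp_alpha` parameter of `DecisionTreeClassifier`. The sketch below uses synthetic data and an arbitrary `ccp_alpha=0.01`; both are assumptions for the example, and a good value would normally be picked by cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy data, used only for illustration
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fully grown tree versus the same tree with cost-complexity pruning
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

for name, tree in [("unpruned", full_tree), ("pruned", pruned_tree)]:
    print(name,
          "| leaves:", tree.get_n_leaves(),
          "| train acc: {:.3f}".format(tree.score(X_train, y_train)),
          "| test acc: {:.3f}".format(tree.score(X_test, y_test)))
```

The pruned tree keeps fewer leaves; whether its test accuracy improves depends on the data, so it is worth comparing both numbers.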

## Hyperparameters
The decision tree has many hyperparameters; I will list a few of them (scikit-learn parameter names in parentheses).
- criterion (`criterion`)
    - The function used to measure the quality of a split when choosing the next node. The most common choices are gini / entropy.
- maximum depth (`max_depth`)
    - The maximum allowed depth of the decision tree.
- minimum samples to split (`min_samples_split`)
    - The minimum number of samples required to split an internal node.
- minimum samples per leaf (`min_samples_leaf`)
    - The minimum number of samples required in a leaf node.
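
Here is a small sketch of how these four hyperparameters map onto scikit-learn's `DecisionTreeClassifier`. The values and the synthetic data are arbitrary assumptions chosen only so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only so the example is self-contained
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = DecisionTreeClassifier(
    criterion="gini",       # split quality measure ("entropy" is the other common choice)
    max_depth=4,            # the tree may not grow deeper than 4 levels
    min_samples_split=10,   # a node needs at least 10 samples to be split
    min_samples_leaf=5,     # every leaf must keep at least 5 samples
    random_state=0,
)
clf.fit(X, y)

# Leaves are the nodes without children (children_left == -1 in the fitted tree)
is_leaf = clf.tree_.children_left == -1
smallest_leaf = int(clf.tree_.n_node_samples[is_leaf].min())

print("depth:", clf.get_depth())
print("leaves:", clf.get_n_leaves())
print("smallest leaf size:", smallest_leaf)
```

The printed depth and smallest leaf size let you check that the fitted tree respects the limits you set.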
    

## Comparison with Other Models
### Decision Trees vs Random Forest
- Random Forest is a collection of decision trees, and the average (regression) or majority vote (classification) of the forest is chosen as the predicted output.
- The Random Forest model is less prone to overfitting than a single decision tree and provides a more general solution.
- Random Forest is generally more robust and accurate than a single decision tree.
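
One way to check this comparison on your own data is to cross-validate both models side by side. The sketch below does that on synthetic data, which is an assumption for illustration; the gap between the two models will differ from dataset to dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# 5-fold cross-validated accuracy for a single tree and for a forest of 100 trees
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
)

print("Decision tree : mean={:.3f} std={:.3f}".format(tree_scores.mean(), tree_scores.std()))
print("Random forest : mean={:.3f} std={:.3f}".format(forest_scores.mean(), forest_scores.std()))
```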

### Decision Trees vs KNN
- Both are nonparametric methods.
- The decision tree supports automatic feature interaction, whereas KNN does not.
- The decision tree is faster at prediction time, because KNN has to do its expensive distance computations at query time.

### Decision Trees vs Naive Bayes
- The decision tree is a discriminative model, while Naive Bayes is a generative model.
- Decision trees are more flexible and easy to use.
- Decision tree pruning can discard some key values in the training data, which can hurt accuracy.

### Decision Trees vs Neural Networks (NN)
- Both find nonlinear solutions and have interactions between independent variables.
- Decision trees are better when there is a large set of categorical values in the training data.
- When the scenario asks for an explanation about the decision, the decision trees are better than NN.
- When there is sufficient training data, NN performs better than the decision tree.

### Decision Trees vs SVM
- While SVM uses the kernel trick to solve nonlinear problems, decision trees derive hyperrectangles in the input space to solve the problem.
- Decision trees are better for categorical data and handle collinearity better than SVM.
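
To see what "hyperrectangles in the input space" means in practice, the sketch below fits a shallow tree on synthetic two-feature data and prints its rules with scikit-learn's `export_text`. The data and the feature names `x0` and `x1` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic two-feature data, used only for illustration
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# A shallow tree keeps the printed rules short
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Every rule is a threshold on a single feature, so each leaf is an axis-aligned box
rules = export_text(tree, feature_names=["x0", "x1"])
print(rules)
```

Each path from the root to a leaf combines conditions of the form `feature <= threshold`, which is exactly an axis-aligned rectangle in the feature space.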

## Coding Time

In [None]:
# Import the necessary packages

import numpy as np
import pandas as pd

import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
%matplotlib inline

import sklearn
from sklearn.preprocessing import StandardScaler, LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
# plot_confusion_matrix is not imported: it was removed in scikit-learn 1.2 and is not used below
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score
from sklearn.metrics import confusion_matrix, r2_score, roc_auc_score, roc_curve, classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier

import warnings
warnings.filterwarnings('ignore')

In [None]:
# Import and read dataset

input_ = "../input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv"
df = pd.read_csv(input_)

df.head(10)

In [None]:
import pandas_profiling as pdp
report = pdp.ProfileReport(df, title='Pandas Profiling Report')

In [None]:
report.to_widgets() 

In [None]:
plt.figure(figsize=(15,15))
sns.heatmap(df.corr(),annot=True)
plt.show()

In [None]:
x = df.drop(columns='DEATH_EVENT')
y = df['DEATH_EVENT']

model = ExtraTreesClassifier()
model.fit(x,y)
print(model.feature_importances_)
feat_importances = pd.Series(model.feature_importances_, index=x.columns)
feat_importances.nlargest(12).plot(kind='barh')
plt.show()

In [None]:
df.describe()

In [None]:
# Keep only the rows with ejection_fraction below 70 (drops the extreme values)
df = df[df['ejection_fraction'] < 70]

In [None]:
## data preprocessing

#inp_data = df.drop(df[['DEATH_EVENT']], axis=1)
inp_data = df.iloc[:,[0,4,7,11]]
out_data = df[['DEATH_EVENT']]

X_train, X_test, y_train, y_test = train_test_split(inp_data, out_data, test_size=0.2, random_state=0)

## Applying Transformer
# Fit the scaler on the training set only, then reuse the same scaling for the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [None]:
## X_train, X_test, y_train, y_test Shape

print("X_train Shape : ", X_train.shape)
print("X_test Shape  : ", X_test.shape)
print("y_train Shape : ", y_train.shape)
print("y_test Shape  : ", y_test.shape)

In [None]:
## I coded this method for convenience and to avoid writing the same code over and over again

def result(clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    
    print('Accuracy Score: {:.4f}'.format(accuracy_score(y_test, y_pred)))
    print('Decision Tree Classifier f1-score      : {:.4f}'.format(f1_score( y_test , y_pred)))
    print('Decision Tree Classifier precision     : {:.4f}'.format(precision_score(y_test, y_pred)))
    print('Decision Tree Classifier recall        : {:.4f}'.format(recall_score(y_test, y_pred)))
    print("Decision Tree Classifier roc auc score : {:.4f}".format(roc_auc_score(y_test,y_pred)))
    # classification_report expects the true labels first, then the predictions
    print("\n", classification_report(y_test, y_pred))
    
    plt.figure(figsize=(6,6))
    cf_matrix = confusion_matrix(y_test, y_pred)
    sns.heatmap((cf_matrix / np.sum(cf_matrix)*100), annot = True, fmt=".2f", cmap="Blues")
    plt.title("DecisionTreeClassifier Confusion Matrix (Rate)")
    plt.show()
    
    cm = confusion_matrix(y_test,y_pred)
    plt.figure(figsize=(6,6))
    sns.heatmap(cm, annot=True, cmap="Blues",
                xticklabels=["FALSE","TRUE"],
                yticklabels=["FALSE","TRUE"],
                cbar=False)
    plt.title("DecisionTreeClassifier Confusion Matrix (Number)")
    plt.show()
    
def sample_result(class_weight=None,criterion='gini',max_depth=None,max_features=None,max_leaf_nodes=None,min_samples_split=2):    
    scores = [] 
    for i in range(0,10000): # 10,000 random train/test splits
        X_train, X_test, y_train, y_test = train_test_split(inp_data, out_data, test_size=0.2)
        clf = DecisionTreeClassifier(class_weight= class_weight,
                                     criterion=criterion,
                                     max_depth=max_depth,
                                     max_features=max_features,
                                     max_leaf_nodes=max_leaf_nodes,
                                     min_samples_split=min_samples_split) 
        # Fit the scaler on the training split only, then apply it to the test split
        sc = StandardScaler()
        X_train = sc.fit_transform(X_train)
        X_test = sc.transform(X_test)
        clf.fit(X_train, y_train)
        scores.append(accuracy_score(y_test, clf.predict(X_test)))
    
    plt.hist(scores)
    plt.show()
    print("Best Score: {}\nMean Score: {}".format(np.max(scores), np.mean(scores)))

### Simple Method
I applied the Decision Tree directly without changing anything, and the result is as follows:

In [None]:
clf = DecisionTreeClassifier(random_state=0)
result(clf)
sample_result()

---

### Advanced Method

In [None]:
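param_grid = {
    "max_depth": np.arange(1,10),
    "min_samples_split": [0.001, 0.01, 0.1, 0.2, 0.02, 0.002],
    # None is not a valid criterion, so only the two supported names are searched
    "criterion": ["gini", "entropy"],
    # max_leaf_nodes must be at least 2
    "max_leaf_nodes": np.arange(2,10),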
    "class_weight": ["balanced", None]
}

clf = DecisionTreeClassifier()
grid = GridSearchCV(clf, param_grid, n_jobs=-1, verbose=2, cv=10)
grid.fit(X_train, y_train)
grid.best_params_

In [None]:
clf = DecisionTreeClassifier(
    class_weight='balanced',
    criterion='gini',
    max_depth=1,
    max_leaf_nodes=2,
    min_samples_split=0.001,
    random_state=0
)

result(clf)
# class_weight=None,criterion='gini',max_depth=None,max_features=None,max_leaf_nodes=None,min_samples_split=2
sample_result('balanced',"gini",1 ,None , 2,  0.001)

## Reporting
I evaluated the results with the Confusion Matrix of the 60 test samples. The results are as follows:

**Correctly predicted -> 98.34% (59 of 60 predictions are correct)**
- True Negative -> 71.67% (43 people) -> Those who were predicted not to die and who did not die
- True Positive -> 26.67% (16 people) -> Those who were predicted to die and who did die

**Wrongly predicted -> 1.67% (1 of 60 predictions is wrong)**
- False Positive -> 0.00% (0 people) -> Those who were predicted to die but who did not die
- False Negative -> 1.67% (1 person) -> Those who were predicted not to die but who did die
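
As a quick sanity check, the usual metrics can be recomputed by hand from the four confusion matrix counts listed above (43 true negatives, 16 true positives, 0 false positives, 1 false negative). This is only arithmetic on those counts, not a new experiment.

```python
# Confusion matrix counts taken from the list above
tn, fp, fn, tp = 43, 0, 1, 16

total = tn + fp + fn + tp
accuracy = (tp + tn) / total                          # share of all predictions that are correct
precision = tp / (tp + fp)                            # share of predicted deaths that were real
recall = tp / (tp + fn)                               # share of real deaths that were predicted
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of precision and recall

print("total samples:", total)
print("accuracy : {:.4f}".format(accuracy))
print("precision: {:.4f}".format(precision))
print("recall   : {:.4f}".format(recall))
print("f1-score : {:.4f}".format(f1))
```

The single false negative lowers recall while precision stays perfect, which matters here because a missed death is the more costly kind of error.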