# Classification and Regression Trees (CART)

* The purpose of this model is to turn the complex structure in a data set into a set of simple decision rules.


* Heterogeneous data sets are divided into homogeneous subgroups according to a specified target variable.

![alt text](https://1.bp.blogspot.com/-z7ukEqUcfvc/UPNMA9PZRKI/AAAAAAAAA2c/FitWl1yEEC0/s1600/titanic.PNG)
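
How "homogeneous" a subgroup is can be measured with an impurity score. scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default. The sketch below computes it by hand for a made-up split; the labels and the helper names `gini` and `weighted_gini` are illustrative and not part of this notebook.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical parent node with a 50/50 class mix (maximally impure for 2 classes)
parent = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical split that separates the classes almost perfectly
left, right = [0, 0, 0, 0, 1], [1, 1, 1]

print(gini(parent))
print(weighted_gini(left, right))
```

The parent node has impurity 0.5 and the split brings the weighted impurity down to about 0.2. A tree learner prefers the split with the largest such reduction.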

## 1-) Model

In [1]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [2]:
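# Load the data and work on a copy so the raw frame stays untouched
diabetes = pd.read_csv("diabetes.csv")
df = diabetes.copy()
df = df.dropna()
# "Outcome" is the target; every other column is a feature
y = df["Outcome"]
X = df.drop(['Outcome'], axis=1)
# Hold out 30% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.30, 
                                                    random_state=42)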



In [3]:
from sklearn.tree import DecisionTreeClassifier

In [5]:
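# Default settings: the tree keeps splitting until its leaves are pure.
# No random_state is set, so results may vary slightly between runs.
cart = DecisionTreeClassifier()
cart_model = cart.fit(X_train, y_train)
cart_model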

DecisionTreeClassifier()

In [6]:
# !pip install skompiler

In [7]:
from skompiler import skompile

In [9]:
from warnings import filterwarnings
filterwarnings('ignore')

In [10]:
print(skompile(cart_model.predict).to("python/code"))

((((((0 if x[6] <= 0.671999990940094 else 1 if x[6] <= 0.6974999904632568 else
    0) if x[5] <= 31.40000057220459 else ((0 if x[3] <= 40.5 else 1) if x[1
    ] <= 111.5 else ((1 if x[1] <= 123.0 else 0) if x[2] <= 65.0 else 0) if
    x[2] <= 72.0 else 1) if x[4] <= 9.0 else (0 if x[6] <= 
    0.6395000219345093 else 1 if x[6] <= 0.6759999990463257 else 0) if x[0] <=
    4.5 else 1 if x[2] <= 67.0 else 0) if x[5] <= 49.10000038146973 else 1) if
    x[1] <= 127.5 else 1 if x[2] <= 56.0 else (0 if x[7] <= 27.5 else 0 if 
    x[3] <= 14.5 else 1) if x[5] <= 30.300000190734863 else 1 if x[5] <= 
    32.000000953674316 else ((0 if x[5] <= 33.75 else 1) if x[0] <= 0.5 else
    1 if x[5] <= 32.45000076293945 else 0) if x[2] <= 85.0 else 1) if x[7] <=
    28.5 else (1 if x[7] <= 29.5 else (0 if x[1] <= 133.0 else (1 if x[0] <=
    6.5 else 0) if x[1] <= 135.0 else 0) if x[2] <= 94.0 else 1 if x[2] <= 
    97.0 else 0) if x[5] <= 26.949999809265137 else (1 if x[1] <= 28.5 else
    0 if x[0] <= 
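
The nested expression printed by skompiler is hard to read, and it is cut off in the output above. As an alternative sketch, scikit-learn's own `sklearn.tree.export_text` prints the fitted rules as an indented outline. The example uses synthetic data from `make_classification` and made-up feature names so that it runs without `diabetes.csv`; with this notebook's model one would pass `cart_model` and `list(X_train.columns)` instead.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the diabetes data (assumption: 4 numeric features)
X_demo, y_demo = make_classification(n_samples=200, n_features=4,
                                     n_informative=2, n_redundant=0,
                                     random_state=42)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical column names

# A shallow tree keeps the printed rules short
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
demo_tree.fit(X_demo, y_demo)

# One line per node; indentation shows the nesting of the conditions
rules = export_text(demo_tree, feature_names=feature_names)
print(rules)
```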

## 2-) Prediction

In [11]:
y_pred = cart_model.predict(X_test)
y_pred[0:10]

array([0, 1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int64)

In [13]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [14]:
accuracy_score(y_test, y_pred) # before model tuning

0.7056277056277056

In [15]:
confusion_matrix(y_test, y_pred) # before model tuning

array([[111,  40],
       [ 28,  52]], dtype=int64)

In [16]:
print(classification_report(y_test, y_pred)) # before model tuning

              precision    recall  f1-score   support

           0       0.80      0.74      0.77       151
           1       0.57      0.65      0.60        80

    accuracy                           0.71       231
   macro avg       0.68      0.69      0.69       231
weighted avg       0.72      0.71      0.71       231
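
Every number in the report can be recovered from the confusion matrix above, whose rows are the true classes and whose columns are the predicted classes. The sketch below redoes that arithmetic in plain Python; the helper name `per_class_metrics` is made up for illustration.

```python
# Confusion matrix of the untuned model: rows = true class, columns = predicted class
cm = [[111, 40],
      [28, 52]]

def per_class_metrics(cm, k):
    """Precision, recall and F1 for class k of a square confusion matrix."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)   # column sum: everything predicted as k
    actual_k = sum(cm[k])                     # row sum: everything that truly is k
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total

print(f"accuracy = {accuracy:.4f}")
for k in range(len(cm)):
    p, r, f = per_class_metrics(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For example, accuracy is (111 + 52) / 231 and recall for class 1 is 52 / 80.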



## 3-) Model tuning

* In this section, we look for the best values of **max_depth** and **min_samples_split** with GridSearchCV.

* GridSearchCV: grid search with cross-validation. Every combination of candidate values in the grid is scored by cross-validation.

* We then fit the final model with the best **max_depth** and **min_samples_split** found.

* **max_depth** and **min_samples_split** are hyperparameters: we set them ourselves rather than learning them from the data, and we want them to be as close to optimal as possible.

* Instead of relying on intuition to pick these values, we let the grid search select them.

* **max_depth** caps how many levels the tree may grow, which controls how far it branches.

* **min_samples_split** is the minimum number of samples a node must contain before it can be split.
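
To make the two hyperparameters concrete, the sketch below fits one unconstrained and one constrained tree and compares their size through `get_depth()` and `get_n_leaves()`. The data is synthetic (generated with `make_classification` as a stand-in for the diabetes set) and the values 3 and 20 are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's diabetes.csv)
X_demo, y_demo = make_classification(n_samples=500, n_features=8,
                                     random_state=0)

# Defaults: max_depth=None, min_samples_split=2, so the tree keeps splitting
# until leaves are pure or cannot be split further
full_tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Constrained: at most 3 levels, and a node needs at least 20 samples to be split
small_tree = DecisionTreeClassifier(max_depth=3, min_samples_split=20,
                                    random_state=0).fit(X_demo, y_demo)

for name, tree in [("default", full_tree), ("constrained", small_tree)]:
    print(name, "depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```

A smaller tree is easier to read and usually less prone to overfitting, which is why these two settings are worth tuning.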

In [21]:
cart_model.min_samples_split  # default value of min_samples_split

2

In [22]:
print(cart_model.max_depth)  # default value of max_depth

None


In [24]:
from sklearn.model_selection import GridSearchCV

In [25]:
cart_grid = {"max_depth": range(1, 10),
             "min_samples_split": list(range(2, 50))}

In [28]:
cart = DecisionTreeClassifier()
cart_cv = GridSearchCV(cart, 
                       cart_grid, 
                       cv = 10, 
                       n_jobs = -1, verbose = 2)

In [29]:
cart_cv_model = cart_cv.fit(X_train, y_train)

Fitting 10 folds for each of 432 candidates, totalling 4320 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  38 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done 1772 tasks      | elapsed:    6.8s
[Parallel(n_jobs=-1)]: Done 4320 out of 4320 | elapsed:   14.9s finished
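
The figures in the log follow directly from the grid: 9 depths times 48 split sizes gives 432 candidates, and 10 folds for each gives 4320 fits. A quick check in plain Python:

```python
from itertools import product

# Same grid as in the notebook
cart_grid = {"max_depth": range(1, 10),
             "min_samples_split": list(range(2, 50))}
cv_folds = 10

# Every (max_depth, min_samples_split) pair is one candidate
candidates = list(product(cart_grid["max_depth"],
                          cart_grid["min_samples_split"]))
n_candidates = len(candidates)
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)  # 432 4320
```

Because the cost is the product of all value counts times the number of folds, adding a third hyperparameter to the grid multiplies the run time accordingly.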


In [30]:
cart_cv_model.best_params_  # best hyperparameter values found by the grid search

{'max_depth': 5, 'min_samples_split': 19}

### 3.1-) Tuned Model

In [33]:
# Refit on the training set with the best values reported by the grid search
cart = DecisionTreeClassifier(max_depth=5, min_samples_split=19)
cart_tuned = cart.fit(X_train, y_train)
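
Typing 5 and 19 by hand works, but the values can go stale if the search is rerun. As a hedged alternative, `GridSearchCV` uses `refit=True` by default, so `best_estimator_` is already a model refit on the full training data with the winning parameters. The sketch uses a smaller grid and synthetic data so that it runs on its own; the names `X_demo`, `search`, `tuned` and `rebuilt` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's diabetes.csv)
X_demo, y_demo = make_classification(n_samples=400, n_features=8,
                                     random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.30, random_state=42)

# Reduced grid to keep the example fast
grid = {"max_depth": [2, 3, 5], "min_samples_split": [2, 10, 20]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), grid, cv=5)
search.fit(X_tr, y_tr)

# Option 1: reuse the model that GridSearchCV already refit
tuned = search.best_estimator_
# Option 2: build a fresh model from the stored parameters
rebuilt = DecisionTreeClassifier(random_state=42, **search.best_params_)
rebuilt.fit(X_tr, y_tr)

print(search.best_params_)
print(tuned.score(X_te, y_te))
```

In this notebook the equivalent of option 1 would be `cart_cv_model.best_estimator_`.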

In [34]:
y_pred1 = cart_tuned.predict(X_test)
y_pred1[0:10]

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0], dtype=int64)

In [35]:
accuracy_score(y_test, y_pred1)  # after model tuning

0.7532467532467533

In [36]:
confusion_matrix(y_test, y_pred1)  # after model tuning

array([[128,  23],
       [ 34,  46]], dtype=int64)

In [37]:
print(classification_report(y_test, y_pred1))  # after model tuning

              precision    recall  f1-score   support

           0       0.79      0.85      0.82       151
           1       0.67      0.57      0.62        80

    accuracy                           0.75       231
   macro avg       0.73      0.71      0.72       231
weighted avg       0.75      0.75      0.75       231
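
Accuracy rose from about 0.71 to about 0.75, but the two confusion matrices show that the gain is not uniform across classes. The short sketch below compares the error counts side by side; the helper name `summary` is made up for illustration.

```python
# Confusion matrices from the notebook: rows = true class, columns = predicted class
before = [[111, 40], [28, 52]]
after = [[128, 23], [34, 46]]

def summary(cm):
    """Return false positives, false negatives and recall for class 1."""
    fp = cm[0][1]  # true class 0 predicted as 1
    fn = cm[1][0]  # true class 1 predicted as 0
    recall_1 = cm[1][1] / (cm[1][0] + cm[1][1])
    return fp, fn, recall_1

for name, cm in [("before tuning", before), ("after tuning", after)]:
    fp, fn, recall_1 = summary(cm)
    print(f"{name}: false positives={fp}, false negatives={fn}, "
          f"recall(1)={recall_1:.3f}")
```

The tuned tree makes 17 fewer false positives but 6 more false negatives, so recall for class 1 falls from 0.65 to 0.575. Whether that trade is acceptable depends on how costly a missed class-1 case is for the application.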

