<h1>Decision Trees</h1>


A decision tree is a diagram that represents the possible outcomes of a series of interconnected choices.

It lets you weigh candidate actions against one another according to their cost, their probability, and their benefits.

It can also be used to build an algorithm that determines the best choice mathematically.

<img src="images/decisionTree.png">
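To make the idea of weighing actions by cost, probability, and benefit concrete, here is a minimal sketch of an expected-value comparison. The action names and all figures are hypothetical and chosen only for illustration; they do not come from the Titanic data used in the rest of this notebook.

```python
# Hypothetical decision: every number below is made up for illustration.
# Each action has an upfront cost and a list of (probability, benefit) outcomes.
actions = {
    'launch_product': {'cost': 50.0, 'outcomes': [(0.6, 200.0), (0.4, 0.0)]},
    'do_nothing': {'cost': 0.0, 'outcomes': [(1.0, 40.0)]},
}

def expected_value(action):
    """Probability-weighted benefit minus the cost of taking the action."""
    gain = sum(p * benefit for p, benefit in action['outcomes'])
    return gain - action['cost']

# Score every action, then keep the one with the highest expected value.
scores = {name: expected_value(a) for name, a in actions.items()}
best_action = max(scores, key=scores.get)
print(scores, best_action)
```

Each branch of the tree contributes its benefit weighted by its probability, and the action with the highest expected value is retained.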

In [1]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


In [2]:
# Load the data
dataset = pd.read_csv('datasets/titanic.csv')
dataset.head()

Unnamed: 0,PassengerId,Name,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Survived
0,1,"Braund, Mr. Owen Harris",3,male,22.0,1,0,A/5 21171,7.25,,S,0
1,2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",1,female,38.0,1,0,PC 17599,71.2833,C85,C,1
2,3,"Heikkinen, Miss. Laina",3,female,26.0,0,0,STON/O2. 3101282,7.925,,S,1
3,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",1,female,35.0,1,0,113803,53.1,C123,S,1
4,5,"Allen, Mr. William Henry",3,male,35.0,0,0,373450,8.05,,S,0


In [3]:
# First pass: drop the columns we will not use as features
dataset.drop(['PassengerId','Name','SibSp','Parch','Ticket','Cabin','Embarked','Fare'],axis='columns',inplace=True)

In [4]:
# Separate the features from the target (Survived)
Features = dataset.drop('Survived',axis='columns')
y = dataset.Survived
Features.head()
#Features.shape

Unnamed: 0,Pclass,Sex,Age
0,3,male,22.0
1,1,female,38.0
2,3,female,26.0
3,1,female,35.0
4,3,male,35.0


In [5]:
# Adjust the features: encode Sex as integers and fill missing ages with the mean age
Features.Sex = Features.Sex.map({'male': 1, 'female': 2})
Features.Age = Features.Age.fillna(Features.Age.mean())
Features.head()

Unnamed: 0,Pclass,Sex,Age
0,3,1,22.0
1,1,2,38.0
2,3,2,26.0
3,1,2,35.0
4,3,1,35.0


<h3>Building the decision tree model</h3>


In [6]:
# Create the training and test sets
# (no random_state is passed to train_test_split, so the split and the scores vary between runs)
X_train, X_test, y_train, y_test = train_test_split(Features,y,test_size=0.2)

# Fit a tree limited to a depth of 6, using Gini impurity as the split criterion
modelTree = tree.DecisionTreeClassifier(random_state=0, criterion='gini',max_depth=6 )
modelTree.fit(X_train,y_train)
accuracyTreeReel = modelTree.score(X_test,y_test)
accuracyTreeTrain = modelTree.score(X_train,y_train)

print('Tree accuracy on X_test: ', accuracyTreeReel)
print('Tree accuracy on X_train: ', accuracyTreeTrain)



Tree accuracy on X_test:  0.8044692737430168
Tree accuracy on X_train:  0.8356741573033708
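The tree above is built with `criterion='gini'`. As a rough illustration of what that criterion measures, the sketch below computes Gini impurity by hand on the five `Survived` values shown by `dataset.head()`, before and after splitting them on `Sex`. The helper names `gini_impurity` and `split_impurity` are hypothetical and are not part of scikit-learn.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Survived values of the five passengers displayed by dataset.head()
survived = [0, 1, 1, 1, 0]

# The same values grouped by Sex: rows 0 and 4 are male, rows 1 to 3 are female
males = [0, 0]
females = [1, 1, 1]

parent = gini_impurity(survived)            # mixed node: impurity of about 0.48
after_split = split_impurity(males, females)  # both children are pure
print(parent, after_split)
```

On these five rows, splitting on `Sex` brings the impurity down to zero. At each node the classifier looks for a split that lowers impurity in this sense, which on so few rows says nothing about the full dataset.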


<h3>For fun, let's compare with a logistic regression</h3>

In [10]:
# Fit a logistic regression on the same split for comparison
modelRl = LogisticRegression(random_state = 0, solver='newton-cg')
modelRl.fit(X_train,y_train)
accuracyRlReel = modelRl.score(X_test,y_test)
accuracyRlTrain = modelRl.score(X_train,y_train)

print('Logistic regression accuracy on X_test: ', accuracyRlReel)
print('Logistic regression accuracy on X_train: ', accuracyRlTrain)

Logistic regression accuracy on X_test:  0.7821229050279329
Logistic regression accuracy on X_train:  0.7907303370786517


<h3>Generating an image of our tree</h3>

<img src='images/dtree2.png'>
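The notebook does not show the code that produced this image. As one possible way to inspect a fitted tree, the sketch below prints its rules as text with `tree.export_text`. The six-row toy dataset is hypothetical (it only reuses the `Pclass`, `Sex`, `Age` layout), so the result is not the tree pictured above.

```python
from sklearn import tree

# Toy data (hypothetical values): columns are Pclass, Sex, Age
X = [[3, 1, 22.0], [1, 2, 38.0], [3, 2, 26.0],
     [1, 2, 35.0], [3, 1, 35.0], [1, 1, 54.0]]
y = [0, 1, 1, 1, 0, 0]

toy_tree = tree.DecisionTreeClassifier(random_state=0, criterion='gini', max_depth=6)
toy_tree.fit(X, y)

# Text rendering of the learned rules, one line per node
rules = tree.export_text(toy_tree, feature_names=['Pclass', 'Sex', 'Age'])
print(rules)
```

For a graphical rendering, scikit-learn also provides `tree.plot_tree` (drawn with matplotlib) and `tree.export_graphviz` (which needs Graphviz to render the result).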


<h3>New prediction</h3>

In [23]:
# Test prediction for a passenger with Pclass=1, Sex=1 (male), Age=54
modelTree.predict([[1,1,54]])

array([0], dtype=int64)
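The model was fitted on a DataFrame, so recent scikit-learn versions may warn about missing feature names when `predict` receives a plain list. A minimal sketch of one way to avoid this is to pass a one-row DataFrame with the same column names, and to use `predict_proba` to see the class probabilities behind the predicted label. The training rows and the names `toy_model` and `passenger` are hypothetical stand-ins for the notebook's `modelTree` and data.

```python
import pandas as pd
from sklearn import tree

# Toy training data (hypothetical values) with the same columns as Features
train = pd.DataFrame({
    'Pclass': [3, 1, 3, 1, 3, 1],
    'Sex': [1, 2, 2, 2, 1, 1],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0, 54.0],
})
target = [0, 1, 1, 1, 0, 0]

toy_model = tree.DecisionTreeClassifier(random_state=0, max_depth=6)
toy_model.fit(train, target)

# One passenger described with named columns, in the same order as the training data
passenger = pd.DataFrame([{'Pclass': 1, 'Sex': 1, 'Age': 54.0}])

prediction = toy_model.predict(passenger)           # predicted class label
probabilities = toy_model.predict_proba(passenger)  # one column per class
print(prediction, probabilities)
```

Naming the columns also guards against passing the values in the wrong order, which a bare list such as `[[1,1,54]]` cannot detect.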