## Overview

Decision trees are a type of supervised learning algorithm that can be used for both classification and regression problems. Their advantages are that they are easy to understand and visualize and can handle both numeric and categorical data.

### Decision Tree Basics

A decision tree algorithm is pretty straightforward. We have a data set that we split, or fork, by asking a series of questions. The data is evaluated at each question, or *node*, and then split according to the answer to that question. Each split is called a *branch*, and each branch ends in a node.
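To see the questions a fitted tree actually asks, scikit-learn's `export_text` prints a tree's splits as nested text rules. The sketch below is illustrative only: it fits a deliberately shallow tree (`max_depth=2` and `random_state=0` are choices made for readability and repeatability) on the wine data used later in this unit.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, export_text

wine = load_wine()
# A deliberately shallow tree so the printed rules stay short
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(wine.data, wine.target)

# Each "<=" / ">" pair is one question (node) and its two branches;
# the "class:" lines are the nodes where the tree stops and predicts
rules = export_text(clf, feature_names=list(wine.feature_names))
print(rules)
```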

So how do we decide what to split a node on? Because decision trees can be used for both regression and classification tasks, there are two different methods for choosing a split. For a regression task with a continuous target, we choose the split that minimizes the variance of the target values in the resulting nodes. For a classification task, we use the Gini impurity to measure the "purity" of the resulting nodes. If all the values in a node belong to one class, the node is perfectly pure and its Gini impurity is 0.
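Both criteria fit in a few lines. For a node whose samples fall into classes with proportions `p_k`, the Gini impurity is `1 - sum(p_k ** 2)`. The sketch below computes it alongside the variance used for regression; the helper names `gini_impurity` and `variance` are our own, not library functions. (scikit-learn's regression trees call the equivalent criterion `"squared_error"`.)

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def variance(values):
    """Population variance of the target values in a node (regression criterion)."""
    return float(np.var(values))

print(gini_impurity(["F", "F", "F", "F"]))  # 0.0: pure node
print(gini_impurity(["F", "F", "M", "M"]))  # 0.5: evenly mixed two-class node

# Regression: splitting [1, 1, 1, 9, 9, 9] into [1, 1, 1] and [9, 9, 9]
print(variance([1, 1, 1, 9, 9, 9]))             # 16.0 before the split
print(variance([1, 1, 1]), variance([9, 9, 9]))  # 0.0 0.0 after the split
```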

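A decision tree can have as many layers as needed. Usually a node has two branches, but it can have more (scikit-learn's trees always use two). We stop branching when a split no longer reduces the variance or the Gini impurity, for example because a node already contains a single class, or when a limit such as a maximum depth is reached.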
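Putting the criterion to work, the hypothetical helper `best_split` below tries every midpoint between neighboring values of a single numeric feature and keeps the threshold with the lowest sample-weighted Gini impurity. If no threshold beats the parent node's own impurity, it returns `None`, which mirrors the stopping rule described above. This is a brute-force sketch of the idea, not scikit-learn's optimized implementation.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def best_split(x, y):
    """Hypothetical helper: best threshold on one numeric feature by weighted Gini.

    Returns (threshold, weighted_impurity); threshold is None when no split
    lowers the impurity of the parent node.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_threshold, best_impurity = None, gini_impurity(y)  # start from the parent node
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # equal values cannot be separated by a threshold
        left, right = y[:i], y[i:]
        weighted = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / len(y)
        if weighted < best_impurity:
            best_threshold, best_impurity = float((x[i - 1] + x[i]) / 2), weighted
    return best_threshold, best_impurity

print(best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (3.5, 0.0): a perfect split
print(best_split([1, 2, 3], [1, 1, 1]))                   # (None, 0.0): already pure, no split
```

On the toy data, the threshold 3.5 separates the two classes perfectly, while the already pure node has nothing left to gain, so the search declines to split it.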


## Follow Along

We'll implement a decision tree in scikit-learn with the penguins data from the previous objective. We want to classify each penguin as male or female based on its physical characteristics and species.

```python
# Imports!
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Use the decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Set up the one-hot encoder for the categorical feature
categorical_features = ['species']
categorical_transformer = Pipeline(steps=[('onehot', OneHotEncoder())])

# Set up our preprocessor/column transformer
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features)])

# Add the classifier to the preprocessing pipeline
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', DecisionTreeClassifier())])
```

```python
# Load in the data!
import pandas as pd
import seaborn as sns
from sklearn import preprocessing

penguins = sns.load_dataset("penguins")
penguins.dropna(inplace=True)

# Select features (seaborn's copy of the data names the culmen
# measurements bill_length_mm and bill_depth_mm)
features = ['species', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
X = penguins[features]

# Encode the 'sex' column as 0/1
le = preprocessing.LabelEncoder()
penguins['sex_encode'] = le.fit_transform(penguins['sex'])

# Set target array
y = penguins['sex_encode']

# Separate into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# Fit the pipeline with our decision tree classifier and score it on the test set
pipeline.fit(X_train, y_train)
print("model score: %.3f" % pipeline.score(X_test, y_test))
```

```text
model score: 0.429
```


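It looks like we have a model that performs slightly better than the logistic regression model from earlier! That said, a score of 0.429 is still below the roughly 0.5 you would get by always guessing the same sex. One reason is that the `ColumnTransformer` above lists only `species`, and by default (`remainder='drop'`) it discards every column it is not told to transform, so the tree is trained on the one-hot `species` columns alone.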
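To see the effect of `ColumnTransformer`'s `remainder` parameter, here is a minimal sketch on a tiny made-up DataFrame (the species names and body masses are illustrative values, not rows from the penguins data). With the default `remainder='drop'`, only the one-hot `species` columns come out; with `remainder='passthrough'`, the numeric column is appended after them.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up frame: one categorical column and one numeric column
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    "body_mass_g": [3750.0, 5000.0, 3500.0, 3800.0],
})

# Default remainder='drop': body_mass_g is silently discarded
dropped = ColumnTransformer([("cat", OneHotEncoder(), ["species"])])
# remainder='passthrough': untouched columns are appended after the encoded ones
kept = ColumnTransformer([("cat", OneHotEncoder(), ["species"])], remainder="passthrough")

print(dropped.fit_transform(df).shape)  # (4, 3): one column per species only
print(kept.fit_transform(df).shape)     # (4, 4): species columns + body_mass_g
```

In the lesson's pipeline, the same change would mean passing `remainder='passthrough'` when building `preprocessor`, so the tree can use the numeric measurements as well as the encoded species. Because the train/test split is random, the resulting score will vary from run to run.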

## Challenge

The decision tree classifier has several parameters that can be adjusted; try changing a few of them, such as `max_depth` or `min_samples_leaf`, and see how the score changes. Another extension of this objective would be to change the features that the model is trained on: try removing the encoded categorical columns and training on only the numeric columns.

## Additional Resources

* [Scikit-learn User Guide: Decision Tree](https://scikit-learn.org/stable/modules/tree.html#tree)
* [Scikit-learn: Decision Tree Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)

---

## Overview

When we evaluate a linear model, we look at the coefficient for each feature and use it to analyze that feature's importance to the model. Decision tree models don't have coefficients; instead, we look at the *feature importances* when interpreting the model.

The overall importance of a feature in a decision tree has a relatively simple interpretation. We go through all of the splits in the tree that use the feature and measure how much each one reduced the variance or Gini impurity compared to its parent node, weighted by the number of samples reaching that node. Features responsible for a larger share of the total reduction have a greater importance for the model. Another way to look at feature importance is as a measure of how early and how often a feature is used for the tree's branching decisions.
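To make this concrete, here is a sketch that recomputes scikit-learn's impurity-based importances by hand from a fitted tree's low-level `tree_` arrays (`children_left`, `children_right`, `feature`, `impurity`, `weighted_n_node_samples`). Fitting on the full wine data with `random_state=0` is a choice made only for this example. For each internal node, the code credits the sample-weighted drop in Gini impurity to the feature used at that node, then normalizes so the importances sum to 1; the last line compares the result with `feature_importances_`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)  # default criterion="gini"

t = clf.tree_
w = t.weighted_n_node_samples
importances = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:
        continue  # leaf node: no split, so no impurity reduction to credit
    # Sample-weighted drop in impurity from this node to its two children
    reduction = w[node] * t.impurity[node] - w[left] * t.impurity[left] - w[right] * t.impurity[right]
    importances[t.feature[node]] += reduction

importances /= importances.sum()  # normalize so the importances sum to 1
print(np.allclose(importances, clf.feature_importances_))  # True
```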

Like most predictive modeling tools and techniques, feature importances are useful but have trade-offs, make assumptions, and can be misinterpreted. We'll continue to discuss these issues throughout this lesson and the unit.

## Follow Along

In this section, we'll implement a decision tree model on a new data set: the results of a chemical analysis of wines grown in the same region of Italy by three different cultivators. This data is available in the [scikit-learn dataset library](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html).

The wine dataset is a classic and very easy multi-class classification dataset.

* Classes: 3, with `[59, 71, 48]` samples per class
* Samples total: 178
* Dimensionality: 13 features
* Features: real, positive
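These summary numbers are easy to confirm from the loaded arrays. A quick check, loading the data as plain NumPy arrays with `return_X_y=True`:

```python
import numpy as np
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
print(X.shape)         # (178, 13): 178 samples, 13 features
print(np.bincount(y))  # [59 71 48]: samples in classes 0, 1, 2
```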

The goal is to classify each wine into one of the three classes using its characteristic features, such as alcohol content, flavanoids, and hue.

```python
# Import libraries and data sets
from sklearn.datasets import load_wine
import pandas as pd

# Load the data and convert to a DataFrame
data = load_wine()
df_wine = pd.DataFrame(data.data, columns=data.feature_names)
df_wine['target'] = pd.Series(data.target)

print(df_wine.shape)
df_wine.head()
```

```text
(178, 14)
```

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 | 0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 | 0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0 |


Here we have 13 features and one target column. All of the features are numeric, so we don't need to worry about categorical encoding for this example. First, we need to create our feature matrix and target array.

```python
# Separate into features and target
X = df_wine.drop('target', axis=1)
y = df_wine['target']

# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
```

```python
# Use the decision tree classifier
from sklearn.tree import DecisionTreeClassifier

# Instantiate the classifier
classifier = DecisionTreeClassifier()

# Train the model using the training set
classifier.fit(X_train, y_train)

# Find the model score on the test set
print("Decision tree model score: %.3f" % classifier.score(X_test, y_test))
```

```text
Decision tree model score: 0.978
```


We fit a decision tree model! The results look good: the model seems able to predict the class of wine quite well from those 13 characteristics. Now, let's look at the feature importances. We do this by plotting each feature's contribution to the model on a bar chart. In scikit-learn, the values in `feature_importances_` are normalized so that they sum to 1, so each feature's value is its share of the total.

```python
# Plot the feature importances
import matplotlib.pyplot as plt

importances = pd.Series(classifier.feature_importances_, X.columns)

# Plot the top n feature importances (n = 13 shows all of them)
n = 13
plt.figure(figsize=(10, n/2))
plt.title(f'Top {n} features')
importances.sort_values()[-n:].plot.barh()

# Display the figure
plt.show()
```

![mod1_obj4_tree_wine.png](https://raw.githubusercontent.com/LambdaSchool/data-science-canvas-images/main/unit_2/sprint_2/mod1_obj4_tree_wine.png)

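For our model, it looks like the top three features contribute the most to the model, by a significant margin. Because neither `train_test_split` nor the tree was given a `random_state`, rerunning the notebook can change which features come out on top.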
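To put a number on how much the top features contribute, the sketch below refits the wine model and sums the three largest importances. The `random_state=42` values are an assumption added so the result is repeatable; with a different split, your numbers will differ from the plot above.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# random_state values are assumptions, added only to make the output repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
classifier = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

importances = pd.Series(classifier.feature_importances_, index=X.columns)
top3_share = importances.nlargest(3).sum()

print(importances.sum())        # 1 (up to floating-point rounding)
print(importances.nlargest(3))  # the three most important features
print(f"Top three features' share of total importance: {top3_share:.2f}")
```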

## Challenge

For the above model, we used the default parameters. Using the scikit-learn documentation, explore some of the other parameters, then run the model again with different values. A few to try are `criterion` (the measure used to choose splits, such as `"gini"` or `"entropy"`) and `max_depth` (the maximum number of levels in the tree).
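One way to approach this challenge is a small grid over the two suggested parameters. In the sketch below, the candidate values and `random_state=42` are arbitrary choices for illustration; `get_depth()` reports how deep each fitted tree actually grew.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

scores = {}
for criterion in ["gini", "entropy"]:
    # max_depth=None lets the tree grow until leaves are pure (or too small to split)
    for max_depth in [2, 3, None]:
        clf = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42)
        clf.fit(X_train, y_train)
        scores[(criterion, max_depth)] = clf.score(X_test, y_test)
        print(f"criterion={criterion!r:10} max_depth={max_depth!s:5} "
              f"depth={clf.get_depth()} score={scores[(criterion, max_depth)]:.3f}")
```

With a single test set of about 45 samples, small differences between scores may just be noise; cross-validation gives a steadier comparison.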

## Additional Resources

* [Scikit-learn User Guide: Decision Tree](https://scikit-learn.org/stable/modules/tree.html#tree)
* [Scikit-learn: Decision Tree Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)