# Decision Trees in sklearn

## Hyperparameters

When we define the model, we can specify its hyperparameters. In practice, the most common ones are:

* `max_depth`: The maximum number of levels in the tree.
* `min_samples_leaf`: The minimum number of samples allowed in a leaf.
* `min_samples_split`: The minimum number of samples required to split an internal node.
* `max_features`: The number of features to consider when looking for the best split.
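As a rough illustration of the four hyperparameters listed above, the sketch below sets all of them on one model and then reads back the fitted tree's size. The synthetic dataset from `make_classification`, the names `X_demo` and `y_demo`, and the specific values are assumptions chosen for the example, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Assumption: a small synthetic dataset with 4 features, used only for illustration.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

model = DecisionTreeClassifier(
    max_depth=3,           # at most 3 levels of splits below the root
    min_samples_leaf=5,    # every leaf must keep at least 5 training samples
    min_samples_split=20,  # a node with fewer than 20 samples is not split
    max_features=2,        # consider 2 of the 4 features at each split
    random_state=0,        # makes the random feature selection repeatable
)
model.fit(X_demo, y_demo)

# Inspect the tree that was actually grown under these limits.
print("depth:", model.get_depth())
print("leaves:", model.get_n_leaves())
print("features considered per split:", model.max_features_)
```

Since `max_depth` is 3, the fitted tree can have at most 2³ = 8 leaves, whatever the data look like.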

For example, here we define a model where the maximum depth of the tree, `max_depth`, is 7, and the minimum number of samples in each leaf, `min_samples_leaf`, is 10.

`>>> model = DecisionTreeClassifier(max_depth=7, min_samples_leaf=10)`
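To check that these two limits hold after fitting, one option is to inspect the fitted tree directly. The sketch below is a minimal example on synthetic data (an assumption, as are the names `X_demo`, `y_demo`, and `leaf_sizes`). It relies on the `tree_` attribute, where leaf nodes are marked by a child index of `-1`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic two-feature data, used only to have something to fit.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

model = DecisionTreeClassifier(max_depth=7, min_samples_leaf=10, random_state=0)
model.fit(X_demo, y_demo)

# Leaves are the nodes without children (child index -1 in the tree arrays).
fitted = model.tree_
is_leaf = fitted.children_left == -1
leaf_sizes = fitted.n_node_samples[is_leaf]

print("depth:", model.get_depth())            # never more than 7
print("smallest leaf:", leaf_sizes.min())     # never fewer than 10 samples
```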

```python
# Import statements
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

# Read the data.
data = np.asarray(pd.read_csv('data.csv', header=None))
# Assign the features to the variable X, and the labels to the variable y.
X = data[:,0:2]
y = data[:,2]

# Create the decision tree model and assign it to the variable model.
# Play with hyperparameters such as max_depth and min_samples_leaf
# and see what they do to the decision boundary.
model = DecisionTreeClassifier(max_depth=7, min_samples_leaf=1)

# Fit the model.
model.fit(X, y)

# Make predictions. Store them in the variable y_pred.
y_pred = model.predict(X)

# Calculate the accuracy and assign it to the variable acc.
acc = accuracy_score(y, y_pred)
```
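One way to experiment with the hyperparameters is to refit the model for several values of `max_depth` and compare the resulting accuracy. The sketch below does this on synthetic data standing in for `data.csv` (an assumption, as are the names `X_demo`, `y_demo`, and `train_acc`). As in the code above, accuracy is measured on the same data the model was fit on.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic two-feature data standing in for data.csv.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

train_acc = {}
for depth in (1, 3, 7, None):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=1, random_state=0)
    tree.fit(X_demo, y_demo)
    # Training accuracy: predictions are made on the data used for fitting.
    train_acc[depth] = accuracy_score(y_demo, tree.predict(X_demo))
    print(f"max_depth={depth}: training accuracy = {train_acc[depth]:.3f}")
```

With no depth limit and `min_samples_leaf=1`, the tree can separate every distinct training point, so its training accuracy reaches 1.0. That number says how closely the tree fits the training data, not how well it would do on new data.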