## Build a single Decision Tree model

In [None]:
# # Import model.
# from sklearn.tree import DecisionTreeClassifier

In [None]:
# # Instantiate model with random_state = 42.
# dt = DecisionTreeClassifier(random_state = 42)

In [None]:
# # Fit model.
# dt.fit(X_train, y_train)

In [None]:
# # Evaluate model.
# print(f'Score on training set: {dt.score(X_train, y_train)}')
# print(f'Score on testing set: {dt.score(X_test, y_test)}')

Decision trees tend to overfit. To address this problem, we can:
- As with all models, try to gather more data.
- As with all models, remove some features.
- Stop the tree from growing too large, for example by limiting its depth.
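
To see the overfitting concretely, here is a minimal sketch on synthetic data. The dataset from `make_classification` and the names `X_demo`, `y_demo`, `X_tr`, and `X_te` are assumptions for illustration; they are not the data used in this lesson.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data (an assumption for illustration only).
X_demo, y_demo = make_classification(n_samples=1000, n_features=20,
                                     n_informative=5, flip_y=0.1,
                                     random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

# An unconstrained tree keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

train_score = full_tree.score(X_tr, y_tr)
test_score = full_tree.score(X_te, y_te)
print(f'Score on training set: {train_score}')
print(f'Score on testing set: {test_score}')
```

On data like this, the training score is typically perfect while the testing score is lower. That gap is the usual sign of an overfit tree.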

### Tuning Hyperparameters of Decision Trees
There are four hyperparameters of decision trees that we may commonly tune in order to prevent overfitting.

- `max_depth`: The maximum depth of the tree.
    - By default, the nodes are expanded until all leaves are pure (or some other argument limits the growth of the tree).
    - In the 20 questions analogy, this is like asking, "How many questions can we ask?"
    
    
- `min_samples_split`: The minimum number of samples required to split an internal node.
    - By default, the minimum number of samples required to split is 2. That is, if there are two or more observations in a node and if we haven't already achieved maximum purity, we can split it!
    
    
- `min_samples_leaf`: The minimum number of samples required to be in a leaf node (a terminal node at the end of the tree).
    - By default, the minimum number of samples required in a leaf node is 1. (This should ring alarm bells - it's very possible that we'll overfit our model to the data!)


- `ccp_alpha`: A [complexity parameter](https://scikit-learn.org/stable/modules/tree.html#minimal-cost-complexity-pruning) similar to $\alpha$ in regularization. As `ccp_alpha` increases, we regularize more.
    - By default, this value is 0.

[Source: Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).
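
As a rough illustration of how these settings constrain a tree, the sketch below fits a default tree and a constrained one, then compares their size. The synthetic data from `make_classification` is an assumption that stands in for the lesson's data, and the specific hyperparameter values are only examples.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, for illustration only.
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     n_informative=4, flip_y=0.1,
                                     random_state=42)

# Default settings: grow until the leaves are pure.
default_tree = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Constrained tree: each hyperparameter limits growth in a different way.
small_tree = DecisionTreeClassifier(max_depth=5,
                                    min_samples_split=7,
                                    min_samples_leaf=3,
                                    ccp_alpha=0.01,
                                    random_state=42).fit(X_demo, y_demo)

for name, tree in [('default', default_tree), ('constrained', small_tree)]:
    # Leaves are the nodes without children (children_left == -1).
    is_leaf = tree.tree_.children_left == -1
    smallest_leaf = tree.tree_.n_node_samples[is_leaf].min()
    print(f'{name}: depth = {tree.get_depth()}, '
          f'leaves = {tree.get_n_leaves()}, '
          f'smallest leaf = {smallest_leaf} samples')
```

The constrained tree can never be deeper than `max_depth`, and none of its leaves can hold fewer than `min_samples_leaf` samples.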

In [None]:
# # Instantiate model with:
# # - a maximum depth of 5.
# # - at least 7 samples required in order to split an internal node.
# # - at least 3 samples in each leaf node.
# # - a cost complexity of 0.01.
# # - random state of 42.

# dt = DecisionTreeClassifier(max_depth = 5,
#                             min_samples_split = 7,
#                             min_samples_leaf = 3,
#                             ccp_alpha = 0.01,
#                             random_state = 42)

## Use GridSearch to tune hyperparameters and find a better model

In [2]:
# from sklearn.model_selection import GridSearchCV

In [3]:
# grid = GridSearchCV(estimator = DecisionTreeClassifier(random_state = 42),
#                     param_grid = {'max_depth': [2, 3, 5, 7],
#                                   'min_samples_split': [5, 10, 15, 20],
#                                   'min_samples_leaf': [2, 3, 4, 5, 6],
#                                   'ccp_alpha': [0, 0.001, 0.01, 0.1, 1, 10]},
#                     cv = 5,
#                     verbose = 1)
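
Before running a search like this, it can help to estimate how much work it involves. The sketch below counts the combinations in the grid above using `ParameterGrid`; the variable names `n_combinations` and `n_fits` are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'max_depth': [2, 3, 5, 7],
              'min_samples_split': [5, 10, 15, 20],
              'min_samples_leaf': [2, 3, 4, 5, 6],
              'ccp_alpha': [0, 0.001, 0.01, 0.1, 1, 10]}

n_combinations = len(ParameterGrid(param_grid))  # 4 * 4 * 5 * 6
n_fits = n_combinations * 5                      # cv = 5 folds per combination

print(f'{n_combinations} combinations, {n_fits} model fits')
# 480 combinations, 2400 model fits
```

With the default `refit = True`, `GridSearchCV` also fits the best combination once more on the full training data after the search.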

In [4]:
# import time

# # Start our timer.
# t0 = time.time()

# # Let's GridSearch over the above parameters on our training data.
# grid.fit(X_train, y_train)

# # Stop our timer and print the result.
# print(f'GridSearch took {time.time() - t0:.1f} seconds.')
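
Once the search has been fit, the results can be inspected through the `best_params_` and `best_score_` attributes. The sketch below is self-contained, so it uses synthetic data and a deliberately small grid; both are assumptions for illustration rather than the lesson's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, for illustration only.
X_demo, y_demo = make_classification(n_samples=400, n_features=10,
                                     n_informative=4, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

# A deliberately small grid so the sketch runs quickly.
demo_grid = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42),
                         param_grid={'max_depth': [2, 3, 5],
                                     'min_samples_leaf': [2, 4],
                                     'ccp_alpha': [0.0, 0.01]},
                         cv=5)
demo_grid.fit(X_tr, y_tr)

print(f'Best parameters: {demo_grid.best_params_}')
print(f'Best cross-validated score: {demo_grid.best_score_}')

# With the default refit=True, score() uses the best model refit on X_tr.
print(f'Score on training set: {demo_grid.score(X_tr, y_tr)}')
print(f'Score on testing set: {demo_grid.score(X_te, y_te)}')
```

Comparing the training and testing scores of the tuned model against those of the single default tree shows whether tuning narrowed the overfitting gap.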