# Decision Tree

![image.png](attachment:image.png)

Decision trees are an important class of algorithms for predictive modeling in machine learning.

The classical decision tree algorithms have been around for decades, and modern variations such as random forest are among the most powerful techniques available.

Classification and Regression Trees, or CART for short, is a term introduced by Leo Breiman to refer to decision tree algorithms that can be used for classification or regression predictive modeling problems.

Classically, these algorithms are referred to as "decision trees", but on some platforms such as R they are referred to by the more modern term CART.

The CART algorithm provides a foundation for important algorithms like bagged decision trees, random forest and boosted decision trees.


# CART Model Representation

The representation for the CART model is a binary tree.

This is the same binary tree you know from algorithms and data structures, nothing too fancy. Each internal node, including the root, represents a single input variable (x) and a split point on that variable (assuming the variable is numeric).

The leaf nodes of the tree contain an output variable (y) which is used to make a prediction.

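Given a new input, the tree is traversed starting at the root node: at each node, the input's value for that node's variable is compared with the split point to choose a branch, until a leaf node is reached and its output is returned as the prediction.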
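The traversal described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library stores its trees: the `Node` class, the `predict` function, the convention that values less than or equal to the split point go left, and the hand-built example tree are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class Node:
    """One node of a binary decision tree (illustrative structure)."""
    feature: Optional[int] = None      # index of the input variable tested here
    threshold: Optional[float] = None  # split point on that variable
    left: Optional["Node"] = None      # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None     # branch taken when x[feature] > threshold
    value: Union[str, float, None] = None  # output stored at a leaf

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: Sequence[float]) -> Union[str, float, None]:
    """Walk from the root to a leaf and return the leaf's stored output."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Hand-built tree with made-up split points:
# first split on feature 0, then (on the left branch) on feature 1.
tree = Node(
    feature=0, threshold=180.0,
    left=Node(feature=1, threshold=80.0,
              left=Node(value="small"), right=Node(value="medium")),
    right=Node(value="large"),
)

print(predict(tree, [170.0, 60.0]))  # small
print(predict(tree, [170.0, 95.0]))  # medium
print(predict(tree, [190.0, 70.0]))  # large
```

For a regression tree the same traversal applies; the leaf would simply store a number instead of a class label.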

# Advantages

Simple to understand and to interpret. Trees can be visualised.

Requires little data preparation.

Able to handle both numerical and categorical data.

Possible to validate a model using statistical tests.

Performs well even if its assumptions are somewhat violated by the true model from which the data were generated.


# Disadvantages

Overfitting. Mechanisms such as pruning (available in scikit-learn 0.22 and later through the ccp_alpha parameter), setting the minimum number of samples required at a leaf node, or setting the maximum depth of the tree are necessary to avoid this problem.

Decision trees can be unstable, because small variations in the data can produce a completely different tree. Mitigation: use decision trees within an ensemble.

Greedy training cannot guarantee to return the globally optimal decision tree. Mitigation: train multiple trees in an ensemble learner.

Decision tree learners create biased trees if some classes dominate. Recommendation: balance the dataset prior to fitting.
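One common way to balance a dataset without discarding samples is to give each class the same total weight. The sketch below computes per-class weights as n_samples / (n_classes * class_count); the helper name `balanced_class_weights` and the toy labels are assumptions made for illustration. The resulting per-sample weights could then be passed to a tree learner that accepts sample weights.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels: class "a" dominates 8 to 2.
y = ["a"] * 8 + ["b"] * 2
weights = balanced_class_weights(y)
sample_weight = [weights[label] for label in y]

print(weights)             # {'a': 0.625, 'b': 2.5}
print(sum(sample_weight))  # 10.0
```

With these weights both classes contribute the same total weight (5.0 each), so the dominant class no longer drives the impurity calculations on its own.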

# Decision Tree Parameters

### criterion: 
The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain.
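As a rough illustration of what these two criteria measure, the sketch below computes both impurities from a list of class labels. The function names `gini` and `entropy` are chosen for this example, and entropy is measured in bits (log base 2) here as an assumption.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


mixed = ["yes", "yes", "no", "no"]   # evenly mixed node
pure = ["yes", "yes", "yes", "yes"]  # pure node

print(gini(mixed), entropy(mixed))  # 0.5 1.0
print(gini(pure))                   # 0.0
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why a split that lowers them is considered a good split.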

### splitter:
The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

### max_depth:
The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

### min_samples_split: 
The minimum number of samples required to split an internal node.

### min_samples_leaf: 
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
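The effect of this constraint on the candidate split points can be sketched as follows. The helper `valid_split_points` is hypothetical, and placing each threshold at the midpoint between neighbouring distinct values is an assumption made for the example.

```python
def valid_split_points(values, min_samples_leaf=1):
    """Return midpoints between neighbouring distinct values that leave at
    least min_samples_leaf samples in both the left and right branch."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # equal values cannot be separated by a threshold
        n_left, n_right = i, n - i
        if n_left >= min_samples_leaf and n_right >= min_samples_leaf:
            points.append((xs[i - 1] + xs[i]) / 2)
    return points


feature = [1, 2, 3, 4, 5, 6]
print(valid_split_points(feature, min_samples_leaf=1))  # [1.5, 2.5, 3.5, 4.5, 5.5]
print(valid_split_points(feature, min_samples_leaf=3))  # [3.5]
print(valid_split_points(feature, min_samples_leaf=4))  # []
```

Raising min_samples_leaf removes the splits that would isolate only a few samples, which is where the smoothing effect comes from.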

### min_weight_fraction_leaf: 
The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

### max_features:
The number of features to consider when looking for the best split.

### max_leaf_nodes: 
Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.

### min_impurity_decrease: 
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
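The quantity compared against this threshold is a weighted impurity decrease. The sketch below follows the formula given in the scikit-learn documentation, N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity); the function name, argument names, and the example numbers are assumptions made for illustration.

```python
def weighted_impurity_decrease(n_total, n_node, n_left, n_right,
                               impurity, left_impurity, right_impurity):
    """Impurity decrease of a split, weighted by the share of all samples
    that reach the node being split."""
    return (n_node / n_total) * (
        impurity
        - (n_right / n_node) * right_impurity
        - (n_left / n_node) * left_impurity
    )


# Made-up numbers: 40 of 100 samples reach the node, split 10 / 30.
decrease = weighted_impurity_decrease(
    n_total=100, n_node=40, n_left=10, n_right=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)
min_impurity_decrease = 0.05  # example threshold

print(round(decrease, 4))                 # 0.08
print(decrease >= min_impurity_decrease)  # True
```

Because the decrease is scaled by n_node / n_total, the same local improvement counts for less deep in the tree, where few samples reach the node.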

### min_impurity_split: 
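Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold; otherwise it is a leaf. Note that scikit-learn deprecated this parameter in version 0.19 in favor of min_impurity_decrease and removed it in version 1.0.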