## Decision Tree
Decision trees are supervised learning models used for both classification and regression. They work by recursively splitting the input space into smaller regions based on the values of the input features. At each node, the algorithm chooses the feature and threshold that most reduce impurity in the resulting child nodes. Information gain (the reduction in entropy) is one such measure. Gini impurity, which is scikit-learn's default for classification, is another, and regression trees typically choose the split that most reduces the squared error of the target variable.
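
To make the idea of information gain concrete, here is a minimal sketch that computes it by hand for one candidate split. The function names `entropy` and `information_gain` and the "yes"/"no" labels are illustrative assumptions, not part of scikit-learn.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical labels at a node: 4 buyers and 4 non-buyers.
parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the two classes completely.
perfect = information_gain(parent, ["yes"] * 4, ["no"] * 4)

# A split whose children have the same class mix as the parent.
useless = information_gain(parent, ["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"])

print(perfect, useless)
```

The first split removes all uncertainty about the class, so its gain equals the parent's entropy. The second leaves the class mix unchanged, so its gain is zero and a tree learner would not prefer it.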

For example, if we want to predict whether a person will buy a product based on their age and income, a decision tree might split the data into regions defined by age and income ranges. Each region is then assigned the majority class of the training samples that fall into it (for classification) or the mean of their target values (for regression). To make a prediction for a new input, the tree is traversed from the root, following at each node the branch that matches the input's feature values, until a leaf is reached.
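
The traversal described above can be sketched with a small hand-written tree. The thresholds (age 30, income 50,000), the dictionary layout, and the `predict` helper are all made up for illustration; a real tree would learn its splits from data.

```python
# A hand-written tree with made-up thresholds, purely for illustration.
# Internal nodes test one feature against a threshold; leaves hold a label.
tree = {
    "feature": "age", "threshold": 30,
    "left": {"label": "no"},                 # age <= 30
    "right": {                               # age > 30
        "feature": "income", "threshold": 50_000,
        "left": {"label": "no"},             # income <= 50,000
        "right": {"label": "yes"},           # income > 50,000
    },
}

def predict(node, sample):
    """Follow branches from the root until a leaf is reached."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(predict(tree, {"age": 42, "income": 65_000}))
print(predict(tree, {"age": 25, "income": 90_000}))
```

Each prediction only visits the nodes along one root-to-leaf path, which is why prediction cost grows with the depth of the tree rather than with its total size.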

Pruning is a technique used in decision tree learning to reduce the complexity of a decision tree by removing branches that provide little or no additional information. The goal of pruning is to create a simpler and more generalizable decision tree that is less prone to overfitting the training data.

Pruning can be done in two ways: pre-pruning and post-pruning.

Pre-pruning: In pre-pruning, tree growth is stopped early. A node is not split further, and is declared a leaf node, once a stopping rule is met: for example, the tree has reached a maximum depth, the node contains fewer than a minimum number of instances, or the information gain of the best available split is less than a predefined threshold.

Post-pruning: In post-pruning, the decision tree is first grown to its maximum size, and then the branches that contribute little to its accuracy are removed. Which branches to remove can be decided by evaluating the tree on a validation dataset, by statistical tests such as the chi-squared test, or by cost complexity pruning.
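
As a sketch of pre-pruning in practice, scikit-learn's `DecisionTreeClassifier` exposes stopping rules as constructor arguments. The specific values below (`max_depth=3`, `min_samples_split=10`, `min_impurity_decrease=0.01`) are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: keeps splitting until no further split is possible.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: each argument is a stopping rule checked while growing.
pre_pruned = DecisionTreeClassifier(
    max_depth=3,                 # never grow deeper than 3 levels
    min_samples_split=10,        # do not split nodes with fewer than 10 samples
    min_impurity_decrease=0.01,  # require a weighted impurity decrease of at least 0.01
    random_state=0,
).fit(X, y)

print("full tree:  depth", full.get_depth(), "leaves", full.get_n_leaves())
print("pre-pruned: depth", pre_pruned.get_depth(), "leaves", pre_pruned.get_n_leaves())
```

Comparing the depth and leaf counts of the two trees shows how much the stopping rules constrain growth on this dataset.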

Cost complexity pruning is a popular technique for post-pruning decision trees. It works by adding a regularization parameter, called the complexity parameter, to the cost function that measures the quality of the decision tree. The complexity parameter penalizes decision trees that are too complex by adding a term that is proportional to the number of leaf nodes in the tree. By varying the value of the complexity parameter, we can obtain a sequence of decision trees of different sizes and complexities. The decision tree with the best performance on the validation dataset is chosen as the final pruned decision tree.
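
In scikit-learn's formulation, the penalized cost is `R_alpha(T) = R(T) + alpha * |T|`, where `R(T)` is the total impurity of the leaves, `|T|` is the number of leaves, and `alpha` is the complexity parameter (`ccp_alpha`). The following is a minimal sketch of selecting `alpha` on a validation set; the 70/30 split, the `random_state` values, and the tie-breaking rule are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Candidate alphas at which the fully grown tree would lose another subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score, best_tree = None, -1.0, None
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative values from rounding
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score >= best_score:  # on ties, prefer the larger alpha (smaller tree)
        best_alpha, best_score, best_tree = alpha, score, tree

print("chosen alpha:", best_alpha)
print("validation accuracy:", best_score)
print("leaves in pruned tree:", best_tree.get_n_leaves())
```

With a dataset as small as Iris, a single validation split gives a noisy estimate, so cross-validation over the candidate alphas may be a more reliable way to choose.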

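The worked example below does not apply post-pruning. It only limits the tree with max_depth=3, which is a simple form of pre-pruning. You can read more about pruning in the scikit-learn documentation.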

In [1]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz

In [2]:
# Load the Iris dataset
iris = load_iris()

In [3]:
X, y = iris.data, iris.target

In [6]:
# Limit the tree to depth 3 so it stays small enough to read when plotted
clf_dectree = DecisionTreeClassifier(max_depth=3)

In [7]:
clf_dectree.fit(X, y)

Next, we create a new input with feature values [6.0, 3.0, 4.0, 1.8], which corresponds to an iris sample with a sepal length of 6.0 cm, sepal width of 3.0 cm, petal length of 4.0 cm, and petal width of 1.8 cm. The predict method expects a 2D array with one row per sample, so the single sample is wrapped in an outer list. We then call predict on the trained decision tree classifier and print the predicted class, using the iris.target_names array to map the integer class label to the corresponding class name.

In [8]:
new_input = [[6.0, 3.0, 4.0, 1.8]]
prediction = clf_dectree.predict(new_input)

In [10]:
print('Prediction:', iris.target_names[prediction][0])

Prediction: virginica
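
To see how confident the tree is in such a prediction, one option is to inspect the leaf that the sample lands in. The sketch below refits the same kind of tree so that it is self-contained; `random_state=0` is an added assumption to make the fit reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

new_input = [[6.0, 3.0, 4.0, 1.8]]

# Fraction of training samples of each class in the leaf the sample reaches.
proba = clf.predict_proba(new_input)[0]

# Index of that leaf in the fitted tree structure.
leaf_id = clf.apply(new_input)[0]

print("leaf node:", leaf_id)
for name, p in zip(iris.target_names, proba):
    print(f"{name}: {p:.2f}")
```

The predicted class is the one with the largest fraction in the leaf, so a leaf with mixed classes signals a less certain prediction than a pure one.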


In [11]:
dot_data = export_graphviz(clf_dectree,
                           out_file=None,
                           feature_names=iris.feature_names,
                           class_names=iris.target_names,
                           filled=True,
                           rounded=True,
                           special_characters=True)
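# Rendering needs the Graphviz system binaries on PATH, not just the Python package
graph = graphviz.Source(dot_data)
# Writes the DOT source to 'iris_tree' and the rendered image to 'iris_tree.png'
graph.render('iris_tree', format='png')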

'iris_tree.png'
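
If Graphviz is not available, the learned rules can also be printed as plain text with scikit-learn's `export_text`. This is a minimal sketch that refits the tree so it runs on its own; `random_state=0` is an added assumption for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# One indented line per node: split conditions for internal nodes, class for leaves.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Classes are shown by their integer labels, which can be mapped back to names with iris.target_names.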