# What is a decision tree?

A decision tree learns a set of rules from data that agree with most of the data. For example, when you woke up today, you had to make breakfast: *if you had eggs, you could make an omelette; otherwise, you would make simple toast*. This is exactly the kind of decision a decision tree makes after looking at your data.

Given your input variables, the decision tree finds the most informative one (using the [Gini Index](https://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity) or [Information Gain](https://medium.com/geekculture/do-you-know-what-information-gain-is-5ac15d9cf7f9?source=friends_link&sk=61698a055a19e0d95ddcbcb13a2ef226)) and creates a condition on it that splits the data. It then repeats the same step on each resulting subset until no further useful split is possible (for example, when every example in a subset has the same label), and at that point we get a decision.
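To make the idea of "most informative variable" concrete, here is a minimal sketch of comparing two candidate splits by Gini impurity. The toy arrays and the helper names `gini` and `split_impurity` are made up for illustration; this is not scikit-learn's actual implementation, which also searches over thresholds.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_k^2)."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))

def split_impurity(feature, labels, threshold):
    """Weighted Gini impurity after splitting on `feature <= threshold`."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = labels.size
    return (left.size / n) * gini(left) + (right.size / n) * gini(right)

# Hypothetical toy data (not the CSV used in this notebook).
x1 = np.array([0, 0, 1, 1, 1, 0])
x2 = np.array([0, 1, 0, 1, 0, 1])
y = np.array([0, 0, 1, 1, 1, 0])

impurity_x1 = split_impurity(x1, y, 0.5)
impurity_x2 = split_impurity(x2, y, 0.5)

# The variable whose split leaves the least impurity is preferred.
best_feature = "x1" if impurity_x1 < impurity_x2 else "x2"
print("x1:", impurity_x1, "x2:", impurity_x2, "best:", best_feature)
```

In this toy data, splitting on x1 separates the two classes perfectly, so its weighted impurity is 0 and it would be chosen as the first condition.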

In short, a decision tree derives a set of rules from your data that give you the right output.

Let's see a minimal example now.

In [None]:
import numpy as np
import pandas as pd
import sklearn

We first read the data ([Source](https://www.kaggle.com/abhishekvermasg1/decision-tree)). We use the [Pandas](https://pandas.pydata.org/) library to read the CSV file.

In [None]:
df = pd.read_csv('../input/decision-tree/minimal2var.csv')

Let's take a look at the data.

In [None]:
df.head(5)

We have two independent variables (x1 and x2) in the data, and we need to predict y. Since the output is binary (0 or 1), this is a classification problem.

To solve this classification task, we will use a decision tree from [scikit-learn](https://scikit-learn.org/).

In [None]:
from sklearn import tree

clf = tree.DecisionTreeClassifier()

The object 'clf' is a DecisionTreeClassifier. It will take in the data and train a decision tree for us.

In [None]:
clf = clf.fit(df[['x1', 'x2']], df['y'])

Above, we train the decision tree using the *fit()* function. The first argument is x (the input features) and the second argument is y (the output).

If you are unsure how indexing works in Pandas, read [this](https://towardsdatascience.com/essential-pandas-every-data-scientist-should-know-in-2021-c642719a78bb?sk=a12a92ba455434140092871a7cbb1943).

In [None]:
clf.predict([[1, 0]])

Above, we predict the class of a new input using our classifier. We see that the output is 1.

Let's also see the probability of the above example being either in class 0 or class 1.

In [None]:
clf.predict_proba([[1, 0]])

We see that the probability for class 0 is about 0.48, while for class 1 it is about 0.51. The *predict()* function returns the class with the highest probability; hence, class 1 was predicted.

Let's plot the tree to understand the decisions it makes internally.
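Where do such probabilities come from? In scikit-learn, a decision tree's predicted probability for a class is the fraction of training samples of that class in the leaf the input lands in. The sketch below illustrates this on a small made-up dataset (not the notebook's CSV) in which identical inputs carry conflicting labels; the name `toy_clf` is only for illustration.

```python
from sklearn import tree

# Hypothetical toy data: four rows share the input (1, 0),
# one labelled 0 and three labelled 1.
X = [[0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [1, 0]]
y = [0, 0, 0, 1, 1, 1]

toy_clf = tree.DecisionTreeClassifier(random_state=0).fit(X, y)

# The leaf for (1, 0) holds 1 sample of class 0 and 3 of class 1,
# so the class fractions are 1/4 and 3/4.
proba = toy_clf.predict_proba([[1, 0]])
pred = toy_clf.predict([[1, 0]])
print(proba)  # [[0.25 0.75]]
print(pred)   # [1]
```

Because the four identical rows cannot be separated by any split, the leaf stays mixed and the probabilities are not 0 or 1.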

In [None]:
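import graphviz

# Alternative that needs only matplotlib instead of graphviz:
# import matplotlib.pyplot as plt
# plt.figure(figsize=(10, 10))
# tree.plot_tree(clf, filled=True)

# Export the fitted tree to Graphviz DOT format (returned as a string
# because out_file=None) and render it inline in the notebook.
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    node_ids=True,  # show node numbers so we can refer to them
    class_names=['0', '1'],
    feature_names=['x1', 'x2'],
    filled=True,
    rounded=True,
    special_characters=True,
)
graph = graphviz.Source(dot_data)
graph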

Above, we see the structure of the decision tree.

Our input was (1, 0), so x1 = 1 and x2 = 0. Looking at node 0, we see the first rule: x1 <= 0.5. This is not true for our input (x1 = 1), so we go to the right, i.e. to node 4. There we see the second rule: x2 <= 0.5. In our case it is true (x2 = 0), so we end up at node 5. That node is labelled class 1, which is what the *predict()* function returned.
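The same walk can also be done in code. The sketch below uses scikit-learn's `export_text`, `decision_path` and `apply` on a small made-up dataset (an assumption, since the notebook's CSV is not reproduced here), so the node numbers it prints will not match the plot above.

```python
from sklearn import tree

# Hypothetical toy data: y is 1 only when x1 = 1 and x2 = 0.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 0]

toy_clf = tree.DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned rules as indented text.
print(tree.export_text(toy_clf, feature_names=["x1", "x2"]))

sample = [[1, 0]]

# decision_path returns a sparse matrix marking the nodes the sample visits.
visited = toy_clf.decision_path(sample).indices.tolist()

# apply returns the id of the leaf the sample ends up in.
leaf_id = int(toy_clf.apply(sample)[0])

print("nodes visited:", visited)
print("leaf:", leaf_id, "prediction:", toy_clf.predict(sample)[0])
```

Reading the visited node ids against the printed rules reproduces the manual walk: start at the root, follow one branch per rule, and stop at a leaf.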

# Why are decision trees important?
* They give you explicit rules for your data, which helps you understand how the algorithm reaches its output. Thus, decision trees are highly sought after in areas like banking and finance, where you cannot trust an algorithm blindly.

# What kind of data should I use decision trees on?
* Tabular data!

# What next?
* Look at libraries like [XGBoost](https://xgboost.readthedocs.io/en/latest/), [LightGBM](https://lightgbm.readthedocs.io/en/latest/) and [CatBoost](https://catboost.ai/), which are used to create the decision trees that run the world. The decision trees inside these libraries are built in a more complex way than the one above. If you are interested, the concept to look into is [Gradient Boosting](https://en.wikipedia.org/wiki/Gradient_boosting). Also, before feeding data to these libraries, you usually need to pre-process it, i.e. convert non-numeric columns into numbers and decide how to handle missing values. Normalization is generally not required for tree-based models, since splits depend only on the ordering of feature values.
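As a rough illustration of the pre-processing mentioned above, here is a minimal Pandas sketch. The DataFrame `raw` and its columns are made up, and median filling plus one-hot encoding is just one common choice, not a requirement of any of these libraries.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a text column and missing values.
raw = pd.DataFrame({
    "color": ["red", "blue", "red", None],
    "size": [1.0, np.nan, 3.0, 4.0],
})

# Fill the missing numeric value with the column median.
raw["size"] = raw["size"].fillna(raw["size"].median())

# One-hot encode the text column; dummy_na=True adds a separate
# indicator column for rows where the color is missing.
features = pd.get_dummies(raw, columns=["color"], dummy_na=True)

print(features)
```

After this step every column is numeric and free of missing values, so the table can be passed to a tree-based model.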