# 🌳 Basics of Decision Tree Model

Welcome to this detailed guide on Decision Tree models! Let's explore the components, working, and applications of this intuitive machine learning algorithm. 🚀

## 1. Definition

A Decision Tree is a supervised learning algorithm used for classification and regression tasks. It recursively splits the data into subsets based on feature values, forming a tree-like structure in which each path from the root to a leaf is a sequence of decisions. 🌲

## 2. Components of a Decision Tree

- **Root Node**: The top node representing the entire dataset. 🌳
- **Splitting**: The process of dividing a node into two or more sub-nodes. ✂️
- **Decision Node**: A node that splits into further sub-nodes. 🔀
- **Leaf/Terminal Node**: A node that does not split further and represents a class label or output. 🍃
- **Pruning**: The process of removing sub-nodes to prevent overfitting. ✂️
- **Branch/Sub-Tree**: A subsection of the entire tree. 🌿
- **Parent and Child Node**: Nodes connected by a branch, where the parent splits into child nodes. 👪
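
To make these terms concrete, the sketch below fits a small tree and counts its nodes through scikit-learn's fitted `tree_` attribute. The Iris dataset and `max_depth=2` are assumptions chosen only to keep the tree small enough to inspect.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

tree = clf.tree_
# In scikit-learn's array representation, a leaf has no children (marked -1).
is_leaf = tree.children_left == -1
n_leaves = int(is_leaf.sum())
n_decision = int(tree.node_count - n_leaves)  # root plus internal decision nodes

print("Total nodes:", tree.node_count)
print("Decision nodes (including root):", n_decision)
print("Leaf nodes:", n_leaves)
print("Depth:", clf.get_depth())
```

Node 0 in this representation is the root; every other node is the child of exactly one parent.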

## 3. How Decision Trees Work

- Start at the root node.
- Split the data based on the best attribute, chosen with an attribute selection measure such as information gain or the Gini index.
- Repeat splitting recursively for each child node.
- Stop when nodes are pure or meet stopping criteria.
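
As a rough illustration of stopping criteria, the sketch below compares an unrestricted tree with one that stops early. The Iris dataset and the particular values of `max_depth` and `min_samples_leaf` are assumptions picked for the example, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No limits: the tree grows until leaves are pure or no further split is possible.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Stopping criteria: cap the depth and require a minimum number of samples per leaf.
limited = DecisionTreeClassifier(
    max_depth=2, min_samples_leaf=5, random_state=0
).fit(X, y)

print("Full tree:    depth", full.get_depth(), "leaves", full.get_n_leaves())
print("Limited tree: depth", limited.get_depth(), "leaves", limited.get_n_leaves())
```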

## 4. Steps to Build a Decision Tree

1. Select the best attribute using attribute selection measures. 🔍
2. Split the dataset into subsets based on the attribute. ✂️
3. Repeat recursively for each subset. 🔄
4. Stop when all data in a node belongs to the same class or no further splitting is possible. 🛑
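
The four steps above can be sketched in plain Python. This is a simplified teaching version, not how scikit-learn implements trees: it assumes numeric features, uses Gini impurity as the selection measure, and the helper names (`gini`, `best_split`, `build`, `predict`) are made up for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Step 1: return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels):
    """Steps 2-4: split recursively; a leaf is stored as a plain class label."""
    split = best_split(rows, labels) if len(set(labels)) > 1 else None
    if split is None:
        # Pure node, or no split reduces impurity: return the majority class.
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left]),
        "right": build([r for r, _ in right], [y for _, y in right]),
    }

def predict(node, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(node, dict):
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node

# Toy data: [study hours, attendance] -> result
rows = [[5, 80], [3, 60], [4, 70], [2, 50]]
labels = ["Pass", "Fail", "Pass", "Fail"]

tree = build(rows, labels)
print(tree)
print(predict(tree, [6, 90]))
```

On this toy data a single split on the first feature is enough to separate the two classes, so the recursion stops after one level.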

## 5. Attribute Selection Measures

- **Information Gain**: Measures the reduction in entropy after a dataset is split on an attribute. 📉
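- **Gini Index**: Measures the impurity of a node, i.e. how often a randomly chosen sample would be mislabeled if it were labeled according to the node's class distribution; lower values indicate purer nodes and better splits. ⚖️
- **Chi-Square**: Statistical test of independence between an attribute and the target; a higher statistic means the split separates the classes more significantly. 📊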
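
The first two measures are short enough to write out by hand. The sketch below is a minimal version using only the standard library; the function names and the tiny label lists are illustrative assumptions, and chi-square is left out to keep it brief.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ["Pass", "Pass", "Fail", "Fail"]
perfect = [["Pass", "Pass"], ["Fail", "Fail"]]  # each child is pure
useless = [["Pass", "Fail"], ["Pass", "Fail"]]  # children as mixed as the parent

print("Entropy of parent:", entropy(parent))
print("Gini of parent:", gini(parent))
print("Gain of perfect split:", information_gain(parent, perfect))
print("Gain of useless split:", information_gain(parent, useless))
```

A split that produces pure children recovers all of the parent's entropy as gain, while a split that leaves the class mix unchanged gains nothing.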

## 6. Example: Building a Decision Tree

Consider a dataset of weather conditions and whether to play tennis. The decision tree splits based on attributes like outlook, humidity, and wind to predict the outcome. 🌞🌧️
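
A rough sketch of this idea in scikit-learn is shown below. The six rows are hypothetical values invented for illustration, not a real dataset, and one-hot encoding with `pd.get_dummies` is just one possible way to feed categorical columns to the classifier.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical rows for illustration only, not a real dataset.
df = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "Humidity": ["High", "Normal", "High", "High", "Normal", "Normal"],
    "Wind":     ["Weak", "Weak", "Strong", "Weak", "Strong", "Weak"],
    "Play":     ["No", "Yes", "Yes", "Yes", "No", "Yes"],
})

# scikit-learn trees need numeric inputs, so one-hot encode the categories.
X = pd.get_dummies(df[["Outlook", "Humidity", "Wind"]]).astype(int)
y = df["Play"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned rules as indented text.
print(export_text(clf, feature_names=list(X.columns)))
```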

## 7. Python Code Example
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create Decision Tree classifier (random_state fixed so results are reproducible)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)

# Evaluate
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```


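The same workflow can be applied to a small hand-made dataset of study hours, attendance, and results:

```python
# Sample records, one dictionary per student
data = [
    {"Study Hours": 5, "Attendance": 80, "Result": "Pass"},
    {"Study Hours": 3, "Attendance": 60, "Result": "Fail"},
    {"Study Hours": 4, "Attendance": 70, "Result": "Pass"},
    {"Study Hours": 2, "Attendance": 50, "Result": "Fail"},
]
```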

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
import pandas as pd
```

```python
# Sample data in column-oriented form
data = {
    'Study Hours': [5, 3, 4, 2],
    'Attendance': [80, 60, 70, 50],
    'Result': ['Pass', 'Fail', 'Pass', 'Fail'],
}
df = pd.DataFrame(data)
df
```

|   | Study Hours | Attendance | Result |
|---|-------------|------------|--------|
| 0 | 5           | 80         | Pass   |
| 1 | 3           | 60         | Fail   |
| 2 | 4           | 70         | Pass   |
| 3 | 2           | 50         | Fail   |


```python
# Features and target variable
X = df[['Study Hours', 'Attendance']]
y = df['Result']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2529)

# Create Decision Tree Classifier
clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)
```

```python
# Model Accuracy
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
# Confusion Matrix
print("Confusion Matrix:\n", metrics.confusion_matrix(y_test, y_pred))
# Classification Report
print("Classification Report:\n", metrics.classification_report(y_test, y_pred))
```

```text
Accuracy: 1.0
Confusion Matrix:
 [[1]]
Classification Report:
               precision    recall  f1-score   support

        Fail       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1
```

With only four rows, the test set contains a single sample, so these perfect scores are illustrative rather than a meaningful estimate of accuracy.

```python
# Visualizing the Decision Tree
from sklearn.tree import export_text
# Display the decision tree rules
print("Decision Tree Rules:\n", export_text(clf, feature_names=['Study Hours', 'Attendance']))
```

```text
Decision Tree Rules:
 |--- Study Hours <= 3.00
|   |--- class: Fail
|--- Study Hours >  3.00
|   |--- class: Pass
```

## 8. Applications of Decision Tree Models

- Credit Scoring 💳
- Medical Diagnosis 🏥
- Customer Segmentation 👥
- Risk Management ⚠️

## 9. Advantages

- Easy to understand and interpret. 🧠
- Requires little data preprocessing. 🧹
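- Can handle both numerical and categorical data, although scikit-learn's implementation expects categorical features to be encoded as numbers first. 🔢🔤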
- Non-parametric and flexible. 🔄

## 10. Disadvantages

- Prone to overfitting. ⚠️
- Can be unstable with small variations in data. 🔄
- With measures such as information gain, splits are biased towards attributes with many distinct values. ⚖️
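
Pruning is one common response to the overfitting problem. The sketch below compares an unpruned tree with one pruned by scikit-learn's minimal cost-complexity pruning; the Iris dataset and `ccp_alpha=0.02` are arbitrary assumptions for illustration, and a suitable value would normally be chosen by validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Unpruned tree: grows until it fits the training data as closely as it can.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# ccp_alpha > 0 turns on minimal cost-complexity pruning (0.02 is illustrative).
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_train, y_train)

for name, model in [("full", full), ("pruned", pruned)]:
    print(
        name,
        "nodes:", model.tree_.node_count,
        "train acc:", round(model.score(X_train, y_train), 3),
        "test acc:", round(model.score(X_test, y_test), 3),
    )
```

The pruned tree never has more nodes than the full one, and it trades some training accuracy for a simpler model.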

## 11. Conclusion

Decision Trees are powerful and interpretable models widely used in machine learning. Understanding their components and how they work is essential for applying them effectively. 🌟