# Decision Tree in Machine Learning

## Definition
- A Decision Tree is used for classification and regression tasks.
- It can be read as a nested if-else structure in which each node tests one feature value.
- Because every split tests a single feature, the model divides the data into subsets using axis-parallel hyperplanes.
- It predicts categorical outcomes (classification) or continuous values (regression).
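
To make the nested if-else view concrete, here is a hand-written sketch of the small tree derived later in these notes for the Play Tennis example. The function name `predict_play_tennis` and the exact string values are illustrative assumptions; nothing is learned from data here.

```python
def predict_play_tennis(outlook: str, humidity: str, windy: bool) -> str:
    """Hand-coded decision tree: every `if` tests exactly one feature."""
    if outlook == "Overcast":
        return "Yes"  # leaf node: this branch is pure
    if outlook == "Sunny":
        # internal node: the Sunny branch is split on Humidity
        return "No" if humidity == "High" else "Yes"
    # remaining branch (Rainy): split on Windy
    return "No" if windy else "Yes"


print(predict_play_tennis("Sunny", "High", False))
print(predict_play_tennis("Rainy", "Normal", False))
```

A learned tree has the same shape; the training algorithm only decides which feature each `if` should test.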

## Purpose
- Breaks down complex decisions into a simpler, interpretable structure.
- Provides a clear visual representation of decision-making.
- Makes predictions by partitioning the feature space into homogeneous regions.




## Tree Structure and Components

- Root Node
- Internal Nodes
- Leaf Nodes
- Branches
- Splits

<img src='images/tree dt.jpeg' width='650px'>


## Geometric Intuition:

Example: Iris dataset

<img src='images/geo dt.png' width='750px'>


## Sample Decision Tree Creation:

<img src='images/tennis ex.png' width='800px'>

## Iterative Dichotomiser 3 (ID3):

1. Calculate the entropy of the entire dataset (0.94 for the Play Tennis example).
2. For each feature, calculate the weighted entropy of the subsets produced by splitting on it.
3. Compute the Information Gain (IG) for each feature.
4. Select the feature with the highest IG as the root node (Outlook in this example).
5. Repeat the steps recursively on each branch until the leaf nodes are pure.


### a. Entropy
- Entropy is a measure of the impurity or randomness in the dataset. It helps in deciding the best feature to split at each node.
- Formula:  
  $
  \text{Entropy}(S) = - \sum_{i=1}^{n} p_i \log_2(p_i)
  $
  Where:
  - $ p_i $ is the probability of class $i$ in the set $S$.
  - $n$ is the number of distinct classes.

<img src='images/entropy.png' width='350px'>
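
As a minimal sketch of the formula above, the helper below computes entropy from class counts. The name `entropy` and the count-list input are assumptions made for illustration. It uses $\log_2(1/p_i)$, which equals $-\log_2(p_i)$, and skips empty classes because $p \log_2 p \to 0$ as $p \to 0$.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    # p * log2(1 / p) is the same as -p * log2(p); empty classes contribute 0.
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


print(entropy([7, 7]))   # evenly mixed two-class set: maximum impurity
print(entropy([14, 0]))  # pure set: no impurity
```

For two classes the value ranges from 0 (pure) to 1 (evenly mixed), which is the curve shown in the figure.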


### b. Information Gain
- Information Gain is the reduction in entropy achieved by partitioning the data based on a feature.
- It measures how well a feature splits the dataset into pure classes.
- Formula:  
  $
  \text{Information Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \cdot \text{Entropy}(S_v)
  $
  Where:
  - $ S $ is the dataset.
  - $ A $ is the feature.
  - $ S_v $ is the subset of $S$ for which feature $A$ has value $v$.
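
The formula can be sketched directly in code: group the labels by feature value, weight each group's entropy by its share of the rows, and subtract the total from the parent entropy. The function names and the toy data below are hypothetical, chosen only to show the two extreme cases.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum((c / total) * log2(total / c) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy(S) minus the size-weighted entropy of each subset S_v."""
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    total = len(labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


labels = ["yes", "yes", "no", "no"]
# A feature that separates the classes perfectly gains the full parent entropy.
print(information_gain(["a", "a", "b", "b"], labels))
# A feature unrelated to the class gains nothing.
print(information_gain(["a", "b", "a", "b"], labels))
```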



## Decision Tree Calculation for "Play Tennis" Dataset

### Step 1: Entropy of the Entire Dataset
The entropy formula:
$
H(S) = - \sum p_i \log_2(p_i)
$
Where $ p_i $ is the probability of each class.

From the dataset:
- Yes = 9
- No = 5
- Total = 14

$
H(S) = - \left( \frac{9}{14} \log_2 \frac{9}{14} + \frac{5}{14} \log_2 \frac{5}{14} \right)
$

Substituting approximate values:
$
H(S) = - \left( 0.643 \times (-0.637) + 0.357 \times (-1.485) \right)
$
$
H(S) = - (-0.41 - 0.53) = 0.94
$

So, Entropy of the dataset = 0.94.

---

### Step 2: Entropy of Each Feature
#### Feature: Outlook
| Outlook  | Yes | No | Total | Entropy |
|----------|----|----|------|---------|
| Sunny    | 2  | 3  | 5    | $ H = - (2/5 \log_2 2/5 + 3/5 \log_2 3/5) $ = 0.971 |
| Overcast | 4  | 0  | 4    | $ H = - (4/4 \log_2 4/4) $ = 0.0 |
| Rainy    | 3  | 2  | 5    | $ H = - (3/5 \log_2 3/5 + 2/5 \log_2 2/5) $ = 0.971 |

$
H_{\text{Outlook}} = \frac{5}{14} \times 0.971 + \frac{4}{14} \times 0 + \frac{5}{14} \times 0.971
$
$
\approx 0.3468 + 0 + 0.3468 \approx 0.693
$

#### Feature: Humidity
| Humidity | Yes | No | Total | Entropy |
|----------|----|----|------|---------|
| High     | 3  | 4  | 7    | $ H = - (3/7 \log_2 3/7 + 4/7 \log_2 4/7) $ = 0.985 |
| Normal   | 6  | 1  | 7    | $ H = - (6/7 \log_2 6/7 + 1/7 \log_2 1/7) $ = 0.592 |

$
H_{\text{Humidity}} = \frac{7}{14} \times 0.985 + \frac{7}{14} \times 0.592
$
$
= 0.493 + 0.296 = 0.789
$

#### Feature: Windy
| Windy | Yes | No | Total | Entropy |
|-------|----|----|------|---------|
| False | 6  | 2  | 8    | $ H = - (6/8 \log_2 6/8 + 2/8 \log_2 2/8) $ = 0.811 |
| True  | 3  | 3  | 6    | $ H = - (3/6 \log_2 3/6 + 3/6 \log_2 3/6) $ = 1.0 |

$
H_{\text{Windy}} = \frac{8}{14} \times 0.811 + \frac{6}{14} \times 1.0
$
$
= 0.463 + 0.429 = 0.892
$

---

### Step 3: Information Gain (IG)
$
IG = H(S) - H(\text{feature})
$

| Feature  | Entropy | Information Gain |
|----------|---------|-----------------|
| Outlook  | 0.693   | $ 0.94 - 0.693 = 0.247 $ |
| Humidity | 0.789   | $ 0.94 - 0.789 = 0.151 $ |
| Windy    | 0.892   | $ 0.94 - 0.892 = 0.048 $ |

- Outlook has the highest Information Gain (0.247), so it is selected as the root node.
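
The hand calculation in Steps 1 to 3 can be cross-checked with a short script. This is a sketch that takes the (Yes, No) counts straight from the tables above; the variable names `splits` and `gains` are illustrative. Because it keeps unrounded intermediate values, the last printed digit may differ slightly from the hand-rounded figures.

```python
from math import log2


def entropy(counts):
    """Entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


# (Yes, No) counts per feature value, copied from the tables in Step 2.
splits = {
    "Outlook": {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)},
    "Humidity": {"High": (3, 4), "Normal": (6, 1)},
    "Windy": {"False": (6, 2), "True": (3, 3)},
}
dataset_entropy = entropy((9, 5))  # Step 1: 9 Yes, 5 No

gains = {}
for feature, groups in splits.items():
    n = sum(sum(counts) for counts in groups.values())
    # Step 2: entropy of each subset, weighted by its share of the rows.
    weighted = sum(sum(counts) / n * entropy(counts) for counts in groups.values())
    # Step 3: information gain of splitting on this feature.
    gains[feature] = dataset_entropy - weighted
    print(f"{feature:<9} weighted entropy = {weighted:.3f}  IG = {gains[feature]:.3f}")

print("Root node:", max(gains, key=gains.get))
```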

---

### Step 4: Recursive Splitting
1. Select Outlook as the root node (highest IG).
2. Create branches for each Outlook value:
   - Overcast → Always Yes (pure node).
   - Sunny → split further on Humidity; Rainy → split further on Windy.
3. Repeat steps until all leaf nodes are pure.

---

## Continuing the Example:

### Step 5: Splitting "Sunny" Branch using Humidity
We now split the Sunny branch using the Humidity feature.

#### Entropy for Humidity in "Sunny" branch
| Humidity | Yes | No | Total | Entropy |
|----------|----|----|------|---------|
| High     | 0  | 3  | 3    | $ H = - (3/3 \log_2 3/3) = 0 $ |
| Normal   | 2  | 0  | 2    | $ H = - (2/2 \log_2 2/2) = 0 $ |

$
H_{\text{Humidity} \mid \text{Sunny}} = \frac{3}{5} \times 0 + \frac{2}{5} \times 0 = 0
$

#### Information Gain for Humidity in "Sunny"
$
IG = H(\text{Sunny}) - H_{\text{Humidity} \mid \text{Sunny}}
$

$
IG = 0.971 - 0 = 0.971
$

✅ Since IG equals the full entropy of the Sunny branch (0.971), the split is pure: Humidity is selected, and the Sunny branch is fully classified.

---

### Step 6: Splitting "Rainy" Branch using Windy
Next, we split the Rainy branch using the Windy feature.

#### Entropy for Windy in "Rainy" branch
| Windy | Yes | No | Total | Entropy |
|-------|----|----|------|---------|
| False | 3  | 0  | 3    | $ H = - (3/3 \log_2 3/3) = 0 $ |
| True  | 0  | 2  | 2    | $ H = - (2/2 \log_2 2/2) = 0 $ |

$
H_{\text{Windy} \mid \text{Rainy}} = \frac{3}{5} \times 0 + \frac{2}{5} \times 0 = 0
$

#### Information Gain for Windy in "Rainy"
$
IG = H(\text{Rainy}) - H_{\text{Windy} \mid \text{Rainy}}
$

$
IG = 0.971 - 0 = 0.971
$

✅ Since IG equals the full entropy of the Rainy branch (0.971), the split is pure: Windy is selected, and the Rainy branch is fully classified.
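
The whole procedure (pick the highest-gain feature, split, recurse until the nodes are pure) fits in a short recursive sketch. The rows below are an assumption: they are the classic 14-row Play Tennis table restricted to the three features used in these notes, with Windy encoded as `False`/`True` as in the Step 2 table. Their counts match the tables above, but the rows were not read from the figure or the CSV. All function names are illustrative.

```python
from collections import Counter
from math import log2

FEATURES = ("Outlook", "Humidity", "Windy")
# Assumed rows: (Outlook, Humidity, Windy, PlayTennis)
ROWS = [
    ("Sunny", "High", False, "No"), ("Sunny", "High", True, "No"),
    ("Overcast", "High", False, "Yes"), ("Rainy", "High", False, "Yes"),
    ("Rainy", "Normal", False, "Yes"), ("Rainy", "Normal", True, "No"),
    ("Overcast", "Normal", True, "Yes"), ("Sunny", "High", False, "No"),
    ("Sunny", "Normal", False, "Yes"), ("Rainy", "Normal", False, "Yes"),
    ("Sunny", "Normal", True, "Yes"), ("Overcast", "High", True, "Yes"),
    ("Overcast", "Normal", False, "Yes"), ("Rainy", "High", True, "No"),
]


def entropy(rows):
    """Entropy (in bits) of the class label stored in the last column."""
    counts = Counter(row[-1] for row in rows)
    return sum(c / len(rows) * log2(len(rows) / c) for c in counts.values())


def partition(rows, col):
    """Group rows by the value they take in column `col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return groups


def gain(rows, col):
    """Information gain of splitting `rows` on column `col`."""
    parts = partition(rows, col).values()
    return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)


def id3(rows, cols):
    labels = {row[-1] for row in rows}
    if len(labels) == 1:  # pure node: return the class as a leaf
        return labels.pop()
    if not cols:  # no features left: fall back to the majority class
        return Counter(row[-1] for row in rows).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    return {FEATURES[best]: {value: id3(subset, rest)
                             for value, subset in partition(rows, best).items()}}


tree = id3(ROWS, [0, 1, 2])
print(tree)
```

The result is a nested dictionary with Outlook at the root, Humidity under Sunny, and Windy under Rainy, matching Steps 4 to 6.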

---

### Final Decision Tree

<img src='images/final dt.png' width='550px'>



### Gini Impurity
- Gini Impurity is another metric to measure the "impurity" of a dataset. It helps in splitting data at each node.
- It is commonly used in CART (Classification and Regression Trees).
- Formula:  
  $
  \text{Gini Impurity}(S) = 1 - \sum_{i=1}^{n} p_i^2
  $
  Where:
  - $ p_i $ is the probability of class $i$ in the set $S$.
  - $n$ is the number of distinct classes.
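
A minimal sketch of the Gini formula, using class counts as input (the function name `gini_impurity` is an assumption for illustration):

```python
def gini_impurity(counts):
    """Gini impurity of a class distribution given as counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


print(round(gini_impurity([9, 5]), 3))  # full Play Tennis dataset: 9 Yes, 5 No
print(gini_impurity([4, 0]))            # pure node (the Overcast branch)
print(gini_impurity([3, 3]))            # evenly mixed two-class node
```

For two classes, Gini impurity peaks at 0.5 for an even split, whereas entropy peaks at 1.0; both are 0 for a pure node.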



```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# Step 1: Load the dataset
file_path = "Datasets/play_tennis_dataset.csv"  # Update with your actual file path
df = pd.read_csv(file_path)

# Step 2: Convert categorical features into integer codes.
# One encoder is kept per column so each mapping can be inverted later.
# The codes follow the sorted order of the category names, so the tree treats
# them as ordered numbers even though the categories are nominal.
encoders = {}
for column in df.columns:
    encoders[column] = LabelEncoder()
    df[column] = encoders[column].fit_transform(df[column])

# Step 3: Split dataset into training and testing sets (80% training, 20% testing)
X = df.drop(columns=["PlayTennis"])  # Features
y = df["PlayTennis"]  # Target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Train Decision Tree Classifier (Gini criterion, depth limited to 3)
model = DecisionTreeClassifier(criterion="gini", random_state=42, max_depth=3)
model.fit(X_train, y_train)

# Step 5: Predict on test data
y_pred = model.predict(X_test)

# Step 6: Evaluate model accuracy.
# If the CSV is the 14-row table used above, the test split holds only 3 rows,
# so this score is a rough sanity check rather than a reliable estimate.
accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Accuracy: {accuracy * 100:.2f}%")

# Step 7: Display Decision Tree Rules
tree_rules = export_text(model, feature_names=list(X.columns))
print(tree_rules)
```


```text
Decision Tree Accuracy: 100.00%
|--- Outlook <= 1.50
|   |--- class: 1
|--- Outlook >  1.50
|   |--- Humidity <= 0.50
|   |   |--- class: 0
|   |--- Humidity >  0.50
|   |   |--- class: 1
```

