# Decision Tree Assignment

Question 1: What is a Decision Tree, and how does it work in the context of classification?
Answer:

A Decision Tree is a supervised machine learning algorithm used for classification and regression, but it is most commonly used for classification tasks.
It works by splitting the dataset into smaller subsets based on the most significant features, forming a tree-like structure of decision rules.

How it works (for classification):

1. **Start at the root node**:
The algorithm looks at all features and selects the best feature to split the data.

* “Best” means the feature that gives the purest split, measured using impurity metrics such as:

  * Gini Index

  * Entropy (Information Gain)

2. **Create branches based on the feature split**:
The data is divided into subsets depending on the feature values.

3. **Repeat splitting**:
Each subset is split again on the feature that best separates its samples.
This continues until a stopping condition is reached (e.g., the node is pure, no features are left to split on, or a depth limit is hit).

4. **Leaf Node (Final Decision)**:
Once the tree cannot be split further, the node becomes a leaf, which represents a class label.

**Example (simple)**:

If the feature is "Age" and the rule is Age < 18, the two branches classify individuals into categories (e.g., “Minor” vs. “Adult”).
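The root-node step can be sketched for a single numeric feature. This is a minimal illustration, not how any particular library implements it: the helper names `gini` and `best_threshold` and the toy age data are assumptions made up for this example. It tries each midpoint between consecutive distinct ages and keeps the threshold whose two branches have the lowest weighted Gini impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value < threshold`."""
    n = len(labels)
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: ages and their class labels.
ages = [10, 15, 17, 18, 25, 40]
labels = ["Minor", "Minor", "Minor", "Adult", "Adult", "Adult"]

threshold, impurity = best_threshold(ages, labels)
print("Best rule: Age <", threshold, "| weighted Gini:", impurity)
```

On this toy data the chosen threshold falls between 17 and 18, and both branches are pure, so each would become a leaf.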

Question 2: Explain Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?
Answer:

In Decision Trees, impurity measures are used to decide which feature gives the best split. Two common impurity measures are Gini Impurity and Entropy.

1. Gini Impurity
Definition:

Gini Impurity measures how often a randomly chosen sample would be incorrectly classified if it were labeled randomly according to the class distribution in the node.

Formula:
$$
\text{Gini} = 1 - \sum_{i=1}^{k} p_i^2
$$

Where:

* $p_i$ = proportion of samples in the node that belong to class $i$

* k = number of classes

Interpretation:

* Gini = 0 → perfectly pure node (only one class).

* Higher values → higher impurity.
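As a worked example, consider a hypothetical node with 8 samples, 6 of class A and 2 of class B (the counts are made up for illustration):

```latex
\begin{aligned}
p_A &= \tfrac{6}{8} = 0.75, \qquad p_B = \tfrac{2}{8} = 0.25 \\
\text{Gini} &= 1 - \left(0.75^2 + 0.25^2\right) = 1 - (0.5625 + 0.0625) = 0.375
\end{aligned}
```

For comparison, an evenly mixed two-class node (4 and 4) gives $1 - (0.25 + 0.25) = 0.5$, the largest value possible with two classes.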

2. Entropy (Information Gain)
Definition:

Entropy measures the uncertainty or disorder in a dataset.

Formula:
$$
\text{Entropy} = -\sum_{i=1}^{k} p_i \log_2(p_i)
$$

Interpretation:

Entropy = 0 → pure node

Maximum when classes are evenly mixed (high disorder)
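The two measures can be compared side by side with a short sketch. The function names and the three example class distributions below are assumptions chosen for illustration; the formulas are the ones given in this answer.

```python
import math


def gini(proportions):
    """Gini impurity from a list of class proportions that sum to 1."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Entropy in bits; terms with p == 0 are skipped (0 * log2(0) is taken as 0)."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


# Hypothetical two-class nodes: pure, partly mixed, evenly mixed.
for dist in ([1.0, 0.0], [0.75, 0.25], [0.5, 0.5]):
    print(dist, "gini =", round(gini(dist), 4), "entropy =", round(entropy(dist), 4))
```

Both measures are zero for the pure node and reach their two-class maximum (0.5 for Gini, 1.0 for entropy) at the even mix, which is why they usually rank candidate splits similarly.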

Information Gain is:

$$
IG = \text{Entropy(parent)} - \text{Weighted Entropy(children)}
$$

How they impact Decision Tree splits:

The algorithm evaluates each feature and calculates impurity before and after the split.

It selects the feature that reduces impurity the most:

CART algorithm → uses Gini Impurity

ID3/C4.5 → use Entropy (ID3 uses Information Gain; C4.5 uses the normalized Gain Ratio)

Impact:

A split is better if it results in:

lower Gini

or higher Information Gain

Both measures aim to create pure child nodes, meaning the split groups similar classes together.

Question 3: What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of each.
Answer:

Decision Trees can easily become overfitted, so pruning techniques are used to control tree growth.
There are two types of pruning: Pre-Pruning and Post-Pruning.

1. **Pre-Pruning (Early Stopping)**

**Meaning**:

Pre-pruning stops the tree from growing too deep by applying constraints during the tree-building process.

Examples of pre-pruning rules:

Maximum depth (max_depth)

Minimum samples to split (min_samples_split)

Minimum samples per leaf (min_samples_leaf)

Minimum impurity decrease

**Advantage**:

✔️ Reduces training time because the tree is never grown to full size.
(Useful when the dataset is large.)
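The pre-pruning rules listed above map onto constructor parameters of scikit-learn's `DecisionTreeClassifier`. The sketch below is illustrative only: the specific values (3, 10, 5, 0.01) are arbitrary assumptions, not recommended settings, and the trees are fit on the full Iris data just to compare their size.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: growth stops as soon as any constraint would be violated.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth
    min_samples_split=10,        # minimum samples needed to split a node
    min_samples_leaf=5,          # minimum samples required in each leaf
    min_impurity_decrease=0.01,  # minimum impurity decrease for a split
    random_state=42,
).fit(X, y)

print("Full tree:  depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pre-pruned: depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```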

2. **Post-Pruning (Prune After Full Growth)**

**Meaning**:

Post-pruning allows the tree to grow completely, and then removes unnecessary or weak branches afterward.

**Common methods**:

Cost Complexity Pruning (CCP)

Reduced Error Pruning

**Advantage**:

✔️ Often improves accuracy and generalization because it removes overfitted branches while keeping useful splits.
(Usually results in a more reliable model.)
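Cost Complexity Pruning is available in scikit-learn through the `ccp_alpha` parameter and the `cost_complexity_pruning_path` method. The following is a minimal sketch, assuming the same Iris split used elsewhere in this assignment. It refits the tree at each candidate alpha and reports how the number of leaves shrinks.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Candidate alphas computed from the fully grown tree on the training data.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)

leaf_counts = []
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative rounding errors
    tree = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    leaf_counts.append(tree.get_n_leaves())
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test accuracy={tree.score(X_test, y_test):.3f}")
```

Larger alphas remove more branches. The test accuracy is printed here only to show the trend; in practice the alpha would normally be chosen by cross-validation on the training data rather than by looking at the test set.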

Question 4: What is Information Gain in Decision Trees, and why is it important for choosing the best split?
Answer:

Information Gain (IG) is a metric used in Decision Trees to measure how much uncertainty (entropy) is reduced after splitting the dataset based on a feature.

It helps determine which feature is the best for splitting at each node.

**Definition**:

Information Gain is defined as the difference between the entropy of the parent node and the weighted entropy of the child nodes after the split.

Formula:
$$
\text{Information Gain} = \text{Entropy(parent)} - \sum_{i=1}^{k} \frac{n_i}{n} \times \text{Entropy}(\text{child}_i)
$$

Where:



* $n_i$ = number of samples in child node $i$

* $n$ = total number of samples in the parent node

* $k$ = number of child nodes produced by the split

* Entropy = measure of impurity/disorder
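The formula can be checked on a small example. The sketch below is an illustration under stated assumptions: the function names and the 10-sample parent node are made up for this purpose. It compares a partly mixed split with a perfect split of the same parent.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels; 0.0 for an empty list."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical parent node: 5 samples of class A and 5 of class B.
parent = ["A"] * 5 + ["B"] * 5
mixed_split = [["A", "A", "A", "A", "B"], ["A", "B", "B", "B", "B"]]
perfect_split = [["A"] * 5, ["B"] * 5]

ig_mixed = information_gain(parent, mixed_split)
ig_perfect = information_gain(parent, perfect_split)
print("IG (mixed split):  ", round(ig_mixed, 4))
print("IG (perfect split):", round(ig_perfect, 4))
```

The perfect split removes all of the parent's 1 bit of entropy, while the mixed split removes only about 0.28 bits, so the tree would prefer the feature that produces the perfect split.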

**Why is Information Gain important?**

1. **Helps choose the best feature for splitting**:
The feature with the highest Information Gain creates the purest child nodes.

2. **Reduces uncertainty**:
Higher IG means the split reduces more randomness in class labels.

3. **Improves accuracy**:
Better splits tend to produce trees that are more accurate and generalize better.

4. **Guides tree growth**:
IG ensures the tree grows in a meaningful direction by selecting informative features.

Question 5: What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?

Answer:

Decision Trees are widely used in various industries because they are simple, interpretable, and effective for both classification and regression tasks.

**Real-world applications of Decision Trees:**

1. **Medical Diagnosis:**
Classifying diseases based on symptoms, test reports, and patient history.

2. **Banking & Finance:**

* Loan approval

* Credit risk assessment

* Fraud detection

3. **Marketing & Customer Segmentation:**
Predicting customer behavior, buying patterns, and churn.

4. **E-commerce:**
Product recommendation and predicting customer purchases.

5. **HR & Recruitment:**
Screening candidates based on job requirements and skill levels.

6. **Manufacturing:**
Identifying defective products and quality control.

**Advantages of Decision Trees:**

1. **Easy to understand and interpret:**
No mathematical background needed to interpret the model.

2. **Handles both numerical and categorical data**.

3. **Requires little data preprocessing:**
No need for scaling or normalization.

4. **Works well for small to medium datasets**.

5. **Can capture non-linear relationships**.

**Limitations of Decision Trees:**

1. **Prone to overfitting:**
Trees can become very deep and memorize the training data.

2. **Unstable model:**
Small changes in data can produce completely different trees.

3. **Biased splits for features with many categories.**

4. **Lower accuracy compared to ensemble methods**
(e.g., Random Forest, Gradient Boosting).


Question 6: Write a Python program to:
● Load the Iris Dataset
● Train a Decision Tree Classifier using the Gini criterion
● Print the model’s accuracy and feature importances


from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Load the Iris Dataset
iris = load_iris()
X = iris.data            # Features
y = iris.target          # Labels

# 2. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Train Decision Tree Classifier (criterion = 'gini')
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X_train, y_train)

# 4. Predict and calculate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# 5. Print accuracy and feature importances
print("Model Accuracy:", accuracy)
print("Feature Importances:")
for feature, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{feature}: {importance:.4f}")


Model Accuracy: 1.0
Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0191
petal length (cm): 0.8933
petal width (cm): 0.0876


Question 7: Write a Python program to:

● Load the Iris Dataset

● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to
a fully-grown tree.


from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# 2. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Decision Tree with max_depth = 3
tree_limited = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_limited.fit(X_train, y_train)
y_pred_limited = tree_limited.predict(X_test)
acc_limited = accuracy_score(y_test, y_pred_limited)

# 4. Fully-grown Decision Tree (no depth limit)
tree_full = DecisionTreeClassifier(random_state=42)
tree_full.fit(X_train, y_train)
y_pred_full = tree_full.predict(X_test)
acc_full = accuracy_score(y_test, y_pred_full)

# 5. Print comparison
print("Accuracy with max_depth=3:", acc_limited)
print("Accuracy with full tree:", acc_full)


Accuracy with max_depth=3: 1.0
Accuracy with full tree: 1.0
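Both trees score 1.0 on this 45-sample test split, so a single split cannot tell them apart. One way to get a more informative comparison, sketched here as an optional extension rather than part of the original task, is 5-fold cross-validation with `cross_val_score` on the full dataset (the `results` dictionary is just a convenience name for this example).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

results = {}
for name, depth in [("max_depth=3", 3), ("full tree", None)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    # Each of the 5 folds serves once as the held-out evaluation set.
    scores = cross_val_score(tree, X, y, cv=5, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy={scores.mean():.3f}, std={scores.std():.3f}")
```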


Question 8: Write a Python program to:

● Load the Boston Housing Dataset

● Train a Decision Tree Regressor

● Print the Mean Squared Error (MSE) and feature importances

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# 1. Load the California Housing Dataset
# (load_boston was removed in scikit-learn 1.2, so California Housing is used instead)
housing = fetch_california_housing()
X = housing.data
y = housing.target

# 2. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Train Decision Tree Regressor
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# 4. Predictions and MSE
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

# 5. Print results
print("Mean Squared Error (MSE):", mse)
print("\nFeature Importances:")
for feature, importance in zip(housing.feature_names, model.feature_importances_):
    print(f"{feature}: {importance:.4f}")

Mean Squared Error (MSE): 0.5280096503174904

Feature Importances:
MedInc: 0.5235
HouseAge: 0.0521
AveRooms: 0.0494
AveBedrms: 0.0250
Population: 0.0322
AveOccup: 0.1390
Latitude: 0.0900
Longitude: 0.0888
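The MSE is easier to interpret after taking its square root. The California Housing target is the median house value in units of $100,000, so the RMSE is in the same units. A small sketch using the MSE value printed above:

```python
import math

mse = 0.5280096503174904  # MSE printed by the regressor above
rmse = math.sqrt(mse)     # same units as the target (hundreds of thousands of dollars)

print(f"RMSE: {rmse:.4f} (x $100,000)")
print(f"Typical prediction error: about ${rmse * 100_000:,.0f}")
```

This puts the typical error of the unpruned regressor at roughly $73,000 on this test split.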


Question 9: Write a Python program to:

● Load the Iris Dataset

● Tune the Decision Tree’s max_depth and min_samples_split using
GridSearchCV

● Print the best parameters and the resulting model accuracy

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# 2. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Define the model
model = DecisionTreeClassifier(random_state=42)

# 4. Define the parameter grid
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 3, 5, 10]
}

# 5. GridSearchCV for hyperparameter tuning
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,                 # 5-fold cross validation
    scoring='accuracy'
)

# 6. Fit the model
grid.fit(X_train, y_train)

# 7. Best parameters
print("Best Parameters:", grid.best_params_)

# 8. Evaluate with best estimator
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Model Accuracy with Best Parameters:", accuracy)


Best Parameters: {'max_depth': 4, 'min_samples_split': 10}
Model Accuracy with Best Parameters: 1.0
