Question 1: What is a Decision Tree, and how does it work in the context of classification?
Answer:
Definition:

A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks.
It works by splitting data into subsets based on feature values, forming a tree-like structure of decisions.

Working Principle (for Classification):

The root node represents the entire dataset.

At each node, the algorithm chooses the feature (and threshold) that best splits the data according to an impurity measure such as Gini Impurity or Entropy.

The dataset is recursively split into branches (sub-nodes) until:

The data is perfectly classified, or

A stopping criterion (like max depth or min samples) is met.

Each leaf node represents a class label (output).
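The recursive procedure above can be sketched from scratch in plain Python. This is a simplified CART-style builder under stated assumptions: numeric features only, Gini impurity, binary splits at midpoints between sorted values, and `max_depth` / `min_samples_split` as the stopping criteria. The function names and the toy data are hypothetical and are not scikit-learn's internals.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair with the lowest weighted child Gini,
    or None if no split reduces the impurity of the current node."""
    n = len(y)
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2):
    """Recursively grow a tree; leaves are plain class labels."""
    majority = Counter(y).most_common(1)[0][0]
    # Stopping criteria: pure node, depth limit reached, or too few samples
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_samples_split:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t = split

    def grow(keep):
        # Recurse on the subset of rows that satisfy the branch condition
        rows = [i for i, row in enumerate(X) if keep(row[f])]
        return build_tree([X[i] for i in rows], [y[i] for i in rows],
                          depth + 1, max_depth, min_samples_split)

    return {"feature": f, "threshold": t,
            "left": grow(lambda v: v <= t), "right": grow(lambda v: v > t)}


def predict(node, row):
    """Follow the if-else rules from the root down to a leaf."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


# Hypothetical toy data: two numeric features, two classes
X = [[1, 2], [2, 8], [3, 5], [5, 2], [6, 3], [5, 8], [7, 9]]
y = [0, 0, 0, 0, 0, 1, 1]
tree = build_tree(X, y)
print(tree)
print([predict(tree, row) for row in X])
```

On this toy data the root splits on feature 1 at 6.5, and the right branch is split again on feature 0 at 3.5, after which every leaf is pure, so the predictions reproduce `y`. Calling `build_tree(X, y, max_depth=1)` instead stops after the root split and labels each leaf with its majority class.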

Example:

If we want to classify whether a person will buy a car:

Root node: “Age”

Left branch: Age < 30 → “No”

Right branch: Age ≥ 30 → “Yes”
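The same kind of single-split tree can be learned with scikit-learn's `DecisionTreeClassifier` and printed as if-else rules with `export_text`. The ages and labels below are hypothetical toy data chosen to mirror the car-buying example. Note that scikit-learn learns the threshold from the data (a midpoint between neighbouring training values) and writes its rules with `<=`, so the learned boundary is not exactly 30.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: one feature (Age) and whether the person bought a car
X = [[22], [25], [28], [35], [40], [50]]
y = ["No", "No", "No", "Yes", "Yes", "Yes"]

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)

# Print the learned tree as nested if-else rules
print(export_text(clf, feature_names=["Age"]))
print(clf.predict([[26], [45]]))
```

Here the root splits at Age <= 31.5 (the midpoint of 28 and 35); the left leaf predicts "No" and the right leaf predicts "Yes".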

Purpose:

Decision Trees mimic human reasoning with a sequence of if-else rules, which makes them easy to interpret and visualize.




Question 2: Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?
Answer:
1. Gini Impurity:

Measures how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the class distribution in the set.

$$\mathrm{Gini} = 1 - \sum_{i=1}^{n} p_i^2$$

Where $p_i$ = probability (proportion of samples) of class $i$ in the node, and $n$ = number of classes.

Gini = 0: Perfectly pure node (all samples belong to one class).

Gini = 0.5: Maximum impurity for two classes with an equal class distribution (in general, the maximum for n classes is 1 - 1/n).

2. Entropy:

Measures the level of uncertainty or disorder in the dataset.

$$\mathrm{Entropy} = -\sum_{i=1}^{n} p_i \log_2(p_i)$$

Entropy = 0: Perfectly pure node.

Entropy = 1: Maximum impurity for two equally likely classes (in general, the maximum for n classes is log2(n)).
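Both formulas can be checked numerically with a short Python sketch (the helper names are illustrative, not from any library). It computes each measure for a pure node, a 50/50 two-class node, and a node with four equally likely classes.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Proportion p_i of each class present in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    return 1.0 - sum(p ** 2 for p in class_probabilities(labels))


def entropy(labels):
    # Absent classes never appear in Counter, matching the convention 0 * log2(0) = 0
    return sum(-p * math.log2(p) for p in class_probabilities(labels))


nodes = {
    "pure": ["A", "A", "A", "A"],
    "50/50": ["A", "A", "B", "B"],
    "4 equal classes": ["A", "B", "C", "D"],
}
for name, labels in nodes.items():
    print(f"{name}: gini={gini(labels):.3f}, entropy={entropy(labels):.3f}")
```

The three nodes give Gini 0.0, 0.5 and 0.75 and entropy 0.0, 1.0 and 2.0 respectively, so the maxima of 0.5 (Gini) and 1 (entropy) apply only to two-class problems.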

Impact on Splits:

Decision Trees choose the feature split that reduces impurity the most.

The impurity decrease (parent impurity minus the weighted average impurity of the child nodes) is used to rank candidate splits and identify the best feature.

Smaller impurity ⇒ purer nodes ⇒ better splits.
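A minimal sketch of how a split is chosen for one numeric feature: try every midpoint between sorted distinct values as a threshold, compute the Gini impurity decrease (parent impurity minus the size-weighted impurity of the two children), and keep the largest. The function names and data are hypothetical; scikit-learn runs an optimized version of this search over all features internally.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def best_threshold(values, labels):
    """Scan midpoints between sorted distinct values; return (threshold, decrease)."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = impurity_decrease(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical feature values and class labels
values = [5, 1, 6, 2, 4, 3]
labels = [1, 0, 1, 0, 1, 0]
print(best_threshold(values, labels))  # → (3.5, 0.5)
```

The threshold 3.5 separates the classes perfectly, so both children have Gini 0 and the decrease equals the full parent impurity of 0.5.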



Question 3: What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.
Answer:
Aspect	Pre-Pruning (Early Stopping)	Post-Pruning (e.g., Reduced Error Pruning, Cost-Complexity Pruning)
When applied	During tree construction	After the full tree is grown
How	Stop growing the tree when certain conditions are met (e.g., max_depth, min_samples_split)	Grow the full tree, then remove branches that do not improve performance on validation data
Goal	Prevent overfitting early	Simplify an overfitted tree
Advantage	Faster training time	Better generalization and interpretability

Example:

Pre-Pruning: Limit the tree depth, e.g. max_depth=3.

Post-Pruning: Remove branches that don’t improve validation accuracy.
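In scikit-learn, pre-pruning is expressed through constructor limits such as `max_depth` and `min_samples_split`, while post-pruning is available as minimal cost-complexity pruning via `ccp_alpha` and `cost_complexity_pruning_path` (a different post-pruning method from reduced error pruning). The sketch below assumes the built-in breast cancer dataset and selects `ccp_alpha` by 5-fold cross-validation on the training split; the dataset and parameter values are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Pre-pruning: stop growth early via depth and sample-count limits
pre = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=42)
pre.fit(X_train, y_train)

# Post-pruning: grow the full tree, then prune with cost-complexity pruning
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)
# Drop the last alpha (it prunes the tree down to a single node) and
# clip any tiny negative values caused by floating-point rounding
alphas = np.clip(path.ccp_alphas[:-1], 0.0, None)

# Pick the alpha with the best 5-fold cross-validated accuracy on the training set
cv_means = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=42),
                    X_train, y_train, cv=5).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_means))]
post = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=42).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name:<12} leaves={model.get_n_leaves():>3} "
          f"test accuracy={model.score(X_test, y_test):.3f}")
```

Because the post-pruned tree is the fully grown tree with some subtrees collapsed, it never has more leaves than `full`; whether it also scores higher on the test split depends on the data.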




Question 4: What is Information Gain in Decision Trees, and why is it important for choosing the best split?
Answer:
Definition:

Information Gain (IG) measures the reduction in entropy achieved after a dataset is split on a feature.

$$IG = \mathrm{Entropy}(\text{Parent}) - \sum_{i} \frac{n_i}{n}\,\mathrm{Entropy}(\text{Child}_i)$$

Where:

$n_i$: number of samples in child node $i$

$n$: total number of samples in the parent node

Importance:

It quantifies how much “information” a feature gives about the class label.

The feature (and split point) with the highest IG is chosen for the split.

This leads to purer, more informative child nodes.
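As a worked example, consider a hypothetical parent node with 5 positive and 5 negative samples (entropy 1.0) and two candidate splits described by their class counts. The helper functions are illustrative sketches of the formula above, not a library API.

```python
import math


def entropy(counts):
    """Base-2 entropy of a node given its class counts; empty classes contribute 0."""
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [5, 5]             # 5 "Yes", 5 "No"
split_a = [[4, 1], [1, 4]]  # hypothetical feature A: children are fairly pure
split_b = [[3, 2], [2, 3]]  # hypothetical feature B: children stay mixed
print(round(information_gain(parent, split_a), 3))  # → 0.278
print(round(information_gain(parent, split_b), 3))  # → 0.029
```

Feature A has the higher information gain, so it would be chosen for the split; a split producing perfectly pure children would reach the maximum gain of 1.0 here.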



Question 5: What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?
Answer:
Applications:

Healthcare: Disease diagnosis and risk prediction.

Finance: Loan approval, credit scoring.

Marketing: Customer segmentation and product recommendation.

Manufacturing: Fault detection and process optimization.

Advantages:

Easy to interpret and visualize.

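Handles both numerical and categorical data (scikit-learn's implementation, however, requires categorical features to be encoded numerically first).

No need for feature scaling, because splits depend only on the ordering of feature values.

Some implementations handle missing values natively (e.g., C4.5, and scikit-learn trees since version 1.3); others require imputation first.

Limitations:

Prone to overfitting, especially when the tree is grown to full depth.

Small changes in the training data can change the tree structure drastically (high variance).

Biased towards features with many categories or distinct values when splitting by information gain.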

Question 6: Python Program – Decision Tree Classifier (Gini criterion)
Code:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Decision Tree using Gini
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

# Predictions
y_pred = clf.predict(X_test)

# Results
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Feature Importances:", clf.feature_importances_)

Sample Output:
Accuracy: 0.9777
Feature Importances: [0.012, 0.032, 0.457, 0.499]

Question 7: Python Program – Compare full tree vs max_depth=3
Code:
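from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Same data and split as Question 6
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Full tree (no depth limit)
clf_full = DecisionTreeClassifier(random_state=42)
clf_full.fit(X_train, y_train)
acc_full = accuracy_score(y_test, clf_full.predict(X_test))

# Pre-pruned tree (max_depth=3)
clf_pruned = DecisionTreeClassifier(max_depth=3, random_state=42)
clf_pruned.fit(X_train, y_train)
acc_pruned = accuracy_score(y_test, clf_pruned.predict(X_test))

print("Full Tree Accuracy:", acc_full)
print("Pruned Tree Accuracy (max_depth=3):", acc_pruned)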

Sample Output:
Full Tree Accuracy: 0.9777
Pruned Tree Accuracy (max_depth=3): 0.9555


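✅ Observation: In the sample output, the depth-limited tree misclassifies one more of the 45 test samples than the full tree, but it is much simpler and easier to interpret. Its main benefit is reduced overfitting, which a single small test split cannot reliably measure.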
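With a 30% test split of 150 Iris samples (45 test points), one misclassified sample shifts accuracy by about 2.2 percentage points. A sketch that compares the two settings with 5-fold cross-validation (an evaluation choice not in the original program) gives a steadier estimate of generalization:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

results = {}
for depth in (None, 3):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    # cv=5 uses stratified 5-fold splits for classifiers
    scores = cross_val_score(clf, X, y, cv=5)
    results[depth] = scores
    print(f"max_depth={depth}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

If the two mean scores are close, the simpler max_depth=3 tree is usually the better choice for interpretability.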

Question 8: Decision Tree Regressor – California Housing Dataset
Code:
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
reg = DecisionTreeRegressor(random_state=42)
reg.fit(X_train, y_train)

# Predictions
y_pred = reg.predict(X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
print("Feature Importances:", reg.feature_importances_)

Sample Output:
Mean Squared Error: 0.28
Feature Importances: [0.54, 0.02, 0.01, 0.00, 0.34, 0.04, 0.03, 0.02]

Question 9: Hyperparameter Tuning using GridSearchCV
Code:
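from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Reload Iris: X_train/y_train were overwritten by the California Housing split in Question 8
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 5, 10]
}

# 5-fold cross-validated search over all 15 parameter combinations
grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid,
                    cv=5,
                    scoring='accuracy')
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
# best_score_ is the mean cross-validated accuracy of the best combination
print("Best Accuracy:", grid.best_score_)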

Sample Output:
Best Parameters: {'max_depth': 4, 'min_samples_split': 2}
Best Accuracy: 0.9666

Question 10: Real-world Scenario – Healthcare Disease Prediction
Step-by-Step Process:

Data Cleaning & Handling Missing Values:

Use mean/median imputation for numerical features.

Use mode (most frequent) imputation for categorical features.

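from sklearn.impute import SimpleImputer
# X_num / X_cat: numeric and categorical feature columns (fit on training data only)
imputer = SimpleImputer(strategy='mean')
X_num = imputer.fit_transform(X_num)
X_cat = SimpleImputer(strategy='most_frequent').fit_transform(X_cat)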


Encoding Categorical Features:

Convert non-numeric data using OneHotEncoder (or OrdinalEncoder); LabelEncoder is intended for encoding the target labels, not input features.

from sklearn.preprocessing import OneHotEncoder
# handle_unknown='ignore' avoids errors on categories unseen during fitting
encoder = OneHotEncoder(handle_unknown='ignore')
X_cat = encoder.fit_transform(X_cat).toarray()


Training the Decision Tree:

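from sklearn.tree import DecisionTreeClassifier
# X_train / y_train: the combined, encoded features and labels after train_test_split
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)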


Hyperparameter Tuning:

Use GridSearchCV (as in Question 9) to find the best max_depth, min_samples_split, etc.

Model Evaluation:

Evaluate using metrics: Accuracy, Precision, Recall, F1-score.

from sklearn.metrics import classification_report
print(classification_report(y_test, model.predict(X_test)))
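The steps above can be combined into one scikit-learn `Pipeline`, so that imputation and encoding are fitted only on the training folds during cross-validation, which avoids data leakage. This is a sketch under loud assumptions: the patient table is synthetic and hypothetical (the columns `age`, `bmi`, `smoker` and the labelling rule are invented for illustration), and `scoring="f1"` is an assumed choice for an imbalanced disease label.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
n = 300

# Hypothetical synthetic patient records (not real clinical data)
df = pd.DataFrame({
    "age": rng.randint(20, 80, n).astype(float),
    "bmi": rng.normal(27, 4, n),
    "smoker": pd.Series(rng.choice(["yes", "no"], n), dtype=object),
})
# Hypothetical label: older smokers have the disease
y = ((df["age"] > 50) & (df["smoker"] == "yes")).astype(int)

# Inject missing values to exercise the imputers
df.loc[rng.choice(n, 20, replace=False), "bmi"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "smoker"] = np.nan

preprocess = ColumnTransformer([
    # Numeric columns: median imputation (trees need no scaling)
    ("num", SimpleImputer(strategy="median"), ["age", "bmi"]),
    # Categorical columns: mode imputation, then one-hot encoding
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["smoker"]),
])

model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=42, stratify=y)

# Tune tree hyperparameters; the "tree__" prefix targets the pipeline step
param_grid = {
    "tree__max_depth": [2, 3, 4, None],
    "tree__min_samples_split": [2, 5, 10],
}
search = GridSearchCV(model, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(classification_report(y_test, search.predict(X_test)))
```

Keeping preprocessing inside the pipeline means `GridSearchCV` refits the imputers and encoder on each training fold, so the cross-validated scores are not inflated by statistics computed on held-out data.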


Business Value:

Helps doctors predict diseases early.

Improves diagnostic accuracy.

Reduces treatment costs and patient risk.

Enables data-driven decision making in healthcare.