Que-1  What is a Decision Tree, and how does it work in the context of classification? 

Ans-1 A Decision Tree is a supervised machine learning algorithm that classifies data by repeatedly splitting it on feature values, forming a tree-like structure: each internal node tests one feature, each branch is an outcome of that test, and each leaf node assigns a class.
To classify a new record, the tree is followed from the root down, applying one rule at each node until a leaf is reached. Because the resulting rules are easy to read, decision trees are widely used where interpretability matters:
Finance Use Case – Loan Default Prediction (credit risk assessment)
Logistics Use Case – Delivery Delay Classification
Marketing Use Case – Customer Conversion Prediction (customer targeting)
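
To make the rule-based splitting concrete, the sketch below fits a shallow tree on the Iris dataset and prints the rules it learned. It is a minimal illustration rather than part of the original answer; the depth limit of 2 and the variable names are assumptions chosen to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A depth limit of 2 is an illustrative choice so the printed rules stay short.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders every split as an if/else rule and every leaf as a class.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line in the printed text is one split on a feature value, and each `class:` line is a leaf.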

Que-2 Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree? 
Ans-2 In a Decision Tree, Gini Impurity and Entropy are impurity measures used to decide how to split the data at each node. In both formulas, p_i is the proportion of samples at the node that belong to class i.
Gini Impurity- Gini Impurity measures the probability of incorrect classification if a data point is randomly labeled according to the class distribution at a node.
            $Gini = 1 - \sum_i p_i^2$
Entropy- Entropy measures the uncertainty or randomness in the data.
            $Entropy = -\sum_i p_i \log_2(p_i)$

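Information Gain (for Entropy)
            $IG = Entropy(parent) - \text{Weighted Entropy}(children)$
Gini Gain (reduction in Gini Impurity)
            $Gini\ Gain = Gini(parent) - \text{Weighted Gini}(children)$
In both cases the weighted term is the average impurity of the child nodes, each weighted by its share of the parent's samples. At every node the tree evaluates the candidate splits and picks the one with the largest gain, so both measures push the tree toward purer child nodes. Both are 0 for a pure node and largest when the classes are evenly mixed; Gini avoids the logarithm, which makes it slightly cheaper to compute.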
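
As a quick check of the two formulas, the sketch below computes both measures for a few hand-made nodes. The helper names `gini` and `entropy` and the example labels are assumptions made for illustration.

```python
import numpy as np

def class_proportions(labels):
    """Return p_i, the share of each class among the labels at a node."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def gini(labels):
    """Gini = 1 - sum(p_i^2)."""
    p = class_proportions(labels)
    return float(1.0 - np.sum(p ** 2))

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)); classes with zero count never appear in p."""
    p = class_proportions(labels)
    return float(-np.sum(p * np.log2(p)))

pure_node = ["yes", "yes", "yes", "yes"]
mixed_node = ["yes", "yes", "no", "no"]

for name, node in [("pure", pure_node), ("50/50", mixed_node)]:
    print(f"{name}: gini={gini(node):.3f}, entropy={abs(entropy(node)):.3f}")
```

A pure node scores 0 on both measures, while an evenly mixed two-class node gives the maximum for two classes: 0.5 for Gini and 1.0 for Entropy.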

Que-3  What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.
Ans-3 Pre-Pruning (Early Stopping) - Pre-pruning stops the tree from growing once a condition is met, such as a maximum depth or a minimum number of samples per node, before it becomes too complex.
Practical advantage: lower complexity and training time, because the unneeded branches are never built.
Post-Pruning (Pruning After Full Growth) - Post-pruning allows the tree to grow fully first, and then removes branches that do not improve performance on validation data.
Practical advantage: better generalization, because a branch is removed only after its contribution has been measured, so overfitting branches are trimmed without cutting off useful splits early.
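
The two approaches can be sketched in scikit-learn as follows. This is one possible illustration, not the only way to prune: pre-pruning is expressed through `max_depth` and `min_samples_leaf`, and post-pruning through cost-complexity pruning (`ccp_alpha`), with the alpha chosen on a held-out validation split. The dataset, split, and parameter values are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Pre-pruning: growth stops as soon as a limit is reached.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow the full tree, then list the alphas at which branches collapse.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
alphas = np.clip(path.ccp_alphas, 0.0, None)  # guard against tiny negative round-off

# Refit one tree per alpha and keep the one that scores best on validation data.
candidates = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    for alpha in alphas
]
post_pruned = max(candidates, key=lambda tree: tree.score(X_val, y_val))

print("full tree leaves:  ", full_tree.get_n_leaves())
print("pre-pruned leaves: ", pre_pruned.get_n_leaves())
print("post-pruned leaves:", post_pruned.get_n_leaves())
```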

Que-4  What is Information Gain in Decision Trees, and why is it important for choosing the best split? 
Ans-4 Information Gain (IG) is a metric used in Decision Trees (especially with Entropy) to measure how much uncertainty is reduced by splitting the data on a particular feature. It is the entropy of the parent node minus the weighted average entropy of the child nodes.
It is important because, at each node, the decision tree chooses the split with the highest information gain, which creates the most informative and purest child nodes.
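
A small worked example may help here. The sketch below computes the information gain of two hypothetical splits of the same 10-sample parent node; the function names and the label lists are assumptions made for illustration.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of the class labels at a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Parent node: 5 "yes" and 5 "no", so its entropy is 1 bit.
parent = ["yes"] * 5 + ["no"] * 5

# Split A separates the classes fairly well (4:1 and 1:4).
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
# Split B leaves both children as mixed as the parent (2:2 and 3:3).
split_b = [["yes", "no"] * 2, ["yes", "no"] * 3]

ig_a = information_gain(parent, split_a)
ig_b = information_gain(parent, split_b)
print(f"IG of split A: {ig_a:.3f}")
print(f"IG of split B: {ig_b:.3f}")
```

Split A has a gain of about 0.278 bits and split B a gain of 0, so a tree using this criterion would choose split A.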

Que-5 What are some common real-world applications of Decision Trees, and what are their main advantages and limitations? 
Ans-5 A Decision Tree is a supervised machine learning algorithm widely used in business and analytics because its logic is rule-based and easy to interpret.
Real-World Applications
Marketing- A company predicts whether a customer will respond to a discount offer based on age, income, and past purchases.
Finance- Banks classify applicants as low risk or high risk using credit score, income, and existing liabilities.

Advantages of Decision Trees
Easy to Interpret & Explain
Handles Non-Linear Relationships
Minimal Data Preprocessing
Works with Numerical & Categorical Data

Limitations of Decision Trees
Overfitting- A deep tree can fit noise in the training data. Solution: pruning, max depth, Random Forest.
Unstable- Small data changes can produce very different trees.
Lower Accuracy Compared to Ensembles- A single tree is usually less accurate than Random Forest or XGBoost.
Bias Toward Features with Many Values- Impurity-based splitting can favor features that offer more candidate split points.

Decision trees are widely used in marketing, finance, logistics, and healthcare for classification and decision-making due to their interpretability, but they suffer from overfitting and instability, which is why ensemble methods are often preferred.

In [28]:
# Que-6 Write a Python program to: 
# ● Load the Iris Dataset 
# ● Train a Decision Tree Classifier using the Gini criterion
# ● Print the model’s accuracy and feature importances  
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the data: 150 samples, 4 features, 3 classes
iris = load_iris()
x = iris.data
y = iris.target

# Hold out a test set (default test_size=0.25, i.e. 38 of the 150 samples)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=44)

# Fit a tree that uses Gini impurity to choose its splits
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(x_train, y_train)

# Predict the class of each held-out sample
y_pred = model.predict(x_test)
y_pred

array([2, 0, 1, 1, 2, 0, 2, 2, 2, 1, 0, 1, 0, 2, 0, 0, 2, 1, 0, 2, 1, 2,
       2, 1, 2, 1, 0, 1, 0, 1, 0, 1, 1, 2, 0, 1, 0, 0])

In [25]:
print(accuracy_score(y_test, y_pred))

0.9736842105263158


In [29]:
for feature, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{feature}: {importance}")

sepal length (cm): 0.025894921470142718
sepal width (cm): 0.0
petal length (cm): 0.9293248785154677
petal width (cm): 0.044780200014389517


In [35]:
# Que-7 Write a Python program to: 
# ● Load the Iris Dataset 
# ● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to a fully-grown tree. 

# Reuses x_train, x_test, y_train, y_test from the Que-6 cell (Iris, random_state=44 split)

# Fully grown tree: no depth limit
model_full_len = DecisionTreeClassifier(random_state=44)
model_full_len.fit(x_train, y_train)
y_pred_fulllen = model_full_len.predict(x_test)

# Depth-limited tree: max_depth=3 is a pre-pruning (early stopping) setting
model_with_prun = DecisionTreeClassifier(max_depth=3, random_state=44)
model_with_prun.fit(x_train, y_train)
y_pred_with_prun = model_with_prun.predict(x_test)

print("Accuracy of the model without pruning : ", accuracy_score(y_test, y_pred_fulllen))
print("Accuracy of the model with pruning : ", accuracy_score(y_test, y_pred_with_prun))

Accuracy of the model without pruning :  0.8947368421052632
Accuracy of the model with pruning :  0.9473684210526315
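
The two accuracies above come from a single 38-sample test split, so the gap between them corresponds to two test samples. A hedged way to check whether the depth limit really helps is to compare the two settings with cross-validation. The sketch below assumes 5 folds and reuses `random_state=44`; it is an optional check, not part of the original exercise.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth=None grows the tree fully; max_depth=3 is the depth-limited variant.
settings = {"fully grown": None, "max_depth=3": 3}

cv_scores = {}
for label, depth in settings.items():
    clf = DecisionTreeClassifier(max_depth=depth, random_state=44)
    # 5-fold cross-validated accuracy over the whole dataset
    cv_scores[label] = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{label}: mean={cv_scores[label].mean():.3f}, std={cv_scores[label].std():.3f}")
```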


In [51]:
# Que-8 Write a Python program to: 
# ● Load the Boston Housing Dataset 
# ● Train a Decision Tree Regressor 
# ● Print the Mean Squared Error (MSE) and feature importances 

from sklearn.datasets import fetch_openml
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# load_boston was removed from scikit-learn, so the dataset is fetched from OpenML
boston = fetch_openml(name="boston", version=1, as_frame=False)
x = boston.data
y = boston.target

# Hold out a test set (default test_size=0.25)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)

# Fit a regression tree and score it on the held-out data
model = DecisionTreeRegressor(random_state=42)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print("Mean Squared Error (MSE):", mean_squared_error(y_test, y_pred))

Mean Squared Error (MSE): 16.68842519685039
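
Que-8 also asks for feature importances, which the cell above does not print. The sketch below shows how they can be read from a fitted `DecisionTreeRegressor`. To stay runnable offline it uses scikit-learn's bundled diabetes dataset as a stand-in for Boston Housing; that substitution and the variable names are assumptions, and the same loop applies to any fitted regressor together with its list of feature names.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in regression dataset that ships with scikit-learn (no download needed).
data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

reg = DecisionTreeRegressor(random_state=42)
reg.fit(X_train, y_train)

mse = mean_squared_error(y_test, reg.predict(X_test))
print("Mean Squared Error (MSE):", mse)

# Importances are impurity-based and sum to 1; print them from largest to smallest.
importances = sorted(
    zip(data.feature_names, reg.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for feature, importance in importances:
    print(f"{feature}: {importance:.4f}")
```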


In [57]:
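# Que-9 Tune max_depth and min_samples_split with GridSearchCV,
# then print the best parameters and the resulting model accuracy
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x = iris.data
y = iris.target

# Keep 20% of the data aside; the grid search only sees the training part
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
dt = DecisionTreeClassifier(random_state=42)

# Candidate values for the two hyperparameters (5 x 3 = 15 combinations)
param_grid = {"max_depth": [2, 3, 4, 5, None], "min_samples_split": [2, 5, 10]}

# 5-fold cross-validation on the training set for every combination
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, scoring="accuracy")
grid_search.fit(x_train, y_train)

# best_estimator_ is refit on the full training set with the best parameters
best_model = grid_search.best_estimator_
y_pred = best_model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)

print("Best Parameters:", grid_search.best_params_)
print("Model Accuracy:", accuracy)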

Best Parameters: {'max_depth': 4, 'min_samples_split': 2}
Model Accuracy: 1.0


Que-10 Imagine you’re working as a data scientist for a healthcare company that wants to predict whether a patient has a certain disease. You have a large dataset with mixed data types and some missing values. Explain the step-by-step process you would follow to: 
● Handle the missing values 
● Encode the categorical features 
● Train a Decision Tree model 
● Tune its hyperparameters 
● Evaluate its performance 
And describe what business value this model could provide in the real-world setting. 

Ans-10 
Handle Missing Values

Numerical features → Impute with:
    Mean (if the data is roughly symmetric, with no strong outliers)
    Median (if data is skewed or has outliers)

Categorical features → Impute with:
    Mode (most frequent value)
    Or a separate category like "Unknown"

Example: Missing cholesterol values might be replaced with the median, so that a few extreme readings do not distort the imputed value.
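
The imputation choices above can be sketched with scikit-learn's `SimpleImputer`. The cholesterol readings and smoking labels below are made-up illustrative values, not data from any real dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical cholesterol column with one missing value and one extreme reading.
cholesterol = np.array([[180.0], [210.0], [np.nan], [195.0], [640.0]])

mean_filled = SimpleImputer(strategy="mean").fit_transform(cholesterol)
median_filled = SimpleImputer(strategy="median").fit_transform(cholesterol)
print("mean imputation:  ", mean_filled[2, 0])
print("median imputation:", median_filled[2, 0])

# Hypothetical categorical column; the missing entry is filled with the mode.
smoking = np.array(
    [["Current"], ["Non-smoker"], [np.nan], ["Non-smoker"]], dtype=object
)
smoking_filled = SimpleImputer(strategy="most_frequent").fit_transform(smoking)
print("mode imputation:  ", smoking_filled[2, 0])
```

With these values the mean is pulled up to 306.25 by the single 640 reading, while the median stays at 202.5, which is why the median is the safer choice for skewed columns.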

Encode Categorical Features
    Use Label Encoding for binary categories (Yes/No → 1/0)
    Use One-Hot Encoding for multi-category features (e.g., blood group, region)
    Example: Smoking status → {Non-smoker, Former, Current} → One-Hot Encoding
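
A minimal sketch of both encodings follows, using made-up values. It uses `OrdinalEncoder` for the binary column, since scikit-learn's `LabelEncoder` is intended for target labels rather than input features; that substitution and the column names are assumptions.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical feature columns (one value per patient).
diabetic = np.array([["No"], ["Yes"], ["No"], ["Yes"]])
smoking = np.array([["Non-smoker"], ["Former"], ["Current"], ["Former"]])

# Binary feature: a single 0/1 column (categories are sorted, so No -> 0, Yes -> 1).
binary_encoder = OrdinalEncoder()
diabetic_encoded = binary_encoder.fit_transform(diabetic)

# Multi-category feature: one indicator column per category.
one_hot_encoder = OneHotEncoder(handle_unknown="ignore")
smoking_encoded = one_hot_encoder.fit_transform(smoking).toarray()

print("binary column:", diabetic_encoded.ravel())
print("one-hot categories:", one_hot_encoder.categories_[0])
print(smoking_encoded)
```

Setting `handle_unknown="ignore"` makes a category that was not seen during fitting encode as all zeros instead of raising an error at prediction time.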

Train a Decision Tree Model
Split data into training and testing sets
Train a DecisionTreeClassifier
Start with default parameters