
### **Question 1:**

**What is a Decision Tree, and how does it work in the context of classification?**

**Answer:**
>A Decision Tree is a supervised learning algorithm used for classification and regression tasks.
It works by splitting the dataset into smaller parts based on feature values to form a tree-like model.
Each internal node represents a decision rule on a feature, and each leaf node gives a class label.
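At each node, the algorithm selects the feature and split threshold that best separate the data, measured by an impurity criterion such as Gini Impurity or Entropy.
This process repeats recursively on each child node until a stopping condition is met: the node is pure, no split reduces impurity further, or a limit such as `max_depth` or `min_samples_split` is reached.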
Decision Trees aim to create the purest possible subsets for accurate prediction.
They are easy to interpret because they follow simple “if-else” decision paths.
Hence, they’re widely used for transparent and rule-based classification tasks.
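
To see the "if-else" structure concretely, scikit-learn's `export_text` can print a fitted tree as nested rules. This is a minimal sketch that assumes scikit-learn is installed. The `max_depth=2` limit is an arbitrary choice that keeps the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree on the full Iris data so the printed rules stay short
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# export_text renders each root-to-leaf path as nested if/else conditions
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented `|---` line is one condition on a feature, and each `class:` line is the label predicted at that leaf.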

---

### **Question 2:**

**Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?**

**Answer:**
>Gini Impurity and Entropy are measures used to determine the quality of splits in a Decision Tree.
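Gini Impurity measures how often a randomly chosen sample from the node would be misclassified if it were labeled at random according to the node's class distribution: $G = 1 - \sum_i p_i^2$, where $p_i$ is the fraction of samples in class $i$.
Entropy measures the level of uncertainty or disorder in a node: $H = -\sum_i p_i \log_2 p_i$.
Both are zero when the node is pure (contains only one class) and largest when the classes are evenly mixed.
At each node, the algorithm chooses the split that gives the largest drop in impurity from the parent to the size-weighted average of its children.
This reduction is called Information Gain when using entropy.
Lower impurity means better-separated classes and more confident predictions at each node; in practice the two criteria usually pick similar splits, and Gini is slightly cheaper to compute because it avoids logarithms.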
Thus, these measures directly guide how the Decision Tree grows and makes predictions.
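
You can check both formulas on small label lists. The following is a minimal NumPy sketch. The helper names `gini` and `entropy` are illustrative and are not a library API.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def entropy(labels):
    """Entropy in bits: sum(p_i * log2(1 / p_i)); absent classes never appear in counts."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

pure = [1, 1, 1, 1]     # one class only
mixed = [0, 0, 1, 1]    # 50/50 split
skewed = [0, 1, 1, 1]   # 25/75 split

for name, node in [("pure", pure), ("mixed", mixed), ("skewed", skewed)]:
    print(f"{name:>6}: gini={gini(node):.3f}, entropy={entropy(node):.3f}")
```

The pure node scores 0 on both measures. The evenly mixed node reaches the two-class maximum (0.5 Gini, 1 bit of entropy). The 25/75 node falls in between (0.375 Gini, about 0.811 bits).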

---

### **Question 3:**

**What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.**

**Answer:**
>Pre-Pruning (also called *Early Stopping*) stops the tree from growing once a certain condition is met, such as a minimum number of samples or maximum depth.
Because growth is halted during training, it keeps the tree from becoming overly complex and overfitting.
Post-Pruning, on the other hand, allows the tree to grow fully and then removes branches that add little or no accuracy on validation data. In cost-complexity pruning, it removes branches whose reduction in error or impurity does not justify the extra leaves they add.
This simplifies the model while preserving most of its predictive power.
An advantage of Pre-Pruning is that it saves training time and computational resources, because the full tree is never built.
An advantage of Post-Pruning is that it often generalizes better. Because branches are judged after the full tree is grown, it does not discard a split that looks weak on its own but enables useful splits below it.
Both methods help control overfitting but differ in when they are applied.
Thus, pruning helps keep the Decision Tree accurate yet interpretable.
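
In scikit-learn, pre-pruning corresponds to constructor limits such as `max_depth` and `min_samples_leaf`. Post-pruning is available as minimal cost-complexity pruning through the `ccp_alpha` parameter. The sketch below contrasts the two on Iris. The specific limits (`max_depth=3`, `min_samples_leaf=5`) and the 5-fold selection of `ccp_alpha` are illustrative choices, not recommended defaults.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Pre-pruning: growth is limited by constraints fixed before training
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42)
pre_pruned.fit(X_train, y_train)

# Post-pruning: compute the cost-complexity pruning path of the fully grown tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
# Drop the last alpha (it prunes down to the root) and clip tiny negative float noise
alphas = np.clip(path.ccp_alphas[:-1], 0.0, None)

# Choose the alpha with the best mean 5-fold CV accuracy on the training set
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=42),
                    X_train, y_train, cv=5).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]
post_pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=42).fit(X_train, y_train)

for name, model in [("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
          f"test accuracy={model.score(X_test, y_test):.3f}")
```

A larger `ccp_alpha` prunes more aggressively. Selecting it by cross-validation lets the data decide how much of the fully grown tree to keep.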

---

### **Question 4:**

**What is Information Gain in Decision Trees, and why is it important for choosing the best split?**

**Answer:**
>Information Gain (IG) measures how much uncertainty or impurity is reduced after splitting the dataset on a particular feature.
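It is calculated as the difference between the parent node’s entropy and the weighted average entropy of its child nodes: $IG = H(\text{parent}) - \sum_k \frac{n_k}{n} H(\text{child}_k)$, where $n_k$ is the number of samples in child $k$ and $n$ is the number in the parent.
A higher Information Gain indicates a more effective split that better separates the classes.
When entropy is the splitting criterion, the algorithm selects the feature (and threshold) with the highest IG at each step.
This helps ensure that each node becomes purer and more meaningful for classification.
Splitting on high-IG features first tends to separate the classes with fewer tests, which keeps the tree shallow and its predictions accurate.
The choice is greedy (made one node at a time), so it does not guarantee a globally optimal tree, but it is an effective heuristic for building clear decision boundaries.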
Hence, IG guides the learning process toward the most informative features.
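
The following is a worked example of the formula on an eight-sample, two-class parent node. The helpers below are illustrative sketches and are not part of scikit-learn, which computes impurity decreases internally.

```python
import numpy as np

def entropy(labels):
    """Entropy in bits of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 50/50 parent: entropy = 1 bit

# Split A separates the classes perfectly; split B leaves both children mixed 75/25
split_a = [parent[:4], parent[4:]]
split_b = [np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])]

print(f"IG(split A): {information_gain(parent, split_a):.3f}")
print(f"IG(split B): {information_gain(parent, split_b):.3f}")
```

Split A reaches the maximum gain of 1 bit. Split B reduces entropy by only about 0.189 bits, so an entropy-based tree would choose A.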

---

### **Question 5:**

**What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?**

**Answer:**
>Decision Trees are widely used in finance for credit risk assessment and loan approval decisions.
In healthcare, they help diagnose diseases and predict patient outcomes.
In marketing, they’re used for customer segmentation and predicting buying behavior.
Their main advantage is interpretability — results can be easily understood through “if-else” rules.
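They can also handle both numerical and categorical data, although some implementations (scikit-learn's included) require categorical features to be encoded as numbers first.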
However, Decision Trees often overfit the training data if not properly pruned.
They can also be unstable, as small data changes may lead to a different structure.
Despite these limits, they remain popular for their simplicity, clarity, and strong baseline performance.






```python
# Dataset Info:
# ● Iris Dataset for classification tasks (sklearn.datasets.load_iris() or
# provided CSV).
# ● Boston Housing Dataset for regression tasks
# (sklearn.datasets.load_boston() or provided CSV).
#   Note: load_boston() was removed in scikit-learn 1.2, so the regression
#   task below (Question 8) uses fetch_california_housing() instead.

# Question 6: Write a Python program to:
# ● Load the Iris Dataset
# ● Train a Decision Tree Classifier using the Gini criterion
# ● Print the model’s accuracy and feature importances

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Train Decision Tree using Gini criterion
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X_train, y_train)

# Evaluate model
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Feature Importances:", model.feature_importances_)
```

```text
Accuracy: 1.0
Feature Importances: [0.         0.01911002 0.89326355 0.08762643]
```
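
The raw importance array is easier to read when feature names are attached. Impurity-based importances are also computed from the training data and can overstate features with many distinct values. A common cross-check is scikit-learn's `permutation_importance` on the test set. The sketch below reuses the Question 6 split. The `n_repeats=10` setting is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = DecisionTreeClassifier(criterion="gini", random_state=42).fit(X_train, y_train)

# Impurity-based importances (derived from training-set splits), paired with names
for name, imp in zip(iris.feature_names, model.feature_importances_):
    print(f"{name:<20} impurity-based: {imp:.3f}")

# Permutation importance: mean drop in test accuracy when one feature is shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for name, mean, std in zip(iris.feature_names, result.importances_mean, result.importances_std):
    print(f"{name:<20} permutation: {mean:.3f} +/- {std:.3f}")
```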


```python
# Question 7: Write a Python program to:
# ● Load the Iris Dataset
# ● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to
# a fully-grown tree.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Train limited-depth and full tree models
tree_limited = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_full = DecisionTreeClassifier(random_state=42)
tree_limited.fit(X_train, y_train)
tree_full.fit(X_train, y_train)

# Compare accuracies
print("Accuracy (max_depth=3):", accuracy_score(y_test, tree_limited.predict(X_test)))
print("Accuracy (full tree):", accuracy_score(y_test, tree_full.predict(X_test)))
```

```text
Accuracy (max_depth=3): 1.0
Accuracy (full tree): 1.0
```
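
Both trees score 1.0 on this 45-sample test split, so the split alone cannot tell them apart. The sketch below instead compares their size and their 5-fold cross-validated accuracy on the full dataset. The choice of 5 folds is an assumption, and the scores will vary with it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=42),
    "full tree": DecisionTreeClassifier(random_state=42),
}

for name, model in models.items():
    # Accuracy across 5 stratified folds (the default splitter for classifiers)
    scores = cross_val_score(model, X, y, cv=5)
    # Refit on all data only to report how large each tree grows
    model.fit(X, y)
    print(f"{name}: CV accuracy {scores.mean():.3f} +/- {scores.std():.3f}, "
          f"depth={model.get_depth()}, leaves={model.get_n_leaves()}")
```

If the two CV scores are close, the depth-limited tree is usually preferable because it is smaller and easier to read.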


```python
# Question 8: Write a Python program to:
# ● Load the California Housing dataset from sklearn
# ● Train a Decision Tree Regressor
# ● Print the Mean Squared Error (MSE) and feature importances

# Import necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
california = fetch_california_housing()
X = california.data       # Features
y = california.target     # Target: median house value, in units of $100,000

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create Decision Tree Regressor
dt_regressor = DecisionTreeRegressor(random_state=42)

# Train the model
dt_regressor.fit(X_train, y_train)

# Make predictions
y_pred = dt_regressor.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")

# Print feature importances
print("Feature Importances:")
for feature_name, importance in zip(california.feature_names, dt_regressor.feature_importances_):
    print(f"{feature_name}: {importance:.4f}")
```

```text
Mean Squared Error (MSE): 0.53
Feature Importances:
MedInc: 0.5235
HouseAge: 0.0521
AveRooms: 0.0494
AveBedrms: 0.0250
Population: 0.0322
AveOccup: 0.1390
Latitude: 0.0900
Longitude: 0.0888
```
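
A regression tree predicts a constant in each leaf. With the default squared-error criterion, that constant is the mean target of the training samples that land in the leaf. The toy sketch below makes this visible. It uses made-up one-feature data whose values are chosen only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: two clusters of x with different target levels
X = np.array([[1], [2], [3], [10], [11], [12]], dtype=float)
y = np.array([1.0, 1.5, 0.5, 5.0, 5.5, 4.5])

# A single split (depth 1) produces two leaves
stump = DecisionTreeRegressor(max_depth=1, random_state=42).fit(X, y)

print("Split threshold:", stump.tree_.threshold[0])   # midpoint between x=3 and x=10
print("Predictions:", stump.predict([[2.0], [11.0]]))  # mean of y in each leaf
```

The left leaf predicts 1.0 (the mean of 1.0, 1.5, and 0.5), and the right leaf predicts 5.0. A fully grown tree like the one in Question 8 simply keeps splitting until leaves are tiny, so its leaf means can track noise in the training targets.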


```python
# Question 9: Write a Python program to:
# ● Load the Iris Dataset
# ● Tune the Decision Tree’s max_depth and min_samples_split using
# GridSearchCV
# ● Print the best parameters and the resulting model accuracy

# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create Decision Tree classifier
dt = DecisionTreeClassifier(random_state=42)

# Define the hyperparameter grid to search
param_grid = {
    'max_depth': [None, 2, 3, 4, 5],
    'min_samples_split': [2, 3, 4, 5]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, scoring='accuracy')

# Fit GridSearchCV to training data
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print(f"Best Parameters: {best_params}")

# Predict with the best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy with Best Parameters: {accuracy:.2f}")
```

```text
Best Parameters: {'max_depth': None, 'min_samples_split': 2}
Accuracy with Best Parameters: 1.00
```
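
In a grid like this, several combinations can reach the same cross-validated accuracy. When that happens, `GridSearchCV` reports the first of the tied candidates, so `best_params_` should not be over-interpreted. The sketch below lists the top-ranked candidates and `best_score_`, which is the mean CV accuracy and is distinct from the test accuracy printed above. It assumes pandas is installed.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {"max_depth": [None, 2, 3, 4, 5], "min_samples_split": [2, 3, 4, 5]}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid,
                           cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

# One row per candidate; rank 1 marks the best mean CV score (ties share a rank)
results = pd.DataFrame(grid_search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results.sort_values("rank_test_score")[columns].head(10).to_string(index=False))
print("Best mean CV accuracy:", round(grid_search.best_score_, 3))
print("Candidates tied for rank 1:", int((results["rank_test_score"] == 1).sum()))
```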


### **Question 10:**

Imagine you’re working as a data scientist for a healthcare company that wants to predict whether a patient has a certain disease. You have a large dataset with mixed data types and some missing values.

Explain the step-by-step process you would follow to:

* Handle the missing values
* Encode the categorical features
* Train a Decision Tree model
* Tune its hyperparameters
* Evaluate its performance

Also describe what business value this model could provide in the real-world setting.

**Answer:**


#### **1. Handle Missing Values**

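* Quantify missing values per feature and check whether missingness itself relates to the outcome (if it does, add a missing-value indicator feature).
* Impute numerical features (mean/median) and categorical features (mode or ‘Unknown’).
* Drop features with too many missing values if necessary.
* Fit imputers on the training data only, so statistics from the test set do not leak into training.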



#### **2. Encode Categorical Features**

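* Use ordinal encoding (integer codes that preserve the natural order) for ordinal features, for example scikit-learn's `OrdinalEncoder` with an explicit category order. `LabelEncoder` is intended for target labels, not features.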
* Use One-Hot Encoding for nominal data.



#### **3. Train Decision Tree Model**

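* Split data into training and testing sets, stratified by the target so both keep the same disease prevalence.
* Initialize and fit a Decision Tree classifier; consider `class_weight='balanced'` if the disease is rare.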



#### **4. Tune Hyperparameters**

* Key hyperparameters: `max_depth`, `min_samples_split`, `min_samples_leaf`, `criterion`.
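* Use GridSearchCV or RandomizedSearchCV with cross-validation on the training set only to find the best combination, scoring with a metric that matches the clinical goal (for example, recall or F1 rather than plain accuracy).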
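
The training and tuning steps can be sketched as a single `Pipeline` searched with `GridSearchCV`. The data here are synthetic, generated with `make_classification` and with values blanked out at random. This stand-in is an assumption and is not real patient data. The grid values and the choice of recall as the selection metric are also illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for patient data: about 10% positives, ~5% of values missing
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Imputation sits inside the pipeline, so each CV fold fits it on its own training part
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
])

param_grid = {
    "tree__max_depth": [3, 5, 8, None],
    "tree__min_samples_split": [2, 10, 20],
    "tree__min_samples_leaf": [1, 5, 10],
    "tree__criterion": ["gini", "entropy"],
}

# Recall is the selection metric here on the assumption that missed cases are costly
search = GridSearchCV(pipe, param_grid, scoring="recall",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV recall:", round(search.best_score_, 3))
```

Recall alone can be maximized by a model that flags almost everyone. In practice, `scoring="f1"`, `"roc_auc"`, or `"average_precision"` may be safer selection criteria. Whichever metric is chosen, the test split stays untouched until the final evaluation.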



#### **5. Evaluate Performance**

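* Use metrics beyond accuracy: Precision, Recall, F1-Score, ROC-AUC, since accuracy is misleading when the disease is rare.
* A confusion matrix shows false positives and false negatives, which is crucial in healthcare, where a missed case (false negative) is often costlier than a false alarm.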
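
The following sketches the evaluation step on synthetic imbalanced data from `make_classification`. This is a stand-in, not real patient data. The model settings are fixed illustrative values rather than tuned ones.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for a disease-prediction dataset
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = DecisionTreeClassifier(max_depth=5, class_weight="balanced", random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# Rows are true classes and columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"False negatives (missed cases): {fn}, false positives: {fp}")
print(classification_report(y_test, y_pred, target_names=["no disease", "disease"]))
print("ROC-AUC:", round(roc_auc_score(y_test, y_score), 3))
```

A single tree's predicted probabilities come from class fractions in its leaves, so its ROC curve is fairly coarse. The confusion matrix and per-class recall are often the more direct read on clinical risk.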



#### **6. Business Value**

* Early disease detection and risk stratification.
* Supports clinical decisions and prioritizes resources.
* Reduces unnecessary tests and costs.
* Enables population health management and preventive care.

