#Theory

1. What is a Decision Tree, and how does it work in the context of classification?
 - A Decision Tree is a supervised machine learning algorithm used for both classification and regression problems.
It works like a flowchart: each internal node tests a condition on a feature, each branch corresponds to one outcome of that test, and each leaf node gives the final result (a class label in classification).

In classification, it recursively splits the dataset into smaller groups, choosing the best feature and threshold at each step, until each group is as pure as possible (ideally all samples belong to one class) or a stopping rule is reached.
It uses impurity measures like Gini or Entropy to decide which feature to split on.
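
To make the flowchart idea concrete, here is a minimal sketch that fits a shallow tree on the Iris dataset and prints its rules as text. The choice of max_depth=2 and random_state=0 is only an assumption to keep the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed flowchart short enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders internal nodes as threshold tests and leaves as classes.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line in the printout is one condition; following the conditions from the top down to a "class:" line is the same path a sample takes when the tree predicts its label.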

2. Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?
 - Both Gini and Entropy tell how mixed the classes are in a node.

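 ~ Entropy measures the randomness or uncertainty of the class labels in a node.
Formula: Entropy = -∑ pᵢ * log₂(pᵢ)

 ~ Gini Impurity measures how often a randomly chosen element would be incorrectly labeled if it were labeled at random according to the class distribution in the node.
Formula: Gini = 1 - ∑ pᵢ²

Here pᵢ is the fraction of samples in the node that belong to class i.
Lower values mean purer data; both measures are 0 for a node that contains a single class.
The Decision Tree chooses the split that gives the largest reduction in impurity, that is, the split that makes the child nodes purest.
In short, both usually lead to very similar trees; Gini is slightly cheaper to compute because it avoids the logarithm, while Entropy is the measure that Information Gain is built on.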
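
The two formulas can be checked by hand with a short sketch. The helper names gini and entropy and the example class distributions are my own, chosen only for illustration.

```python
import math

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy in bits; classes with probability 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# From a pure node to a perfectly mixed two-class node.
for probs in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(probs, "Gini:", round(gini(probs), 3), "Entropy:", round(entropy(probs), 3))
```

Both measures are 0 for the pure node and reach their two-class maximum (0.5 for Gini, 1.0 for Entropy) at the 50/50 split, which is why they tend to rank candidate splits in a similar order.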

3. What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.
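 -  ~ Pre-Pruning: You stop growing the tree early by setting limits like max_depth, min_samples_split, min_samples_leaf, etc.
➤ Advantage: reduces overfitting and makes training faster, because the full tree is never built.

 ~ Post-Pruning: You first build a full tree and then remove branches that add little predictive value (for example, cost-complexity pruning via ccp_alpha in scikit-learn).
➤ Advantage: keeps useful complexity, since a branch is removed only after its actual contribution has been seen, which often improves test accuracy.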
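
A rough sketch of both approaches in scikit-learn follows. The pre-pruning limits and the way alpha is picked (the middle of the pruning path) are arbitrary assumptions for illustration; in practice alpha would normally be chosen by cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Pre-pruning: growth is capped while the tree is being built.
pre = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow the full tree first, then cut it back.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)

# Illustrative choice only: take an alpha from the middle of the path.
alpha = max(path.ccp_alphas[len(path.ccp_alphas) // 2], 0.0)
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "leaves:", model.get_n_leaves(), "test acc:", model.score(X_test, y_test))
```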

4. What is Information Gain in Decision Trees, and why is it important for choosing the best split?
 - Information Gain measures how much splitting on a feature reduces uncertainty (entropy).
Formula:
IG = Entropy(parent) - Weighted Avg(Entropy(children))

Each child's entropy is weighted by the fraction of the parent's samples that fall into that child.
A higher Information Gain means a better feature for splitting.
So the algorithm chooses the split with the maximum IG, which gives the purest child nodes.
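
The formula can be worked through on a toy example. The labels and the two candidate splits below are made up for illustration, and the function names are my own.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes completely.
perfect_split = [["yes"] * 5, ["no"] * 5]
# A split whose children are just as mixed as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes"] * 3 + ["no"] * 3]

print("IG of perfect split:", information_gain(parent, perfect_split))
print("IG of useless split:", information_gain(parent, useless_split))
```

The perfect split removes all uncertainty, so its gain equals the parent's entropy, while the useless split leaves the children as mixed as the parent and gains nothing.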

5. What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?

 -  ~ Applications:
   - Loan approval / credit risk prediction
   - Disease diagnosis
   - Customer churn prediction
   - Fraud detection
   - Marketing and sales forecasting

 ~ Advantages:
   - Simple and easy to explain
   - Can handle both numeric and categorical data (scikit-learn's trees need categorical features encoded as numbers first)
   - No need for feature scaling

 ~ Limitations:
   - Can overfit the training data
   - Unstable: small changes in the training data can produce a very different tree
   - Usually less accurate than ensemble methods like Random Forest
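
The last limitation can be checked on any dataset with a quick cross-validation comparison. This is only a sketch: the breast cancer dataset, 5 folds and 100 trees are my own assumptions, and the size of the gap (if any) depends on the data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Same folds for both models, so the scores are directly comparable.
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print("Tree   mean/std:", tree_scores.mean(), tree_scores.std())
print("Forest mean/std:", forest_scores.mean(), forest_scores.std())
```

The standard deviation across folds gives a rough feel for the instability point as well: a model that changes a lot with the training data tends to score less consistently.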

#Practical

In [3]:
#6 Write a Python program to:
#     Load the Iris Dataset
#     Train a Decision Tree Classifier using the Gini criterion
#     Print the model’s accuracy and feature importances

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Feature Importances:", clf.feature_importances_)


Accuracy: 0.9333333333333333
Feature Importances: [0.         0.02857143 0.54117647 0.4302521 ]


In [5]:
#7 Write a Python program to train a Decision Tree Classifier with max_depth=3 and compare accuracy with a fully-grown tree.

clf_full = DecisionTreeClassifier(random_state=42)
clf_full.fit(X_train, y_train)
acc_full = accuracy_score(y_test, clf_full.predict(X_test))

clf_md3 = DecisionTreeClassifier(max_depth=3, random_state=42)
clf_md3.fit(X_train, y_train)
acc_md3 = accuracy_score(y_test, clf_md3.predict(X_test))

print("Fully-grown Accuracy:", acc_full)
print("max_depth=3 Accuracy:", acc_md3)


Fully-grown Accuracy: 0.9333333333333333
max_depth=3 Accuracy: 0.9777777777777777
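
To see where a difference between the two trees could come from, it may help to compare their size and their training accuracy. The sketch below repeats the same split so it runs on its own; the models dictionary is just a convenience I added.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

models = {}
for label, depth in [("fully-grown", None), ("max_depth=3", 3)]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    models[label] = model.fit(X_train, y_train)
    print(
        label,
        "| depth:", model.get_depth(),
        "| leaves:", model.get_n_leaves(),
        "| train acc:", model.score(X_train, y_train),
        "| test acc:", model.score(X_test, y_test),
    )
```

A large gap between training and test accuracy for the fully-grown tree would point to overfitting, which is what limiting max_depth is meant to reduce.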


In [6]:
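#8 Write a Python program to train a Decision Tree Regressor on the Boston Housing Dataset and print the MSE and feature importances.
# Note: load_boston was removed in scikit-learn 1.2, so the California Housing dataset is used here instead.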

from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg = DecisionTreeRegressor(random_state=42)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)
# Sort features by importance (highest first) and keep the top five
top_features = sorted(zip(data.feature_names, reg.feature_importances_), key=lambda x: x[1], reverse=True)[:5]
print("Top Feature Importances:", top_features)


MSE: 0.495235205629094
Top Feature Importances: [('MedInc', np.float64(0.5285090936963706)), ('AveOccup', np.float64(0.13083767753210346)), ('Latitude', np.float64(0.09371656401749287)), ('Longitude', np.float64(0.08290202505986989)), ('AveRooms', np.float64(0.05297496833123543))]


In [7]:
#9 Write a Python program to tune Decision Tree hyperparameters using GridSearchCV.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score


iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = DecisionTreeClassifier(random_state=42)

param_grid = {
    'max_depth': [1, 2, 3, 4, 5, None],
    'min_samples_split': [2, 5, 10]
}

grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)

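# GridSearchCV refits the best parameter combination on the whole training set by default
best_model = grid.best_estimator_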


Best Parameters: {'max_depth': 3, 'min_samples_split': 2}
Best CV Accuracy: 0.9523809523809523
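
The cross-validated score above is computed on the training data only, so a natural follow-up is to score the tuned tree on the held-out test set. The sketch below repeats the search so it runs on its own; the variable name test_acc is my own.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

param_grid = {
    "max_depth": [1, 2, 3, 4, 5, None],
    "min_samples_split": [2, 5, 10],
}
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# best_estimator_ is the tree refitted on all of X_train with the best parameters.
best_model = grid.best_estimator_
test_acc = accuracy_score(y_test, best_model.predict(X_test))
print("Test Accuracy of tuned tree:", test_acc)
```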


10. Healthcare Use Case — Predicting Disease Using Decision Trees

 - Steps I would follow:

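1. Check the data, find missing values, and hold out a test set so that later steps are fitted on the training portion only.

2. Fill missing numeric values with the median and categorical values with the mode or an “Unknown” category.

3. Encode categorical features (OneHotEncoder for unordered categories, OrdinalEncoder for ordered ones; LabelEncoder is intended for the target column).

4. Train DecisionTreeClassifier on the cleaned training data.

5. Tune parameters like max_depth and min_samples_split using GridSearchCV.

6. Evaluate using accuracy, recall, F1-score, and the confusion matrix. Recall deserves special attention here, because a missed disease case (false negative) is usually costlier than a false alarm.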
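
The steps above can be sketched as a single scikit-learn pipeline. Everything about the data here is hypothetical: the column names (age, blood_pressure, smoker), the rule that generates the disease label and the injected missing values are made up so the example runs on its own, and tuning for recall is just one reasonable choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for patient records; all columns are hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(20, 80, n).astype(float),
    "blood_pressure": rng.normal(120, 15, n),
    "smoker": rng.choice(["yes", "no"], n),
})
# Made-up label rule, computed before any gaps are injected.
y = ((df["age"] > 55) & (df["smoker"] == "yes")).astype(int)
df.loc[rng.choice(n, 30, replace=False), "blood_pressure"] = np.nan
df.loc[rng.choice(n, 30, replace=False), "smoker"] = np.nan

# Step 1: hold out a test set before fitting any preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0
)

# Steps 2 and 3: impute, then encode, separately per column type.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age", "blood_pressure"]),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), ["smoker"]),
])

# Steps 4 and 5: tree inside the pipeline, tuned with GridSearchCV.
pipe = Pipeline([("prep", preprocess), ("tree", DecisionTreeClassifier(random_state=0))])
param_grid = {"tree__max_depth": [2, 3, 4, None], "tree__min_samples_split": [2, 10]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="recall")
search.fit(X_train, y_train)

# Step 6: evaluate on the held-out test set.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print("Best Parameters:", search.best_params_)
print(cm)
print(classification_report(y_test, y_pred))
```

Keeping the imputers and the encoder inside the pipeline means they are refitted on each cross-validation training fold, so no information from the validation fold or the test set leaks into preprocessing.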

Business value:
Helps hospitals predict diseases early, prioritize patients, reduce cost and save time by automating initial diagnosis steps.