# 🧠 Decision Tree Case Studies: Answers Notebook
This notebook contains answers and detailed explanations for all 10 case studies using **Entropy**, **Information Gain**, and **Gini Impurity** metrics.
---

## 1️⃣ Loan Approval Prediction: Using Entropy & Information Gain

In [None]:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Dataset
data = {
    'Income': ['High', 'High', 'Low', 'Low'],
    'CreditScore': ['Good', 'Bad', 'Good', 'Bad'],
    'LoanApproved': ['Yes', 'Yes', 'Yes', 'No']
}
df = pd.DataFrame(data)

# Encode categorical features
df_encoded = pd.get_dummies(df[['Income', 'CreditScore']])
y = df['LoanApproved'].map({'Yes': 1, 'No': 0})

# Train Decision Tree using Entropy
model = DecisionTreeClassifier(criterion='entropy', random_state=0)
model.fit(df_encoded, y)

plt.figure(figsize=(6,4))
plot_tree(model, feature_names=list(df_encoded.columns), class_names=['No','Yes'], filled=True)
plt.show()


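**Result:**
- `Income` and `CreditScore` tie for the root: each gives an Information Gain of about 0.311 bits (parent entropy ≈ 0.811), so which one scikit-learn places at the root depends on its tie-breaking for the given `random_state`.
- Rule: the loan is rejected only when Income=Low and CreditScore=Bad. With CreditScore at the root this reads: if CreditScore=Good → Approve Loan (Yes), else check Income.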
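
The information gain of each candidate root can be checked by hand. The following is a minimal sketch using only the standard library; the helper names `entropy` and `information_gain` are illustrative and not part of scikit-learn, and the split is treated as a multiway split on the raw attribute values.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Parent entropy minus the size-weighted entropy of each value's subset."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Same four rows as the loan dataset above
income = ['High', 'High', 'Low', 'Low']
credit = ['Good', 'Bad', 'Good', 'Bad']
approved = ['Yes', 'Yes', 'Yes', 'No']

ig_income = information_gain(income, approved)
ig_credit = information_gain(credit, approved)
print(f"IG(Income)      = {ig_income:.3f}")
print(f"IG(CreditScore) = {ig_credit:.3f}")
```

Both values come out the same on this dataset, which is why the data alone does not single out one attribute as the root.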


## 2️⃣ Student Exam Pass Prediction: Using Gini Impurity

In [None]:

data = {
    'HoursStudied': ['High','Medium','Low','Low','High'],
    'Attendance': ['Good','Good','Poor','Good','Poor'],
    'Passed': ['Yes','Yes','No','No','Yes']
}
df = pd.DataFrame(data)

df_encoded = pd.get_dummies(df[['HoursStudied','Attendance']])
y = df['Passed'].map({'Yes':1,'No':0})

model = DecisionTreeClassifier(criterion='gini', random_state=0)
model.fit(df_encoded, y)

plt.figure(figsize=(6,4))
plot_tree(model, feature_names=list(df_encoded.columns), class_names=['No','Yes'], filled=True)
plt.show()


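**Result:**
- Best Split = **HoursStudied** (weighted Gini 0.0, versus about 0.467 for Attendance; parent Gini 0.48)
- In this dataset every student with High or Medium study hours passes, and every student with Low study hours fails.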
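
The Gini comparison can be reproduced without scikit-learn. This is a small sketch under the assumption of a multiway split on the raw attribute values; `gini` and `weighted_gini` are hypothetical helper names introduced for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def weighted_gini(feature_values, labels):
    """Size-weighted Gini impurity of the subsets produced by a feature."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    return sum(len(s) / total * gini(s) for s in subsets.values())

# Same five rows as the exam dataset above
hours = ['High', 'Medium', 'Low', 'Low', 'High']
attendance = ['Good', 'Good', 'Poor', 'Good', 'Poor']
passed = ['Yes', 'Yes', 'No', 'No', 'Yes']

parent = gini(passed)
g_hours = weighted_gini(hours, passed)
g_attendance = weighted_gini(attendance, passed)
print(f"parent={parent:.3f}  HoursStudied={g_hours:.3f}  Attendance={g_attendance:.3f}")
```

The lower the weighted Gini after the split, the better the feature; `HoursStudied` leaves no impurity at all here.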


## 3️⃣ Employee Attrition: Using Entropy & Information Gain

In [None]:

data = {
    'Age': [25,45,30,50],
    'Overtime': ['Yes','No','Yes','No'],
    'SalaryLevel': ['Low','High','Medium','High'],
    'Attrition': ['Yes','No','Yes','No']
}
df = pd.DataFrame(data)

# Age is left out of the model here; only the two categorical features are encoded
df_encoded = pd.get_dummies(df[['Overtime','SalaryLevel']])
y = df['Attrition'].map({'Yes':1,'No':0})

model = DecisionTreeClassifier(criterion='entropy', random_state=0)
model.fit(df_encoded, y)

plt.figure(figsize=(6,4))
plot_tree(model, feature_names=list(df_encoded.columns), class_names=['No','Yes'], filled=True)
plt.show()


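**Result:**
- **Overtime** separates the classes perfectly (Information Gain = 1 bit), but so does `SalaryLevel` (High vs. not High), so the two tie; the root shown in the fitted tree depends on scikit-learn's tie-breaking.
- In this dataset every employee doing Overtime leaves and every employee not doing Overtime stays.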


## 4️⃣ Customer Churn: Using Entropy & IG

In [None]:

data = {
    'Contract': ['Month-to-Month','One Year','Two Year','Month-to-Month'],
    'MonthlyCharges': ['High','Low','Low','Medium'],
    'Churn': ['Yes','No','No','Yes']
}
df = pd.DataFrame(data)

df_encoded = pd.get_dummies(df[['Contract','MonthlyCharges']])
y = df['Churn'].map({'Yes':1,'No':0})

model = DecisionTreeClassifier(criterion='entropy', random_state=0)
model.fit(df_encoded, y)

plt.figure(figsize=(6,4))
plot_tree(model, feature_names=list(df_encoded.columns), class_names=['No','Yes'], filled=True)
plt.show()


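**Result:**
- **Contract** is a valid root (Month-to-Month → churn in both such rows here), though `MonthlyCharges` (Low vs. not Low) splits this small dataset equally well, so the root shown in the plot may be either.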


## 5️⃣ Weather-based Play Decision: Classic ID3 Example

In [None]:

data = {
    'Outlook': ['Sunny','Sunny','Overcast','Rain','Rain'],
    'Temperature': ['Hot','Hot','Hot','Mild','Cool'],
    'Humidity': ['High','High','High','High','Normal'],
    'Wind': ['Weak','Strong','Weak','Weak','Weak'],
    'Play': ['No','No','Yes','Yes','Yes']
}
df = pd.DataFrame(data)

df_encoded = pd.get_dummies(df[['Outlook','Temperature','Humidity','Wind']])
y = df['Play'].map({'Yes':1,'No':0})

model = DecisionTreeClassifier(criterion='entropy', random_state=0)
model.fit(df_encoded, y)

plt.figure(figsize=(10,6))
plot_tree(model, feature_names=list(df_encoded.columns), class_names=['No','Yes'], filled=True)
plt.show()


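**Result:**
- Best root = **Outlook** (highest Information Gain, about 0.971 bits: Sunny → No, Overcast and Rain → Yes). Because the features are one-hot encoded, the plotted root is the binary column `Outlook_Sunny`.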
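
The code cell above fits scikit-learn's tree, which makes binary splits on the one-hot columns. For comparison, here is a minimal sketch of the ID3 idea itself (multiway splits chosen by information gain) on the same five rows. It is an illustrative toy, not scikit-learn's algorithm; the function names `entropy`, `information_gain` and `id3` are assumptions introduced for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

def id3(rows, attributes, target):
    """Return a nested dict {attribute: {value: subtree_or_label}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # pure node: return the class label
        return labels[0]
    if not attributes:                 # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

columns = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak'],
    'Play': ['No', 'No', 'Yes', 'Yes', 'Yes'],
}
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]

tree = id3(rows, ['Outlook', 'Temperature', 'Humidity', 'Wind'], 'Play')
print(tree)
```

On these rows a single split on `Outlook` already yields pure leaves, so the recursion stops after the root.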


## 6️⃣ Purchase Prediction (E-commerce): Using Gini

**Result:** Best Split = Time on Website (tied with Age on this dataset; each gives a pure split)


## 7️⃣ Disease Diagnosis: Using Entropy

**Result:** Best Feature = Fever


## 8️⃣ Spam Email Classification: Using Entropy

**Result:** Best Feature = Contains_Free


## 9️⃣ Credit Card Fraud Detection: Using Gini

**Result:** Best Feature = Foreign


## 🔟 Car Purchase Decision: Using Information Gain

**Result:** Best Feature = Income (tied with Age Group on this dataset; each gives a pure split)


# 🧠 Decision Tree Case Studies: Answers (Detailed for 6-10)
This notebook provides full code, datasets, and explanations for case studies 6 through 10. Each case includes dataset creation, encoding, model training, visualization, and an explicit determination of the best feature / root split.

## 6️⃣ Purchase Prediction (E-commerce): Gini Impurity
**Goal:** Predict `Purchased` using `Age` and `TimeOnSite` (time on website).

We train a decision tree with the Gini criterion and inspect the feature importances to see which feature it split on.

In [None]:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

data6 = {
    'Age': [23,35,31,22],
    'TimeOnSite': [15,45,35,10],
    'Purchased': ['No','Yes','Yes','No']
}
df6 = pd.DataFrame(data6)
df6


In [None]:

# Prepare features and target
X6 = df6[['Age','TimeOnSite']]
y6 = df6['Purchased'].map({'Yes':1,'No':0})

# Train decision tree with Gini
model6 = DecisionTreeClassifier(criterion='gini', random_state=0)
model6.fit(X6, y6)

# Feature importances and tree plot
print("Feature importances:", dict(zip(X6.columns, model6.feature_importances_)))

plt.figure(figsize=(6,4))
plot_tree(model6, feature_names=list(X6.columns), class_names=['No','Yes'], filled=True)
plt.show()


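**Conclusion (Case 6):**
- The printed importances show which of `Age` or `TimeOnSite` the model split on.
- On this dataset both features separate the classes perfectly (`Age <= 27` or `TimeOnSite <= 25`), so the tree needs only one split and assigns importance 1.0 to whichever feature scikit-learn picks; the data alone does not favor one over the other.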
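
For numeric features, a tree searches for a threshold instead of grouping by category. The sketch below tries the midpoint between each pair of adjacent distinct values and keeps the one with the lowest weighted Gini, which is a simplified version of the usual CART search. `gini` and `best_threshold` are hypothetical helper names, not scikit-learn functions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best `value <= threshold` split."""
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2   # midpoint between adjacent values
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Same four rows as data6 above
age = [23, 35, 31, 22]
time_on_site = [15, 45, 35, 10]
purchased = ['No', 'Yes', 'Yes', 'No']

age_split = best_threshold(age, purchased)
time_split = best_threshold(time_on_site, purchased)
print("Age:", age_split)
print("TimeOnSite:", time_split)
```

Both features reach a weighted Gini of 0 at their best threshold, so neither is preferred by the impurity criterion on these four rows.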

## 7️⃣ Disease Diagnosis: Entropy & Information Gain
**Goal:** Predict `Flu` using `Fever` and `Cough`.

In [None]:

data7 = {
    'Fever': ['Yes','Yes','No','No'],
    'Cough': ['Yes','No','Yes','No'],
    'Flu': ['Yes','Yes','No','No']
}
df7 = pd.DataFrame(data7)
df7


In [None]:

# One-hot encode and train with entropy criterion
X7 = pd.get_dummies(df7[['Fever','Cough']])
y7 = df7['Flu'].map({'Yes':1,'No':0})

model7 = DecisionTreeClassifier(criterion='entropy', random_state=0)
model7.fit(X7, y7)

print("Feature importances:", dict(zip(X7.columns, model7.feature_importances_)))

plt.figure(figsize=(6,4))
plot_tree(model7, feature_names=list(X7.columns), class_names=['No','Yes'], filled=True)
plt.show()


**Conclusion (Case 7):**
- `Fever` is the best root: `Flu` matches `Fever` in every row (Information Gain = 1 bit), while `Cough` gives no gain. The importance may be reported on either `Fever_Yes` or `Fever_No`, since both one-hot columns encode the same split.

## 8️⃣ Spam Email Classification: Entropy & Information Gain
**Goal:** Predict `Spam` using `Contains_Free` and `Contains_Click` (whether the email contains "Free" or "Click").

In [None]:

data8 = {
    'Contains_Free': ['Yes','Yes','No','No'],
    'Contains_Click': ['Yes','No','Yes','No'],
    'Spam': ['Yes','Yes','No','No']
}
df8 = pd.DataFrame(data8)
df8


In [None]:

X8 = pd.get_dummies(df8[['Contains_Free','Contains_Click']])
y8 = df8['Spam'].map({'Yes':1,'No':0})

model8 = DecisionTreeClassifier(criterion='entropy', random_state=0)
model8.fit(X8, y8)

print("Feature importances:", dict(zip(X8.columns, model8.feature_importances_)))

plt.figure(figsize=(6,4))
plot_tree(model8, feature_names=list(X8.columns), class_names=['No','Yes'], filled=True)
plt.show()


**Conclusion (Case 8):**
- `Contains_Free` is the best split: in this synthetic dataset `Spam` matches `Contains_Free` in every row (Information Gain = 1 bit), while `Contains_Click` gives no gain.

## 9️⃣ Credit Card Fraud Detection: Gini Impurity
**Goal:** Predict `Fraud` using `Amount` (High/Low) and `Foreign` (Yes/No).

In [None]:

data9 = {
    'Amount': ['High','High','Low','Low'],
    'Foreign': ['Yes','No','No','Yes'],
    'Fraud': ['Yes','No','No','Yes']
}
df9 = pd.DataFrame(data9)
df9


In [None]:

X9 = pd.get_dummies(df9[['Amount','Foreign']])
y9 = df9['Fraud'].map({'Yes':1,'No':0})

model9 = DecisionTreeClassifier(criterion='gini', random_state=0)
model9.fit(X9, y9)

print("Feature importances:", dict(zip(X9.columns, model9.feature_importances_)))

plt.figure(figsize=(6,4))
plot_tree(model9, feature_names=list(X9.columns), class_names=['No','Yes'], filled=True)
plt.show()


**Conclusion (Case 9):**
- `Foreign` is the root: `Fraud` matches `Foreign` in every row, so the split is pure (Gini 0), while splitting on `Amount` leaves both branches at Gini 0.5. The importance of 1.0 may be reported on either `Foreign_Yes` or `Foreign_No`.

## 🔟 Car Purchase Decision: Entropy & Information Gain
**Goal:** Predict `BuyCar` using `Income` and `AgeGroup`.

In [None]:

data10 = {
    'Income': ['High','Medium','High','Low'],
    'AgeGroup': ['30-40','20-30','40-50','20-30'],
    'BuyCar': ['Yes','No','Yes','No']
}
df10 = pd.DataFrame(data10)
df10


In [None]:

X10 = pd.get_dummies(df10[['Income','AgeGroup']])
y10 = df10['BuyCar'].map({'Yes':1,'No':0})

model10 = DecisionTreeClassifier(criterion='entropy', random_state=0)
model10.fit(X10, y10)

print("Feature importances:", dict(zip(X10.columns, model10.feature_importances_)))

plt.figure(figsize=(8,4))
plot_tree(model10, feature_names=list(X10.columns), class_names=['No','Yes'], filled=True)
plt.show()


**Conclusion (Case 10):**
- Sum the importances of the one-hot columns belonging to each attribute to compare `Income` with `AgeGroup`. On this dataset `Income_High` and `AgeGroup_20-30` both give a pure split, so either attribute can serve as the root, and the one the fitted tree shows depends on scikit-learn's tie-breaking.
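
After one-hot encoding, each candidate split is a single 0/1 column. A rough sketch of scoring every indicator column by information gain on the Case 10 rows follows; `one_hot` and `binary_split_gain` are hypothetical helpers written for this illustration, and the column names simply mimic the `Attribute_Level` pattern produced by `pd.get_dummies`.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def binary_split_gain(flags, labels):
    """Information gain from splitting on one True/False indicator column."""
    left = [l for f, l in zip(flags, labels) if not f]
    right = [l for f, l in zip(flags, labels) if f]
    if not left or not right:          # column is constant: no split possible
        return 0.0
    total = len(labels)
    weighted = len(left) / total * entropy(left) + len(right) / total * entropy(right)
    return entropy(labels) - weighted

def one_hot(name, values):
    """Build indicator columns named like pandas get_dummies (Attribute_Level)."""
    return {f"{name}_{level}": [v == level for v in values]
            for level in sorted(set(values))}

# Same four rows as data10 above
income = ['High', 'Medium', 'High', 'Low']
age_group = ['30-40', '20-30', '40-50', '20-30']
buy_car = ['Yes', 'No', 'Yes', 'No']

indicators = {**one_hot('Income', income), **one_hot('AgeGroup', age_group)}
gains = {col: binary_split_gain(flags, buy_car) for col, flags in indicators.items()}
top = max(gains.values())
tied = [col for col, g in gains.items() if abs(g - top) < 1e-12]

for col, g in gains.items():
    print(f"{col:16s} {g:.3f}")
print("Tied for best:", tied)
```

Two columns from different attributes share the top gain, which illustrates why the root reported by a fitted tree is not unique for this dataset.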

----

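## How to interpret results
- After running each code block, check the printed feature importances to see which features the tree split on.
- An importance is the total impurity reduction a feature contributes across all of its splits, normalized so the importances sum to 1. In trees with a single split, such as cases 6 to 10, the only nonzero importance belongs to the root feature; in deeper trees, read the root from the plot or from `model.tree_.feature[0]`.
- When several features give the same impurity reduction, the root depends on scikit-learn's tie-breaking and does not indicate that one feature is better than the other.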
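
Because `pd.get_dummies` spreads one attribute over several columns, it can help to sum the column importances back onto the original attribute before comparing. The sketch below does this for a plain dictionary of importances; `group_importances` is a hypothetical helper, and the example values are made up for illustration and are not output from any model above.

```python
def group_importances(importances, attributes):
    """Sum one-hot column importances back onto their source attributes.

    A column is assigned to the longest attribute name that equals it or
    prefixes it followed by an underscore (the get_dummies naming pattern).
    """
    totals = {attribute: 0.0 for attribute in attributes}
    for column, value in importances.items():
        matches = [a for a in attributes
                   if column == a or column.startswith(a + "_")]
        if matches:
            totals[max(matches, key=len)] += value
    return totals

# Illustrative values only (assumed, not taken from a fitted model)
example = {'Fever_No': 1.0, 'Fever_Yes': 0.0, 'Cough_No': 0.0, 'Cough_Yes': 0.0}
grouped = group_importances(example, ['Fever', 'Cough'])
print(grouped)
```

With a fitted model, the input dictionary would be built the same way the notebook prints it, for example `dict(zip(X7.columns, model7.feature_importances_))`.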
