[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nepal-College-of-Information-Technology/AI-Data-Science-Worksop-2024/blob/main/Practice%20Yourself/Assignment_6.1_Decision_Trees_and_Random_Forests.ipynb)


# Assignment 6.1: Decision Trees and Random Forests

## Objective:
Build and evaluate a **Decision Tree** and a **Random Forest** classifier to predict whether a bank customer will subscribe to a term deposit based on customer attributes (such as age, job, and marital status).

### Dataset:
You can use the [Bank Marketing Dataset](https://archive.ics.uci.edu/ml/datasets/bank+marketing) or a similar dataset with customer attributes.
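If you download the files from the UCI page, note that `bank.csv` and `bank-full.csv` there are semicolon-delimited with quoted strings, so `read_csv` needs `sep=';'`. A minimal sketch on two hypothetical rows (the values are made up; only the layout mimics the UCI files):

```python
import io
import pandas as pd

# Two hypothetical rows in the UCI layout: ';' separates fields, strings are quoted
raw = io.StringIO(
    '"age";"job";"marital";"y"\n'
    '41;"technician";"married";"no"\n'
    '29;"services";"single";"yes"\n'
)

# sep=';' splits the fields correctly; the quotes are stripped automatically
df_demo = pd.read_csv(raw, sep=';')
print(df_demo.dtypes)
```

For the real file this becomes `pd.read_csv('bank-full.csv', sep=';')`; with the default comma separator every row would land in a single column.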

---

## Tasks:

### Task 1: Load and Preprocess the Dataset

1. Load the dataset into a Pandas DataFrame.
2. Handle any missing data if present.
3. Convert categorical variables to numerical values using one-hot encoding.
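A quick way to see what steps 2 and 3 do is to run them on a tiny hypothetical DataFrame. The column names mimic the Bank Marketing schema, but the rows are invented for illustration. The sketch counts `"unknown"` placeholders (which the UCI data uses for missing categories instead of NaN) and shows that `drop_first=True` keeps k-1 indicator columns for a feature with k levels.

```python
import pandas as pd

# Hypothetical toy rows; column names mimic the Bank Marketing schema
toy = pd.DataFrame({
    'age': [30, 45, 52, 38],
    'job': ['admin.', 'technician', 'unknown', 'admin.'],
    'marital': ['married', 'single', 'married', 'divorced'],
    'y': ['no', 'yes', 'no', 'no'],
})

# Missing categories appear as the string "unknown", not NaN,
# so isna() alone does not find them
print(toy.isna().sum().sum())      # 0
print((toy == 'unknown').sum())    # "unknown" count per column

# drop_first=True drops the first (alphabetical) level of each categorical column
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
# ['age', 'job_technician', 'job_unknown', 'marital_married', 'marital_single', 'y_yes']
```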



```python
import pandas as pd

# Load the dataset
df = pd.read_csv('your_dataset.csv')  # Replace with the actual dataset path

# Handle missing data (if necessary)
# df.fillna(0, inplace=True)  # Example for handling missing values

# One-hot encode categorical variables
df_encoded = pd.get_dummies(df, drop_first=True)

# Display the first few rows of the preprocessed dataset
df_encoded.head()
```

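```python
import pandas as pd

# Load the dataset (path assumes bank.csv was uploaded to the Colab session).
# The UCI release is semicolon-delimited; if everything lands in one column, add sep=';'.
df = pd.read_csv('/content/bank.csv')

# Fill any NaN values with 0. Note that the UCI data marks missing
# categories with the string "unknown" rather than NaN.
df.fillna(0, inplace=True)

# One-hot encode categorical columns, dropping the first level of each
df_encoded = pd.get_dummies(df, drop_first=True)

print(df_encoded.head())
```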


### Task 2: Split the Dataset

Split the dataset into training and testing sets using an 80-20 ratio.
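Because the Bank Marketing target is imbalanced (far more customers answer "no" than "yes"), it usually helps to pass `stratify` so both splits keep the same class ratio. A sketch on synthetic labels (100 zeros and 20 ones, chosen only for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 100 "no" (0) and 20 "yes" (1)
X_demo = np.arange(120).reshape(-1, 1)
y_demo = np.array([0] * 100 + [1] * 20)

# test_size=0.2 gives the 80-20 split; stratify=y_demo preserves the class ratio
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

print(len(X_tr), len(X_te))       # 96 24
print(y_tr.sum(), y_te.sum())     # 16 4, i.e. 1/6 positives in each split
```

Without `stratify`, a random split of rare positives can leave noticeably more or fewer of them in the test set, which makes test metrics less comparable across runs.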

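```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt

# Separate features from the target. Assumption: the UCI target column 'y'
# becomes 'y_yes' after get_dummies (use 'deposit_yes' for the Kaggle bank.csv variant).
target_col = 'y_yes'
X = df_encoded.drop(columns=[target_col])
y = df_encoded[target_col]

# 80-20 split; stratify=y keeps the yes/no ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```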


### Task 3: Train a Decision Tree Classifier

1. Train a Decision Tree Classifier on the training data.
2. Visualize the trained decision tree.
3. Make predictions on the test data.

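```python
# Train the tree; random_state makes the chosen splits reproducible
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict on the held-out test set
y_pred_dt = clf.predict(X_test)
```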
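To visualize a tree, scikit-learn provides `plot_tree` (a matplotlib drawing) and `export_text` (a plain-text rule listing). An unrestricted tree on the one-hot-encoded data can grow very deep, so limiting the depth that is drawn keeps the plot readable. The sketch below uses synthetic data from `make_classification` as a stand-in; on the real data you would plot `clf` and pass `feature_names=list(X.columns)` instead of the hypothetical `feature_i` names.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree, export_text

# Synthetic stand-in for X_train / y_train (assumption: swap in the real split)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
feature_names = [f'feature_{i}' for i in range(X_demo.shape[1])]

demo_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_demo, y_demo)

# Graphical view: max_depth here limits only how many levels are drawn
plt.figure(figsize=(14, 6))
plot_tree(demo_tree, feature_names=feature_names, class_names=['no', 'yes'],
          filled=True, max_depth=2)
plt.show()

# Text view: useful when the tree has too many nodes to plot clearly
rules = export_text(demo_tree, feature_names=feature_names)
print(rules)
```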

### Task 4: Train a Random Forest Classifier

1. Train a Random Forest Classifier on the training data.
2. Compare the performance of the Random Forest with the Decision Tree.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Train a random forest with a fixed seed for reproducibility
rf_classifier = RandomForestClassifier(random_state=42)
rf_classifier.fit(X_train, y_train)

# Evaluate on the test set
y_pred_rf = rf_classifier.predict(X_test)
rf_accuracy = accuracy_score(y_test, y_pred_rf)
print("Random Forest Accuracy:", rf_accuracy)
print("Random Forest Classification Report:\n", classification_report(y_test, y_pred_rf))
```
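A random forest fits many trees, each on a bootstrap sample of the rows (`bootstrap=True` by default) and considering a random subset of features at each split, then averages their predicted probabilities; this typically lowers variance compared with a single tree. Since one 80-20 split can be noisy, a cross-validated comparison gives a steadier picture. The sketch below uses an imbalanced synthetic dataset as a stand-in for `X` and `y` (an assumption; substitute the real features and target).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in for the bank data (assumption: replace with X, y)
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     weights=[0.88, 0.12], random_state=0)

models = {
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold (stratified for classifiers) cross-validated F1 of the positive class
cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='f1')
    cv_results[name] = scores
    print(f'{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})')
```

The standard deviation across folds shows how much of any gap between the two models could be split-to-split noise.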


### Task 5: Evaluate the Models

Calculate and compare the accuracy, precision, recall, and F1-score for both the Decision Tree and Random Forest models.
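As a worked example of these metrics, the sketch below scores ten hypothetical labels. For the positive class, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean 2PR / (P + R). It also shows why `average='weighted'` can look better than the positive-class score on imbalanced data: per-class scores are weighted by class support, so the majority class dominates.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: 1 = subscribed ("yes"), 0 = did not ("no")
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]

# Positive class: TP = 2, FP = 1, FN = 2
acc = accuracy_score(y_true, y_pred)      # 7/10 = 0.7
p_bin = precision_score(y_true, y_pred)   # 2/3 (default average='binary')
r_bin = recall_score(y_true, y_pred)      # 2/4 = 0.5
f1_bin = f1_score(y_true, y_pred)         # 2*(2/3)*0.5 / (2/3 + 0.5) = 4/7

# Class 0 has F1 = 10/13 with support 6; class 1 has F1 = 4/7 with support 4
f1_weighted = f1_score(y_true, y_pred, average='weighted')  # (6*10/13 + 4*4/7)/10
print(acc, p_bin, r_bin, f1_bin, f1_weighted)
```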

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Decision tree: train, predict, score
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
dt_predictions = dt_model.predict(X_test)

# 'weighted' averages per-class scores by class support; the default
# average='binary' would score only the positive ("yes") class
dt_accuracy = accuracy_score(y_test, dt_predictions)
dt_precision = precision_score(y_test, dt_predictions, average='weighted')
dt_recall = recall_score(y_test, dt_predictions, average='weighted')
dt_f1 = f1_score(y_test, dt_predictions, average='weighted')

# Random forest: same steps
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)
rf_predictions = rf_model.predict(X_test)

rf_accuracy = accuracy_score(y_test, rf_predictions)
rf_precision = precision_score(y_test, rf_predictions, average='weighted')
rf_recall = recall_score(y_test, rf_predictions, average='weighted')
rf_f1 = f1_score(y_test, rf_predictions, average='weighted')

print(f"Decision Tree - Accuracy: {dt_accuracy:.3f}, Precision: {dt_precision:.3f}, Recall: {dt_recall:.3f}, F1-score: {dt_f1:.3f}")
print(f"Random Forest - Accuracy: {rf_accuracy:.3f}, Precision: {rf_precision:.3f}, Recall: {rf_recall:.3f}, F1-score: {rf_f1:.3f}")
```


### Task 6: Confusion Matrix

1. Generate the confusion matrix for both models.
2. Visualize the confusion matrix using a heatmap.
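In scikit-learn's `confusion_matrix`, rows are the actual classes and columns the predicted classes, both in sorted label order. For binary labels, `ravel()` unpacks the four cells as TN, FP, FN, TP, and `normalize='true'` divides each row by its total, which makes heatmaps of imbalanced data easier to read. A sketch with hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = "yes", 0 = "no"
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]

# Rows = actual (0, 1), columns = predicted (0, 1)
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[5 1]
#  [2 2]]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()

# Row-normalized: each row sums to 1, so the diagonal holds per-class recall
cm_norm = confusion_matrix(y_true, y_pred, normalize='true')
```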

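```python
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Confusion matrices for both models on the same test set
model_predictions = {'Decision Tree': dt_predictions, 'Random Forest': rf_predictions}

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, (name, preds) in zip(axes, model_predictions.items()):
    cm = confusion_matrix(y_test, preds)
    # Binary target: label order is (no, yes), so 2x2 tick labels match the matrix
    sns.heatmap(cm, annot=True, fmt='d', cmap='Reds', ax=ax,
                xticklabels=['Predicted No', 'Predicted Yes'],
                yticklabels=['Actual No', 'Actual Yes'])
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
    ax.set_title(f'{name} Confusion Matrix')
plt.tight_layout()
plt.show()
```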

### Conclusion:

In this assignment, you:

- Loaded and preprocessed the dataset.
- Trained a Decision Tree and a Random Forest classifier.
- Compared their performance using accuracy, precision, recall, and F1-score.
- Visualized their performance using confusion matrices.

Which model performed better, and why?

(write your answer here....)

---