
# Supervised Learning Models and Evaluation Techniques


This notebook covers supervised learning with a focus on **classification**. We will train three scikit-learn classifiers:
1. **Decision Trees**
2. **Random Forest**
3. **K-Nearest Neighbors (KNN)**

We will also evaluate the models with **Holdout Validation** and **K-Fold Cross-Validation**. The holdout evaluation reports **accuracy**, **precision**, **recall**, and **F1-score**; the cross-validation step reports mean accuracy.
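
For reference, these are the standard binary-classification definitions of the four metrics. The symbols $TP$, $TN$, $FP$, and $FN$ (true/false positives and negatives) are notation introduced here and do not appear elsewhere in the notebook:

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
$$

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found. The F1-score is their harmonic mean.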


## Step 1: Import Libraries

In [None]:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report


## Step 2: Create a Synthetic Dataset

In [None]:

# Generate a synthetic binary classification dataset (200 samples, 4 features)
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=42)

# Holdout method: keep 80% of the samples for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
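
The split above does not pass `stratify`, so the class proportions in the training and test sets can differ slightly. A minimal sketch for inspecting the split follows. It assumes the same dataset parameters; the `stratify=y` argument and the `_s` variable names are additions for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=42)

# Assumption: stratify=y is not used in the original split; it keeps the
# class proportions roughly equal in the training and test subsets.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train shape:", X_train_s.shape, "Test shape:", X_test_s.shape)
print("Class counts (train):", np.bincount(y_train_s))
print("Class counts (test):", np.bincount(y_test_s))
```

With only 40 test samples, a few misclassifications shift every metric noticeably, which is one reason the cross-validation step later in the notebook is useful.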


## Step 3: Decision Tree Classifier

In [None]:

# Initialize and train the Decision Tree Classifier
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)

# Predict on the held-out test set
y_pred_tree = tree_clf.predict(X_test)

# Holdout metrics (precision, recall, and F1 treat class 1 as positive by default)
print("Decision Tree Classifier Results")
print("Accuracy:", accuracy_score(y_test, y_pred_tree))
print("Precision:", precision_score(y_test, y_pred_tree))
print("Recall:", recall_score(y_test, y_pred_tree))
print("F1 Score:", f1_score(y_test, y_pred_tree))
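
The four scores above can also be read off a confusion matrix, and `classification_report` (imported in Step 1) summarizes them per class. The following is a small self-contained sketch that rebuilds the same data and decision tree; the variable name `cm` is introduced here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Recreate the dataset and holdout split used in the notebook
X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

tree_clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred_tree = tree_clf.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred_tree)
print(cm)

# Per-class precision, recall, F1, and support in one table
print(classification_report(y_test, y_pred_tree))
```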


## Step 4: Random Forest Classifier

In [None]:

# Initialize and train the Random Forest Classifier
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train, y_train)

# Predict on the held-out test set
y_pred_rf = rf_clf.predict(X_test)

# Holdout metrics (precision, recall, and F1 treat class 1 as positive by default)
print("Random Forest Classifier Results")
print("Accuracy:", accuracy_score(y_test, y_pred_rf))
print("Precision:", precision_score(y_test, y_pred_rf))
print("Recall:", recall_score(y_test, y_pred_rf))
print("F1 Score:", f1_score(y_test, y_pred_rf))


## Step 5: K-Nearest Neighbors Classifier

In [None]:

# Initialize and train the K-Nearest Neighbors Classifier
knn_clf = KNeighborsClassifier(n_neighbors=5)
knn_clf.fit(X_train, y_train)

# Predict on the held-out test set
y_pred_knn = knn_clf.predict(X_test)

# Holdout metrics (precision, recall, and F1 treat class 1 as positive by default)
print("K-Nearest Neighbors Classifier Results")
print("Accuracy:", accuracy_score(y_test, y_pred_knn))
print("Precision:", precision_score(y_test, y_pred_knn))
print("Recall:", recall_score(y_test, y_pred_knn))
print("F1 Score:", f1_score(y_test, y_pred_knn))
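
Steps 3 to 5 repeat the same fit, predict, and score pattern. One possible way to reduce the repetition is a small helper; `evaluate_classifier` is a hypothetical name introduced here, not a scikit-learn function, and the sketch assumes the same dataset and split as above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def evaluate_classifier(model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit `model` and return its holdout metrics as a dict."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }


X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Evaluate every model on the same holdout split
results = {
    name: evaluate_classifier(model, X_train, X_test, y_train, y_test)
    for name, model in models.items()
}

for name, metrics in results.items():
    summary = ", ".join(f"{key}={value:.2f}" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

Printing the three models side by side makes the comparison easier than scrolling between cells.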


## Step 6: Model Evaluation with K-Fold Cross-Validation

In [None]:

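# K-Fold Cross-Validation (5 folds, shuffled with a fixed seed)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
models = {"Decision Tree": tree_clf, "Random Forest": rf_clf, "KNN": knn_clf}

# cross_val_score clones each estimator and refits it on every fold,
# so the fits from the holdout split above are not reused
for model_name, model in models.items():
    accuracy_scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')
    print(f"{model_name} - Average Accuracy: {np.mean(accuracy_scores):.2f} (std: {np.std(accuracy_scores):.2f})")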
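
The loop above scores accuracy only. To cross-validate precision, recall, and F1 as well, scikit-learn's `cross_validate` accepts a list of scorer names. The following is a minimal sketch for the decision tree alone, assuming the same data and fold settings; the names `scoring` and `cv_results` are introduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=42)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Built-in scorer names; results are returned under the keys "test_<name>"
scoring = ["accuracy", "precision", "recall", "f1"]
cv_results = cross_validate(
    DecisionTreeClassifier(random_state=42), X, y, cv=kf, scoring=scoring
)

for metric in scoring:
    scores = cv_results[f"test_{metric}"]
    print(f"{metric}: mean={scores.mean():.2f}, std={scores.std():.2f}")
```

Reporting the standard deviation next to the mean shows how much each score varies from fold to fold.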


## Exercises


**Exercise 1**: Try different values of `n_neighbors` in KNN and observe how it affects performance.

**Exercise 2**: Experiment with different numbers of trees (e.g., 50, 100, 200) in the Random Forest model and see the impact on accuracy.

**Exercise 3**: Modify the `max_depth` parameter in Decision Tree to control tree depth and observe its effect on overfitting or underfitting.
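
As a possible starting point for the exercises, the sketch below sweeps `max_depth` for the decision tree and compares training and test accuracy. `sweep_parameter` is a hypothetical helper introduced here, and the depth values are arbitrary choices. The same pattern should carry over to `n_neighbors` and `n_estimators` by changing the factory function.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def sweep_parameter(make_model, values, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit one model per value, return (value, train_acc, test_acc)."""
    rows = []
    for value in values:
        model = make_model(value).fit(X_train, y_train)
        rows.append((value, model.score(X_train, y_train), model.score(X_test, y_test)))
    return rows


X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Assumed depth values for illustration; None lets the tree grow without a depth limit
depths = [1, 2, 3, 5, None]
rows = sweep_parameter(
    lambda depth: DecisionTreeClassifier(max_depth=depth, random_state=42),
    depths, X_train, X_test, y_train, y_test,
)

for depth, train_acc, test_acc in rows:
    print(f"max_depth={depth!s:>4}  train={train_acc:.2f}  test={test_acc:.2f}")
```

A large gap between training and test accuracy suggests overfitting, while low accuracy on both suggests underfitting.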
