# Decision Tree Classifier with the Iris Dataset

## Overview
This code demonstrates how to use a **Decision Tree Classifier** from `scikit-learn` to classify flowers in the **Iris dataset**.  
The Iris dataset is a well-known dataset in machine learning that contains measurements of iris flowers (features) and their species (labels).  

The goal: **Train a Decision Tree to predict the species of iris flowers** based on their measurements.

---

## Steps in the Code

### 1. Import Libraries
- **pandas, numpy** → general-purpose data-handling libraries; they are imported here but not actually used in this example.  
- **scikit-learn (sklearn)** → for dataset loading, splitting, training, and evaluation.  

### 2. Load the Dataset
- `datasets.load_iris()` loads the Iris dataset.  
- `X = iris.data` → the features (four measurements per flower: sepal length, sepal width, petal length, petal width).  
- `y = iris.target` → the labels, encoded as the integers 0, 1, 2 for the species Setosa, Versicolor, Virginica.  
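
To make the loading step concrete, the short sketch below inspects what `load_iris()` returns. It only uses attributes of the returned object (`data`, `target`, `feature_names`, `target_names`); the `print` calls are there purely for illustration.

```python
from sklearn import datasets

# Load the dataset and separate features from labels
iris = datasets.load_iris()
X = iris.data      # 2-D array: one row per flower, one column per measurement
y = iris.target    # 1-D array of integer class labels

print(X.shape)             # number of samples and number of features
print(iris.feature_names)  # names of the four measurement columns
print(iris.target_names)   # species names, indexed by the integer label
```

The integer labels in `y` index into `iris.target_names`, so label `0` corresponds to `setosa`.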

### 3. Split the Data
- `train_test_split()` splits the dataset into:
  - **80% training data** → used to train the model.  
  - **20% testing data** → used to check accuracy.  
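
As a quick check of the 80/20 split, the sketch below prints the resulting array shapes. It uses `test_size=0.2` and `random_state=42`, the same values as the code cell further down; the variable names are just for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Hold out 20% of the 150 samples for testing; fixing random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # training rows vs. test rows
```

With 150 samples this leaves 120 rows for training and 30 for testing, which matches the `support` total in the classification report.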

### 4. Train the Model
- `DecisionTreeClassifier()` creates a Decision Tree model.  
- `.fit(X_train, y_train)` trains it on the training data.  
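
To see what "training" produces, a fitted tree can be printed as a set of if/else rules. The sketch below is an illustration only: it limits the tree with `max_depth=2` (an assumption made here to keep the output short; the notebook's model has no depth limit) and fits on the full dataset rather than the training split.

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()

# A deliberately shallow tree so the printed rules stay readable (max_depth=2 is illustrative)
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# export_text renders the learned splits as nested threshold rules
rules = export_text(clf, feature_names=iris.feature_names)
print(rules)
```

Each printed line is a threshold test on one measurement; following the tests from the root down to a leaf yields the predicted class.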

### 5. Make Predictions
- `.predict(X_test)` predicts flower species for the test data.  
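
Note that `.predict()` returns integer labels, not species names. A minimal sketch of mapping them back, assuming the same split and model settings as the code cell below (`predicted_species` is a name introduced here for illustration):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

dt_model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = dt_model.predict(X_test)  # array of integers in {0, 1, 2}

# Index the array of species names with the predicted integers
predicted_species = iris.target_names[y_pred]
print(y_pred[:5])
print(predicted_species[:5])
```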

### 6. Evaluate the Model
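- **Accuracy** → Fraction of all test samples that were predicted correctly.  
- **Precision** → Of the samples predicted as a given class, the fraction that truly belong to it.  
- **Recall** → Of the samples that truly belong to a given class, the fraction the model identified.  
- **F1-score** → Harmonic mean of precision and recall.  

Because Iris has three classes, the code computes precision, recall, and F1 per class and combines them with `average='weighted'`, which weights each class by its number of test samples.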
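
The weighted averaging can be reproduced by hand on a tiny example. The labels below are made up purely for illustration (they are not Iris results); the sketch computes per-class scores with `average=None` and then weights them by class support.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical true and predicted labels for six samples across three classes
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])

# Per-class scores (one value per class)
per_class_precision = precision_score(y_true, y_pred, average=None)
per_class_recall = recall_score(y_true, y_pred, average=None)

# Support = number of true samples in each class
support = np.bincount(y_true)

# Weighted average = per-class score weighted by support
manual_precision = np.average(per_class_precision, weights=support)
manual_recall = np.average(per_class_recall, weights=support)

print(manual_precision, precision_score(y_true, y_pred, average="weighted"))
print(manual_recall, accuracy_score(y_true, y_pred))
```

The second line illustrates a useful identity: support-weighted recall always equals overall accuracy, which is why the two values match in the notebook output.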

### 7. Print Results
The performance metrics are displayed to show how well the model performed.  

---

## Summary
This example shows a complete workflow for **classification using Decision Trees**:
- Load dataset  
- Split data  
- Train model  
- Make predictions  
- Evaluate performance  

The Iris dataset is a simple but powerful starting point for learning classification in machine learning.


In [1]:
# Import libraries
import pandas as pd  # imported for convenience; not used below
import numpy as np   # imported for convenience; not used below

# scikit-learn: dataset, splitting, metrics, and the model
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn import tree  # not used below
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data # Features
y = iris.target # Labels

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Decision Tree model
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

# Predict the labels for test data
y_pred = dt_model.predict(X_test)

# Compute performance metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print performance metrics
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)

Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-score: 1.0


In [2]:
# Detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))


Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



# k-Nearest Neighbors (kNN) Classifier

## Overview
This code demonstrates the **k-Nearest Neighbors (kNN)** algorithm for **binary classification**.  
Instead of using a real dataset, we generate a **synthetic dataset** with `make_classification()` from scikit-learn.  

The goal: Train a kNN classifier to correctly classify samples into two classes (0 or 1).

---

## Steps in the Code

### 1. Import Libraries
- `make_classification` → generates a synthetic dataset.  
- `train_test_split` → splits data into train/test sets.  
- `KNeighborsClassifier` → the kNN algorithm.  
- `accuracy_score`, `precision_score`, `recall_score`, `f1_score` → evaluation metrics.  

### 2. Generate Dataset
- `make_classification(n_samples=1000, n_features=20, n_classes=2)`  
  - Creates **1000 samples**.  
  - Each sample has **20 features**.  
  - There are **2 possible classes** (binary classification).  
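
A quick way to confirm what `make_classification()` generated is to check the array shape and the class counts. The sketch below uses `random_state=42` as in the code cell further down; `class_counts` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

print(X.shape)  # (samples, features)

# Count how many samples fall into class 0 and class 1
class_counts = np.bincount(y)
print(class_counts)
```

With the default settings, only 2 of the 20 features are informative and 2 more are linear combinations of them; the rest are noise. That is worth keeping in mind when reading the kNN scores, since distance-based methods are sensitive to uninformative features.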

### 3. Split the Data
- `train_test_split()` splits into:
  - **80% training data** (to train the model).  
  - **20% testing data** (to evaluate the model).  

### 4. Train the kNN Model
- `KNeighborsClassifier(n_neighbors=5)` → uses the **5 nearest neighbors** to classify a sample.  
- `.fit(X_train, y_train)` trains the model.  
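
The idea behind the classifier fits in a few lines of NumPy. The function below is a simplified sketch, not scikit-learn's implementation: `knn_predict` and the toy points are hypothetical, and it assumes Euclidean distance with an unweighted majority vote, which mirrors the `KNeighborsClassifier` defaults.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: three points near the origin (class 0), three near (1, 1) (class 1)
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.15, 0.15]), k=3))  # 0
print(knn_predict(X_toy, y_toy, np.array([0.95, 1.00]), k=3))  # 1
```

"Training" a kNN model amounts to storing the training data; all the distance computation happens at prediction time.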

### 5. Make Predictions
- `.predict(X_test)` predicts the class labels for test data.  

### 6. Evaluate the Model
The following performance metrics are calculated:
- **Accuracy** → Overall correctness of the model.  
- **Precision** → Of the predicted positives, how many were correct.  
- **Recall** → Of the actual positives, how many were correctly identified.  
- **F1-score** → Balance between precision and recall.  
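
For the binary case, all four metrics follow directly from the confusion-matrix counts. The sketch below uses ten made-up labels (not the notebook's data) to show the formulas, then checks that the F1-score printed by the kNN cell is consistent with its printed precision and recall.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # correct positives / predicted positives
recall = tp / (tp + fn)      # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, f1)

# Consistency check on the values printed by the kNN cell
f1_reported = 2 * 0.8791 * 0.7477 / (0.8791 + 0.7477)
print(round(f1_reported, 4))  # 0.8081
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values, which penalizes a model that trades one metric heavily for the other.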

### 7. Print Results
The performance metrics are displayed with 4 decimal places for readability.  

---

## Summary
This example shows how to:  
1. Generate a synthetic dataset.  
2. Train a **k-Nearest Neighbors (kNN)** model.  
3. Evaluate it using **common classification metrics**.  

kNN is a simple yet powerful algorithm that classifies a new data point based on the majority class of its **nearest neighbors**.


In [3]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Generate synthetic dataset for binary classification
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train kNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict labels for test data
y_pred = knn.predict(X_test)

# Calculate performance metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Output performance metrics
print("Performance Metrics:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")

Performance Metrics:
Accuracy: 0.8100
Precision: 0.8791
Recall: 0.7477
F1-score: 0.8081


# Support Vector Machine (SVM) with ROC Curve

## Overview
This code shows how to train a **Support Vector Machine (SVM)** classifier for binary classification, and then evaluate its performance using the **ROC curve** and **AUC (Area Under the Curve)**.  

The ROC curve helps visualize the **trade-off between True Positive Rate and False Positive Rate**, while the AUC gives a single number to summarize performance.

---

## Steps in the Code

### 1. Import Libraries
- **NumPy** → numerical operations.  
- **Matplotlib** → plotting.  
- **scikit-learn** → dataset generation, SVM model, splitting, and metrics.  

### 2. Generate Dataset
- `make_classification()` creates a **synthetic binary dataset** with:  
  - **1000 samples**  
  - **20 features**  
  - **2 classes**  

### 3. Split the Data
- `train_test_split()` → splits data into **80% training** and **20% testing**.  

### 4. Train the SVM Model
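- `SVC(probability=True)` → creates an SVM classifier with probability estimates enabled, which `.predict_proba()` requires. (A ROC curve can also be built from `.decision_function()` scores, so probabilities are one option rather than a strict requirement.)  
- `.fit(X_train, y_train)` → trains the model.  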

### 5. Predict Probabilities
- `.predict_proba(X_test)[:, 1]` → gets the probability of class **1** (positive class).  

### 6. Compute ROC & AUC
- `roc_curve(y_test, y_prob)` → calculates **False Positive Rate (FPR)**, **True Positive Rate (TPR)**, and thresholds.  
- `auc(fpr, tpr)` → computes the **Area Under the Curve (AUC)**.  
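
Steps 2 through 6 can be sketched end to end as follows. This is a reconstruction under stated assumptions rather than the original cell: `random_state=42` (for the data, the split, and the model) mirrors the earlier examples, the variable name `svm_model` is chosen here, and `SVC` is left at its default RBF kernel.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary dataset: 1000 samples, 20 features (random_state is an assumption)
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# 80% train / 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# probability=True enables predict_proba on the fitted model
svm_model = SVC(probability=True, random_state=42)
svm_model.fit(X_train, y_train)

# Column 1 holds the estimated probability of the positive class
y_prob = svm_model.predict_proba(X_test)[:, 1]

# One (FPR, TPR) point per candidate threshold, then the area under those points
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

print(f"AUC: {roc_auc:.4f}")
```

`roc_auc_score(y_test, y_prob)` from `sklearn.metrics` computes the same area in one call if the curve points themselves are not needed.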

### 7. Plot ROC Curve
- X-axis → **False Positive Rate (FPR)**  
- Y-axis → **True Positive Rate (TPR)**  
- The diagonal line from (0, 0) to (1, 1) represents random guessing.  
- The closer the ROC curve is to the **top-left corner**, the better the classifier.  
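
A minimal plotting sketch is shown below. To stay short and self-contained it uses four made-up scores instead of the SVM output; in practice the `fpr`, `tpr`, and `roc_auc` values from the fitted model would be plotted the same way. The labels, title, and legend text are illustrative choices.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# Hypothetical true labels and classifier scores, for illustration only
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, _ = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

fig, ax = plt.subplots()
# ROC curve of the classifier
ax.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.2f})")
# Diagonal reference line: the expected curve for random guessing
ax.plot([0, 1], [0, 1], linestyle="--", label="Random guessing")
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("ROC Curve")
ax.legend(loc="lower right")
```

In a notebook the figure renders when the cell finishes; in a plain script, add `plt.show()` at the end.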

---

## Summary
- Trained an **SVM classifier** on synthetic binary data.  
- Evaluated performance using **ROC curve** and **AUC score**.  
- **ROC Curve** → shows the trade-off between sensitivity (TPR) and fall-out (FPR) as the decision threshold varies.  
- **AUC** → a single score where **1.0 = perfect classifier** and **0.5 = random guessing**.
