Title: Binary vs. Multi-Class Classification<br>

Task 1:<br>
Binary Classification: Predict whether a website visitor will click a button (Click or No Click).<br>
Use a web visitor interaction dataset (the code below uses a small mock dataset as a stand-in).<br>
Task: Implement binary classification for click prediction.

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Mock dataset (replace with your real data loading)
data = {
    'time_on_site': [30, 45, 10, 120, 60, 5, 15, 40, 70, 90],
    'pages_visited': [3, 5, 1, 8, 6, 1, 2, 4, 7, 6],
    'referrer_google': ['yes', 'yes', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes'],
    'clicked': [0, 1, 0, 1, 1, 0, 0, 1, 1, 1]
}
df = pd.DataFrame(data)

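# Separate the features from the target column
X = df.drop(columns='clicked')

# One-hot encode categorical features; referrer_google becomes
# referrer_google_no / referrer_google_yes
X = pd.get_dummies(X)

# Target
y = df['clicked']

# Impute missing values if any are present (the mock data has none)
if X.isnull().values.any():
    X = X.fillna(0)  # or apply another imputation strategy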

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train logistic regression classifier
clf = LogisticRegression(random_state=42)
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=["No Click", "Click"]))

Confusion Matrix:
[[1 0]
 [0 1]]

Classification Report:
              precision    recall  f1-score   support

    No Click       1.00      1.00      1.00         1
       Click       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
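
The perfect scores above come from a test set of only two rows, so they say little about how the model generalizes. One common way to get a steadier estimate on small data is stratified k-fold cross-validation. The following is a minimal sketch, not part of the original notebook: it reuses the mock values, encodes `referrer_google` by hand as 1/0, and assumes `n_splits=4` because the minority class has only four rows.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same mock values as the cell above: time_on_site, pages_visited,
# and referrer_google encoded as 1 = 'yes', 0 = 'no' (an assumption of this sketch).
X = np.array([
    [30, 3, 1], [45, 5, 1], [10, 1, 0], [120, 8, 1], [60, 6, 1],
    [5, 1, 0], [15, 2, 0], [40, 4, 1], [70, 7, 1], [90, 6, 1],
], dtype=float)
y = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])

# Putting the scaler in a pipeline refits it on the training part of each fold,
# so statistics from the held-out fold never leak into training.
model = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))

# n_splits=4 is an assumption: the "No Click" class has only 4 rows.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```

With ten rows the per-fold scores will still be coarse, so this is best read as a template to rerun once real visitor data replaces the mock dataset.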



Task 2:<br>
Multi-Class Classification: Recognize handwritten digits (0-9).<br>
Use the MNIST dataset.<br>
Task: Develop a model that correctly classifies each handwritten digit.


In [3]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load MNIST data
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist["data"], mnist["target"]

# Convert target to integers
y = y.astype(int)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

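# Scale features (tree-based models such as Random Forest do not require this;
# the step is kept for consistency with Task 1)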
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Predictions
y_pred = clf.predict(X_test)

# Evaluation
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Confusion Matrix:
[[1371    1    0    0    1    0    2    0    5    1]
 [   0 1548    9    7    1    1    2    5    1    1]
 [   8    1 1350    3    6    1    7   14    7    1]
 [   3    1   14 1367    0   15    2   12   12    2]
 [   4    3    1    0 1312    0   10    0    4   31]
 [   5    1    1   17    3 1208   11    2    7    8]
 [   9    2    1    0    3    4 1350    0    6    0]
 [   1    5   12    0    9    0    0 1414    3   15]
 [   3    3    8    7    3   11    3    0 1308   19]
 [   8    7    1   16   18    4    1   12   11 1313]]

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.99      0.98      1381
           1       0.98      0.98      0.98      1575
           2       0.97      0.97      0.97      1398
           3       0.96      0.96      0.96      1428
           4       0.97      0.96      0.96      1365
           5       0.97      0.96      0.96      1263
           6       0.97      0.98      0.98     
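
The classification report above is cut off after class 6, but the confusion matrix is complete, so the overall accuracy, per-digit recall, and the most frequent mistakes can be recomputed from it. The sketch below is not part of the original notebook; it only copies the printed matrix, uses plain Python, and its variable names are illustrative.

```python
# Confusion matrix copied from the output above
# (rows = true digit, columns = predicted digit).
cm = [
    [1371, 1, 0, 0, 1, 0, 2, 0, 5, 1],
    [0, 1548, 9, 7, 1, 1, 2, 5, 1, 1],
    [8, 1, 1350, 3, 6, 1, 7, 14, 7, 1],
    [3, 1, 14, 1367, 0, 15, 2, 12, 12, 2],
    [4, 3, 1, 0, 1312, 0, 10, 0, 4, 31],
    [5, 1, 1, 17, 3, 1208, 11, 2, 7, 8],
    [9, 2, 1, 0, 3, 4, 1350, 0, 6, 0],
    [1, 5, 12, 0, 9, 0, 0, 1414, 3, 15],
    [3, 3, 8, 7, 3, 11, 3, 0, 1308, 19],
    [8, 7, 1, 16, 18, 4, 1, 12, 11, 1313],
]

n = len(cm)
total = sum(map(sum, cm))                  # number of test samples
correct = sum(cm[i][i] for i in range(n))  # diagonal = correct predictions
accuracy = correct / total

# Per-digit recall: diagonal entry divided by its row sum (the digit's support).
recall = [cm[i][i] / sum(cm[i]) for i in range(n)]

# Off-diagonal cells ranked by count: (count, true digit, predicted digit).
confusions = sorted(
    ((cm[i][j], i, j) for i in range(n) for j in range(n) if i != j),
    reverse=True,
)

print(f"accuracy = {correct}/{total} = {accuracy:.4f}")
for digit, r in enumerate(recall):
    print(f"digit {digit}: recall = {r:.2f}")
for count, true_digit, pred_digit in confusions[:3]:
    print(f"true {true_digit} predicted as {pred_digit}: {count} times")
```

On this matrix, 13541 of 14000 test digits are classified correctly (about 0.967), and the single most common error is a true 4 predicted as 9 (31 cases).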

Task 3:<br>
Multi-Class Classification: Classify a flower species based on petal and sepal measurements.<br>
Use the Iris dataset.<br>
Task: Use features to classify into three species: Setosa, Versicolor, or Virginica.


In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load Iris dataset
iris = load_iris()
X = iris.data  # features: sepal length, sepal width, petal length, petal width
y = iris.target  # target: 0=Setosa, 1=Versicolor, 2=Virginica
target_names = iris.target_names

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Train Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Evaluation
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=target_names))

Confusion Matrix:
[[15  0  0]
 [ 0 12  3]
 [ 0  0 15]]

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.80      0.89        15
   virginica       0.83      1.00      0.91        15

    accuracy                           0.93        45
   macro avg       0.94      0.93      0.93        45
weighted avg       0.94      0.93      0.93        45
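
To see where the numbers in the report come from, the per-class metrics can be derived by hand from the confusion matrix: precision divides the diagonal entry by its column sum, recall divides it by its row sum, and F1 is their harmonic mean. The sketch below is an illustration in plain Python, not part of the original notebook; the helper name `per_class_metrics` is hypothetical.

```python
# Confusion matrix copied from the output above
# (rows = true class, columns = predicted class).
labels = ["setosa", "versicolor", "virginica"]
cm = [
    [15, 0, 0],
    [0, 12, 3],
    [0, 0, 15],
]

def per_class_metrics(cm, k):
    """Return (precision, recall, f1) for class index k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
    actual_k = sum(cm[k])                    # row sum: everything that truly is k
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

metrics = {name: per_class_metrics(cm, k) for k, name in enumerate(labels)}
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))

for name, (p, r, f) in metrics.items():
    print(f"{name:>10}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The three versicolor flowers predicted as virginica explain both imperfect figures: versicolor recall is 12/15 = 0.80, and virginica precision is 15/18, which rounds to 0.83.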

