# Classical ML: Hands-on Laboratory

In this notebook, we will work through the fundamentals of scikit-learn (sklearn). We will cover:
1.  **Regression**: Predicting continuous numbers (House Prices).
2.  **Classification**: Predicting a binary health outcome (Breast Cancer).
3.  **Clustering**: Grouping data without labels (K-Means).

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, accuracy_score, confusion_matrix, classification_report
from sklearn.cluster import KMeans
from sklearn.datasets import fetch_california_housing, load_breast_cancer, make_blobs

# Set style
sns.set(style="whitegrid")

## Part 1: Linear Regression (Predicting Prices)

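We will use the California Housing dataset (20,640 census block groups, 8 numeric features). The target is the median house value of each block group, in units of $100,000.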

In [None]:
# 1. Load Data
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target # Median House Value in $100k

print(f"Dataset Shape: {X.shape}")
X.head()

In [None]:
# 2. Split Data (80% Train, 20% Test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Train Model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# 4. Predict
y_pred = regressor.predict(X_test)

# 5. Evaluate
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print(f"Root Mean Squared Error: ${rmse*100000:.2f}")

# 6. Visualize: actual vs. predicted values
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.3)
plt.plot([0, 5], [0, 5], color='red', lw=2) # Perfect prediction line (y = x)
plt.xlabel("Actual Median Value (in 100,000 USD)")
plt.ylabel("Predicted Median Value (in 100,000 USD)")
plt.title("Linear Regression: Actual vs. Predicted")
plt.show()
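
The RMSE is reported in dollars because the target is stored in units of $100,000. As a minimal sketch of what `mean_squared_error` computes, the snippet below works out RMSE and R² by hand and checks them against scikit-learn. The two small arrays are made-up illustrative values, not taken from the housing data, and `r2_score` is an extra metric that the cells above do not import.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values, in units of $100,000
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.5, 2.0, 2.5, 4.0])

# RMSE: square root of the mean of the squared residuals
residuals = y_true_demo - y_pred_demo
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The same quantities from scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))
r2_sklearn = r2_score(y_true_demo, y_pred_demo)

print(f"RMSE manual={rmse_manual:.4f}, sklearn={rmse_sklearn:.4f}")
print(f"R^2  manual={r2_manual:.4f}, sklearn={r2_sklearn:.4f}")
```

RMSE is in the same units as the target, which makes it easy to interpret. R² is unitless and describes the fraction of variance explained, so the two are often reported together.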

## Part 2: Logistic Regression (Classification)

We will predict whether a tumor is Malignant (0) or Benign (1).
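
The label encoding matters when reading metrics later, so it can be worth confirming it directly from the dataset object. The sketch below does that with `target_names` and a per-class count. The variable names (`data`, `label_names`, `class_counts`) are illustrative choices and do not appear elsewhere in this notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# target_names[i] is the human-readable name of integer label i
label_names = [str(name) for name in data.target_names]

# Number of samples carrying each integer label
class_counts = np.bincount(data.target)

for label, (name, count) in enumerate(zip(label_names, class_counts)):
    print(f"label {label} = {name}: {count} samples")
```

Because malignant is encoded as 0, scikit-learn's binary metrics, which default to `pos_label=1`, treat benign as the "positive" class unless told otherwise.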

In [None]:
# 1. Load Data
cancer = load_breast_cancer()
X_c = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y_c = cancer.target

# 2. Split
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_c, y_c, test_size=0.2, random_state=42)

# 3. Train
classifier = LogisticRegression(max_iter=3000)
classifier.fit(X_train_c, y_train_c)

# 4. Evaluate
y_pred_c = classifier.predict(X_test_c)
acc = accuracy_score(y_test_c, y_pred_c)

print(f"Accuracy: {acc*100:.2f}%")
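print("\nClassification Report:")
print(classification_report(y_test_c, y_pred_c, target_names=cancer.target_names))

# Confusion matrix: rows are actual classes, columns are predicted classes
sns.heatmap(confusion_matrix(y_test_c, y_pred_c), annot=True, fmt='d', cmap='Blues',
            xticklabels=cancer.target_names, yticklabels=cancer.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()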
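
Accuracy alone can hide which kind of mistake the classifier makes. As a rough sketch of how to read the confusion matrix, the snippet below uses a small hand-made set of labels (an assumption for illustration, not the model's real output) and derives precision and recall from the four cells. `precision_score` and `recall_score` are additional scikit-learn metrics that the cells above do not import.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 0 = malignant, 1 = benign
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 1, 1, 1, 1, 1, 0])

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)

# With class 1 (benign) as "positive", ravel() yields tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()

precision_benign = tp / (tp + fp)  # of predicted benign, how many were benign
recall_benign = tp / (tp + fn)     # of actual benign, how many were found

# Recall for the malignant class: pass pos_label=0 explicitly
recall_malignant = recall_score(y_true_demo, y_pred_demo, pos_label=0)

print(cm)
print(f"Benign precision: {precision_benign:.2f}, recall: {recall_benign:.2f}")
print(f"Malignant recall: {recall_malignant:.2f}")
```

In a medical setting, recall on the malignant class is often the number to watch, since a missed malignant tumor is usually the costlier error.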

## Part 3: K-Means Clustering (Unsupervised)

We will generate synthetic blobs of data and check whether K-Means can recover the groups without being given any labels.

In [None]:
# 1. Create Fake Data
X_blob, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# 2. Train K-Means
kmeans = KMeans(n_clusters=4, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X_blob)

# 3. Visualize
plt.figure(figsize=(8, 6))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.7, label='Centroids')
plt.title("K-Means Clustering Result")
plt.legend()
plt.show()
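
Setting `n_clusters=4` works here because we generated exactly four blobs; with real data the number of clusters is usually unknown. One common heuristic, not used in the cell above, is the elbow method: fit K-Means for a range of `k` values and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. The sketch below regenerates the same synthetic data under a separate name, `X_demo`, and the range 1 to 8 is an arbitrary choice for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data settings as the clustering cell above
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit K-Means for several values of k and record the inertia of each fit
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X_demo)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

Plotting `inertias` against `k` (for example with `plt.plot(list(inertias), list(inertias.values()), marker='o')`) makes the bend easier to spot. The elbow is a visual heuristic, so it is worth treating the chosen `k` as a starting point, not a definitive answer.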