# Supervised Learning Workshop
This notebook introduces the basic workflow for supervised machine learning using scikit-learn.
Inspired by: *Introduction to Machine Learning with Python* by Andreas Müller and Sarah Guido.

## 1. Load and Explore the Dataset
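We'll use the Iris dataset, a classic three-class classification problem: 150 flowers, each described by four measurements (sepal and petal length and width, in cm) and labeled with one of three species.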

In [2]:
from sklearn.datasets import load_iris
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
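
The first five rows only show one species, so it can help to look at the whole table before modeling. The following is a minimal sketch of that exploration step; the `species` column and the `class_counts` and `summary` names are introduced here for illustration and are not part of the original notebook.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Hypothetical helper column: map integer labels to species names for readability
label_to_name = {i: str(name) for i, name in enumerate(iris.target_names)}
df['species'] = df['target'].map(label_to_name)

# How many samples does each class have?
class_counts = df['species'].value_counts()
print(class_counts)

# Basic statistics (mean, std, min, max, quartiles) for the four features
summary = df[iris.feature_names].describe()
print(summary)
```

A balanced class count is worth confirming early, because plain accuracy is easier to interpret when no class dominates.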


## 2. Train/Test Split
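We hold out part of the data as a test set so we can estimate how well the model generalizes to samples it has not seen during training. By default, `train_test_split` shuffles the data and places 25% of it in the test set; fixing `random_state` makes the split reproducible.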

In [None]:
from sklearn.model_selection import train_test_split

X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
X_train.shape, X_test.shape
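
A random split does not guarantee that each species appears in the same proportion in both parts. As a hedged variation on the cell above, the sketch below passes `stratify=y` so the class proportions are preserved; the `_strat` variable names are chosen here only to avoid overwriting the notebook's own split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same 75/25 split as the default, but stratified on the class labels
X_train_strat, X_test_strat, y_train_strat, y_test_strat = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(X_train_strat.shape, X_test_strat.shape)

# Per-class sample counts in each part of the split
print('train class counts:', np.bincount(y_train_strat))
print('test class counts: ', np.bincount(y_test_strat))
```

On a small, balanced dataset like Iris the difference is minor, but stratifying is a common precaution when classes are imbalanced.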

## 3. Train a Model
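We start with k-Nearest Neighbors (k-NN), a simple and intuitive classifier: to label a new sample, it finds the k closest training points and predicts the most common class among them. Here we use k = 3.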

In [None]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print('Test accuracy:', knn.score(X_test, y_test))
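
Once fitted, the same model can classify a flower it has never seen. The sketch below repeats the training step so it runs on its own, then predicts the species of a single made-up measurement; the values in `X_new` are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Hypothetical flower: sepal length, sepal width, petal length, petal width (cm).
# scikit-learn expects a 2D array of shape (n_samples, n_features).
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])

pred = knn.predict(X_new)
pred_name = iris.target_names[pred[0]]
print('Predicted class index:', pred[0])
print('Predicted species:', pred_name)
```

Note that `predict` returns integer labels; indexing into `iris.target_names` turns them back into species names.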

## 4. Try Another Classifier (Logistic Regression)
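scikit-learn classifiers share the same `fit` and `score` interface, so swapping in another model takes only a couple of lines. Here we fit a logistic regression on the same split and compare its test accuracy with the k-NN result.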

In [None]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(max_iter=200)
lr.fit(X_train, y_train)
print('Test accuracy (Logistic Regression):', lr.score(X_test, y_test))
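
With only 38 test samples, a single split can make two models look more different (or more alike) than they really are. One common way to get a steadier comparison is k-fold cross-validation. The sketch below is an illustrative addition, not part of the original workflow; the `models` and `cv_means` names and the choice of `cv=5` are assumptions made here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# The same two models used above, with the same hyperparameters
models = {
    'k-NN (k=3)': KNeighborsClassifier(n_neighbors=3),
    'Logistic Regression': LogisticRegression(max_iter=200),
}

cv_means = {}
for name, model in models.items():
    # 5-fold cross-validation: each fold serves once as the held-out set
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f'{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})')
```

Cross-validation uses the full dataset, so these numbers are not directly comparable to the single-split test accuracies printed earlier.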

## 5. Mini Challenge
Try these tasks:
1. Train a `DecisionTreeClassifier` and compare test accuracy.
2. Change `n_neighbors` in `KNeighborsClassifier` to 1, 5, and 10 and see how accuracy changes.
3. Plot a confusion matrix using `sklearn.metrics.confusion_matrix`.
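
As one possible starting point for tasks 2 and 3, the sketch below loops over the suggested `n_neighbors` values and computes a confusion matrix for the k = 3 model. It prints the matrix rather than plotting it; the `accuracy_by_k` and `cm` names are introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Task 2: test accuracy for several neighborhood sizes
accuracy_by_k = {}
for k in (1, 5, 10):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    accuracy_by_k[k] = model.score(X_test, y_test)
    print(f'n_neighbors={k}: test accuracy {accuracy_by_k[k]:.3f}')

# Task 3: rows are true classes, columns are predicted classes
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
cm = confusion_matrix(y_test, knn.predict(X_test))
print(cm)
```

To turn the matrix into a figure, `sklearn.metrics.ConfusionMatrixDisplay` or a seaborn heatmap are two options worth trying.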