# Day 10: Introduction to Machine Learning - Starter Notebook

Welcome to Day 10! This notebook introduces machine learning basics with scikit-learn.

## Learning Objectives
- Understand the basics of machine learning and its applications
- Distinguish between supervised and unsupervised learning
- Use scikit-learn for simple ML tasks
- Evaluate model performance

## Instructions
Complete each exercise section below. Refer to `docs/day_10_intro_machine_learning.md` for detailed guidance.

---
## Setup
Run the cell below to import required libraries.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# scikit-learn imports
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,  # needed for the confusion matrix in Exercise 3
    mean_squared_error,
)

# Display settings
%matplotlib inline

print("Libraries imported successfully!")

---
## Load the Dataset

We'll use the Iris dataset for these exercises.

**Dataset:** `../data/iris.csv`
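
If `../data/iris.csv` is not available, the same measurements ship with scikit-learn. The sketch below is one possible way to build an equivalent DataFrame. The snake_case column names and the `species` column are assumptions based on the hints later in this notebook, and `iris_df` is a hypothetical name, so adjust them to match the actual CSV.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data as a pandas DataFrame
iris = load_iris(as_frame=True)

# Rename columns to the snake_case names assumed elsewhere in this notebook
iris_df = iris.frame.rename(columns={
    "sepal length (cm)": "sepal_length",
    "sepal width (cm)": "sepal_width",
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
})

# Replace the integer target (0, 1, 2) with readable species names
species_names = {i: str(name) for i, name in enumerate(iris.target_names)}
iris_df["species"] = iris_df["target"].map(species_names)
iris_df = iris_df.drop(columns="target")

print(iris_df.head())
```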

In [None]:
# Load the Iris dataset
df = pd.read_csv('../data/iris.csv')
df.head()

In [None]:
# Explore the dataset
print(df.shape)
df.info()  # info() prints its own summary and returns None, so no print() is needed
print(df.describe())

---
## Exercise 1: Simple Regression

**Deliverables:**
1. Use scikit-learn to fit a linear regression model to the Iris data (for example, predicting `petal_width` from `petal_length`).

**Success Criteria:**
- Model fits on the training set and produces predictions for the test set
- Results are interpreted
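
As a reference for the workflow (prepare, split, fit, predict, evaluate), here is a minimal sketch on synthetic data rather than the Iris dataset. The data, the `*_demo` names, and the true relationship `y = 2x + 1` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=100)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(0, 0.5, size=100)

# scikit-learn expects a 2D feature array, even for a single feature
X_demo = x_demo.reshape(-1, 1)

# Hold out 20% of the rows for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_model = LinearRegression()
demo_model.fit(X_tr, y_tr)

demo_pred = demo_model.predict(X_te)
demo_mse = mean_squared_error(y_te, demo_pred)
demo_rmse = np.sqrt(demo_mse)  # RMSE is in the same units as the target

print(f"slope={demo_model.coef_[0]:.2f}, intercept={demo_model.intercept_:.2f}")
print(f"MSE={demo_mse:.3f}, RMSE={demo_rmse:.3f}")
```

The fitted slope and intercept should land near the values used to generate the data, which is one way to interpret the result. With a pandas DataFrame, selecting a feature with double brackets (for example `df[['petal_length']]`) yields the required 2D shape directly.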

In [None]:
# TODO: Prepare features (X) and target (y) for regression
# Example: Predict petal_width from petal_length

X = None  # Replace with your code
y = None  # Replace with your code

In [None]:
# TODO: Split data into training and test sets
# Hint: Use train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_test, y_train, y_test = None, None, None, None  # Replace

In [None]:
# TODO: Create and fit the linear regression model
# Hint: model = LinearRegression(); model.fit(X_train, y_train)

model = None  # Replace with your code

In [None]:
# TODO: Make predictions and evaluate
# Hint: predictions = model.predict(X_test)
#       mse = mean_squared_error(y_test, predictions)
#       rmse = np.sqrt(mse)  # same units as the target, easier to interpret


In [None]:
# TODO: Visualize the regression results
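# Hint: Plot actual vs predicted values, e.g. plt.scatter(y_test, predictions);
#       points close to the diagonal y = x indicate accurate predictions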


---
## Exercise 2: Simple Classification

**Deliverables:**
1. Use scikit-learn to fit a classifier (e.g., KNN or logistic regression).

**Success Criteria:**
- Model is trained and predicts the species of the test samples
- Accuracy is measured
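
The classification workflow mirrors the regression one. The sketch below uses synthetic three-class data from `make_blobs` instead of the Iris dataset; the cluster centers and the `*_blobs` / `*_demo` names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class data (an assumption for illustration)
X_blobs, y_blobs = make_blobs(
    n_samples=300,
    centers=[[0, 0], [5, 5], [0, 5]],
    cluster_std=1.0,
    random_state=0,
)

# stratify keeps the class proportions equal in the train and test sets
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(
    X_blobs, y_blobs, test_size=0.2, random_state=42, stratify=y_blobs
)

# Each test point is assigned the majority class of its 3 nearest neighbors
knn_demo = KNeighborsClassifier(n_neighbors=3)
knn_demo.fit(Xb_tr, yb_tr)

blob_pred = knn_demo.predict(Xb_te)
blob_acc = accuracy_score(yb_te, blob_pred)
print(f"Test accuracy: {blob_acc:.2f}")
```

Accuracy is always computed on the held-out test set; scoring on the training rows would overstate how well the model generalizes.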

In [None]:
# TODO: Prepare features (X) and target (y) for classification
# Use all numeric features to predict species

X = None  # Replace with your code
y = None  # Replace with your code

In [None]:
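# TODO: Split data into training and test sets
# Hint: Use train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
#       stratify=y keeps the species proportions equal in both sets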

X_train, X_test, y_train, y_test = None, None, None, None  # Replace

In [None]:
# TODO: Create and fit a classifier
# Hint: Use KNeighborsClassifier(n_neighbors=3) or LogisticRegression()
#       (if LogisticRegression emits a ConvergenceWarning, raise max_iter, e.g. max_iter=200)

classifier = None  # Replace with your code

In [None]:
# TODO: Make predictions and evaluate
# Hint: predictions = classifier.predict(X_test)
#       accuracy = accuracy_score(y_test, predictions)


---
## Exercise 3: Model Evaluation

**Deliverables:**
1. Evaluate model performance using appropriate metrics.

**Success Criteria:**
- Metrics (accuracy, RMSE, etc.) are calculated
- Results are interpreted
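
To show how these metrics read, here is a small sketch on hand-written labels. The label lists and class names are made up for illustration and are not output from any model in this notebook.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hand-written labels (an assumption for illustration, not real model output)
true_labels = ["a", "a", "a", "b", "b", "b", "c", "c"]
pred_labels = ["a", "a", "b", "b", "b", "c", "c", "c"]
class_names = ["a", "b", "c"]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(true_labels, pred_labels, labels=class_names)
cm_df = pd.DataFrame(cm, index=class_names, columns=class_names)
cm_df.index.name = "actual"
cm_df.columns.name = "predicted"
print(cm_df)

# Per-class precision, recall, and F1 in one text table
print(classification_report(true_labels, pred_labels))

demo_accuracy = accuracy_score(true_labels, pred_labels)
print(f"Accuracy: {demo_accuracy:.2f}")
```

Off-diagonal cells show which classes get confused with each other, which accuracy alone hides. A labeled frame like `cm_df` can also be passed to `sns.heatmap(cm_df, annot=True, fmt="d")` to produce the visualization.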

In [None]:
# TODO: Generate a classification report
# Hint: Use classification_report(y_test, predictions)


In [None]:
# TODO: Create a confusion matrix visualization
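# Hint: Import confusion_matrix from sklearn.metrics if it is not already imported,
#       compute confusion_matrix(y_test, predictions), and pass the result to
#       sns.heatmap(..., annot=True, fmt='d')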


---
## Validation Checklist

Before proceeding to the next day, verify:
- [ ] Can fit and interpret a regression model
- [ ] Can fit and interpret a classification model
- [ ] Can evaluate model performance with appropriate metrics