# Machine Learning with Scikit-Learn

Machine Learning (ML) sounds magical, but it's mostly just math. It's about writing programs that **learn** rules from data instead of having us write those rules by hand.

We will use **Scikit-Learn**, the industry-standard library for "Classical ML" (things like predicting prices or classifying emails).

## Learning Objectives
- **The Workflow**: Six steps that most ML projects follow (prepare data, split, instantiate, fit, predict, evaluate).
- **Train/Test Split**: Why we hide some data from the model during training.
- **Linear Regression**: Building a simple model to predict numbers.
- **Evaluation**: How to measure if our model is actually good.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

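# 1. Generate dummy data (House Size vs Price)
np.random.seed(42)  # Fix the seed so the data is the same on every run
X = np.random.rand(100, 1) * 100  # Size (0 to 100 sqm), shape (100, 1)
y = 3 * X + 10 + np.random.randn(100, 1) * 5  # Price = 3*Size + 10 + Gaussian noise (std 5)

# 2. Split data: 80% for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)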
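
To see what the split actually does, here is a minimal sketch on the same kind of toy data. The seed `0` and the helper array `idx` (row numbers carried along through the split) are assumptions added for illustration, not part of the cell above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data in the same shape as above (the seed is an assumption for reproducibility)
rng = np.random.default_rng(0)
X = rng.random((100, 1)) * 100
y = 3 * X + 10 + rng.normal(size=(100, 1)) * 5

# Hypothetical helper: row numbers, so we can check which rows went where
idx = np.arange(100)

X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, y, idx, test_size=0.2, random_state=42
)

# test_size=0.2 holds out 20 of the 100 rows
print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)

# No row lands in both sets, so the test rows stay hidden during training
overlap = np.intersect1d(idx_train, idx_test).size
print(overlap)  # 0
```

Because the two sets never share a row, the test score later reflects how the model handles houses it has not seen.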

## 2. Training the Model

This is where the magic happens.
1.  **Instantiate**: We create an empty "brain" (the model). Here we choose `LinearRegression`, which fits the straight line that minimizes the squared error between its predictions and the training targets.
2.  **Fit**: We show the model our training data (`X_train`, `y_train`). It looks at the examples and learns the relationship (the math formula).

After this step, the model has "learned." It knows the slope and intercept of the line that best fits the data.
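
To make "learning the slope and intercept" less mysterious, the sketch below solves the same least-squares problem directly with NumPy and compares the result to `LinearRegression`. It is an illustration under assumptions (the seed, and fitting on all 100 rows rather than only the training split), not a description of scikit-learn's internals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: Price = 3*Size + 10 + noise (the seed is an assumption)
rng = np.random.default_rng(0)
X = rng.random((100, 1)) * 100
y = 3 * X + 10 + rng.normal(size=(100, 1)) * 5

# Add a column of ones so the solver can also find the intercept
A = np.hstack([X, np.ones_like(X)])

# Solve for the slope and intercept that minimize the squared error
solution, *_ = np.linalg.lstsq(A, y.ravel(), rcond=None)
slope, intercept = solution

# The scikit-learn model should land on the same line
model = LinearRegression().fit(X, y)

print(f"NumPy:   slope={slope:.4f}, intercept={intercept:.4f}")
print(f"sklearn: slope={model.coef_[0][0]:.4f}, intercept={model.intercept_[0]:.4f}")
```

Both results should sit close to the true values of 3 and 10 used to generate the data, with small differences caused by the noise.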

In [None]:
# 3. Instantiate
model = LinearRegression()

# 4. Fit
model.fit(X_train, y_train)

# y has shape (n, 1), so coef_ is 2-D (shape (1, 1)) and intercept_ is 1-D (shape (1,))
print(f"Model learned: Price = {model.coef_[0][0]:.2f} * Size + {model.intercept_[0]:.2f}")

## 3. Evaluation

We can't just trust the model; we need to test it.
We take the **Test Set** (`X_test`), data the model has *never seen before*, and ask it to make predictions.

Then we compare those predictions to the actual answers (`y_test`).
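*   **RMSE (Root Mean Squared Error)**: A common way to score the error. It is the square root of the average squared difference between predictions and actual values, so it is expressed in the same units as the price and penalizes large misses more heavily than small ones. (Lower is better!)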
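
The RMSE calculation can be worked through by hand. The four prices and predictions below are made-up numbers chosen for easy arithmetic, and the mean absolute error (`mae`) is included only to show how RMSE weights large misses.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual prices and predictions (not from the notebook's data)
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 200.0, 270.0])

errors = y_pred - y_true        # [10, -10, 0, 20]
mse = np.mean(errors ** 2)      # (100 + 100 + 0 + 400) / 4 = 150
rmse_manual = np.sqrt(mse)      # sqrt(150), about 12.25

# Same number via scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

# Plain average of the absolute misses, for comparison
mae = np.mean(np.abs(errors))   # (10 + 10 + 0 + 20) / 4 = 10

print(f"RMSE by hand: {rmse_manual:.2f}")
print(f"RMSE sklearn: {rmse_sklearn:.2f}")
print(f"MAE:          {mae:.2f}")
```

The RMSE comes out higher than the plain average miss of 10 because squaring gives the single miss of 20 extra weight.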

In [None]:
# 5. Predict
predictions = model.predict(X_test)

# 6. Evaluate (Root Mean Squared Error)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"Test RMSE: {rmse:.2f}")
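
An RMSE value is easier to judge next to a reference point. One possible sketch, which is an addition here rather than part of the original workflow, compares the model against a naive baseline that ignores house size and always guesses the average training price. The seed and the `baseline_*` names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy data in the same shape as above (the seed is an assumption)
rng = np.random.default_rng(0)
X = rng.random((100, 1)) * 100
y = 3 * X + 10 + rng.normal(size=(100, 1)) * 5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The trained model's error on unseen data
model = LinearRegression().fit(X_train, y_train)
model_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Naive baseline: always predict the mean price from the training set
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_rmse = np.sqrt(mean_squared_error(y_test, baseline_pred))

print(f"Model RMSE:    {model_rmse:.2f}")
print(f"Baseline RMSE: {baseline_rmse:.2f}")
```

If the model's RMSE is not clearly lower than the baseline's, the model has learned little from the size feature.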