# Introduction to ML

## What is machine learning?
Machine Learning is a field of computer science that gives computers the ability to learn from data without being explicitly programmed.

📌 Simple definition:
Machine learning is when a computer uses examples (data) to figure out how to do something by itself.

## What is Supervised Learning?
Supervised Learning is when you teach a computer using labeled examples, just like a teacher helping a student with questions and answers.

📚 Simple Analogy
Imagine you're learning to recognize animals.
Someone shows you flashcards:

| Picture | Label    |
| ------- | -------- |
| 🐱      | "Cat"    |
| 🐶      | "Dog"    |
| 🐰      | "Rabbit" |

After seeing enough of these, you learn to tell them apart.
This is supervised learning: you learn from examples with correct answers.

---

🔍 Supervised Learning = Input → Output
1. You provide inputs (features)
2. You provide the correct outputs (labels)
3. The model learns the relationship

---

## 📂 Types of Supervised Learning:
There are 2 main types depending on the kind of output:

### 🎯 Regression
Predicting a number (continuous value)

📌 Examples:
Predict house price 🏠
Predict temperature tomorrow 🌡️
Predict stock price 📈

🧪 Example:
| Hours studied | Test Score |
| ------------- | ---------- |
| 2             | 50         |
| 4             | 70         |
| 6             | 85         |

→ Predict test score if someone studied 5 hours.


### 🏷️ Classification
Predicting a category (discrete label)

📌 Examples:

Email is spam or not spam 📧
Image is cat, dog, or rabbit 🐱🐶🐰
Patient is healthy or sick 🏥

🧪 Example:
| Email Text                      | Label    |
| ------------------------------- | -------- |
| "Win a prize now!"              | Spam     |
| "Your meeting is at 3 PM today" | Not Spam |

→ Predict if a new email is spam.
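A minimal sketch of how such a spam classifier could learn from labeled emails, assuming a simple word-count (naive Bayes) approach, which is only one of many possible classifiers. The first two emails come from the table above; the other two are made up so that each class has more than one example, and `tokenize` and `predict` are illustrative helper names.

```python
import math
from collections import Counter

# Labeled training emails. The first two are from the table above;
# the last two are hypothetical examples added for illustration.
train = [
    ("Win a prize now!", "Spam"),
    ("Your meeting is at 3 PM today", "Not Spam"),
    ("Claim your free prize", "Spam"),            # hypothetical
    ("Lunch meeting moved to noon", "Not Spam"),  # hypothetical
]

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return [w.strip("!?.,").lower() for w in text.split()]

# "Training": count how often each word appears in each class.
word_counts = {"Spam": Counter(), "Not Spam": Counter()}
class_counts = Counter()
for text, label in train:
    word_counts[label].update(tokenize(text))
    class_counts[label] += 1

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the label with the highest naive Bayes log-score."""
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        # Log prior: fraction of training emails with this label.
        score = math.log(class_counts[label] / len(train))
        for w in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("Win a free prize"))
print(predict("Meeting at noon today"))
```

With so few examples the model only picks up on which words it has seen in each class; a real spam filter would need far more labeled emails.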

---

📈 What happens during training?
1. You give the model many examples (input → correct output)
2. The model tries to learn a pattern that connects input to output
3. Later, you give it new inputs and ask it to predict the output

✅ Summary
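| Feature     | Description                                   |
| ----------- | --------------------------------------------- |
| What is it? | Learning from labeled data                    |
| Input       | Known data (features)                         |
| Output      | Known labels                                  |
| Goal        | Predict output for new inputs                 |
| Types       | Regression (numbers), Classification (labels) |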

---


In [None]:
from sklearn.linear_model import LinearRegression

# Data: hours studied and scores
X = [[2], [4], [6]]
y = [50, 70, 85]

model = LinearRegression()
model.fit(X, y)

print(model.predict([[5]]))  # Predict score for 5 hours studied (about 77.08)


---
---

## 🧩 What is Unsupervised Learning?
Unsupervised Learning is when a computer learns without being told the right answer.
It finds patterns and groups in the data on its own.

📚 Simple Analogy
Imagine you’re given a box of mixed LEGO bricks — different shapes and colors — but no instructions.

You start sorting them by color or size, just by noticing patterns.
That’s what unsupervised learning does.

---

### 🔍 Key idea:
No labels (no "this is a dog", "this is a cat")

The algorithm just sees raw data

It tries to find structure: like clusters, groups, or rules



---

### 👀 Real Example: Customer Segmentation
Let’s say a store has a list of customers:

| Age | Spending Score |
| --- | -------------- |
| 22  | 90             |
| 35  | 60             |
| 56  | 30             |
| 24  | 88             |
| 52  | 35             |
| 36  | 59             |

You don’t know who is a “high spender” or “low spender”.

You use unsupervised learning to group customers with similar behavior.

---

## 📂 Main Types of Unsupervised Learning

### Clustering
Grouping data into clusters based on similarity.

📌 Examples:
Grouping customers by shopping behavior 🛍️
Grouping news articles by topic 📰
Grouping animals by shape/features 🐾

🧪 Example:
Given these points:
```
(1,1), (2,2), (10,10), (11,11)
```
An algorithm might find 2 clusters:

```
Cluster A: near (1,1)
Cluster B: near (10,10)
```

✅ Popular Algorithm:
K-Means Clustering
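To show what K-Means does internally, here is a small from-scratch sketch on the four points above. It assumes two clusters and, for simplicity, uses the first and last point as starting centers; library implementations pick starting centers more carefully and also handle empty clusters, which this sketch does not.

```python
# Points from the clustering example above.
points = [(1, 1), (2, 2), (10, 10), (11, 11)]

# Assumption: start with the first and last point as the two initial centers.
centers = [points[0], points[-1]]

def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

for _ in range(10):  # a few iterations are plenty for this tiny dataset
    # Assignment step: each point joins its nearest center.
    labels = [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]
    # Update step: each center moves to the mean of its assigned points.
    new_centers = []
    for k in range(len(centers)):
        members = [p for p, lab in zip(points, labels) if lab == k]
        new_centers.append((
            sum(x for x, _ in members) / len(members),
            sum(y for _, y in members) / len(members),
        ))
    if new_centers == centers:  # nothing moved, so the clusters are stable
        break
    centers = new_centers

print(labels)   # cluster index for each point
print(centers)  # final cluster centers
```

The two steps (assign points, then move centers) repeat until the centers stop moving, which here happens after the second pass.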

### Dimensionality Reduction
Making data simpler by reducing the number of features, while keeping important info.

📌 Examples:

Visualizing high-dimensional data in 2D (like reducing 100 features to 2)

Compressing images

Removing noise from data

🧪 Imagine you have this data:

```
Height	Weight	Age	Shoe Size	...
```

Too many columns! Dimensionality reduction finds the most important patterns and reduces it to fewer features.

✅ Popular Algorithms:
PCA (Principal Component Analysis)

t-SNE (for visualization)
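A rough sketch of the idea behind PCA, written with NumPy's SVD rather than a dedicated library class. The measurements are made up for illustration (height in cm, weight in kg, shoe size), chosen so that the three columns rise together and can be summarized well by a single new feature.

```python
import numpy as np

# Hypothetical measurements: columns are height (cm), weight (kg), shoe size.
data = np.array([
    [160.0, 55.0, 37.0],
    [165.0, 60.0, 38.0],
    [170.0, 66.0, 40.0],
    [175.0, 72.0, 42.0],
    [180.0, 79.0, 43.0],
    [185.0, 85.0, 45.0],
])

# Step 1: center each column so it has mean zero.
centered = data - data.mean(axis=0)

# Step 2: the SVD of the centered data gives the principal directions (rows of vt).
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Step 3: keep only the first direction, turning 3 features into 1.
reduced = centered @ vt[:1].T

# Fraction of the total variance kept by that single new feature.
explained = singular_values[0] ** 2 / np.sum(singular_values ** 2)

print(reduced.shape)
print(round(float(explained), 3))
```

Because the made-up columns are strongly correlated, one component keeps almost all of the variation; with less correlated features, more components would be needed.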

---

Comparison with Supervised Learning:

| 🧩 Feature            | 🧠 Supervised Learning                    | 🔍 Unsupervised Learning               |
|----------------------|------------------------------------------|----------------------------------------|
| **Has labels?**      | ✅ Yes (inputs with correct outputs)      | ❌ No labels                            |
| **Goal**             | Predict outputs                          | Discover hidden structure              |
| **Example Task**     | Spam detection, price prediction         | Customer grouping, topic modeling      |
| **Output type**      | Classification or regression             | Clusters or new simplified features    |

In [None]:
from sklearn.cluster import KMeans
import numpy as np

# Data: X and Y positions (like people’s spending patterns)
X = np.array([[1, 2], [2, 3], [10, 11], [11, 12]])

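# n_init and random_state are set explicitly so the result is reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)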
kmeans.fit(X)

print(kmeans.labels_)  # Shows which cluster each point belongs to


---
---

## 📈 What is Linear Regression?

Linear Regression is a machine learning model that finds the best straight line to predict a number (output) from input data.

It’s used when you believe:
➡️ "The more X increases, the more Y increases (or decreases), in a straight-line way."

🧠 Simple Idea
Imagine this:

| Hours Studied | Test Score |
| ------------- | ---------- |
| 1             | 50         |
| 2             | 60         |
| 3             | 70         |
| 4             | 80         |

You can see:
More hours → Higher score.
A line could describe this relationship!

---

📏 What does Linear Regression do?
It finds the best-fitting straight line that predicts Y from X.

That line looks like this in math:

🔢 Equation:
```
Y = aX + b
```
X = input (e.g. hours studied)
Y = output (e.g. test score)
a = slope (how much Y changes per unit of X)
b = intercept (Y when X = 0)

This is just like the line formula from high school math:
```
y = mx + c
```
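For a single input, the slope and intercept of the best-fitting line can be computed directly with the ordinary least-squares formulas: `a = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²` and `b = ȳ - a·x̄`. A small sketch using the hours/score table above (variable names are illustrative):

```python
X = [1, 2, 3, 4]        # hours studied
y = [50, 60, 70, 80]    # test scores

x_mean = sum(X) / len(X)
y_mean = sum(y) / len(y)

# Slope: how much y changes per unit of x, on average.
a = sum((x - x_mean) * (t - y_mean) for x, t in zip(X, y)) / sum(
    (x - x_mean) ** 2 for x in X
)
# Intercept: chosen so the line passes through the point (x_mean, y_mean).
b = y_mean - a * x_mean

print(f"y = {a} * x + {b}")               # y = 10.0 * x + 40.0
print("Prediction for 5 hours:", a * 5 + b)  # 90.0
```

Each extra hour adds 10 points, and the line crosses the Y-axis at 40, so 5 hours predicts a score of 90.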

---

🎯 What is Linear Regression used for?
Predicting house prices 🏠

Forecasting sales 📊

Estimating age from height 👶📏

Predicting crop yield from rainfall 🌾🌧️

In [None]:
from sklearn.linear_model import LinearRegression

# Training data
X = [[1], [2], [3], [4]]        # Hours studied
y = [50, 60, 70, 80]            # Test scores

model = LinearRegression()
model.fit(X, y)

# Predict score if someone studies 5 hours
print(model.predict([[5]]))    # Output: [90.]

📊 Visualization
Imagine this:
```
Data points:
   ●     ●      ●      ●
   |     |      |      |
   +-----+------+------+
       line going through them
```

---
---

## 💰 What is a Cost Function?
A cost function tells us how wrong our model is.
It measures the difference between predicted values and actual values.

📌 Simple definition:
It’s like a score:
Lower cost = better model.
Higher cost = worse model.

🎯 Goal of training:
Make the cost function as small as possible.

---

🔢 Cost Function Formula (for Linear Regression)
Let’s say we have a bunch of data points:
| X (input) | y (actual output) |
| --------- | ----------------- |
| 1         | 3                 |
| 2         | 5                 |
| 3         | 7                 |

We train a model with prediction rule:

```
y_pred = a * X + b
```

The cost function we use is:

✅ Mean Squared Error (MSE):
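$$
\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right)^2
$$

Where:

$m$ = number of data points

$\hat{y}_i = a x_i + b$ = predicted value for data point $i$

$y_i$ = actual value for data point $i$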

---
## 🧠 Intuition Behind Cost Function
| Step                  | Meaning                                       |
| --------------------- | --------------------------------------------- |
| $\hat{y}_i - y_i$     | How wrong was the prediction?                 |
| $(\hat{y}_i - y_i)^2$ | Make it always positive, punish bigger errors |
| Sum over all points   | Total error across all predictions            |
| Divide by m           | Get **average error** → “mean” squared error  |


So, the cost function is just:

❗ “How far are our predictions from the real answers, on average?”

---

Data:
| Hours Studied (X) | Actual Score (y) | Predicted Score (𝑦̂) |
| ----------------- | ---------------- | --------------------- |
| 1                 | 50               | 45                    |
| 2                 | 60               | 55                    |
| 3                 | 70               | 65                    |


Calculate the cost:
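$$
\text{MSE} = \frac{(45 - 50)^2 + (55 - 60)^2 + (65 - 70)^2}{3} = \frac{25 + 25 + 25}{3} = 25
$$

Cost = 25. Taking the square root gives 5, so our predictions are off by 5 points on average (in the root-mean-square sense).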
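The same calculation can be sketched in a few lines of Python, following the steps of the intuition table (error, square, sum, divide by the number of points). The variable names are illustrative.

```python
actual = [50, 60, 70]      # y: actual scores from the table
predicted = [45, 55, 65]   # y-hat: predicted scores from the table

# Step 1: how wrong was each prediction?
errors = [p - t for p, t in zip(predicted, actual)]
# Step 2: square each error so it is positive and big errors count more.
squared = [e ** 2 for e in errors]
# Steps 3 and 4: sum the squared errors and divide by the number of points.
mse = sum(squared) / len(squared)
# The square root brings the cost back to the original units (score points).
rmse = mse ** 0.5

print(errors)   # [-5, -5, -5]
print(squared)  # [25, 25, 25]
print(mse)      # 25.0
print(rmse)     # 5.0
```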

---

## 📊 Visualizing the Cost Function
When training a model, we want to adjust a and b in y = aX + b to minimize the cost.

We can plot the cost depending on the values of a and b.

Example: Cost as a bowl-shaped surface
Imagine a 3D plot where:

X-axis = value of a (slope)

Y-axis = value of b (intercept)

Z-axis = cost (error)

🔻 The bottom of the bowl is the minimum cost — the best model.

## 🖼️ Visualization Example (2D)
If we fix b and just vary a, the cost function looks like this:
```
Cost
  ▲
  | ●               ●
  |  ●             ●
  |   ●           ●
  |     ●       ●
  |        ● ●
  +--------------------> Slope (a)
        "U"-shape
```
➡️ The lowest point is the best a.

This is what gradient descent will try to find.
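To make the U-shape concrete, the sketch below evaluates the cost for a few slopes on the small dataset from this section (X = 1, 2, 3 and y = 3, 5, 7). It assumes the intercept is held fixed at `b = 1`, the value that fits this data exactly.

```python
X = [1, 2, 3]
y = [3, 5, 7]
b = 1  # intercept held fixed (assumed; the data follows y = 2x + 1)

def cost(a):
    """Mean squared error of the line y = a*x + b on the data."""
    return sum((a * x + b - t) ** 2 for x, t in zip(X, y)) / len(X)

slopes = [0, 1, 2, 3, 4]
costs = [cost(a) for a in slopes]
for a, c in zip(slopes, costs):
    print(f"a = {a}: cost = {c:.2f}")

# The slope with the lowest cost is the bottom of the "U".
best = slopes[costs.index(min(costs))]
print("best slope:", best)
```

The cost falls from 18.67 at `a = 0` to 0 at `a = 2` and rises symmetrically again to 18.67 at `a = 4`.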

✅ Summary
| Concept              | Meaning                                                            |
| -------------------- | ------------------------------------------------------------------ |
| **Cost function**    | Tells how bad the model’s predictions are                          |
| **Formula**          | MSE: Average of squared errors between predictions and real values |
| **Intuition**        | Lower cost = better model                                          |
| **Visualization**    | A "U"-shaped curve (2D) or bowl (3D) that we want to minimize      |
| **Goal of training** | Find values (like `a`, `b`) that **minimize the cost function**    |


---
---

## 📉 What is Gradient Descent?
Gradient Descent is the method used to find the best model parameters (like weights a, b) that minimize the cost function.

🧠 Simple idea:
“We start somewhere, then keep taking small steps downhill on the cost function until we reach the lowest point.”

Like climbing down a mountain in the fog — you don’t see the whole path, but always step in the steepest downhill direction.

---

## ⚙️ How Does Gradient Descent Work?
Repeat:
1. Calculate the gradient of the cost function (slope)
2. Move in the opposite direction of the slope
3. Repeat until the slope is almost zero (minimum reached)

🔍 What is a Gradient?
A gradient is the slope of the cost function with respect to a parameter. Its sign tells us which direction is uphill, and its size tells us how fast the cost is changing.
If the slope is:

➕ positive → move left (decrease the value)

➖ negative → move right (increase the value)
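The sign rule can be tried on a toy problem. The sketch below assumes a made-up one-parameter cost `(a - 3)**2`, whose minimum is at `a = 3`, and starts once to the left and once to the right of it; the learning rate of 0.1 is also an assumed value.

```python
def slope(a):
    """Derivative of the toy cost (a - 3)**2, whose minimum is at a = 3."""
    return 2 * (a - 3)

alpha = 0.1  # learning rate (assumed value)
finals = {}
for start in (0.0, 6.0):
    a = start
    first_slope = slope(a)
    for _ in range(50):
        a -= alpha * slope(a)  # step against the slope
    finals[start] = a
    direction = "right" if first_slope < 0 else "left"
    print(f"start = {start}: first slope = {first_slope:+.1f}, "
          f"moves {direction}, ends near a = {a:.3f}")
```

Starting at 0 the slope is negative, so `a` increases; starting at 6 the slope is positive, so `a` decreases. Both runs end close to 3.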

---

🔁 Gradient Descent Algorithm (for Linear Regression)
We update a and b repeatedly using the formulas:
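$$
a := a - \alpha \cdot \frac{2}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right) x_i
$$

$$
b := b - \alpha \cdot \frac{2}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right)
$$

Where:

$\alpha$ = learning rate (step size)

$m$ = number of data points

$\hat{y}_i = a x_i + b$ = predicted value for data point $i$

$y_i$ = actual value for data point $i$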
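One way to sanity-check the gradients used in the update step is to compare them with a numerical estimate. The sketch below assumes the cost is the mean squared error of `y_pred = a * X + b`, so the gradients are `(2/m) * Σ(error * x)` for `a` and `(2/m) * Σ(error)` for `b`, the same expressions the implementation in the next section uses. The `gradients` helper and the step size `eps` are illustrative choices.

```python
X = [1, 2, 3, 4]
y = [50, 60, 70, 80]
m = len(X)

def cost(a, b):
    """Mean squared error of the line y = a*x + b."""
    return sum((a * x + b - t) ** 2 for x, t in zip(X, y)) / m

def gradients(a, b):
    """Partial derivatives of the MSE with respect to a and b."""
    errors = [a * x + b - t for x, t in zip(X, y)]
    da = (2 / m) * sum(e * x for e, x in zip(errors, X))
    db = (2 / m) * sum(errors)
    return da, db

a, b, eps = 0.0, 0.0, 1e-6
da, db = gradients(a, b)

# Finite-difference estimates: nudge one parameter and see how the cost changes.
num_da = (cost(a + eps, b) - cost(a - eps, b)) / (2 * eps)
num_db = (cost(a, b + eps) - cost(a, b - eps)) / (2 * eps)

print("analytic: ", da, db)
print("numerical:", num_da, num_db)
```

At `a = b = 0` both gradients are negative, so the first update increases both parameters.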

---

## 🚀 Implementing Gradient Descent (Python Example)


In [1]:
# Simple data
X = [1, 2, 3, 4]    # Input (hours studied)
y = [50, 60, 70, 80]  # Output (test score)

# Initialize parameters
a = 0
b = 0
alpha = 0.01  # Learning rate
epochs = 1000

m = len(X)

# Gradient Descent loop
for _ in range(epochs):
    da = 0
    db = 0
    for i in range(m):
        y_pred = a * X[i] + b
        error = y_pred - y[i]
        da += error * X[i]
        db += error
    a -= alpha * (2/m) * da
    b -= alpha * (2/m) * db

print(f"Trained model: y = {a:.2f} * x + {b:.2f}")


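Trained model: y = 10.56 * x + 38.36

After 1000 steps the parameters are close to, but not yet at, the exact best-fit line y = 10x + 40 for this data; running more epochs moves them closer.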


## ⚙️ Learning Rate (α)
Learning rate controls how big each step is in gradient descent.
| α (Learning Rate)  | What happens?                        |
| ------------------ | ------------------------------------ |
| Too small (0.0001) | Learning is **very slow** 🐢         |
| Too big (1.0)      | May **overshoot** or **diverge** 🚀  |
| Just right (0.01)  | **Smooth** and **stable** learning ✅ |
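The effect of the three learning rates in the table can be checked on the hours/score data. The sketch below runs 20 gradient descent steps from `a = b = 0` for each rate and prints the final cost; the step count and the `run` helper are assumptions made for illustration.

```python
X = [1, 2, 3, 4]
y = [50, 60, 70, 80]
m = len(X)

def cost(a, b):
    """Mean squared error of the line y = a*x + b."""
    return sum((a * x + b - t) ** 2 for x, t in zip(X, y)) / m

def run(alpha, steps=20):
    """Run gradient descent from a = b = 0 and return the final cost."""
    a = b = 0.0
    for _ in range(steps):
        errors = [a * x + b - t for x, t in zip(X, y)]
        da = (2 / m) * sum(e * x for e, x in zip(errors, X))
        db = (2 / m) * sum(errors)
        a -= alpha * da
        b -= alpha * db
    return cost(a, b)

start_cost = cost(0.0, 0.0)
results = {alpha: run(alpha) for alpha in (0.0001, 0.01, 1.0)}

print(f"starting cost: {start_cost:.1f}")
for alpha, c in results.items():
    print(f"alpha = {alpha}: cost after 20 steps = {c:.3g}")
```

On this data the smallest rate barely reduces the cost, 0.01 reduces it substantially, and 1.0 makes the cost grow far beyond its starting value. Which rate is "just right" depends on the data, so these values are not universal.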

---
🖼️ Visualizing Gradient Descent
Imagine the cost function as a U-shaped curve:
```
Cost
  ▲
  |       ●
  |     ●
  |   ●
  | ●
  +------------------> parameter (a)
```
Each ● is a step of gradient descent, moving down the slope.

---

## 📈 Gradient Descent for Linear Regression Summary

| Step              | Action                                               |
| ----------------- | ---------------------------------------------------- |
| Start             | With initial `a` and `b` (zeros or random values)    |
| Compute cost      | Measure how bad predictions are                      |
| Find gradient     | See how to change `a` and `b` to reduce cost         |
| Update parameters | Using gradient and learning rate                     |
| Repeat            | Until the cost stops changing (or max steps reached) |
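The steps in the table can be sketched as a loop that stops on its own once the cost stops changing. The learning rate, tolerance, and step limit below are assumed values chosen to work on this small dataset, not recommended defaults.

```python
X = [1, 2, 3, 4]
y = [50, 60, 70, 80]
m = len(X)

def cost(a, b):
    """Mean squared error of the line y = a*x + b."""
    return sum((a * x + b - t) ** 2 for x, t in zip(X, y)) / m

a, b = 0.0, 0.0        # Start: initial parameters
alpha = 0.05           # learning rate (assumed; small enough to be stable here)
tolerance = 1e-12      # stop when the cost changes by less than this (assumed)
max_steps = 100_000    # safety limit on the number of updates

previous = cost(a, b)  # Compute cost
for step in range(1, max_steps + 1):
    errors = [a * x + b - t for x, t in zip(X, y)]
    da = (2 / m) * sum(e * x for e, x in zip(errors, X))  # Find gradient
    db = (2 / m) * sum(errors)
    a -= alpha * da                                       # Update parameters
    b -= alpha * db
    current = cost(a, b)
    if abs(previous - current) < tolerance:               # Cost stopped changing
        break
    previous = current

print(f"stopped after {step} steps: y = {a:.3f} * x + {b:.3f}")
```

Stopping on a tolerance avoids guessing the number of epochs in advance, while the step limit keeps the loop from running forever if the cost never settles.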


## ✅ Summary Table
| Concept           | Meaning                                                           |
| ----------------- | ----------------------------------------------------------------- |
| Gradient descent  | A method to **minimize** the cost function                        |
| Goal              | Find best model parameters (like slope and intercept)             |
| Learning rate (α) | Size of the step we take in each update                           |
| Use in regression | Updates `a` and `b` to fit the best line to the data              |
| Key operation     | Adjust parameters in the direction that **reduces cost** the most |


---
---



## Practice quiz: Supervised vs unsupervised learning
A) Supervised
B) Unsupervised
| # | Question                                                      | Your Answer | Correct Answer | Result |
| - | ------------------------------------------------------------- | ----------- | -------------- | ------ |
| 1 | What type of learning uses labeled data?                      | A ✅         | A ✅            | ✅      |
| 2 | Which of the following is an **unsupervised** learning task?  | B ✅         | B ✅            | ✅      |
| 3 | Predicting house prices is an example of...?                  | A ✅         | A ✅            | ✅      |
| 4 | What is the main goal of unsupervised learning?               | B ✅         | B ✅            | ✅      |
| 5 | Is topic modeling a supervised learning task?                 | A ❌         | B ✅            | ❌      |
| 6 | Which of the following is a typical supervised learning task? | A ✅         | A ✅            | ✅      |
| 7 | Dimensionality reduction is...?                               | B ✅         | B ✅            | ✅      |


## Practice quiz: Regression

| # | Question                                                                             | Options                                                                                                                                                       | Your Answer | Correct Answer |
| - | ------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- | -------------- |
| 1 | What is the main goal of regression?                                                 | A) Classify data points<br>B) Reduce dimensionality<br>**C) Predict a continuous output**<br>D) Cluster data into groups                                      | C           | ✅ C            |
| 2 | Which of the following is an example of a regression problem?                        | A) Predicting whether an email is spam<br>**B) Predicting the price of a house**<br>C) Classifying animals<br>D) Identifying spoken words                     | B           | ✅ B            |
| 3 | In linear regression, what does the slope represent?                                 | A) The intercept on the Y-axis<br>B) The cost function<br>**C) The rate of change of Y with respect to X**<br>D) The error                                    | C           | ✅ C            |
| 4 | What is the name of the function that measures the error in regression?              | A) Gradient<br>B) Slope<br>**C) Cost function (like MSE)**<br>D) Learning rate                                                                                | C           | ✅ C            |
| 5 | Which algorithm is commonly used to minimize the cost function in linear regression? | A) Support Vector Machine<br>B) K-Means<br>C) Decision Trees<br>**D) Gradient Descent**                                                                       | D           | ✅ D            |
| 6 | What happens if the learning rate is too high during gradient descent?               | A) The model will converge slowly<br>**B) The model might overshoot and never converge**<br>C) The model will always converge perfectly<br>D) Nothing happens | B           | ✅ B            |
| 7 | Which of the following is NOT a regression algorithm?                                | A) Linear Regression<br>B) Lasso Regression<br>C) Ridge Regression<br>**D) K-Means Clustering**                                                               | D           | ✅ D            |

