# Overview: Supervised Machine Learning

## Where Supervised Learning Fits in AI

Before diving deeper, let’s understand where supervised learning fits within the broader field of Artificial Intelligence (AI).

There are two major AI approaches:

- **Knowledge-based systems**, which include:
  - **Rule-based systems**
  - **Case-based reasoning**

- **Machine Learning (ML)**, which allows systems to **learn from data without being explicitly programmed**.  
  > “Machine Learning provides systems with the ability to learn from experience without being programmed explicitly. Machine Learning is concerned with the development of applications that can access data and learn from it on themselves.”

ML can be applied to different kinds of problems, such as:

- **Regression** – Predicting continuous values (e.g., prices)
- **Classification** – Predicting discrete labels (e.g., spam or not spam)
- **Clustering** – Grouping similar data points without predefined labels

Depending on the type of problem and on what labels or feedback are available, different learning strategies are used:

- Supervised Learning  
- Unsupervised Learning  
- Semi-supervised Learning  
- Reinforcement Learning  
- Active Learning  

In short, **Supervised Learning** is one of several ML learning strategies, and ML is itself one of the two major AI approaches listed above.

---

## What is Supervised Machine Learning?

Supervised learning means **training a model under guidance**, similar to how a teacher helps a student learn.

We provide the model with **examples and their target outputs** (e.g., car features and their price).  
The model then learns to recognize **patterns** and can **generalize** to make predictions on new, unseen data.

### Key Components

| Concept | Description |
|----------|--------------|
| **Rows** | Observations: the objects we want to make predictions for (e.g., individual cars) |
| **Columns** | Features: the characteristics that describe each observation |
| **X** | The **feature matrix** (2D array, i.e., an array of arrays with one row per observation) |
| **y** | The **target variable** (1D array with one value per row of X) |
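
To make the table concrete, here is a minimal sketch of how **X** and **y** might look for the car-price example. The feature names and all values are made up for illustration, and plain Python lists are used to keep the sketch dependency-free (in practice a NumPy array or a pandas DataFrame is more common).

```python
# Hypothetical car data: each row is one car, each column is one feature.
# Assumed columns: age (years), mileage (thousand km), engine power (hp).
X = [
    [3, 45.0, 150],
    [7, 120.0, 110],
    [1, 10.0, 200],
    [10, 180.0, 90],
]

# Target variable: one (made-up) price in USD per row of X.
y = [21000, 9500, 34000, 5200]

n_rows = len(X)          # number of observations
n_features = len(X[0])   # number of features per observation

# Every observation needs exactly one target value.
assert len(y) == n_rows

print(f"X has {n_rows} rows and {n_features} columns; y has {len(y)} values")
```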

**Formal definition:**  
\[
g(X) \approx y
\]  
Where:
- **X** = feature matrix  
- **y** = target variable  
- **g** = model that approximates the mapping from X → y  

> The goal of training is to learn the function **g** that best predicts **y** from **X**.  
> While predictions may not always be perfect, we aim to minimize the error as much as possible.
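
As a rough illustration of what "learning **g**" can mean, the sketch below fits a straight line to a single feature with ordinary least squares. This is only one possible choice of model, picked here because it fits in a few lines; the data and the helper names `fit_line` and `mean_squared_error` are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of g(x) = w0 + w1 * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w1 = cov_xy / var_x
    w0 = mean_y - w1 * mean_x
    return w0, w1


def mean_squared_error(ys, preds):
    """Average squared difference between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up training data: car age (years) and price (thousand USD).
ages = [1, 3, 5, 7, 10]
prices = [30.0, 24.0, 19.0, 13.0, 6.0]

# "Training" here means choosing w0 and w1 to minimize the squared error.
w0, w1 = fit_line(ages, prices)


def g(age):
    """The learned model: maps a feature value to a predicted target."""
    return w0 + w1 * age


predictions = [g(a) for a in ages]
mse = mean_squared_error(prices, predictions)

print(f"g(age) = {w0:.2f} + ({w1:.2f}) * age, training MSE = {mse:.3f}")
print(f"Predicted price for a 4-year-old car: {g(4):.1f}")
```

The predictions do not match the prices exactly, which mirrors the point above: **g** only approximates the mapping from **X** to **y**, and training tries to keep the error small.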

---

### Using the Model for Prediction 

When we apply the trained model to new (unseen) data, it produces **predicted values**. For a binary classifier such as a spam filter, the prediction is often a **probability**.  
A chosen **threshold** (e.g., 0.5) then turns that probability into a class label:
- ≥ 0.5 → Spam (1)
- < 0.5 → Not spam (0)
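
The thresholding rule above can be sketched in a few lines. The function name `to_labels` and the probabilities are hypothetical; a real model would supply the probabilities.

```python
def to_labels(probabilities, threshold=0.5):
    """Map predicted probabilities to 1 (spam) or 0 (not spam)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up model outputs for four emails.
predicted_probs = [0.92, 0.50, 0.49, 0.07]

labels = to_labels(predicted_probs)
print(labels)  # [1, 1, 0, 0]

# A stricter threshold flags fewer emails as spam.
strict_labels = to_labels(predicted_probs, threshold=0.9)
print(strict_labels)  # [1, 0, 0, 0]
```

Note that a probability exactly equal to the threshold counts as spam, matching the "≥ 0.5" rule.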

---

## Types of Supervised Machine Learning

### 1. Regression
Used for predicting **continuous values**.

**Examples:**
- Predicting the price of a car or house  
- Estimating temperature or stock prices  

**Model output:**  
A number between \(-\infty\) and \(+\infty\).

---

### 2. Classification
Used for predicting **categories or labels**.

**Examples:**
- Identifying whether an email is spam or not spam  
- Recognizing an image as a car, dog, or cat  

**Model output:**  
A category or class label.

**Subtypes:**
- **Binary Classification** – Two possible classes (e.g., spam vs. not spam)
- **Multiclass Classification** – More than two classes (e.g., cat, dog, car)
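
One common way to turn model output into a label, for both subtypes, is to pick the class with the highest score. The sketch below assumes the model already returns one score per class; the scores and the helper name `predict_class` are made up for illustration.

```python
def predict_class(scores):
    """Return the class label with the highest score."""
    return max(scores, key=scores.get)


# Binary case: two possible classes.
email_scores = {"spam": 0.8, "not spam": 0.2}
email_label = predict_class(email_scores)

# Multiclass case: more than two classes.
image_scores = {"cat": 0.15, "dog": 0.70, "car": 0.15}
image_label = predict_class(image_scores)

print(email_label)  # spam
print(image_label)  # dog
```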

---

### 3. Ranking
Used when the goal is to **rank items** by relevance or score.

**Examples:**
- Recommender systems (e.g., ranking products by purchase likelihood)
- Search engines (e.g., ranking pages by relevance)

Here, the algorithm assigns a **score** to each item and ranks them accordingly.
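
A minimal sketch of the "score, then sort" idea, assuming the scores have already been produced by some model. The product names, the scores, and the helper name `rank_items` are hypothetical.

```python
def rank_items(scores):
    """Return item names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up purchase-likelihood scores for one user.
purchase_scores = {
    "headphones": 0.31,
    "laptop": 0.78,
    "phone case": 0.55,
    "keyboard": 0.12,
}

ranking = rank_items(purchase_scores)
top_2 = ranking[:2]  # e.g., the items to show first

print(ranking)  # ['laptop', 'phone case', 'headphones', 'keyboard']
print(top_2)    # ['laptop', 'phone case']
```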

---

**References:**
1. [ML Zoomcamp — Introduction to Machine Learning (Slides)](https://www.slideshare.net/slideshow/ml-zoomcamp-13-supervised-machine-learning/250116520)  
2. [Reference Notes](https://knowmledge.com/2023/09/11/ml-zoomcamp-2023-introduction-to-machine-learning-part-3/)
