# What is Machine Learning?
---
- Machine Learning (ML) is the science of getting computers to learn and act from data without being explicitly programmed. 
- Instead of writing a set of rules, you provide data to an algorithm, which then builds its own logic.

---
*A subfield of computer science that gives computers the ability to learn (patterns in data) without being explicitly programmed.*
- Machines learn patterns from the data and apply them to future inputs.


> A computer program is said to learn from experience `E` with respect to some class of tasks `T` and performance measure `P`, if its performance at tasks in `T`, as measured by `P`, improves with experience `E`.

*By Tom Mitchell (1997)*
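
To make `E`, `T`, and `P` concrete, here is a minimal sketch in plain Python. The task, the data, and the helper names (`learn_threshold`, `accuracy`) are all hypothetical and chosen only for illustration: the task `T` is deciding whether a number is at or above a hidden boundary of 50, the experience `E` is a growing list of labeled examples, and the performance `P` is accuracy on a fixed test set.

```python
# Task T: decide whether a number is at or above a hidden boundary (50).
# Experience E: labeled examples (value, label), revealed a few at a time.
# Performance P: accuracy on a fixed test set.

def learn_threshold(examples):
    """Place the threshold midway between the largest 0 and the smallest 1."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, test_set):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum((x >= threshold) == bool(label) for x, label in test_set)
    return correct / len(test_set)


# Hypothetical data, invented for this illustration.
experience = [(10, 0), (60, 1), (30, 0), (80, 1), (48, 0), (52, 1)]
test_set = [(x, int(x >= 50)) for x in range(100)]

scores = []
for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    scores.append(accuracy(threshold, test_set))
    print(f"E = {n} examples -> threshold {threshold}, P = {scores[-1]:.2f}")
```

With this particular data, accuracy rises from 0.85 to 0.95 to 1.00 as more examples are seen, which is what "learning" means in Mitchell's sense.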

![image.png](attachment:image.png)

# Motivations of Machine Learning
---
## Why Machine Learning?

- **Automation of tasks**  
  Machines can learn from data and perform tasks without being explicitly programmed.

- **Handling complexity**  
  Some problems are too complex for humans to code rules manually (e.g., face recognition, language translation).

- **Adaptability**  
  ML systems can be retrained as new data arrives → performance often gets better over time.

- **Pattern discovery**  
  ML uncovers hidden patterns and insights in large datasets that humans might miss.

- **Scalability**  
  ML can process huge volumes of data faster than humans.

---

## Real-world Motivations
- **Recommendation Systems** (Netflix, Amazon, YouTube)  
- **Healthcare Predictions** (disease diagnosis, drug discovery)  
- **Finance** (fraud detection, stock market trends)  
- **Self-driving Cars** (autonomous decision making)  
- **Natural Language Processing** (chatbots, translations)



# AI vs ML vs DL vs DS

---

## 1. Artificial Intelligence (AI)
- **Broadest field**
- Goal: Make machines act smart and mimic human intelligence
- Examples: Chatbots, Self-driving cars, Virtual assistants

---

## 2. Machine Learning (ML)
- **Subset of AI**
- Machines learn from data (patterns, predictions) without being explicitly programmed
- Examples: Recommendation systems, Spam filters, Fraud detection

---

## 3. Deep Learning (DL)
- **Subset of ML**
- Uses Artificial Neural Networks (ANNs) to learn complex patterns
- Great for large-scale data (images, videos, text, speech)
- Examples: Face recognition, Voice assistants, Autonomous driving

---

## 4. Data Science (DS)
- **Interdisciplinary field**
- Combines statistics, data analysis, ML, visualization → extract insights & build solutions
- Examples: Business analytics, Predictive modeling, Decision-making support

---

## Comparison Table

| Field                   | Scope                         | Key Techniques                 | Examples                          |
|--------------------------|-------------------------------|--------------------------------|-----------------------------------|
| **AI**                  | Broad – mimic human behavior | Rule-based + ML + DL           | Chatbots, Robotics                |
| **ML**                  | Subset of AI – learn from data | Regression, Classification     | Spam filters, Recommendations     |
| **DL**                  | Subset of ML – neural nets    | CNN, RNN, Transformers         | Face ID, Speech recognition       |
| **DS**                  | Interdisciplinary – overlaps with AI/ML/DL | Stats, ML, Visualization       | Predictive models, Analytics      |

---


![image.png](attachment:image.png)

# Styles of Machine Learning

Machine Learning is broadly divided into three main styles (sometimes four):

---

## 1. Supervised Learning
- **Definition**: Learn from labeled data (input → output is known).
- **Goal**: Predict output for new/unseen input.
- **Examples**: 
  - Predicting house prices
  - Spam email detection
- **Techniques**: Regression, Classification

---

## 2. Unsupervised Learning
- **Definition**: Learn from unlabeled data (only input, no output).
- **Goal**: Discover hidden patterns, groupings, or structures.
- **Examples**: 
  - Customer segmentation
  - Market basket analysis
- **Techniques**: Clustering, Dimensionality Reduction

---

## 3. Reinforcement Learning
- **Definition**: An agent learns by interacting with an environment and receiving rewards/penalties.
- **Goal**: Learn a policy that maximizes long-term reward.
- **Examples**: 
  - Self-driving cars
  - Game playing (Chess, Go, Atari)
- **Techniques**: Q-Learning, Deep RL

---

## 4. Semi-Supervised Learning
- **Definition**: Mix of labeled and unlabeled data.
- **Goal**: Leverage small labeled + large unlabeled data.
- **Examples**: 
  - Medical imaging (few labeled scans, many unlabeled)

---

## Comparison Table

| Style                  | Data Used        | Goal                           | Examples                          |
|-------------------------|-----------------|--------------------------------|-----------------------------------|
| **Supervised**          | Labeled         | Predict outputs for new data    | Price prediction, Spam detection  |
| **Unsupervised**        | Unlabeled       | Discover hidden structure       | Customer segmentation, Clustering |
| **Reinforcement**       | Interaction     | Maximize reward via actions     | Robotics, Self-driving cars       |
| **Semi-Supervised**     | Few labeled + many unlabeled | Improve learning with less data | Medical imaging, Web content      |

---



![image.png](attachment:image.png)

# Supervised Machine Learning

## Definition
- A type of ML where the model learns from **labeled data** (input + correct output given).
- Goal: **Predict output** for new, unseen inputs.

---

## How it Works
1. Provide dataset with **features (X)** and **labels (y)**.
2. Model learns mapping **X → y** during training.
3. Test the model on new data to predict outputs.
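
The three steps above can be sketched with a tiny 1-nearest-neighbor classifier written in plain Python. This is only one possible model, and the class name `NearestNeighborClassifier`, the feature values, and the labels `"A"`/`"B"` are hypothetical. The point is the `fit` then `predict` workflow on held-out data.

```python
import math


class NearestNeighborClassifier:
    """Predicts the label of the closest training point (1-nearest neighbor)."""

    def fit(self, X, y):
        # "Training" here simply memorizes the labeled examples.
        self.X_train = list(X)
        self.y_train = list(y)
        return self

    def predict(self, X):
        predictions = []
        for point in X:
            distances = [math.dist(point, seen) for seen in self.X_train]
            nearest = distances.index(min(distances))
            predictions.append(self.y_train[nearest])
        return predictions


# Step 1: features (X) and labels (y). Hypothetical values.
X_train = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
y_train = ["A", "A", "B", "B"]

# Step 2: learn the mapping X -> y.
model = NearestNeighborClassifier().fit(X_train, y_train)

# Step 3: test on data the model has not seen.
X_test = [[0.9, 1.1], [4.1, 3.9]]
y_test = ["A", "B"]
y_pred = model.predict(X_test)
test_accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(y_pred, test_accuracy)
```

In practice a library such as scikit-learn would usually replace the hand-written class, but the fit/predict/evaluate shape stays the same.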



![image.png](attachment:image.png)

---

# Types
## Regression
- **Definition**: Predicts a **continuous value** (numbers).
- **Goal**: Estimate "how much" or "how many".
- **Examples**:
  - Predicting house prices
  - Forecasting temperature
  - Predicting stock prices
---

### 1. Regression (House Price Prediction)

| House Size (sq.ft) | No. of Bedrooms | Price (₹ in Lakhs) |
|---------------------|-----------------|--------------------|
| 1000               | 2               | 40                 |
| 1500               | 3               | 55                 |
| 2000               | 3               | 70                 |
| 2500               | 4               | 90                 |

**Goal:** Learn the relation → predict price of a new house.
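
As a rough sketch of what "learning the relation" can look like, the snippet below fits a straight line to the table by ordinary least squares. It is a simplification: it uses house size only and ignores the bedroom count, and the helper name `fit_line` and the 1800 sq.ft query are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data from the table above: size in sq.ft, price in lakhs.
sizes = [1000, 1500, 2000, 2500]
prices = [40, 55, 70, 90]

slope, intercept = fit_line(sizes, prices)

# Predict the price of a new (hypothetical) 1800 sq.ft house.
predicted_price = intercept + slope * 1800
print(f"price = {intercept:.1f} + {slope:.3f} * size")
print(f"Predicted price for 1800 sq.ft: {predicted_price:.1f} lakhs")
```

On this data the fitted line is roughly `price = 6.0 + 0.033 * size`, so an 1800 sq.ft house comes out at about 65.4 lakhs.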

---
## Classification
- **Definition**: Predicts a **categorical label** (class).
- **Goal**: Decide "which category".
- **Examples**:
  - Spam vs Not Spam (Email filter)
  - Disease vs No Disease (Medical diagnosis)
  - Cat vs Dog (Image classification)
---

### 2. Classification (Email Spam Detection)

| Email Text Snippet                 | Label     |
|------------------------------------|-----------|
| "Win a free iPhone now!!!"         | Spam      |
| "Meeting scheduled at 3 PM"        | Not Spam  |
| "Congratulations, you won lottery" | Spam      |
| "Project deadline tomorrow"        | Not Spam  |

**Goal:** Classify new emails as *Spam* or *Not Spam*.
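
One way to sketch this is a small Naive Bayes text classifier with add-one (Laplace) smoothing, trained on the four emails in the table. Naive Bayes is a swapped-in choice for illustration, not something the notes prescribe. The function names and the two test sentences are hypothetical, and four emails are far too few for a real filter.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(emails):
    """Count words per label and emails per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in emails:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_emails)
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Training data from the table above.
emails = [
    ("Win a free iPhone now!!!", "Spam"),
    ("Meeting scheduled at 3 PM", "Not Spam"),
    ("Congratulations, you won lottery", "Spam"),
    ("Project deadline tomorrow", "Not Spam"),
]
model = train(emails)

# Hypothetical new emails.
print(classify("Win a free lottery ticket", model))
print(classify("Project meeting tomorrow", model))
```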

---

## Comparison Table

| Feature           | Regression                        | Classification                  |
|-------------------|-----------------------------------|---------------------------------|
| **Output Type**   | Continuous numeric value          | Discrete categories (labels)    |
| **Questions**     | "How much?" / "How many?"         | "Which class?" / "Yes or No?"   |
| **Examples**      | House price, Temperature forecast | Spam detection, Disease check   |
| **Algorithms**    | Linear Regression, SVR            | Logistic Regression, SVM, Trees |

---


![image.png](attachment:image.png)

# Unsupervised Machine Learning

## Definition
- A type of ML where the model learns from **unlabeled data** (only input, no output).
- Goal: **Find hidden patterns, groupings, or structure** in the data.

---

## Advantages
- Works with **unlabeled datasets** (cheap, widely available).
- Useful for **exploratory data analysis**.
- Reveals hidden patterns not obvious to humans.

## Limitations
- Harder to evaluate (no labels = no direct accuracy).
- May produce clusters/patterns that are not meaningful.

---

## Comparison with Supervised ML

| Feature              | Supervised ML             | Unsupervised ML                 |
|-----------------------|--------------------------|---------------------------------|
| Data Type            | Labeled                  | Unlabeled                       |
| Goal                 | Predict output           | Find hidden patterns            |
| Output Example       | House price, Spam / Not Spam | Customer groups, Topic clusters |
| Algorithms Examples  | Linear Regression, SVM   | K-Means, PCA, Hierarchical Clustering |

---



![image.png](attachment:image.png)


# Key Techniques
1. **Clustering** → Group similar data points  
   - Example: Customer segmentation  
   - Algorithms: K-Means, Hierarchical Clustering, DBSCAN  

2. **Dimensionality Reduction** → Simplify data by reducing features  
   - Example: Compressing image data, Visualization of high-dimensional data  
   - Algorithms: PCA (Principal Component Analysis), t-SNE  

---

## Mini Example Datasets

### 1. Clustering (Customer Segmentation)

| Customer ID | Age | Annual Income (₹) | Spending Score |
|-------------|-----|--------------------|----------------|
| 1           | 22  | 30,000             | 80             |
| 2           | 25  | 35,000             | 75             |
| 3           | 45  | 70,000             | 20             |
| 4           | 40  | 65,000             | 25             |
| 5           | 23  | 32,000             | 78             |

**Goal:** Group customers into clusters (e.g., High spenders, Low spenders).
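
A minimal K-Means sketch on this table is shown below. Several choices are assumptions made for the example: only income and spending score are used, income is expressed in thousands so both features are on a similar scale, `k = 2`, and the starting centroids are simply customers 1 and 3 rather than a random initialization.

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so we have converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# (annual income in thousands of rupees, spending score) for customers 1-5.
customers = [[30, 80], [35, 75], [70, 20], [65, 25], [32, 78]]

labels, centroids = k_means(customers, [customers[0], customers[2]])
print(labels)
print(centroids)
```

With these starting points, customers 1, 2, and 5 (lower income, high spending score) end up in one cluster and customers 3 and 4 in the other.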

---

### 2. Dimensionality Reduction (Image Compression)

| Pixel 1 | Pixel 2 | Pixel 3 | Pixel 4 | ... |
|---------|---------|---------|---------|-----|
| 120     | 135     | 140     | 150     | ... |
| 122     | 138     | 142     | 149     | ... |

**Goal:** Reduce high-dimensional pixel data → fewer features → faster processing.
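
The sketch below shows the idea behind PCA using only the four pixel columns visible in the table (the elided columns are left out). It finds the top principal component by power iteration on the covariance matrix and projects each row onto it, turning 4 features into 1. The function name and the iteration count are assumptions, and real code would normally rely on a numerical library instead.

```python
import math


def top_principal_component(rows, iterations=100):
    """Return (feature means, unit vector of the first principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    w = [1.0] * d
    for _ in range(iterations):
        w_next = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w_next))
        if norm == 0:
            break  # Degenerate case: no variance along the current vector.
        w = [x / norm for x in w_next]
    return means, w


# The four visible pixel columns from the table above.
pixels = [
    [120, 135, 140, 150],
    [122, 138, 142, 149],
]

means, component = top_principal_component(pixels)

# Project each 4-feature row onto the component: 4 numbers become 1.
projections = [
    sum((value - mean) * weight for value, mean, weight in zip(row, means, component))
    for row in pixels
]
print(component)
print(projections)
```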

---


![image.png](attachment:image.png)

### Labeled vs Unlabeled Data

---

1. Labeled Data
- **Definition**: Data where each input has a corresponding **known output (label)**.  
- **Used In**: Supervised Learning & Semi-Supervised Learning.  
- **Purpose**: Helps the model learn the exact mapping from input → output.

- Mini Example (Email Spam Detection)

| Email Text Snippet                 | Label     |
|------------------------------------|-----------|
| "Win a free iPhone now!!!"         | Spam      |
| "Meeting scheduled at 3 PM"        | Not Spam  |
| "Project deadline tomorrow"        | Not Spam  |

**Goal:** Train the model to classify new emails correctly.

---

2. Unlabeled Data
- **Definition**: Data with **no output or label**. Only input is available.  
- **Used In**: Unsupervised Learning & Semi-Supervised Learning.  
- **Purpose**: Model finds patterns, clusters, or structures from raw data.

- Mini Example (Customer Data)

| Customer ID | Age | Annual Income (₹) | Spending Score |
|-------------|-----|------------------|----------------|
| 1           | 22  | 30,000           | 80             |
| 2           | 25  | 35,000           | 75             |
| 3           | 45  | 70,000           | 20             |
| 4           | 40  | 65,000           | 25             |

**Goal:** Group customers into clusters (high spenders, low spenders).

---

- Key Differences

| Feature           | Labeled Data                     | Unlabeled Data                  |
|-------------------|---------------------------------|---------------------------------|
| Output            | Known (labels provided)          | Unknown (no labels)            |
| ML Type           | Supervised, Semi-Supervised      | Unsupervised, Semi-Supervised  |
| Purpose           | Predict outputs                  | Find patterns or clusters      |
| Example           | Email spam detection, House price prediction | Customer segmentation, Topic modeling |

---


![image.png](attachment:image.png)

---

## 1. Semi-Supervised Learning

### Definition
- A mix of **labeled and unlabeled data**.  
- Goal: **Leverage a small labeled dataset + large unlabeled dataset** to improve learning accuracy.  

### Why Use It?
- Labeling data is expensive or time-consuming.
- Most real-world datasets are mostly unlabeled.

### Examples
- Medical imaging: few labeled scans, many unlabeled
- Web content classification
- Speech recognition with limited transcribed audio

### Mini Dataset Example

| Patient ID | Scan Features      | Disease Label |
|------------|------------------|---------------|
| 1          | [0.2, 0.5, 0.8]  | Positive      |
| 2          | [0.1, 0.4, 0.7]  | ?             |
| 3          | [0.3, 0.6, 0.9]  | Positive      |
| 4          | [0.2, 0.3, 0.5]  | ?             |

**Goal:** Use labeled + unlabeled data → predict disease for unlabeled scans.
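
One simple semi-supervised strategy is self-training (pseudo-labeling): label the unlabeled point closest to an already-labeled one, add it to the labeled set, and repeat. The sketch below applies that idea with a nearest-neighbor rule. The table only contains `Positive` labels, so patients 5 and 6 are hypothetical rows added purely so that a second class exists. The function name `self_train` is also an assumption.

```python
import math


def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the unlabeled point nearest to a labeled one."""
    labeled = list(labeled)        # [(features, label), ...]
    remaining = dict(unlabeled)    # {patient_id: features}
    pseudo_labels = {}
    while remaining:
        # Find the closest (unlabeled, labeled) pair.
        _, patient_id, label = min(
            (math.dist(features, known), patient_id, known_label)
            for patient_id, features in remaining.items()
            for known, known_label in labeled
        )
        pseudo_labels[patient_id] = label
        # The newly labeled point can now help label the others.
        labeled.append((remaining.pop(patient_id), label))
    return pseudo_labels


labeled = [
    ([0.2, 0.5, 0.8], "Positive"),   # patient 1 (from the table)
    ([0.3, 0.6, 0.9], "Positive"),   # patient 3 (from the table)
    ([0.9, 0.1, 0.1], "Negative"),   # patient 5 (hypothetical)
]
unlabeled = {
    2: [0.1, 0.4, 0.7],              # from the table
    4: [0.2, 0.3, 0.5],              # from the table
    6: [0.8, 0.2, 0.2],              # hypothetical
}

pseudo_labels = self_train(labeled, unlabeled)
print(pseudo_labels)
```

Under these assumptions, patients 2 and 4 are pseudo-labeled `Positive` and the hypothetical patient 6 is labeled `Negative`. Real self-training usually adds a confidence threshold so that uncertain points are not labeled too early.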




![image.png](attachment:image.png)

---

## 2. Reinforcement Learning (RL)

### Definition
- An **agent learns by interacting with an environment**.
- Goal: **Take actions to maximize long-term reward**.
- Based on trial & error, not labeled data.

### Components
1. **Agent** → Learner/decision maker  
2. **Environment** → Where the agent operates  
3. **State** → Situation the agent currently observes  
4. **Action** → Choices the agent can make  
5. **Reward** → Feedback signal (positive/negative)  
6. **Policy** → Strategy to decide actions  

### Examples
- Self-driving cars  
- Game playing (Chess, Go, Atari)  
- Robotics tasks  

### Mini Example

| State           | Action           | Reward |
|-----------------|-----------------|--------|
| At traffic light| Stop             | +10    |
| At traffic light| Go               | -10    |
| Empty road      | Accelerate       | +5     |
| Empty road      | Brake            | -2     |

**Goal:** Learn policy → maximize total reward over time.
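
The table can be turned into a very small value-learning sketch. It is deliberately simplified: each row is treated as a one-step episode with no next state, so the update is `Q ← Q + α (r − Q)`, which is the Q-learning update with the future-reward term dropped. The learning rate, the number of sweeps, and trying every action in turn (instead of exploring randomly) are assumptions for the example.

```python
# Rewards from the table above, keyed by (state, action).
rewards = {
    ("At traffic light", "Stop"): 10,
    ("At traffic light", "Go"): -10,
    ("Empty road", "Accelerate"): 5,
    ("Empty road", "Brake"): -2,
}

alpha = 0.5    # learning rate (assumed value)
sweeps = 20    # how many times each action is tried (assumed value)

# Start with no knowledge: every action value is 0.
q_table = {pair: 0.0 for pair in rewards}

for _ in range(sweeps):
    for (state, action), reward in rewards.items():
        # Nudge the estimate toward the reward that was just observed.
        q_table[(state, action)] += alpha * (reward - q_table[(state, action)])

# Greedy policy: in each state, pick the action with the highest value.
policy = {}
for (state, action), value in q_table.items():
    if state not in policy or value > q_table[(state, policy[state])]:
        policy[state] = action

print(policy)
# → {'At traffic light': 'Stop', 'Empty road': 'Accelerate'}
```

A full RL agent would also track how actions change the state and would weigh future rewards with a discount factor. This sketch only shows how reward feedback shapes the policy.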

---


![image.png](attachment:image.png)