

# Rule-Based Systems vs Machine Learning

## Rule-Based Systems

In a **rule-based system**, you manually define rules to distinguish between *ham* (normal emails) and *spam*.  

At first, a small set of rules works fine.  
As new types of spam appear, however, you have to keep **adjusting the rule set**, which creates an endless loop of maintenance.  

Over time, the system becomes:
- Harder to manage   
- Less flexible  
- Prone to breaking when new patterns emerge  

This endless reconfiguration is often called the **“hamster wheel” problem** — you’re always updating rules but never fully catching up.
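
As a rough illustration of why this happens, here is a minimal sketch of a hand-written filter. The function name `is_spam_by_rules`, the dictionary layout of an email, and the three rules are all assumptions made up for this example; they are not taken from a real filter.

```python
def is_spam_by_rules(email: dict) -> bool:
    """Return True if any hand-written rule flags the email (hypothetical rules)."""
    sender = email["sender"].lower()
    body = email["body"].lower()

    # Rule 1: a specific sender address that was seen sending spam.
    if sender == "promotions@online.com":
        return True
    # Rule 2: an entire domain that was seen sending spam.
    if sender.endswith("@test.com"):
        return True
    # Rule 3: a suspicious keyword in the body.
    if "deposit" in body:
        return True
    return False


known = {"sender": "promotions@online.com", "title": "Sale", "body": "Big discounts"}
# A new spam variant: different sender, keyword spelled with a zero.
novel = {"sender": "offers@new-domain.example", "title": "Sale", "body": "Send a dep0sit today"}

print(is_spam_by_rules(known))  # True
print(is_spam_by_rules(novel))  # False
```

The second email slips through because no rule anticipates it, so someone has to write a fourth rule by hand, and then a fifth for the next variant.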

---

## Machine Learning Approach

The second way to implement a spam filter is to use **Machine Learning (ML)** instead of hand-coded rules.  

Here’s the process:

1. **Collect the data**  
2. **Define and calculate (extract) features**  
3. **Train the model**  
4. **Apply the model to new data**

---

### 1. Collect the Data

Data can be collected automatically whenever users click the **“SPAM”** or **“Not SPAM”** button in their email system.  
Each click produces one labeled example (spam = 1, not spam = 0).
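
One way to picture this step is a small in-memory log of button clicks. The `LabeledEmail` class and the `record_click` helper below are hypothetical names for this sketch; a real mail system would store these records in a database rather than a list.

```python
from dataclasses import dataclass


@dataclass
class LabeledEmail:
    sender: str
    title: str
    body: str
    label: int  # 1 = spam, 0 = not spam


def record_click(dataset: list, sender: str, title: str, body: str, marked_spam: bool) -> LabeledEmail:
    """Turn one button click into a labeled example and store it (hypothetical helper)."""
    example = LabeledEmail(sender, title, body, 1 if marked_spam else 0)
    dataset.append(example)
    return example


dataset = []
# The user pressed "SPAM" on the first email and "Not SPAM" on the second.
record_click(dataset, "promotions@online.com", "You won!", "Send a deposit to claim", True)
record_click(dataset, "alice@example.org", "Lunch", "Are you free at noon?", False)

labels = [example.label for example in dataset]
print(labels)  # [1, 0]
```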

---

### 2. Define & Extract Features

To create features, start from the same ideas you would use for rules in the rule-based system — but now turn them into measurable characteristics.

**Example features:**

| Feature Description | Type | Possible Values |
|----------------------|------|----------------|
| Length of title > 10? | Binary | 1 / 0 |
| Length of body > 10? | Binary | 1 / 0 |
| Sender = “promotions@online.com”? | Binary | 1 / 0 |
| Sender = “hpYOSKmL@test.com”? | Binary | 1 / 0 |
| Sender domain = “test.com”? | Binary | 1 / 0 |
| Description contains “deposit”? | Binary | 1 / 0 |

Each email can then be represented as a binary vector with one entry per feature, in the same order as the table. For example:  
`[1, 1, 0, 0, 1, 1]`

Each email also has a **label/target**:
- `1` → spam  
- `0` → not spam  
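
The table above can be sketched as a small extraction function. Several details here are assumptions, since the notes do not specify them: lengths are counted in characters, "description" is treated as the email body, and the `extract_features` name and the dictionary layout of an email are hypothetical.

```python
def extract_features(email: dict) -> list[int]:
    """Map one email to the six binary features from the table, in table order."""
    sender = email["sender"]
    # Everything after the last "@" is taken as the sender's domain.
    domain = sender.rsplit("@", 1)[-1].lower()
    return [
        int(len(email["title"]) > 10),            # length of title > 10 (characters, assumed)
        int(len(email["body"]) > 10),             # length of body > 10 (characters, assumed)
        int(sender == "promotions@online.com"),   # exact sender match
        int(sender == "hpYOSKmL@test.com"),       # exact sender match
        int(domain == "test.com"),                # sender domain match
        int("deposit" in email["body"].lower()),  # keyword in body (assumed to be the "description")
    ]


email = {
    "sender": "info@test.com",
    "title": "Tax review needed",
    "body": "Please send a deposit of $50 today.",
}
features = extract_features(email)
print(features)  # [1, 1, 0, 0, 1, 1]
```

Every email, whatever its length, ends up as a vector of the same size, which is what allows a model to be trained on the whole collection.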

---

### 3. Training the Model

The feature vectors and their labels are used to **train** (or *fit*) the model.  

During training, the model finds relationships between features and labels — similar to solving a complex system of equations with many parameters.  

- Each feature is assigned a **weight**, showing how much it contributes to predicting spam.  
- The goal is to **minimize the classification error** between predicted and actual labels.  
- The result is a **trained model** containing the optimized weights that best separate spam from non-spam.
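
To make the idea of learned weights concrete, here is a minimal sketch that fits one weight per feature with plain gradient descent. It assumes logistic regression as the model, which these notes do not name, and the six training rows are invented toy data, not real emails. The `train` function and its `lr` and `epochs` parameters are likewise hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=3000):
    """Fit weights and a bias by batch gradient descent on the logistic loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))
            error = p - yi  # positive when the model leans too far toward spam
            for j, x in enumerate(xi):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Invented toy data: one row per email, columns follow the feature table.
X = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train(X, y)
predictions = [
    1 if sigmoid(bias + sum(w * x for w, x in zip(weights, xi))) >= 0.5 else 0
    for xi in X
]
print(predictions == y)  # True
```

In practice a library implementation would normally be used instead of a hand-written loop, and the model would be evaluated on emails it has not seen during training.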

---

### 4. Applying the Model

When applying the model to **unknown data (new emails)**:
- The model outputs a **probability** — e.g., `0.85` → 85% chance of being spam.  
- A **threshold** (commonly `0.5`) is used to make a final decision:
  - If `P(spam) ≥ 0.5` → classify as **spam**
  - If `P(spam) < 0.5` → classify as **not spam**

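This process replaces rigid rules with **learned patterns**. When new kinds of spam appear, the model can be retrained on freshly labeled emails instead of having its rules rewritten by hand, which makes the system more adaptive and easier to maintain.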
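
The prediction step can be sketched as follows, again assuming a logistic-regression style model. The weights and bias are made-up numbers standing in for a trained model, and `predict_proba` and `classify` are hypothetical helper names.

```python
import math


def predict_proba(features, weights, bias) -> float:
    """Turn a weighted sum of features into a probability of spam."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5) -> int:
    """Return 1 (spam) if P(spam) >= threshold, else 0 (not spam)."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Made-up weights for illustration only; a real model would learn these.
weights = [0.1, 0.2, 2.0, 2.5, 1.5, 1.0]
bias = -2.0

new_email = [1, 1, 0, 0, 1, 1]
p = predict_proba(new_email, weights, bias)
print(round(p, 2))                                         # 0.69
print(classify(new_email, weights, bias))                  # 1 (spam)
print(classify(new_email, weights, bias, threshold=0.7))   # 0 (not spam)
```

The last line shows that the threshold is a design choice: raising it makes the filter more cautious about marking emails as spam.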

---

## Summary

| Approach | Description |
|------|--------------|
| **Rule-Based System** | Manually create and adjust rules. Becomes complex and hard to maintain. |
| **Machine Learning** | Collect data, extract features, train model, and make predictions using learned patterns. |

---

**References**  
- [Slides](https://www.slideshare.net/AlexeyGrigorev/ml-zoomcamp-12-ml-vs-rulebased-systems)
- [Reference Notes](https://knowmledge.com/2023/09/10/ml-zoomcamp-2023-introduction-to-machine-learning-part-2/)

---
