# What does it mean for machines to *learn* something?
If I download a copy of Wikipedia, has my machine learned something?

![ML Def 1](images/ml_def_1.png)

![ML Def 2](images/ml_def_2.png)

## Spam Filter
T (task) = Classifying new emails as spam or ham (not spam) <br>
P (performance measure) = Percentage of correctly classified emails (accuracy) <br>
E (experience) = Training set (emails the user has already flagged as spam, plus emails that were not flagged)
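
To make the performance measure P concrete, here is a minimal sketch of computing accuracy over a labeled set. The `accuracy` helper, the `always_ham` baseline, and the four example emails are hypothetical and invented for illustration only.

```python
def accuracy(classifier, labeled_emails):
    """Return the fraction of emails whose predicted label matches the true label."""
    correct = sum(1 for text, is_spam in labeled_emails if classifier(text) == is_spam)
    return correct / len(labeled_emails)

# Hypothetical experience E: (email text, is_spam) pairs, invented for illustration.
labeled_emails = [
    ("Win a free prize now", True),
    ("Claim your lottery million", True),
    ("Meeting moved to Monday", False),
    ("Please review the attached report", False),
]

def always_ham(email):
    """Trivial baseline classifier that never flags anything as spam."""
    return False

print(accuracy(always_ham, labeled_emails))  # 0.5 (2 of 4 correct)
```

A program "learns" in this sense if its accuracy on task T improves as it is given more experience E.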

# Traditional vs ML approach
Understanding why machine learning is used

## Spam Filter with traditional approach
1. Examine which patterns commonly appear in spam emails
2. Write a long list of if-else rules to detect those patterns
3. Test the program and repeat steps 1 and 2 until it is good enough to launch

In [1]:
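# Hand-picked substrings that often appear in spam
KEYWORDS = ["4 u", "prize", "credit", "lottery", "million", "dollar"]
# Minimum number of distinct keywords needed to flag an email as spam
THRESHOLD = 4

def spam_filter(email):
    """Return True if the email contains at least THRESHOLD keywords."""
    count = 0
    for keyword in KEYWORDS:
        # Case-sensitive substring match, so "dollar" also matches "dollars"
        if keyword in email:
            count += 1
    return count >= THRESHOLD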

In [2]:
spam_filter("Free lottery 4 u to win prize worth 10 million dollars")

True

## Spam Filter with ML approach
1. Collect a large number of emails that are already labeled as spam or ham
2. Automatically learn which patterns are unusually frequent in spam compared to ham
3. Use the learned patterns to classify new emails
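
The three steps above can be sketched with a tiny word-count classifier. This is only an illustration under stated assumptions: it uses a naive Bayes style score with add-one smoothing, which is one of many possible learning methods, and the training emails, `tokenize`, `train`, and `classify` are hypothetical names and data invented here.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def train(labeled_emails):
    """Step 2: count how often each word occurs in spam and in ham."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}  # number of emails seen per class
    for text, is_spam in labeled_emails:
        counts[is_spam].update(tokenize(text))
        totals[is_spam] += 1
    return counts, totals

def classify(email, model):
    """Step 3: pick the class with the higher log-probability score."""
    counts, totals = model
    vocab_size = len(set(counts[True]) | set(counts[False]))
    scores = {}
    for label in (True, False):
        n_words = sum(counts[label].values())
        score = math.log(totals[label] / sum(totals.values()))  # class prior
        for word in tokenize(email):
            # Add-one smoothing so unseen words do not zero out the score
            score += math.log((counts[label][word] + 1) / (n_words + vocab_size))
        scores[label] = score
    return scores[True] > scores[False]

# Step 1: hypothetical labeled emails (True = spam), invented for illustration.
training_set = [
    ("win a free prize now", True),
    ("free lottery win 10M USD", True),
    ("claim your free prize for u", True),
    ("meeting moved to monday", False),
    ("please review the attached report", False),
    ("lunch on monday with the team", False),
]
model = train(training_set)
print(classify("Free lottery for u to win prize worth 10M USD", model))
```

No keyword list or threshold is written by hand here. Retraining on newly flagged emails updates the word counts, which is how such a filter could follow changes in spam wording.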

## Why use ML?
- The traditional approach is difficult to maintain: it requires a lot of **hand-tuning** and a **long list of rules**
- What if spammers figure out which patterns get flagged and start avoiding them? An ML system can automatically detect the new patterns and **adapt**
- Complex problems like speech recognition cannot practically be solved **at scale** with the traditional approach
- ML systems can be inspected to **get insights** into complex problems
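
As a rough illustration of the last point, a learned model can be inspected to see which words it associates with spam. The sketch below is an assumption-laden toy: `spam_indicators`, its scoring ratio, and the example emails are hypothetical and not taken from any library.

```python
from collections import Counter

def spam_indicators(labeled_emails, top_n=2):
    """Rank words by how much more often they appear in spam than in ham."""
    spam_docs, ham_docs = Counter(), Counter()
    for text, is_spam in labeled_emails:
        # Count each word once per email (document frequency)
        words = set(text.lower().split())
        (spam_docs if is_spam else ham_docs).update(words)
    # Smoothed ratio of spam emails to ham emails containing the word
    scores = {w: (spam_docs[w] + 1) / (ham_docs[w] + 1) for w in spam_docs}
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

# Hypothetical labeled emails (True = spam), invented for illustration.
emails = [
    ("win a free prize now", True),
    ("free lottery win 10M USD", True),
    ("claim your free prize for u", True),
    ("the meeting room is free on monday", False),
    ("please review the attached report", False),
]
print(spam_indicators(emails))  # ['prize', 'win']
```

In this toy data "free" appears in every spam email but also in one ham email, so it ranks below "prize" and "win". That kind of observation is hard to get from a hand-written rule list.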

In [4]:
spam_filter("Free lottery for u to win prize worth 10M USD")

False