# Naïve Bayes
Probabilistic classifiers can give you a prediction and a probability - something you can't get from nearest neighbour classifiers.

## Lazy vs Eager Learners
Lazy learners store the training set and have to go through it every time they classify something. Eager learners build a model from the training set up front and use that model to classify new data, which tends to be faster.
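The difference can be sketched with two toy classifiers on one-dimensional data. This is only an illustration under assumptions: the class names, the nearest-centroid model used for the eager learner, and the sample points are all hypothetical and not from these notes.

```python
class LazyNearestNeighbour:
    """Lazy: fit() only stores the data; all the work happens in predict()."""

    def fit(self, points, labels):
        self.points, self.labels = list(points), list(labels)
        return self

    def predict(self, x):
        # Scan every stored training point on every single query.
        nearest = min(range(len(self.points)), key=lambda i: abs(self.points[i] - x))
        return self.labels[nearest]


class EagerCentroid:
    """Eager: fit() builds a small model (one mean per class) up front."""

    def fit(self, points, labels):
        sums, counts = {}, {}
        for point, label in zip(points, labels):
            sums[label] = sums.get(label, 0.0) + point
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {label: sums[label] / counts[label] for label in sums}
        return self

    def predict(self, x):
        # Only the per-class means are consulted, not the training set.
        return min(self.centroids, key=lambda label: abs(self.centroids[label] - x))


# Hypothetical training data
points = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = ["small", "small", "small", "large", "large", "large"]

lazy = LazyNearestNeighbour().fit(points, labels)
eager = EagerCentroid().fit(points, labels)
print(lazy.predict(2.5), eager.predict(2.5))
```

The lazy learner's prediction cost grows with the size of the training set, while the eager learner's cost depends only on the number of classes.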

## Probability
Prior probability of a hypothesis: P(h). For a coin: P(heads) = 0.5. Rolling 1 with a die: P(1) = 1/6. Probability person is woman if even number of men and women: P(Female) = 0.5.

If you research the university a person attends (say Stanford) and find that it is 86% female, you may revise the probability that a random person there is female. You can denote this as P(h | D), the probability of hypothesis h given data D: P(Female | attends Stanford) = 0.86. The general formula for conditional probability is:

$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$

Terms
- P(h) is the prior probability of hypothesis h, before we see any evidence
- P(h|D) is the posterior probability of h after we observe data D. Also called conditional probability.

## Laptop and Phone Probabilities

| Name  |  Laptop |  Phone |
|:-:|:-:|:-:|
| Kate  | PC  | Android  |
| Tom  | PC |  Android |
| Harry  | PC |  Android |
| Annika  | Mac  | iPhone  |
| Naomi  | Mac  | Android  |
| Joe  | Mac  | iPhone  |
| Chakotay  | Mac  | iPhone  |
| Neelix  | Mac  | Android  |
| Kes  | PC | iPhone  |
| B’Elanna  | Mac  | iPhone  |

### Probability randomly selected person has an iPhone?
P(iPhone) = 5/10 = 0.5

### Probability randomly selected person has an iPhone if they already have a Mac?
$$
P(iPhone|mac) = \frac{P(mac \cap iPhone)}
{P(mac)}
$$

Four people have both a Mac and an iPhone: $P(mac \cap iPhone) = 4/10 = 0.4$  
Probability a random person has a Mac: $P(mac) = 6/10 = 0.6$

Probability that someone uses an iPhone if that person has a Mac:
$$
P(iPhone|mac) = \frac{0.4}{0.6} = 0.667
$$

### Probability someone owns a Mac if they also own an iPhone?
$$
P(mac|iPhone) = \frac{P(iPhone \cap mac)}{P(iPhone)} = \frac{4/10}{5/10} = \frac{0.4}{0.5} = 0.8
$$
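The three probabilities in this section can be checked by counting rows. The sketch below is one way to do it in plain Python; the helper names `probability` and `conditional` are made up for illustration, and the rows are copied from the table above.

```python
# (name, laptop, phone) rows copied from the table above
people = [
    ("Kate", "PC", "Android"),
    ("Tom", "PC", "Android"),
    ("Harry", "PC", "Android"),
    ("Annika", "Mac", "iPhone"),
    ("Naomi", "Mac", "Android"),
    ("Joe", "Mac", "iPhone"),
    ("Chakotay", "Mac", "iPhone"),
    ("Neelix", "Mac", "Android"),
    ("Kes", "PC", "iPhone"),
    ("B'Elanna", "Mac", "iPhone"),
]


def has_mac(row):
    return row[1] == "Mac"


def has_iphone(row):
    return row[2] == "iPhone"


def probability(rows, event):
    """Fraction of rows for which event(row) is true."""
    return sum(1 for row in rows if event(row)) / len(rows)


def conditional(rows, event, given):
    """P(event | given) = P(event and given) / P(given)."""
    p_both = probability(rows, lambda row: event(row) and given(row))
    return p_both / probability(rows, given)


p_iphone = probability(people, has_iphone)
p_iphone_given_mac = conditional(people, has_iphone, has_mac)
p_mac_given_iphone = conditional(people, has_mac, has_iphone)

print(p_iphone, round(p_iphone_given_mac, 3), round(p_mac_given_iphone, 3))
```

Note that P(iPhone|mac) and P(mac|iPhone) differ because they divide the same joint probability by different denominators.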

## Shopping Cart Example
We want to decide whether to show a customer an ad for green tea, which only makes sense if they're likely to purchase it.  
P(D): the probability that some data will be observed. If half the people live in post code 88005, then the probability of someone being from that post code is P(88005) = 0.5.  
P(D|h): the probability that a value holds given the hypothesis. For example, the probability that someone lives in 88005 given that they've bought green tea is P(88005 | green tea).

In [8]:
import pandas as pd

customers = {   'Customer ID': range(10),
                'Zipcode': [88005, 88001, 88001, 88005, 88003, 88005, 88005, 88001, 88005, 88003],
                'Bought Organic': [True, False, True, False, True, False, False, False, True, True],
                'Bought Green Tea': [True, False, True, False, False, True, False, False, True, True]}

df = pd.DataFrame(customers)
df

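| | Bought Green Tea | Bought Organic | Customer ID | Zipcode |
|:-:|:-:|:-:|:-:|:-:|
| 0 | True | True | 0 | 88005 |
| 1 | False | False | 1 | 88001 |
| 2 | True | True | 2 | 88001 |
| 3 | False | False | 3 | 88005 |
| 4 | False | True | 4 | 88003 |
| 5 | True | False | 5 | 88005 |
| 6 | False | False | 6 | 88005 |
| 7 | False | False | 7 | 88001 |
| 8 | True | True | 8 | 88005 |
| 9 | True | True | 9 | 88003 |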


## Probability someone who bought green tea lives in post code 88005
P(D|h): P(88005 | green tea)

In [28]:
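# Number of customers who bought green tea (True counts as 1)
num_bought_green_tea = int(df['Bought Green Tea'].sum())
# Customers who bought green tea and live in 88005
bought_tea_zip = df[df['Bought Green Tea'] & (df['Zipcode'] == 88005)]
bought_tea_zip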

| | Bought Green Tea | Bought Organic | Customer ID | Zipcode |
|:-:|:-:|:-:|:-:|:-:|
| 0 | True | True | 0 | 88005 |
| 5 | True | False | 5 | 88005 |
| 8 | True | True | 8 | 88005 |


In [29]:
num_bought_tea_and_live_88005 = len(bought_tea_zip)

# P(88005 | green tea) = tea buyers living in 88005 / all tea buyers
print('{} / {} = {}'.format(num_bought_tea_and_live_88005, num_bought_green_tea, num_bought_tea_and_live_88005 / num_bought_green_tea))

3 / 5 = 0.6


## Opposite: Probability someone who did NOT buy green tea lives in post code 88005
P(D|h): P(88005 | ¬green tea)

In [30]:
# Number of customers who did not buy green tea
num_didnt_buy_green_tea = len(df) - int(df['Bought Green Tea'].sum())
# Customers who did not buy green tea and live in 88005
no_tea_zip = df[~df['Bought Green Tea'] & (df['Zipcode'] == 88005)]
no_tea_zip

| | Bought Green Tea | Bought Organic | Customer ID | Zipcode |
|:-:|:-:|:-:|:-:|:-:|
| 3 | False | False | 3 | 88005 |
| 6 | False | False | 6 | 88005 |


In [33]:
num_didnt_buy_tea_live_88005 = len(no_tea_zip)
# P(88005 | ¬green tea) = non-buyers living in 88005 / all non-buyers
print('{} / {} = {}'.format(num_didnt_buy_tea_live_88005, num_didnt_buy_green_tea, num_didnt_buy_tea_live_88005 / num_didnt_buy_green_tea))

2 / 5 = 0.4


## Probability of someone being in postcode 88001

In [37]:
lives_88001 = df[df['Zipcode'] == 88001]
lives_88001

| | Bought Green Tea | Bought Organic | Customer ID | Zipcode |
|:-:|:-:|:-:|:-:|:-:|
| 1 | False | False | 1 | 88001 |
| 2 | True | True | 2 | 88001 |
| 7 | False | False | 7 | 88001 |


In [43]:
# P(88001) = customers living in 88001 / all customers
p = len(lives_88001) / len(df)
p

0.3
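The cells above each compute one probability at a time. As a rough sketch, the whole distribution over zip codes, overall and conditioned on the green tea column, can be tabulated in one go. The version below uses plain Python lists copied from the `customers` dictionary so that it runs on its own; the `distribution` helper is a hypothetical name, not part of pandas.

```python
from collections import Counter

# Columns copied from the customers dictionary above
zipcodes = [88005, 88001, 88001, 88005, 88003, 88005, 88005, 88001, 88005, 88003]
bought_green_tea = [True, False, True, False, False, True, False, False, True, True]


def distribution(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


# P(zip) over all customers
p_zip = distribution(zipcodes)

# P(zip | green tea) and P(zip | no green tea)
p_zip_given_tea = distribution(
    [z for z, tea in zip(zipcodes, bought_green_tea) if tea]
)
p_zip_given_no_tea = distribution(
    [z for z, tea in zip(zipcodes, bought_green_tea) if not tea]
)

print(p_zip)
print(p_zip_given_tea)
print(p_zip_given_no_tea)
```

Each dictionary sums to 1, which is a quick sanity check that the conditioning was done on the right subset.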

# Bayes Theorem
Bayes' theorem describes the relationship between P(h), P(h | D), P(D), and P(D | h):

$$
P(h | D) = \frac{P(D | h)P(h)}{P(D)}
$$
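A short derivation, using only the conditional probability formula from the Probability section, shows where the theorem comes from. The joint probability $P(h \cap D)$ can be written in two ways:

$$
P(h \cap D) = P(h | D)P(D) = P(D | h)P(h)
$$

Dividing the last two expressions by $P(D)$, assuming $P(D) > 0$, gives the theorem above.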

We can use this theorem to decide between multiple hypotheses. For example, given some data about a person, we can use it to determine which sport they most likely play.

In the tea example, we have two hypotheses:
1. They will buy green tea: P(buy tea | 88005)
2. They will not buy green tea: P(¬buy tea | 88005)

Since P(buy tea | 88005) works out to 0.6 and P(¬buy tea | 88005) to 0.4, we can say it's likely they will make the purchase.
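As a sanity check, the posterior for the first hypothesis can be computed both through Bayes' theorem and by counting directly. The sketch below uses plain lists copied from the `customers` dictionary so that it is self-contained; the variable names are illustrative.

```python
# Columns copied from the customers dictionary above
zipcodes = [88005, 88001, 88001, 88005, 88003, 88005, 88005, 88001, 88005, 88003]
bought_green_tea = [True, False, True, False, False, True, False, False, True, True]
n = len(zipcodes)

# Prior P(tea) and evidence P(88005)
p_tea = sum(bought_green_tea) / n
p_zip = zipcodes.count(88005) / n

# Likelihood P(88005 | tea)
tea_zips = [z for z, tea in zip(zipcodes, bought_green_tea) if tea]
p_zip_given_tea = tea_zips.count(88005) / len(tea_zips)

# Posterior via Bayes' theorem: P(tea | 88005) = P(88005 | tea) P(tea) / P(88005)
posterior_bayes = p_zip_given_tea * p_tea / p_zip

# Posterior by direct counting: tea buyers among the 88005 residents
tea_in_zip = [tea for z, tea in zip(zipcodes, bought_green_tea) if z == 88005]
posterior_direct = sum(tea_in_zip) / len(tea_in_zip)

print(posterior_bayes, posterior_direct)
```

Both routes should agree, since Bayes' theorem is just a rearrangement of the same counts.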

## Electronics Store
We have three sales fliers we could send by email. Each features one product:
1. Laptop
2. Desktop
3. Tablet

Using the information we have about the customer, we want to send the flier that is most likely to generate a sale. The hypotheses for which flier is best:  
$P(laptop | D) = \frac{P(D | laptop)P(laptop)}{P(D)}$  
$P(desktop | D) = \frac{P(D | desktop)P(desktop)}{P(D)}$  
$P(tablet | D) = \frac{P(D | tablet)P(tablet)}{P(D)}$

We can refer to these hypotheses as h₁, h₂, h₃ and so on. They are usually the different classes we want to predict, e.g. different sports, or has disease versus does not have disease. Once we've calculated all the probabilities, we can pick the hypothesis that is most likely, called **the maximum a posteriori hypothesis** or $h_{MAP}$.

$$
h_{MAP} = \arg\max_{h \in H} P(h|D)
$$

H is the set of all hypotheses, so this says: compute P(h|D) for each hypothesis h in H and pick the one with the highest probability. With Bayes' theorem, this becomes:

$$
h_{MAP} = \arg\max_{h \in H} \frac{P(D|h)P(h)}{P(D)}
$$

P(D) is the same for every hypothesis, so it does not affect which one is largest. You can therefore skip the division and compare just the numerators P(D|h)P(h): the hypothesis with the largest numerator is the most likely.
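A minimal sketch of picking $h_{MAP}$ for the flier example follows. The priors and likelihoods are invented purely for illustration (the notes give no numbers for the electronics store), and `map_hypothesis` is a hypothetical helper name.

```python
def map_hypothesis(priors, likelihoods):
    """Return the hypothesis with the largest P(D|h) * P(h), plus all scores."""
    scores = {h: likelihoods[h] * priors[h] for h in priors}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical values, not taken from the notes
priors = {"laptop": 0.5, "desktop": 0.2, "tablet": 0.3}       # P(h)
likelihoods = {"laptop": 0.1, "desktop": 0.4, "tablet": 0.3}  # P(D | h)

best, scores = map_hypothesis(priors, likelihoods)

# Dividing by P(D) rescales every score by the same constant, so the winner
# is unchanged. Here P(D) is the sum of the scores, which assumes the three
# hypotheses are mutually exclusive and cover every case.
p_data = sum(scores.values())
posteriors = {h: score / p_data for h, score in scores.items()}

print(best, scores, posteriors)
```

With these made-up numbers the tablet flier wins even though the laptop has the highest prior, because the likelihood of the observed data is much lower under the laptop hypothesis.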

## Determine if patient has cancer or not
- 0.8% of people have some form of cancer
- The test gives a binary result: positive or negative
- Cancer present: the test returns a correct positive result 98% of the time
- Cancer absent: the test returns a correct negative result 97% of the time

Therefore I can work out:
- P(cancer) = 0.008
- P(¬cancer) = 0.992
- P(POS|cancer) = 0.98
- P(NEG|cancer) = 0.02
- P(POS|¬cancer) = 0.03
- P(NEG|¬cancer) = 0.97

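If you have a blood test done and the result is positive, is it more likely that you have cancer or that you don't? Compare the numerators of Bayes' theorem for the two hypotheses:
- P(POS|cancer) × P(cancer) = 0.98 × 0.008 = 0.00784
- P(POS|¬cancer) × P(¬cancer) = 0.03 × 0.992 = 0.02976

The second value is larger, so the person most likely does not have cancer. To determine how likely the person is to have cancer, normalise by P(POS), which is the sum of the two values:

$$
P(cancer|POS) = \frac{0.00784}{0.00784 + 0.02976} = 0.21
$$

Even after a positive test, there is only about a 21% chance of having cancer.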
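The same calculation can be written out in a few lines of Python. This is a sketch that uses only the three figures given in the bullet list; the variable names are illustrative.

```python
# Figures given in the bullet list above
p_cancer = 0.008               # P(cancer)
p_pos_given_cancer = 0.98      # P(POS | cancer)
p_neg_given_no_cancer = 0.97   # P(NEG | no cancer)

# Complements
p_no_cancer = 1 - p_cancer
p_pos_given_no_cancer = 1 - p_neg_given_no_cancer

# Numerators of Bayes' theorem for each hypothesis
score_cancer = p_pos_given_cancer * p_cancer
score_no_cancer = p_pos_given_no_cancer * p_no_cancer

# P(POS) is the sum over both hypotheses (law of total probability)
p_pos = score_cancer + score_no_cancer
p_cancer_given_pos = score_cancer / p_pos

print(round(score_cancer, 5), round(score_no_cancer, 5), round(p_cancer_given_pos, 2))
```

The posterior stays low because the disease is rare: the 3% false positive rate applies to the 99.2% of people without cancer, which outweighs the true positives.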

## Why use Bayes Theorem?

The example of the green tea and zip code from earlier presented us with two hypotheses:
- P(h₁|D) = P(buy green tea | 88005)
- P(h₂|D) = P(¬buy green tea | 88005)

With Bayes' theorem, the first can be rewritten as

$$
P(\text{buy green tea} | 88005) = \frac{P(88005 | \text{buy green tea})P(\text{buy green tea})}{P(88005)}
$$

Since we can calculate P(h|D) directly from the data in the table, why use the equation? Because normally it's hard to calculate P(h|D) directly, while P(D|h) and P(h) are easier to estimate.

## Naïve Bayes
Naïve Bayes uses more than a single piece of evidence when calculating a probability with Bayes' theorem. In the tea example, we have two types of evidence: zip code and whether a person has purchased organic items. To calculate the probability of a hypothesis given all the evidence, we multiply the individual probabilities, which (naïvely) assumes the pieces of evidence are independent of each other given the hypothesis.

What is the probability that someone who lives in the 88005 zip code and bought organic items will buy green tea?  
P(tea | 88005 & organic) ∝ P(88005 | tea) × P(organic | tea) × P(tea) = 0.6 × 0.8 × 0.5 = 0.24
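A sketch of the Naïve Bayes calculation on the customer data follows, multiplying the per-feature likelihoods by the prior for each hypothesis and then normalising. The rows are copied from the `customers` dictionary; `naive_bayes_score` is a hypothetical helper, and no smoothing is applied, so an unseen value would give a score of zero.

```python
# (zipcode, bought organic, bought green tea) rows from the customers dictionary
customers = [
    (88005, True, True),
    (88001, False, False),
    (88001, True, True),
    (88005, False, False),
    (88003, True, False),
    (88005, False, True),
    (88005, False, False),
    (88001, False, False),
    (88005, True, True),
    (88003, True, True),
]


def naive_bayes_score(rows, bought_tea, zipcode, organic):
    """P(zip | h) * P(organic | h) * P(h), where h is bought_tea True or False."""
    in_class = [row for row in rows if row[2] == bought_tea]
    prior = len(in_class) / len(rows)
    p_zip = sum(1 for row in in_class if row[0] == zipcode) / len(in_class)
    p_organic = sum(1 for row in in_class if row[1] == organic) / len(in_class)
    return p_zip * p_organic * prior


score_tea = naive_bayes_score(customers, True, 88005, True)
score_no_tea = naive_bayes_score(customers, False, 88005, True)

# Normalise the two scores so they sum to 1
p_tea = score_tea / (score_tea + score_no_tea)

print(round(score_tea, 3), round(score_no_tea, 3), round(p_tea, 3))
```

The tea hypothesis has the larger score, so the classifier would predict that this customer buys green tea.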