# Calculate Probability of Playing Golf

This notebook works through the examples from:

- [https://datacadamia.com/data_mining/naive_bayes](https://datacadamia.com/data_mining/naive_bayes).
- [https://www.geeksforgeeks.org/naive-bayes-classifiers/](https://www.geeksforgeeks.org/naive-bayes-classifiers)

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('golf.csv'); df.shape

(14, 7)

In [3]:
df

Unnamed: 0,Outlook,Temperature Numeric,Temperature Nominal,Humidity Numeric,Humidity Nominal,Windy,Play
0,overcast,83,hot,86,high,False,yes
1,overcast,64,cool,65,normal,True,yes
2,overcast,72,mild,90,high,True,yes
3,overcast,81,hot,75,normal,False,yes
4,rainy,70,mild,96,high,False,yes
5,rainy,68,cool,80,normal,False,yes
6,rainy,65,cool,70,normal,True,no
7,rainy,75,mild,80,normal,False,yes
8,rainy,71,mild,91,high,True,no
9,sunny,85,hot,85,high,False,no


## Calculate Probabilities

### P(A) - Probability of A Occurring by Itself

In [4]:
p_yes_no = df.groupby('Play')
p_yes_no = p_yes_no.agg({'Play': ['count', lambda x: str(len(x)) + '/' + str(len(df))]})
p_yes_no.columns = ['Play', 'P(Yes)/ P(No)']
p_yes_no

Play,Play,P(Yes)/ P(No)
no,5,5/14
yes,9,9/14
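
The priors can also be checked without pandas. The following is a minimal sketch, assuming only the class counts shown in the table above (9 `yes`, 5 `no`); the names `play` and `priors` are illustrative and not part of the notebook.

```python
from collections import Counter
from fractions import Fraction

# Assumption: labels rebuilt from the counts in the table above, not read from golf.csv
play = ['yes'] * 9 + ['no'] * 5

counts = Counter(play)
total = len(play)

# Prior P(class) = class count / total number of days
priors = {label: Fraction(n, total) for label, n in counts.items()}

for label, p in sorted(priors.items()):
    print(f'P({label}) = {p} = {float(p):.2f}')
```

Using `Fraction` keeps the values exact, so the two priors sum to exactly 1.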


### P(B|A) - Probability of B Occurring Given A

In [5]:
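def get_conditional_probability(df, feature_name):
    """Tabulate Yes/No counts and P(feature value | Play) for one feature."""
    n_yes = (df['Play'] == 'yes').sum()  # total number of 'yes' days
    n_no = (df['Play'] == 'no').sum()    # total number of 'no' days
    df_conditional = df.groupby(feature_name).agg({
        'Play': [
            lambda x: (x == 'yes').sum(),                         # 'yes' days per feature value
            lambda x: (x == 'no').sum(),                          # 'no' days per feature value
            lambda x: '{}/{}'.format((x == 'yes').sum(), n_yes),  # P(value | Yes) as a fraction string
            lambda x: '{}/{}'.format((x == 'no').sum(), n_no),    # P(value | No) as a fraction string
        ]
    })
    df_conditional.columns = ['Yes', 'No', f'P({feature_name}|Yes)', f'P({feature_name}|No)']
    return df_conditional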


In [6]:
p_outlook_given_play = get_conditional_probability(df, 'Outlook')
p_outlook_given_play

Outlook,Yes,No,P(Outlook|Yes),P(Outlook|No)
overcast,4,0,4/9,0/5
rainy,3,2,3/9,2/5
sunny,2,3,2/9,3/5
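
As a cross-check on the table above, the same likelihoods can be derived from the Yes/No counts alone. This is a sketch in plain Python, assuming the Outlook counts shown above; `outlook_counts` and `likelihood` are hypothetical names introduced only for illustration.

```python
from fractions import Fraction

# Assumption: (yes, no) day counts per Outlook value, copied from the table above
outlook_counts = {'overcast': (4, 0), 'rainy': (3, 2), 'sunny': (2, 3)}

n_yes = sum(yes for yes, _ in outlook_counts.values())  # total 'yes' days
n_no = sum(no for _, no in outlook_counts.values())     # total 'no' days

def likelihood(value, play):
    """P(Outlook = value | Play = play) as an exact fraction."""
    yes, no = outlook_counts[value]
    return Fraction(yes, n_yes) if play == 'yes' else Fraction(no, n_no)

# Each conditional distribution sums to 1 over the Outlook values
assert sum(likelihood(v, 'yes') for v in outlook_counts) == 1
assert sum(likelihood(v, 'no') for v in outlook_counts) == 1

print(likelihood('sunny', 'yes'))  # 2/9
print(likelihood('rainy', 'no'))   # 2/5
```

Note that `Fraction` reduces automatically, so a value the table shows as 3/9 would print as 1/3.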


In [7]:
p_temparature_given_play = get_conditional_probability(df, 'Temperature Nominal')
p_temparature_given_play

Temperature Nominal,Yes,No,P(Temperature Nominal|Yes),P(Temperature Nominal|No)
cool,3,1,3/9,1/5
hot,2,2,2/9,2/5
mild,4,2,4/9,2/5


In [8]:
p_humidity_given_play = get_conditional_probability(df, 'Humidity Nominal')
p_humidity_given_play

Humidity Nominal,Yes,No,P(Humidity Nominal|Yes),P(Humidity Nominal|No)
high,3,4,3/9,4/5
normal,6,1,6/9,1/5


In [9]:
p_windy_given_play = get_conditional_probability(df, 'Windy')
p_windy_given_play

Windy,Yes,No,P(Windy|Yes),P(Windy|No)
False,6,2,6/9,2/5
True,3,3,3/9,3/5


## Question: What is the probability of playing golf on any given day?

> P(Yes)

The answer is 9/14.

## Question: What is the probability of playing if the weather is sunny?

Bayes' theorem gives:

> P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)

P(Sunny | Yes) = 2/9 ≈ 0.22  
P(Sunny) = 5/14 ≈ 0.36  
P(Yes) = 9/14 ≈ 0.64  
**P(Yes | Sunny)** = (2/9) * (9/14) / (5/14) = 2/5 = 0.40
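
The arithmetic can be verified with exact fractions, which avoids rounding the intermediate values. This is a small sketch using Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

p_sunny_given_yes = Fraction(2, 9)  # from the Outlook table
p_yes = Fraction(9, 14)             # prior P(Yes)
p_sunny = Fraction(5, 14)           # 5 sunny days out of 14

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny

print(p_yes_given_sunny, float(p_yes_given_sunny))  # 2/5 0.4
```

The exact result, 2/5, can also be read directly from the Outlook table: 2 of the 5 sunny days are `yes` days. Multiplying the two-decimal roundings (0.22 * 0.64 / 0.36) instead gives about 0.39.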

## Question: What is the probability of playing on a new day with the conditions below?

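Bayes' theorem follows from the definition of conditional probability.

The probability of event A given that B has already occurred is

$$
\text{Prob(A given B)} = \frac{\displaystyle \text{Prob(A and B)}}{\displaystyle \text{Prob(B)}} = \frac{\displaystyle \text{Prob(B given A)} \cdot \text{Prob(A)}}{\displaystyle \text{Prob(B)}}
$$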

What is the probability of Play being `yes` or `no` on a new day with the following characteristics?

> Rainy, Cool, High, True

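Assuming the features are conditionally independent given the class (the naive Bayes assumption), the probability of `yes` given the new day is:

$$
P(\text{Yes} \mid \text{New Day}) = \frac{P(\text{Rainy Outlook} \mid \text{Yes}) \cdot P(\text{Cool Temperature} \mid \text{Yes}) \cdot P(\text{High Humidity} \mid \text{Yes}) \cdot P(\text{With Wind} \mid \text{Yes}) \cdot P(\text{Yes})}{P(\text{New Day})}
$$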

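In this equation, `P(New Day)` is unknown. It does not need to be computed directly: the posteriors `P(Yes|New Day)` and `P(No|New Day)` share the same denominator and must sum to 1, so `P(New Day)` equals the sum of the two numerators.

$$
P(\text{Yes} \mid \text{New Day}) + P(\text{No} \mid \text{New Day}) = 1
$$

$$
P(\text{New Day}) = P(\text{New Day} \mid \text{Yes}) \cdot P(\text{Yes}) + P(\text{New Day} \mid \text{No}) \cdot P(\text{No})
$$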

In [10]:
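from fractions import Fraction
from math import prod

def get_p_yes_no_given_new_day(play, outlook, temperature, humidity, windy):
    """Return the unnormalized posterior (the numerator of Bayes' rule) for `play`."""
    _play = play.capitalize()  # 'yes' -> 'Yes', matching the column labels
    # Look up each likelihood, stored as a fraction string such as '3/9'
    a = p_outlook_given_play.loc[p_outlook_given_play.index == outlook, f'P(Outlook|{_play})'].iloc[0]
    b = p_temparature_given_play.loc[p_temparature_given_play.index == temperature, f'P(Temperature Nominal|{_play})'].iloc[0]
    c = p_humidity_given_play.loc[p_humidity_given_play.index == humidity, f'P(Humidity Nominal|{_play})'].iloc[0]
    d = p_windy_given_play.loc[p_windy_given_play.index == windy, f'P(Windy|{_play})'].iloc[0]
    e = p_yes_no.loc[p_yes_no.index == play, 'P(Yes)/ P(No)'].iloc[0]  # prior P(Yes) or P(No)
    out = '{} * {} * {} * {} * {}'.format(a, b, c, d, e)
    print(out)
    # Multiply as exact fractions instead of calling eval() on the string
    return round(float(prod(Fraction(t) for t in (a, b, c, d, e))), 4)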

In [11]:
p_yes_given_new_day = get_p_yes_no_given_new_day('yes', 'rainy', 'cool', 'high', True)
p_yes_given_new_day

3/9 * 3/9 * 3/9 * 3/9 * 9/14


0.0079

In [12]:
p_no_given_new_day = get_p_yes_no_given_new_day('no', 'rainy', 'cool', 'high', True)
p_no_given_new_day

2/5 * 1/5 * 4/5 * 3/5 * 5/14


0.0137

In [13]:
p_new_day = (p_yes_given_new_day + p_no_given_new_day)
p_new_day

0.0216

In [14]:
p_yes_given_new_day / p_new_day

0.36574074074074076

In [15]:
p_no_given_new_day / p_new_day

0.6342592592592592
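
The normalization step can be reproduced with exact fractions as a sanity check. This is a sketch, assuming the likelihoods and priors printed above for the new day (rainy, cool, high humidity, windy); the names `terms`, `numerators`, `evidence` and `posteriors` are illustrative and do not appear in the notebook.

```python
from fractions import Fraction
from math import prod

# Assumption: likelihoods and priors copied from the printed products above
terms = {
    'yes': [Fraction(3, 9), Fraction(3, 9), Fraction(3, 9), Fraction(3, 9), Fraction(9, 14)],
    'no':  [Fraction(2, 5), Fraction(1, 5), Fraction(4, 5), Fraction(3, 5), Fraction(5, 14)],
}

# Unnormalized posterior for each class: product of the likelihoods times the prior
numerators = {label: prod(factors) for label, factors in terms.items()}

# The evidence P(New Day) is the sum of the numerators
evidence = sum(numerators.values())

# Dividing by the evidence makes the posteriors sum to 1
posteriors = {label: num / evidence for label, num in numerators.items()}

for label, p in posteriors.items():
    print(f'P({label} | new day) = {p} = {float(p):.4f}')
```

The exact posteriors are 125/341 (about 0.3666) for `yes` and 216/341 (about 0.6334) for `no`. They differ from the notebook output in the third decimal because the notebook rounds each numerator to four decimals before normalizing. Either way, `no` is the more likely class for this day.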