
# Conditional Probability and Bayes' Theorem

Below is a training dataset of weather conditions and the corresponding target variable ‘Play’, which records whether the game was played. The goal is to classify whether the players will play, based on the weather condition.

In [0]:
# Each entry is (weather, played): True means the players played
dataset = [("Sunny", False), ("Overcast", True), ("Rainy", True), ("Sunny", True), ("Sunny", True), ("Overcast", True), ("Rainy", False), ("Rainy", False), ("Sunny", True), ("Rainy", True), ("Sunny", False), ("Overcast", True), ("Overcast", True), ("Rainy", False)]


**Tennis play by weather condition**

| Weather  	| Playing? 	|
|----------	|----------	|
| Sunny    	| No       	|
| Overcast 	| Yes      	|
| Rainy    	| Yes      	|
| Sunny    	| Yes      	|
| Sunny    	| Yes      	|
| Overcast 	| Yes      	|
| Rainy    	| No       	|
| Rainy    	| No       	|
| Sunny    	| Yes      	|
| Rainy    	| Yes      	|
| Sunny    	| No       	|
| Overcast 	| Yes      	|
| Overcast 	| Yes      	|
| Rainy    	| No       	|

In [0]:
# Tally "Yes"/"No" outcomes for each weather condition and for the grand total
grouped_by_feature = {"Grand Total": {"Yes": 0, "No": 0}}
for weather, is_played in dataset:
  # Create a zeroed counter the first time a weather condition appears
  if weather not in grouped_by_feature:
    grouped_by_feature[weather] = {"Yes": 0, "No": 0}
  outcome = "Yes" if is_played else "No"
  grouped_by_feature[weather][outcome] += 1
  grouped_by_feature["Grand Total"][outcome] += 1
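The same counts can be tallied more compactly with `collections.Counter` from the Python standard library. This is a minimal sketch that recounts the dataset above; the names `counts` and `grand_total` are illustrative and not part of the original notebook.

```python
from collections import Counter

# Same (weather, played) pairs as the dataset above
dataset = [("Sunny", False), ("Overcast", True), ("Rainy", True), ("Sunny", True),
           ("Sunny", True), ("Overcast", True), ("Rainy", False), ("Rainy", False),
           ("Sunny", True), ("Rainy", True), ("Sunny", False), ("Overcast", True),
           ("Overcast", True), ("Rainy", False)]

# Count each (weather, outcome) combination in one pass
counts = Counter((weather, "Yes" if played else "No") for weather, played in dataset)

# Column totals: how many rows ended in "Yes" and how many in "No"
grand_total = Counter("Yes" if played else "No" for _, played in dataset)

# Print the frequency table; Counter returns 0 for combinations that never occur
for weather in ["Sunny", "Overcast", "Rainy"]:
    print(weather, counts[(weather, "No")], counts[(weather, "Yes")])
print("Grand Total", grand_total["No"], grand_total["Yes"])
```

Unlike a plain dict, a `Counter` returns 0 for a missing key, so the Overcast/No cell comes out as 0 instead of raising a `KeyError`.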

**Frequency Table**

| Weather         	| No 	| Yes 	|
|-----------------	|----	|-----	|
| Sunny           	| 2  	| 3   	|
| Overcast        	| 0  	| 4   	|
| Rainy           	| 3  	| 2   	|
| Grand Total     	| 5  	| 9   	|

With these counts we can calculate the probability of playing tennis under each weather condition, using all 14 observations.
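As a sketch of that calculation, the snippet below turns the frequency table into exact fractions with the standard-library `fractions.Fraction` type: the marginal P(weather) and the likelihoods P(weather | Yes) and P(weather | No). The dictionary `freq` re-enters the frequency table by hand; it and `likelihood_table` are illustrative names, not part of the original notebook.

```python
from fractions import Fraction

# Frequency table re-entered by hand: weather -> (No count, Yes count)
freq = {"Sunny": (2, 3), "Overcast": (0, 4), "Rainy": (3, 2)}

total_no = sum(no for no, _ in freq.values())     # 5
total_yes = sum(yes for _, yes in freq.values())  # 9
total = total_no + total_yes                      # 14

likelihood_table = {}
for weather, (no, yes) in freq.items():
    likelihood_table[weather] = {
        "P(x)": Fraction(no + yes, total),     # how often this weather occurs
        "P(x|Yes)": Fraction(yes, total_yes),  # share of "Yes" rows with this weather
        "P(x|No)": Fraction(no, total_no),     # share of "No" rows with this weather
    }

for weather, probs in likelihood_table.items():
    print(weather, {name: str(p) for name, p in probs.items()})
```

Fractions keep the arithmetic exact (for example, P(Sunny | Yes) is reduced to 1/3), which makes it easy to check the hand calculation that follows.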

**Problem:** Calculate the probability of playing tennis if the weather is sunny.

* **P(c|x)** is the posterior probability of the class (c, the target) given the predictor (x, the attributes).
* **P(c)** is the prior probability of the class.
* **P(x|c)** is the likelihood, i.e. the probability of the predictor given the class.
* **P(x)** is the prior probability of the predictor (also called the evidence).

![Bayesian Formula](https://www.analyticsvidhya.com/wp-content/uploads/2015/09/Bayes_rule-300x172-300x172.png)
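In text form, the rule shown in the image reads as follows, using the symbols defined in the list above:

```latex
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}
```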

For this problem, the formula becomes:

`P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)`
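Substituting the counts from the frequency table gives the answer by hand, which the cells below reproduce numerically:

```latex
P(\text{Yes} \mid \text{Sunny})
  = \frac{P(\text{Sunny} \mid \text{Yes})\, P(\text{Yes})}{P(\text{Sunny})}
  = \frac{\frac{3}{9} \cdot \frac{9}{14}}{\frac{5}{14}}
  = \frac{3/14}{5/14}
  = \frac{3}{5} = 0.60
```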

In [34]:
prob_of_sunny = (grouped_by_feature["Sunny"]["Yes"] + grouped_by_feature["Sunny"]["No"]) / float(grouped_by_feature["Grand Total"]["Yes"] + grouped_by_feature["Grand Total"]["No"])
print("P(Sunny) : " + str(100 * prob_of_sunny) + "%")

P(Sunny) : 35.714285714285715%
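As a cross-check, P(Sunny) can also be obtained from the likelihoods via the law of total probability, P(Sunny) = P(Sunny | Yes) P(Yes) + P(Sunny | No) P(No). This minimal sketch uses exact fractions with the counts hard-coded from the frequency table; it is an illustration, not a cell from the original notebook.

```python
from fractions import Fraction

# Counts taken from the frequency table
sunny_yes, sunny_no = 3, 2
total_yes, total_no = 9, 5
total = total_yes + total_no

p_yes = Fraction(total_yes, total)                  # P(Yes) = 9/14
p_no = Fraction(total_no, total)                    # P(No)  = 5/14
p_sunny_given_yes = Fraction(sunny_yes, total_yes)  # P(Sunny | Yes) = 3/9
p_sunny_given_no = Fraction(sunny_no, total_no)     # P(Sunny | No)  = 2/5

# Law of total probability over the two classes
p_sunny = p_sunny_given_yes * p_yes + p_sunny_given_no * p_no
print(p_sunny, float(p_sunny))
```

The two terms are 3/14 and 2/14, so the sum is 5/14, the same 35.71% computed directly from the counts.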


In [35]:
prob_of_yes = grouped_by_feature["Grand Total"]["Yes"] / float(grouped_by_feature["Grand Total"]["Yes"] + grouped_by_feature["Grand Total"]["No"])
print("P(Yes) : " + str(100 * prob_of_yes) + "%")

P(Yes) : 64.28571428571429%


In [36]:
prob_of_sunny_when_yes = grouped_by_feature["Sunny"]["Yes"] / float(grouped_by_feature["Grand Total"]["Yes"])
print("P(Sunny | Yes) : " + str(100 * prob_of_sunny_when_yes) + "%")

P(Sunny | Yes) : 33.33333333333333%


In [37]:
prob_of_yes_when_sunny = prob_of_sunny_when_yes * prob_of_yes / prob_of_sunny
print("P(Yes | Sunny) : " + str(100 * prob_of_yes_when_sunny) + "%")

P(Yes | Sunny) : 60.0%
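The same 60% also follows directly from the definition of conditional probability, P(Yes | Sunny) = count(Sunny and Yes) / count(Sunny) = 3/5, since Bayes' theorem is derived from that definition. The sketch below applies both routes to every weather condition and predicts the more probable class, which is how this table could be used to classify new days. The functions `posterior_bayes`, `posterior_direct`, and `predict` are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction

dataset = [("Sunny", False), ("Overcast", True), ("Rainy", True), ("Sunny", True),
           ("Sunny", True), ("Overcast", True), ("Rainy", False), ("Rainy", False),
           ("Sunny", True), ("Rainy", True), ("Sunny", False), ("Overcast", True),
           ("Overcast", True), ("Rainy", False)]


def posterior_bayes(weather, played):
    """P(played | weather) via Bayes: P(weather | played) * P(played) / P(weather)."""
    n = len(dataset)
    n_class = sum(1 for _, p in dataset if p == played)
    n_weather = sum(1 for w, _ in dataset if w == weather)
    n_both = sum(1 for w, p in dataset if w == weather and p == played)
    likelihood = Fraction(n_both, n_class)
    prior = Fraction(n_class, n)
    evidence = Fraction(n_weather, n)
    return likelihood * prior / evidence


def posterior_direct(weather, played):
    """P(played | weather) from the definition: count(both) / count(weather)."""
    rows = [p for w, p in dataset if w == weather]
    return Fraction(sum(1 for p in rows if p == played), len(rows))


def predict(weather):
    """Pick the class with the larger posterior (ties go to playing)."""
    return posterior_bayes(weather, True) >= posterior_bayes(weather, False)


for weather in ["Sunny", "Overcast", "Rainy"]:
    p_yes = posterior_bayes(weather, True)
    assert p_yes == posterior_direct(weather, True)  # both routes agree
    print(weather, "P(Yes|x) =", p_yes, "-> play" if predict(weather) else "-> no play")
```

With a single predictor the two routes always agree; Bayes' form becomes useful once several attributes are combined, as in the Naive Bayes algorithm the cited article builds toward.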




---


*This example is taken from [6 Easy Steps to Learn Naive Bayes Algorithm](https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/)*