# Probability Practice
## Simple, Joint, Marginal, and Conditional Probability, and Bayes' Theorem

**Objective**
1. Define probability and axioms/corollaries 
2. Apply probability properties to solve example problems

**Rubric**
- 4 points: Completes all practice problems as described. Checks work against the answer key in class before submitting.
- 3 points: Completes some of the practice problems as described, or completes the assignment but does not check the answer key first.
- 2 points: Missing or incomplete

**Syntax**
- $\cap$: Intersection, $A \cap B$ represents the overlap between $A$ and $B$.
- $\cup$: Union, $A \cup B$ represents $A$, $B$, and their overlap. When you are dealing with unions, watch out for double counting!
- $P(A)$: Probability of an event $A$
- $n(A)$: Number of observations in the event $A$

## Simple probability refers to the probability of occurrence of a simple event

The probability of an event $X$, $P(X)$, is given by
$P(X) = \frac{\text{Number of observations in favor of event } X}{\text{Total number of observations}}$
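A minimal sketch of this formula in Python, assuming the counts are already known. It uses the standard-library `fractions.Fraction` so the result stays exact until you choose to convert it; the helper name `simple_probability` is hypothetical. The counts come from Example 1 below.

```python
from fractions import Fraction


def simple_probability(n_favorable: int, n_total: int) -> Fraction:
    """P(X) = (observations in favor of X) / (total observations)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_favorable <= n_total:
        raise ValueError("n_favorable must be between 0 and n_total")
    return Fraction(n_favorable, n_total)


# Counts from Example 1: 2800 visitors, 56 + 30 + 14 = 100 clicked at least one ad
clicked = 56 + 30 + 14
p_click = simple_probability(clicked, 2800)
p_no_click = simple_probability(2800 - clicked, 2800)

print(p_click, p_no_click)         # 1/28 27/28
print(f"{float(p_no_click):.2%}")  # 96.43%
```

Keeping the fraction exact shows that the rounded "96%" and "4%" answers below are 27/28 and 1/28, which sum to exactly 1.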

### Example 1
The data collected by an advertisement agency has revealed that out of 2800 visitors, 56 visitors clicked on 1 advertisement (and only 1), 30 clicked on 2 advertisements (and only 2), 14 clicked on 3 advertisements, and the remaining visitors did not click on any advertisement.

Calculate
* a) The probability that a visitor to the website will not click on any advertisement.
* b) The probability that a visitor to the website will click on an advertisement.
* c) The probability that a visitor to the website will click on more than one advertisement.

### Solution

a) The probability that a visitor to the website will not click on any advertisement.

In [1]:
data = {'total': 2800, '1ad': 56, '2ad':30, "3ad": 14}

a = ((data['total'] -(data['1ad'] + data['2ad'] + data['3ad'])) / data['total']) * 100
print(str(round(a)) + '% chance')

96% chance


b) The probability that a visitor to the website will click on an advertisement.

In [2]:
b = (((data['1ad'] + data['2ad'] + data['3ad'])) / data['total']) * 100
print(str(round(b)) + '% chance')

4% chance


c) The probability that a visitor to the website will click on more than one advertisement.

In [3]:
c = ((data['2ad'] + data['3ad']) / data['total']) * 100
print(str(round(c)) + '% chance')

2% chance


## Joint probability refers to the probability of occurrence involving two or more events

Let $A$ and $B$ be two events in a sample space. Then the joint probability of the two events, denoted by $P(A \cap B)$, is given by
$P(A \cap B) = \frac{\text{Number of observations in } A \cap B}{\text{Total number of observations}}$ or $P(A \cap B) = P(A) * P(B|A)$ or $P(A \cap B) = P(A|B) * P(B)$.

Those last two definitions may not make sense just yet, but keep them in your back pocket for later. 


### Example 1
At a popular company service center, a total of 100 complaints were received. 80 customers complained about late delivery of the items and 60 complained about poor product quality. Assume every complaint concerned late delivery, poor product quality, or both.

* a) Calculate the probability that a customer complaint will be about both product quality and late delivery.
* b) What is the probability that a complaint will be only about late delivery?

### Solution

a) Calculate the probability that a customer complaint will be about both product quality and late delivery

In [4]:
data = {'complaints': 100, 'late': 80, 'quality': 60}
data['both'] = data['late'] + data['quality'] - data['complaints']
data['lateonly'] = data['late'] - data['both']
data['qualityonly'] = data['quality'] - data['both']
a = data['both']/data['complaints']
print(a * 100, '% chance')

40.0 % chance


b) What is the probability that a complaint will be only about late delivery?

In [5]:
b = data['lateonly'] / data['complaints'] * 100
print(b, '% chance')

40.0 % chance


### Example 2
| Planned to purchase Apple iPhone Xs Max | Actually placed an order: Yes | Actually placed an order: No | Total |
| ------------- | ------------ | ---------- | -----|
| Yes | 400 | 100 | 500 |
| No | 200 | 1300 | 1500 |
| Total | 600 | 1400 | 2000 |

Calculate the joint probability that a person planned to purchase and actually placed an order.

### Solution

In [6]:
print(400/2000 * 100, '% chance')

20.0 % chance
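A quick consistency check, assuming we read the counts straight from the table: the three forms of the joint probability given at the start of this section should all produce 0.2. `Fraction` keeps the arithmetic exact so the equality test is not disturbed by floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

total = 2000
n_planned = 500   # row total: planned to purchase (A)
n_ordered = 600   # column total: placed an order (B)
n_both = 400      # planned AND ordered (A ∩ B)

p_a = Fraction(n_planned, total)
p_b = Fraction(n_ordered, total)
p_joint = Fraction(n_both, total)          # direct count definition
p_b_given_a = Fraction(n_both, n_planned)  # P(B|A): ordered, among those who planned
p_a_given_b = Fraction(n_both, n_ordered)  # P(A|B): planned, among those who ordered

# All three definitions of P(A ∩ B) agree
assert p_joint == p_a * p_b_given_a == p_a_given_b * p_b
print(p_joint, float(p_joint))  # 1/5 0.2
```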


## Marginal probability refers to the probability of an event without any condition

$P(A) = P(A \cap B_{1}) + P(A \cap B_{2}) + P(A \cap B_{3}) + \cdots + P(A \cap B_{k})$
where $B_{1}$, $B_{2}$, $B_{3}$, ..., $B_{k}$ are $k$ mutually exclusive and collectively exhaustive events, defined as follows:

* Two events are mutually exclusive if they cannot both occur at the same time.
* A set of events is collectively exhaustive if at least one of the events must occur.
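A small sketch of the marginal formula applied to the Apple iPhone Xs Max table above. Here $A$ is "planned to purchase" and the $B_i$ are the two mutually exclusive, collectively exhaustive outcomes "placed an order" and "did not place an order". The dictionary layout and the helper name `marginal` are illustrative choices, not something the table prescribes.

```python
from fractions import Fraction

# Joint counts from the iPhone Xs Max table: (planned?, ordered?) -> count
counts = {
    ("yes", "yes"): 400,
    ("yes", "no"): 100,
    ("no", "yes"): 200,
    ("no", "no"): 1300,
}
total = sum(counts.values())  # 2000


def marginal(planned: str) -> Fraction:
    """P(planned) = sum of P(planned ∩ B_i) over the 'ordered' outcomes B_i."""
    return sum(
        (Fraction(n, total) for (p, _), n in counts.items() if p == planned),
        Fraction(0),
    )


print(marginal("yes"))  # 1/4
print(marginal("no"))   # 3/4
```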

### Example 1

Use the Apple iPhone Xs Max purchase table.
What is the probability of planning to purchase Apple iPhone Xs Max?

### Solution
What is the probability of planning to purchase Apple iPhone Xs Max?

In [7]:
print(500/2000 * 100, '% chance')

25.0 % chance


Note that you get the same result by adding the outcomes that make up the simple event *planned to purchase* (400 + 100 = 500) and calculating the probability of that *simple event*.

## Corollary of the Probability Axioms: General Addition Rule
To get the probability of the event $A$ or $B$, you need to consider the occurrence of either event $A$ or $B$ or both $A$ and $B$.
- $P(A$ or $B) = P(A) + P(B) - P(A$ and $B)$
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
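To see why the overlap is subtracted, here is a small sketch on a hypothetical example that is not taken from the tables in this notebook: one roll of a fair die, with $A$ = "even number" and $B$ = "greater than 3". Python sets make the double counting visible; `p` is an illustrative helper for equally likely outcomes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair six-sided die
A = {2, 4, 6}                      # even number
B = {4, 5, 6}                      # greater than 3


def p(event: set) -> Fraction:
    """Equally likely outcomes: P(E) = n(E) / n(S)."""
    return Fraction(len(event), len(sample_space))


naive = p(A) + p(B)            # counts outcomes 4 and 6 twice
rule = p(A) + p(B) - p(A & B)  # general addition rule
direct = p(A | B)              # count the union {2, 4, 5, 6} directly

print(naive, rule, direct)     # 1 2/3 2/3
```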

### Example 1
Use the Apple iPhone Xs Max purchase table.

What is the probability of planning to purchase Apple iPhone Xs Max or placing an order?

### Solution

In [8]:
planned = 500/2000 * 100
placed = 600/2000 * 100
orprob = planned + placed - (400/2000 * 100)
print(orprob)


35.0


## Conditional probability refers to the probability of event A, given information about the occurrence of another event B

Probability of $A$ given $B$ is written as $P(A | B)$.

$P(A | B) = \frac{P(A  \cap  B)}{P(B)}$

where $P(A \cap B) = P(A \text{ and } B)$ is the joint probability defined earlier.
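A minimal sketch of the definition, assuming both probabilities are estimated from counts in a contingency table. The helper name `conditional` is hypothetical; the usage lines reuse counts from this notebook's tables (the divorced row of the loan table in Example 2, and the iPhone table).

```python
from fractions import Fraction


def conditional(n_a_and_b: int, n_b: int, n_total: int) -> Fraction:
    """P(A | B) = P(A ∩ B) / P(B), with both probabilities estimated from counts.

    Raises ZeroDivisionError if n_b is 0, since P(A | B) is undefined when P(B) = 0.
    """
    p_a_and_b = Fraction(n_a_and_b, n_total)
    p_b = Fraction(n_b, n_total)
    return p_a_and_b / p_b


# Loan table: 13 divorced customers defaulted, out of 50 divorced, 1000 customers
print(conditional(13, 50, 1000))    # 13/50
# iPhone table: 400 planned and ordered, 500 planned, 2000 people
print(conditional(400, 500, 2000))  # 4/5
```

Because `n_total` cancels out, $P(A \mid B)$ is also simply $n(A \cap B) / n(B)$, which is why the solutions below can divide counts directly.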

### Example 1
Use the Apple iPhone Xs Max purchase table.

Find the probability that a person actually placed an order, given that they planned to purchase.

### Solution

In [9]:
planned = 500/2000
placed = 600/2000
both = 400/2000
print(both/planned * 100, '% chance')

80.0 % chance


### Example 2
The following table describes the loan default status of a bank's customers by marital status.

| Marital Status | Loan Defaulted | Loan No Default | Marginal Total |
| ----------- | ------ | ------- | -------- |
| Single | 42 | 258 | 300 |
| Married | 60 | 590 | 650 |
| Divorced | 13 | 37 | 50 |
| Marginal Total | 115 | 885 | 1000 |


Based on the above table, calculate the probability of default given that the customer is divorced.

### Solution

In [10]:
default = 115/1000
divorced = 50/1000
both = 13/1000
print(round(both/divorced*100), '% chance (rounded)')

26 % chance (rounded)


## Independent Events
Events $A$ and $B$ are independent when $P(A|B) = P(A)$,
where 
* $P(A|B)$ is the conditional probability of $A$ given $B$
* $P(A)$   is the marginal probability of $A$

So for independent events, the rule for joint probability simplifies: when events $A$ and $B$ are independent, $P(A\cap B)= P(A) * P(B)$.
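A sketch that checks independence by brute force for Example 1 below, assuming a fair die so all 36 ordered outcomes of two rolls are equally likely. It compares the counted joint probability with the product of the marginals, and also checks the definition $P(A|B) = P(A)$ directly. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first roll, second roll)
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """P(event), where `event` is a predicate on a (first, second) outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def first_is_6(o):
    return o[0] == 6


def second_is_6(o):
    return o[1] == 6


p_joint = prob(lambda o: first_is_6(o) and second_is_6(o))
p_product = prob(first_is_6) * prob(second_is_6)
# Conditional check: P(second is 6 | first is 6) should equal P(second is 6)
p_cond = p_joint / prob(first_is_6)

print(p_joint, p_product, p_cond)  # 1/36 1/36 1/6
```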

### Example 1

Your experiment is rolling a six-sided die twice, one roll after the other.

- a) What is the sample space of the experiment?
- b) What is the probability of getting a "6" in two consecutive trials when rolling a six-sided die?
- c) Are these two events independent? 

### Solution
a) What is the sample space of the experiment?

In [12]:
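# The sample space is every ordered pair (first roll, second roll): (1, 1), (1, 2), ..., (6, 6); count them
print(6*6)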

36


b) What is the probability of getting a "6" in two consecutive trials when rolling a six-sided die?

In [11]:
print((1/6) * (1/6))

0.027777777777777776


c) Are these two events independent? 

In [14]:
# Yes: the result of the first roll does not change the chances for the second roll
print('yes')

yes


### Exercise 2
What is the probability of getting a 2 on each of three dice when they are rolled?

### Solution

In [17]:
samplespace = 6**3
print(1/samplespace)

0.004629629629629629


### Exercise 3
You throw a die three times. What is the probability that one or more of your throws will come up with a 1?
### Solution

In [28]:
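samplespace = 6**3   # outcomes of three throws
x = 5**3             # outcomes with no 1 on any throw
# Complement rule: P(at least one 1) = 1 - P(no 1s)
print((1 - (x/samplespace))* 100)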

42.129629629629626
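The complement trick above can be double-checked by listing every outcome, assuming a fair die. This sketch enumerates all $6^3 = 216$ ordered throws with `itertools.product` and counts those containing at least one 1.

```python
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=3))  # 6**3 = 216 outcomes
at_least_one_1 = [t for t in throws if 1 in t]

p_enum = Fraction(len(at_least_one_1), len(throws))
p_complement = 1 - Fraction(5, 6) ** 3         # 1 - P(no 1 on any throw)

print(p_enum, p_complement)  # 91/216 91/216
```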


### Exercise 4
The following table describes the loan default status of a financial institution's customers by occupation status.

| Occupation Status | Loan defaulted | Loan non-default | Total |
| ----------------- | ---- | ----- | ---- |
| Self Employed     | 80 | 240 | 320 |
| Employed in Private Sector | 120 | 860 | 980 |
| Employed in Government Sector | 200 | 3000 | 3200 |
| Total | 400 | 4100 | 4500 | 


- a) Find the occupation status with the maximum joint probability of default.
- b) What is the probability of a loan defaulting?
- c) What is the conditional probability of default, given the occupation category of *self-employed*?

### Solution
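a) Find the occupation status with the maximum joint probability of default.

The joint probability $P(\text{occupation} \cap \text{default})$ divides each group's defaults by the grand total of 4500, not by the group's row total (dividing by the row total gives the conditional probability of default instead).

In [44]:
total = 4500
defaults = {'self': 80, 'private': 120, 'gov': 200}
# Joint probability P(occupation and default), as a percentage
joint = {k: round(v / total * 100, 2) for k, v in defaults.items()}
# Sort by the probability (the dict value), largest first
ranked = sorted(joint.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)

[('gov', 4.44), ('private', 2.67), ('self', 1.78)]

Employed in Government Sector has the maximum joint probability of default (200/4500, about 4.44%). By contrast, the self-employed have the highest conditional default rate, $P(\text{default} \mid \text{self-employed}) = 80/320 = 25\%$, which is part c).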


b) What is the probability of a loan defaulting?

In [45]:
print((400/4500) * 100)

8.88888888888889


c) What is the conditional probability of default, given the occupation category of *self-employed*?

In [1]:
print((80/320)* 100)

25.0


## Application of Probability Rules: Association Rule Mining

By using simple probability concepts such as joint probability and conditional probability, we can approach market basket analysis and recommender systems with association rule mining.
- **Market Basket Analysis** is used frequently by retailers to predict which products a customer is likely to buy together, to improve their sales. For example, if a customer buys bread, they are likely to buy jam or butter as well. 
- **Recommender Systems** are models that produce a list of recommendations to a customer on products such as movies, electronic items, etc. Amazon and other online retailers use these systems heavily. 
- **Association Rule Mining** is a method of finding associations between different entities in data. In a retail context, it finds association relationships among items that are frequently purchased together. 
- **Association** is a relationship of the form $X$ -> $Y$ ($X$ implies $Y$), where $X$ and $Y$ are disjoint sets of items (no item appears in both). The strength of the association between two item sets can be described by the support, confidence and lift. 


### The Data
| Invoice No | Milk | Bread | Butter | Jam |
| --- | ---- | ---- | ---- | ----- |
| 1 | Y | Y | N | N|
| 2 | N | Y | Y | N|
| 3 | N | Y | Y | N|
| 4 | N | Y | N | Y|
| 5 | Y | Y | N | N|
| 6 | Y | Y | N | N|
| 7 | N | Y | Y | N|
| 8 | N | Y | Y | N|
| 9 | N | Y | N | Y|
| 10 | Y | N | N | N|

In the above table, milk, bread, butter, and jam are the different products sold by the store. Y means the item is purchased and N means the item is not purchased. The strength of the association between two disjoint item sets can be measured using *support*, *confidence* and *lift*.


### Support between two sets is the joint probability of those events
* Support is the proportion of transactions in which items $A$ and $B$ are purchased together.
* The support of the association $A$ and $B$ is written 
$Support(A, B) = P(A \cap B) = \frac{n(A \cap B)}{\text{Total number of observations}}$, where $n(A \cap B)$ is the number of times item $A$ and item $B$ are purchased together.

### Example
For the association bread -> milk, what is the support of the relationship?

### Solution
$Support(Bread, Milk) =$

In [3]:
print((3/10)* 100)

30.0
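Counting by hand works for ten invoices, but the same calculation can be sketched in Python by storing each invoice as a set of purchased items (the "Y" entries of the table). The helper name `support` is illustrative; `items <= t` is Python's subset test, so an invoice counts when it contains every requested item.

```python
# Transactions from the table above: the set of items on each invoice
transactions = [
    {"milk", "bread"},    # 1
    {"bread", "butter"},  # 2
    {"bread", "butter"},  # 3
    {"bread", "jam"},     # 4
    {"milk", "bread"},    # 5
    {"milk", "bread"},    # 6
    {"bread", "butter"},  # 7
    {"bread", "butter"},  # 8
    {"bread", "jam"},     # 9
    {"milk"},             # 10
]


def support(items: set) -> float:
    """Fraction of invoices that contain every item in `items`."""
    return sum(items <= t for t in transactions) / len(transactions)


print(support({"bread", "milk"}))  # 0.3
print(support({"bread"}))          # 0.9
print(support({"milk"}))           # 0.4
```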


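### Confidence is the conditional probability of purchasing item A, given item B is purchased
* For a rule B -> A, item $B$ is the rule body (antecedent) and item $A$ is the rule head (consequent). For bread -> milk, $B$ is bread and $A$ is milk.
* Confidence is the probability of purchasing item $A$ given that item $B$ is purchased.
* The confidence of the association is written 
$Confidence(A, B) = P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)}$ 

### Example
For the association bread -> milk, what is the confidence of the relationship?

### Solution
$Confidence(Milk, Bread) = P(Milk \mid Bread) = \frac{n(Milk \cap Bread)}{n(Bread)} = \frac{3}{9}$

In [4]:
# 3 invoices contain both bread and milk; 9 invoices contain bread
print(round((3/9)*100, 2))

33.33

The value $\frac{3}{4} = 0.75$ is the confidence of the reverse rule milk -> bread, $P(Bread \mid Milk)$, because only 4 invoices contain milk.


### Lift compares a rule's confidence with the confidence expected if the items were independent
* With the lift value you can interpret the importance of a rule. 
* The lift value of an association rule is the ratio of the confidence of the rule to the expected confidence of the rule.
* The expected confidence of a rule is the product of the support values of the rule body and rule head divided by the support of the rule body, which is simply the support of the rule head. In our example, A is the rule head and B is the rule body.
* The lift of the association $A$ and $B$ is written $Lift(A,B) = \frac{P(A \cap B)}{P(A) P(B)}$
* A lift above 1 means the items are bought together more often than independence would predict; a lift below 1 means less often.

### Example
For the association bread -> milk, what is the lift of the relationship?

### Solution
$Lift(Milk, Bread) = \frac{P(Milk \cap Bread)}{P(Milk) P(Bread)} = \frac{0.3}{0.4 \times 0.9}$

In [7]:
print(((3/10)/((4/10)*(9/10))))

0.8333333333333333


### Association rules can be generated based on threshold values of support, confidence and lift.

Assume that the threshold values are given for these measures as follows:
* Support    ≥ 0.25
* Confidence ≥ 0.50
* Lift       > 1

For bread -> milk, what are the three values you calculated? 
* Support    is 0.30
* Confidence is 0.33
* Lift       is 0.83


Is this rule qualified to be an association rule? The rule qualifies only if the support, confidence, and lift all meet their thresholds.
* No. Support (0.30) meets its threshold, but confidence (0.33 < 0.50) and lift (0.83 < 1) do not. A lift below 1 means bread and milk are bought together slightly less often than they would be if the two purchases were independent.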
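The three measures and the threshold check can be put together in one sketch. It assumes the standard definitions for a rule body -> head (confidence = support(body ∪ head) / support(body), lift = confidence / support(head)) and reads the support and confidence thresholds as minimums (≥ 0.25 and ≥ 0.50); both are assumptions stated here rather than rules from a particular library. `Fraction` keeps the answers exact.

```python
from fractions import Fraction

# Item sets on invoices 1 to 10 from the table above
transactions = [
    {"milk", "bread"}, {"bread", "butter"}, {"bread", "butter"},
    {"bread", "jam"}, {"milk", "bread"}, {"milk", "bread"},
    {"bread", "butter"}, {"bread", "butter"}, {"bread", "jam"},
    {"milk"},
]


def support(items: set) -> Fraction:
    """Share of invoices containing every item in `items`."""
    return Fraction(sum(items <= t for t in transactions), len(transactions))


def confidence(body: set, head: set) -> Fraction:
    """P(head | body) for the rule body -> head."""
    return support(body | head) / support(body)


def lift(body: set, head: set) -> Fraction:
    """Confidence divided by the expected confidence, support(head)."""
    return confidence(body, head) / support(head)


body, head = {"bread"}, {"milk"}
sup, conf, lft = support(body | head), confidence(body, head), lift(body, head)
print(sup, conf, lft)          # 3/10 1/3 5/6
print(confidence(head, body))  # 3/4, the reverse rule milk -> bread

# Assumed thresholds: support >= 0.25, confidence >= 0.50, lift > 1
qualifies = sup >= Fraction(1, 4) and conf >= Fraction(1, 2) and lft > 1
print(qualifies)               # False
```

Note that lift is symmetric (bread -> milk and milk -> bread both give 5/6) while confidence is not, which is why the direction of the rule matters only for confidence.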

## Bayes' theorem

Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

Mathematically, we define Bayes' theorem as

$P(B_{i}\mid A) = \frac{P(A \mid B_{i}) P(B_{i})}{P(A \mid B_{1})P(B_{1}) + P(A \mid B_{2}) P(B_{2}) + P(A \mid B_{3}) P(B_{3}) + \cdots + P(A \mid B_{k}) P(B_{k})}$

where 
* $B_{i}$ is the $i$-th event of $k$ mutually exclusive and collectively exhaustive events
* $A$ is the new event that might impact $P(B_{i})$
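A minimal sketch of the formula above, assuming the priors $P(B_i)$ and likelihoods $P(A \mid B_i)$ are given as parallel lists. The function name `bayes_posterior` is hypothetical. The denominator is the law of total probability: the sum of all the numerator terms, so the posteriors always add up to 1.

```python
from typing import List


def bayes_posterior(priors: List[float], likelihoods: List[float]) -> List[float]:
    """Return P(B_i | A) for every i.

    priors[i]      = P(B_i), for mutually exclusive, collectively exhaustive B_i
    likelihoods[i] = P(A | B_i)
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per prior")
    if abs(sum(priors) - 1) > 1e-9:
        raise ValueError("priors must sum to 1 (collectively exhaustive events)")
    # Numerator terms P(A | B_i) P(B_i); their sum is the denominator P(A)
    terms = [p * lk for p, lk in zip(priors, likelihoods)]
    p_a = sum(terms)
    return [t / p_a for t in terms]


# Example 1 below: companies X, Y, Z and their defect rates
posterior = bayes_posterior([0.75, 0.15, 0.10], [0.04, 0.06, 0.08])
print([round(p, 4) for p in posterior])  # [0.6383, 0.1915, 0.1702]
```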

### Example 1

A certain type of electronic equipment is manufactured by three companies, X, Y and Z. 
* 75% are manufactured by X
* 15% are manufactured by Y
* 10% are manufactured by Z

The defect rates of electronic equipment manufactured by companies X, Y and Z are 4%, 6% and 8% respectively.

If a randomly selected piece of equipment is found to be defective, what is the probability that it was manufactured by X?

### Solution
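* Let $P(X)$, $P(Y)$ and $P(Z)$ be the probabilities that the electronic equipment is manufactured by companies X, Y and Z respectively. 
* Let $P(D)$ be the probability that the electronic equipment is defective.

We are interested in calculating the probability $P(X|D)$. Use Bayes' rule to calculate the probability:

$P(X \mid D) = \frac{P(D \mid X)P(X)}{P(D \mid X)P(X) + P(D \mid Y)P(Y) + P(D \mid Z)P(Z)} = \frac{0.04 \times 0.75}{0.04 \times 0.75 + 0.06 \times 0.15 + 0.08 \times 0.10} = \frac{0.030}{0.047}$

In [10]:
# P(X|D) = P(D|X)P(X) / [P(D|X)P(X) + P(D|Y)P(Y) + P(D|Z)P(Z)]
print(round(0.04*0.75 / (0.04*0.75 + 0.06*0.15 + 0.08*0.10) * 100, 2), '% chance')

63.83 % chance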


### Example 2
Given the following statistics, what is the probability that a woman over 50 has breast cancer if she has a positive mammogram result?

a) 1% of women over 50 have breast cancer.
b) 90% of women who have breast cancer test positive on mammograms.
c) 8% of women who do not have breast cancer will have a false positive.

Let 
* event A   denote woman has breast cancer
* event ~A  denote woman has no breast cancer
* event T   denote mammogram test is positive 
* event ~T  denote mammogram test is negative 


### Solution
What is the probability that a woman has cancer if she has a positive mammogram result?

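$P(A \mid T) = \frac{P(T \mid A)P(A)}{P(T \mid A)P(A) + P(T \mid \sim A)P(\sim A)} = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.08 \times 0.99}$, which per 1,000 women is $\frac{9}{9 + 79.2}$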

In [11]:
print((9/(9+79.2))*100)

10.204081632653061
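The numbers 9 and 79.2 in the solution come from imagining a concrete group of women, an approach often called natural frequencies. This sketch assumes a hypothetical group of 1,000 women over 50 and applies the three given rates to it; the variable names are illustrative.

```python
population = 1000                          # hypothetical group of women over 50
with_cancer = 0.01 * population            # about 10 women
without_cancer = population - with_cancer  # about 990 women

true_positives = 0.90 * with_cancer        # about 9 positive tests with cancer
false_positives = 0.08 * without_cancer    # about 79.2 positive tests without cancer

# P(A | T): the share of all positive tests that are true positives
p_cancer_given_positive = true_positives / (true_positives + false_positives)
print(round(p_cancer_given_positive * 100, 2), '% chance')  # 10.2 % chance
```

Most positive results are false positives because the 8% false-positive rate applies to the much larger group of women without cancer.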


### Example 3
A1 Electronic World is considering marketing a new model of televisions. In the past, 40% of the new model televisions have been successful, and 60% have been unsuccessful. Before introducing the new model television, the marketing research department conducts an extensive study and releases a report, either favorable or unfavorable.

In the past, 80% of the successful new-model televisions had received favorable market reports, and 30% of the unsuccessful new-model televisions had received favorable reports. For the new model of television under consideration, the marketing research department has issued a favorable report.

What is the probability that the television is successful given a favorable report?

Let the following events be represented as follows:
* $S$  denote the successful television
* $US$ denote the unsuccessful television
* $F$  denote the favorable report
* $UF$ denote the unfavorable report


### Solution
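$P(S \mid F) = \frac{P(F \mid S)P(S)}{P(F \mid S)P(S) + P(F \mid US)P(US)} = \frac{0.8 \times 0.4}{0.8 \times 0.4 + 0.3 \times 0.6} = \frac{0.32}{0.50}$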

In [12]:
print((320/500)*100)

64.0


### Exercise 5

The following probabilities describe whether a person has a certain disease and the result of the diagnostic test conducted to determine whether the person has the disease.

* Let $D$ be the event of having the disease and $D'$ the event of not having the disease.
* Let $T$ be the event of a positive test result and $T'$ the event of a negative test result.

* $P(D) = 0.03$
* $P(D') = 0.97$
* $P(T|D) = 0.90$
* $P(T|D') = 0.01$

What is the probability that the disease is actually present given a positive test result?


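### Solution

$P(D \mid T) = \frac{P(T \mid D)P(D)}{P(T \mid D)P(D) + P(T \mid D')P(D')} = \frac{0.90 \times 0.03}{0.90 \times 0.03 + 0.01 \times 0.97} = \frac{0.027}{0.0367}$
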
In [13]:
print(27/36.7)

0.7356948228882834


### Exercise 6
A1 Construction company is determining whether it should submit a bid for a new shopping mall. In the past, A1's main competitor, Pyramid Construction company, has submitted bids 70% of the time. If Pyramid Construction company does not bid on a job, the probability that A1 Construction company will get the job is 50%. If Pyramid Construction company bids on a job, the probability that A1 Construction company will get the job is 25%.

* a) If A1 Construction company gets the job, what is the probability that Pyramid Construction company did not bid?
* b) What is the probability that A1 Construction company will get the job?


### Solution
a) If A1 Construction company gets the job, what is the probability that Pyramid Construction company did not bid?

In [None]:
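# Bayes' rule: P(no bid | job) = P(job | no bid)P(no bid) / P(job)
print(round(0.5*0.3 / (0.5*0.3 + 0.25*0.7) * 100, 2), '% chance')

46.15 % chance
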

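b) What is the probability that A1 Construction company will get the job?

In [1]:
# Law of total probability: P(job) = P(job | bid)P(bid) + P(job | no bid)P(no bid)
print(round((0.25*0.7 + 0.5*0.3) * 100, 2), '% chance')

32.5 % chance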


###  Exercise 7
The following table describes the loan default status of a financial institution's customers by occupation status.

| Occupation Status | Loan defaulted | Loan non-default | Total |
| ----------------- | ---- | ----- | ---- |
| Self Employed     | 80 | 240 | 320 |
| Employed in Private Sector | 120 | 860 | 980 |
| Employed in Government Sector | 200 | 3000 | 3200 |
| Total | 400 | 4100 | 4500 | 

* a) What is the probability that a loan does not default?
* b) What is the conditional probability of non-default, given the occupation category Employed in Government Sector?

### Solution
a) What is the probability that a loan does not default?

In [2]:
print(4100/4500)

0.9111111111111111


b) What is the conditional probability of non-default, given the occupation category Employed in Government Sector?

In [3]:
print(3000/3200)

0.9375


### Exercise 8
You flip a coin and roll a six-sided die. What is the probability that the coin comes up tails and the die comes up 6?

### Solution

In [5]:
print(((1/6)*(1/2))*100)

8.333333333333332
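As a cross-check, assuming a fair coin and a fair die, this sketch lists the combined sample space of 2 × 6 = 12 equally likely (coin, die) pairs and counts the single favorable outcome. The result matches the product rule for independent events used above.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair, all 12 equally likely
outcomes = list(product(["heads", "tails"], range(1, 7)))
favorable = [o for o in outcomes if o == ("tails", 6)]

p = Fraction(len(favorable), len(outcomes))
print(p)                                      # 1/12
# Independence: the same answer as P(tails) * P(6)
print(p == Fraction(1, 2) * Fraction(1, 6))   # True
```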
