## Introduction to Basic Statistics Using Python

### Chapters 
#### 1. Basic Probability
#### 2. Probability Distributions - Discrete and Normal Distribution
#### 3. Sampling and Sampling Distributions
#### 4. Confidence Interval Estimation
#### 5. Introduction to Hypothesis Testing

## Chapter 1. Basic Probability

**We divide this topic into the following sub-topics:**

##### a. Simple probability
##### b. Joint probability
##### c. Marginal probability
##### d. Conditional probability 
##### e. Bayes Theorem probability

*Let us focus on the first type of probability.*

#### a. Simple probability

### Simple Probability refers to the probability of occurrence of a simple event

Probability of an event X, P(X) is given by
$P(X) = \frac{Number \quad of \quad observations \quad in \quad favor \quad of \quad an \quad event \quad X}{Total \quad Number \quad of \quad observations}$ 

#### Example 1:

The data collected by an advertising agency has revealed that out of 2800 visitors to a website, 56 clicked on 1 advertisement, 30 clicked on 2 advertisements, 14 clicked on 3 advertisements, and the remaining visitors did not click on any advertisement.

Calculate
* a) The probability that a visitor to the website will not click on any  advertisement.
* b) The probability that a visitor to the website will click on an advertisement.
* c) The probability that a visitor to the website will click on more than one advertisement.

## Solution

#### a) The probability that a visitor to the website will not click on any  advertisement.

* Total number of visitors = 2800
* Number of visitors who clicked on 1 advertisement  = 56
* Number of visitors who clicked on 2 advertisements = 30
* Number of visitors who clicked on 3 advertisements = 14
* Number of visitors who clicked on 1 or more advertisements = 56 + 30 + 14 = 100
* Number of visitors who did not click on any advertisement = Total number of visitors - Number of visitors who clicked on 1 or more advertisements = 2800 - 100 = 2700

In [1]:
FE1 = 2800 - 100 # Favorable Event  ====> Number of visitors who did not click on any advertisement.
TE  = 2800       # Total number of events
PE1 = round(FE1 / TE,4)
print('a. The probability that a visitor to the website will not click on any advertisement is %1.4f' % PE1)

a. The probability that a visitor to the website will not click on any advertisement is 0.9643


#### b) The probability that a visitor to the website will click on an advertisement.

In [2]:
FE2 = 56 + 30 + 14 # Favorable Event  ====> Number of visitors who clicked on at least one advertisement.
TE  = 2800
PE2 = round(FE2 / TE,4)
print('b. The probability that a visitor to the website will click on an advertisement is %1.4f' % PE2)

b. The probability that a visitor to the website will click on an advertisement is 0.0357


#### c) The probability that a visitor to the website will click on more than one advertisement.

More than one advertisement means 2 or 3 advertisements in our case.

In [3]:
# Favorable Event  ====> 30 visitors clicked on 2 advertisements and 14 visitors clicked on 3 advertisements
FE3 = 30 + 14
TE  = 2800
PE3 = round(FE3 / TE,4) 
print('c. The probability that a visitor to the website will click on more than one advertisement is %1.4f' % PE3)

c. The probability that a visitor to the website will click on more than one advertisement is 0.0157
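
The three parts of Example 1 can also be computed from a single table of counts. The following is a minimal sketch under that idea; the names `clicks`, `prob`, `p_none`, `p_any` and `p_more_than_one` are illustrative and do not appear in the cells above.

```python
# Click counts from Example 1: number of advertisements clicked -> number of visitors
total_visitors = 2800
clicks = {1: 56, 2: 30, 3: 14}
clicks[0] = total_visitors - sum(clicks.values())  # visitors who clicked on nothing

# Simple probability of each outcome: favorable count / total count
prob = {k: n / total_visitors for k, n in clicks.items()}

p_none = prob[0]
p_any = 1 - p_none                    # complement of "no click"
p_more_than_one = prob[2] + prob[3]   # mutually exclusive outcomes, so probabilities add

print('P(no click)           = %1.4f' % p_none)
print('P(at least one click) = %1.4f' % p_any)
print('P(more than one)      = %1.4f' % p_more_than_one)
```

Because the four outcomes (0, 1, 2 or 3 clicks) cover every visitor exactly once, the values in `prob` sum to 1, which is a quick sanity check on the counts.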


#### Exercise 1

As a marketing manager for A1 Computer Electronics company, you are analyzing the results of a survey on the intent to purchase a new model laptop (say, a MacBook Pro 2018 15-inch). 1000 existing customers were asked whether they plan to buy it during the next six months.
250 customers expressed their intent to buy in the next six months.

Determine the probability of selecting a customer who planned to purchase the MacBook.

#### Exercise 2

A standard six-sided die has six faces. Each face shows a different number of dots: one, two, three, four, five or six. 
If you roll a die, what is the probability that you will get 
* a) a face with five dots?
* b) a face with more than 1 dot?
* c) a face with fewer than 3 dots?

*Let us focus on the second type of probability.*
##### b. Joint probability
*Joint Probability refers to the probability of occurrence involving two or more events*

Let A and B be two events in a sample space. Then the joint probability of the two events, denoted by P(A $\cap$ B), is given by $P(A \cap B) = \frac{Number \quad of \quad observations \quad in \quad A \cap B } {Total \quad Number \quad of \quad observations}$ 

#### Example 2:

At a popular company service center, a total of 100 complaints were received. 80 customers complained about late delivery of the items and 60 complained about poor product quality.

* a) Calculate the probability that a customer complaint will be about both product quality and late delivery.
* b) What is the probability that a complaint will be only about  late delivery?

### Solution:

#### a) Calculate the probability that a customer complaint will be about both product quality and late delivery

Let
*    L    = Late delivery
*    Q    = Poor quality
*    n(L) = Number of cases in favour of L =  80
*    n(Q) = Number of cases in favour of Q =  60
*    N    = Total Number of complaints     = 100    

$n(L \cap Q)$ = (80 + 60) - 100 = 40 

Probability that a customer complaint will be about both product quality and late delivery = $P(L \cap Q)$

$P(L \cap Q) =  \frac{n(L \cap Q)} {Total \quad Number \quad of \quad observations}$ 

In [4]:
FE4 = (80 + 60) - 100 # Favorable Event  ====> Number of complaints about both late delivery and poor quality
TE  = 100
PE4 = round(FE4 / TE,4) 
print('a. The probability that a customer complaint will be about\n\
       both product quality and late delivery is %1.4f' % PE4)

a. The probability that a customer complaint will be about
       both product quality and late delivery is 0.4000


#### b. What is the probability that a complaint will be only about late delivery

In [5]:
# FE5 is the number of complaints about poor quality (with or without late delivery)
FE5 = 60
TE  = 100
PE5 = round(FE5 / TE,4) 
PE6 = 1 - PE5
# Every complaint is about late delivery, poor quality or both, so 1 - PE5 is the
# probability that a complaint is only about late delivery, i.e. (80 - 40) / 100
print('b. The probability that a complaint will be\n\
         only about late delivery is %1.4f' % PE6)

b. The probability that a complaint will be
         only about late delivery is 0.4000
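
The counts used in Example 2 follow from the inclusion-exclusion principle. The sketch below makes that step explicit; it assumes, as the solution above implicitly does, that each of the 100 complaints mentions late delivery, poor quality or both. The variable names are illustrative.

```python
n_L = 80   # complaints about late delivery
n_Q = 60   # complaints about poor quality
N = 100    # total number of complaints

# Assumption: every complaint is about at least one of the two issues,
# so the union of L and Q covers all N complaints.
n_union = N

# Inclusion-exclusion: n(L or Q) = n(L) + n(Q) - n(L and Q)
n_both = n_L + n_Q - n_union
n_only_L = n_L - n_both
n_only_Q = n_Q - n_both

print('P(L and Q) = %1.4f' % (n_both / N))
print('P(only L)  = %1.4f' % (n_only_L / N))
print('P(only Q)  = %1.4f' % (n_only_Q / N))
```

If some complaints were about neither issue, `n_union` would be smaller than `N` and the overlap could not be recovered from these totals alone.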


#### Example 3

| Planned to purchase Apple iPhone Xs Max | Actually placed an order for Apple iPhone Xs Max - Yes | Actually placed an order for Apple iPhone Xs Max - No | Total |
| ------------- | ------------ | ---------- | -----|
| Yes | 400 | 100 | 500 |
| No | 200 | 1300 | 1500 |
| Total | 600 | 1400 | 2000 |

Calculate the joint probability of the people who planned to purchase and actually placed an order.


### Solution

You can observe from the above table that 400 out of 2000 people both planned to purchase and actually placed an order for Apple iPhone Xs Max.

In [6]:
# FE6 is the number of people who planned to purchase and actually placed an order for Apple iPhone Xs Max
FE6 = 400
TE  = 2000
PE7 = round(FE6 / TE,4) 
print('Joint probability of the people who planned to purchase and actually placed an order is %1.4f' % PE7)

Joint probability of the people who planned to purchase and actually placed an order is 0.2000


#### Example 4

Anamika is playing a board game, a tabletop game in which counters or pieces are moved or placed on a pre-marked surface or "board" according to a set of rules. Two six-faced dice are rolled on each turn, and their total decides, among other things, how many steps a player moves their token.

It is her turn, and she wants to roll exactly a twelve to reach her goal.

The only way to get that twelve is to roll a six on each die. Since the outcomes of the two dice are independent events, we can use the joint probability formula for independent events to calculate her chances of success. 

**Formula:**

P(X, Y) = P(X) * P(Y), where X and Y are the events of getting a six on the first die and on the second die respectively, and P(X) and P(Y) are the probabilities of those events.

* P(X)   = 1/6 (a die has six faces, numbered from 1 to 6)
* P(Y)   = 1/6
* P(X,Y) = P(X) * P(Y) = 1/36

In [7]:
PX  = 1/6 # = P(X)
PY  = 1/6 # = P(Y)
PXY = PX * PY # PXY = P(X, Y)
print('Joint probability of getting a six on both dice when two dice are rolled is %1.4f' % PXY)

Joint probability of getting a six on both dice when two dice are rolled is 0.0278
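
The product rule used in Example 4 can be cross-checked by listing every outcome of two dice and counting the ones that sum to twelve. This is a small verification sketch using the standard library; the names `outcomes`, `favorable` and `p_twelve` are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely (first die, second die) pairs
outcomes = list(product(faces, repeat=2))

# Outcomes whose total is exactly twelve
favorable = [pair for pair in outcomes if sum(pair) == 12]

p_twelve = Fraction(len(favorable), len(outcomes))
print('Favorable outcomes:', favorable)
print('P(total = 12) = %s = %1.4f' % (p_twelve, float(p_twelve)))
```

Counting outcomes and multiplying the two individual probabilities agree here because the dice do not influence each other.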


#### Exercise 3

#### The following table describes loan default status by gender of the borrower. Find the gender that has the maximum joint probability of default.

| Gender | Loan status: Default | Loan status: No default | Total |
| ------ | ----------- | ---------- | ----- |
| Male | 60 | 590 | 650 |
| Female | 45 | 295 | 350 |
| Total | 105 | 885 | 1000 |


### c. Marginal probability

#### c) Marginal probability refers to the probability of an event without any condition

P(A) = P(A and $B_{1}$) + P(A and $B_{2}$) + P(A and $B_{3}$) + ... + P(A and $B_{k}$) 
where $B_{1}$, $B_{2}$, $B_{3}$, ..., $B_{k}$ are k mutually exclusive and collectively exhaustive events, defined as follows:

* Two events are mutually exclusive if both the events cannot occur simultaneously.
* A set of events are collectively exhaustive if one of the events must occur.

#### Example 5

Use the **Apple iPhone Xs Max** purchase table from Example 3.
What is the probability that a person planned to purchase **Apple iPhone Xs Max**?


P(planned to purchase Apple iPhone Xs Max) 
 =   P(Planned to purchase Apple iPhone Xs Max and placed an order) + 
     P(Planned to purchase Apple iPhone Xs Max and not placed an order) 

In [8]:
# Let P  = P(planned to purchase Apple iPhone Xs Max)
#     P1 = P(Planned to purchase Apple iPhone Xs Max and placed an order) 
#     P2 = P(Planned to purchase Apple iPhone Xs Max and not placed an order) 
P1 = 400 / 2000 
P2 = 100 / 2000
P  = P1 + P2
print('Marginal probability of the people who planned to purchase is %1.4f' % P)

Marginal probability of the people who planned to purchase is 0.2500


Note that you get the same result by adding the number of outcomes that make up the simple event *planned to purchase* and calculate the probability of that *simple event*.
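
As an illustration of this note, the sketch below computes the marginal probability in both ways from the Apple iPhone Xs Max table in Example 3. The dictionary layout and variable names are assumptions made for this sketch.

```python
# Counts from the Apple iPhone Xs Max table: (planned, ordered) -> number of people
counts = {
    ('Yes', 'Yes'): 400, ('Yes', 'No'): 100,
    ('No', 'Yes'): 200,  ('No', 'No'): 1300,
}
total = sum(counts.values())

# Joint probability of every cell in the table
joint = {cell: n / total for cell, n in counts.items()}

# Route 1: marginal probability as the sum of joint probabilities
p_planned_from_joint = sum(p for (planned, _), p in joint.items() if planned == 'Yes')

# Route 2: simple probability from the row total
p_planned_from_counts = (counts[('Yes', 'Yes')] + counts[('Yes', 'No')]) / total

print('Sum of joint probabilities   : %1.4f' % p_planned_from_joint)
print('Simple probability from total: %1.4f' % p_planned_from_counts)
```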

#### Exercise 4

#### Use the table in Exercise 3. Find the following marginal probabilities:
* a. Loan Status default
* b. Loan Status non-default
* c. Gender Male
* d. Gender Female

##### d. Conditional probability 

####  d.  Conditional Probability refers to the probability of event A, given information about the occurrence of another event B

Probability of A given B is written as P(A | B).

$P(A\mid B) = \frac{P(A \quad and \quad B)}{P(B)}$

where
* P(A and B) = Joint probability of A and B
* P(A)       = Marginal probability of A
* P(B)       = Marginal probability of B
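
As a minimal sketch of this formula, the cell below uses the Apple iPhone Xs Max table from Example 3 to compute the probability that a person placed an order, given that the person planned to purchase. The variable names are illustrative.

```python
total = 2000

# Joint probability: planned to purchase AND placed an order
p_planned_and_ordered = 400 / total

# Marginal probability: planned to purchase (row total of the table)
p_planned = 500 / total

# Conditional probability: P(ordered | planned) = P(planned and ordered) / P(planned)
p_ordered_given_planned = p_planned_and_ordered / p_planned

print('P(ordered | planned) is %1.4f' % p_ordered_given_planned)
```

The same value can be read directly from the table as 400 / 500, because conditioning on "planned to purchase" restricts attention to that row.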

#### Example 6

Based on the above table in Exercise 3, calculate the probability of default given Male.

### Solution

In [9]:
# P1 = P(Default and Male): 60 males defaulted out of 1000 borrowers in total
P1   = 60 / 1000
# P2 = P(Male)
P2   = 650 / 1000
# P3 = P(Default | Male) = P(Default and Male) / P(Male)
P3   = P1 / P2
print('P(Default | Male)  is %1.4f' % P3)   

P(Default | Male)  is 0.0923


### Independent Events

Two events, A and B are independent if and only if
P(A | B) = P(A), 

where 
* P(A|B) is the conditional probability of A given B
* P(A)   is the marginal probability of A

Example: A student getting an A grade in the final Stats exam and an A grade in the final Marketing exam, assuming that performance in one exam does not affect performance in the other.
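
The definition can be checked numerically on a contingency table. The sketch below tests whether "placed an order" is independent of "planned to purchase" in the Apple iPhone Xs Max table from Example 3; the variable names are illustrative, and `math.isclose` is used to avoid comparing floating-point numbers with `==`.

```python
import math

total = 2000

# Marginal probability of placing an order (column total of the table)
p_ordered = 600 / total

# Conditional probability of placing an order, given a plan to purchase
p_ordered_given_planned = 400 / 500

# Independence requires P(A | B) = P(A)
independent = math.isclose(p_ordered_given_planned, p_ordered)

print('P(ordered)           is %1.4f' % p_ordered)
print('P(ordered | planned) is %1.4f' % p_ordered_given_planned)
print('Independent events?  %s' % independent)
```

Here the two probabilities differ, so knowing that a person planned to purchase changes the probability that they placed an order, and the events are not independent.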

##### Example 7

What is the probability of getting a "6" in two consecutive trials when rolling a die?

For each roll of a die:
* Favorable events = {"6"}
* Possible outcomes = {"1","2","3","4","5","6"}
* Let P1 be the probability of getting a "6" in the first roll of the die.
* Let P2 be the probability of getting a "6" in the second roll of the die.
* Since the first roll of the die does not influence the second roll, these events are independent.

In [10]:
P1 = 1 / 6
P2 = 1 / 6
P   =  P1 * P2 # P = P(Getting a 6 in two consecutive rolls of dice)
print('Getting a 6 in two consecutive rolls of dice is %1.4f' % P) 

Getting a 6 in two consecutive rolls of dice is 0.0278


#### Exercise 5

**What is the probability of getting a 2 on the face of each of three dice when they are rolled?**

**Hint: A die has six faces with the values 1, 2, 3, 4, 5 and 6.**

##### Exercise 6

**You throw a die three times. What is the probability that one or more of your throws will come up with a 1?**

**Hint: You need to calculate the probability of getting a 1 on at least one of the throws.**

#### Exercise 7

**The following table describes loan default status at a financial institution by occupation status of the borrower.
Calculate the occupation status that has the maximum joint probability of default.**

| Occupation Status | Loan defaulted | Loan non-default | Total |
| ----------------- | ---- | ----- | ---- |
| Self Employed     | 80 | 240 | 320 |
| Employed in Private Sector | 120 | 860 | 980 |
| Employed in Government Sector | 200 | 3000 | 3200 |
| Total | 400 | 4100 | 4500 | 

#### Exercise 8

**Using the above contingency table, answer the following:**
* a. What is the probability of loan default?
* b. What is the conditional probability of default, given the occupation category **Self Employed**?

##### e. Bayes Theorem probability

#### e.  Bayes' Theorem is used to revise previously calculated probabilities based on new information

$P(B_{i}\mid A)$ = $\frac{P(A \mid B_{i}) P(B_{i})}{P(A \mid B_{1})P(B_{1}) + P(A \mid B_{2}) P(B_{2}) + P(A \mid B_{3}) P(B_{3}) + .. + P(A \mid B_{k}) P(B_{k})}$

where 
$B_{i}$ is the ith event of k mutually exclusive and collectively exhaustive events
A is the new event that might impact P($B_{i}$)

### Example 8

A certain type of electronic equipment is manufactured by three companies, X, Y and Z. 
* 75% are manufactured by X
* 15% are manufactured by Y
* 10% are manufactured by Z

The defect rates of the electronic equipment manufactured by companies X, Y and Z are 4%, 6% and 8% respectively.

If a randomly selected piece of equipment is found to be defective, what is the probability that it was manufactured by X?

* Let P(X),P(Y) and P(Z) be probabilities of the electronic equipment manufactured by companies X, Y and Z respectively. 
* Let P(D) be the probability of defective electronic equipment.

We are interested in calculating the probability P(X|D).

$P(X \mid D)$ = $\frac{P(D \mid X) P(X)} {P(D)}$

Expanding P(D) over the three manufacturers, Bayes' rule in our case is given below:

$P(X \mid D)$ = $\frac{P(D \mid X) P(X)} {P(D \mid X)P(X) + P(D \mid Y) P(Y) + P(D \mid Z) P(Z)}$

In [11]:
# Let P1 = P(D|X)
# Let P2 = P(X)
# Let P3 = P(D) = P(D|X)P(X) + P(D|Y)P(Y) + P(D|Z)P(Z)
# Let P  = P(X|D) = (P1 * P2) / P3

P1       =  0.04 # prob. of defective item manufactured by X
P2       =  0.75
P3       =  0.75 * 0.04 + 0.15 * 0.06 + 0.10 * 0.08 
P        =  round((P1 * P2)/P3,4)
print('P(X|D)  is %1.4f' % P)              

P(X|D)  is 0.6383
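
The same calculation extends to all three manufacturers at once. The helper below is a minimal sketch of the general formula for k mutually exclusive and collectively exhaustive events; the function name `bayes_posterior` and its arguments are hypothetical, introduced only for this illustration.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(B_i | A) for each i, given P(B_i) and P(A | B_i)."""
    # Numerators of Bayes' theorem: P(A | B_i) * P(B_i)
    joint = [prior * likelihood for prior, likelihood in zip(priors, likelihoods)]
    # Denominator: P(A), obtained by summing over all B_i
    evidence = sum(joint)
    return [j / evidence for j in joint]

priors = [0.75, 0.15, 0.10]        # P(X), P(Y), P(Z) from Example 8
defect_rates = [0.04, 0.06, 0.08]  # P(D|X), P(D|Y), P(D|Z) from Example 8

posteriors = bayes_posterior(priors, defect_rates)
for company, p in zip('XYZ', posteriors):
    print('P(%s|D)  is %1.4f' % (company, p))
```

The posteriors sum to 1, since a defective item must have come from exactly one of the three companies.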


### Example 9

Given the following statistics, what is the probability that a woman over 50 has breast cancer if she has a positive mammogram result?

a) 	1% of women over 50 have breast cancer.
b) 	90% of women who have breast cancer test positive on mammograms.
c) 	8% of women who do not have breast cancer will have a false positive.

Let 
* event A   denote woman has breast cancer
* event ~A  denote woman has no breast cancer
* event T   denote mammogram test is positive 
* event ~T  denote mammogram test is negative 

Let P(A) denote the probability of women over 50 years of age having breast cancer.
P(A)      = 0.01 
So, P(~A) = 1 - 0.01 = 0.99

Let P(T|A) denote the conditional probability of a positive mammogram result, given that the woman has breast cancer.

P(T|A)   = 0.9

Let P(T|~A) denote the conditional probability of a positive mammogram result, given that the woman does not have breast cancer.

P(T|~A)  = 0.08

We want P(A|T), the probability that a woman has breast cancer given a positive mammogram result:

$P(A \mid T)$ = $\frac{P(T \mid A) P(A)} {P(T \mid A)P(A) + P(T \mid \sim A) P(\sim A)}$

In [12]:
## Let P = P(A|T)
P = (0.9 * 0.01) / ((0.9 * 0.01) + (0.08 * 0.99)) 
print('The probability of a woman having cancer, given a positive test result is %1.4f' % P) 

The probability of a woman having cancer, given a positive test result is 0.1020
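
The low value is easier to interpret when the rates are turned into head counts. The sketch below applies the three statistics from Example 9 to a hypothetical group of 10,000 women over 50; the group size and variable names are assumptions chosen for illustration.

```python
population = 10000   # hypothetical group size, not part of the example data

with_cancer = population * 0.01             # 1% have breast cancer
without_cancer = population - with_cancer

true_positives = with_cancer * 0.90         # 90% of women with cancer test positive
false_positives = without_cancer * 0.08     # 8% of women without cancer test positive

# Among all positive results, the share that comes from women who have cancer
p_cancer_given_positive = true_positives / (true_positives + false_positives)

print('Positive results from women with cancer    : %d' % round(true_positives))
print('Positive results from women without cancer : %d' % round(false_positives))
print('P(cancer | positive) is %1.4f' % p_cancer_given_positive)
```

Because women without cancer vastly outnumber women with cancer, even a small false positive rate produces far more false positives than true positives.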


### Example 10

A1 Electronic World is considering marketing a new model of television. In the past, 40% of the new model televisions have been successful, and 60% have been unsuccessful. Before introducing the new model television, the marketing research department conducts an extensive study and releases a report, either favorable or unfavorable.

In the past, 80% of the successful new-model televisions had received favorable market reports, and 30% of the unsuccessful new-model televisions had received favorable reports. For the new model of television under consideration, the marketing research department has issued a favorable report.

**What is the probability that the television is successful given a favorable report?**

Let the following events be represented as follows:
* S  denote the successful television
* US denote the unsuccessful television
* F  denote the favorable report
* UF denote the unfavorable report

The equation (Bayes' Theorem) for this problem is:

$P(S\mid F)$ = $\frac{P(F\mid S)P(S)}{P(F\mid S)P(S) + P(F\mid US)P(US)}$

**Prior Probability** - Revealed by data in the past
* P(S)  = 0.40
* P(US) = 0.60

**Conditional Probability**

* $P(F\mid S)$   = 0.80    **Favorable report received when the new model is successful**
* $P(F\mid US)$  = 0.30    **Favorable report received when the new model is unsuccessful**

Using the equation given above,

$P(S\mid F)$ = $\frac{P(F\mid S)P(S)}{P(F\mid S)P(S) + P(F\mid US)P(US)}$

$P(S\mid F)$ = $\frac{(0.80) (0.40)}{(0.80)(0.40) + (0.30)(0.60)}$

In [13]:
# Let P = P(S∣F)
P = (0.80 * 0.40)/ ((0.80 * 0.40) + (0.30 * 0.60))
print('The probability that the television is successful given a favorable report is %1.4f' % P)

The probability that the television is successful given a favorable report is 0.6400


#### Exercise 9

**A1 Construction company is determining whether it should submit a bid for a new shopping mall. In the past, A1's main competitor, Pyramid Construction company, has submitted bids 70% of the time. If Pyramid Construction company does not bid on a job, the probability that A1 Construction company will get the job is 0.50.**

If the Pyramid Construction company bids on a job, the probability that A1 Construction company will get the job is 0.25.

* a. If A1 Construction company gets the job, what is the probability that Pyramid Construction company did not bid?

* b. What is the probability that A1 Construction company will get the job?

Hint: Use Bayes' theorem

#### Exercise 10

You flip a coin and roll a six-sided die. What is the probability that the coin comes up tails and the die comes up with a 6?

#### End of chapter