* ### [Permutations and Combinations](#Permutations_and_Combinations)
* ### [Conditional probability and Bayes Theorem](#bayes)
* ### [Distributions](#distributions)
* ### [Binomial distribution](#binomial)
* ### [Poisson distribution](#poisson)
* ### [Normal distribution](#normal)
* ### [Student's T-Distribution](#tstudent)
* ### [ANOVA - Analysis of Variance](#anova)
* ### [Linear Regression](#linearregression)
* ### [Multiple Regression](#multipleregression)
* ### [Chi Square Test](#chisquare)


<a id='Permutations_and_Combinations'></a>
### Permutations and Combinations

</div> 
    <div align="left"> 
        <img src="permutations_combinations.png" alt="Permutations_and_Combinations" style="width: 600px"/>  
    </div>

**[Permutations and Combinations with Python: Itertools](https://docs.python.org/3.1/library/itertools.html)**


In [1]:
import itertools 
l = list(itertools.permutations(range(1, 4))) 
print(l) 

[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
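Permutations count *ordered* arrangements, while combinations count *unordered* selections. The minimal sketch below uses an arbitrary set of 4 items taken 2 at a time. It compares the closed-form counts from `math.perm` and `math.comb` (Python 3.8+) with the tuples that `itertools` actually enumerates:

```python
import itertools
import math

items = [1, 2, 3, 4]
k = 2

# Ordered arrangements of k items: n! / (n - k)!
perms = list(itertools.permutations(items, k))
# Unordered selections of k items: n! / (k! * (n - k)!)
combs = list(itertools.combinations(items, k))

print(len(perms), math.perm(len(items), k))  # 12 12
print(len(combs), math.comb(len(items), k))  # 6 6
print(combs)
```

Each combination corresponds to `k! = 2` permutations, which is why 12 / 2 = 6.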


<a id='bayes'></a>
### Conditional probability
Conditional probability is the probability of event A given that event B has occurred. It is written as **𝑃(𝐴|𝐵)** and, when P(B) > 0, it equals P(A and B) / P(B).

</div> 
    <div align="left"> 
        <img src="conditionalprobaility.png" alt="Conditional probability" style="width: 300px; margin:0px 50px; border:1px solid black"/>
</div>

*Ex.: A company finds that out of every 100 projects, 48 are completed on time, 62 are completed under budget, and 16 are completed both on time and under budget. Given that a project is completed on time, what is the probability that it is under budget?*
</div> 
    <div align="left"> 
        <img src="conditionalprobailityexercise.png" alt="Conditional probability Exercise" style="width: 300px;"/>
</div>
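The exercise can be worked with the definition P(A|B) = P(A and B) / P(B), where A = "under budget" and B = "on time". This sketch uses exact fractions so the result is not obscured by rounding:

```python
from fractions import Fraction

total = 100
on_time = 48   # projects completed on time (event B)
both = 16      # on time AND under budget (A and B)

p_on_time = Fraction(on_time, total)
p_both = Fraction(both, total)

# P(under budget | on time) = P(on time and under budget) / P(on time)
p_under_given_on_time = p_both / p_on_time
print(p_under_given_on_time, float(p_under_given_on_time))  # 1/3 0.3333333333333333
```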


##### [Addition & Multiplication Rules](https://www.mathgoodies.com/lessons/vol6/dependent_events)

* ###### Addition 
When two events, A and B, are mutually exclusive, the probability that **A or B** will occur is the sum of their individual probabilities:\
P(A or B) = P(A) + P(B)\
When two events, A and B, are not mutually exclusive, the probability that **A or B** will occur is:\
P(A or B) = P(A) + P(B) - P(A and B)

* ###### Multiplication
When two events, A and B, are independent, the probability of **both occurring** is:\
P(A and B) = P(A) · P(B)\
When two events, A and B, are dependent, the probability of **both occurring** is:\
P(A and B)  =  P(A) · P(B|A)
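The four rules can be checked on familiar examples. The playing-card and coin scenarios below are illustrative choices that do not come from the original notes. Exact fractions keep the arithmetic readable:

```python
from fractions import Fraction

# Illustrative examples with a standard 52-card deck and fair coins
p_king = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_king_and_heart = Fraction(1, 52)  # the king of hearts is in both events

# Not mutually exclusive: subtract the overlap so it is counted once
p_king_or_heart = p_king + p_heart - p_king_and_heart
print(p_king_or_heart)  # 4/13

# Independent events: two fair coin flips both landing heads
p_two_heads = Fraction(1, 2) * Fraction(1, 2)
print(p_two_heads)  # 1/4

# Dependent events: two aces in a row, drawing without replacement
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221
```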

### Bayes Theorem
</div> 
    <div align="left">    
        <img src="bayes.png" alt="Bayes Theorem" style="width: 300px; margin:0px 50px; border:1px solid black"/> 
</div>

*Ex.: A company learns that 1 out of 500 of its products is defective, or 0.2%. The company buys a diagnostic tool that correctly identifies a defective part 99% of the time. If a part is diagnosed as defective, what is the probability that it really is defective?\
𝑃(𝐴|𝐵) probability of being defective if testing positive\
𝑃(𝐵|𝐴) probability of testing positive if defective\
𝑃(𝐴) probability of being defective\
𝑃(𝐵) probability of testing positive*
</div> 
    <div align="left">    
        <img src="bayesex1.png" alt="Bayes Exercise" style="width: 300px; margin:0px 50px; border:1px solid black"/>
        <img src="bayesex2.png" alt="Bayes Exercise" style="width: 300px; margin:0px 50px; border:1px solid black"/>
        <img src="bayesex3.png" alt="Bayes Exercise" style="width: 300px; margin:0px 50px; border:1px solid black"/>
        <img src="bayesex4.png" alt="Bayes Exercise" style="width: 300px; margin:0px 50px; border:1px solid black"/>
</div>
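Bayes' theorem states P(A|B) = P(B|A) · P(A) / P(B), and P(B) comes from the law of total probability. The problem statement does not say how often the tool flags a *good* part as defective. This sketch **assumes** a 1% false-positive rate, so the result depends on that assumption:

```python
p_defective = 1 / 500           # P(A)
p_pos_given_defective = 0.99    # P(B|A)
p_pos_given_good = 0.01         # P(B|not A): ASSUMED false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_pos = (p_pos_given_defective * p_defective
         + p_pos_given_good * (1 - p_defective))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_defective_given_pos = p_pos_given_defective * p_defective / p_pos
print(round(p_defective_given_pos, 4))  # 0.1656
```

Because defects are rare, most positive diagnoses come from the much larger pool of good parts. Under this assumption, only about 16.6% of flagged parts are actually defective.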

<a id='distributions'></a>
### Distributions

* **A distribution describes all of the probable outcomes of a variable**
* In a discrete distribution, the sum of all the individual probabilities must equal 1
* In a continuous distribution, the area under the probability curve equals 1  

Discrete probability distributions are described by **probability mass functions** (PMFs), e.g. the Uniform, Binomial and Poisson distributions.  

Continuous probability distributions are described by **probability density functions** (PDFs), e.g. the Normal, Exponential and Beta distributions.
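Both normalization properties can be checked numerically. This sketch sums a binomial PMF over every possible outcome and integrates the standard normal PDF over the real line with `scipy.integrate.quad`. The binomial parameters are arbitrary:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Discrete: PMF values over every possible outcome (0..n successes) sum to 1
n, p = 16, 1 / 6
pmf_total = stats.binom.pmf(np.arange(n + 1), n, p).sum()

# Continuous: the PDF integrates to 1 over the whole real line
pdf_area, _abs_error = quad(stats.norm.pdf, -np.inf, np.inf)

print(pmf_total, pdf_area)  # both approximately 1.0
```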

<a id='binomial'></a>
### Binomial distribution
Binomial means there are **two** discrete, mutually exclusive outcomes of a trial.  
A series of *n* trials follows a binomial distribution as long as:
1. the probability of success *p* is constant
2. trials are independent of one another

*Ex.: If you roll a die 16 times, what is the probability that a five comes up 3 times?*

In [2]:
from scipy.stats import binom

binom.pmf(3,16,1/6)

0.2423137603371325
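`binom.pmf` evaluates the binomial formula P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). A from-scratch version for the die example should agree with SciPy:

```python
from math import comb
from scipy.stats import binom

n, k, p = 16, 3, 1 / 6  # 16 rolls, exactly 3 fives, P(five) = 1/6

# Binomial PMF: C(n, k) * p^k * (1 - p)^(n - k)
manual = comb(n, k) * p**k * (1 - p) ** (n - k)

print(manual)
print(binom.pmf(k, n, p))  # should agree with the manual value
```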

<a id='poisson'></a>
### Poisson distribution
While a binomial distribution considers the number of successes out of n trials, a **Poisson distribution** considers the number of successes per unit of time (or any other continuous unit, e.g. distance) over the course of many units.  
The **cumulative mass function** (more commonly called the cumulative distribution function, CDF) is the sum of the individual probabilities up to and including k: P(X ≤ k) = P(X = 0) + P(X = 1) + … + P(X = k).


*Ex.: A warehouse **typically** receives **8** deliveries between 4 and 5pm on Friday.  
What is the probability that only 4 deliveries will arrive between 4 and 5pm this Friday?*

In [3]:
from scipy.stats import poisson

poisson.pmf(4,8)

0.057252288495362
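`poisson.pmf(k, mu)` implements P(X = k) = λ^k · e^(−λ) / k!, where λ is the average number of events per interval (here, 8 deliveries per hour). A hand-written version for comparison:

```python
from math import exp, factorial
from scipy.stats import poisson

lam, k = 8, 4  # average deliveries per hour, number of deliveries asked about

# Poisson PMF: lambda^k * e^(-lambda) / k!
manual = lam**k * exp(-lam) / factorial(k)
print(manual, poisson.pmf(k, lam))
```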

*Ex.:  A warehouse **typically** receives **8** deliveries between 4 and 5pm on Friday.  
What is the probability that **fewer than 3** will arrive between 4 and 5pm this Friday?*

In [4]:
from scipy.stats import poisson

poisson.cdf(2,8)

0.013753967744002971
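"Fewer than 3" means 0, 1 or 2 deliveries, which is why the call above is `poisson.cdf(2, 8)` and not `cdf(3, 8)`. Summing the individual PMF values gives the same number:

```python
from scipy.stats import poisson

lam = 8
# P(X < 3) = P(X <= 2) = P(0) + P(1) + P(2)
by_sum = sum(poisson.pmf(k, lam) for k in range(3))
print(by_sum, poisson.cdf(2, lam))
```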

*Ex.:  A warehouse **typically** receives **8** deliveries between 4 and 5pm on Friday.  
What is the probability that **no deliveries** arrive between 4:00 and 4:05 this Friday?*

In [5]:
from scipy.stats import poisson

poisson.pmf(0,8/12)

0.513417119032592
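The `8/12` comes from rescaling the rate to the length of the interval. A 5-minute window is 1/12 of an hour, so the expected count is 8/12 ≈ 0.67. For k = 0 the PMF reduces to e^(−λ):

```python
from math import exp
from scipy.stats import poisson

rate_per_hour = 8
window_minutes = 5
# Rescale the rate to the interval: 8 per 60 min -> 8/12 per 5 min
lam = rate_per_hour * window_minutes / 60

# P(X = 0) = lambda^0 * e^(-lambda) / 0! = e^(-lambda)
print(exp(-lam), poisson.pmf(0, lam))
```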

<a id='normal'></a>
### Normal distribution
Because the normal distribution is continuous, the probability of any single exact outcome is zero.
We can only find probabilities *over a specified interval or range of outcomes*.

All normal curves exhibit the same behavior:
* symmetry about the mean;
* approximately 68.27%, 95.45% and 99.73% of values fall within **one, two and three standard deviations** of the mean, respectively;
* however, the mean *does not* have to be zero, and σ *does not* have to equal one.
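The 68.27 / 95.45 / 99.73 percentages can be recovered from the standard normal CDF. The share within k standard deviations is Φ(k) − Φ(−k):

```python
from scipy import stats

# Share of a normal distribution within k standard deviations of the mean
shares = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(share * 100, 2))  # 1 68.27 / 2 95.45 / 3 99.73
```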

We can take any normal distribution and standardize it to the **standard normal distribution** (Z distribution), which is a normal distribution with mean equal to zero and σ equal to one.  
Any value x from a normal distribution with mean μ and standard deviation σ can be standardized through its Z score, z = (x − μ) / σ.  
Using this Z score, we can then calculate that x value's percentile.  
So, if we can model our data as a normal distribution, we can convert its values to the standard normal distribution to calculate percentiles.  
[Funny explanation](http://www.z-table.com/)

In [6]:
from scipy import stats

z = .7
stats.norm.cdf(z) # Return the corresponding percentile

0.758036347776927

In [7]:
from scipy import stats

p = .95
stats.norm.ppf(p) #ppf: Percent Point Function (Inverse of CDF)

1.6448536269514722
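The cells above start from a Z score that is already computed. To go from a raw value x to a percentile, standardize first, or pass `loc` and `scale` to `stats.norm.cdf` so SciPy does it for you. The exam-score numbers below are hypothetical:

```python
from scipy import stats

# Hypothetical example: exam scores ~ Normal(mean=70, sd=10); where does 85 fall?
mu, sigma, x = 70, 10, 85

z = (x - mu) / sigma                           # standardize: z = (x - mu) / sigma
percentile = stats.norm.cdf(z)                 # area to the left of z
same = stats.norm.cdf(x, loc=mu, scale=sigma)  # SciPy standardizes internally

print(z, round(percentile, 4), round(same, 4))  # 1.5 0.9332 0.9332
```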

<a id='tstudent'></a>
### Student's T-Distribution
The Student's t test determines whether there is a significant difference between two means. The resulting t statistic is compared with a critical value from the t-distribution (the t table).
Due to variance and outliers, it’s not enough just to compare mean values. **A t test also considers sample variances**.  

Types of Student’s t-test:
* **One sample t test**: tests the null hypothesis that the population mean is equal to a specified value 𝜇 based on a sample mean
* **Independent two sample t test**: tests the null hypothesis that the population means behind two independent samples are equal, based on the sample means 𝑥̄1 and 𝑥̄2
* **Dependent, paired sample t test**: used when the samples are dependent:
  * one sample has been tested twice (repeated measures)
  * two samples have been matched or "paired"
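SciPy provides the one-sample and paired tests as `scipy.stats.ttest_1samp` and `scipy.stats.ttest_rel`. This minimal sketch reuses the two samples from the examples in this section. The hypothesized mean of 40 is arbitrary. Treating `a[i]` and `b[i]` as two measurements of the same subject is an **assumption** made only to show the paired call:

```python
from scipy import stats

a = [40, 36, 42, 36, 35, 35, 41, 43, 34]
b = [43, 41, 44, 39, 37, 35, 44, 46, 40]

# One-sample t test: is the mean of `a` different from a hypothesized 40?
one = stats.ttest_1samp(a, popmean=40)

# Paired t test: only valid if a[i] and b[i] belong to the same subject
# (assumed here purely for illustration)
paired = stats.ttest_rel(a, b)

print(one.statistic, one.pvalue)
print(paired.statistic, paired.pvalue)
```

The paired test works on the per-pair differences, so its t statistic (about −5.2) is much larger in magnitude than the independent-samples t of −1.8 for the same numbers.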

In the case of **independent two-sample** t test the calculation of the t-statistic differs slightly for the following scenarios:
* equal sample sizes, equal variance
* unequal sample sizes, equal variance
* equal or unequal sample sizes, unequal variance (the most common)

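Calculate the t statistic:  
$$t = \frac{\text{signal}}{\text{noise}} = \frac{\text{difference in means}}{\text{sample variability}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1^2, s_2^2$ the sample variances, and $n_1, n_2$ the sample sizes.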

In [8]:
from scipy.stats import ttest_ind

In [9]:
a = [40,36,42,36,35,35,41,43,34]
b = [43,41,44,39,37,35,44,46,40]

In [10]:
#Calculate the T-test for the means of two independent samples of scores.
#https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

ttest_ind(a,b).statistic

-1.7999999999999998

In [11]:
ttest_ind(a,b).pvalue/2 # two-tailed by default; halving gives the one-tailed p-value (valid when t has the hypothesized sign)

0.04537161736045921
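Instead of halving the p-value by hand, `ttest_ind` accepts an `alternative` argument (SciPy 1.6 or later). It also accepts `equal_var=False` for Welch's test, which drops the equal-variance assumption that the section calls the most common scenario. A sketch with the same data:

```python
from scipy.stats import ttest_ind

a = [40, 36, 42, 36, 35, 35, 41, 43, 34]
b = [43, 41, 44, 39, 37, 35, 44, 46, 40]

# One-tailed test directly: H1 is "mean of a < mean of b"
one_tailed = ttest_ind(a, b, alternative='less')

# Welch's t test: does not assume equal population variances
welch = ttest_ind(a, b, equal_var=False)

print(one_tailed.pvalue)  # equals ttest_ind(a, b).pvalue / 2 here, since t < 0
print(welch.statistic, welch.pvalue)
```

With equal sample sizes the Welch t statistic matches the pooled one (−1.8). Only the degrees of freedom, and therefore the p-value, differ.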

##### [T-test using Python and Numpy](https://towardsdatascience.com/inferential-statistics-series-t-test-using-numpy-2718f8f9bf2f)

In [12]:
## Import the packages
import numpy as np
from scipy import stats


## Define 2 random distributions
#Sample Size
N = 10
#Gaussian distributed data with mean = 2 and var = 1
a = np.random.randn(N) + 2
#Gaussian distributed data with mean = 0 and var = 1
b = np.random.randn(N)


## Calculate the Standard Deviation
#Calculate the variance to get the standard deviation

#ddof=1 divides by N-1, giving the unbiased estimate of the sample variance
var_a = a.var(ddof=1)
var_b = b.var(ddof=1)

#pooled standard deviation (this simple average is valid because both samples have size N)
s = np.sqrt((var_a + var_b)/2)



## Calculate the t-statistics
t = (a.mean() - b.mean())/(s*np.sqrt(2/N))



## Compare with the critical t-value
#Degrees of freedom
df = 2*N - 2

#one-tailed p-value from the t distribution (abs keeps it valid if t is negative)
p = 1 - stats.t.cdf(abs(t),df=df)


print("t = " + str(t))
print("p = " + str(2*p))  # two-tailed p-value
### The data are random (no seed), so t and p change on every run. In the run shown below p is far below 0.05, so we reject the null hypothesis and conclude that the difference between the two means is statistically significant.


## Cross Checking with the internal scipy function
t2, p2 = stats.ttest_ind(a,b)
print("t = " + str(t2))
print("p = " + str(p2))

t = 6.042142612547705
p = 1.0339249661095451e-05
t = 6.042142612547705
p = 1.0339249661063279e-05


##### [How to Code the Student’s t-Test from Scratch in Python](https://machinelearningmastery.com/how-to-code-the-students-t-test-from-scratch-in-python/)

In [13]:
#https://machinelearningmastery.com/how-to-code-the-students-t-test-from-scratch-in-python/

from math import sqrt
from numpy.random import seed
from numpy.random import randn
from numpy import mean
from scipy.stats import sem
from scipy.stats import t
 
# function for calculating the t-test for two independent samples
def independent_ttest(data1, data2, alpha):
    
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    
    # calculate standard errors
    se1, se2 = sem(data1), sem(data2)
    
    # standard error on the difference between the samples
    sed = sqrt(se1**2.0 + se2**2.0)
    
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    
    # degrees of freedom
    df = len(data1) + len(data2) - 2
    
    # calculate the two-tailed critical value (alpha is split between both tails)
    cv = t.ppf(1.0 - alpha / 2.0, df)
    
    # calculate the two-tailed p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    
    # return everything
    return t_stat, df, cv, p
 
# seed the random number generator (only needed for the commented-out random samples)
seed(42)

# generate two independent samples
# data1 = 5 * randn(100) + 50
# data2 = 5 * randn(100) + 51
data1 = [40,36,42,36,35,35,41,43,34]
data2 = [43,41,44,39,37,35,44,46,40]

# calculate the t test
alpha = 0.05
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))

# interpret via critical value
if abs(t_stat) <= cv:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')

# interpret via p-value
if p > alpha:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')

t=-1.800, df=16, cv=2.120, p=0.091
Fail to reject the null hypothesis that the means are equal.
Fail to reject the null hypothesis that the means are equal.


<a id='anova'></a>
### ANOVA - Analysis of Variance
In this section we introduce a new distribution, the **F distribution**, used to answer questions such as:
* *What is the probability that two samples come from populations that have the same variance?*
* *What is the probability that three or more samples come from the same population?*  

We compute an F value and compare it to a critical value determined by our degrees of freedom (the number of groups and the number of items in each group).  
ANOVA considers two types of variance:
+ **Between Groups**: how far group means stray from the total mean.
+ **Within Groups**: how far individual values stray from their respective group mean.  

The F value we're trying to calculate is simply the *ratio* between these two variances:  
$$F = \frac{\text{Variance Between Groups}}{\text{Variance Within Groups}} = \frac{SSG / df_{groups}}{SSE / df_{error}}$$

+ SSG = Sum of Squares Groups
+ $df_{groups}$ = degrees of freedom (groups) = k − 1, for k groups
+ SSE = Sum of Squares Error
+ $df_{error}$ = degrees of freedom (error) = N − k, for N observations in total

**Two-Way ANOVA** lets us test two independent variables *at the same time*.

In [14]:
from scipy import stats

df_groups=2 #degrees of freedom numerator (groups)
df_error=27 #degrees of freedom denominator (error)

stats.f.ppf(1-.05, dfn=df_groups, dfd=df_error)

3.3541308285291986
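A one-way ANOVA can be computed by hand from the SSG and SSE definitions and checked against `scipy.stats.f_oneway`. The three groups below are hypothetical toy data. The degrees of freedom use k − 1 (groups) and N − k (error):

```python
import numpy as np
from scipy import stats

# Hypothetical data: three small groups
groups = [np.array([4, 5, 6]), np.array([6, 7, 8]), np.array([8, 9, 10])]

k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between groups: squared distance of each group mean from the grand mean
ssg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within groups: squared distance of each value from its own group mean
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_groups = k - 1
df_error = n_total - k
F = (ssg / df_groups) / (sse / df_error)

print(F)                                 # 12.0
print(stats.f_oneway(*groups))           # same F statistic
print(stats.f.ppf(0.95, dfn=df_groups, dfd=df_error))  # critical value, alpha = 0.05
```

If F exceeds the critical value, we reject the null hypothesis that all group means are equal.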

<a id='linearregression'></a>
### Linear Regression
The goal of regression is to develop an equation or formula that best describes the relationship between variables.  
Recall that the equation of a line follows the form 𝑦=𝑚𝑥+𝑏   
In a linear regression the **equation used to describe the relationship between variables** is that of a line, **𝑦=𝑏0+𝑏1𝑥**, where *𝑏1* is the slope of the line and *b0* is where the line crosses the y axis (the intercept). Our goal is to predict the value of the dependent variable (y) from that of the independent variable (x). 

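How to derive 𝑏1 and 𝑏0?  
$$b_1 = \rho_{xy}\,\frac{\sigma_y}{\sigma_x} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$
where $\rho_{xy}$ is the Pearson correlation coefficient and $\sigma_x, \sigma_y$ are the standard deviations of x and y, and
$$b_0 = \bar{y} - b_1\bar{x}$$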

**Limitations of Linear Regression**  
Anscombe's Quartet illustrates the pitfalls of relying on pure calculation: its four datasets look very different when plotted, yet each results in the same calculated regression line.



In [15]:
from scipy.stats import linregress

x = [34, 35, 39, 42, 43, 47]
y = [102, 109, 137, 148, 150, 158]

slope = round(linregress(x,y).slope, 1)

intercept = round(linregress(x,y).intercept, 1)

print( f' y = {intercept} + {slope}x')

 y = -46.0 + 4.5x
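The slope and intercept returned by `linregress` can be reproduced with the summation formula and with the correlation form 𝑏1 = ρ · σy / σx:

```python
import numpy as np

x = np.array([34, 35, 39, 42, 43, 47])
y = np.array([102, 109, 137, 148, 150, 158])

# Least-squares slope and intercept from the summation formula
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

# Equivalent slope via Pearson correlation and standard deviations
b1_alt = np.corrcoef(x, y)[0, 1] * y.std() / x.std()

print(f'y = {b0:.1f} + {b1:.1f}x')  # y = -46.0 + 4.5x
print(b1_alt)                        # approximately 4.5
```

Both standard deviations use the same `ddof`, so the choice of divisor cancels in the ratio σy / σx.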


<a id='multipleregression'></a>
### Multiple Regression
Multiple regression lets us model one dependent variable as a function of several independent variables at the same time.  
The simple regression equation is expanded to:
$$y = b_0 + b_1x_1 + b_2x_2 + \cdots$$
where:
* 𝑏1 is the coefficient on 𝑥1
* 𝑏1 reflects the change in 𝑦 for a one-unit change in 𝑥1, all else remaining constant

However, some predictors might correlate with each other. This is called **multicollinearity**.  
Once we’ve chosen our variables x1 and x2, we’ll usually test for multicollinearity: we want to know whether our two independent variables are closely related to each other. If they are, it makes sense to discard one!
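A quick multicollinearity check is the correlation between the predictors and the variance inflation factor (VIF). With exactly two predictors, VIF = 1 / (1 − r²). With more predictors, each VIF comes from regressing one predictor on all the others. This sketch uses the same x1 and x2 as the example that follows. Thresholds of 5 or 10 are commonly quoted rules of thumb, not hard limits:

```python
import numpy as np

x1 = np.array([1, 3, 2, 3, 1])
x2 = np.array([8, 4, 9, 6, 3])

# Pearson correlation between the two predictors
r = np.corrcoef(x1, x2)[0, 1]

# Two-predictor shortcut for the variance inflation factor
vif = 1 / (1 - r**2)

print(round(r, 3), round(vif, 3))  # -0.098 1.01
```

Here the predictors are almost uncorrelated, so keeping both is reasonable.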

In [16]:
from sklearn.linear_model import LinearRegression

x1 ,x2 = [1,3,2,3,1], [8,4,9,6,3]
y = [29,31,36,35,19]

reg = LinearRegression()

reg.fit(list(zip(x1,x2)), y) 

b1,b2 = reg.coef_[0], reg.coef_[1]

b0 = reg.intercept_

print( f' y = {b0:.{3}} + {b1:.{3}}x1 + {b2:.{3}}x2')

 y = 8.0 + 5.0x1 + 2.0x2
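The same coefficients can be obtained without scikit-learn by solving the least-squares problem with NumPy. The design matrix gets a column of ones so that the first coefficient is the intercept 𝑏0:

```python
import numpy as np

x1 = [1, 3, 2, 3, 1]
x2 = [8, 4, 9, 6, 3]
y = [29, 31, 36, 35, 19]

# Design matrix: a leading column of ones for the intercept b0
X = np.column_stack([np.ones(len(y)), x1, x2])

# Ordinary least squares: minimize ||X @ coef - y||^2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

print(np.round(coef, 6))  # approximately [8. 5. 2.]
```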


<a id='chisquare'></a>
### Chi square test
A Chi square test (also written 𝜒²) is used to determine the probability of an observed frequency of events given an expected frequency.  
The key concept is to find the **goodness of fit**.  
For example, if we flip a coin 18 times and observe that it comes up heads 12 times, can we say that this is due to chance, or do we assume that our coin is biased?  

The chi square formula is the sum of the squared differences between observed values O and expected values E, each divided by the corresponding expected value:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
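Applied to the coin example (12 heads in 18 flips versus the 9 expected from a fair coin), the formula can be computed by hand. The result is then compared with the critical value and with `scipy.stats.chisquare`:

```python
from scipy.stats import chi2, chisquare

observed = [12, 6]  # heads, tails in 18 flips
expected = [9, 9]   # fair coin: 18 * 0.5 each

# Chi-square statistic: sum of (O - E)^2 / E over all categories
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

dof = len(observed) - 1         # 2 categories -> 1 degree of freedom
critical = chi2.isf(0.05, dof)  # critical value at alpha = 0.05
p_value = chi2.sf(stat, dof)    # probability of a statistic at least this large

print(stat, round(critical, 3), round(p_value, 4))  # 2.0 3.841 0.1573
print(chisquare(observed, f_exp=expected))          # same statistic and p-value
```

Since 2.0 is below the critical value of 3.841, 12 heads out of 18 is not enough evidence to call the coin biased at the 5% level.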

**When not to use Chi Square**  
Chi square statistics don’t perform well for small expected frequencies. As a rule of thumb, each cell should have an expected frequency of at least 5.

In [17]:
from scipy.stats import chi2

chi2.isf(0.05,5) # critical value for alpha = 0.05 and 5 degrees of freedom (isf: inverse survival function)

11.070497693516355