### Effect Size
- Statistical hypothesis tests report the likelihood of the observed results under an assumption, such as no association between variables or no difference between groups.
- Hypothesis tests do not quantify the size of the effect, even when the association or difference is statistically significant.
- Effect size measures for quantifying the association between variables include Pearson's correlation coefficient.
- Effect size measures for quantifying the difference between groups include Cohen's d.
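Significance and effect size are different things. The minimal sketch below is not part of the original notebook; it uses `scipy.stats.pearsonr`, which returns both the correlation and a p-value. The sample size, the 0.05 slope and the seed are arbitrary assumptions chosen to show that, with enough data, a very weak correlation is still highly significant.

```python
import numpy as np
from scipy.stats import pearsonr

# Reproducible random numbers (the seed value is arbitrary)
rng = np.random.default_rng(42)

n = 100_000
x = rng.normal(size=n)
# y depends on x only very weakly: the true correlation is roughly 0.05
y = 0.05 * x + rng.normal(size=n)

# pearsonr returns the correlation coefficient and the p-value of the test
r, p_value = pearsonr(x, y)
print(f"r = {r:.3f}, p-value = {p_value:.2e}")
```

The tiny p-value only says that some association is very unlikely to be zero; the small r says that the association is weak.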

##### Two main groups of effect size measures:
- <b>Association:</b> Statistical methods for quantifying an association between variables (e.g., correlation)
- <b>Difference:</b> Statistical methods for quantifying the difference between groups (e.g., the difference between means)

##### Pearson's correlation coefficient
Pearson's correlation coefficient measures the degree of linear association between two real-valued variables. It is a unit-free effect size measure, which can be interpreted using the following rule-of-thumb scale:
- -1.0: Perfect negative relationship.
- -0.7: Strong negative relationship
- -0.5: Moderate negative relationship
- -0.3: Weak negative relationship
- 0.0: No relationship.
- 0.3: Weak positive relationship
- 0.5: Moderate positive relationship
- 0.7: Strong positive relationship
- 1.0: Perfect positive relationship.
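Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it directly from that definition with NumPy and checks it against `scipy.stats.pearsonr`; the helper name `pearson_r` and the synthetic data are illustrative assumptions, not part of the original notebook. It also shows what "unit-free" means: a positive linear change of units leaves r unchanged.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from its definition,
    cov(x, y) / (std(x) * std(y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each variable on its mean
    dx = x - x.mean()
    dy = y - y.mean()
    # The (n - 1) normalisation factors cancel, so plain sums suffice
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(size=500)

r_manual = pearson_r(x, y)
r_scipy, _ = pearsonr(x, y)

# Unit-free: rescaling by a positive factor and adding an offset
# (e.g. a change of units) does not change r
r_rescaled = pearson_r(100 * x + 3, y)

print(f"manual: {r_manual:.4f}, scipy: {r_scipy:.4f}, rescaled: {r_rescaled:.4f}")
```

Note that multiplying by a negative factor would flip the sign of r while keeping its magnitude.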

```python
from numpy.random import randn, seed
from scipy.stats import pearsonr

# Seed the random number generator for reproducibility
seed(1)

# Prepare data: data2 is data1 plus independent Gaussian noise
data1 = 10 * randn(1000) + 50
data2 = data1 + (10 * randn(1000) + 50)

# Calculate Pearson's correlation (the second return value is the p-value)
corr, _ = pearsonr(data1, data2)
print('Pearson Correlation %.3f' % (corr))
```

```
Pearson Correlation 0.698
```


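The correlation of about 0.7 indicates a strong positive linear relationship between the two variables, which is expected because `data2` is built from `data1` plus independent noise.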
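Why about 0.7? A short derivation, assuming `data1` (written X) and the added noise (written N) are independent and both have variance σ² = 10² (the constant offsets of 50 do not affect the correlation):

```latex
r = \frac{\operatorname{Cov}(X, X + N)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(X + N)}}
  = \frac{\sigma^2 + \operatorname{Cov}(X, N)}{\sqrt{\sigma^2 \cdot (\sigma^2 + \sigma^2)}}
  = \frac{\sigma^2}{\sqrt{2\sigma^4}}
  = \frac{1}{\sqrt{2}} \approx 0.707
```

The sample value of 0.698 is close to this population value.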

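#### Calculate Difference Effect Size
- Cohen's d measures the difference between the means of two Gaussian-distributed variables, expressed in units of their pooled standard deviation. Conventional thresholds for interpreting it are: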

| Effect Size | Cohen's d |
|-------------|-----------|
| Small       | 0.20      |
| Medium      | 0.50      |
| Large       | 0.80      |
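The table can be turned into a small lookup. The sketch below uses a hypothetical helper, `label_cohens_d`, and assumes each threshold is a lower bound applied to |d| (the sign only says which group has the larger mean); values under 0.20 are labelled "below small", which the table itself does not name.

```python
def label_cohens_d(d: float) -> str:
    """Hypothetical helper: map Cohen's d to the conventional labels.

    Uses the magnitude |d|, treating 0.20 / 0.50 / 0.80 as lower bounds.
    """
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"

for value in (0.1, 0.25, -0.6, 0.98):
    print(value, label_cohens_d(value))
```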

##### Cohen's d formula:

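$$d = \frac{\mu_1 - \mu_2}{s}$$

$$s = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}$$

where $\mu_1, \mu_2$ are the sample means, $s_1^2, s_2^2$ the sample variances and $n_1, n_2$ the sample sizes.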
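A worked example of the formula on two tiny, made-up samples (chosen only so the arithmetic can be checked by hand), using Python's standard `statistics` module, whose `variance()` uses the n − 1 sample denominator:

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 (sample) denominator

# Tiny illustrative samples
group1 = [2.0, 4.0, 6.0]   # mean 4.0, sample variance 4.0
group2 = [1.0, 3.0, 5.0]   # mean 3.0, sample variance 4.0

n1, n2 = len(group1), len(group2)
s1_sq, s2_sq = variance(group1), variance(group2)

# Pooled standard deviation: sqrt((2*4 + 2*4) / (3 + 3 - 2)) = sqrt(16 / 4) = 2.0
s = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))

# d = (4.0 - 3.0) / 2.0 = 0.5, a "medium" effect on the scale above
d = (mean(group1) - mean(group2)) / s
print(d)  # 0.5
```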

- NumPy and SciPy do not provide a built-in Cohen's d function, so we write our own:

```python
from numpy.random import randn, seed
from numpy import var, mean, sqrt

def cohend(d1, d2):
    # Calculate the size of the samples
    n1, n2 = len(d1), len(d2)

    # Calculate the sample variances (ddof=1 gives the n - 1 denominator)
    s1, s2 = var(d1, ddof=1), var(d2, ddof=1)

    # Calculate the pooled standard deviation
    s = sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))

    # Calculate the means of the samples
    u1, u2 = mean(d1), mean(d2)

    # Calculate the effect size
    return (u1 - u2) / s
```

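```python
# Prepare data: two groups whose means differ by 10, both with standard deviation 10
data1 = 10 * randn(1000) + 60
data2 = 10 * randn(1000) + 50

print("Cohend:", cohend(data1, data2))
```

```
Cohend: 0.9832261114114446
```

A value of about 0.98 is above the 0.80 threshold, so the difference between the two groups counts as a large effect.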
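For these two groups the population effect size is (60 − 50) / 10 = 1.0, so 0.98 is a sample estimate of it. The sketch below is not from the original notebook: it repeats the experiment with growing sample sizes using NumPy's `default_rng` (seed and sizes are arbitrary) and a self-contained copy of the same pooled-standard-deviation formula, and the estimate settles near 1.0 as n grows.

```python
import numpy as np

def cohend(d1, d2):
    # Same pooled-standard-deviation formula as the cohend() function above
    n1, n2 = len(d1), len(d2)
    s1, s2 = np.var(d1, ddof=1), np.var(d2, ddof=1)
    s = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (np.mean(d1) - np.mean(d2)) / s

rng = np.random.default_rng(7)
estimates = {}
# Same populations as above: means 60 and 50, both with standard deviation 10
for n in (100, 10_000, 1_000_000):
    a = 10 * rng.standard_normal(n) + 60
    b = 10 * rng.standard_normal(n) + 50
    estimates[n] = cohend(a, b)
    print(n, round(estimates[n], 3))
```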


#### Conclusion
- In this notebook, we used Pearson's correlation coefficient to quantify the strength of the linear association between two variables.
- We used Cohen's d to quantify the difference between the means of two groups, expressed in units of their pooled standard deviation.