# <div style="text-align: center"> Introduction to Python and Machine Learning</div>

## <div style="text-align: center">Introduction to Statistics in Python - Stats II</div>

---

We have conducted the exploratory analysis of our tips dataset. Today we will answer the research question about the difference in the average tip amount between females and males (in this dataset, `sex` refers to the person paying the bill).

About this dataset: https://www.kaggle.com/ranjeetjain3/seaborn-tips-dataset

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime as dt

import os

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from scipy import stats 

In [None]:
dataset_names = sns.get_dataset_names()
print(dataset_names)

In [None]:
tips = sns.load_dataset('tips')

In [None]:
tips.head()

### Standard Error

It is a measure of how far we expect the estimate to be off, on average. More technically, it is the standard deviation of the sampling distribution of a statistic (most often the mean). Please do not confuse it with the *standard deviation*. The standard deviation measures the variability of the observed quantity. The standard error, on the other hand, describes the variability of the estimate.

To illustrate this, let's do the following.

In [None]:
print("Sample Mean:", tips.tip.mean(), "\n", "Sample Standard Deviation:", tips.tip.std())

In [None]:
sns.histplot(tips.tip, kde=True)  # distplot is deprecated in recent seaborn releases
plt.axvline(x=np.mean(tips.tip), color='red')

In [None]:
from scipy.stats import sem

In [None]:
sem(tips.tip, ddof=0)

Let's try to compute the same statistic manually.

In [None]:
print(np.std(tips.tip)/np.sqrt(len(tips.tip)))
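The difference between standard deviation and standard error can also be seen in a small simulation. The sketch below does not use the tips data: it assumes a hypothetical gamma-distributed population (mean 3.0, standard deviation 1.5) and draws many samples of size 244 from it. The standard deviation of the resulting sample means should come out close to the population standard deviation divided by the square root of the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (an assumption, not the tips data):
# gamma distribution with mean = shape * scale = 3.0 and SD = sqrt(shape) * scale = 1.5
shape, scale = 4.0, 0.75
population_sd = np.sqrt(shape) * scale

n = 244            # size of each sample
n_samples = 5_000  # number of samples drawn

# Each row is one sample; take the mean of every sample
samples = rng.gamma(shape, scale, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# Standard deviation of the sampling distribution of the mean
sd_of_sample_means = sample_means.std(ddof=1)

# Standard error predicted by theory, and estimated from a single sample
theoretical_se = population_sd / np.sqrt(n)
se_from_one_sample = samples[0].std(ddof=1) / np.sqrt(n)

print("Population SD:", population_sd)
print("SD of sample means:", sd_of_sample_means)
print("Theoretical SE:", theoretical_se)
print("SE estimated from one sample:", se_from_one_sample)
```

The population standard deviation stays the same no matter how large the sample is, while the standard error shrinks as the sample size grows.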

---

### Confidence intervals

Most often you will see 95% confidence intervals (CI). What do they mean? If we repeatedly drew samples from the population and calculated the mean of each of them, about 95% of those means would be expected to fall within a particular range. This range extends roughly two standard errors (1.96) above and two (1.96) below the **mean of those sample means**. The value 1.96 is used because 95% of a normal distribution lies within 1.96 standard deviations of its mean.

In [None]:
print("Sample Mean:", tips.tip.mean(), "\n", "Sample Standard Deviation:", tips.tip.std())

In [None]:
from scipy.stats import norm

In [None]:
#Confidence interval on the mean
norm.interval(0.95, loc=tips.tip.mean(), scale = tips.tip.std()/np.sqrt(len(tips)))

Let's try to calculate the CI manually.

In [None]:
tips.tip.mean() + (1.96 * sem(tips.tip, ddof=1))

In [None]:
tips.tip.mean() - (1.96 * sem(tips.tip, ddof=1))

We can be 95% confident that the population mean lies between 2.82 and 3.17: intervals constructed this way miss the population mean only about 5% of the time.
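The "95%" can be checked by simulation. The following sketch assumes a hypothetical normally distributed population with a known mean of 3.0 and standard deviation of 1.4 (these values are made up, not taken from the tips data). It builds a 95% interval from each of many samples and counts how often the interval contains the true mean.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters (assumptions for illustration)
true_mean, true_sd = 3.0, 1.4
n = 244              # size of each sample
n_repeats = 10_000   # number of samples (and intervals) to generate

# Each row is one sample
samples = rng.normal(true_mean, true_sd, size=(n_repeats, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)

# 95% confidence interval around every sample mean
lower = means - 1.96 * sems
upper = means + 1.96 * sems

# Share of intervals that contain the true population mean
covered = (lower <= true_mean) & (true_mean <= upper)
coverage = covered.mean()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should be close to 0.95, which is what the confidence level refers to: the long-run behaviour of the procedure rather than a single interval.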

In [None]:
sns.pointplot(y=tips.tip)  # pointplot draws the mean together with its confidence interval

Null hypothesis testing is an approach that allows us to confirm or reject our predictions regarding reality.

# Hypothesis Testing


We would like to know if the effects we see in the sample (observed data) are likely to occur in the population. 

The way classical hypothesis testing works is by conducting a statistical test to answer the following question:
> Given the sample and an effect, what is the probability of seeing that effect just by chance?

Here are the steps:

1. Define the null and alternative hypotheses, i.e. the claim you want to test
<br> H0 - absence of the effect
<br>H1 - presence of the effect
2. Specify the significance level (typically alpha = 0.05)
3. Collect a sample and compute the test statistic
4. Compute the p-value (the probability of obtaining a result at least as extreme as the observed one if the null hypothesis is true)
5. Compare p to alpha
6. Interpret the result (a p-value lower than 0.05 means we reject the null hypothesis of no effect)

If the p-value is very low (more often than not, below 0.05), the effect is considered statistically significant. That means the effect is unlikely to have occurred by chance. The inference? The effect is likely to be present in the population too.

This process is very similar to the *proof by contradiction* paradigm. We first assume that there is no effect. That's the null hypothesis. The next step is to compute the probability of obtaining an effect at least as large as the observed one under that assumption (the p-value). If the p-value is very low (< 0.05 as a rule of thumb), we reject the null hypothesis.
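One way to make "the probability of seeing that effect just by chance" concrete is a permutation test. This is a different technique from the t-test used later in this notebook, shown here only as a sketch. The data are hypothetical, and the helper `permutation_p_value` is a name introduced for illustration. The idea: if H0 is true, group labels are arbitrary, so we shuffle them many times and count how often the shuffled difference in means is at least as large as the observed one.

```python
import numpy as np

def permutation_p_value(a, b, rng, n_permutations=5_000):
    """Two-sided permutation p-value for a difference in means (illustrative helper)."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_permutations):
        # Under H0 the labels carry no information, so reassign them at random
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        hits += diff >= observed
    # Add 1 to numerator and denominator so the estimate is never exactly 0
    return (hits + 1) / (n_permutations + 1)

rng = np.random.default_rng(2)  # fixed seed so the sketch is reproducible
alpha = 0.05

# Hypothetical data: groups A and B share the same population mean, group C does not
group_a = rng.normal(3.0, 1.4, size=150)
group_b = rng.normal(3.0, 1.4, size=90)
group_c = rng.normal(4.0, 1.4, size=90)

p_null = permutation_p_value(group_a, group_b, rng)
p_effect = permutation_p_value(group_a, group_c, rng)

print("A vs B (no true effect): p =", p_null)
print("A vs C (true effect):    p =", p_effect, "-> reject H0:", p_effect < alpha)
```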

## Question: Are tips significantly different for males in comparison to females?

### Independent samples t-test (between-subject).

**Null Hypothesis**: Mean tips are the same for males and females.
<br>**Alternative Hypothesis**: Mean tips are different for males and females.

Perform **t-test** and determine the p-value. 

In [None]:
males_tips = tips[tips['sex']=='Male']
females_tips = tips[tips['sex']=='Female']

### Assumption of t-test

One assumption is that the variances in the two groups are equal, which can be tested using the [Levene test for equal variances](https://en.wikipedia.org/wiki/Levene%27s_test). If the p-value is less than 0.05, we reject the hypothesis that the variances are equal.

H0: variances in groups are equal.
<br> H1: variances in groups are not equal.

In [None]:
stats.levene(males_tips['tip'], females_tips['tip'])

Another assumption is that the data used came from a normal distribution. 
<br>
There's a [Shapiro-Wilk test](https://en.wikipedia.org/wiki/Shapiro-Wilk) to test for normality. If the p-value is less than 0.05, we reject the hypothesis that the data come from a normal distribution.

H0: variable in population is normally distributed. 
<br> H1: variable in population is non-normally distributed.

In [None]:
stats.shapiro(males_tips['tip'])

In [None]:
stats.shapiro(females_tips['tip'])

In [None]:
print(stats.shapiro(males_tips['tip']), stats.shapiro(females_tips['tip'])) #to write it in one line

The distributions differ from normal. A t-test can still be used in this situation, but only if the sample size is large.

In [None]:
stats.ttest_ind(males_tips['tip'], females_tips['tip'], equal_var=True)

The p-value is the probability of obtaining an effect at least this large by chance alone, i.e. if the null hypothesis were true. Here the p-value is above 0.05, so the difference between males and females in the amount of tips is not significant.

However, since the normality assumption was not met, it is better to run a nonparametric test, which does not rely on that assumption.

In [None]:
from scipy.stats import mannwhitneyu

In [None]:
mannwhitneyu(males_tips['tip'], females_tips['tip'], alternative='two-sided')  # state the two-sided test explicitly

Neither test detects a significant difference between males and females in the amount of tips.

In [None]:
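plt.figure(figsize=(15,5))
# histplot replaces the deprecated distplot; kde=True overlays the density curve
sns.histplot(females_tips['tip'], kde=True, label='Female')
sns.histplot(males_tips['tip'], kde=True, color='orange', label='Male')
plt.legend()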

Write the interpretation of the result here:
<br>
t test (statistic) value = 
<br>
p = 
<br>

This means that ...

For significant results, also calculate the effect size, a standardized measure of the strength of the effect.

In [None]:
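import numpy as np

def CohenEffectSize(group1, group2):
    """Compute Cohen's d.

    group1: Series or NumPy array
    group2: Series or NumPy array

    returns: float
    """
    diff = group1.mean() - group2.mean()

    n1, n2 = len(group1), len(group2)
    # Use sample variances (ddof=1) explicitly so Series and arrays behave the same
    var1 = np.var(group1, ddof=1)
    var2 = np.var(group2, ddof=1)

    # Pooled variance weights each group's variance by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    d = diff / np.sqrt(pooled_var)
    return d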

In [None]:
CohenEffectSize(females_tips['tip'], males_tips['tip'])

### Paired-samples t-test - for dependent groups (within-subject effects).

For this analysis, the groups should either contain the same participants or consist of meaningfully paired samples.

An example is measuring the effectiveness of a new drug on one group using pretest and posttest measurements. We compare how people felt after receiving the new drug with how they felt before it was administered.
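The pretest/posttest idea can be sketched with made-up numbers. The code below assumes hypothetical scores for 40 participants with a built-in average improvement of 5 points. It is not the dataset used in the tasks. It runs `scipy.stats.ttest_rel` and also shows that the paired t-test is equivalent to a one-sample t-test on the per-participant differences.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # fixed seed so the sketch is reproducible

# Hypothetical pretest/posttest scores for the same 40 participants
n = 40
before = rng.normal(50, 10, size=n)
after = before + rng.normal(5, 5, size=n)  # simulated average improvement of 5

# Paired (related-samples) t-test
result = stats.ttest_rel(after, before)

# The same statistic computed by hand from the paired differences
diffs = after - before
manual_t = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(n))

print("t from ttest_rel:", result.statistic)
print("t computed manually:", manual_t)
print("p-value:", result.pvalue)
```

Because each participant serves as their own control, only the variability of the differences matters, not the variability between participants.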

Download the data from [Kaggle](https://www.kaggle.com/kwadwoofosu/predict-test-scores-of-students). This dataset contains information about students' results on a writing test completed before and after the peer assessment intervention.

In [None]:
from scipy import stats
import scipy as sp
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
import pandas as pd
import seaborn as sns
import statistics

In [None]:
grades = pd.read_csv(r'/Users/akovbasiuk/Desktop/SPINAKER/Class 7/test_scores.csv')  # adjust the path to where you saved the file

In [None]:
grades.head()

## Question: Was the intervention effective? Have students received different points in pretest and posttest measurements?

<div class="alert alert-block alert-success">
⚠️TASK 1 (1 min)
<br>

To conduct a related-samples t-test, your two features must have the same length. Check whether the length of `pretest` is the same as that of `posttest`.
    
</div>

<div class="alert alert-block alert-success">
⚠️TASK 2 (4 min)
<br>

Conduct a related-samples test using Student's t-test. The assumptions are: normally distributed differences between the paired measurements, a sufficiently large sample size, and related samples.

</div>

<div class="alert alert-block alert-success">
⚠️TASK 3 (1 min)
<br>

Was the intervention effective? Compare means to find out.

</div>

Write the interpretation of the result here:
<br>
t test (statistic) value = 
<br>
p = 
<br>

This means that ...

<div class="alert alert-block alert-success">
⚠️TASK 4 (5 min)
<br>

Now think about which test should be performed to check whether students' post-test grades differed depending on `teaching method`. First check the unique values of the variable `teaching method` and the assumptions, then conduct the proper analysis.
If the results are significant, check which mean was higher and report the effect size. Visualize the results using a distribution plot and a pointplot. Try to show the individual points together with the pointplot.
    
</div>

#### Consequences of broken assumptions
T-tests are fairly tolerant of data that do not meet the normality assumption, provided the sample is large and the groups are of similar size. Non-parametric tests are more conservative, which means it is more difficult to detect a significant effect with them.

General idea: first try a parametric test and, if its assumptions are violated, run the non-parametric alternative.

Now we know how to:
* explore relationship between variables (correlation) (two continuous variables)
* test normality of distribution and homogeneity of variance
* compare two repeated samples (two continuous variables)
* compare two independent groups (continuous DV and binary IV)

Which test to use if we want to understand the relationship between two categorical variables?

### Chi-square test of independence of variables

H0: categorical variables are independent
<br>H1: categorical variables are dependent

In [None]:
from scipy.stats import chi2_contingency

<div class="alert alert-block alert-success">
⚠️TASK 5 (5 min)
<br>

Check unique values in `school type` and `school setting` and create crosstab/contingency table using pandas library. 
    
</div>

Now let's calculate the chi-square test.

Returns: 
* chi2 (float) - the test statistic.
* p (float) - p-value of the test
* dof (int) - degrees of freedom
* expected (ndarray) - the expected frequencies, same shape as the observed contingency table.
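As a sketch of how these return values look in practice, the example below builds a contingency table with `pd.crosstab` and passes it to `chi2_contingency`. The data frame and its counts are hypothetical, made up for illustration; they are not the values from the test scores dataset.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: 100 students described by two categorical variables
df = pd.DataFrame({
    "school_type": ["Public"] * 60 + ["Private"] * 40,
    "school_setting": (["Urban"] * 20 + ["Rural"] * 40      # public schools
                       + ["Urban"] * 30 + ["Rural"] * 10),  # private schools
})

# Contingency table of observed counts
observed = pd.crosstab(df["school_type"], df["school_setting"])
print(observed)

# Chi-square test of independence
chi2, p, dof, expected = chi2_contingency(observed)

print("chi2:", chi2)
print("p-value:", p)
print("degrees of freedom:", dof)
print("expected counts under independence:\n", expected)
```

For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default, so the statistic is slightly smaller than the uncorrected chi-square value.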

Is there a dependency between our variables?

## What to do if you want to compare more groups

Analysis of variance and its various types are outside the scope of this course. You can read more about them [here](https://en.wikipedia.org/wiki/Analysis_of_variance).

## What to study next

Read about [interaction](https://en.wikipedia.org/wiki/Interaction_(statistics)), which represents the complexity of our world better than simple relationships between variables.