# Inferential Statistics (Part II)

![Image](./images/ab_testing.JPG)

---

## Hypothesis Tests

Hypothesis tests assess whether an observed difference between A (sample, experimental group) and B (population, control group) could plausibly be explained by chance alone. Key terms:

- __Null hypothesis:__ The hypothesis that chance is to blame.

- __Alternative hypothesis:__ Counterpoint to the null (what you hope to prove).

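- __p-value:__ The probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. It is not the probability that the result is due to chance. You may find a nice explanation [here](https://youtu.be/9jW9G8MO4PQ).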

- __Alpha:__ The probability threshold of "unusualness" that chance results must surpass for actual outcomes to be deemed statistically significant.

- __Type 1 Error:__ Mistakenly concluding an effect is real (when it is due to chance).

- __Type 2 Error:__ Mistakenly concluding an effect is due to chance (when it is real).

![Image](./images/error_types.JPG)
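
To make the link between alpha and Type 1 errors concrete, here is a minimal simulation sketch. It assumes a hypothetical setting that is not part of the housing dataset: samples are drawn from a standard normal population, so the null hypothesis (mean = 0) is true by construction, and the population standard deviation is known. The constants and the helper `two_sided_p_value` are illustrative names only.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(42)  # fixed seed so the simulation is reproducible

ALPHA = 0.05
N_EXPERIMENTS = 2000  # hypothetical number of repeated experiments
SAMPLE_SIZE = 50      # hypothetical sample size per experiment


def two_sided_p_value(sample, mu0, sigma):
    """Two-sided Z-test p-value with a known population standard deviation."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# H0 is true by construction: every sample comes from N(mu=0, sigma=1),
# so every rejection counted below is a Type 1 error.
false_positives = 0
for _ in range(N_EXPERIMENTS):
    sample = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    if two_sided_p_value(sample, mu0=0, sigma=1) < ALPHA:
        false_positives += 1

type_1_error_rate = false_positives / N_EXPERIMENTS
print(f"Observed Type 1 error rate: {type_1_error_rate:.3f} (alpha = {ALPHA})")
```

Under these assumptions the observed rate should land close to alpha, which is why alpha is often read as the Type 1 error rate you are willing to accept.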

--

Use cases:

- Changes in digital products

- Medical treatments

- Marketing actions

- Social policies

--

Statistics and parameters according to data type:

- __Numeric feature:__ mean, standard deviation, variance of the values.

- __Categorical feature:__ proportion (percentage) of each category, plus the standard deviation and variance of that proportion.
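
As a small illustration of the two cases above, the sketch below computes the statistics for a numeric feature and for a binary categorical feature encoded as 0/1. All values are made up for the example (they are not taken from `house_prices.csv`), and the variable names are hypothetical.

```python
from statistics import mean, stdev, variance, pvariance

# Hypothetical numeric feature: sale prices in USD
prices = [208500, 181500, 223500, 140000, 250000]
print("Mean price:", mean(prices))
print("Standard deviation:", round(stdev(prices), 2))
print("Variance:", round(variance(prices), 2))

# Hypothetical categorical feature encoded as 1 (new house) / 0 (old house)
is_new = [1, 0, 0, 1, 0]
proportion_new = mean(is_new)  # the mean of a 0/1 feature is a proportion
proportion_variance = proportion_new * (1 - proportion_new)  # p * (1 - p)
print("Proportion of new houses:", proportion_new)
print("Variance of the 0/1 feature:", round(proportion_variance, 4))
```

For a 0/1 feature, the population variance reduces to p * (1 - p), so a single proportion is enough to describe both the centre and the spread.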

In [None]:
# imports

import numpy as np
import pandas as pd

from statsmodels.stats.weightstats import ztest
from scipy import stats

import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
house_prices = pd.read_csv('./datasets/house_prices.csv')
house_prices.info()

In [None]:
# House prices statistics

ax = house_prices['SalePrice'].plot.box(figsize=(5, 8))
ax.set_ylabel('House prices (USD)')
plt.tight_layout()
plt.grid()

In [None]:
# House prices distribution

ax = house_prices['SalePrice'].plot.hist(figsize=(12, 8))
ax.set_xlabel('House prices (USD)')
plt.tight_layout()
plt.grid()

#### Do house prices follow a normal distribution?

![Image](./images/normal_distribution.JPG)

In [None]:
# Sampling distribution: means of 1460 random samples of 31 houses each

house_sampling_distribution = pd.Series([house_prices['SalePrice'].sample(31).mean() for _ in range(1460)])
house_sampling_distribution

In [None]:
# Distribution of the sample means (illustrating the central limit theorem)

ax = house_sampling_distribution.plot.hist(figsize=(12, 8), color = "green")
ax.set_xlabel('House prices sampling means (USD)')
plt.tight_layout()
plt.grid()

In [None]:
# House sampling prices Z-Score

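# Wrap the result in a Series: depending on the SciPy version, zscore may return a NumPy array, which has no .plot accessor
house_sampling_prices_z_score = pd.Series(stats.zscore(house_sampling_distribution))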
ax = house_sampling_prices_z_score.plot.hist(figsize=(12, 8), color = "red")
ax.set_xlabel('House prices sampling Z-Scores')
plt.tight_layout()
plt.grid()

---

### [Z-Test](https://www.statsmodels.org/dev/generated/statsmodels.stats.weightstats.ztest.html)

For a __sample of data__, the Z-score of a data point is the number of standard deviations by which that point lies above or below the mean.

![Image](./images/z_score_formula.JPG)

For the __sampling distribution__, the Z-score (or Z-statistic) is the number of standard errors between the sample mean and the population mean (the mean of the sampling distribution). The standard error is the standard deviation of the sampling distribution.

![Image](./images/z_statistics_formula.JPG)

---

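__A Z-test uses samples to draw conclusions about populations__. All Z-tests assume that your data follow a normal distribution. However, thanks to the central limit theorem, this assumption can be relaxed when the sample is __large enough__ (a common rule of thumb is at least 30 observations), because the sampling distribution of the mean is then approximately normal.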
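
To show what a one-sample Z-test computes, here is a from-scratch sketch using only the standard library. It assumes the textbook form of the test: the Z-statistic is the distance between the sample mean and the hypothesized mean, measured in standard errors (sample standard deviation divided by the square root of n), and the two-sided p-value comes from the standard normal distribution. The function name `one_sample_ztest` and the data are hypothetical; the sample is far too small for a real Z-test and is only there to keep the arithmetic easy to follow.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_ztest(sample, mu0):
    """Return (z, p_value) for a two-sided one-sample Z-test against mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # sample std (ddof=1) / sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, p_value


# Hypothetical sale prices in thousands of USD (illustration only)
sample = [172, 185, 190, 178, 181, 195, 169, 188]
z, p_value = one_sample_ztest(sample, mu0=180)
print(f"Z-statistic: {z:.3f}")
print(f"p-value: {p_value:.3f}")
```

With real data you would normally rely on a library routine such as `ztest` from statsmodels rather than this sketch; the point here is only to make the formula behind the statistic visible.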



---

### One-sample Z-test

In [None]:
# Test parameters 

hypothesis_mean = 180000

sample_mean = house_prices['SalePrice'].mean()

alpha = 0.05

print(f'Hypothesis mean: {hypothesis_mean} USD',
      f'\nSample mean: {sample_mean} USD',
      f'\nProbability threshold: {alpha}')

#### Null and Alternative Hypothesis

Null Hypothesis (H0): the population mean equals the hypothesized mean (180,000 USD)

Alternative Hypothesis (H1): the population mean differs from the hypothesized mean

In [None]:
# One-sample, two-sided Z-test: is the sample consistent with a population mean of hypothesis_mean?

Z_score, p_value = ztest(house_prices['SalePrice'], value=hypothesis_mean)

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')
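
Once the p-value is available, the usual decision rule compares it with alpha. The sketch below spells that rule out; the helper `decide` and the example p-values are hypothetical and are not results from the housing data.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical p-values, only to illustrate the rule
for p in (0.003, 0.2):
    print(f"p-value = {p}: {decide(p)}")
```

Note the wording: a large p-value means the data do not contradict H0 at the chosen alpha, not that H0 has been proven.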

---

### Two-sample Z-test (New vs. Old Houses)

In [None]:
# your-code




---