Reference: https://www.youtube.com/watch?v=kd6zKBa9Rfk&list=PLmmRdcVKtuR6XQnPjfX3OcWSVjsxU7gHm&index=2

In business, many decisions have to be made every day. Instead of making them on a hunch, running experiments and formulating hypotheses is a more data-centric approach to business decision-making.

# Hypothesis Testing

![hypothesis_testing_flow_mean.png](attachment:hypothesis_testing_flow_mean.png)

- A hypothesis test is a mechanism for making decisions in inferential statistics. It helps to support or reject the claims being tested
- Hypothesis testing also helps to give structure to the problem statement
- The process provides statistical evidence on which decisions can be based


**Statistical Hypothesis:** A statement of the outcome the researcher expects, formed even before doing the experiment. The statistical hypothesis provides a standard structural framework for working on the problem and making a data-centric decision.

Statistical Hypothesis consists of 2 parts:
 
1. Null Hypothesis    $H_{o}$
2. Alternative Hypothesis  $H_{a}$

The Null hypothesis states that the 'null' condition holds: the stated claim is true, nothing new is happening, and the old beliefs still hold.

The Alternative hypothesis states that the new theory is true: something new is happening, and the old beliefs no longer hold.

## Example:

Census of Height

<img src="../../images/ex_1_hypothesis.png" />

$H_{o}: \mu = 160$

$H_{a}: \mu \ne 160$

In general, any newly proposed claim is stated in the Alternative hypothesis.

Example: Fish farm


<img src="../../images/ex_2_hypothesis.png" />

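$H_{o}: \text{length} \le 2$

$H_{a}: \text{length} > 2$

The null hypothesis of a one-sided test is also commonly written using only its boundary value:

$H_{o}: \text{length} = 2$

$H_{a}: \text{length} > 2$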

## Tests of Statistical hypothesis

Based on the two examples above, statistical hypothesis tests come in two kinds:
1. **Two-tailed tests:** The statements are non-directional
$$H_{o}: \space \space  \mu = 163$$
$$H_{a}: \space \space  \mu \ne 163$$

A rejection only shows that the mean differs from 163; the direction of the difference then needs further investigation.

2. **One-tailed tests:** The hypothesis statements have a direction
$$H_{o}: \space \space  \text{length} = 2$$
$$H_{a}: \space \space  \text{length} > 2$$

These tests are used only when the researcher has good reason, before seeing the data, to expect the outcome in one direction (here, greater than the previously known value).
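The sketch below contrasts a one-tailed and a two-tailed test on the same sample. The fish lengths are simulated (normal with mean 2.3 and standard deviation 0.5, both assumed values), not data from the original example, and the `alternative` argument of `scipy.stats.ttest_1samp` requires SciPy 1.6 or later.

```python
import numpy as np
import scipy.stats as st

# Simulated fish lengths (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
lengths = rng.normal(loc=2.3, scale=0.5, size=50)

# Two-tailed: H_o: mu = 2 vs H_a: mu != 2
two_sided = st.ttest_1samp(lengths, popmean=2)

# One-tailed: H_o: mu = 2 vs H_a: mu > 2 (needs SciPy >= 1.6)
one_sided = st.ttest_1samp(lengths, popmean=2, alternative='greater')

print(f"t           = {two_sided.statistic:.3f}")
print(f"two-sided p = {two_sided.pvalue:.5f}")
print(f"one-sided p = {one_sided.pvalue:.5f}")
```

When the t statistic is positive, the one-sided p-value is half the two-sided one. This is why a one-tailed test rejects more easily in the expected direction, and why the direction must be justified beforehand.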

If the null hypothesis is rejected in favour of the alternative hypothesis, the result is said to be statistically significant. In simpler words, the observed result is unlikely to be explained by chance alone, so the decision is made to reject the null hypothesis.

In our example, a length of 2.1 may be statistically significantly higher than 2, but for the business that difference might not be practically significant, so caution is needed when interpreting the outcomes of statistical tests.
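A minimal simulation of this point, assuming lengths drawn from a normal distribution with true mean 2.1 and standard deviation 1 (hypothetical values). With a large enough sample, a 0.1 difference is highly statistically significant, even though the standardized effect size (Cohen's d, computed here by hand) is small.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
# Hypothetical sample: true mean 2.1, i.e. only 0.1 above the claimed 2
lengths = rng.normal(loc=2.1, scale=1.0, size=10_000)

# One-tailed test of H_o: mu = 2 vs H_a: mu > 2 (needs SciPy >= 1.6)
result = st.ttest_1samp(lengths, popmean=2, alternative='greater')

# Cohen's d: the mean difference in units of the sample standard deviation
cohens_d = (lengths.mean() - 2) / lengths.std(ddof=1)

print(f"p-value   = {result.pvalue:.2e}")  # tiny: statistically significant
print(f"Cohen's d = {cohens_d:.3f}")       # around 0.1: a small effect
```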

The outcome of the test depends on the sample under consideration. Whether a slight change is a substantive outcome depends entirely on the use case and the researcher.

## Steps for performing a test
- Step 1: Create the hypotheses (null and alternative)
- Step 2: Choose the appropriate statistical test
- Step 3: Set alpha, the Type I error rate, for the experiment
- Step 4: Collect the data (sample)
- Step 5: Analyze
- Step 6: State the decision
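As a sketch of these steps, the hypothetical helper `z_test_two_sided` below runs a two-sided one-sample z-test. It assumes the population standard deviation `sigma` is known; the function name and the sample values in the usage example are made up for illustration.

```python
import math
import scipy.stats as st

def z_test_two_sided(sample, mu0, sigma, alpha=0.05):
    """Hypothetical helper: two-sided one-sample z-test.

    Step 1: H_o: mu = mu0 vs H_a: mu != mu0
    Step 2: z-test, because the population standard deviation sigma is known
    Step 3: alpha is the accepted Type I error rate
    """
    # Step 4: the data (a list or array of observations)
    n = len(sample)
    x_bar = sum(sample) / n
    # Step 5: analyze: test statistic and its two-sided p-value
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * st.norm.sf(abs(z))
    # Step 6: state the decision
    decision = "reject H_o" if p_value <= alpha else "fail to reject H_o"
    return z, p_value, decision

# Usage with made-up heights: the mean is 169, close to 170
print(z_test_two_sided([168, 172, 165, 171, 169], mu0=170, sigma=8))
```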

# Type I and Type II Errors

- A Type I error is committed by **rejecting a true null hypothesis**
- A Type II error is committed when a business researcher **fails to reject a false null hypothesis**

<img src="../../images/typ_1_typ_2_errors.png" />

Probability of a Type I error: $\alpha$

Probability of a Type II error: $\beta$
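Because $\alpha$ is the probability of rejecting a true null hypothesis, it can be checked by simulation. The sketch below assumes normally distributed data with a known $\sigma = 8$ (a made-up value) and repeatedly samples from a population where $H_{o}: \mu = 170$ really is true, counting how often a two-sided z-test at $\alpha = .05$ rejects. The rejection rate should come out close to .05.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)
mu0, sigma, n, alpha = 170, 8, 70, 0.05  # assumed values for the simulation
z_crit = st.norm.ppf(1 - alpha / 2)      # about 1.96

n_experiments = 20_000
# Each row is one experiment's sample, drawn with H_o true (mean = mu0)
samples = rng.normal(loc=mu0, scale=sigma, size=(n_experiments, n))
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))

# Every rejection here is a Type I error, because H_o is true
type_i_rate = np.mean(np.abs(z) > z_crit)
print(f"Estimated Type I error rate: {type_i_rate:.3f}")
```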

A Type I error (probability $\alpha$) can only be committed when the null hypothesis is rejected, and a Type II error (probability $\beta$) can only be committed when the null hypothesis is not rejected. Therefore a business researcher cannot commit both a Type I error and a Type II error at the same time on the same hypothesis test.

Because a Type II error can occur only when the null hypothesis is false, the value of $\beta$ varies with the many possible alternative parameter values that might be true.
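To see how $\beta$ depends on the true parameter, the hypothetical helper `beta_two_sided_z` below computes it for a two-sided z-test of $H_{o}: \mu = 170$. It assumes normal data with a known $\sigma = 8$ and $n = 70$ (illustrative values). Under a true mean $\mu_1$, the z statistic is normal with mean $(\mu_1 - \mu_0)/(\sigma/\sqrt{n})$ and standard deviation 1, and a Type II error occurs when it lands between the critical values.

```python
import math
import scipy.stats as st

def beta_two_sided_z(mu_true, mu0=170, sigma=8, n=70, alpha=0.05):
    """Hypothetical helper: probability of a Type II error for a two-sided
    z-test of H_o: mu = mu0 when the true mean is mu_true (normal data,
    known sigma)."""
    z_crit = st.norm.ppf(1 - alpha / 2)
    # Mean of the z statistic when the true mean is mu_true
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # Type II error: z lands inside the non-rejection region [-z_crit, z_crit]
    return st.norm.cdf(z_crit - shift) - st.norm.cdf(-z_crit - shift)

# beta shrinks as the true mean moves further from the hypothesized 170
for mu_true in (169, 168, 167, 165):
    print(mu_true, round(beta_two_sided_z(mu_true), 3))
```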

## Example of a statistical test when population information is known

```python
import pandas as pd
import numpy as np
import math
```

```python
data = pd.read_csv('dataSetofHeights.csv')
data.head(2)
```

| Unnamed: 0 | Heights | Gender |
|---|---|---|
| 0 | 160.377337 | Male |
| 1 | 177.637818 | Male |


```python
data['Heights'].mean()
```

```
168.30635473929416
```

<font color=blue>Since the population parameters (the population mean and standard deviation $\sigma$) are known, we can use the z-score</font>


$$z= \frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt n}}$$

$$H_{o}: \space \space  \mu = 170$$
$$H_{a}: \space \space  \mu \ne 170$$


- Step 1: Create the hypotheses (null and alternative)
- Step 2: Choose the appropriate statistical test (here, the z-test)
- Step 3: Set $\alpha = .05$, i.e. the Type I error rate
- Step 4: Get the data
- Step 5: Analyze


### This is a two-sided test

For a two-sided test at $\alpha = .05$, the rejection region is split between the two tails ($.05/2 = .025$ each), so from the $z$ table the critical values are $\pm 1.96$.

**Critical values**: $\pm 1.96$
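Instead of a printed z table, the critical value can be read from the inverse CDF of the standard normal distribution, `scipy.stats.norm.ppf`:

```python
import scipy.stats as st

alpha = 0.05
# Two-sided test: put alpha / 2 in each tail, so take the 97.5th percentile
z_crit = st.norm.ppf(1 - alpha / 2)
print(round(z_crit, 2))  # 1.96
```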

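```python
# Draw a random sample of 70 heights without replacement;
# the sample (and every result below) changes on each run unless random_state is fixed
sampData = data['Heights'].sample(n=70)
```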

```python
meanSampData = sampData.mean()
hypMean = 170  # mean under the null hypothesis
N = len(sampData)  # sample size, 70
standPop = np.std(data['Heights'])  # population standard deviation (ddof=0)
```

```python
# z statistic: (sample mean - hypothesized mean) / (sigma / sqrt(n))
(meanSampData - hypMean) / (standPop / math.sqrt(N))
```

```
-2.3745351385960047
```

<img src="../../images/rej_nonrej.png" />

- As the calculated z score, -2.37, is less than -1.96 (the tabulated critical value), **we reject the null hypothesis**

- Had we instead obtained a z score greater than +1.96 (for example +2.37), **we would also have rejected the null hypothesis**

- Observed value = -2.37
- Critical value = -1.96 (lower tail)
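The same two-sided decision can be written as a short check. This sketch hard-codes the z score printed above; `z_observed` and `z_crit` are illustrative names.

```python
import scipy.stats as st

z_observed = -2.3745351385960047     # z score computed above
alpha = 0.05
z_crit = st.norm.ppf(1 - alpha / 2)  # upper critical value, about 1.96

# Two-sided rule: reject if z falls beyond the critical value in either tail
reject = z_observed < -z_crit or z_observed > z_crit
print("reject H_o" if reject else "fail to reject H_o")  # reject H_o
```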

## We did not make a Type I error, because the population mean (168.31) is in fact not equal to 170

# p-value  (observed significance level)

Another way to reach a statistical conclusion in hypothesis testing problems is by using the p-value.

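The p-value is the smallest significance level $\alpha$ at which the null hypothesis can be rejected. Equivalently, it is the probability, assuming $H_{o}$ is true, of obtaining a test statistic at least as extreme as the one observed.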
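To illustrate the "smallest $\alpha$" reading, the sketch below reuses the z score from the heights example and compares the critical-value decision with the rule "reject if $p \le \alpha$" for several values of $\alpha$ (the list of $\alpha$ values is arbitrary). The two rules agree for every $\alpha$.

```python
import scipy.stats as st

z_observed = -2.3745351385960047             # z score from the heights example
p_value = 2 * st.norm.sf(abs(z_observed))    # two-sided p-value

for alpha in (0.10, 0.05, 0.02, 0.01, 0.001):
    z_crit = st.norm.ppf(1 - alpha / 2)
    rejects = abs(z_observed) > z_crit
    # H_o is rejected exactly for those alpha values that are >= p_value
    print(f"alpha={alpha:<6} reject={rejects}  p<=alpha={p_value <= alpha}")
```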

Let's say $\alpha = .05$:

- if $p \le .05$: strong evidence against $H_{o}$, so reject the null hypothesis
- if $p > .05$: weak evidence against $H_{o}$, so fail to reject the null hypothesis
- if $p$ is close to .05: the evidence is borderline, so interpret the result with caution

## Example

- Let's say the observed z value is 2.45; from the z table, the area to its left is .9929

- The (one-tailed) p-value is then 1 - .9929 = .0071; for a two-tailed test it doubles, to about .0143

- Using this, we can reject $H_{o}$ at $\alpha=.05$
- but we fail to reject $H_{o}$ at $\alpha=.001$

- because .001 < .0071 < .05
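The table lookup in this example can be reproduced with `scipy.stats.norm`: `cdf` gives the area to the left of z and `sf` (the survival function, 1 - cdf) gives the one-tailed p-value directly.

```python
import scipy.stats as st

z = 2.45
area_left = st.norm.cdf(z)     # table value, about .9929
p_one_sided = st.norm.sf(z)    # 1 - cdf(z), about .0071
p_two_sided = 2 * p_one_sided  # about .0143

for alpha in (0.05, 0.001):
    decision = "reject" if p_one_sided <= alpha else "fail to reject"
    print(f"alpha={alpha}: {decision} H_o")
# alpha=0.05: reject H_o
# alpha=0.001: fail to reject H_o
```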

## t-test for a population mean (population $\sigma$ unknown)

$$t= \frac{\bar{x}-\mu}{\frac{s}{\sqrt n}}$$

where $\mu$ is the mean expected under $H_{o}$, $s$ is the sample standard deviation, and the degrees of freedom are $n-1$.
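As a check on the formula, the sketch below computes $t$ by hand (with $s$ using $n-1$ in the denominator) and its two-sided p-value from the t distribution with $n-1$ degrees of freedom, then compares both with `scipy.stats.ttest_1samp`. The heights are made-up numbers, not the course dataset.

```python
import numpy as np
import scipy.stats as st

# Hypothetical sample of heights (made-up numbers, for illustration only)
heights = np.array([165.2, 171.8, 168.4, 162.9, 174.1, 166.7, 169.3, 163.5])
mu0 = 170  # mean under H_o

n = len(heights)
s = heights.std(ddof=1)  # sample standard deviation (divides by n - 1)
t_manual = (heights.mean() - mu0) / (s / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * st.t.sf(abs(t_manual), df=n - 1)

result = st.ttest_1samp(heights, mu0)
print(t_manual, result.statistic)  # the two t values agree
print(p_manual, result.pvalue)     # and so do the p-values
```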

```python
import scipy.stats as st
```

```python
# st.ttest_1samp parameters:
#   a          : array_like, the sample observations
#   popmean    : float or array_like, the expected value in the null hypothesis;
#                if array_like, it must have the same shape as `a`
#                excluding the axis dimension
#   axis       : int or None, optional; axis along which to compute the test.
#                If None, compute over the whole array `a`
#   nan_policy : {'propagate', 'raise', 'omit'}, optional; how to handle NaN input.
#                'propagate' returns nan, 'raise' throws an error, 'omit'
#                ignores NaN values. Default is 'propagate'
st.ttest_1samp(sampData, 170)
```

```
Ttest_1sampResult(statistic=-2.3692778445921836, pvalue=0.020626221350734213)
```

```python
0.021 < 0.05  # p < alpha: we reject the null hypothesis
```

```
True
```

```python
st.ttest_1samp(sampData, 168)
```

```
Ttest_1sampResult(statistic=-0.6669782303186921, pvalue=0.5070104016245358)
```

```python
0.51 < 0.05  # p > alpha: we fail to reject the null hypothesis
```

```
False
```

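```python
st.ttest_1samp(sampData, 169)
```

```
Ttest_1sampResult(statistic=-1.5181280374554378, pvalue=0.13355092969007007)
```

Here $p \approx .134 > .05$, so we again fail to reject the null hypothesis $H_{o}: \mu = 169$.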