# Brief Note on Hypothesis Testing, P-value, and Confidence Interval

When using machine learning, we need to be able to trust our models and the predictions they make. We usually train our models on sample data, and from that sample we make certain assumptions about the wider population.

A hypothesis can be described as a theory or argument that explains some observed phenomenon. In a scientific setting, a hypothesis is meant to be proven (or disproven) through experimentation.

In data science, a crucial part of the modeling process is first coming up with an important question or assumption. For example, we can say something like “different cars use the communal parking lot every day.” The framing of this statement makes it seem like we are declaring it to be true.



However, using a statistical approach, it might be better to frame it as “the same cars use the parking lot every day.” We have framed the assumption as a null hypothesis, which we shall seek to disprove. The first statement can be thought of as an alternative hypothesis. We shall define these two types of hypotheses later on.



Framing the two statements this way lets us ask whether the observed data could have arisen by chance alone. We can compare it to the phrase “innocent until proven guilty”: the null hypothesis is assumed to hold until the evidence against it is strong enough, at which point we reject it in favor of the alternative hypothesis.



When these two hypotheses are tested, we seek to show that the result we observed is statistically significant. This means that it is unlikely to have occurred by chance alone. We shall define statistical significance later in this article.



**Steps to test a hypothesis**


A hypothesis test evaluates two statements about a population. The statements are mutually exclusive. The test determines which statement is better supported by the sample data. A hypothesis test helps us determine the statistical significance of a finding.



We say a finding is statistically significant when its likelihood of occurrence is very low, given the null hypothesis. This section describes the steps to test a hypothesis as we define the concepts involved in the testing process.

**Null hypothesis ($H_0$)**. Scientists carry out experiments to retain or reject a null hypothesis based upon the nature of (or lack of) the relationship between occurrences. A null hypothesis is usually considered to be true until proven otherwise.

**Alternative hypothesis ($H_A$)**. On the other hand, an alternative hypothesis is the statement that we hope the experiment will support. It is the complement of the null hypothesis: when we reject the null hypothesis, we do so in favor of the alternative. The image below shall aid in the understanding of these two types of hypotheses.

**Set a significance level**

After forming our null and alternative hypotheses, we should select a significance level. This is a measure of how strong the evidence in a sample must be before we reject the null hypothesis. The significance level is usually 5%. It is the probability of a type I error that we are willing to accept.

![type1](assets/alpha.png)

Since the significance level is 5%, our level of confidence becomes 95%. This means that, when the null hypothesis is actually true, about 95% of hypothesis tests won’t end in a type I error. You may ask why 5% and not any other value is commonly chosen. It is simply standard practice to use 5%. We mentioned a type I error above.

Let’s define what type I and II errors are.

**Type I error**. This is the error we make when we reject a null hypothesis that is actually true. Its probability is denoted by alpha ($\alpha$), which is the significance level.

**Type II error**. This is the error we make when we retain a null hypothesis that is actually false. Its probability is denoted by beta ($\beta$).
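The two error types can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: it uses a two-sided z-test with a known population standard deviation, and the sample size, effect size (a true mean of 0.5), and helper names `z_test_p_value` and `rejection_rate` are hypothetical choices, not part of the original note.

```python
import math
import numpy as np

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, assuming a known population sd."""
    z = (np.mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=30, alpha=0.05,
                   trials=5000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(trials):
        sample = rng.normal(true_mean, sigma, size=n)
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials

# H0 is true (true mean equals mu0): every rejection is a type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5): every failure to reject is a type II error.
type_2_rate = 1.0 - rejection_rate(true_mean=0.5, seed=1)

print(f"estimated type I error rate (alpha): {type_1_rate:.3f}")
print(f"estimated type II error rate (beta): {type_2_rate:.3f}")
```

The estimated type I error rate should land close to the chosen significance level of 0.05, while the type II error rate depends on how far the true mean is from the value stated in the null hypothesis and on the sample size.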

**p-value**

In null hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.

To know whether to keep or reject the null hypothesis, we can compare the p-value to our significance level. Let’s assume our significance level is 5% (or 0.05). The smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the alternative hypothesis.

If the p-value is less than the significance level we selected, we reject the null hypothesis. This means that if the p-value is less than our 0.05 significance level, we accept that the sample we used supports the alternative hypothesis. Otherwise, we fail to reject the null hypothesis.
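The definition of the p-value can be illustrated by estimating it directly through simulation. The following is a minimal sketch, not a standard library routine: all numbers (a null mean of 100, a known standard deviation of 15, a sample of 36 with an observed mean of 106) are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical setup: H0 says the population mean is 100 (sd 15 assumed known).
mu0, sigma, n = 100.0, 15.0, 36
observed_mean = 106.0
alpha = 0.05

rng = np.random.default_rng(42)
trials = 20_000

# Draw many samples in a world where H0 is true and record each sample mean.
null_means = rng.normal(mu0, sigma, size=(trials, n)).mean(axis=1)

# Two-sided p-value: share of null sample means at least as far from mu0
# as the mean we actually observed.
observed_distance = abs(observed_mean - mu0)
p_value = np.mean(np.abs(null_means - mu0) >= observed_distance)

reject_null = p_value < alpha
print(f"estimated p-value: {p_value:.4f}")
print("reject H0" if reject_null else "fail to reject H0")
```

Because the estimated p-value is the fraction of simulated results at least as extreme as the observed one, it maps directly onto the definition given above; comparing it with `alpha` gives the decision rule.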

**Confidence Interval**

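Statistical hypothesis testing uses data to decide whether a certain statement, called the null hypothesis, should be rejected. The negation of the null hypothesis is called the alternative hypothesis. A confidence interval complements the test by giving a range of plausible values for a population parameter.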

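If $\overline{Y}$ is the mean of a sample of size $n$ from a normal population, then the interval with upper bound

\begin{equation}\overline{Y} + t_{\alpha/2, n-1}s_{\overline{Y}}\end{equation}

and lower bound

\begin{equation}\overline{Y} - t_{\alpha/2, n-1}s_{\overline{Y}}\end{equation}

is a confidence interval with $(1 - \alpha)$ confidence, where $s_{\overline{Y}} = s_Y/\sqrt{n}$ is the standard error of the mean and $t_{\alpha/2, n-1}$ is the critical value of the $t$ distribution with $n-1$ degrees of freedom.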

Confidence interval = sample mean ± margin of error


Suppose we have a sample of size $n = 25$ from a normal distribution with $s^{2} = 2.7$ and $\overline{Y} = 16.1$, and we want a 99% confidence interval for $\mu$. First find the critical value: with $\alpha = 0.01$ and $n - 1 = 24$ degrees of freedom, $t_{0.005, 24} \approx 2.797$. Then, calculate the confidence interval.

\begin{equation}16.1 \pm \frac{(2.797)\sqrt{2.7}}{\sqrt{25}}\end{equation}

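```python
import numpy as np

# Margin of error: critical value (2.797) times the standard error sqrt(2.7)/sqrt(25)
error_margin = (2.797 * np.sqrt(2.7)) / np.sqrt(25)
error_margin
```

```
0.9191879960051699
```

```python
# Confidence interval = sample mean ± margin of error
upper_bound = 16.1 + error_margin
lower_bound = 16.1 - error_margin
print(upper_bound, '\n')
print(lower_bound, '\n')
```

```
17.019187996005172

15.18081200399483
```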
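The critical value 2.797 in the calculation above is typically read from a $t$ table. As a hedged alternative, the sketch below looks it up with `scipy.stats.t.ppf` and wraps the whole calculation in a helper. It assumes SciPy is installed; the function name `t_confidence_interval` is a hypothetical choice for illustration.

```python
import numpy as np
from scipy import stats

def t_confidence_interval(sample_mean, sample_variance, n, confidence=0.99):
    """Return (lower, upper, t_crit) for the mean of a normal population."""
    alpha = 1.0 - confidence
    # Critical value t_{alpha/2, n-1}: the upper alpha/2 quantile of the
    # t distribution with n - 1 degrees of freedom.
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)
    # Standard error of the mean: s_Y / sqrt(n).
    standard_error = np.sqrt(sample_variance / n)
    margin = t_crit * standard_error
    return sample_mean - margin, sample_mean + margin, t_crit

# Same inputs as the worked example: n = 25, s^2 = 2.7, sample mean = 16.1.
lower, upper, t_crit = t_confidence_interval(16.1, 2.7, 25, confidence=0.99)
print("critical value:", t_crit)
print("interval:", lower, upper)
```

Changing the `confidence` argument (for example to 0.95) gives a narrower interval, since a smaller critical value is needed for a lower confidence level.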

