
# Hypothesis Testing - One Sample T-Test

# Contents

The focus of this notebook is One Sample T-Tests. 

The notebook will cover the following:


1.   Description
2.   General Hypothesis Testing Approach
3.   Manual Calculation
4.   Practical Examples using SciPy library
5.   Assumptions




# 1. Description

A One Sample T-Test (or univariate T-test) compares a sample mean to a hypothetical population mean.

The One Sample T-Test is concerned with the question:

"*If the population mean really were the hypothesised value, how likely is a sample mean at least as far from it as the one observed?*"

# 2. General Hypothesis Testing Approach

Conducting a hypothesis test on data involves a fixed sequence of steps. These steps are described below.

## Step 1 - Define a Null Hypothesis

The null hypothesis states that there is no effect or difference, so any deviation seen in the sample is due to chance alone.

For example, if we want to test whether a sample belongs to a population with a specific mean, our null hypothesis is that it does.

The One Sample T-Test then measures how consistent the sample is with this hypothesis.

For this notebook we are going to assume a significance level of 0.05. 

## Step 2 - Prepare Data & Run Test

The second step is to prepare the data and run the relevant hypothesis test, which in this notebook is the One Sample T-Test. This means collecting the sample we want to test and fixing the hypothesised population mean to compare it against.

Recall the question that a One Sample T-Test is looking to answer:

"***If the population mean really were the hypothesised value, how likely is a sample mean at least as far from it as the one observed?***"

The next step is now to actually run the hypothesis test. In this notebook we are going to do this by using the SciPy library.

## Step 3 - Collect & Analyse Results

Once the hypothesis test has been run, we now need to collect and analyse the results.

The value returned from the hypothesis test that we are interested in is the p-value.

The p-value tells us whether or not we can reject the null hypothesis.

In this notebook and the following ones, if the p-value is less than 0.05 we reject the null hypothesis and favour the alternative hypothesis. Otherwise we fail to reject the null hypothesis.

In other words, a p-value below 0.05 means there is a statistically significant difference between the sample mean and the hypothesised population mean.
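
The decision rule above can be written as a small helper. This is only an illustrative sketch: the function name `decide` and its return strings are hypothetical and are not part of SciPy.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null hypothesis when p < alpha."""
    # `decide` is a hypothetical helper used for illustration only
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.03))  # p-value below the 0.05 threshold
print(decide(0.08))  # p-value above the 0.05 threshold
```

The comparison is strict, so a p-value of exactly 0.05 does not lead to rejection under this rule.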

# 3. Manual Calculation

This section shows how to perform the One Sample T-Test manually, working through an example based on [2].

Question: A company wants to improve its sales. Historical sales data indicates that the average value of each sale was 100 dollars per transaction.

After training the sales team, a sample of 25 recent sales shows an average sale value of 130 dollars with a standard deviation of 15 dollars.

Did the training work?

The hypothesis should be tested at a 5% alpha level.

## Step 1 - Define Null Hypothesis

The null hypothesis is that there is no difference in sales. 

Therefore here the null hypothesis is that the population mean is 100 dollars.

## Step 2 - Define Alternative Hypothesis

The alternative hypothesis is that there is a difference (in this case the mean value of sales increased).

Therefore the alternative hypothesis is that the population mean is now greater than 100 dollars.



## Step 3 - Identify the data needed to calculate the test statistic

The question should state these values or provide the data needed to calculate them.



*   The sample mean (x bar) - 130 dollars
*   The population mean (mu) - 100 dollars
*   The sample standard deviation (s) - 15 dollars
*   The number of observations (n) - 25




## Step 4 - Calculate the t-value using the t-score formula

t = (x bar - mu)/(s/sqrt(n))  
t = (130 - 100)/((15/sqrt(25)))  
t = 30/3 = 10  

The calculated t-value is 10
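
The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are chosen here for illustration.

```python
import math

x_bar = 130  # sample mean (dollars)
mu = 100     # hypothesised population mean (dollars)
s = 15       # sample standard deviation (dollars)
n = 25       # number of observations

# Standard error of the mean: s / sqrt(n)
standard_error = s / math.sqrt(n)

# t-score formula: (x bar - mu) / (s / sqrt(n))
t_value = (x_bar - mu) / standard_error
print("t-value: {:.2f}".format(t_value))
```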


## Step 5 - Find the t-table value

To find the t-table value you need:



*   The alpha level: given as 5% in the question
*   The degrees of freedom: the number of items in the sample (n) minus 1 (in this case 24)

Looking up 24 degrees of freedom in the left column and the one-tailed 0.05 column in the top row gives a critical value of 1.711.

If the null hypothesis is true, we would expect 95% of t-values to fall below 1.711. If our calculated t-value falls within this range, we fail to reject the null hypothesis.
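
Instead of reading a printed t-table, the critical value can be looked up with SciPy. The sketch below assumes a one-tailed test, which matches the "greater than" alternative hypothesis in this example, and uses the inverse CDF (`ppf`) of `scipy.stats.t`.

```python
from scipy.stats import t

alpha = 0.05  # significance level given in the question
df = 25 - 1   # degrees of freedom: n - 1

# One-tailed critical value: the point with 95% of the t-distribution below it
t_critical = t.ppf(1 - alpha, df)
print("Critical t-value: {:.3f}".format(t_critical))
```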





## Step 6 - Compare the calculated t-value to the critical value from t-table

The calculated t-value of 10 from Step 4 is greater than the critical value of 1.711 from Step 5, so it falls into the rejection region.

We therefore reject the null hypothesis and favour the alternative hypothesis.

This means the data provide strong evidence that the mean sale value is now greater than 100 dollars, which supports the claim that the training was a success.
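
The same conclusion can be reached through a p-value rather than a critical value. As a rough sketch, again assuming a one-tailed test, the survival function `sf` of `scipy.stats.t` gives the probability of a t-value at least as large as the one calculated.

```python
from scipy.stats import t

t_value = 10  # calculated in Step 4
df = 24       # degrees of freedom: n - 1
alpha = 0.05

# Survival function: P(T > t_value) when the null hypothesis is true
p_value = t.sf(t_value, df)
print("p-value: {:.2e}".format(p_value))
print("Reject null hypothesis:", p_value < alpha)
```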

# 4. Practical Examples using SciPy library

Now that we have covered what a One Sample T-Test is we are going to look at some practical examples of where we can use this type of test.

## Example - 1 - Mobile Phone Handset Battery Life

A mobile phone manufacturer claims that their new Model X handset averages 31 hours of battery life before it needs to be charged. Eight Model X handsets are chosen at random and their battery life is measured in hours until they run out. All handsets are tested under the same conditions.

We get the following results from the tests for the 8 handsets:

  Hours of Battery Life: 30, 28, 32, 26, 33, 25, 28, 30

Does the battery life of these handsets deviate significantly from 31 hours?

## Step 1 - Define Null Hypothesis

The null hypothesis for this example is that the sample of 8 battery life hours belongs to a population with a target mean of 31 hours.

## Step 2 - Prepare Data & Run Test

In [0]:
from scipy.stats import ttest_1samp
import numpy as np

In [0]:
# Place sample data into numpy array
life_hours = np.array([30, 28, 32, 26, 33, 25, 28, 30])
# Calculate mean
life_hours_mean = np.mean(life_hours)

In [0]:
print("Mean Life Hours : {:.2f}".format(life_hours_mean))

Mean Life Hours : 29.00


In [0]:
# Run test using scipy lib
ttest, pval = ttest_1samp(life_hours, 31)

## Step 3 - Collect & Analyse Results

In [0]:
print("pval: {:.2f}".format(pval))
if pval < 0.05:
  print("Result is statistically significant! The battery life for these handsets deviates significantly from 31 hours")
else:
  print("Result is not statistically significant! The battery life for these handsets does not deviate significantly from 31 hours")

pval: 0.08
Result is not statistically significant! The battery life for these handsets does not deviate significantly from 31 hours
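
As a cross-check on this example, the t-statistic returned by `ttest_1samp` can be reproduced with the t-score formula from the manual calculation section. This is a sketch for verification only; the names `target_mean`, `t_manual` and `t_scipy` are chosen here for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

life_hours = np.array([30, 28, 32, 26, 33, 25, 28, 30])
target_mean = 31

# Sample standard deviation uses ddof=1 (divide by n - 1)
s = np.std(life_hours, ddof=1)
n = len(life_hours)

# t-score formula: (x bar - mu) / (s / sqrt(n))
t_manual = (np.mean(life_hours) - target_mean) / (s / np.sqrt(n))

# ttest_1samp runs a two-sided test by default
t_scipy, pval = ttest_1samp(life_hours, target_mean)
print("manual t: {:.3f}, scipy t: {:.3f}".format(t_manual, t_scipy))
```

The t-statistic is negative because the sample mean of 29 hours lies below the claimed 31 hours.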


## Example - 2 - Spearmint Gum Thickness

Example used from: https://onlinecourses.science.psu.edu/statprogram/reviews/statistical-concepts/hypothesis-testing/examples

A manufacturer claims that the thickness of the spearmint gum it produces is 7.5 one-hundredths of an inch. A quality control specialist regularly checks this claim. On one production run, he took a random sample of n = 10 pieces of gum and measured their thickness. He obtained:

7.65, 7.60, 7.65, 7.70, 7.55, 7.55, 7.40, 7.40, 7.50, 7.50

Determine if the sample of measured thicknesses differs significantly from the manufacturer's claim that the spearmint gum is 7.5 one-hundredths of an inch thick.

## Step 1 - Define Null Hypothesis



The null hypothesis for this example is that the sample of 10 pieces of gum belongs to a population with a target mean of 7.5 one-hundredths of an inch.

## Step 2 - Prepare Data & Run Test

In [0]:
# These imports repeat the ones above; they are kept so this example runs on its own
from scipy.stats import ttest_1samp
import numpy as np

In [0]:
# Place sample data into numpy array
gum_measurements = np.array([7.65, 7.60, 7.65, 7.70, 7.55, 7.55, 7.40, 7.40, 7.50, 7.50])
# Calculate mean
gum_mean = np.mean(gum_measurements)

In [0]:
print("Mean Gum Measurements : {:.2f}".format(gum_mean))

Mean Gum Measurements : 7.55


In [0]:
# Run test using scipy lib
ttest, pval = ttest_1samp(gum_measurements, 7.5)

## Step 3 - Collect & Analyse Results

In [0]:
print("pval: {:.2f}".format(pval))
if pval < 0.05:
  print("Result is statistically significant! The gum thickness for the sample deviates significantly from 7.5 one hundredths of an inch")
else:
  print("Result is not statistically significant! The gum thickness for the sample does not deviate significantly from 7.5 one hundredths of an inch")

pval: 0.16
Result is not statistically significant! The gum thickness for the sample does not deviate significantly from 7.5 one hundredths of an inch


# 5. Assumptions

A One Sample T-Test makes a number of assumptions about the underlying data.

These assumptions are:



*   The dependent variable must be continuous (interval/ratio).
*   The observations are independent of one another.
*   The dependent variable should be approximately normally distributed.
*   The dependent variable should not contain any outliers.

[2]
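
Two of these assumptions can be checked roughly in code. The sketch below reuses the battery life data from Example 1. The choice of the Shapiro-Wilk test for normality and the 1.5 x IQR rule for outliers is an assumption made here for illustration; the referenced source does not prescribe specific checks.

```python
import numpy as np
from scipy.stats import shapiro

life_hours = np.array([30, 28, 32, 26, 33, 25, 28, 30])

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed,
# so a small p-value would suggest the normality assumption is violated
stat, normality_pval = shapiro(life_hours)
print("Shapiro-Wilk p-value: {:.2f}".format(normality_pval))

# Rule of thumb: flag values more than 1.5 * IQR beyond the quartiles
q1, q3 = np.percentile(life_hours, [25, 75])
iqr = q3 - q1
is_outlier = (life_hours < q1 - 1.5 * iqr) | (life_hours > q3 + 1.5 * iqr)
outliers = life_hours[is_outlier]
print("Outliers:", outliers)
```

With a sample as small as 8 observations these checks have limited power, so they are best treated as a sanity check rather than proof that the assumptions hold.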



# References

[1] https://www.statisticshowto.datasciencecentral.com/one-sample-t-test/  
[2] https://www.statisticssolutions.com/manova-analysis-one-sample-t-test/  