# The strange origins of the Student’s t-test


William Sealy Gosset (13 June 1876 – 16 October 1937) was an English statistician, chemist and brewer who served as Head Experimental Brewer and later Head Brewer of Guinness, and who was a pioneer of modern statistics. He published his statistical work under the pen name "Student", which is why the t-test bears that name.

Gosset’s academic background in chemistry and mathematics may seem at odds with a career as a brewer. However, around this time Guinness realized that, to maintain its dominant market share as the biggest brewer in Ireland, it would have to brew on a carefully controlled industrial scale. Such a venture required rigorous quality control, hence the demand for university-trained chemists and statisticians. As any amateur brewer will attest, brewing beer has an element of the unknown: success depends not only on the correct procedure but also on luck! Guinness sought to replace this reliance on luck with scientific procedure.

Beer, of course, is a combination of natural products: malted barley, hops and yeast, all mixed with water. Like all agricultural products, these ingredients are inherently variable, and their quality depends not only on crop variety but also on climate, soil conditions and so on. Gosset’s task as Apprentice Brewer was to assess the quality of these ingredients, and to do so cost-effectively. This meant running experiments with small samples and drawing conclusions that could be applied to the large-scale brewing process. However, Gosset discovered that with small samples, the sample mean standardized by the sample's own standard deviation no longer followed the normal distribution; its distribution had heavier tails. He therefore could not rely on conventional methods based on the normal distribution to draw his conclusions.

Source: https://www.physoc.org/magazine-articles/the-strange-origins-of-the-students-t-test/
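Gosset's observation can be reproduced with a short simulation. The sketch below is an illustration, not something from the original article. It draws many small samples (n = 5, an arbitrary choice) from a normal population with a known mean, standardizes each sample mean with that sample's own standard deviation, and counts how often the result falls beyond ±1.96, the two-sided 5% cutoff of the normal distribution. The population parameters and the random seed are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed (assumption) so the run is repeatable
n, trials, mu, sigma = 5, 100_000, 10.0, 2.0  # illustrative values

# Draw `trials` independent samples, each of size n, from a normal population
samples = rng.normal(mu, sigma, size=(trials, n))
means = samples.mean(axis=1)
sds = samples.std(axis=1, ddof=1)  # sample standard deviation of each sample

# Standardize each mean with its own sample SD, as Gosset had to
t_values = (means - mu) / (sds / np.sqrt(n))

# Fraction of standardized means beyond the normal 5% two-sided cutoff
observed = np.mean(np.abs(t_values) > 1.96)
expected_normal = 2 * stats.norm.sf(1.96)     # what the normal curve predicts
expected_t = 2 * stats.t.sf(1.96, df=n - 1)   # what the t-distribution predicts

print(f"observed:              {observed:.3f}")
print(f"normal predicts:       {expected_normal:.3f}")
print(f"t (df={n - 1}) predicts:   {expected_t:.3f}")
```

With only five observations, roughly 12% of the standardized means land beyond ±1.96 instead of the 5% predicted by the normal distribution, which matches the t-distribution with n - 1 = 4 degrees of freedom.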







### What is the t-test and what is it used for?

In today's data-driven world, businesses can't rely on intuition alone to make critical decisions. Instead, they use statistical techniques such as hypothesis testing to back up their actions with data. One popular test for this purpose is the t-test, which compares a sample mean with a reference value (the one-sample t-test) or the means of two groups (the two-sample t-test) and determines whether the difference is statistically significant.
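As a sketch of what the two-sample version computes, the code below implements Student's pooled-variance t-test by hand: t = (mean1 - mean2) / (sp * sqrt(1/n1 + 1/n2)), where sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) and the degrees of freedom are n1 + n2 - 2. It checks the result against scipy.stats.ttest_ind, which also assumes equal variances by default (passing equal_var=False switches it to Welch's t-test). The group sizes, means, standard deviations and seed are arbitrary assumptions; the data are simulated.

```python
import numpy as np
from scipy import stats


def pooled_t_test(a, b):
    """Student's two-sample t-test assuming equal variances (pooled SD)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (a.mean() - b.mean()) / se
    dof = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), dof)  # two-sided p-value
    return t, p


rng = np.random.default_rng(1)
group_a = rng.normal(20.0, 2.0, size=15)  # simulated data
group_b = rng.normal(22.0, 2.0, size=12)

t_manual, p_manual = pooled_t_test(group_a, group_b)
res = stats.ttest_ind(group_a, group_b)  # equal_var=True by default

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4g}")
print(f"scipy:  t = {res.statistic:.4f}, p = {res.pvalue:.4g}")
```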


A paired samples t-test is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample.


### What is a paired t-test?
Use a paired t-test when each subject has a pair of measurements, such as a before and after score. A paired t-test determines whether the mean change for these pairs is significantly different from zero. This test is an inferential statistics procedure because it uses samples to draw conclusions about populations.
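Because the paired t-test works on the within-pair changes, it is equivalent to a one-sample t-test of those differences against zero. The sketch below checks this on simulated before-and-after scores using scipy.stats.ttest_rel and scipy.stats.ttest_1samp; the sample size, means and seed are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated before/after scores for 10 subjects (illustrative data only)
before = rng.normal(50.0, 8.0, size=10)
after = before + rng.normal(2.0, 3.0, size=10)

# Paired t-test on the two related samples
paired = stats.ttest_rel(after, before)

# The same test written as a one-sample t-test of the differences against 0
on_diffs = stats.ttest_1samp(after - before, popmean=0.0)

print(f"ttest_rel:   t = {paired.statistic:.4f}, p = {paired.pvalue:.4g}")
print(f"ttest_1samp: t = {on_diffs.statistic:.4f}, p = {on_diffs.pvalue:.4g}")
```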

Paired t-tests are also known as paired-samples t-tests or dependent-samples t-tests. These names reflect the fact that the two samples are paired, or dependent, because they contain the same subjects. By contrast, an independent-samples t-test compares two samples that contain different subjects.

Perhaps you have a sample of wooden boards, and you paint half of each board with one paint and the other half with the other paint. You then measure the durability of both paints on every board. Each board has two durability scores, and you want to determine whether the two paints differ in durability (1).
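A minimal simulation of the paint example shows why the pairing matters. The sketch assumes, hypothetically, that boards differ a lot in baseline durability while paint B adds a small, consistent improvement over paint A; all numbers and the seed are made up. The paired test removes the board-to-board variation, whereas an independent-samples test treats it as noise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_boards = 12

# Hypothetical data: each board has its own baseline durability (large spread)
board_baseline = rng.normal(100.0, 15.0, size=n_boards)
# Paint B adds a small, consistent improvement of 3 units plus measurement noise
paint_a = board_baseline + rng.normal(0.0, 2.0, size=n_boards)
paint_b = board_baseline + 3.0 + rng.normal(0.0, 2.0, size=n_boards)

paired = stats.ttest_rel(paint_a, paint_b)       # uses the board pairing
independent = stats.ttest_ind(paint_a, paint_b)  # ignores the pairing

print(f"paired      p = {paired.pvalue:.4g}")
print(f"independent p = {independent.pvalue:.4g}")
```

Both tests use the same difference in means, but the paired test's standard error comes from the spread of the within-board differences, which here is far smaller than the spread between boards, so it typically reports a much smaller p-value.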

### Running a t-test in Python





In [8]:
# import libraries needed for calculations
import scipy.stats as stats
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
import glob


In [9]:
# code adapted from https://www.datacamp.com/tutorial/an-introduction-to-python-t-tests

# Population Mean 
mu = 10

# Sample Size
N1 = 21

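# Degrees of freedom (ttest_1samp computes this internally; shown for reference)
dof = N1 - 1

# Generate a random sample from a normal distribution with mean 11 and standard deviation 1
# (no random seed is set, so the values will differ on every run)
x = np.random.randn(N1) + 11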

# Using the Stats library, compute t-statistic and p-value
t_stat, p_val = stats.ttest_1samp(a=x, popmean = mu)
print("t-statistic = " + str(t_stat))  
print("p-value = " + str(p_val)) 

t-statistic = 2.6970508900132777
p-value = 0.01386717695026893


Code explained:

• The libraries imported in the previous cell (numpy and scipy.stats) are used here.
• The population mean is set to 10 and the sample size to 21.
• The degrees of freedom are calculated as the sample size minus one.
• A random sample of size 21 is then drawn with numpy's random.randn function, which samples from a standard normal distribution; adding 11 gives a sample with a mean of 11 and a standard deviation of 1.
• Finally, the t-statistic and p-value are calculated with the ttest_1samp function from the scipy.stats library.
• The t-statistic measures the difference between the sample mean and the hypothesized population mean in units of the standard error. The p-value is the probability of obtaining a t-statistic at least as extreme as the observed one, assuming the null hypothesis is true; here the null hypothesis is that the mean of the population the sample was drawn from equals 10.
• The results are printed to the console with the print function.
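The t-statistic described above can be computed directly as t = (sample mean - mu) / (s / sqrt(n)), where s is the sample standard deviation, with a two-sided p-value from the t-distribution with n - 1 degrees of freedom. The sketch below does this by hand and compares it with stats.ttest_1samp. It reuses the same setup (mu = 10, N1 = 21, mean 11, standard deviation 1) but swaps in numpy's default_rng with a fixed seed, an assumption made so the run is repeatable. It also computes the two-sided 5% critical value, which is where the degrees of freedom come in.

```python
import numpy as np
from scipy import stats


def one_sample_t(x, mu):
    """One-sample t-test: t = (mean - mu) / (s / sqrt(n)), two-sided p-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    t = (x.mean() - mu) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)  # area in both tails beyond |t|
    return t, p


mu, N1 = 10, 21
rng = np.random.default_rng(42)  # fixed seed (assumption)
x = rng.normal(11.0, 1.0, size=N1)

t_manual, p_manual = one_sample_t(x, mu)
t_scipy, p_scipy = stats.ttest_1samp(x, popmean=mu)

# Two-sided 5% critical value for N1 - 1 = 20 degrees of freedom
t_crit = stats.t.ppf(0.975, df=N1 - 1)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4g}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4g}")
print(f"critical value = {t_crit:.3f}, reject H0 at 5%: {abs(t_manual) > t_crit}")
```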

Let me now run a t-test on data from my own working environment.
I want to know whether different calipers give the same reading when measuring the same disc diffusion zone in an antibiotic sensitivity test.


In [10]:
df = pd.read_csv("Data/ampicillin.csv")
print(df.head())

   ids Caliper   Value
0    1        A     18
1    2        A     17
2    3        A     17
3    4        A     17
4    5        A     16


In [11]:
# display the DataFrame
df


   ids Caliper   Value
0    1        A     18
1    2        A     17
2    3        A     17
3    4        A     17
4    5        A     16
5    6        A     16
6    7        A     16
7    8        A     17
8    9        A     16
9   10        A     17


In [15]:
df.describe()


             ids      Value
count  48.000000  48.000000
mean   24.500000  16.916667
std    14.000000   0.767237
min     1.000000  16.000000
25%    12.750000  16.000000
50%    24.500000  17.000000
75%    36.250000  17.000000
max    48.000000  19.000000


In [14]:
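from scipy.stats import ttest_ind, ttest_rel

# A KeyError: 'Caliper' here, even though df.head() shows a Caliper column, usually
# means the CSV header contains stray whitespace (an assumption about this file),
# so strip it from the column names and from the caliper labels
df.columns = df.columns.str.strip()
df['Caliper'] = df['Caliper'].astype(str).str.strip()

# define samples: readings taken with caliper A and with caliper B
group1 = df[df['Caliper'] == 'A']
group2 = df[df['Caliper'] == 'B']

# perform an independent two-sample t-test
# (if every zone was measured with both calipers, ttest_rel on the paired
# readings would be the more appropriate test)
ttest_ind(group1['Value'], group2['Value'])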
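A KeyError: 'Caliper' on a column that appears in df.head() often means the CSV header contains stray whitespace, for example a space after each comma. The self-contained sketch below reproduces that on a small made-up CSV (not the ampicillin data) and shows the column-stripping fix. The readings are invented for illustration; pandas.read_csv's skipinitialspace=True option is another way to drop spaces that follow the delimiter.

```python
import io

import pandas as pd
from scipy import stats

# Made-up CSV with a space after each comma in the header row only
raw = "ids, Caliper, Value\n1,A,18\n2,A,17\n3,A,16\n4,B,17\n5,B,16\n6,B,17\n"

df_demo = pd.read_csv(io.StringIO(raw))
print(list(df_demo.columns))  # ['ids', ' Caliper', ' Value']

# Remove leading/trailing whitespace from every column name
df_demo.columns = df_demo.columns.str.strip()

# Select the readings for each caliper and compare them
readings_a = df_demo.loc[df_demo["Caliper"] == "A", "Value"]
readings_b = df_demo.loc[df_demo["Caliper"] == "B", "Value"]
print(stats.ttest_ind(readings_a, readings_b))
```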

In [None]:
help(ttest_rel)

In [None]:
help(ttest_ind)

# References:

1. https://statisticsbyjim.com/hypothesis-testing/paired-t-test/

2. A Statistical Biography of William Sealy Gosset https://gwern.net/doc/statistics/decision/1990-pearson-studentastatisticalbiographyofwilliamsealygosset.pdf

3. Student's t-test https://www.youtube.com/watch?v=pTmLQvMM-1M

4. All About t-Tests (one sample, independent, & paired sample) https://www.youtube.com/watch?v=rK3mXS3gHyI&t=177s

5. How to use Python to Perform a Paired Sample T-test https://www.marsja.se/how-to-use-python-to-perform-a-paired-sample-t-test/

6. How to Conduct a Two Sample T-Test in Python https://www.geeksforgeeks.org/how-to-conduct-a-two-sample-t-test-in-python/

7. Python - Paired Samples t-Test https://www.youtube.com/watch?v=DZyDbEzaiK0

8. Learn How to Perform T-test using Python https://www.youtube.com/watch?v=ZR6bf8_s-hw

9. How to use Python to do Paired Sample T-test - SciPy, Pandas, and Pingouin https://www.bing.com/videos/search?q=scipy+t-tests&&view=detail&mid=2A20DA19C6DD858D6D302A20DA19C6DD858D6D30&&FORM=VRDGAR&ru=%2Fvideos%2Fsearch%3Fq%3Dscipy%2Bt-tests%26FORM%3DHDRSC6

10. Basic Statistics in Python: t tests with SciPy https://neuraldatascience.io/5-eda/ttests.html

11. Taking on T-Tests with SciPy https://medium.com/analytics-vidhya/taking-on-t-tests-with-scipy-b05eaaa0ee52

12. PYTHON FOR DATA SCIENCE https://www.pythonfordatascience.org/independent-samples-t-test-python/

13. SciPy – Statistical Significance Tests https://www.geeksforgeeks.org/scipy-statistical-significance-tests/

14. scipy.stats.ttest_ind https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

15. SciPy Statistical Significance Tests 

16. Practice t-tests with these free online datasets https://stringfestanalytics.com/t-test-dataset-practice-guide/

17. Student's t-test (Paired and Unpaired) - A Level Biology https://www.youtube.com/watch?v=k_OhNaQCSTc

18. T Test (Students T Test) – Understanding the math and how it works https://www.machinelearningplus.com/statistics/t-test-students-understanding-the-math-and-how-it-works/

19. How to Do a T-Test for Beginners https://www.youtube.com/watch?v=qvPWQ-e03tQ (really good for just the basics on paired and tailed)

20. Choosing a Statistical Test https://www.youtube.com/watch?v=UaptUhOushw (choosing statistical tests for healthcare data)

21. Data Cleaning in Pandas | Python Pandas Tutorials https://www.youtube.com/watch?v=bDhvCp3_lYw

22. Independent t-Test [unpaired t-Test] https://www.youtube.com/watch?v=ujLHJKrgx1A (good for explanation of different types of the t-tests)

23. Python for Data Analysis: Hypothesis Testing and T-Tests https://www.youtube.com/watch?v=CIbJSX-biu0

24. Python for Data 24: Hypothesis Testing https://www.kaggle.com/code/hamelg/python-for-data-24-hypothesis-testing/notebook

25. Learn How to Perform T-test using Python https://www.youtube.com/watch?v=ZR6bf8_s-hw&t=37s

26. T_Test_Sales.ipynb (GA_Sessions repository on GitHub) https://github.com/bhattbhavesh91/GA_Sessions/blob/master/t_test_independence/T_Test_Sales.ipynb








