# **T-Test**

A t-test is a statistical test used to determine whether a sample mean differs significantly from a reference value, or whether the means of two groups differ significantly from each other. It is particularly useful when dealing with small sample sizes and when the population standard deviation is unknown.

## **Types of t-tests**

### ***One-Sample t-test***

Purpose: To determine if the mean of a single group is different from a known value (population mean).

Example: Testing if the average height of a sample of people is different from a known average height of the population.

#### ***Assumptions of One-Sample t-test:***
1. The data is normally distributed.
2. The sample is randomly selected from the population.
3. The observations in the sample must be independent, which means that the value of one observation should not influence the value of another observation.
4. The population standard deviation is not known.

#### ***Understand with Example***

A nutritionist claims that the average daily intake of calories for adults in a certain city is 2200 calories. To test this claim, a sample of 30 adults from the city is selected, and their daily caloric intake is recorded. The sample has an average daily intake of 2150 calories with a standard deviation of 100 calories.

#### ***State the Hypotheses***

Null Hypothesis (H₀): The mean daily intake of calories for adults in the city is 2200 calories, i.e. H₀: μ = 2200

Alternative Hypothesis (H₁): The mean daily intake of calories for adults in the city is not 2200 calories, i.e. H₁: μ ≠ 2200

#### ***Calculate the Test Statistic:***

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where x̄ is the sample mean,

μ is the population mean under the null hypothesis,

s is the sample standard deviation,

n is the sample size.

That is, t = (2150 - 2200) / (100 / √30) ≈ -2.74

#### ***Degrees of Freedom:***

df = n-1 = 30-1 = 29

#### ***Find the Critical t-value:***

For a two-tailed test at α = 0.05, use a t-table to find the critical t-value with 29 degrees of freedom.

https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf

Here the critical value is 2.045.

#### ***Make a Decision:***

If the absolute value of the calculated t-statistic is greater than the critical t-value, reject the null hypothesis.

Since |t| = 2.74 > 2.045, we reject the null hypothesis.
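The table lookup and the decision rule can also be reproduced in code. The following is a minimal sketch, assuming SciPy is installed; it uses `scipy.stats.t.ppf` (the inverse CDF of the t-distribution) with the numbers from the example above.

```python
from scipy import stats

alpha = 0.05  # significance level
df = 29       # degrees of freedom (n - 1)

# Two-tailed test: alpha is split evenly between the two tails
critical_t = stats.t.ppf(1 - alpha / 2, df)

# Test statistic from the worked example
t_statistic = (2150 - 2200) / (100 / 30 ** 0.5)

# Decision rule: reject H0 when |t| exceeds the critical value
reject_null = abs(t_statistic) > critical_t

print(f"Critical t-value: {critical_t:.3f}")  # 2.045
print(f"Reject H0: {reject_null}")            # True
```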

##### ***P-value approach***

In [9]:
import scipy.stats as stats

# Sample data
sample_mean = 2150
sample_std = 100
sample_size = 30
population_mean = 2200

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std / (sample_size ** 0.5))

# Calculate the degrees of freedom
df = sample_size - 1

# Calculate the p-value
p_value = stats.t.sf(abs(t_statistic), df) * 2  # Two-tailed test

# Print results
print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")

# Determine significance
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")


T-statistic: -2.7386127875258306
P-value: 0.01043738949886733
Reject the null hypothesis.
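The calculation above starts from summary statistics. When the raw observations are available, SciPy's `ttest_1samp` performs the same test directly. The sketch below uses a small set of hypothetical calorie values (invented for illustration, not taken from the example) and cross-checks the result against the formula.

```python
import numpy as np
from scipy import stats

# Hypothetical daily calorie intakes (illustrative only, not from the example above)
calories = np.array([2100, 2250, 2180, 2050, 2300, 2120, 2190, 2080, 2210, 2150])
population_mean = 2200

# One-sample t-test on the raw observations (two-tailed by default)
result = stats.ttest_1samp(calories, popmean=population_mean)

# Cross-check with t = (sample mean - population mean) / (s / sqrt(n))
n = len(calories)
manual_t = (calories.mean() - population_mean) / (calories.std(ddof=1) / np.sqrt(n))

print(f"T-statistic (SciPy):  {result.statistic:.4f}")
print(f"T-statistic (manual): {manual_t:.4f}")
print(f"P-value: {result.pvalue:.4f}")
```

Note that `ddof=1` is needed so that NumPy computes the sample standard deviation rather than the population one.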


### ***Independent Samples t-test (Two-Sample t-test)***

Purpose: To compare the means of two independent groups.

Example: Comparing the average test scores of students from two different classes.

#### ***Assumptions of Independent Samples t-test:***
1. The data in each group is normally distributed.
2. The variances of the two groups are equal.
3. The samples are independent of each other.

#### ***How to Perform***

#### ***State the Hypotheses:***

Null Hypothesis (H₀): The two group means are equal (μ1 = μ2).

Alternative Hypothesis (H₁): The two group means are not equal (μ1 ≠ μ2).

#### ***Calculate the Test Statistic:***

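Under the equal-variance assumption, the t-test formula for an independent samples t-test is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

where x̄1 and x̄2 are the sample means,

s1^2 and s2^2 are the sample variances,

s_p^2 is the pooled variance,

n1 and n2 are the sample sizes.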

#### ***Determine the Degrees of Freedom (df):***

For an independent samples t-test with equal variances assumed:

df = n1 + n2 - 2

#### ***Find the Critical t-value:***

Use a t-distribution table to find the critical value based on the chosen significance level (e.g. α = 0.05) and the degrees of freedom.

#### ***Compare the Test Statistic to the Critical t-value:***

For a two-tailed test, if the absolute value of the calculated t-value is greater than the critical t-value, reject the null hypothesis.
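The steps above can be sketched by hand in Python. This is a minimal illustration, assuming the pooled-variance form of the statistic (which is what the equal-variance assumption and df = n1 + n2 - 2 imply). The two score arrays are hypothetical, and the result is cross-checked against `scipy.stats.ttest_ind`.

```python
import numpy as np
from scipy import stats

# Hypothetical test scores for two classes (illustrative only)
class_1 = np.array([78, 85, 90, 72, 88, 81, 79, 94])
class_2 = np.array([70, 75, 82, 68, 77, 80, 73, 71])

n1, n2 = len(class_1), len(class_2)
var1, var2 = class_1.var(ddof=1), class_2.var(ddof=1)  # sample variances

# Pooled variance and test statistic (equal variances assumed)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_manual = (class_1.mean() - class_2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Degrees of freedom and two-tailed critical value at alpha = 0.05
df = n1 + n2 - 2
critical_t = stats.t.ppf(1 - 0.05 / 2, df)

# Cross-check against SciPy (equal_var=True is the default)
t_scipy = stats.ttest_ind(class_1, class_2).statistic

print(f"Manual t: {t_manual:.4f}, SciPy t: {t_scipy:.4f}")
print(f"df: {df}, critical t: {critical_t:.3f}")
print(f"Reject H0: {abs(t_manual) > critical_t}")
```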

#### ***Understand with Example***

You have two groups of participants who have followed two different diets for a month. You want to test if there is a significant difference in the weight loss between the two diet groups using an independent samples t-test with a significance level of 0.05.

weight_loss_group_A =  [-3.2, -4.1, -3.5, -5.0, -4.7, -3.8, -3.9, -4.3, -5.1, -4.0, -3.6, -4.8, -4.2, -5.3, -4.1, -4.5, -3.7, -5.0, -4.9, -4.4, -3.3, -4.6, -4.1, -3.8, -4.7, -3.5, -4.0, -5.2, -4.8, -3.9]

weight_loss_group_B =  [-2.8, -3.1, -2.5, -3.2, -2.9, -2.4, -3.0, -2.7, -3.4, -2.6, -3.1, -2.8, -3.3, -3.0, -2.5, -2.9, -2.7, -3.2, -2.8, -3.0, -2.4, -3.3, -2.5, -2.9, -3.1, -2.7, -2.8, -3.4, -3.1, -2.6]

#### ***State the Hypotheses:***

Null Hypothesis: μA = μB (The means of the two groups are equal)

Alternative Hypothesis: μA ≠ μB (The means of the two groups are not equal)


In [1]:
import scipy.stats as stats

# Sample data
group_A = [-3.2, -4.1, -3.5, -5.0, -4.7, -3.8, -3.9, -4.3, -5.1, -4.0, -3.6, -4.8, -4.2, -5.3, -4.1, -4.5, -3.7, -5.0, -4.9, -4.4, -3.3, -4.6, -4.1, -3.8, -4.7, -3.5, -4.0, -5.2, -4.8, -3.9]
group_B = [-2.8, -3.1, -2.5, -3.2, -2.9, -2.4, -3.0, -2.7, -3.4, -2.6, -3.1, -2.8, -3.3, -3.0, -2.5, -2.9, -2.7, -3.2, -2.8, -3.0, -2.4, -3.3, -2.5, -2.9, -3.1, -2.7, -2.8, -3.4, -3.1, -2.6]

# Perform the independent samples t-test
# (equal_var=True is the default, so the pooled-variance test is used)
t_statistic, p_value = stats.ttest_ind(group_A, group_B)

# Print results
print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")

# Determine significance
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is sufficient evidence to suggest a significant difference between the weight loss of the two diet groups.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to suggest a significant difference between the weight loss of the two diet groups.")


T-statistic: -11.39711363490571
P-value: 1.9555275335598556e-16
Reject the null hypothesis: There is sufficient evidence to suggest a significant difference between the weight loss of the two diet groups.
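The test above relies on the equal-variance assumption. As a hedged sketch of how that assumption might be checked, the code below runs Levene's test (`scipy.stats.levene`) on the same data and, as an alternative that does not assume equal variances, Welch's t-test via `equal_var=False`. Both are illustrative additions rather than part of the original analysis.

```python
from scipy import stats

group_A = [-3.2, -4.1, -3.5, -5.0, -4.7, -3.8, -3.9, -4.3, -5.1, -4.0,
           -3.6, -4.8, -4.2, -5.3, -4.1, -4.5, -3.7, -5.0, -4.9, -4.4,
           -3.3, -4.6, -4.1, -3.8, -4.7, -3.5, -4.0, -5.2, -4.8, -3.9]
group_B = [-2.8, -3.1, -2.5, -3.2, -2.9, -2.4, -3.0, -2.7, -3.4, -2.6,
           -3.1, -2.8, -3.3, -3.0, -2.5, -2.9, -2.7, -3.2, -2.8, -3.0,
           -2.4, -3.3, -2.5, -2.9, -3.1, -2.7, -2.8, -3.4, -3.1, -2.6]

# Levene's test: H0 is that both groups have equal variances
levene_result = stats.levene(group_A, group_B)
print(f"Levene p-value: {levene_result.pvalue:.4g}")

# Pooled-variance t-test (as in the cell above) and Welch's t-test
pooled = stats.ttest_ind(group_A, group_B)
welch = stats.ttest_ind(group_A, group_B, equal_var=False)

print(f"Pooled t: {pooled.statistic:.4f}, p = {pooled.pvalue:.4g}")
print(f"Welch  t: {welch.statistic:.4f}, p = {welch.pvalue:.4g}")
```

Because both groups contain 30 observations, the pooled and Welch statistics coincide; only the degrees of freedom, and therefore the p-value, differ between the two tests.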


### ***Paired Samples t-test (Dependent t-test)***

Purpose: To compare the means of two related groups.

Example: Comparing the average test scores of students before and after a training program.

#### ***Assumptions of Paired Samples t-test***
1. The differences between the paired observations are normally distributed.
2. The pairs are dependent or matched in some meaningful way.

#### ***Calculate the Differences:***

$d_i = \text{after}_i - \text{before}_i$, where $d_i$ is the difference for the i-th pair.

#### ***Calculate the Mean of the Differences:***

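$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$$

#### ***Calculate the Standard Deviation of the Differences:***

$$s_d = \sqrt{\frac{\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}{n - 1}}$$

#### ***Calculate the t-statistic:***

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

where d̄ is the mean of the differences, s_d is their standard deviation, and n is the number of pairs.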

Compare the t-statistic with the critical t-value from the t-distribution table for n - 1 degrees of freedom and the desired significance level α.


#### ***Understand with Example***

Suppose we want to evaluate the effectiveness of a new training program on employees' productivity. We measure the productivity of 30 employees before and after the training program. Here are the productivity scores (on a scale of 0-100) for each employee before and after the training.

Before Training: [78, 82, 85, 79, 88, 77, 81, 85, 90, 79, 83, 82, 84, 88, 78, 77, 80, 85, 89, 84, 81, 83, 80, 84, 86, 87, 82, 83, 89, 85]

After Training: [85, 86, 89, 84, 91, 82, 84, 90, 92, 85, 86, 87, 90, 92, 84, 82, 83, 90, 91, 89, 86, 87, 86, 89, 90, 91, 85, 87, 92, 88]

#### ***State the Hypotheses:***

Null Hypothesis (H₀): The mean difference in productivity before and after the training is zero. (μD = 0)

Alternative Hypothesis (H₁): The mean difference in productivity before and after the training is not zero. (μD ≠ 0)


In [2]:
import scipy.stats as stats

# Sample data
before_training = [78, 82, 85, 79, 88, 77, 81, 85, 90, 79, 83, 82, 84, 88, 78, 77, 80, 85, 89, 84, 81, 83, 80, 84, 86, 87, 82, 83, 89, 85]
after_training = [85, 86, 89, 84, 91, 82, 84, 90, 92, 85, 86, 87, 90, 92, 84, 82, 83, 90, 91, 89, 86, 87, 86, 89, 90, 91, 85, 87, 92, 88]

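# Perform the dependent (paired) t-test
# Note: ttest_rel(a, b) tests the differences a - b, so with (before, after)
# the t-statistic is negative when scores increase after training
t_statistic, p_value = stats.ttest_rel(before_training, after_training)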

# Print results
print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")

# Determine significance
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is sufficient evidence to suggest a significant difference in productivity before and after the training.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to suggest a significant difference in productivity before and after the training.")


T-statistic: -18.639642837168104
P-value: 1.0977120388163886e-17
Reject the null hypothesis: There is sufficient evidence to suggest a significant difference in productivity before and after the training.
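The step-by-step calculation described earlier (differences, their mean and standard deviation, then the t-statistic) can be sketched by hand for the same data. This sketch takes the differences as after minus before, so the t-statistic is positive; it has the same magnitude as the `ttest_rel(before_training, after_training)` result above, with the opposite sign.

```python
import numpy as np
from scipy import stats

before = np.array([78, 82, 85, 79, 88, 77, 81, 85, 90, 79, 83, 82, 84, 88, 78,
                   77, 80, 85, 89, 84, 81, 83, 80, 84, 86, 87, 82, 83, 89, 85])
after = np.array([85, 86, 89, 84, 91, 82, 84, 90, 92, 85, 86, 87, 90, 92, 84,
                  82, 83, 90, 91, 89, 86, 87, 86, 89, 90, 91, 85, 87, 92, 88])

# Step 1: per-employee differences (after - before)
differences = after - before
n = len(differences)

# Step 2: mean of the differences
mean_diff = differences.mean()

# Step 3: sample standard deviation of the differences (ddof=1)
sd_diff = differences.std(ddof=1)

# Step 4: t-statistic and two-tailed p-value with n - 1 degrees of freedom
t_manual = mean_diff / (sd_diff / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), n - 1)

print(f"Mean difference: {mean_diff:.2f}")  # 4.30
print(f"T-statistic: {t_manual:.4f}")       # 18.6396
print(f"P-value: {p_manual:.4g}")
```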
