
Problem:
A coffee company claims that its coffee cans contain an average of 400 grams of coffee. A quality inspector suspects that the actual average content is different. She takes a sample of 100 coffee cans and finds an average content of 398 grams; the population standard deviation is known to be 8 grams. At a 5% significance level, can we conclude that the average coffee content is different from 400 grams?


In [1]:
import numpy as np
from scipy import stats

# Given data
population_mean = 400   # Claimed mean weight
population_std = 8      # Known standard deviation
sample_size = 100       # Sample size
sample_mean = 398       # Observed sample mean

# Calculate the Z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Calculate the p-value for a two-tailed test
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

print(f"Z-score: {z_score}")
print(f"P-value: {p_value}")

# Conclusion
alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in coffee can weight.")
else:
    print("Fail to reject the null hypothesis: No significant difference in coffee can weight.")


Z-score: -2.5
P-value: 0.012419330651552318
Reject the null hypothesis: There is a significant difference in coffee can weight.
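The same decision can be reached with a critical value or a confidence interval instead of a p-value. The sketch below reuses the given numbers (the extra variable names are illustrative, not from the original notebook). With alpha = 0.05, the two-tailed critical value is about ±1.96, and a 95% interval around 398 that excludes 400 corresponds to rejecting H₀.

```python
import numpy as np
from scipy import stats

# Same given data as the coffee-can problem
population_mean = 400
population_std = 8
sample_size = 100
sample_mean = 398
alpha = 0.05

standard_error = population_std / np.sqrt(sample_size)   # 8 / 10 = 0.8
z_score = (sample_mean - population_mean) / standard_error

# Two-tailed critical value: the z that leaves alpha/2 in each tail
z_critical = stats.norm.ppf(1 - alpha / 2)

# 95% confidence interval for the true mean content
ci_low = sample_mean - z_critical * standard_error
ci_high = sample_mean + z_critical * standard_error

# norm.sf(x) equals 1 - norm.cdf(x) but stays accurate for tiny tail areas
p_value = 2 * stats.norm.sf(abs(z_score))

print(f"Critical value: ±{z_critical:.3f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"|z| exceeds critical value: {abs(z_score) > z_critical}")
print(f"P-value: {p_value:.4f}")
```

Here |z| = 2.5 exceeds 1.96 and the interval is roughly (396.43, 399.57), which does not contain 400, so all three views agree.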


Problem:
A bakery advertises that its cookies contain an average of 50 grams of sugar. A customer doubts this and takes a random sample of 15 cookies, finding an average sugar content of 48 grams with a sample standard deviation of 5 grams. At a 5% significance level, can we conclude that the actual sugar content is different from 50 grams?


In [2]:
import numpy as np
from scipy import stats

# Given data
population_mean = 50    # Claimed mean sugar content
sample_size = 15        # Sample size
sample_mean = 48        # Observed sample mean
sample_std = 5          # Sample standard deviation

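# Simulate 15 values from a normal distribution with mean 48 and std 5.
# Note: a random draw will not have exactly mean 48 and std 5, so this only approximates the stated data.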
np.random.seed(0)
sample = np.random.normal(sample_mean, sample_std, sample_size)

# Perform the one-sample t-test
t_statistic, p_value = stats.ttest_1samp(sample, population_mean)

print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")

# Conclusion
alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in sugar content.")
else:
    print("Fail to reject the null hypothesis: No significant difference in sugar content.")


T-statistic: 1.274699248751995
P-value: 0.22316646747711905
Fail to reject the null hypothesis: No significant difference in sugar content.
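The simulated sample above is a random draw, so its mean and standard deviation are not exactly 48 and 5; its positive t-statistic shows that the simulated mean actually lies above 50, even though the observed mean is below it. Since the problem gives summary statistics, the one-sample t-statistic can be computed from them directly. A minimal sketch, assuming the stated 5 grams is the sample standard deviation (ddof = 1):

```python
import numpy as np
from scipy import stats

# Summary statistics from the problem statement
population_mean = 50   # claimed mean under H0
sample_size = 15
sample_mean = 48
sample_std = 5         # sample standard deviation (ddof = 1)
alpha = 0.05

# t = (sample mean - claimed mean) / (s / sqrt(n)), with n - 1 degrees of freedom
standard_error = sample_std / np.sqrt(sample_size)
t_statistic = (sample_mean - population_mean) / standard_error
df = sample_size - 1

# Two-tailed p-value from the t distribution
p_value = 2 * stats.t.sf(abs(t_statistic), df)

print(f"T-statistic: {t_statistic:.4f}")
print(f"Degrees of freedom: {df}")
print(f"P-value: {p_value:.4f}")

if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in sugar content.")
else:
    print("Fail to reject the null hypothesis: No significant difference in sugar content.")
```

With these inputs t ≈ -1.55 on 14 degrees of freedom, smaller in magnitude than the two-tailed 5% critical value of about 2.145, so the conclusion (fail to reject) is unchanged but now rests on the stated data.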


Problem:
An education researcher wants to compare test scores between two classes that use different teaching methods. In Class A, the mean score of 15 students was 78 with a standard deviation of 10. In Class B, the mean score of 15 students was 85 with a standard deviation of 8. At a 5% significance level, is there a significant difference between the two classes' scores?
1. Set up hypotheses:
   - Null Hypothesis (H₀): The mean score of Class A = the mean score of Class B.
   - Alternative Hypothesis (H₁): The mean scores are different.
2. Perform a two-sample t-test.


In [3]:
import numpy as np
from scipy import stats

# Given data for Class A
mean_a = 78
std_a = 10
size_a = 15

# Given data for Class B
mean_b = 85
std_b = 8
size_b = 15

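# Simulate scores from normal distributions with the given means and standard deviations.
# Note: random samples of 15 will not exactly reproduce the stated summary statistics.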
np.random.seed(0)
class_a_scores = np.random.normal(mean_a, std_a, size_a)
class_b_scores = np.random.normal(mean_b, std_b, size_b)

# Perform the two-sample t-test
t_statistic, p_value = stats.ttest_ind(class_a_scores, class_b_scores)

print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")

# Conclusion
alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in test scores between the classes.")
else:
    print("Fail to reject the null hypothesis: No significant difference in test scores between the classes.")


T-statistic: -0.49679550858596827
P-value: 0.6232115193256857
Fail to reject the null hypothesis: No significant difference in test scores between the classes.
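The two simulated classes above do not reproduce the stated means and standard deviations, so the printed result reflects the random draws rather than the problem data. SciPy's ttest_ind_from_stats runs the two-sample t-test directly on summary statistics. A sketch, assuming the given standard deviations are sample standard deviations and that the classes have equal population variances (Student's pooled test, the same default as ttest_ind):

```python
from scipy import stats

# Summary statistics from the problem statement
mean_a, std_a, size_a = 78, 10, 15
mean_b, std_b, size_b = 85, 8, 15
alpha = 0.05

# Pooled-variance t-test computed straight from the summary statistics
t_statistic, p_value = stats.ttest_ind_from_stats(
    mean_a, std_a, size_a,
    mean_b, std_b, size_b,
    equal_var=True,
)

print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in test scores between the classes.")
else:
    print("Fail to reject the null hypothesis: No significant difference in test scores between the classes.")
```

With these inputs t ≈ -2.12 on 28 degrees of freedom, which exceeds the two-tailed 5% critical value (about 2.048) in magnitude, so this version rejects H₀, the opposite of the simulated run above. Passing equal_var=False switches to Welch's t-test, which drops the equal-variance assumption.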


Chi-squared Test
The chi-squared test of independence is used with categorical data to determine whether there is a significant association between two variables.

In [4]:
import numpy as np
from scipy.stats import chi2_contingency

# Example: Contingency table (observed frequencies)
data = np.array([[50, 30],   # Group 1: counts per outcome (e.g., Yes, No)
                 [20, 40]])  # Group 2: counts per outcome (e.g., Yes, No)

# Perform Chi-squared test
chi2, p, dof, expected = chi2_contingency(data)

print("Chi-squared Statistic:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:")
print(expected)

if p < 0.05:
    print("Reject the null hypothesis: Variables are dependent.")
else:
    print("Fail to reject the null hypothesis: No evidence of an association between the variables.")


Chi-squared Statistic: 10.529166666666667
P-value: 0.0011750518530845063
Degrees of Freedom: 1
Expected Frequencies:
[[40. 40.]
 [30. 30.]]
Reject the null hypothesis: Variables are dependent.
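chi2_contingency applies Yates' continuity correction by default whenever the test has one degree of freedom, which is why the statistic above is about 10.53 rather than the uncorrected value. The sketch below recomputes the expected frequencies and both versions of the statistic with NumPy, then cross-checks against SciPy:

```python
import numpy as np
from scipy import stats
from scipy.stats import chi2_contingency

observed = np.array([[50, 30],
                     [20, 40]])

# Expected count for each cell: (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Uncorrected Pearson statistic: sum of (O - E)^2 / E
chi2_plain = np.sum((observed - expected) ** 2 / expected)

# Yates' correction: shrink each |O - E| by 0.5 (never below zero)
corrected_diff = np.maximum(np.abs(observed - expected) - 0.5, 0)
chi2_yates = np.sum(corrected_diff ** 2 / expected)

dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_yates = stats.chi2.sf(chi2_yates, dof)

# Cross-check against SciPy (correction=True is the default)
scipy_stat, scipy_p, _, _ = chi2_contingency(observed)

print("Expected frequencies:\n", expected)
print("Chi-squared (no correction):", chi2_plain)
print("Chi-squared (Yates):", chi2_yates, "| SciPy:", scipy_stat)
print("P-value (Yates):", p_yates, "| SciPy:", scipy_p)
```

Without the correction, every cell has (O - E)² = 100, so the statistic is 100 × (1/40 + 1/40 + 1/30 + 1/30) ≈ 11.67; passing correction=False to chi2_contingency gives that value. The approximation is generally considered reliable when expected counts are not too small (a common rule of thumb is at least 5 per cell).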


ANOVA (Analysis of Variance)
ANOVA is used to compare means of three or more groups.

In [5]:
from scipy.stats import f_oneway

# Example: Data for three groups
group1 = [20, 21, 19, 22, 20]
group2 = [30, 29, 31, 32, 30]
group3 = [40, 39, 41, 42, 40]

# Perform one-way ANOVA
f_stat, p_value = f_oneway(group1, group2, group3)

print("F-statistic:", f_stat)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject the null hypothesis: At least one group mean is different.")
else:
    print("Fail to reject the null hypothesis: No evidence that the group means differ.")


F-statistic: 384.61538461538123
P-value: 1.3134436249241147e-11
Reject the null hypothesis: At least one group mean is different.
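The F-statistic is the ratio of the between-group mean square to the within-group mean square. A by-hand sketch for the same three groups, assuming the usual one-way ANOVA conditions (independent observations, roughly normal groups, equal variances):

```python
import numpy as np
from scipy import stats

groups = [np.array([20, 21, 19, 22, 20]),
          np.array([30, 29, 31, 32, 30]),
          np.array([40, 39, 41, 42, 40])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)          # number of groups
n_total = all_values.size

# Between-group sum of squares: distance of each group mean from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within

f_stat = ms_between / ms_within
p_value = stats.f.sf(f_stat, df_between, df_within)

print("SS between:", ss_between, "SS within:", ss_within)
print("F-statistic:", f_stat)
print("P-value:", p_value)
```

The group means are 20.4, 30.4 and 40.4, so SS between = 5 × (10² + 0² + 10²) = 1000 and SS within = 15.6, giving F = (1000/2) / (15.6/12) = 500/1.3 ≈ 384.6. A significant F only says that some mean differs; a post-hoc comparison (for example Tukey's HSD) is needed to say which.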


T-test
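The t-test compares means when the population standard deviation is unknown: either a sample mean against a hypothesized population mean (one-sample), or the means of two groups (two-sample).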

(a) One-sample T-test

In [6]:
from scipy.stats import ttest_1samp

# Example: Data and population mean
data = [22, 21, 20, 23, 24, 25]
pop_mean = 20

# Perform one-sample T-test
t_stat, p_value = ttest_1samp(data, pop_mean)

print("T-statistic:", t_stat)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject the null hypothesis: Sample mean is significantly different from population mean.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")


T-statistic: 3.273268353539886
P-value: 0.022118466722329432
Reject the null hypothesis: Sample mean is significantly different from population mean.
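The one-sample t-statistic is t = (x̄ - μ₀) / (s / √n) with n - 1 degrees of freedom, where s is the sample standard deviation. A sketch reproducing the result above by hand:

```python
import numpy as np
from scipy import stats

data = np.array([22, 21, 20, 23, 24, 25])
pop_mean = 20

n = data.size
sample_mean = data.mean()
sample_std = data.std(ddof=1)        # sample standard deviation (divides by n - 1)
standard_error = sample_std / np.sqrt(n)

t_stat = (sample_mean - pop_mean) / standard_error
# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print("T-statistic:", t_stat)
print("P-value:", p_value)
```

For this data x̄ = 22.5, s = √3.5 ≈ 1.87 and n = 6, so t = 2.5 / (1.87 / √6) ≈ 3.27.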


(b) Two-sample T-test (Independent samples)

In [7]:
from scipy.stats import ttest_ind

# Example: Data for two independent groups
group1 = [22, 23, 24, 25, 26]
group2 = [30, 29, 28, 27, 31]

# Perform independent two-sample T-test
t_stat, p_value = ttest_ind(group1, group2)

print("T-statistic:", t_stat)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject the null hypothesis: Group means are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between group means.")


T-statistic: -5.0
P-value: 0.001052825793366539
Reject the null hypothesis: Group means are significantly different.
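By default ttest_ind assumes equal population variances (Student's pooled t-test). The sketch below computes the pooled statistic by hand and, for comparison, Welch's version via equal_var=False:

```python
import numpy as np
from scipy import stats

group1 = np.array([22, 23, 24, 25, 26])
group2 = np.array([30, 29, 28, 27, 31])
n1, n2 = group1.size, group2.size

# Pooled variance: weighted average of the two sample variances
var1 = group1.var(ddof=1)
var2 = group2.var(ddof=1)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (group1.mean() - group2.mean()) / standard_error
df = n1 + n2 - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)

# Welch's t-test (no equal-variance assumption) via SciPy
welch_t, welch_p = stats.ttest_ind(group1, group2, equal_var=False)

print("Pooled T-statistic:", t_stat, "P-value:", p_value)
print("Welch T-statistic:", welch_t, "P-value:", welch_p)
```

Both groups have variance 2.5 and the means differ by 5, so the standard error is √(2.5 × (1/5 + 1/5)) = 1 and t = -5. Because the variances and sample sizes are equal here, Welch's test gives the same statistic and, in this case, the same 8 degrees of freedom.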


Z-test
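Z-tests are used when the population standard deviation is known or when the sample is large enough for the normal approximation to be reliable. Note that statsmodels' ztest estimates the standard deviation from the data (with ddof=1 by default), so with only seven observations, as in this example, a one-sample t-test would be the more appropriate choice.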


In [8]:
from statsmodels.stats.weightstats import ztest

# Example: Data for one sample
data = [100, 102, 98, 101, 97, 99, 100]
pop_mean = 100

# Perform one-sample Z-test
z_stat, p_value = ztest(data, value=pop_mean)

print("Z-statistic:", z_stat)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject the null hypothesis: Sample mean is significantly different from population mean.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")


Z-statistic: -0.659912017596093
P-value: 0.5093102917194883
Fail to reject the null hypothesis: No significant difference.
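The statistic above can be reproduced as (x̄ - μ₀) / (s / √n), with s estimated from the sample and the p-value taken from the standard normal distribution. A by-hand sketch; known_sigma is a hypothetical value, included only to show where a genuinely known population standard deviation would replace the sample estimate:

```python
import numpy as np
from scipy import stats

data = np.array([100, 102, 98, 101, 97, 99, 100])
pop_mean = 100

n = data.size
# Standard deviation estimated from the sample (ddof=1)
sample_std = data.std(ddof=1)
z_stat = (data.mean() - pop_mean) / (sample_std / np.sqrt(n))
# Two-tailed p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z_stat))

print("Z-statistic:", z_stat)
print("P-value:", p_value)

# Hypothetical: if the population sigma were known, it would be used directly
known_sigma = 2.0
z_known = (data.mean() - pop_mean) / (known_sigma / np.sqrt(n))
print("Z-statistic with known sigma (hypothetical):", z_known)
```

With the sample estimate, x̄ ≈ 99.57 and s ≈ 1.72, giving z ≈ -0.66, which matches the ztest output above.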
