Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

In [1]:
import numpy as np
from scipy.stats import norm, t

# Z-Test Example
# Scenario: Population standard deviation is known

# Given data
population_mean = 100
population_std = 15
sample_mean = 105
sample_size = 30

# Z-Score calculation
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Z-Test
p_value_z = 2 * (1 - norm.cdf(np.abs(z_score)))  # Two-tailed test

# Print results for Z-Test
print("Z-Test Example:")
print(f"Z-Score: {z_score:.4f}")
print(f"P-Value (Z-Test): {p_value_z:.4f}")
if p_value_z < 0.05:
    print("Conclusion: Reject the null hypothesis.")
else:
    print("Conclusion: Fail to reject the null hypothesis.")
print("\n")

# T-Test Example
# Scenario: Population standard deviation is unknown

# Given data
sample_mean_t = 32
sample_std_t = 8
population_mean_t = 30
sample_size_t = 15

# T-Score calculation
t_score = (sample_mean_t - population_mean_t) / (sample_std_t / np.sqrt(sample_size_t))

# T-Test
p_value_t = 2 * (1 - t.cdf(np.abs(t_score), df=sample_size_t - 1))  # Two-tailed test

# Print results for T-Test
print("T-Test Example:")
print(f"T-Score: {t_score:.4f}")
print(f"P-Value (T-Test): {p_value_t:.4f}")
if p_value_t < 0.05:
    print("Conclusion: Reject the null hypothesis.")
else:
    print("Conclusion: Fail to reject the null hypothesis.")


Z-Test Example:
Z-Score: 1.8257
P-Value (Z-Test): 0.0679
Conclusion: Fail to reject the null hypothesis.


T-Test Example:
T-Score: 0.9682
P-Value (T-Test): 0.3494
Conclusion: Fail to reject the null hypothesis.
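
One way to see the practical difference between the two tests is to compare their critical values. The sketch below is illustrative only: the sample sizes in the loop are arbitrary choices, not values from the question. It shows the t critical value shrinking toward the z critical value as the sample size grows.

```python
from scipy.stats import norm, t

# Two-tailed critical values at alpha = 0.05
alpha = 0.05
z_critical = norm.ppf(1 - alpha / 2)

# Sample sizes are illustrative assumptions
t_criticals = {}
for n in (5, 15, 30, 100, 1000):
    t_criticals[n] = t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:4d}: t critical = {t_criticals[n]:.4f}, z critical = {z_critical:.4f}")
```

For small samples the t critical value is noticeably larger, so the t-test demands stronger evidence when the standard deviation has to be estimated from the data.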


Q2: Differentiate between one-tailed and two-tailed tests.

In [2]:
import numpy as np
from scipy.stats import t

# Given data
sample_mean = 105
population_mean = 100
population_std = 15  # Used as the standard deviation estimate in the t-statistics below
sample_size = 30

# One-Tailed Test Example
# Scenario: Testing if the sample mean is significantly greater than the population mean

# One-tailed test (greater than)
t_score_one_tailed = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
p_value_one_tailed = 1 - t.cdf(t_score_one_tailed, df=sample_size - 1)  # Right-tailed test

# Print results for One-Tailed Test
print("One-Tailed Test Example (Greater Than):")
print(f"T-Score: {t_score_one_tailed:.4f}")
print(f"P-Value (One-Tailed Test): {p_value_one_tailed:.4f}")
if p_value_one_tailed < 0.05:
    print("Conclusion: Reject the null hypothesis. The sample mean is significantly greater than the population mean.")
else:
    print("Conclusion: Fail to reject the null hypothesis. The sample mean is not significantly greater than the population mean.")
print("\n")

# Two-Tailed Test Example
# Scenario: Testing if the sample mean is significantly different from the population mean

# Two-tailed test (not equal to)
t_score_two_tailed = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
p_value_two_tailed = 2 * (1 - t.cdf(np.abs(t_score_two_tailed), df=sample_size - 1))  # Two-tailed test

# Print results for Two-Tailed Test
print("Two-Tailed Test Example (Not Equal To):")
print(f"T-Score: {t_score_two_tailed:.4f}")
print(f"P-Value (Two-Tailed Test): {p_value_two_tailed:.4f}")
if p_value_two_tailed < 0.05:
    print("Conclusion: Reject the null hypothesis. The sample mean is significantly different from the population mean.")
else:
    print("Conclusion: Fail to reject the null hypothesis. The sample mean is not significantly different from the population mean.")


One-Tailed Test Example (Greater Than):
T-Score: 1.8257
P-Value (One-Tailed Test): 0.0391
Conclusion: Reject the null hypothesis. The sample mean is significantly greater than the population mean.


Two-Tailed Test Example (Not Equal To):
T-Score: 1.8257
P-Value (Two-Tailed Test): 0.0782
Conclusion: Fail to reject the null hypothesis. The sample mean is not significantly different from the population mean.


Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

In [4]:
import numpy as np
from scipy.stats import ttest_1samp

# True Population Mean and Standard Deviation
true_population_mean = 100
true_population_std = 15

# Scenario 1: Type I Error (False Positive)
# Null Hypothesis: Population mean is less than or equal to 100
# Alternative Hypothesis: Population mean is greater than 100
# Significance Level: 0.05

# Simulate data assuming null hypothesis is true
sample_size_type_i = 30
sample_data_type_i = np.random.normal(loc=true_population_mean, scale=true_population_std, size=sample_size_type_i)

# Perform a one-tailed test (right-tailed)
t_score_type_i, p_value_type_i = ttest_1samp(sample_data_type_i, popmean=100, alternative='greater')

# Print results for Type I Error
print("Scenario 1: Type I Error (False Positive)")
print(f"T-Score: {t_score_type_i:.4f}")
print(f"P-Value: {p_value_type_i:.4f}")
if p_value_type_i < 0.05:
    print("Conclusion: Reject the null hypothesis (Type I Error)")
else:
    print("Conclusion: Fail to reject the null hypothesis")

print("\n")

# Scenario 2: Type II Error (False Negative)
# Null Hypothesis: Population mean is equal to 100
# Alternative Hypothesis: Population mean is not equal to 100

# Simulate data assuming alternative hypothesis is true
sample_size_type_ii = 30
sample_data_type_ii = np.random.normal(loc=true_population_mean + 5, scale=true_population_std, size=sample_size_type_ii)

# Perform a two-tailed test
t_score_type_ii, p_value_type_ii = ttest_1samp(sample_data_type_ii, popmean=100)

# Print results for Type II Error
print("Scenario 2: Type II Error (False Negative)")
print(f"T-Score: {t_score_type_ii:.4f}")
print(f"P-Value: {p_value_type_ii:.4f}")
if p_value_type_ii >= 0.05:
    print("Conclusion: Fail to reject the null hypothesis (Type II Error)")
else:
    print("Conclusion: Reject the null hypothesis")


Scenario 1: Type I Error (False Positive)
T-Score: 0.4881
P-Value: 0.3146
Conclusion: Fail to reject the null hypothesis


Scenario 2: Type II Error (False Negative)
T-Score: 3.3895
P-Value: 0.0020
Conclusion: Reject the null hypothesis
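
A single random sample may or may not produce an error, as the run above shows. A rough sketch of how often each error occurs is to repeat the experiment many times. The seed, the number of trials, and the helper name `rejection_rate` are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(42)  # Fixed seed (arbitrary choice) for reproducibility
alpha = 0.05
n_trials = 2000      # Number of simulated experiments (assumed)
sample_size = 30

def rejection_rate(true_mean):
    """Fraction of simulated samples in which H0: mean = 100 is rejected."""
    samples = rng.normal(loc=true_mean, scale=15, size=(n_trials, sample_size))
    _, p_values = ttest_1samp(samples, popmean=100, axis=1)
    return np.mean(p_values < alpha)

# H0 is true (mean = 100): every rejection is a Type I error
type_i_rate = rejection_rate(true_mean=100)

# H0 is false (mean = 105): every failure to reject is a Type II error
type_ii_rate = 1 - rejection_rate(true_mean=105)

print(f"Estimated Type I error rate: {type_i_rate:.3f}")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```

The Type I rate should land close to the significance level, while the Type II rate depends on the true effect size and the sample size.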


Q4: Explain Bayes's theorem with an example.

In [5]:
# Given probabilities
p_disease = 1 / 1000  # Prior probability of having the disease
p_no_disease = 1 - p_disease  # Prior probability of not having the disease
p_positive_given_disease = 0.99  # Sensitivity of the test
p_positive_given_no_disease = 0.05  # False positive rate

# Bayes's Theorem calculation
p_disease_given_positive = (p_positive_given_disease * p_disease) / ((p_positive_given_disease * p_disease) + (p_positive_given_no_disease * p_no_disease))

# Print result
print(f"Probability of having the disease given a positive test result: {p_disease_given_positive:.4f}")


Probability of having the disease given a positive test result: 0.0194
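
The same result can be checked by counting people instead of multiplying probabilities. This is a sketch that assumes a hypothetical population of 100,000; the rates are the ones given above.

```python
# Hypothetical population size, chosen only to make the counts easy to read
population = 100_000

diseased = population * (1 / 1000)              # People who have the disease
healthy = population - diseased                 # People who do not

true_positives = diseased * 0.99                # Diseased people who test positive
false_positives = healthy * 0.05                # Healthy people who test positive

# Among everyone who tests positive, the share who actually have the disease
posterior = true_positives / (true_positives + false_positives)

print(f"True positives: {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"P(disease | positive): {posterior:.4f}")
```

Because healthy people vastly outnumber diseased people, false positives dominate the positive results even though the test is accurate.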


Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

In [6]:
import numpy as np
from scipy.stats import t

# Sample data (heights in inches)
sample_data = np.array([68, 70, 72, 66, 71, 69, 73, 68, 70, 75])

# Calculate sample mean and standard deviation
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)  # ddof=1 for sample standard deviation

# Set the confidence level and degrees of freedom
confidence_level = 0.95
degrees_of_freedom = len(sample_data) - 1

# Calculate the standard error of the mean
standard_error = sample_std / np.sqrt(len(sample_data))

# Calculate the critical value from the t-distribution
t_critical = t.ppf((1 + confidence_level) / 2, degrees_of_freedom)

# Calculate the margin of error
margin_of_error = t_critical * standard_error

# Calculate the confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

# Print results
print(f"Sample Mean: {sample_mean:.2f}")
print(f"Margin of Error: {margin_of_error:.2f}")
print(f"Confidence Interval: ({confidence_interval_lower:.2f}, {confidence_interval_upper:.2f})")


Sample Mean: 70.20
Margin of Error: 1.90
Confidence Interval: (68.30, 72.10)
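
As a cross-check on the manual steps, the same interval can be obtained from SciPy's `t.interval` together with `stats.sem` (standard error of the mean). This is a sketch using the same sample data as above.

```python
import numpy as np
from scipy import stats

sample_data = np.array([68, 70, 72, 66, 71, 69, 73, 68, 70, 75])

# 95% interval from the t-distribution with n - 1 degrees of freedom
lower, upper = stats.t.interval(
    0.95,
    len(sample_data) - 1,
    loc=np.mean(sample_data),
    scale=stats.sem(sample_data),  # sem uses ddof=1 by default
)

print(f"Confidence Interval: ({lower:.2f}, {upper:.2f})")
```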


Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.

In [7]:
# Given probabilities
p_disease = 1 / 1000  # Prior probability of having the disease
p_no_disease = 1 - p_disease  # Prior probability of not having the disease
p_positive_given_disease = 0.99  # Sensitivity of the test
p_positive_given_no_disease = 0.05  # False positive rate

# Bayes's Theorem calculation
def calculate_bayesian_probability(prior_prob, likelihood, false_positive_rate):
    # Calculate the denominator (marginal probability of a positive test):
    # P(+) = P(+|disease) * P(disease) + P(+|no disease) * P(no disease)
    marginal_likelihood = (prior_prob * likelihood) + ((1 - prior_prob) * false_positive_rate)

    # Calculate the posterior probability using Bayes's Theorem
    posterior_prob = (prior_prob * likelihood) / marginal_likelihood

    return posterior_prob

# Calculate the probability of having the disease given a positive test result
posterior_probability = calculate_bayesian_probability(p_disease, p_positive_given_disease, p_positive_given_no_disease)

# Print result
print(f"Probability of having the disease given a positive test result: {posterior_probability:.4f}")


Probability of having the disease given a positive test result: 0.0194


Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

In [8]:
import scipy.stats as stats

# Given data
sample_mean = 50
population_std = 5
confidence_level = 0.95

# Number of observations in the sample (not given in the question; 100 is an assumed value)
sample_size = 100

# Calculate the margin of error using the Z-distribution
margin_of_error = stats.norm.ppf((1 + confidence_level) / 2) * (population_std / (sample_size ** 0.5))

# Calculate the confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

# Print results
print(f"Sample Mean: {sample_mean}")
print(f"Population Standard Deviation: {population_std}")
print(f"Confidence Level: {confidence_level * 100}%")
print(f"Margin of Error: {margin_of_error:.4f}")
print(f"95% Confidence Interval: ({confidence_interval_lower:.4f}, {confidence_interval_upper:.4f})")


Sample Mean: 50
Population Standard Deviation: 5
Confidence Level: 95.0%
Margin of Error: 0.9800
95% Confidence Interval: (49.0200, 50.9800)


Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

In [9]:
import numpy as np
import scipy.stats as stats

# Parameters
population_mean = 65
population_std = 3

# With a known population standard deviation, the margin of error depends
# only on the standard deviation and the sample size, so no data needs to be drawn.

# Small sample size
sample_size_small = 30
margin_of_error_small = stats.norm.ppf(0.975) * (population_std / np.sqrt(sample_size_small))

# Larger sample size
sample_size_large = 200
margin_of_error_large = stats.norm.ppf(0.975) * (population_std / np.sqrt(sample_size_large))

# Print results
print(f"Small Sample - Margin of Error: {margin_of_error_small:.4f}")
print(f"Larger Sample - Margin of Error: {margin_of_error_large:.4f}")


Small Sample - Margin of Error: 1.0735
Larger Sample - Margin of Error: 0.4158
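
The margin of error shrinks in proportion to 1 / sqrt(n), so quadrupling the sample size halves it. A short sketch of that relationship follows; the helper name `margin_of_error` and the sample sizes in the loop are assumptions for illustration.

```python
import numpy as np
from scipy import stats

population_std = 3
z_critical = stats.norm.ppf(0.975)  # 95% confidence level

def margin_of_error(n):
    """Margin of error for a mean with known population standard deviation."""
    return z_critical * population_std / np.sqrt(n)

# Each sample size is four times the previous one
for n in (30, 120, 480):
    print(f"n = {n:3d}: margin of error = {margin_of_error(n):.4f}")
```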


Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

In [10]:
# Given data
data_point = 75
population_mean = 70
population_std = 5

# Calculate z-score
z_score = (data_point - population_mean) / population_std

# Print result
print(f"Z-Score for data point {data_point}: {z_score:.4f}")


Z-Score for data point 75: 1.0000
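
A z-score of 1 means the data point lies one standard deviation above the population mean. If the population is additionally assumed to be normally distributed (the question does not state this), the z-score can be converted to a percentile, as in this sketch:

```python
from scipy.stats import norm

z_score = (75 - 70) / 5

# Share of a normal population that falls below the data point
percentile = norm.cdf(z_score)

print(f"Z-Score: {z_score:.4f}")
print(f"Share of population below the data point: {percentile:.4f}")
```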


Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

In [11]:
import scipy.stats as stats

# Given data
sample_mean = 6
sample_std = 2.5
sample_size = 50
confidence_level = 0.95

# Null hypothesis: The drug is not significantly effective (mean change = 0)
null_hypothesis_mean = 0

# Compute the one-sample t-statistic from the summary statistics.
# (ttest_1samp needs raw observations; a list of identical values has zero variance and yields t = inf.)
standard_error = sample_std / (sample_size ** 0.5)
t_stat = (sample_mean - null_hypothesis_mean) / standard_error

# Calculate degrees of freedom
degrees_of_freedom = sample_size - 1

# Two-tailed p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t_stat), df=degrees_of_freedom)

# Critical t-value for a two-tailed test at 95% confidence level
critical_t_value = stats.t.ppf((1 + confidence_level) / 2, df=degrees_of_freedom)

# Print results
print(f"t-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Critical t-value: ±{critical_t_value:.4f}")

# Check if the null hypothesis is rejected
if abs(t_stat) > critical_t_value or p_value < (1 - confidence_level):
    print("Null hypothesis rejected. The drug is significantly effective.")
else:
    print("Null hypothesis not rejected. The drug may not be significantly effective.")


t-statistic: 16.9706
P-value: 0.0000
Critical t-value: ±2.0096
Null hypothesis rejected. The drug is significantly effective.


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

In [12]:
import scipy.stats as stats
import math

# Given data
sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95

# Calculate critical Z-value
critical_z_value = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate standard error
standard_error = math.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate margin of error
margin_of_error = critical_z_value * standard_error

# Calculate confidence interval
confidence_interval_lower = sample_proportion - margin_of_error
confidence_interval_upper = sample_proportion + margin_of_error

# Print results
print(f"Sample Proportion: {sample_proportion}")
print(f"Critical Z-value: ±{critical_z_value:.4f}")
print(f"Margin of Error: {margin_of_error:.4f}")
print(f"95% Confidence Interval: ({confidence_interval_lower:.4f}, {confidence_interval_upper:.4f})")


Sample Proportion: 0.65
Critical Z-value: ±1.9600
Margin of Error: 0.0418
95% Confidence Interval: (0.6082, 0.6918)


Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

In [13]:
import scipy.stats as stats

# Given data for Sample A
mean_A = 85
std_A = 6
sample_size_A = 30  # Assumed; the question does not give a sample size

# Given data for Sample B
mean_B = 82
std_B = 5
sample_size_B = 30  # Assumed; the question does not give a sample size

# Significance level
alpha = 0.01

# Perform two-sample t-test
t_stat, p_value = stats.ttest_ind_from_stats(mean_A, std_A, sample_size_A, mean_B, std_B, sample_size_B)

# Degrees of freedom
degrees_of_freedom = sample_size_A + sample_size_B - 2

# Critical t-value for a two-tailed test at the given significance level
critical_t_value = stats.t.ppf(1 - alpha / 2, df=degrees_of_freedom)

# Print results
print(f"t-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Critical t-value: ±{critical_t_value:.4f}")

# Check if the null hypothesis is rejected
if abs(t_stat) > critical_t_value or p_value < alpha:
    print("Null hypothesis rejected. There is a significant difference in student performance.")
else:
    print("Null hypothesis not rejected. No significant difference in student performance.")


t-statistic: 2.1039
P-value: 0.0397
Critical t-value: ±2.6633
Null hypothesis not rejected. No significant difference in student performance.
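
The test above pools the two variances. Since the sample standard deviations differ (6 versus 5), one possible robustness check is Welch's t-test, which `ttest_ind_from_stats` performs when `equal_var=False`. This is a sketch only, and it reuses the assumed sample size of 30 per group.

```python
import scipy.stats as stats

alpha = 0.01

# Welch's t-test from summary statistics (does not assume equal variances).
# The sample size of 30 per group is an assumption, as in the cell above.
t_welch, p_welch = stats.ttest_ind_from_stats(
    85, 6, 30,
    82, 5, 30,
    equal_var=False,
)

print(f"Welch t-statistic: {t_welch:.4f}")
print(f"Welch P-value: {p_welch:.4f}")
print("Reject H0" if p_welch < alpha else "Fail to reject H0")
```

With equal group sizes the t-statistic is the same as in the pooled test; only the degrees of freedom, and therefore the p-value, change slightly.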


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

In [14]:
import scipy.stats as stats
import math

# Given data
sample_mean = 65
population_std = 8
sample_size = 50
confidence_level = 0.90

# Calculate critical Z-value
critical_z_value = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate margin of error
margin_of_error = critical_z_value * (population_std / math.sqrt(sample_size))

# Calculate confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

# Print results
print(f"Sample Mean: {sample_mean}")
print(f"Population Standard Deviation: {population_std}")
print(f"Critical Z-value: ±{critical_z_value:.4f}")
print(f"Margin of Error: {margin_of_error:.4f}")
print(f"90% Confidence Interval: ({confidence_interval_lower:.4f}, {confidence_interval_upper:.4f})")


Sample Mean: 65
Population Standard Deviation: 8
Critical Z-value: ±1.6449
Margin of Error: 1.8609
90% Confidence Interval: (63.1391, 66.8609)


Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [15]:
import scipy.stats as stats

# Given data
sample_mean = 0.25
sample_std = 0.05
sample_size = 30
confidence_level = 0.90

# Hypothesized population mean (null hypothesis).
# The question gives no baseline reaction time, so 0.20 seconds is an assumed value.
hypothesized_mean = 0.20

# Compute the one-sample t-statistic from the summary statistics.
# (ttest_1samp needs raw observations; a list of identical values has zero variance and yields t = inf.)
standard_error = sample_std / (sample_size ** 0.5)
t_stat = (sample_mean - hypothesized_mean) / standard_error

# Degrees of freedom
degrees_of_freedom = sample_size - 1

# Two-tailed p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t_stat), df=degrees_of_freedom)

# Critical t-value for a two-tailed test at the given confidence level
critical_t_value = stats.t.ppf((1 + confidence_level) / 2, df=degrees_of_freedom)

# Print results
print(f"Sample Mean: {sample_mean}")
print(f"Sample Standard Deviation: {sample_std}")
print(f"Hypothesized Population Mean: {hypothesized_mean}")
print(f"t-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Critical t-value: ±{critical_t_value:.4f}")

# Check if the null hypothesis is rejected
if abs(t_stat) > critical_t_value or p_value < (1 - confidence_level):
    print("Null hypothesis rejected. Caffeine has a significant effect on reaction time.")
else:
    print("Null hypothesis not rejected. No significant effect of caffeine on reaction time.")


Sample Mean: 0.25
Sample Standard Deviation: 0.05
Hypothesized Population Mean: 0.2
t-statistic: 5.4772
P-value: 0.0000
Critical t-value: ±1.6991
Null hypothesis rejected. Caffeine has a significant effect on reaction time.
