Q1. What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

In [1]:
# A t-test suits small samples where the population standard deviation is unknown.
# The independent t-test below checks whether two samples could come from populations with the same mean.
import numpy as np
from scipy import stats

# Example data for t-test
group1_scores = np.array([82, 85, 79, 90, 88, 87, 84, 85, 90, 83])
group2_scores = np.array([78, 80, 85, 81, 79, 88, 82, 86, 80, 84])

# Performing independent t-test
t_statistic, p_value = stats.ttest_ind(group1_scores, group2_scores)
print("T-test Results:")
print("T-statistic:", t_statistic)
print("P-value:", p_value)
if p_value < 0.05:
    print("There is a significant difference between the two groups.")
else:
    print("There is no significant difference between the two groups.")

# Example data for z-test
sample_mean = 115
pop_mean = 100
pop_std = 15
sample_size = 50

# Performing one-sample z-test
z_statistic = (sample_mean - pop_mean) / (pop_std / np.sqrt(sample_size))
# Two-tailed p-value: probability of a |z| at least this large under the null hypothesis
p_value = 2 * stats.norm.sf(abs(z_statistic))
print("\nZ-test Results:")
print("Z-statistic:", z_statistic)
print(f"P-value: {p_value:.3g}")
if p_value < 0.05:
    print("The sample mean is significantly different from the population mean.")
else:
    print("The sample mean is not significantly different from the population mean.")


T-test Results:
T-statistic: 1.9630264600785698
P-value: 0.06528585052386887
There is no significant difference between the two groups.

Z-test Results:
Z-statistic: 7.0710678118654755
P-value: 1.54e-12
The sample mean is significantly different from the population mean.
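
The practical difference between the two tests shows up in their critical values: the t distribution has heavier tails, so it demands a larger test statistic at small sample sizes and approaches the normal value as the degrees of freedom grow. A minimal sketch of that comparison, assuming a two-tailed test at alpha = 0.05 (the degrees of freedom listed are illustrative choices, not part of the exercise):

```python
from scipy import stats

alpha = 0.05

# Critical value used by a two-tailed z-test
z_critical = stats.norm.ppf(1 - alpha / 2)

# Illustrative degrees of freedom (assumed values, chosen only for the comparison)
dfs = [5, 10, 30, 100, 1000]
t_criticals = [stats.t.ppf(1 - alpha / 2, df) for df in dfs]

print(f"z critical value: {z_critical:.4f}")
for df, t_crit in zip(dfs, t_criticals):
    print(f"df = {df:>4}: t critical value = {t_crit:.4f}")
```

With few degrees of freedom the t threshold is noticeably higher than the z threshold, which is why the t-test is the safer choice for small samples with an estimated standard deviation.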


Q2: Differentiate between one-tailed and two-tailed tests.

In [3]:
import numpy as np
from scipy import stats

# Example data
sample_data = np.array([82, 85, 79, 90, 88, 87, 84, 85, 90, 83])

# Population mean (null hypothesis)
pop_mean = 85

# One-tailed test
# Null hypothesis: the population mean is less than or equal to 85 (pop_mean)
# Alternative hypothesis: the population mean is greater than 85
# Halving the two-tailed p-value is valid here because the sample mean (85.3) lies on the side of the alternative
one_tailed_p_value = stats.ttest_1samp(sample_data, pop_mean).pvalue / 2
print("One-tailed p-value:", one_tailed_p_value)

# Two-tailed test
# Null hypothesis: the population mean is equal to 85 (pop_mean)
# Alternative hypothesis: the population mean is not equal to 85
two_tailed_p_value = stats.ttest_1samp(sample_data, pop_mean).pvalue
print("Two-tailed p-value:", two_tailed_p_value)


One-tailed p-value: 0.39706951905985977
Two-tailed p-value: 0.7941390381197195
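
The relationship between the tails can be made explicit by computing each p-value directly from the t distribution rather than halving. The sketch below reuses the sample and hypothesized mean from the cell above; the variable names p_greater, p_less and p_two_sided are introduced here for illustration.

```python
import numpy as np
from scipy import stats

sample_data = np.array([82, 85, 79, 90, 88, 87, 84, 85, 90, 83])
pop_mean = 85

result = stats.ttest_1samp(sample_data, pop_mean)
t_stat = result.statistic
df = len(sample_data) - 1

# Upper-tail test, H1: mean > 85
p_greater = stats.t.sf(t_stat, df)
# Lower-tail test, H1: mean < 85
p_less = stats.t.cdf(t_stat, df)
# Two-tailed test, H1: mean != 85 (area in both tails beyond |t|)
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

print("Upper-tail p-value:", p_greater)
print("Lower-tail p-value:", p_less)
print("Two-tailed p-value:", p_two_sided)
```

The two one-tailed p-values sum to 1, and the two-tailed value is twice the smaller of them, so halving only gives the right one-tailed answer when the statistic falls on the side named by the alternative hypothesis.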


Q4: Explain Bayes's theorem with an example.

In [4]:
import numpy as np
from scipy import stats

# Example scenario for Type I error (False Positive)
# True population mean is 100
# Null hypothesis: Population mean is equal to 100
# Significance level (alpha) = 0.05

# Generate sample data
np.random.seed(42)
sample_data_type1 = np.random.normal(loc=100, scale=10, size=100)

# Perform t-test
t_statistic_type1, p_value_type1 = stats.ttest_1samp(sample_data_type1, 100)

# Check if we reject the null hypothesis
alpha = 0.05
if p_value_type1 < alpha:
    print("Type I Error: Null hypothesis rejected (False Positive)")
else:
    print("No Type I Error: Null hypothesis not rejected")

# Example scenario for Type II error (False Negative)
# True population mean is 105, so the null hypothesis is false
# Null hypothesis: Population mean is equal to 100
# Alternative hypothesis: Population mean is not equal to 100
# Significance level (alpha) = 0.05

# Generate sample data
np.random.seed(42)
sample_data_type2 = np.random.normal(loc=105, scale=10, size=100)

# Perform t-test (ttest_1samp is two-tailed by default)
t_statistic_type2, p_value_type2 = stats.ttest_1samp(sample_data_type2, 100)

# Check if we fail to reject the null hypothesis
if p_value_type2 >= alpha:
    print("Type II Error: Null hypothesis not rejected (False Negative)")
else:
    print("No Type II Error: Null hypothesis rejected")


No Type I Error: Null hypothesis not rejected
No Type II Error: Null hypothesis rejected
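
A single sample can only show whether an error happened once. Error rates become visible when the experiment is repeated many times. The following is a rough simulation sketch: the seed, the number of trials, and the alternative mean of 102 are all assumptions chosen for illustration, not values from the exercise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # assumed seed, for reproducibility
alpha = 0.05
n_trials = 2000                  # assumed number of repeated experiments
sample_size = 100

# Case 1: the null hypothesis (mean = 100) is true.
# Every rejection here is a Type I error.
null_samples = rng.normal(loc=100, scale=10, size=(n_trials, sample_size))
p_null = stats.ttest_1samp(null_samples, 100, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Case 2: the null hypothesis is false (true mean = 102, an assumed value).
# Every failure to reject here is a Type II error.
alt_samples = rng.normal(loc=102, scale=10, size=(n_trials, sample_size))
p_alt = stats.ttest_1samp(alt_samples, 100, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)

print(f"Estimated Type I error rate: {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
print(f"Estimated power: {1 - type2_rate:.3f}")
```

The estimated Type I rate should sit close to alpha, while the Type II rate depends on how far the true mean is from the hypothesized one and on the sample size.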


Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

In [5]:
import numpy as np
from scipy import stats

# Example data
np.random.seed(42)
sample_data = np.random.normal(loc=100, scale=10, size=100)  # Sample data with mean=100, std=10

# Calculate sample statistics
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)  # ddof=1 for sample standard deviation
sample_size = len(sample_data)

# Set confidence level (e.g., 95%)
confidence_level = 0.95

# Calculate standard error
standard_error = sample_std / np.sqrt(sample_size)

# Calculate margin of error (using z-score for a large sample size)
z_score = stats.norm.ppf((1 + confidence_level) / 2)  # Two-tailed z-score
margin_of_error = z_score * standard_error

# Calculate confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("Sample Mean:", sample_mean)
print("Confidence Interval:", (lower_bound, upper_bound))


Sample Mean: 98.96153482605907
Confidence Interval: (97.18155741526742, 100.74151223685071)
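
Because the standard deviation above is estimated from the sample, a t-based interval is a common alternative to the z-based one. A minimal sketch follows, where mean_confidence_interval is a hypothetical helper name introduced here:

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(data, confidence=0.95):
    """Return a t-based confidence interval for the mean (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    mean = data.mean()
    standard_error = data.std(ddof=1) / np.sqrt(n)
    # Two-tailed critical value from the t distribution with n - 1 degrees of freedom
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    margin = t_crit * standard_error
    return mean - margin, mean + margin

np.random.seed(42)
sample_data = np.random.normal(loc=100, scale=10, size=100)
lower, upper = mean_confidence_interval(sample_data)
print("t-based 95% confidence interval:", (lower, upper))
```

For a sample of 100 the t-based interval is only slightly wider than the z-based one; the gap grows as the sample gets smaller.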


Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.

In [6]:
# Define probabilities
sensitivity = 0.95  # P(Positive|Disease)
specificity = 0.90  # P(Negative|No Disease)
prevalence = 0.05   # P(Disease)

# Calculate the complementary probability
false_positive_rate = 1 - specificity  # P(Positive|No Disease)

# Apply Bayes' Theorem: P(Disease|Positive) = P(Positive|Disease) * P(Disease) / P(Positive)
p_positive_given_disease = sensitivity  # P(Positive|Disease)

# Calculate denominator P(Positive)
p_positive = (prevalence * sensitivity) + ((1 - prevalence) * false_positive_rate)

# Calculate posterior probability P(Disease|Positive)
p_disease_given_positive = (p_positive_given_disease * prevalence) / p_positive

print("Probability of having the disease given a positive test result:", p_disease_given_positive)


Probability of having the disease given a positive test result: 0.3333333333333334
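
Bayes' Theorem can be applied repeatedly, with each posterior serving as the prior for the next piece of evidence. The sketch below wraps the calculation in a hypothetical helper, posterior_given_positive, and assumes a second positive result from a test that is conditionally independent of the first (an assumption, not part of the original problem).

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(Disease | Positive) by Bayes' Theorem (hypothetical helper)."""
    false_positive_rate = 1 - specificity
    # Total probability of a positive result
    p_positive = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / p_positive

# First positive test: the prior is the prevalence
after_first = posterior_given_positive(0.05, 0.95, 0.90)

# Second positive test: the previous posterior becomes the new prior
after_second = posterior_given_positive(after_first, 0.95, 0.90)

print(f"After one positive test: {after_first:.4f}")
print(f"After two positive tests: {after_second:.4f}")
```

A low prevalence keeps the first posterior modest even with a sensitive test, and the second positive result raises it substantially.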


Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

In [7]:
import numpy as np

# Given data
sample_mean = 50
population_std = 5
confidence_level = 0.95
sample_size = 100  # Let's assume a sample size of 100

# Calculate Z-score for 95% confidence level
z_score = 1.96  # Approximate Z-score for 95% confidence level

# Calculate standard error
standard_error = population_std / np.sqrt(sample_size)

# Calculate margin of error
margin_of_error = z_score * standard_error

# Calculate confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

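print("95% Confidence Interval:", (lower_bound, upper_bound))

# Interpretation: if the sampling were repeated many times, about 95% of the intervals
# built this way would contain the true population mean. The width depends on the assumed n = 100.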


95% Confidence Interval: (49.02, 50.98)


Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

In [8]:
import numpy as np
from scipy import stats

# Population parameters
population_mean = 100
population_std = 15

# Sample sizes to compare
sample_sizes = [50, 100, 200, 500]

# Calculate margin of error for each sample size
for sample_size in sample_sizes:
    # No sample draw is needed: the margin of error depends only on population_std and sample_size
    
    # Calculate standard error
    standard_error = population_std / np.sqrt(sample_size)
    
    # Calculate margin of error (using Z-score for a 95% confidence interval)
    z_score = stats.norm.ppf(0.975)  # Two-tailed Z-score for 95% confidence level
    margin_of_error = z_score * standard_error
    
    print(f"Sample Size: {sample_size}, Margin of Error: {margin_of_error}")


Sample Size: 50, Margin of Error: 4.157711473049033
Sample Size: 100, Margin of Error: 2.939945976810081
Sample Size: 200, Margin of Error: 2.0788557365245164
Sample Size: 500, Margin of Error: 1.3147838108648724
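
The same formula can be inverted to find the sample size needed for a target margin of error, n = (z * sigma / E)^2 rounded up. A short sketch, where required_sample_size is a hypothetical helper and the target margin of 2.0 is an assumed example value:

```python
import math
from scipy import stats

def required_sample_size(population_std, target_margin, confidence=0.95):
    """Smallest n whose margin of error is at most target_margin (hypothetical helper)."""
    z_score = stats.norm.ppf((1 + confidence) / 2)
    return math.ceil((z_score * population_std / target_margin) ** 2)

# Assumed target: a margin of error of at most 2.0 with population_std = 15
n_needed = required_sample_size(population_std=15, target_margin=2.0)
print("Required sample size:", n_needed)
```

Because the margin of error shrinks with the square root of n, halving the margin requires roughly four times as many observations.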


Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

In [9]:
# Given data
x = 75  # Value of the data point
population_mean = 70
population_std = 5

# Calculate Z-score
z_score = (x - population_mean) / population_std

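print("Z-score:", z_score)

# Interpretation: the value 75 lies exactly one standard deviation above the population mean of 70.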


Z-score: 1.0
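
A z-score can also be read as a percentile. The sketch below assumes the population is normally distributed, which the question does not state.

```python
from scipy import stats

z_score = (75 - 70) / 5

# Proportion of a normal population that falls below this value (normality is an assumption)
percentile = stats.norm.cdf(z_score)
print(f"Proportion of the population below 75: {percentile:.4f}")
```

Under that assumption, a value one standard deviation above the mean is higher than roughly 84% of the population.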


Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

In [10]:
from scipy import stats

# Given data
sample_mean = 6
sample_std = 2.5
sample_size = 50
null_hypothesis_mean = 0
alpha = 0.05

# Calculate the t-statistic
t_statistic = (sample_mean - null_hypothesis_mean) / (sample_std / (sample_size ** 0.5))

# Determine the critical t-value(s) for the two-tailed test
critical_t_value = stats.t.ppf(1 - alpha / 2, df=sample_size - 1)

# Print results
print("T-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)

# Determine if null hypothesis should be rejected
if abs(t_statistic) > critical_t_value:
    print("Null hypothesis rejected. The drug is significantly effective.")
else:
    print("Null hypothesis not rejected. There is not enough evidence to conclude that the drug is significantly effective.")


T-statistic: 16.970562748477143
Critical t-value: 2.009575234489209
Null hypothesis rejected. The drug is significantly effective.
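
Since the question asks whether the drug is effective, a one-sided p-value and a confidence interval for the mean weight loss give a fuller picture than the critical-value comparison alone. A sketch using the same summary statistics (the null value of 0 pounds is the same assumption as in the cell above):

```python
from scipy import stats

sample_mean, sample_std, sample_size = 6, 2.5, 50
alpha = 0.05
df = sample_size - 1

standard_error = sample_std / sample_size ** 0.5
t_statistic = sample_mean / standard_error  # null hypothesis: mean loss = 0

# One-sided p-value for H1: mean loss > 0
p_one_sided = stats.t.sf(t_statistic, df)

# Two-sided 95% confidence interval for the mean weight loss
t_crit = stats.t.ppf(1 - alpha / 2, df)
ci = (sample_mean - t_crit * standard_error, sample_mean + t_crit * standard_error)

print("One-sided p-value:", p_one_sided)
print("95% confidence interval for mean loss:", ci)
```

The interval lies well above zero, which agrees with the rejection of the null hypothesis.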


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

In [11]:
import numpy as np

# Given data
sample_proportion = 0.65  # 65% reported being satisfied with their job
sample_size = 500
confidence_level = 0.95

# Calculate the Z-score for the desired confidence level
z_score = 1.96  # Approximate Z-score for a 95% confidence level

# Calculate standard error
standard_error = np.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate margin of error
margin_of_error = z_score * standard_error

# Calculate confidence interval
lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

print("95% Confidence Interval for the true proportion of people satisfied with their job:")
print("Lower bound:", lower_bound)
print("Upper bound:", upper_bound)


95% Confidence Interval for the true proportion of people satisfied with their job:
Lower bound: 0.608191771144905
Upper bound: 0.6918082288550951
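
The interval above is the normal-approximation (Wald) interval. The Wilson score interval is a commonly used alternative that behaves better when the proportion is near 0 or 1 or the sample is small. A sketch of it for the same data, written out by hand rather than through a library call:

```python
import numpy as np

sample_proportion = 0.65
sample_size = 500
z_score = 1.96  # approximate z-score for a 95% confidence level

# Wilson score interval
denominator = 1 + z_score**2 / sample_size
center = (sample_proportion + z_score**2 / (2 * sample_size)) / denominator
half_width = (
    z_score
    * np.sqrt(
        sample_proportion * (1 - sample_proportion) / sample_size
        + z_score**2 / (4 * sample_size**2)
    )
    / denominator
)

wilson_lower = center - half_width
wilson_upper = center + half_width
print("Wilson lower bound:", wilson_lower)
print("Wilson upper bound:", wilson_upper)
```

With 500 respondents and a proportion near the middle of the range, the two intervals are almost identical.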


Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

In [12]:
from scipy import stats

# Given data for Sample A
mean_A = 85
std_A = 6
n_A = 30  # Assumed sample size for Sample A (not given in the question)

# Given data for Sample B
mean_B = 82
std_B = 5
n_B = 30  # Assumed sample size for Sample B (not given in the question)

# Significance level
alpha = 0.01

# Calculate the t-statistic
std_error = ((std_A**2 / n_A) + (std_B**2 / n_B))**0.5
t_statistic = (mean_A - mean_B) / std_error

# Determine degrees of freedom (pooled-variance form; with equal sample sizes the t-statistic above equals the pooled one)
df = n_A + n_B - 2

# Find critical t-value for two-tailed test
critical_t_value = stats.t.ppf(1 - alpha / 2, df)

# Print results
print("T-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)

# Determine if null hypothesis should be rejected
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis. There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in student performance between the two teaching methods.")


T-statistic: 2.1038606199548298
Critical t-value: 2.6632869538098674
Fail to reject the null hypothesis. There is no significant difference in student performance between the two teaching methods.
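
SciPy can run the same comparison directly from summary statistics with stats.ttest_ind_from_stats, which also returns a p-value. The sketch below keeps the assumed sample sizes of 30 and runs both the pooled-variance test and Welch's test, which does not assume equal variances.

```python
from scipy import stats

alpha = 0.01

# Pooled-variance t-test from summary statistics (n = 30 per group is an assumption)
pooled = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=30,
    equal_var=True,
)

# Welch's t-test: same inputs, without the equal-variance assumption
welch = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=30,
    equal_var=False,
)

print("Pooled t-statistic:", pooled.statistic, "p-value:", pooled.pvalue)
print("Welch t-statistic:", welch.statistic, "p-value:", welch.pvalue)
print("Reject at alpha = 0.01:", pooled.pvalue < alpha)
```

Both versions give the same t-statistic here because the group sizes are equal; only the degrees of freedom, and therefore the p-values, differ slightly.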


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

In [13]:
import numpy as np

# Given data
sample_mean = 65
population_std = 8
sample_size = 50
confidence_level = 0.90

# Calculate the Z-score for the desired confidence level
z_score = 1.645  # Approximate z-score for a 90% confidence level (5% in each tail)

# Calculate standard error
standard_error = population_std / np.sqrt(sample_size)

# Calculate margin of error
margin_of_error = z_score * standard_error

# Calculate confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("90% Confidence Interval for the true population mean:")
print("Lower bound:", lower_bound)
print("Upper bound:", upper_bound)


90% Confidence Interval for the true population mean:
Lower bound: 63.13889495191701
Upper bound: 66.86110504808299
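
As a cross-check, the interval can be computed without a hard-coded z-score by asking SciPy's normal distribution for the central 90% interval around the sample mean. A brief sketch with the same inputs:

```python
import numpy as np
from scipy import stats

sample_mean = 65
population_std = 8
sample_size = 50

standard_error = population_std / np.sqrt(sample_size)

# Central 90% interval of a normal distribution centered on the sample mean
lower, upper = stats.norm.interval(0.90, loc=sample_mean, scale=standard_error)
print("Lower bound:", lower)
print("Upper bound:", upper)
```

The bounds differ from the hand calculation only in the later decimal places, because 1.645 is a rounded z-score.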


Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [14]:
from scipy import stats

# Given data
sample_mean = 0.25
sample_std = 0.05
sample_size = 30
population_mean_without_caffeine = 0.27  # Assumed population mean without caffeine
alpha = 0.10

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean_without_caffeine) / (sample_std / (sample_size ** 0.5))

# Determine degrees of freedom
df = sample_size - 1

# Find critical t-values for two-tailed test
critical_t_value_lower = stats.t.ppf(alpha / 2, df)
critical_t_value_upper = stats.t.ppf(1 - alpha / 2, df)

# Print results
print("T-statistic:", t_statistic)
print("Critical t-values (lower, upper):", critical_t_value_lower, ",", critical_t_value_upper)

# Determine if null hypothesis should be rejected
if t_statistic < critical_t_value_lower or t_statistic > critical_t_value_upper:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. There is no significant effect of caffeine on reaction time.")


T-statistic: -2.1908902300206665
Critical t-values (lower, upper): -1.6991270265334977 , 1.6991270265334972
Reject the null hypothesis. Caffeine has a significant effect on reaction time.
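
The critical-value decision can be complemented with a p-value. The sketch below packages the calculation in a hypothetical helper, one_sample_t_from_stats, and keeps the assumed no-caffeine mean of 0.27 seconds from the cell above.

```python
from scipy import stats

def one_sample_t_from_stats(sample_mean, sample_std, sample_size, null_mean):
    """Two-tailed one-sample t-test from summary statistics (hypothetical helper)."""
    standard_error = sample_std / sample_size ** 0.5
    t_statistic = (sample_mean - null_mean) / standard_error
    # Area in both tails beyond |t| with n - 1 degrees of freedom
    p_value = 2 * stats.t.sf(abs(t_statistic), df=sample_size - 1)
    return t_statistic, p_value

# null_mean = 0.27 is the assumed population mean without caffeine
t_statistic, p_value = one_sample_t_from_stats(0.25, 0.05, 30, 0.27)
print("T-statistic:", t_statistic)
print("Two-tailed p-value:", p_value)
print("Reject at alpha = 0.10:", p_value < 0.10)
```

The p-value falls below 0.10, which matches the decision reached with the critical values.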
