##### What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

###### Difference Between T-Test and Z-Test
>T-Test: Used when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small (usually n<30).

>Example: Testing if the average score of a class of 25 students is significantly different from the national average.

>Z-Test: Used when the population standard deviation is known. For large samples (usually n≥30) it is also a common approximation when only the sample standard deviation is available.

>Example: Checking if the mean height of 100 people is significantly different from the population mean.

In [1]:
from scipy.stats import ttest_1samp, norm

# T-Test example
sample_data = [23, 25, 20, 30, 28, 22]  # Illustrative scores (6 values, not the full class of 25)
population_mean = 24  # National average score
t_statistic, p_value = ttest_1samp(sample_data, population_mean)
print("T-Test Results:", t_statistic, p_value)

# Z-Test example
sample_mean = 70
population_mean = 68
std_dev = 5
n = 100
z_score = (sample_mean - population_mean) / (std_dev / (n ** 0.5))
p_value_z = 2 * (1 - norm.cdf(abs(z_score)))
print("Z-Test Results:", z_score, p_value_z)

T-Test Results: 0.4323377011671177 0.683506977874608
Z-Test Results: 4.0 6.334248366623996e-05
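
One way to see why the two tests give similar answers for large samples is to compare their critical values. The sketch below assumes a two-tailed 5% significance level, and the degrees of freedom are picked arbitrarily to show the trend; neither comes from the examples above.

```python
from scipy.stats import norm, t

alpha = 0.05  # Two-tailed significance level (illustrative assumption)
z_critical = norm.ppf(1 - alpha / 2)

# Degrees of freedom chosen only to show how the t critical value shrinks
dfs = [5, 29, 99, 999]
t_criticals = [t.ppf(1 - alpha / 2, df=df) for df in dfs]

for df, t_crit in zip(dfs, t_criticals):
    print(f"df={df}: t critical = {t_crit:.4f}, z critical = {z_critical:.4f}")
```

The t critical value is always larger than the z critical value, and the gap narrows as the sample grows.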


##### Differentiate between one-tailed and two-tailed tests.

>One-Tailed Test: Tests for a specific direction of the effect (e.g., greater than or less than).

>Two-Tailed Test: Tests for any significant difference, regardless of direction.

In [2]:
from scipy.stats import ttest_1samp

# Two-tailed example (testing if mean is different)
sample_data = [22, 19, 24, 25, 20]
population_mean = 21
t_statistic, p_value = ttest_1samp(sample_data, population_mean)
print("Two-Tailed Test Results:", t_statistic, p_value)

# One-tailed p-value for the alternative "mean > population_mean":
# halve the two-tailed p-value when t is positive, otherwise use 1 - p/2.
if t_statistic > 0:
    one_tailed_p_value = p_value / 2
else:
    one_tailed_p_value = 1 - (p_value / 2)
print("One-Tailed Test Results:", t_statistic, one_tailed_p_value)

Two-Tailed Test Results: 0.8770580193070292 0.4299733794885493
One-Tailed Test Results: 0.8770580193070292 0.21498668974427465
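
The manual halving above can be cross-checked against SciPy's own one-sided option. This sketch assumes SciPy 1.6 or later, where `ttest_1samp` accepts an `alternative` argument, and reuses the same sample data.

```python
from scipy.stats import ttest_1samp

sample_data = [22, 19, 24, 25, 20]
population_mean = 21

# Default is alternative="two-sided"
two_sided = ttest_1samp(sample_data, population_mean)

# alternative="greater" tests H1: mean > population_mean
one_sided = ttest_1samp(sample_data, population_mean, alternative="greater")

print("Two-tailed p-value:", two_sided.pvalue)
print("One-tailed p-value:", one_sided.pvalue)
```

Because the t-statistic is positive here, the one-tailed p-value is half the two-tailed one.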


##### Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

>Type 1 Error: Rejecting a true null hypothesis (false positive).

>Example: Concluding a new drug works when it does not.

>Type 2 Error: Failing to reject a false null hypothesis (false negative).

>Example: Concluding a drug has no effect when it actually does.
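
Both error rates can be computed directly for a simple case. The sketch below assumes a one-tailed z-test with known standard deviation; the effect size, standard deviation, and sample size are hypothetical values chosen for illustration.

```python
from scipy.stats import norm

# Illustrative assumptions (not taken from the text above)
alpha = 0.05       # Type 1 error rate we are willing to accept
true_effect = 0.5  # Hypothetical true mean when the drug really works
sigma = 1.0        # Known population standard deviation
n = 25             # Sample size

# One-tailed z-test of H0: mean = 0 against H1: mean > 0
z_critical = norm.ppf(1 - alpha)

# Mean of the z statistic when the effect is real
shift = true_effect * n ** 0.5 / sigma

# Type 2 error rate: probability the z statistic stays below the critical value
beta = norm.cdf(z_critical - shift)
power = 1 - beta

print("Type 1 error rate (alpha):", alpha)
print("Type 2 error rate (beta):", beta)
print("Power (1 - beta):", power)
```

Lowering `alpha` makes false positives rarer but raises `beta` unless the sample size also grows.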

##### Explain Bayes's theorem with an example.

>Bayes's theorem calculates the probability of an event based on prior knowledge of conditions related to the event.

>Example: If a test for a disease has a 99% accuracy rate, Bayes’s theorem can help determine the likelihood someone has the disease based on their positive test result.

>Formula: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$

##### What is a confidence interval? How is it calculated? Explain with an example.

A confidence interval is a range, computed from a sample, that is expected to contain a population parameter at a stated confidence level (for example, 95%).

>Formula: $CI = \bar{x} \pm z \dfrac{s}{\sqrt{n}}$

$CI$ = confidence interval

$\bar{x}$ = sample mean

$z$ = critical value for the chosen confidence level

$s$ = sample standard deviation

$n$ = sample size

Example: For a sample mean of 50, σ=5, and n=30, a 95% CI is calculated.

In [4]:
import scipy.stats as stats

# Given values
sample_mean = 50
std_dev = 5
n = 30
confidence_level = 0.95

# Calculate the confidence interval
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_score * (std_dev / (n ** 0.5))
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error
print("95% Confidence Interval:", (ci_lower, ci_upper))

95% Confidence Interval: (48.210805856282846, 51.789194143717154)
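
If the standard deviation of 5 is estimated from the sample rather than known for the population, a t-based interval is the more cautious choice. The sketch below compares the two under that assumption, using the same numbers as the example.

```python
import scipy.stats as stats

sample_mean = 50
std_dev = 5  # Treated here as a sample estimate (assumption)
n = 30
confidence_level = 0.95
standard_error = std_dev / n ** 0.5

# Critical values from the normal and t distributions
z_crit = stats.norm.ppf(1 - (1 - confidence_level) / 2)
t_crit = stats.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)

z_interval = (sample_mean - z_crit * standard_error,
              sample_mean + z_crit * standard_error)
t_interval = (sample_mean - t_crit * standard_error,
              sample_mean + t_crit * standard_error)

print("z-based 95% CI:", z_interval)
print("t-based 95% CI:", t_interval)
```

The t-based interval is slightly wider because it accounts for the extra uncertainty in estimating the standard deviation.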


##### Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

In [5]:
# Given
p_disease = 0.01  # Probability of having the disease
p_positive_given_disease = 0.99  # Probability of positive test if disease is present
p_positive_given_no_disease = 0.05  # Probability of positive test if no disease

# Total probability of positive test
p_positive = (p_positive_given_disease * p_disease) + \
             (p_positive_given_no_disease * (1 - p_disease))

# Bayes' theorem: P(Disease|Positive)
p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive
print("P(Disease | Positive Test):", p_disease_given_positive)

P(Disease | Positive Test): 0.16666666666666669
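
The posterior from one test can serve as the prior for the next. The sketch below shows the effect of a second positive result, assuming the two tests are independent given the person's true disease status; `posterior` is a helper name introduced here for illustration.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_no_disease):
    """Return P(disease | positive test) using Bayes' theorem."""
    evidence = (p_pos_given_disease * prior
                + p_pos_given_no_disease * (1 - prior))
    return p_pos_given_disease * prior / evidence

# Same rates as the cell above
after_first = posterior(0.01, 0.99, 0.05)

# Second positive test: yesterday's posterior becomes today's prior
after_second = posterior(after_first, 0.99, 0.05)

print("After one positive test:", after_first)
print("After two positive tests:", after_second)
```

A single positive result leaves the probability of disease low because the disease is rare; a second independent positive raises it substantially.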


##### Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

In [6]:
import scipy.stats as stats

# Given values
sample_mean = 50
std_dev = 5
n = 30  # Sample size is not given in the question; n = 30 is assumed
confidence_level = 0.95

# Calculate the confidence interval
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_score * (std_dev / (n ** 0.5))
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error
print("95% Confidence Interval:", (ci_lower, ci_upper))

95% Confidence Interval: (48.210805856282846, 51.789194143717154)
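
To interpret a 95% interval: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true mean. The simulation below is a sketch of that idea, assuming a normal population with a hypothetical true mean of 50 and a known standard deviation of 5.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # Fixed seed so the run is repeatable

# Hypothetical population; sigma is treated as known
true_mean, sigma, n = 50, 5, 30
confidence_level = 0.95
n_trials = 2000

z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_score * sigma / n ** 0.5

# Draw many samples of size n and take each sample mean
sample_means = rng.normal(true_mean, sigma, size=(n_trials, n)).mean(axis=1)

# An interval covers the truth when the true mean is within mean +/- margin
covered = np.abs(sample_means - true_mean) <= margin_of_error
coverage = covered.mean()

print("Fraction of intervals containing the true mean:", coverage)
```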


##### What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

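The margin of error is the half-width of a confidence interval, z·σ/√n. Because it shrinks in proportion to 1/√n, it decreases as the sample size increases, improving the precision of the estimate.

>Example: For a survey, doubling the sample size reduces the margin of error by a factor of √2 (about 29%), providing a more precise estimate.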

In [8]:
import scipy.stats as stats

# Smaller vs. larger sample size example
confidence_level = 0.95
std_dev = 10
sample_mean = 100

# Small sample (n=25)
n_small = 25
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_error_small = z_score * (std_dev / (n_small ** 0.5))
print("Margin of Error with n=25:", margin_error_small)

# Large sample (n=100)
n_large = 100
margin_error_large = z_score * (std_dev / (n_large ** 0.5))
print("Margin of Error with n=100:", margin_error_large)

Margin of Error with n=25: 3.919927969080108
Margin of Error with n=100: 1.959963984540054
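
The same relationship can be turned around to ask how large a sample is needed for a target margin of error. This is a sketch assuming a known standard deviation; `required_sample_size` is a helper name introduced here, and the target margins are illustrative.

```python
import math
import scipy.stats as stats

def required_sample_size(std_dev, target_margin, confidence_level=0.95):
    """Smallest n whose z-based margin of error is at most target_margin."""
    z = stats.norm.ppf(1 - (1 - confidence_level) / 2)
    # Solve z * std_dev / sqrt(n) <= target_margin for n, then round up
    return math.ceil((z * std_dev / target_margin) ** 2)

n_for_margin_2 = required_sample_size(std_dev=10, target_margin=2)
n_for_margin_1 = required_sample_size(std_dev=10, target_margin=1)

print("n needed for margin of error 2:", n_for_margin_2)
print("n needed for margin of error 1:", n_for_margin_1)
```

Halving the target margin roughly quadruples the required sample size.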


##### Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

In [9]:
x = 75
population_mean = 70
population_std_dev = 5

z_score = (x - population_mean) / population_std_dev
print("Z-Score:", z_score)

Z-Score: 1.0
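
A z-score of 1.0 means the value lies one standard deviation above the population mean. If the population is also assumed to be normally distributed (an assumption the question does not state), the z-score can be converted to a percentile:

```python
from scipy.stats import norm

x = 75
population_mean = 70
population_std_dev = 5

z_score = (x - population_mean) / population_std_dev

# Fraction of a normal population that falls below this value
percentile = norm.cdf(z_score)

print("Z-Score:", z_score)
print("Fraction of the population below x:", percentile)
```

Under that assumption, a value of 75 is higher than roughly 84% of the population.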


##### In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

In [10]:
from scipy.stats import t

sample_mean = 6
population_mean = 0
std_dev = 2.5
n = 50
confidence_level = 0.95

# Calculate t-score
t_statistic = (sample_mean - population_mean) / (std_dev / (n ** 0.5))
t_critical = t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
print("T-Statistic:", t_statistic)
print("Critical Value:", t_critical)

T-Statistic: 16.970562748477143
Critical Value: 2.009575234489209
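
The t-statistic is far larger than the critical value, so the null hypothesis of zero average weight loss is rejected. The sketch below makes that decision explicit and adds a two-tailed p-value; `one_sample_t_from_stats` is a helper name introduced here for illustration.

```python
from scipy.stats import t

def one_sample_t_from_stats(sample_mean, null_mean, std_dev, n):
    """One-sample t-test from summary statistics; returns (t, two-tailed p)."""
    standard_error = std_dev / n ** 0.5
    t_stat = (sample_mean - null_mean) / standard_error
    p_val = 2 * t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_val

t_statistic, p_value = one_sample_t_from_stats(
    sample_mean=6, null_mean=0, std_dev=2.5, n=50
)

alpha = 0.05  # Matches the 95% confidence level in the question
reject_null = p_value < alpha

print("T-Statistic:", t_statistic)
print("P-Value:", p_value)
print("Reject the null hypothesis:", reject_null)
```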


##### In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

In [11]:
import scipy.stats as stats

p = 0.65  # Sample proportion
n = 500
confidence_level = 0.95

# Calculate the margin of error and confidence interval
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_error = z_score * ((p * (1 - p)) / n) ** 0.5
ci_lower = p - margin_error
ci_upper = p + margin_error
print("95% Confidence Interval for Proportion:", (ci_lower, ci_upper))

95% Confidence Interval for Proportion: (0.6081925393809212, 0.6918074606190788)
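
The interval above uses the normal (Wald) approximation. A commonly used alternative is the Wilson score interval, which tends to behave better for small samples or proportions near 0 or 1. The sketch below computes it by hand for the same survey; it is offered as a comparison, not as the method used above.

```python
import scipy.stats as stats

p = 0.65  # Sample proportion
n = 500
confidence_level = 0.95

z = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Wilson score interval
denominator = 1 + z ** 2 / n
center = (p + z ** 2 / (2 * n)) / denominator
half_width = z * ((p * (1 - p) / n + z ** 2 / (4 * n ** 2)) ** 0.5) / denominator

wilson_lower = center - half_width
wilson_upper = center + half_width

print("95% Wilson Interval for Proportion:", (wilson_lower, wilson_upper))
```

With 500 respondents the two intervals are nearly identical.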


##### A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

In [12]:
from scipy.stats import t

mean_a = 85
std_a = 6
n_a = 30  # Sample size is not given in the question; 30 per group is assumed
mean_b = 82
std_b = 5
n_b = 30  # Assumed, as above
confidence_level = 0.99

# Pooled standard deviation
pooled_std_dev = (((n_a - 1) * std_a ** 2 + (n_b - 1) * std_b ** 2) / (n_a + n_b - 2)) ** 0.5
t_statistic = (mean_a - mean_b) / (pooled_std_dev * ((1 / n_a + 1 / n_b) ** 0.5))
t_critical = t.ppf(1 - (1 - confidence_level) / 2, df=n_a + n_b - 2)
print("T-Statistic:", t_statistic)
print("Critical Value:", t_critical)

T-Statistic: 2.10386061995483
Critical Value: 2.6632869538098674
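
The t-statistic is below the critical value, so the difference is not significant at the 0.01 level. The same pooled test can be run from summary statistics with `scipy.stats.ttest_ind_from_stats`, which also returns a p-value. The sample sizes of 30 per group remain an assumption, as in the cell above.

```python
from scipy.stats import ttest_ind_from_stats

alpha = 0.01

# equal_var=True gives the pooled-variance (Student) t-test
result = ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,  # Sample A (n assumed)
    mean2=82, std2=5, nobs2=30,  # Sample B (n assumed)
    equal_var=True,
)

reject_null = result.pvalue < alpha

print("T-Statistic:", result.statistic)
print("P-Value:", result.pvalue)
print("Reject the null hypothesis:", reject_null)
```

Setting `equal_var=False` would run Welch's t-test instead, which does not assume the two groups share a variance.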


##### A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

In [13]:
import scipy.stats as stats

population_mean = 60  # Given in the question; not needed to build the interval
sample_mean = 65
std_dev = 8
n = 50
confidence_level = 0.90

# Calculate confidence interval
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_error = z_score * (std_dev / (n ** 0.5))
ci_lower = sample_mean - margin_error
ci_upper = sample_mean + margin_error
print("90% Confidence Interval:", (ci_lower, ci_upper))

90% Confidence Interval: (63.13906055411732, 66.86093944588268)
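
The stated population mean of 60 lies outside this interval. The equivalent two-tailed z-test, sketched below with the population standard deviation of 8 treated as known, reaches the same conclusion at the 10% significance level.

```python
from scipy.stats import norm

population_mean = 60
sample_mean = 65
std_dev = 8  # Population standard deviation, treated as known
n = 50
alpha = 0.10  # Corresponds to the 90% confidence level

z_statistic = (sample_mean - population_mean) / (std_dev / n ** 0.5)
p_value = 2 * norm.sf(abs(z_statistic))  # Two-tailed p-value

print("Z-Statistic:", z_statistic)
print("P-Value:", p_value)
print("Reject the null hypothesis:", p_value < alpha)
```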


##### In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [14]:
from scipy.stats import t

sample_mean = 0.25
population_mean = 0  # Assuming no effect is the null hypothesis
std_dev = 0.05
n = 30
confidence_level = 0.90

# Calculate t-score
t_statistic = (sample_mean - population_mean) / (std_dev / (n ** 0.5))
t_critical = t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
print("T-Statistic:", t_statistic)
print("Critical Value:", t_critical)

T-Statistic: 27.386127875258307
Critical Value: 1.6991270265334972
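
A confidence interval gives the same decision as the test: the null value is rejected at the 10% level exactly when it falls outside the 90% interval. The sketch below uses `scipy.stats.t.interval` with the summary statistics from the question and keeps the null value of 0 assumed in the cell above.

```python
import scipy.stats as stats

sample_mean = 0.25
std_dev = 0.05
n = 30
confidence_level = 0.90
null_value = 0  # Same null hypothesis as assumed in the cell above

standard_error = std_dev / n ** 0.5

# t-based interval for the mean reaction time
ci_lower, ci_upper = stats.t.interval(
    confidence_level, df=n - 1, loc=sample_mean, scale=standard_error
)

null_inside_interval = ci_lower <= null_value <= ci_upper

print("90% Confidence Interval:", (ci_lower, ci_upper))
print("Null value inside interval:", null_inside_interval)
```

Comparing against a baseline reaction time measured without caffeine, if one were available, would make the test more meaningful than a null value of 0.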
