Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

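A t-test is used when the population standard deviation is unknown and has to be estimated from the sample. This matters most when the sample size is small (typically less than 30), because the test statistic then follows a t-distribution with n - 1 degrees of freedom rather than the standard normal distribution.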

Example scenario for t-test: A company wants to test whether a new training program improves the productivity of its employees. The company randomly selects a sample of 20 employees and measures their productivity before and after the training program. The null hypothesis is that the mean productivity before the training program is the same as the mean productivity after the training program. Because the same employees are measured twice, a paired t-test on the before/after differences can be used to test this hypothesis.

A z-test, on the other hand, is used when the population standard deviation is known and the sample size is large (typically greater than 30), so the test statistic can be compared against the standard normal distribution.

Example scenario for z-test: A researcher wants to test whether a new drug reduces blood pressure. The researcher selects a sample of 100 individuals with high blood pressure and administers the drug to them. The null hypothesis is that the mean blood pressure of the population is the same before and after taking the drug. Since the sample size is large and the population standard deviation is known, a z-test can be used to test this hypothesis.
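The difference between the two tests can be sketched in code. The helper names `z_test` and `t_test` and all input numbers below are hypothetical, chosen only for illustration: the z-test uses a known population standard deviation and the normal distribution, while the t-test uses the sample standard deviation and a t-distribution with n - 1 degrees of freedom.

```python
import math
from scipy import stats

def z_test(sample_mean, mu_0, sigma, n):
    """Two-sided one-sample z-test; sigma is the known population SD."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    p = 2 * stats.norm.sf(abs(z))
    return z, p

def t_test(sample_mean, mu_0, s, n):
    """Two-sided one-sample t-test; s is the sample SD (population SD unknown)."""
    t = (sample_mean - mu_0) / (s / math.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, p

# Hypothetical inputs: large sample with known sigma vs. small sample with estimated s
z, p_z = z_test(sample_mean=52, mu_0=50, sigma=5, n=100)
t, p_t = t_test(sample_mean=52, mu_0=50, s=5, n=20)

print(f"z-test: z = {z:.2f}, p = {p_z:.5f}")
print(f"t-test: t = {t:.2f}, p = {p_t:.4f}")
```

With the same observed mean, the small-sample t-test gives a much weaker result, both because the standard error is larger and because the t-distribution has heavier tails.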

Q2: Differentiate between one-tailed and two-tailed tests.

In a one-tailed test, the null hypothesis is tested against an alternative hypothesis that the population parameter lies either above or below a certain value, depending on the direction specified in the hypothesis.

In a two-tailed test, the null hypothesis is tested against an alternative hypothesis that the population parameter is not equal to a certain value. 
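As a rough illustration, the same test statistic can lead to different conclusions depending on the number of tails. The statistic `z = 1.8` below is a made-up value, and a standard normal reference distribution is assumed.

```python
from scipy import stats

z = 1.8  # hypothetical test statistic

p_right = stats.norm.sf(z)         # one-tailed, H1: parameter > value
p_left = stats.norm.cdf(z)         # one-tailed, H1: parameter < value
p_two = 2 * stats.norm.sf(abs(z))  # two-tailed, H1: parameter != value

print(f"right-tailed p = {p_right:.4f}")
print(f"left-tailed p  = {p_left:.4f}")
print(f"two-tailed p   = {p_two:.4f}")
```

At a significance level of 0.05, the right-tailed test rejects the null hypothesis here while the two-tailed test does not, because the two-tailed test splits the rejection region between both ends of the distribution.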

Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

Type I error occurs when the null hypothesis is rejected even though it is true. It is also called a false positive error. 

For example, a pharmaceutical company wants to test whether a new drug is more effective than an existing drug for treating a particular condition. The null hypothesis is that the new drug is no more effective than the existing drug. If the company rejects the null hypothesis and concludes that the new drug is more effective, but in reality it is no more effective than the existing drug, this is a Type I error.

Type II error occurs when the null hypothesis is not rejected even though it is false. It is also called a false negative error.

For example, a researcher wants to test whether a new teaching method improves students' exam scores. The null hypothesis is that the new teaching method does not improve scores. If the researcher fails to reject the null hypothesis and concludes that there is no evidence of an improvement, but in reality the new teaching method does improve scores, this is a Type II error.
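The two error rates can also be put into numbers. The sketch below assumes a right-tailed z-test with a known population standard deviation; `sigma`, `n`, and `true_effect` are hypothetical values, not taken from the scenarios above. The Type I error rate is fixed by the chosen alpha, and the Type II error rate (beta) follows from how far the true mean sits from the null value.

```python
import math
from scipy import stats

alpha = 0.05        # chosen Type I error rate
sigma = 10.0        # hypothetical known population SD
n = 25              # hypothetical sample size
true_effect = 5.0   # hypothetical true improvement when H0 is false

# Critical value of the right-tailed test
z_crit = stats.norm.ppf(1 - alpha)

# Under H1 the test statistic is shifted by true_effect / standard error
shift = true_effect / (sigma / math.sqrt(n))

# Type II error rate: probability of NOT rejecting H0 when H1 is true
beta = stats.norm.cdf(z_crit - shift)
power = 1 - beta

print(f"Type I error rate (alpha): {alpha:.2f}")
print(f"Type II error rate (beta): {beta:.3f}")
print(f"Power (1 - beta):          {power:.3f}")
```

Lowering alpha makes Type I errors rarer but raises beta for a fixed sample size; increasing `n` is the usual way to reduce both.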

Q4: Explain Bayes's theorem with an example.

In [1]:
import numpy as np

# prior probabilities
prior = np.array([0.5, 0.5])  # uniform prior distribution

# likelihoods: probability of the observed evidence under each hypothesis
likelihood = np.array([0.8, 0.2])  # 0.8 if Type A, 0.2 if Type B

# evidence
evidence = np.sum(likelihood * prior)  # probability of observing the evidence

# posterior probabilities
posterior = likelihood * prior / evidence

print("Posterior probabilities:", posterior)

Posterior probabilities: [0.8 0.2]
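The cell above applies Bayes' theorem, P(H | E) = P(E | H) * P(H) / P(E), with a uniform prior, which is why the posterior equals the likelihoods. A minimal sketch with a non-uniform prior shows how prior knowledge shifts the result; the helper `bayes_posterior` and the prior `[0.1, 0.9]` are illustrative assumptions, not part of the original example.

```python
import numpy as np

def bayes_posterior(prior, likelihood):
    """Return P(H | E) for each hypothesis from P(H) and P(E | H)."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = likelihood * prior   # P(E | H) * P(H)
    return joint / joint.sum()   # divide by the evidence P(E)

uniform = bayes_posterior([0.5, 0.5], [0.8, 0.2])
skewed = bayes_posterior([0.1, 0.9], [0.8, 0.2])  # hypothetical prior favouring Type B

print("Posterior with uniform prior:", uniform)
print("Posterior with skewed prior: ", skewed)
```

With the skewed prior, Type B remains the more probable hypothesis even though the evidence is four times as likely under Type A.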


Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

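A confidence interval is a range of values, computed from sample data, that is expected to contain the true value of a population parameter with a specified degree of confidence. For a mean with known population standard deviation σ and sample size n, the interval is x̄ ± z * σ / √n, where z is the critical value for the chosen confidence level (about 1.96 for 95%).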

In [2]:
import scipy.stats as stats
import numpy as np

# sample statistics
n = 100
X_bar = 68
sigma = 3

# confidence level
alpha = 0.05
z_alpha_2 = stats.norm.ppf(1 - alpha/2)

# confidence interval
margin = z_alpha_2 * sigma / np.sqrt(n)  # half-width of the interval
print(f"Lower bound of the 95% confidence interval: {X_bar - margin:.2f}")
print(f"Upper bound of the 95% confidence interval: {X_bar + margin:.2f}")

Lower bound of the 95% confidence interval: 67.41
Upper bound of the 95% confidence interval: 68.59


Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

In [3]:
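# Sample problem: a disease affects 1% of a population. A test detects it
# in 95% of people who have it and is negative in 95% of people who do not.
# What is the probability that a person who tests positive has the disease?

# prior probabilities
p_D = 0.01      # P(disease)
p_notD = 0.99   # P(no disease)

# likelihoods
p_pos_given_D = 0.95      # sensitivity: P(positive | disease)
p_neg_given_notD = 0.95   # specificity: P(negative | no disease)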

# total probability of testing positive
p_pos = p_pos_given_D * p_D + (1 - p_neg_given_notD) * p_notD

# posterior probability of having the disease given testing positive
p_D_given_pos = p_pos_given_D * p_D / p_pos

print("The probability of having the disease given testing positive is:", p_D_given_pos)

The probability of having the disease given testing positive is: 0.16101694915254225


Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

In [1]:
import math

mean = 50
std_dev = 5
sample_size = 100 # let's assume the sample size is 100

lower_bound = mean - 1.96 * (std_dev / math.sqrt(sample_size))
upper_bound = mean + 1.96 * (std_dev / math.sqrt(sample_size))

print(f"The 95% confidence interval is ({lower_bound:.2f}, {upper_bound:.2f})")

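The 95% confidence interval is (49.02, 50.98)

Interpretation: under the assumed sample size of 100, we can be 95% confident that the population mean lies between 49.02 and 50.98. Put differently, if the sampling were repeated many times, about 95% of the intervals built this way would contain the true mean.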


Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error is the maximum amount by which the sample estimate might differ from the true population parameter at a given level of confidence. 

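As the sample size increases, the margin of error decreases. The margin is proportional to 1/√n, so quadrupling the sample size halves it. For example, a poll of 400 voters has half the margin of error of a poll of 100 voters, all else being equal.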

In [2]:
import math
import random
import statistics

# generate a population of 10,000 values with mean=50 and standard deviation=5
population = [random.normalvariate(50, 5) for _ in range(10000)]

# take a sample of size 100 and calculate its mean and standard deviation
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)
sample_std_dev = statistics.stdev(sample)

# calculate the margin of error for a 95% confidence interval
z_score = 1.96 # critical value for 95% confidence interval
standard_error = sample_std_dev / math.sqrt(100)
margin_of_error = z_score * standard_error

print(f"The sample mean is {sample_mean:.2f} with a margin of error of {margin_of_error:.2f}")


The sample mean is 50.02 with a margin of error of 0.90
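To make the effect of sample size concrete, the margin of error can be computed for several sample sizes at once. This is a small sketch that assumes a fixed standard deviation of 5 (the population value used above) and the 1.96 critical value; the helper `margin_of_error` and the list of sample sizes are illustrative.

```python
import math

def margin_of_error(std_dev, n, z=1.96):
    """Margin of error of a mean for a given standard deviation and sample size."""
    return z * std_dev / math.sqrt(n)

std_dev = 5  # assumed fixed for every sample size
margins = {n: margin_of_error(std_dev, n) for n in (25, 100, 400, 1600)}

for n, m in margins.items():
    print(f"n = {n:>4}: margin of error = {m:.3f}")
```

Each fourfold increase in sample size cuts the margin of error in half, which is why a survey of 400 people pins down the mean twice as tightly as a survey of 100.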


Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

To calculate the z-score, we use the following formula:

z = (x - μ) / σ

where x is the value of the data point, μ is the population mean, and σ is the population standard deviation.

Substituting the given values, we get:

z = (75 - 70) / 5

z = 1

Therefore, the z-score is 1. Interpretation: the data point of 75 lies exactly one standard deviation above the population mean of 70.
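The same calculation can be sketched in code. The percentile step additionally assumes that the population is normally distributed, which the question does not state; the helper `z_score` is an illustrative name.

```python
from scipy import stats

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

z = z_score(75, 70, 5)

# Assumption: normally distributed population
percentile = stats.norm.cdf(z)

print(f"z = {z:.2f}")
print(f"Share of values below the data point: {percentile:.1%}")
```

Under the normality assumption, roughly 84% of the population falls below a value with a z-score of 1.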

Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

In [6]:
import scipy.stats as stats

# Sample statistics
n = 50
x_bar = 6
s = 2.5

# Hypotheses: H0: mean weight loss = 0, H1: mean weight loss != 0 (two-sided)
mu_0 = 0
alpha = 0.05

# Calculate the t-value
t_value = (x_bar - mu_0) / (s / (n ** 0.5))

# Calculate the degrees of freedom
df = n - 1

# Calculate the critical t-value
t_crit = stats.t.ppf(alpha/2, df), stats.t.ppf(1-alpha/2, df)

# Decision
if t_value < t_crit[0] or t_value > t_crit[1]:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

Reject the null hypothesis.


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

In [7]:
import math

# Sample statistics
p = 0.65
n = 500
z = 1.96  # for a 95% confidence level

# Calculate the standard error
se = math.sqrt((p * (1 - p)) / n)

# Calculate the confidence interval
lower_ci = p - z * se
upper_ci = p + z * se

# Print the result
print("The 95% confidence interval for the true proportion of people who are satisfied with their job is ({:.4f}, {:.4f}).".format(lower_ci, upper_ci))

The 95% confidence interval for the true proportion of people who are satisfied with their job is (0.6082, 0.6918).


Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

In [8]:
import scipy.stats as stats

# Sample statistics
# Note: the problem does not state the sample sizes; 50 per group is assumed here
n_A = 50
x_bar_A = 85
s_A = 6

n_B = 50
x_bar_B = 82
s_B = 5

# Hypotheses
alpha = 0.01

# Calculate the pooled standard deviation
s_pooled = ((n_A - 1) * s_A ** 2 + (n_B - 1) * s_B ** 2) / (n_A + n_B - 2)
s_pooled = s_pooled ** 0.5

# Calculate the t-value
t_value = (x_bar_A - x_bar_B) / (s_pooled * (1/n_A + 1/n_B) ** 0.5)

# Calculate the degrees of freedom
df = n_A + n_B - 2

# Calculate the critical t-value
t_crit = stats.t.ppf(alpha/2, df), stats.t.ppf(1-alpha/2, df)

# Decision
if t_value < t_crit[0] or t_value > t_crit[1]:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

Reject the null hypothesis.
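As a cross-check, SciPy's `ttest_ind_from_stats` runs the same pooled two-sample t-test directly from summary statistics and also returns a p-value. The sample sizes of 50 per group are the same assumption as in the cell above, not values given in the problem.

```python
from scipy import stats

# Pooled (equal-variance) two-sample t-test from summary statistics
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=50,  # sample A (n = 50 assumed)
    mean2=82, std2=5, nobs2=50,  # sample B (n = 50 assumed)
    equal_var=True,
)
t_stat, p_value = result.statistic, result.pvalue

alpha = 0.01
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("Reject the null hypothesis." if p_value < alpha
      else "Fail to reject the null hypothesis.")
```

Passing `equal_var=False` instead would run Welch's t-test, which does not assume equal variances in the two groups. The conclusion depends on the assumed sample sizes, so smaller groups could change the decision.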


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

In [9]:
import scipy.stats as stats
import math

# Sample statistics
n = 50
x_bar = 65
mu = 60  # stated population mean (not needed to build the interval)
sigma = 8

# Significance level
alpha = 0.1  # 90% confidence level

# Calculate the standard error
se = sigma / math.sqrt(n)

# Calculate the critical value
z_crit = stats.norm.ppf(1 - alpha/2)

# Calculate the confidence interval
lower_bound = x_bar - z_crit * se
upper_bound = x_bar + z_crit * se

# Print the result
print(f"The 90% confidence interval for the true population mean is ({lower_bound:.2f}, {upper_bound:.2f}).")

The 90% confidence interval for the true population mean is (63.14, 66.86).


Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [10]:
import scipy.stats as stats
import math

# Sample statistics
n1 = 30
x1 = 0.25
s1 = 0.05

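# Null value and alpha level
# Note: the problem gives no baseline reaction time, so mu = 0 is a placeholder
# null value; a realistic test would compare against a no-caffeine mean
mu = 0
alpha = 0.1  # 90% confidence level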

# Degrees of freedom
df = n1 - 1

# Calculate the t-statistic
t = (x1 - mu) / (s1 / math.sqrt(n1))

# Calculate the critical value
t_crit = stats.t.ppf(alpha/2, df=df, loc=0, scale=1)

# Calculate the p-value
p_value = stats.t.sf(abs(t), df=df) * 2

# Print the results
print(f"t = {t:.2f}")
print(f"t critical = {t_crit:.2f}")
print(f"p-value = {p_value:.4f}")

if abs(t) > abs(t_crit):
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")


t = 27.39
t critical = -1.70
p-value = 0.0000
Reject the null hypothesis.
