Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would 
use each type of test.

Solution:

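The main difference between a t-test and a z-test is what is known about the population standard deviation. A z-test is used when the population standard deviation is known (or, in practice, when the sample is large, n > 30, so that the sample standard deviation is a good estimate of it). A t-test is used when the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample size is small (n < 30).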

For example, suppose we want to test whether the mean height of students in a particular school is equal to 5 feet. If the population standard deviation of heights is known, for example from school records covering all students, and we test a large sample, we can use a z-test. However, if we only have data on a small sample of students and do not know the population standard deviation, we would use a t-test.
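As a rough sketch of the two cases, the snippet below runs a z-test with an assumed known population standard deviation and a t-test on the same small sample. The height values, the value of `pop_sigma`, and the helper name `z_test_one_sample` are made up for illustration.

```python
import math
from scipy import stats

# Hypothetical sample of student heights in feet (illustrative values only)
heights = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.1]
mu0 = 5.0  # hypothesised population mean

def z_test_one_sample(sample, mu0, pop_sigma):
    """Two-sided one-sample z-test; needs a known population std."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (pop_sigma / math.sqrt(n))
    p = 2 * stats.norm.sf(abs(z))
    return z, p

# z-test: population standard deviation assumed known (0.2 ft is an assumption)
z_stat, z_p = z_test_one_sample(heights, mu0, pop_sigma=0.2)

# t-test: population standard deviation unknown, estimated from the sample
t_stat, t_p = stats.ttest_1samp(heights, popmean=mu0)

print(f"z = {z_stat:.3f}, p = {z_p:.3f}")
print(f"t = {t_stat:.3f}, p = {t_p:.3f}")
```

In this made-up sample the sample standard deviation happens to equal the assumed population value, so both statistics coincide, but the t-test reports a larger p-value because the t distribution has heavier tails.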

Q2: Differentiate between one-tailed and two-tailed tests

Solution:

A one-tailed test is a statistical hypothesis test in which the alternative hypothesis is directional, meaning that it specifies either an increase or a decrease in the parameter being tested. For example, suppose we want to test whether a new drug is more effective than an existing drug. In this case, the alternative hypothesis would be that the new drug is more effective than the existing drug.

On the other hand, a two-tailed test is a statistical hypothesis test in which the alternative hypothesis is non-directional, meaning that it specifies only that the parameter being tested is not equal to a certain value. For example, suppose we want to test whether the mean weight of apples is different from 100 grams. In this case, the alternative hypothesis would be that the mean weight of apples is not equal to 100 grams.
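The practical effect of the choice can be sketched by computing both p-values for the same test statistic. The value `z = 1.8` and the 0.05 significance level are hypothetical, chosen only to show a case where the two tests disagree.

```python
from scipy import stats

z = 1.8  # hypothetical observed test statistic

# One-tailed (H1: parameter is greater): area in the upper tail only
p_one_tailed = stats.norm.sf(z)

# Two-tailed (H1: parameter is different): area in both tails
p_two_tailed = 2 * stats.norm.sf(abs(z))

alpha = 0.05
print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

The two-tailed p-value is twice the one-tailed one, so a result can be significant in a one-tailed test and not in a two-tailed test at the same significance level.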

Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for 
each type of error.

Solution:

In hypothesis testing, a Type I error occurs when a null hypothesis is rejected when it is actually true. This is also known as a false positive. For example, suppose a medical test is designed to detect a certain disease. A Type I error would occur if the test indicates that the patient has the disease when they actually do not.

On the other hand, a Type II error occurs when a null hypothesis is not rejected when it is actually false. This is also known as a false negative. For example, suppose a medical test is designed to detect a certain disease. A Type II error would occur if the test indicates that the patient does not have the disease when they actually do.

The probability of making a Type I error is denoted by alpha (α), while the probability of making a Type II error is denoted by beta (β). The power of a statistical test is defined as 1 - β and represents the probability of correctly rejecting the null hypothesis when it is actually false.
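To make α, β and power concrete, here is a minimal sketch for an upper-tailed one-sample z-test. The function name `type2_error_rate` and all the numbers (means, standard deviation, sample size) are assumptions chosen for illustration.

```python
import math
from scipy import stats

def type2_error_rate(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for an upper-tailed one-sample z-test of H0: mu = mu0."""
    z_crit = stats.norm.ppf(1 - alpha)                # rejection threshold on the z scale
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))  # true effect in standard-error units
    return stats.norm.cdf(z_crit - shift)             # P(fail to reject H0 | H0 is false)

alpha = 0.05
beta = type2_error_rate(mu0=100, mu_true=103, sigma=10, n=50, alpha=alpha)
power = 1 - beta
print(f"alpha = {alpha}, beta = {beta:.3f}, power = {power:.3f}")
```

Lowering α raises the rejection threshold and therefore increases β, which is why the two error types have to be traded off against each other for a fixed sample size.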

Q4: Explain Bayes’ theorem with an example.

Solution:

Bayes’ theorem states that the probability of an event A given that another event B has occurred, P(A|B), is equal to the probability of B given A multiplied by the probability of A, divided by the probability of B.

Suppose you are a doctor and you have a patient who has tested positive for a certain disease. The test is known to be 99% accurate, meaning that if someone has the disease, there is a 99% chance that the test will be positive. However, if someone does not have the disease, there is still a 1% chance that the test will be positive. The prevalence of the disease in the population is 1 in 1000 people.

Using Bayes’ theorem, we can calculate the probability that the patient actually has the disease given that they have tested positive. Let A be the event that the patient has the disease and B be the event that they test positive. Then we want to calculate P(A|B), which is the probability that the patient has the disease given that they have tested positive.

Using Bayes’ theorem, we have:

P(A|B) = P(B|A) * P(A) / P(B)

where P(B|A) is the probability of testing positive given that the patient has the disease (which is 0.99), P(A) is the prevalence of the disease (which is 0.001), and P(B) is the probability of testing positive (which can be calculated using the law of total probability as P(B|A) * P(A) + P(B|not A) * P(not A), where not A means “does not have the disease”).

Plugging in these values, we get:

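P(A|B) = 0.99 * 0.001 / (0.99 * 0.001 + 0.01 * 0.999) = 0.00099 / 0.01098 ≈ 0.0902

So even after a positive result, there is only about a 9% chance that the patient actually has the disease, because the disease is rare.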
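The same calculation can be sketched as a small function. The name `posterior` and its parameter names are illustrative choices, not part of any library.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

p = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(disease | positive) = {p:.4f}")
```

Changing `prior` shows how strongly the answer depends on the prevalence of the disease.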

Q5: What is a confidence interval? How do you calculate a confidence interval? Explain with an example.

Solution:

A confidence interval is a range of values that is likely to contain the true value of a population parameter with a certain level of confidence.

CI = X̄ ± Z * (S / √n)

where CI is the confidence interval, X̄ is the sample mean, Z is the critical value from the standard normal distribution that corresponds to the desired level of confidence (about 1.96 for 95%), S is the standard deviation (the population standard deviation if it is known, otherwise the sample standard deviation for a large sample), and n is the sample size.



In [5]:
from scipy import stats
import math

In [4]:
# suppose you want to estimate the average height of all students in a school. 

n = 100

sam_mean = 65

sam_std = 3

sig_level = 0.05

In [6]:
z = stats.norm.ppf(1-sig_level/2)

In [7]:
z

1.959963984540054

In [15]:
def estimate(x,std,n):  # x : mean , std : sample standard deviation , n : sample size
    
    p_std = std/math.sqrt(n)
    l_bound = x - z * p_std
    u_bound = x + z * p_std
    
    return (l_bound,u_bound)

In [16]:
lower, upper = estimate(sam_mean , sam_std , n)
print(f"({lower:.3f}, {upper:.3f})")

(64.412, 65.588)

Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the 
event's probability and new evidence. Provide a sample problem and solution.

Solution:

Suppose you are a doctor and you have a patient who has tested positive for a certain disease. The test is known to be 99% accurate, meaning that if someone has the disease, there is a 99% chance that the test will be positive. However, if someone does not have the disease, there is still a 1% chance that the test will be positive. The prevalence of the disease in the population is 1 in 1000 people.

Using Bayes’ theorem, we can calculate the probability that the patient actually has the disease given that they have tested positive. Let A be the event that the patient has the disease and B be the event that they test positive. Then we want to calculate P(A|B), which is the probability that the patient has the disease given that they have tested positive.

Using Bayes’ theorem, we have:

P(A|B) = P(B|A) * P(A) / P(B)

where P(B|A) is the probability of testing positive given that the patient has the disease (which is 0.99), P(A) is the prevalence of the disease (which is 0.001), and P(B) is the probability of testing positive (which can be calculated using the law of total probability as P(B|A) * P(A) + P(B|not A) * P(not A), where not A means “does not have the disease”).

Plugging in these values, we get:

P(A|B) = 0.99 * 0.001 / (0.99 * 0.001 + 0.01 * 0.999) = 0.00099 / 0.01098 ≈ 0.0902
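Bayes' theorem can also be applied repeatedly, with each posterior serving as the prior for the next piece of evidence. The sketch below assumes, purely for illustration, that the patient takes a second test whose result is conditionally independent of the first; the helper name `update` is hypothetical.

```python
def update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """One Bayesian update: returns P(H | evidence)."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence

prior = 0.001  # prevalence: 1 in 1000
after_first = update(prior, 0.99, 0.01)

# Assumption: the second test is conditionally independent of the first
after_second = update(after_first, 0.99, 0.01)

print(f"after one positive test:  {after_first:.4f}")
print(f"after two positive tests: {after_second:.4f}")
```

Under that independence assumption, a second positive result raises the probability of disease sharply, since the prior going into the second update is no longer the rare population prevalence.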

Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation 
of 5. Interpret the results.

Solution:

In [17]:
sig_level = 0.05
sam_mean = 50
sam_std = 5
n = 100  # the question does not give a sample size; 100 is assumed here

In [18]:
z = stats.norm.ppf(1-sig_level/2)

In [19]:
z

1.959963984540054

In [20]:
def estimate(x,std,n):  # x : mean , std : sample standard deviation , n : sample size
    
    p_std = std/math.sqrt(n)
    l_bound = x - z * p_std
    u_bound = x + z * p_std
    
    return (l_bound,u_bound)

In [21]:
estimate(sam_mean , sam_std , n)

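(49.02001800772997, 50.97998199227003)

Interpretation: assuming a sample size of n = 100 (the question does not give one), we can be 95% confident that the true population mean lies between about 49.02 and 50.98.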

Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? 
Provide an example of a scenario where a larger sample size would result in a smaller margin of error

Solution:

The margin of error is the maximum expected distance between the sample statistic and the true population parameter at a given confidence level. It is equal to half the width of the entire confidence interval.

Sample size affects the margin of error in that as sample size increases, the margin of error decreases, because the standard error is proportional to 1/√n.

For example, suppose we want to estimate the proportion of people in a city who support a particular policy. We take a random sample of 100 people and find that 60% support the policy. We want to construct a 95% confidence interval for the true proportion of people who support the policy. Using the formula for a confidence interval for a proportion, we get:

95% confidence interval = 0.60 ± 1.96 * sqrt(0.60 * (1 - 0.60) / 100) = 0.60 ± 0.096

95% confidence interval ≈ [0.504, 0.696]

The margin of error for this confidence interval is equal to half the width, which is (0.696 - 0.504) / 2 = 0.096.

Suppose we now take a random sample of 500 people instead and find that 60% support the policy again. Using the same formula as before, we get:

95% confidence interval = 0.60 ± 1.96 * sqrt(0.60 * (1 - 0.60) / 500) = 0.60 ± 0.043

95% confidence interval ≈ [0.557, 0.643]

The margin of error for this confidence interval is equal to half the width, which is (0.643 - 0.557) / 2 = 0.043.

As you can see from this example, increasing the sample size from n=100 to n=500 resulted in a smaller margin of error (from about 0.096 to about 0.043).
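The relationship between sample size and margin of error can be sketched in code. The function name `margin_of_error` is an illustrative choice, and the sample sizes in the loop are arbitrary.

```python
import math
from scipy import stats

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 500, 2000):
    moe = margin_of_error(0.60, n)
    print(f"n = {n:5d}: margin of error = {moe:.3f}")
```

Because the margin shrinks in proportion to 1/√n, quadrupling the sample size only halves the margin of error.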


Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population 
standard deviation of 5. Interpret the results.

Solution:

In [22]:
pop_mean = 70
x = 75
pop_std = 5

In [23]:
z = (x - pop_mean)/pop_std

In [24]:
z

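1.0

Interpretation: a z-score of 1.0 means the data point lies exactly one standard deviation above the population mean.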
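If the population can be assumed to be approximately normal (an assumption the question does not state), the z-score can also be turned into a percentile. A minimal sketch:

```python
from scipy import stats

x, pop_mean, pop_std = 75, 70, 5
z = (x - pop_mean) / pop_std

# Assumption: the population is approximately normally distributed
share_below = stats.norm.cdf(z)
print(f"z = {z}, share of population below x = {share_below:.4f}")
```

Under normality, a z-score of 1 corresponds to roughly the 84th percentile.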

Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average 
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is 
significantly effective at a 95% confidence level using a t-test.

Solution:

In [25]:
# H0 : mean weight loss = 0 (drug is not effective)

# H1 : mean weight loss != 0 (drug is effective; two-tailed test)

In [37]:
sam_size = 50
sam_mean = 6
sam_std = 2.5

sig_level = 0.05

In [45]:
critical_value = stats.t.ppf(q = 1-sig_level/2 , df = sam_size-1)

In [46]:
critical_value

2.009575234489209

In [40]:
t = (sam_mean - 0) / (sam_std / (sam_size ** 0.5))  # test statistic against a hypothesised mean loss of 0

In [41]:
t

16.970562748477143

In [42]:
if t > critical_value or t < -critical_value:
    print("null hypothesis is rejected")
    
else:
    print("fail to reject null hypothesis")

null hypothesis is rejected
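Instead of comparing the statistic with a critical value, the same decision can be made from a p-value. The sketch below recomputes the statistic from the question's numbers and shows both the two-sided p-value and the one-sided alternative, which may be the more natural reading of "the drug is effective".

```python
from scipy import stats

sam_size, sam_mean, sam_std = 50, 6, 2.5
t_stat = (sam_mean - 0) / (sam_std / sam_size ** 0.5)
df = sam_size - 1

p_two_sided = 2 * stats.t.sf(abs(t_stat), df)  # H1: mean loss != 0
p_one_sided = stats.t.sf(t_stat, df)           # H1: mean loss > 0

print(f"t = {t_stat:.3f}")
print(f"two-sided p = {p_two_sided:.3g}, one-sided p = {p_one_sided:.3g}")
```

Both p-values are far below 0.05, so the conclusion does not depend on which alternative is chosen here.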


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% 
confidence interval for the true proportion of people who are satisfied with their job.

Solution:

In [47]:
sam_size = 500
sig_level = 0.05

In [54]:
lower_bound = 0.65 - 1.96* math.sqrt(0.65 * (1-0.65) / (sam_size))

In [55]:
lower_bound

0.608191771144905

In [56]:
upper_bound = 0.65 + 1.96* math.sqrt(0.65 * (1-0.65) / (sam_size))

In [57]:
upper_bound

0.6918082288550951

In [58]:
confidence_interval = (lower_bound , upper_bound)

In [59]:
confidence_interval

(0.608191771144905, 0.6918082288550951)
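The calculation can be wrapped in a reusable function that derives the critical value from the confidence level instead of hard-coding 1.96. The name `proportion_ci` is an illustrative choice, and the normal (Wald) approximation is assumed.

```python
import math
from scipy import stats

def proportion_ci(p_hat, n, confidence=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_ci(0.65, 500)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Passing a different `confidence` value, such as 0.99, widens the interval accordingly.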

Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. 
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a 
significant difference in student performance using a t-test with a significance level of 0.01.

Solution:

In [60]:
meanA = 85
stdA = 6

meanB = 82
stdB = 5

sig_level = 0.01

In [61]:
# the question does not give sample sizes; 50 students per group is assumed here
t = (meanA-meanB) / math.sqrt((stdA**2 / 50) + (stdB**2 / 50))

In [62]:
t

2.716072381275556

In [74]:
critical_value = stats.t.ppf(q = 1-sig_level/2 , df = 98)

In [75]:
critical_value

2.626931094814024

In [76]:
if t > critical_value or t < -critical_value:
    print("null hypothesis rejected")
    
else:
    print("Fail to reject null hypothesis")

null hypothesis rejected
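SciPy can run a two-sample t-test directly from summary statistics. The sketch below uses `stats.ttest_ind_from_stats` with the same assumed group size of 50 per sample; `equal_var=False` selects Welch's t-test, which is a slightly different choice from the fixed 98 degrees of freedom used above.

```python
from scipy import stats

# Assumption: 50 students per group (the question does not state sample sizes)
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=50,
    mean2=82, std2=5, nobs2=50,
    equal_var=False,  # Welch's t-test: does not assume equal variances
)

sig_level = 0.01
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("reject H0" if result.pvalue < sig_level else "fail to reject H0")
```

With equal group sizes the test statistic is the same as the hand calculation; only the degrees of freedom, and hence the p-value, differ slightly.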


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean 
of 65. Calculate the 90% confidence interval for the true population mean

Solution:

In [77]:
pop_mean = 60
pop_std = 8

sam_size = 50
sam_mean = 65
sig_level = 0.1

In [78]:
z = stats.norm.ppf(1-sig_level/2)

In [79]:
z

1.6448536269514722

In [80]:
lower_bound = sam_mean - z * (pop_std / math.sqrt(sam_size))

In [81]:
lower_bound

63.13906055411732

In [82]:
upper_bound = sam_mean + z * (pop_std / math.sqrt(sam_size))

In [83]:
upper_bound

66.86093944588268

In [86]:
CI = [lower_bound , upper_bound]

In [87]:
CI

[63.13906055411732, 66.86093944588268]
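As a cross-check, the same interval can be obtained from `stats.norm.interval`, which takes the confidence level as its first positional argument. This is a sketch of an alternative route, not a replacement for the step-by-step calculation.

```python
import math
from scipy import stats

pop_std, sam_size, sam_mean = 8, 50, 65
std_error = pop_std / math.sqrt(sam_size)

# First positional argument is the confidence level (0.90 for a 90% interval)
lower, upper = stats.norm.interval(0.90, loc=sam_mean, scale=std_error)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```

Note that the interval does not contain the stated population mean of 60, which suggests the sample mean of 65 is unusually high for this population.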

Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average 
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to 
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

Solution:

In [89]:
sam_size = 30
sam_mean = 0.25

sam_std = 0.05

sig_level = 0.1

In [91]:
critical_value = stats.t.ppf(q=1-sig_level/2 , df = 29)

In [92]:
critical_value

1.6991270265334972

In [93]:
t = (sam_mean - 0) / (sam_std / math.sqrt(sam_size))

In [94]:
t

27.386127875258307