Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

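The main difference between a t-test and a z-test is whether the population standard deviation (variance) is known. A z-test assumes the population variance is known and uses the standard normal distribution. A t-test estimates the variance from the sample and uses the t-distribution, whose heavier tails account for that extra uncertainty. A common rule of thumb is that the t-test matters most for small samples (fewer than about 30 observations); for large samples the t-distribution is close to normal, and the two tests give nearly the same result. For example, if a researcher wants to compare the means of two small groups (fewer than 30 observations each) with unknown variances, they would use a t-test. If they are comparing the means of two large groups (more than 30 observations each) whose population variances are known, they would use a z-test.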
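To make the distinction concrete, here is a minimal sketch with hypothetical numbers (sample mean 52, hypothesized mean 50, standard deviation 5, n = 15). The z-test treats 5 as the known population standard deviation, while the t-test treats it as a sample estimate. The helper names `z_test` and `t_test` are illustrative, not library functions.

```python
import math
from scipy import stats

def z_test(sample_mean, mu0, sigma, n):
    """One-sample z-test: the population standard deviation sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p = 2 * stats.norm.sf(abs(z))  # two-tailed p-value from the standard normal
    return z, p

def t_test(sample_mean, mu0, s, n):
    """One-sample t-test: s is the sample standard deviation (sigma unknown)."""
    t_stat = (sample_mean - mu0) / (s / math.sqrt(n))
    p = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-tailed p-value, n - 1 degrees of freedom
    return t_stat, p

# Hypothetical summary statistics
z, p_z = z_test(52, 50, 5, 15)
t_stat, p_t = t_test(52, 50, 5, 15)
print(f"z = {z:.3f}, p = {p_z:.4f}")
print(f"t = {t_stat:.3f}, p = {p_t:.4f}")
```

With the same statistic value, the t-test's heavier tails give a larger p-value. As n grows, the two p-values converge.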

Q2: Differentiate between one-tailed and two-tailed tests.

One-tailed tests are used when the researcher is interested in only one direction of the effect (for example, "the new method increases the score"), while two-tailed tests are used when the researcher is interested in a difference in either direction. In a one-tailed test, the null hypothesis is rejected only if the test statistic falls in the single tail (upper or lower) specified in advance by the alternative hypothesis, and the whole significance level α is placed in that tail. In a two-tailed test, the null hypothesis is rejected if the test statistic falls in either tail of the distribution, with α/2 in each tail.
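A short sketch of how the choice changes the critical value and the p-value for the same result. The observed z statistic of 1.8 is a hypothetical value chosen for illustration.

```python
from scipy.stats import norm

alpha = 0.05
z_obs = 1.8  # hypothetical observed z statistic

# Critical values: all of alpha in one tail vs alpha/2 in each tail
crit_one = norm.ppf(1 - alpha)      # about 1.645
crit_two = norm.ppf(1 - alpha / 2)  # about 1.960

# p-values for an upper-tailed test and a two-tailed test
p_one = norm.sf(z_obs)
p_two = 2 * norm.sf(abs(z_obs))

print(f"one-tailed: critical {crit_one:.3f}, p = {p_one:.4f}, reject = {p_one < alpha}")
print(f"two-tailed: critical {crit_two:.3f}, p = {p_two:.4f}, reject = {p_two < alpha}")
```

Here the same z = 1.8 is significant in the upper-tailed test but not in the two-tailed test, which is why the direction must be fixed before looking at the data.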

Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

A Type 1 error occurs when the null hypothesis is rejected even though it is actually true; this is a false positive. A Type 2 error occurs when the null hypothesis is not rejected even though it is false; this is a false negative. For example, in a drug trial where the null hypothesis is that the drug has no effect, a Type 1 error would be concluding that the drug works (and approving it) when it actually has no effect. A Type 2 error would be concluding that the drug has no effect (and not approving it) when it is actually effective. The probability of a Type 1 error is the significance level α; the probability of a Type 2 error is β, and 1 - β is the power of the test.
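Both error rates can be estimated by simulation. This sketch repeatedly runs a one-sample t-test of H0: mean = 0. All settings are hypothetical: normally distributed data with standard deviation 1, n = 30 per sample, 2000 simulated samples, a true mean of 0.5 when H0 is false, and a fixed random seed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed for reproducibility
alpha, n, n_sims = 0.05, 30, 2000

def rejection_rate(true_mean):
    """Share of simulated samples in which H0: mean = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_sims, n))
    p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
    return np.mean(p_values < alpha)

type1 = rejection_rate(0.0)      # H0 true: every rejection is a Type 1 error
type2 = 1 - rejection_rate(0.5)  # H0 false: every non-rejection is a Type 2 error
print(f"Estimated Type 1 error rate: {type1:.3f} (alpha = {alpha})")
print(f"Estimated Type 2 error rate: {type2:.3f}")
```

The Type 1 rate should land close to α, while the Type 2 rate depends on the effect size and sample size.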

Q4: Explain Bayes's theorem with an example.

Bayes's theorem is a mathematical formula that describes the probability of an event based on prior knowledge of conditions that might be related to the event. It is used to update the probability of a hypothesis as new evidence is obtained. Bayes's theorem can be written as P(A|B) = P(B|A) x P(A) / P(B), where A and B are events, P(A) is the prior probability of A, P(B|A) is the conditional probability of B given A, P(B) is the total (marginal) probability of B, and P(A|B) is the updated (posterior) probability of A given B.

For example, suppose a doctor knows that a particular disease affects 1% of the population, and a new patient comes in with symptoms that are common to the disease. The doctor performs a test that has a false positive rate of 5%, meaning that 5% of healthy people will test positive for the disease. The doctor can use Bayes's theorem to calculate the probability that the patient actually has the disease given a positive test result. Doing so also requires the test's sensitivity (the share of people with the disease who test positive), which this example does not state.
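A minimal sketch of that calculation. The sensitivity of 99% is a hypothetical value chosen for illustration, and the 1% prevalence is treated as the prior even though the patient has symptoms. The helper `posterior_given_positive` is illustrative, not a library function.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem."""
    # P(positive) = P(pos | disease) * P(disease) + P(pos | healthy) * P(healthy)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # P(disease | positive) = P(pos | disease) * P(disease) / P(positive)
    return sensitivity * prior / p_positive

prior = 0.01                # 1% of the population has the disease (from the example)
false_positive_rate = 0.05  # 5% of healthy people test positive (from the example)
sensitivity = 0.99          # HYPOTHETICAL: not given in the example

posterior = posterior_given_positive(prior, sensitivity, false_positive_rate)
print(f"P(disease | positive test) = {posterior:.4f}")
```

Under these assumptions, a positive result leaves only about a 1 in 6 chance of disease, because healthy people vastly outnumber sick ones and 5% of them test positive.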

Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

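A confidence interval is a range of values, computed from sample data, that is likely to contain an unknown population parameter. The confidence level is the long-run percentage of such intervals, built from repeated random samples, that would contain the true parameter value. For example, a 95% confidence interval means that if we repeated the sampling many times, about 95% of the intervals constructed this way would contain the true parameter value; any single computed interval either contains it or does not.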

To calculate a confidence interval, you need to know the following:

The sample size
The sample mean
The sample standard deviation
The confidence level
Once you have this information, you can use the following formula to calculate the confidence interval:
Confidence Interval = (Sample Mean ± (Z * Standard Error of the Mean))
Where:
Z is the Z-score for the desired confidence level
Standard Error of the Mean = (Sample Standard Deviation / Square Root of the Sample Size)
For example, let's say you want to calculate a 95% confidence interval for the mean height of all adult males in the United States. You randomly sample 100 adult males and find that their mean height is 6 feet (72 inches), with a standard deviation of 3 inches. The Z-score for a 95% confidence interval is 1.96, and the standard error of the mean is 3 / sqrt(100) = 0.3 inches. Therefore, the 95% confidence interval for the mean height of all adult males in the United States is
(72 inches ± (1.96 * 0.3 inches)) = (72 ± 0.588 inches)
(71.41 inches to 72.59 inches), or about 5.95 feet to 6.05 feet
This means that we are 95% confident that the true mean height of all adult males in the United States is between about 71.41 and 72.59 inches (5.95 to 6.05 feet).
Confidence intervals are a useful tool for estimating unknown population parameters. They allow us to quantify the uncertainty in our estimates and to make more informed decisions about our research.
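The height calculation can be sketched in Python, working in inches so the units stay consistent. The helper name `mean_confidence_interval` is hypothetical, and z = 1.96 assumes a normal-approximation 95% interval as in the text.

```python
import math

def mean_confidence_interval(mean, std_dev, n, z=1.96):
    """Normal-approximation confidence interval for a mean."""
    std_error = std_dev / math.sqrt(n)  # standard error of the mean
    margin = z * std_error
    return mean - margin, mean + margin

# Height example: 6 feet = 72 inches, standard deviation 3 inches, n = 100
lower_in, upper_in = mean_confidence_interval(mean=72, std_dev=3, n=100)
print(f"95% CI: ({lower_in:.3f}, {upper_in:.3f}) inches")
print(f"95% CI: ({lower_in / 12:.3f}, {upper_in / 12:.3f}) feet")
```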

Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.

Suppose a company manufactures a certain type of product that has a defect rate of 2%. The company uses a quality control process that is 95% effective in catching defective products, and 99% effective in passing non-defective products. If a product passes the quality control process, what is the probability that it is actually defect-free?
To solve this problem using Bayes' theorem, we first define the following events:
D: the product is defective
~D: the product is not defective
P: the product passes the quality control process
Solution
P(~D|P) = P(P|~D) * P(~D) / P(P)
P(P) = P(P|D) * P(D) + P(P|~D) * P(~D)

P(D) = 0.02
P(~D) = 0.98
P(P|D) = 0.05
P(P|~D) = 0.99

P(P) = P(P|D) * P(D) + P(P|~D) * P(~D)
     = 0.05 * 0.02 + 0.99 * 0.98
     = 0.001 + 0.9702
     = 0.9712

P(~D|P) = P(P|~D) * P(~D) / P(P)
        = 0.99 * 0.98 / 0.9712
        = 0.9702 / 0.9712
        ≈ 0.9990
Therefore, the probability that a product is not defective given that it passes the quality control process is approximately 0.9990, or 99.90%. This result indicates that if a product passes the quality control process, it is highly likely to be defect-free. However, it is important to note that no quality control process is perfect, and some defective products may still slip through.

In [3]:
# Solution for above with python

# Define the prior probabilities and conditional probabilities
p_defect = 0.02
p_pass_given_defect = 0.05
p_not_defect = 0.98
p_pass_given_not_defect = 0.99

# Calculate the total probability of passing the quality control process
p_pass = p_pass_given_defect * p_defect + p_pass_given_not_defect * p_not_defect

# Calculate the probability of the product being defect-free given it passed the quality control process
p_not_defect_given_pass = (p_pass_given_not_defect * p_not_defect) / p_pass

# Print the result
print(f"The probability of the product being defect-free given it passed the quality control process is {p_not_defect_given_pass:.4f}")

The probability of the product being defect-free given it passed the quality control process is 0.9990


Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

The sample size is not provided in the question, so I am assuming a sample size of 50.

In [5]:
import math
import scipy.stats as stats

# Define the sample mean, standard deviation, and sample size
sample_mean = 50
sample_std_dev = 5
sample_size = 50

# Significance level
alpha = 0.05

# Calculate the critical t-value for a 95% confidence interval
t_crit = stats.t.ppf(1-alpha/2, df=sample_size-1)
print(f'Critical t-value for significance level {alpha}, sample size of {sample_size} is : {t_crit:.2f}')

# Calculate the standard error
std_error = sample_std_dev / math.sqrt(sample_size)

# Calculate the lower and upper bounds of the confidence interval
lower_bound = sample_mean - t_crit * std_error
upper_bound = sample_mean + t_crit * std_error

# Print the results
print(f"The 95% confidence interval is ({lower_bound:.2f}, {upper_bound:.2f})")


Critical t-value for significance level 0.05, sample size of 50 is : 2.01
The 95% confidence interval is (48.58, 51.42)


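These results mean that, assuming n = 50, we can be 95% confident that the population mean lies between 48.58 and 51.42; that is, intervals constructed this way capture the true mean in about 95% of repeated samples.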

Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error in a confidence interval is the amount added to and subtracted from the sample statistic (such as the sample mean) to form the interval, i.e., half the interval's width. We expect the true population parameter (such as the population mean) to lie within this distance of the sample statistic at the chosen level of confidence. The margin of error depends on the sample size, the confidence level, and the standard deviation of the population.
As the sample size increases, the margin of error decreases because larger samples provide more information about the population, so sample statistics vary less from sample to sample. Specifically, the margin of error is inversely proportional to the square root of the sample size: quadrupling the sample size halves the margin of error.
For example, suppose we want to estimate the average height of all students in a university using a random sample of students. We take a sample of 50 students and calculate their average height to be 170 cm with a standard deviation of 5 cm. We want to calculate a 95% confidence interval for the true population mean height.
Using the formula for the margin of error for a confidence interval, we get:

Margin of error = z * (standard deviation / sqrt(sample size))
For a 95% confidence interval, the critical z-value is 1.96. So, plugging in the values, we get:

Margin of error = 1.96 * (5 / sqrt(50)) ≈ 1.39 cm
This means that we are 95% confident that the true population mean height is within 1.39 cm of our sample mean of 170 cm. If we had taken a larger sample size of 200 students, the margin of error would have been:

Margin of error = 1.96 * (5 / sqrt(200)) ≈ 0.69 cm
So, a larger sample size would result in a smaller margin of error, making our estimate of the true population mean height more precise.
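The two calculations, plus the square-root relationship, fit in a short sketch. The helper `margin_of_error` is hypothetical, and z = 1.96 is used as in the text.

```python
import math

def margin_of_error(std_dev, n, z=1.96):
    """z * standard error; z = 1.96 corresponds to a 95% confidence level."""
    return z * std_dev / math.sqrt(n)

for n in (50, 200):
    print(f"n = {n:3d}: margin of error = {margin_of_error(5, n):.2f} cm")

# Quadrupling the sample size (50 -> 200) halves the margin of error
ratio = margin_of_error(5, 50) / margin_of_error(5, 200)
print(f"ratio = {ratio:.2f}")
```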

Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

In [6]:
# Define the values
x = 75  # Data point value
mu = 70  # Population mean
sigma = 5  # Population standard deviation

# Calculate the z-score
z = (x - mu) / sigma

print(f"The z-score is: {z}")

The z-score is: 1.0


This means that the data point is 1 standard deviation above the population mean. A positive z-score indicates that the data point is above the mean, while a negative z-score indicates that it is below the mean. The magnitude of the z-score tells us how far away the data point is from the mean in terms of standard deviations.

In [7]:
from scipy.stats import norm

# Calculate the percentage of values below a z-score of 1
percent_below = norm.cdf(z) * 100

print(f"The percentage of values below a z-score of 1 is: {percent_below:.2f}%")

The percentage of values below a z-score of 1 is: 84.13%


For a normally distributed variable, a z-score of 1 means that approximately 84.13% of values lie below this data point.

Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

In [8]:
import numpy as np
from scipy.stats import t

# Null hypothesis H0: the true mean weight loss is 0 (the drug has no effect)
# Alternative hypothesis Ha: the true mean weight loss is not 0 (two-tailed test)
alpha = 0.05  # significance level
null_hypothesis = "the drug is not significantly effective"
alternative_hypothesis = "the drug is significantly effective"

mu = 0  # mean weight loss expected under H0
sample_mean = 6
sample_std = 2.5
n = 50
df = n - 1  # degrees of freedom

# Calculate the t-score and p-value
t_score = (sample_mean - mu) / (sample_std / np.sqrt(n))
# Note: 1 - t.cdf(...) underflows to 0.0 for a t-score this large; the true p-value is tiny but positive
p_value = 2 * (1 - t.cdf(abs(t_score), df))

# Compare p-value with alpha and make a conclusion
print(f"t-statistic: {t_score:.4f}")
print(f"p-value: {p_value}")
if p_value < alpha:
    print(f"Reject the null hypothesis. {alternative_hypothesis}.")
else:
    print(f"Fail to reject the null hypothesis. {null_hypothesis}.")

t-statistic: 16.9706
p-value: 0.0
Reject the null hypothesis. the drug is significantly effective.
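Since the question asks whether the drug is effective, a one-tailed alternative (mean weight loss > 0) arguably fits better than the two-tailed test above. Here is a sketch using `scipy.stats.t.sf`, which computes the upper-tail probability directly and so avoids the 0.0 that `1 - t.cdf(...)` produces here. Like the cell above, it assumes that a mean loss of 0 is the no-effect baseline (no control group is described).

```python
import math
from scipy.stats import t

sample_mean, sample_std, n = 6, 2.5, 50
mu0 = 0  # H0: mean weight loss is 0 (no effect)
df = n - 1

t_score = (sample_mean - mu0) / (sample_std / math.sqrt(n))

# One-tailed alternative Ha: mean weight loss > 0.
# t.sf gives the upper-tail probability without the underflow of 1 - t.cdf(...)
p_one_tailed = t.sf(t_score, df)
p_two_tailed = 2 * t.sf(abs(t_score), df)
print(f"t = {t_score:.4f}, one-tailed p = {p_one_tailed:.3e}, two-tailed p = {p_two_tailed:.3e}")
```

Either way the p-value is far below 0.05, so the conclusion does not change; only the reported p-value becomes meaningful.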


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

In [9]:
import math

p = 0.65  # sample proportion
n = 500  # sample size
z_alpha = 1.96  # z-score for 95% confidence level

# Calculate the standard error
se = math.sqrt((p * (1 - p)) / n)

# Calculate the margin of error
me = z_alpha * se

# Calculate the confidence interval
lower_bound = p - me
upper_bound = p + me

# Print the results
print(f"The 95% confidence interval for the proportion of people who are satisfied with their job is ({lower_bound*100:.2f}%, {upper_bound*100:.2f}%)")

The 95% confidence interval for the proportion of people who are satisfied with their job is (60.82%, 69.18%)


Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

In [10]:
import numpy as np
from scipy.stats import t

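# Sample sizes are not given in the question; n = 30 per group is assumed here
# Sample A
x1 = 85  # mean score
s1 = 6   # standard deviation
n1 = 30  # assumed sample size

# Sample B
x2 = 82  # mean score
s2 = 5   # standard deviation
n2 = 30  # assumed sample size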

# Set up null and alternative hypotheses
# H0: mu1 = mu2 (the means are equal)
# Ha: mu1 != mu2 (the means are not equal)
alpha = 0.01
null_hypothesis = "mu1 = mu2 (the means are equal)"
alternative_hypothesis = "mu1 != mu2 (the means are not equal)"

# Calculate pooled standard deviation
sp = np.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

# Calculate t-statistic
t_stat = (x1 - x2) / (sp * np.sqrt(1 / n1 + 1 / n2))

# Calculate degrees of freedom
df = n1 + n2 - 2

# Calculate critical t-value
t_crit = t.ppf(1-alpha / 2, df)

# Calculate p-value
p_value = 2 * (1 - t.cdf(abs(t_stat), df))

# Print results
print(f"t-statistic: {t_stat:.3f}")
print(f"Degrees of freedom: {df}")
print(f"Critical t-value: {t_crit:.3f}")
print(f"p-value: {p_value:.3f}")

if abs(t_stat) > t_crit:
    print("Reject null hypothesis.")
    print(alternative_hypothesis)
else:
    print("Fail to reject null hypothesis.")
    print(null_hypothesis)

t-statistic: 2.104
Degrees of freedom: 58
Critical t-value: 2.663
p-value: 0.040
Fail to reject null hypothesis.
mu1 = mu2 (the means are equal)
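As a cross-check on the manual calculation, SciPy's `ttest_ind_from_stats` computes the same test directly from summary statistics. This sketch keeps the assumed n = 30 per group. It also shows Welch's version (`equal_var=False`), which does not assume equal variances. With equal group sizes, the Welch t-statistic matches the pooled one; only the degrees of freedom and p-value differ slightly.

```python
from scipy.stats import ttest_ind_from_stats

# n = 30 per group is an assumption (the question does not give sample sizes)
pooled = ttest_ind_from_stats(mean1=85, std1=6, nobs1=30,
                              mean2=82, std2=5, nobs2=30, equal_var=True)
welch = ttest_ind_from_stats(mean1=85, std1=6, nobs1=30,
                             mean2=82, std2=5, nobs2=30, equal_var=False)
print(f"Pooled: t = {pooled.statistic:.3f}, p = {pooled.pvalue:.3f}")
print(f"Welch:  t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
```

Both p-values are above 0.01, so neither version rejects the null hypothesis at that level.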


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

In [11]:
import math
from scipy.stats import norm

pop_mean = 60   # not needed to build the interval itself
pop_std = 8     # population standard deviation is known, so a z-interval is appropriate
n = 50
sample_mean = 65
confidence_level = 0.90

# Find the critical z-value
z_crit = norm.ppf((1 + confidence_level) / 2)
print(f'Critical z-value for {confidence_level*100:.0f}% confidence is : {z_crit:.4f}')

# Calculate the margin of error
margin_of_error = z_crit * (pop_std / math.sqrt(n))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Print the results
print(f"The {confidence_level*100:.0f}% confidence interval for the true population mean is ({lower_bound:.3f}, {upper_bound:.3f})")


Critical z-value for 90% confidence is : 1.6449
The 90% confidence interval for the true population mean is (63.139, 66.861)


Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

The population mean is not provided in this question.
Typical human reaction times are often cited as roughly 200 ms to 300 ms.
Here I have assumed a population mean of 300 ms (0.3 seconds).

In [13]:
import math
import scipy.stats as stats

# Null Hypothesis: Caffeine has no significant effect on reaction time
# Alternative Hypothesis: Caffeine has a significant effect on reaction time
null_hypothesis = "Caffeine has no significant effect on reaction time"
alternative_hypothesis = "Caffeine has a significant effect on reaction time"

sample_mean = 0.25  # sample mean
sample_std_dev = 0.05  # sample standard deviation
n = 30  # sample size
pop_mean = 0.3  # population mean under null hypothesis
alpha = 0.1  # significance level

# Calculate the t-statistic
t_stat = (sample_mean - pop_mean) / (sample_std_dev / math.sqrt(n))

# Calculate the critical t-value
t_crit = stats.t.ppf(1-alpha/2, n-1)

# Calculate the confidence interval
margin_of_error = t_crit * (sample_std_dev / math.sqrt(n))
lower_ci = sample_mean - margin_of_error
upper_ci = sample_mean + margin_of_error

# Print the results
print(f"t-statistic: {t_stat:.3f}")
print(f"t-critical value: {t_crit:.3f}")
print(f"90% Confidence Interval: ({lower_ci:.3f}, {upper_ci:.3f})")

# Determine if the null hypothesis should be rejected or not
if abs(t_stat) > t_crit:
    print("Reject the null hypothesis")
    print(alternative_hypothesis)
else:
    print("Fail to reject the null hypothesis")
    print(null_hypothesis)

t-statistic: -5.477
t-critical value: 1.699
90% Confidence Interval: (0.234, 0.266)
Reject the null hypothesis
Caffeine has a significant effect on reaction time
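The cell above decides by comparing |t| with the critical value. An equivalent check computes the two-tailed p-value with `scipy.stats.t.sf`. This sketch reuses the same numbers, including the assumed 0.3 s population mean, so the conclusion is only as reliable as that assumption.

```python
import math
from scipy.stats import t

sample_mean, sample_std_dev, n = 0.25, 0.05, 30
pop_mean = 0.3  # ASSUMED population mean (not given in the question)
alpha = 0.1     # 90% confidence level

t_stat = (sample_mean - pop_mean) / (sample_std_dev / math.sqrt(n))
p_value = 2 * t.sf(abs(t_stat), df=n - 1)  # two-tailed p-value
print(f"t = {t_stat:.3f}, p = {p_value:.2e}, reject H0 = {p_value < alpha}")
```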
