**Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.**

- **t-test:** Used to compare means when the population standard deviation is unknown and must be estimated from the sample. This matters most when the sample size is small (typically fewer than 30). It assumes the data are approximately normally distributed.

  **Example:** You could use a t-test to compare the mean exam scores of students who attended two different study programs (Program A vs. Program B) with 25 students in each group.

- **z-test:** Used to compare means when the population standard deviation is known, or when the sample is large enough (typically more than 30) that the sample standard deviation is a close approximation to it. It relies on the sampling distribution of the mean being approximately normal.

  **Example:** You could use a z-test to compare the mean height of male and female students in a large university (e.g., a sample of 100 male students and 100 female students).
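One way to see where the sample-size rule of thumb comes from is to compare critical values: as the degrees of freedom grow, the t critical value approaches the z critical value. The snippet below is a minimal sketch using SciPy; the degrees of freedom listed are illustrative assumptions, not values taken from the scenarios above.

```python
from scipy import stats

# Two-tailed 95% critical value from the standard normal distribution
z_crit = stats.norm.ppf(0.975)

# Matching t critical values for a few illustrative degrees of freedom
t_crits = {df: stats.t.ppf(0.975, df) for df in (5, 24, 99, 999)}

for df, t_crit in t_crits.items():
    print(f"df={df:>4}: t critical = {t_crit:.4f} (z critical = {z_crit:.4f})")
```

The t critical value is always larger than the z critical value, which reflects the extra uncertainty from estimating the standard deviation; the gap becomes negligible for large samples.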

**Q2: Differentiate between one-tailed and two-tailed tests.**

- **One-tailed test:** A hypothesis test where the critical region (rejection region) is on only one side of the sampling distribution. It tests if a parameter is significantly greater than or less than a specified value.

  **Example:** Testing if a new drug increases reaction time compared to a control (one-tailed: greater than).

- **Two-tailed test:** A hypothesis test where the critical region is on both sides of the sampling distribution. It tests if a parameter is significantly different from a specified value, without specifying in which direction.

  **Example:** Testing if a new teaching method changes student performance (two-tailed: different from).
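The practical consequence of the choice shows up in the p-value: for the same test statistic, the two-tailed p-value is twice the one-tailed one. A minimal sketch using only the standard library, where the z statistic of 1.80 is a hypothetical value chosen for illustration:

```python
from statistics import NormalDist

z = 1.80          # hypothetical test statistic, for illustration only
alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed (upper tail): probability of a value at least this large
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed: probability of a value at least this extreme in either direction
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

With this statistic the one-tailed test rejects at the 5% level while the two-tailed test does not, which is why the direction of the hypothesis should be fixed before looking at the data.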

**Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.**

- **Type 1 error (α):** Rejecting the null hypothesis when it is actually true, i.e. a false positive. The probability of this error is α, the significance level chosen for the test.

  **Example:** Concluding a new drug is effective (rejecting null) when it actually has no effect.

- **Type 2 error (β):** Failing to reject the null hypothesis when it is actually false, i.e. a false negative. The probability of this error is β, and 1 − β is the power of the test.

  **Example:** Concluding a new drug has no effect (failing to reject null) when it actually is effective.
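The two error rates can be computed directly for a simple one-sided z-test. The sketch below assumes hypothetical values (null mean 0, true mean 0.5, known standard deviation 1, sample size 25) purely for illustration:

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05                  # chosen Type 1 error rate
mu_null, mu_alt = 0.0, 0.5    # hypothetical null and true means
sigma, n = 1.0, 25            # hypothetical known std dev and sample size
se = sigma / sqrt(n)          # standard error of the sample mean

# Reject H0 when the sample mean exceeds this threshold (upper-tailed test)
threshold = NormalDist(mu_null, se).inv_cdf(1 - alpha)

# Type 1 error rate: chance of exceeding the threshold when H0 is true
type1_rate = 1 - NormalDist(mu_null, se).cdf(threshold)

# Type 2 error rate: chance of staying below the threshold when H1 is true
type2_rate = NormalDist(mu_alt, se).cdf(threshold)

print(f"Type 1 rate = {type1_rate:.3f}")
print(f"Type 2 rate = {type2_rate:.3f}, power = {1 - type2_rate:.3f}")
```

Lowering alpha moves the threshold up, which reduces Type 1 errors but increases Type 2 errors for a fixed sample size.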

**Q4: Explain Bayes's theorem with an example.**

Bayes's theorem describes the probability of an event based on prior knowledge of conditions related to the event.

- **Formula:** \( P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \)

  - \( P(A|B) \): Probability of event A given that B is true.
  - \( P(B|A) \): Probability of event B given that A is true.
  - \( P(A) \) and \( P(B) \): Marginal probabilities of A and B.

**Example:**
  
  Suppose a test for a rare disease has 99% sensitivity (it returns a positive result for 99% of people who have the disease) and a 1% false positive rate, and the disease prevalence is 0.1%. What is the probability that a person has the disease if they test positive?

  - \( P(Disease) = 0.001 \)
  - \( P(Positive|Disease) = 0.99 \)
  - \( P(Positive|\neg Disease) = 0.01 \)
  
  Calculate \( P(Disease|Positive) \):

  \( P(Positive) = P(Positive|Disease) \cdot P(Disease) + P(Positive|\neg Disease) \cdot P(\neg Disease) \)

  \( P(Positive) = 0.99 \cdot 0.001 + 0.01 \cdot 0.999 = 0.00099 + 0.00999 = 0.01098 \)

  \( P(Disease|Positive) = \frac{P(Positive|Disease) \cdot P(Disease)}{P(Positive)} = \frac{0.99 \cdot 0.001}{0.01098} \approx 0.0902 \)

  So, the probability of having the disease given a positive test result is approximately 9.02%.
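The arithmetic in this example can be recomputed with a small helper. This is a minimal sketch; the function name `posterior_given_positive` and its parameter names are assumptions introduced for illustration.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(Disease | Positive) using Bayes's theorem."""
    # Law of total probability: overall chance of a positive result
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Values from the example: 0.1% prevalence, 99% sensitivity, 1% false positives
posterior = posterior_given_positive(
    prior=0.001, sensitivity=0.99, false_positive_rate=0.01
)
print(f"P(Disease | Positive) = {posterior:.4f}")
```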

**Q5: What is a confidence interval? How is a confidence interval calculated? Explain with an example.**

- **Confidence interval:** A range of values calculated from sample data that is likely to contain the true population parameter with a specified level of confidence (e.g., 95%, 99%).

- **Calculation:** For a population mean \( \mu \) with known population standard deviation \( \sigma \):

  \( \text{Confidence Interval} = \bar{x} \pm Z \cdot \frac{\sigma}{\sqrt{n}} \)

  Where \( \bar{x} \) is the sample mean, \( Z \) is the critical value from the standard normal distribution corresponding to the desired confidence level, \( \sigma \) is the population standard deviation, and \( n \) is the sample size.

**Example:**

  Calculate the 95% confidence interval for a sample with mean \( \bar{x} = 50 \), known population standard deviation \( \sigma = 5 \), and sample size \( n = 100 \).

  - Critical value \( Z \) for 95% confidence is approximately 1.96 (from standard normal distribution).

  \( \text{Confidence Interval} = 50 \pm 1.96 \cdot \frac{5}{\sqrt{100}} \)

  \( \text{Confidence Interval} = 50 \pm 1.96 \cdot 0.5 \)

  \( \text{Confidence Interval} = 50 \pm 0.98 \)

  \( \text{Confidence Interval} = (49.02, 50.98) \)

  Interpretation: We are 95% confident that the true population mean lies between 49.02 and 50.98.
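The same calculation can be written as a short function. This is a minimal sketch using only the standard library; the function name `z_confidence_interval` is an assumption introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population std dev is known."""
    # Two-tailed critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(sample_mean=50, sigma=5, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```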

**Q6: Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.**

Let's use a classic example involving a diagnostic test:

- **Problem:** Suppose a diagnostic test is 99% accurate for detecting a disease when it is present (sensitivity), and it correctly identifies absence of disease 90% of the time (specificity). The disease prevalence in the population is 0.5%. If a person tests positive, what is the probability that they actually have the disease?

- **Given:**
  - Disease prevalence (\( P(Disease) \)) = 0.005 (0.5%)
  - Sensitivity (\( P(Positive|Disease) \)) = 0.99 (99% accurate for positive tests)
  - Specificity (\( P(Negative|\neg Disease) \)) = 0.90 (90% accurate for negative tests)
  
- **Calculate:**
  - \( P(Positive) \): Probability of testing positive (marginal probability).

    \( P(Positive) = P(Positive|Disease) \cdot P(Disease) + P(Positive|\neg Disease) \cdot P(\neg Disease) \)
    
    \( P(Positive) = 0.99 \cdot 0.005 + (1 - 0.90) \cdot (1 - 0.005) \)
    
    \( P(Positive) = 0.00495 + 0.0995 = 0.10445 \)

  - \( P(Disease|Positive) \): Probability of having the disease given a positive test result.

    \( P(Disease|Positive) = \frac{P(Positive|Disease) \cdot P(Disease)}{P(Positive)} \)
    
    \( P(Disease|Positive) = \frac{0.99 \cdot 0.005}{0.10445} \)
    
    \( P(Disease|Positive) \approx 0.0474 \)

  So, the probability that a person has the disease given a positive test result is approximately 4.74%. The figure is low despite the high sensitivity because false positives from the large healthy population far outnumber true positives.
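A useful cross-check is to count people instead of multiplying probabilities. The sketch below applies the given prevalence, sensitivity, and specificity to a hypothetical population of 100,000 (the population size is an assumption and does not affect the result).

```python
population = 100_000  # hypothetical population size, chosen for readability
prevalence, sensitivity, specificity = 0.005, 0.99, 0.90

diseased = population * prevalence               # people with the disease
healthy = population - diseased                  # people without it

true_positives = diseased * sensitivity          # diseased and test positive
false_positives = healthy * (1 - specificity)    # healthy but test positive

# Share of all positive results that come from people with the disease
ppv = true_positives / (true_positives + false_positives)

print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"P(Disease | Positive) = {ppv:.4f}")
```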

**Q7: Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.**

- **Calculation:**

  \( \text{Confidence Interval} = \bar{x} \pm Z \cdot \frac{s}{\sqrt{n}} \)

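  Where \( \bar{x} = 50 \) and \( s = 5 \). The sample size \( n \) is not given. Using \( Z \) with the sample standard deviation \( s \) assumes \( n \) is large (roughly 30 or more); for smaller samples the t critical value should be used instead.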

  - Critical value \( Z \) for 95% confidence is approximately 1.96.

  \( \text{Confidence Interval} = 50 \pm 1.96 \cdot \frac{5}{\sqrt{n}} \)

  Without \( n \), we can only provide a general interval: \( (50 - 1.96 \cdot \frac{5}{\sqrt{n}}, 50 + 1.96 \cdot \frac{5}{\sqrt{n}}) \)

  Interpretation: We are 95% confident that the true population mean lies within this interval.
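Because the interval depends on the missing sample size, it can help to evaluate it for a few candidate values and to compare the z-based interval with the t-based one. This is a minimal sketch; the sample sizes 10, 30, and 100 are hypothetical, since the question does not state one.

```python
from math import sqrt
from scipy import stats

x_bar, s, confidence = 50, 5, 0.95
z_crit = stats.norm.ppf((1 + confidence) / 2)

intervals = {}
for n in (10, 30, 100):  # hypothetical sample sizes
    se = s / sqrt(n)                                       # standard error
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)   # t critical value
    intervals[n] = {
        "z": (x_bar - z_crit * se, x_bar + z_crit * se),
        "t": (x_bar - t_crit * se, x_bar + t_crit * se),
    }
    z_lo, z_hi = intervals[n]["z"]
    t_lo, t_hi = intervals[n]["t"]
    print(f"n={n:>3}: z CI=({z_lo:.2f}, {z_hi:.2f})  t CI=({t_lo:.2f}, {t_hi:.2f})")
```

The t-based interval is wider at every sample size, and the difference shrinks as n grows.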

**Q8: What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.**

- **Margin of Error:** The margin of error (MoE) is the amount added to and subtracted from the sample estimate to achieve the desired confidence level. It depends on the critical value (from the z or t distribution), standard deviation, and sample size.

- **Effect of Sample Size:** A larger sample size decreases the standard error (SE), which in turn reduces the margin of error. Because \( SE = \frac{\sigma}{\sqrt{n}} \), the margin of error shrinks in proportion to \( \frac{1}{\sqrt{n}} \): quadrupling the sample size halves it.

  **Example:**
  
  Suppose you want to estimate the average height of students in a school with 95% confidence. A smaller sample might have a MoE of ±3 inches, while a larger sample might reduce it to ±1 inch due to increased precision in estimating the population mean.
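The relationship can be inverted to ask how many observations are needed for a target margin of error. The sketch below assumes a hypothetical population standard deviation of 6 inches, which is not given in the example; the function name `required_sample_size` is also an assumption.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, target_moe, confidence=0.95):
    """Smallest n whose z-based margin of error is at most target_moe."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # Solve z * sigma / sqrt(n) <= target_moe for n, then round up
    return ceil((z * sigma / target_moe) ** 2)

sigma = 6  # hypothetical std dev of heights in inches (assumed)
n_for_3_inches = required_sample_size(sigma, target_moe=3)
n_for_1_inch = required_sample_size(sigma, target_moe=1)

print(f"MoE of 3 inches needs n = {n_for_3_inches}")
print(f"MoE of 1 inch needs n = {n_for_1_inch}")
```

Cutting the margin of error to a third requires roughly nine times as many observations, which follows from the square root in the standard error.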

**Q9: Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.**

- **Calculation:**

  \( z = \frac{x - \mu}{\sigma} = \frac{75 - 70}{5} = 1 \)

  The z-score is 1.

  Interpretation: The data point of 75 is 1 standard deviation above the mean of 70.
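If the population is additionally assumed to be normally distributed (an assumption the question does not state), the z-score can be translated into a percentile. A minimal sketch using the standard library:

```python
from statistics import NormalDist

x, mu, sigma = 75, 70, 5

# Standardize: number of standard deviations above the mean
z_score = (x - mu) / sigma

# Fraction of a normal population falling at or below x (assumes normality)
percentile = NormalDist(mu, sigma).cdf(x)

print(f"z-score = {z_score}")
print(f"Share of values at or below {x}: {percentile:.4f}")
```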

**Q10: In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.**

In [2]:
import numpy as np
from scipy import stats

# Given data
sample_mean = 6
sample_std_dev = 2.5
sample_size = 50
population_mean = 0  # Null hypothesis: drug has no effect
confidence_level = 0.95

# Calculate t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std_dev / np.sqrt(sample_size))

# Calculate degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate critical value (two-tailed test)
critical_value = stats.t.ppf(1 - (1 - confidence_level) / 2, degrees_of_freedom)

# Compare t-statistic with critical value
if np.abs(t_statistic) > critical_value:
    print("Reject null hypothesis: The drug is significantly effective.")
else:
    print("Fail to reject null hypothesis: There is insufficient evidence to conclude effectiveness.")


Reject null hypothesis: The drug is significantly effective.
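Since the question asks whether the drug is effective, meaning it produces weight loss greater than zero, a one-tailed p-value is a natural complement to the critical-value comparison. This is a sketch under that assumption about the direction of the hypothesis; it reuses the figures from the question.

```python
import numpy as np
from scipy import stats

sample_mean, sample_std_dev, sample_size = 6, 2.5, 50
null_mean = 0   # H0: mean weight loss is zero
alpha = 0.05

# Same t-statistic as in the cell above
t_statistic = (sample_mean - null_mean) / (sample_std_dev / np.sqrt(sample_size))

# Upper-tail p-value: chance of a t-statistic at least this large under H0
p_value = stats.t.sf(t_statistic, df=sample_size - 1)

print(f"t = {t_statistic:.2f}, one-tailed p = {p_value:.3g}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```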


**Q11: In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.**

In [3]:
import numpy as np
from scipy import stats

# Given data
sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95

# Calculate standard error
std_error = np.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate margin of error
margin_of_error = stats.norm.ppf((1 + confidence_level) / 2) * std_error

# Calculate confidence interval
lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

print(f"Confidence interval: ({lower_bound:.4f}, {upper_bound:.4f})")


Confidence interval: (0.6082, 0.6918)
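The interval above is the normal-approximation (Wald) interval. As a point of comparison, the Wilson score interval is a common alternative that behaves better for small samples or proportions near 0 or 1. The sketch below is an addition, not part of the original solution, and the function name `wilson_interval` is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    denom = 1 + z**2 / n
    # The center is pulled slightly toward 0.5 relative to p_hat
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

wilson_lower, wilson_upper = wilson_interval(p_hat=0.65, n=500)
print(f"Wilson interval: ({wilson_lower:.4f}, {wilson_upper:.4f})")
```

With 500 respondents the two intervals nearly coincide, so the normal approximation is adequate here.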


**Q12: A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.**

In [4]:
import numpy as np
from scipy import stats

# Given data
mean_A = 85
std_dev_A = 6
n_A = 30  # Sample size for A (not given in the problem; assumed here)
mean_B = 82
std_dev_B = 5
n_B = 40  # Sample size for B (not given in the problem; assumed here)
alpha = 0.01

# Calculate pooled standard deviation (assuming equal variances)
pooled_std_dev = np.sqrt(((n_A - 1) * std_dev_A**2 + (n_B - 1) * std_dev_B**2) / (n_A + n_B - 2))

# Calculate t-statistic
t_statistic = (mean_A - mean_B) / (pooled_std_dev * np.sqrt(1/n_A + 1/n_B))

# Calculate degrees of freedom
degrees_of_freedom = n_A + n_B - 2

# Calculate critical value (two-tailed test)
critical_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

# Compare t-statistic with critical value
if np.abs(t_statistic) > critical_value:
    print("Reject null hypothesis: There is a significant difference in performance between the teaching methods.")
else:
    print("Fail to reject null hypothesis: There is insufficient evidence of a significant difference.")


Fail to reject null hypothesis: There is insufficient evidence of a significant difference.
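The pooled test assumes both groups share the same variance. If that assumption is in doubt, Welch's t-test is a common alternative, and SciPy can run it directly from summary statistics. The sketch below reuses the sample sizes of 30 and 40 from the cell above, which are themselves assumptions because the question does not give them.

```python
from scipy import stats

alpha = 0.01

# Welch's t-test from summary statistics (equal_var=False drops the
# equal-variance assumption). Sample sizes 30 and 40 are assumed.
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=40,
    equal_var=False,
)

print(f"t = {result.statistic:.3f}, two-tailed p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

Under these assumed sample sizes the conclusion matches the pooled test: the difference is not significant at the 0.01 level.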


**Q13: A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.**

In [5]:
import numpy as np
from scipy import stats

# Given data
population_mean = 60  # Stated in the problem; not needed to compute the interval
population_std_dev = 8
sample_mean = 65
sample_size = 50
confidence_level = 0.90

# Calculate standard error
std_error = population_std_dev / np.sqrt(sample_size)

# Calculate margin of error
margin_of_error = stats.norm.ppf((1 + confidence_level) / 2) * std_error

# Calculate confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"Confidence interval: ({lower_bound:.2f}, {upper_bound:.2f})")


Confidence interval: (63.14, 66.86)


**Q14: In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.**

In [1]:
import numpy as np
from scipy import stats

# Given data
sample_mean = 0.25
sample_std_dev = 0.05
sample_size = 30
population_mean = 0  # Placeholder null value: the problem gives no baseline reaction time; substitute the no-caffeine mean when known
confidence_level = 0.90

# Calculate t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std_dev / np.sqrt(sample_size))

# Calculate degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate critical value (two-tailed test)
critical_value = stats.t.ppf(1 - (1 - confidence_level) / 2, degrees_of_freedom)

# Compare t-statistic with critical value
if np.abs(t_statistic) > critical_value:
    print("Reject null hypothesis: Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject null hypothesis: There is insufficient evidence to conclude that caffeine has an effect.")


Reject null hypothesis: Caffeine has a significant effect on reaction time.
