#### Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

Ans.

**t-test:**  
- Use Case: Used when the population standard deviation (σ) is unknown, which is the usual situation with small samples (n < 30).  
- Distribution: The test statistic follows a t-distribution with n − 1 degrees of freedom. Its heavier tails account for the extra uncertainty from estimating σ, and it approaches the normal distribution as n grows.  
- Standard Deviation: Uses the sample standard deviation (s) as an estimate of σ.  
- Applicability: More suitable when dealing with small sample sizes or unknown variance.

**z-test:**  
- Use Case: Used when the population standard deviation (σ) is known and the sample size is large (n ≥ 30).  
- Distribution: The test statistic follows the standard normal (z) distribution. For large samples, the Central Limit Theorem justifies this even when the underlying data are not normal.  
- Standard Deviation: Uses the known population standard deviation (σ).  
- Applicability: More suitable when dealing with large sample sizes and known variance.

**Example Scenario:**  
1. t-Test Example (Small Sample, Unknown σ)  
Scenario: A pharmaceutical company wants to test whether a new drug reduces blood pressure. They randomly select 20 patients and measure their blood pressure before and after taking the drug.  
- Here, the sample size is small (n=20), and the population standard deviation (σ) is unknown.
- A paired t-test would be used to compare pre- and post-treatment blood pressure.

2. z-Test Example (Large Sample, Known σ)  
Scenario: A shoe manufacturer wants to verify if the average shoe size of men has changed from the known industry standard of size 10. They collect a random sample of 200 men's shoe sizes from different stores.  
- The sample size is large (n=200), and the population standard deviation (σ) is known from historical data.
- A one-sample z-test would be used to compare the sample mean shoe size to the population mean.
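
The two scenarios can be sketched in code. All numbers below are made up for illustration: the eight blood-pressure pairs (the scenario has 20 patients), the sample mean of 10.1, and σ = 1.5 are assumptions, not data from the scenarios.

```python
import numpy as np
import scipy.stats as st

# Paired t-test: hypothetical before/after blood pressure for 8 patients
before = np.array([150, 142, 160, 155, 148, 152, 158, 145])
after = np.array([144, 140, 151, 150, 147, 145, 150, 143])
t_stat, t_p = st.ttest_rel(before, after)

# One-sample z-test: assumed sample mean 10.1, known sigma 1.5, n = 200
sample_mean, mu_0, sigma, n = 10.1, 10.0, 1.5, 200
z_stat = (sample_mean - mu_0) / (sigma / np.sqrt(n))
z_p = 2 * st.norm.sf(abs(z_stat))  # two-tailed p-value

print(f"Paired t-test: t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"z-test:        z = {z_stat:.3f}, p = {z_p:.4f}")
```

The t-test estimates the spread from the paired differences themselves, while the z-test plugs in the known σ directly.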

---

#### Q2: Differentiate between one-tailed and two-tailed tests.

Ans.

One-Tailed Test (Directional Test)  
- A one-tailed test checks for an effect in only one direction (either greater or smaller).
- The alternative hypothesis (H1) states that the population parameter is greater than the hypothesized value, or that it is less than it, but not both.
- The rejection region lies entirely in one tail of the distribution and contains the full significance level α.

Two-Tailed Test (Non-Directional Test)  
- A two-tailed test checks for an effect in either direction (greater or smaller).
- The alternative hypothesis (H1) states that the population parameter differs from the hypothesized value, but does not specify the direction.
- The rejection region is split between the two tails of the distribution, with α/2 in each.
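
As a rough illustration of how the choice of tails changes the p-value, the sketch below evaluates one test statistic three ways. The values `t_stat = 2.0` and `df = 29` are hypothetical.

```python
import scipy.stats as st

t_stat, df = 2.0, 29  # hypothetical test statistic and degrees of freedom

p_right = st.t.sf(t_stat, df)          # H1: parameter > hypothesized value
p_left = st.t.cdf(t_stat, df)          # H1: parameter < hypothesized value
p_two = 2 * st.t.sf(abs(t_stat), df)   # H1: parameter != hypothesized value

print(f"Right-tailed p = {p_right:.4f}")
print(f"Left-tailed p  = {p_left:.4f}")
print(f"Two-tailed p   = {p_two:.4f}")
```

With these numbers the right-tailed test is significant at α = 0.05 while the two-tailed test is not, which is why the direction must be chosen before looking at the data.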

---

#### Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

Ans.

1. Type 1 Error (False Positive)  
- Definition: Rejecting the null hypothesis (H0) when it is actually true.
- Cause: A random sample might show a significant effect purely by chance.
- Symbol: α (Significance level) – the probability of making a Type 1 error.
- Example Scenario:
  - A company tests a new drug for reducing cholesterol and sets up the following hypotheses: 
    - H0: The drug has no effect on cholesterol levels.
    - H1: The drug lowers cholesterol levels.  

If the test incorrectly concludes that the drug works when it actually does not, this is a Type 1 error. This could lead to the approval of an ineffective drug, potentially wasting resources and harming patients.

2. Type 2 Error (False Negative)  
- Definition: Failing to reject the null hypothesis (H0) when it is actually false.
- Cause: The test might lack enough power (e.g., small sample size) to detect a real effect.
- Symbol: β – the probability of making a Type 2 error.
- Example Scenario:
  - A quality control team at a manufacturing plant checks if a machine produces defective products:
    - H0: The machine produces acceptable products.
    - H1: The machine produces defective products.  

If the test fails to detect defects when the machine is actually faulty, this is a Type 2 error. This could lead to defective products being sold to customers, causing reputational damage and financial loss.
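
Both error rates can be estimated by simulation. The sketch below is a minimal Monte Carlo check under assumed settings (normal data, n = 30, a true effect of 0.3 standard deviations, 5000 repetitions); none of these values come from the scenarios above.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
alpha, n, n_sims = 0.05, 30, 5000

# H0 is true (population mean is 0): every rejection is a Type 1 error
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
p_null = st.ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# H0 is false (population mean is 0.3): every non-rejection is a Type 2 error
alt_samples = rng.normal(loc=0.3, scale=1.0, size=(n_sims, n))
p_alt = st.ttest_1samp(alt_samples, popmean=0.0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)

print(f"Estimated Type 1 error rate: {type1_rate:.3f}")
print(f"Estimated Type 2 error rate: {type2_rate:.3f}")
```

The Type 1 rate should land close to α, while the Type 2 rate depends on the effect size and the sample size; increasing `n` lowers it.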

---

#### Q4: Explain Bayes's theorem with an example.

Ans.

Bayes's Theorem: Concept & Example:  
- Bayes’s theorem is a fundamental concept in probability theory that describes how to update our beliefs (probabilities) in light of new evidence. It is widely used in fields like machine learning, statistics, medical diagnosis, and spam filtering.

Bayes’s Theorem Formula:  
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where,
- $P(A \mid B)$ = Posterior probability (Probability of event A given that B has occurred)  
- $P(B \mid A)$ = Likelihood (Probability of observing B given that A is true)  
- $P(A)$ = Prior probability (Initial probability of A, before observing B)  
- $P(B)$ = Marginal probability (Total probability of event B occurring)  
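
A small worked example, using a spam filter with made-up numbers: assume 20% of email is spam, the word "offer" appears in 60% of spam and in 5% of legitimate mail. All three probabilities are illustrative assumptions.

```python
# Hypothetical inputs for a spam-filter example
p_spam = 0.20             # P(A): prior probability that an email is spam
p_word_given_spam = 0.60  # P(B|A): "offer" appears in a spam email
p_word_given_ham = 0.05   # P(B|not A): "offer" appears in a legitimate email

# P(B): total probability that an email contains the word
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(A|B): posterior probability of spam given that the word is present
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f"P(spam | 'offer') = {p_spam_given_word:.2f}")
```

Seeing the word raises the probability of spam from the prior of 0.20 to a posterior of 0.75.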

---

#### Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

Ans.

**Confidence Interval (CI):**  
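- A confidence interval (CI) is a range of values used to estimate an unknown population parameter (such as a mean or proportion). It provides a measure of uncertainty around a sample estimate.  
- For example, a 95% confidence interval means that if we were to repeat the experiment many times, approximately 95% of the intervals constructed this way would contain the true population parameter.
- For a mean with unknown σ, the interval is x̄ ± t · (s / √n), where x̄ is the sample mean, s is the sample standard deviation, n is the sample size, and t is the critical value of the t-distribution with n − 1 degrees of freedom at the chosen confidence level.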

```python
import numpy as np
import scipy.stats as st

# Sample of 50 measurements (cm)
data = np.array([165, 170, 172, 168, 175, 160, 178, 185, 169, 172,
                 166, 174, 180, 177, 168, 181, 179, 163, 167, 171,
                 173, 176, 162, 159, 182, 170, 175, 178, 185, 169,
                 172, 166, 174, 180, 177, 168, 181, 179, 163, 167,
                 171, 173, 176, 162, 159, 182, 170, 175, 178, 185])

mean = np.mean(data)
std_dev = np.std(data, ddof=1)  # Use sample standard deviation (ddof=1)
n = len(data)

confidence = 0.95
alpha = 1 - confidence

# Two-sided critical value from the t-distribution with n - 1 degrees of freedom
t_critical = st.t.ppf(1 - alpha/2, df=n-1)

# Margin of error = critical value * standard error of the mean
margin_of_error = t_critical * (std_dev / np.sqrt(n))

lower_bound = mean - margin_of_error
upper_bound = mean + margin_of_error

print(f"Sample Mean: {mean:.2f} cm")
print(f"95% Confidence Interval: ({lower_bound:.2f} cm, {upper_bound:.2f} cm)")
```

```
Sample Mean: 172.54 cm
95% Confidence Interval: (170.55 cm, 174.53 cm)
```
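
As a cross-check on the manual formula, SciPy can compute the same interval with `st.t.interval`. The sketch below uses only the first ten values of the data above to stay short, so its interval will differ from the one printed for the full sample.

```python
import numpy as np
import scipy.stats as st

# First ten values of the data above, used here only to keep the example short
sample = np.array([165, 170, 172, 168, 175, 160, 178, 185, 169, 172])
n = len(sample)
mean = sample.mean()
sem = st.sem(sample)  # standard error of the mean: s / sqrt(n), with ddof=1

# Manual interval: mean +/- t_critical * standard error
manual_moe = st.t.ppf(0.975, df=n - 1) * sem
manual_ci = (mean - manual_moe, mean + manual_moe)

# Same interval from SciPy (confidence level passed positionally)
lower, upper = st.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

print(f"Manual: ({manual_ci[0]:.2f}, {manual_ci[1]:.2f})")
print(f"SciPy:  ({lower:.2f}, {upper:.2f})")
```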


---

#### Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

Ans.

Bayes’s theorem updates a prior probability in light of new evidence. With D = "has the disease" and T = "tests positive":

$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T)}, \qquad P(T) = P(T \mid D)\,P(D) + P(T \mid \neg D)\,P(\neg D)$$

Sample problem:  
- A disease affects 1% of a population, so the prior is P(D) = 0.01.  
- A test returns a positive result for 90% of people who have the disease (true positive rate) and for 5% of people who do not (false positive rate).  
- If a person tests positive, what is the probability that they actually have the disease?

```python
# Given probabilities
P_D = 0.01  # Prior probability of disease
P_not_D = 1 - P_D  # Probability of not having disease
P_T_given_D = 0.90  # True Positive Rate
P_T_given_not_D = 0.05  # False Positive Rate

# Total probability of positive test
P_T = (P_T_given_D * P_D) + (P_T_given_not_D * P_not_D)

# Bayes' Theorem
P_D_given_T = (P_T_given_D * P_D) / P_T

print(f"Probability of having the disease given a positive test result: {P_D_given_T:.4f}")
```

```
Probability of having the disease given a positive test result: 0.1538
```

Even after a positive result, the probability of having the disease is only about 15.4%. The disease is rare, so false positives from the large healthy group outnumber true positives from the small affected group.


---

#### Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

Ans.

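```python
import scipy.stats as st
import numpy as np

sample_mean = 50
sample_std = 5
n = 30  # Assumption: the question does not state a sample size
confidence = 0.95

# Two-sided critical value from the t-distribution with n - 1 degrees of freedom
t_critical = st.t.ppf(1 - (1 - confidence) / 2, df=n-1)

margin_of_error = t_critical * (sample_std / np.sqrt(n))

lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"95% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})")
```

```
95% Confidence Interval: (48.13, 51.87)
```

Interpretation: under the assumed sample size of n = 30, we are 95% confident that the true population mean lies between 48.13 and 51.87. A different sample size would change the width of the interval.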


---

#### Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

Ans.

**Margin of Error (MOE):**  
- The margin of error (MOE) is the half-width of a confidence interval: the interval is the sample estimate ± MOE. It quantifies how far the sample mean (x̄) is expected to fall from the true population mean (μ) due to random sampling, at the chosen confidence level.
- For a mean, MOE = (critical value) × (standard deviation / √n).

The margin of error is inversely proportional to the square root of the sample size, so quadrupling n halves the MOE:  
- Larger sample size (n) → Smaller MOE (More precision).
- Smaller sample size (n) → Larger MOE (Less precision).
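
An example scenario is an opinion poll. The sketch below assumes a poll in which 50% of respondents favor a candidate and compares the 95% margin of error at three sample sizes; the proportion and the sample sizes are hypothetical, and `margin_of_error` is a helper defined here for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.5  # hypothetical sample proportion
for n in (100, 400, 1600):
    print(f"n = {n:4d} -> MOE = {margin_of_error(p, n):.4f}")
```

Each fourfold increase in the sample size cuts the margin of error in half, from about 9.8 percentage points at n = 100 to about 4.9 at n = 400.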

---

#### Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

Ans.

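```python
import scipy.stats as st

X = 75    # data point
m = 70    # population mean
std = 5   # population standard deviation

# z-score: distance from the mean in units of standard deviation
z_score = (X - m) / std

# Percentage of a normal population that falls below this value
percentile = st.norm.cdf(z_score) * 100

print(f"Z-score: {z_score:.2f}")
print(f"Percentile Rank: {percentile:.2f}%")
```

```
Z-score: 1.00
Percentile Rank: 84.13%
```

Interpretation: a z-score of 1.00 means the value 75 lies one standard deviation above the population mean. If the population is normally distributed, about 84.13% of values fall below it.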


---

#### Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

Ans.

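```python
import scipy.stats as st
import numpy as np

# H0: mean weight loss = 0 (the drug has no effect)
# H1: mean weight loss != 0
sample_mean = 6
null_mean = 0
sample_std = 2.5
n = 50
alpha = 0.05

t_statistic = (sample_mean - null_mean) / (sample_std / np.sqrt(n))

df = n - 1

# Two-tailed p-value. For a statistic this large, 1 - cdf rounds to 0.0;
# st.t.sf(abs(t_statistic), df) would retain the tiny non-zero value.
p_value = 2 * (1 - st.t.cdf(abs(t_statistic), df=df))

t_statistic, p_value
```

```
(16.970562748477143, 0.0)
```

Conclusion: the t-statistic is about 16.97 and the p-value is far below α = 0.05 (it prints as 0.0 only because of floating-point rounding). We reject H0 and conclude that the weight loss is statistically significant. A one-tailed test (H1: mean weight loss > 0) leads to the same conclusion.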

---

#### Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

Ans.

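```python
import math

p = 0.65  # sample proportion
n = 500   # sample size
Z = 1.96  # Z-critical value for 95% confidence level

# Standard error of a sample proportion
SE = math.sqrt((p * (1 - p)) / n)

ME = Z * SE

lower_bound = p - ME
upper_bound = p + ME

lower_bound, upper_bound
```

```
(0.608191771144905, 0.6918082288550951)
```

Interpretation: we are 95% confident that the true proportion of people satisfied with their job lies between about 60.8% and 69.2%.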

---

#### Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

Ans.

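```python
import scipy.stats as st
import numpy as np

# H0: the two methods have the same mean score
# H1: the mean scores differ
mean_A = 85
mean_B = 82
std_A = 6
std_B = 5
n_A = 30  # assumption for sample A (not given in the question)
n_B = 30  # assumption for sample B (not given in the question)
alpha = 0.01  # significance level

# Welch's t-statistic (does not assume equal variances)
t_statistic = (mean_A - mean_B) / np.sqrt((std_A**2 / n_A) + (std_B**2 / n_B))

# Welch-Satterthwaite degrees of freedom
df = ((std_A**2 / n_A + std_B**2 / n_B)**2) / (((std_A**2 / n_A)**2) / (n_A - 1) + ((std_B**2 / n_B)**2) / (n_B - 1))

p_value = 2 * (1 - st.t.cdf(abs(t_statistic), df=df))

t_statistic, p_value
```

```
(2.1038606199548298, 0.03987998118234137)
```

Conclusion: the p-value (about 0.040) is greater than α = 0.01, so we fail to reject H0. Under the assumed sample sizes of 30 each, the data do not show a significant difference between the two teaching methods at the 1% level.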
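
The same Welch test can be cross-checked with SciPy's `ttest_ind_from_stats`, which works directly from summary statistics. The sample sizes of 30 are the same assumption as in the calculation above.

```python
import scipy.stats as st

# Welch's t-test from summary statistics (equal_var=False)
result = st.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,  # sample A (n = 30 is an assumption)
    mean2=82, std2=5, nobs2=30,  # sample B (n = 30 is an assumption)
    equal_var=False,
)

alpha = 0.01
print(f"t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```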

---

#### Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

Ans.

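```python
import math

# sigma is the known population standard deviation, so a z-interval applies
sample_mean, sigma, n = 65, 8, 50
Z = 1.645  # Z-critical value for 90% confidence level

SE = sigma / math.sqrt(n)

ME = Z * SE

lower_bound = sample_mean - ME
upper_bound = sample_mean + ME

print(lower_bound, "\t",upper_bound)
```

```
63.13889495191701 	 66.86110504808299
```

Interpretation: we are 90% confident that the true population mean lies between about 63.14 and 66.86. The stated population mean of 60 falls outside this interval.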
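
Instead of hard-coding 1.645, the critical value and the interval can be taken from SciPy. This is a small sketch using the same inputs; because it uses the unrounded critical value, its bounds differ from the ones above in the fourth decimal place.

```python
import math
import scipy.stats as st

sample_mean, sigma, n = 65, 8, 50

# Two-sided 90% interval leaves 5% in each tail
z_critical = st.norm.ppf(0.95)

# Normal-based interval centered on the sample mean, scaled by the standard error
lower, upper = st.norm.interval(0.90, loc=sample_mean, scale=sigma / math.sqrt(n))

print(f"z critical = {z_critical:.4f}")
print(f"90% CI = ({lower:.2f}, {upper:.2f})")
```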


---

#### Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

Ans.

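```python
import scipy.stats as st
import numpy as np

# H0: mean reaction time = mu_0 (caffeine has no effect)
# H1: mean reaction time != mu_0
# mu_0 = 0.3 s is an assumed baseline; the question does not state one
sample_mean, mu_0, sample_std, n = 0.25, 0.3, 0.05, 30
alpha = 0.10  # significance level for 90% confidence

t_statistic = (sample_mean - mu_0) / (sample_std / np.sqrt(n))

dof = n - 1

# Two-tailed p-value
p_value = 2 * (1 - st.t.cdf(abs(t_statistic), df=dof))

# Results
t_statistic, p_value
```

```
(-5.47722557505166, 6.7391453468790274e-06)
```

Conclusion: the p-value (about 6.7e-06) is far below α = 0.10, so we reject H0. Relative to the assumed baseline of 0.3 seconds, caffeine has a statistically significant effect on reaction time, and the negative t-statistic indicates faster reactions.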