**Q1:** What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

A t-test and a z-test are both statistical hypothesis tests used to make inferences about population parameters based on sample data.


**T-Test:**

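Used when the population standard deviation is unknown and must be estimated from the sample. This matters most for small samples (a common rule of thumb is fewer than 30 observations). The t-distribution has heavier tails than the normal distribution, which accounts for the extra uncertainty of estimating the standard deviation.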

**Z-Test:**

Used when the population standard deviation is known, or when the sample is large (a common rule of thumb is more than 30 observations), so that the sample standard deviation is a good estimate of it and the t-distribution is nearly identical to the normal distribution.

**Example Scenarios:**

**T-Test:**

**Scenario:**

You want to assess whether a new teaching method improves student test scores. You randomly select 20 students, teach them using the new method, and compare each student's score after the new method with the same student's score before it.

**Test to Use:** Paired t-test or one-sample t-test (depending on the design).

**Reason:** You have a small sample size, and you are comparing two sets of related scores.

**Z-Test:**

**Scenario:**

You work at a large e-commerce company, and you want to test if the average time it takes for customers to complete an online purchase has changed due to a website redesign. You collect data from 500 recent transactions.

**Test to Use:** One-sample z-test or two-sample z-test (depending on the research question).

**Reason:** With a sample size of 500, the sample standard deviation is a reliable estimate of the population standard deviation, and the t-distribution is practically indistinguishable from the normal distribution, so a z-test is appropriate.
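The practical difference between the two tests shows up in their critical values. The following is a minimal sketch, assuming a two-tailed test at α = 0.05 and reusing the sample sizes from the two scenarios (20 and 500); the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05  # assumed two-tailed significance level

# Critical value of the standard normal distribution (z-test)
z_crit = stats.norm.ppf(1 - alpha / 2)

# Critical values of the t-distribution for the two scenario sample sizes
t_crit_small = stats.t.ppf(1 - alpha / 2, df=20 - 1)
t_crit_large = stats.t.ppf(1 - alpha / 2, df=500 - 1)

print(f"z critical value:           {z_crit:.3f}")
print(f"t critical value (n = 20):  {t_crit_small:.3f}")
print(f"t critical value (n = 500): {t_crit_large:.3f}")
```

With 20 observations the t threshold is noticeably larger than the z threshold, so the t-test demands stronger evidence; with 500 observations the two thresholds nearly coincide.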

**Q2:** Differentiate between one-tailed and two-tailed tests.

**One-Tailed Test:**

Focus: A one-tailed test is used when you are specifically interested in whether a sample statistic is significantly greater than or less than a population parameter, but not both. In other words, you have a directional hypothesis.

Hypotheses: There are two types of one-tailed tests:

Right-Tailed Test (Upper-Tailed): This is used when you are testing if the sample statistic is significantly greater than the population parameter.

Left-Tailed Test (Lower-Tailed): This is used when you are testing if the sample statistic is significantly less than the population parameter.

Alpha Level: The significance level (alpha) is concentrated in one tail of the distribution, making it easier to reject the null hypothesis in that specific direction.

**Two-Tailed Test:**

Focus: A two-tailed test is used when you are interested in whether a sample statistic is significantly different from a population parameter in any direction, either greater or less than. In other words, you have a non-directional hypothesis.

Hypotheses: In a two-tailed test, you consider the possibility of a significant difference in both directions simultaneously.

Alpha Level: The significance level (alpha) is divided between the two tails of the distribution, making it more conservative. This means you would need stronger evidence to reject the null hypothesis compared to a one-tailed test.

**Example:**

Let's say you are testing whether a new drug is more effective at reducing blood pressure than an existing drug.

**One-Tailed Test:**

Right-Tailed Test: You are interested in whether the new drug is significantly better (higher blood pressure reduction) than the existing drug.

Left-Tailed Test: You are interested in whether the new drug is significantly worse (lower blood pressure reduction) than the existing drug.

**Two-Tailed Test:**

You are interested in whether there is a significant difference in blood pressure reduction between the two drugs, without specifying a particular direction. You want to know if the new drug is significantly different, either better or worse.
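The effect of the tail choice on the p-value can be sketched for a z-test. The test statistic below (z = 1.8) is a hypothetical value chosen only for illustration; it does not come from the drug example.

```python
from scipy import stats

z = 1.8  # hypothetical test statistic, chosen for illustration

p_right = stats.norm.sf(z)         # right-tailed: P(Z >= z)
p_left = stats.norm.cdf(z)         # left-tailed:  P(Z <= z)
p_two = 2 * stats.norm.sf(abs(z))  # two-tailed: both tails combined

print(f"right-tailed p-value: {p_right:.4f}")
print(f"left-tailed p-value:  {p_left:.4f}")
print(f"two-tailed p-value:   {p_two:.4f}")
```

For this statistic the right-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the hypothesis should be fixed before looking at the data.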

**Q3:** Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

**Type 1 Error (False Positive):**

**Definition:**

Type 1 error occurs when you reject a null hypothesis that is actually true. In other words, you conclude that there is a significant effect or difference when there isn't one in reality.

Symbol: The probability of a Type 1 error is denoted α (alpha), the significance level.

**Example Scenario:**

Scenario: A pharmaceutical company is testing a new drug for a disease. The null hypothesis (H0) is that the drug has no effect (i.e., it's not better than a placebo). The alternative hypothesis (Ha) is that the drug is effective.

Error: If, based on the sample data, the company rejects the null hypothesis and concludes that the drug is effective (Ha), but in reality, the drug has no effect (H0), it's a Type 1 error. This could lead to the unnecessary marketing and distribution of an ineffective drug.


**Type 2 Error (False Negative):**

**Definition:**

 Type 2 error occurs when you fail to reject a null hypothesis that is actually false. In other words, you conclude that there is no significant effect or difference when there actually is one in reality.

Symbol: The probability of a Type 2 error is denoted β (beta); 1 − β is the power of the test.

**Example Scenario:**

Scenario: In a criminal trial, the null hypothesis (H0) is that the defendant is innocent (no guilt). The alternative hypothesis (Ha) is that the defendant is guilty.

Error: If, based on the evidence presented in court, the jury fails to reject the null hypothesis (H0) and acquits the defendant, but the defendant is actually guilty (Ha), it's a Type 2 error. In this case, a potentially guilty person is set free.

The balance between these two types of errors is managed by choosing an appropriate significance level (α) and sample size. For a fixed sample size, lowering α reduces the risk of a Type 1 error but increases the risk of a Type 2 error, and vice versa. Increasing the sample size reduces β at a given α.
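The trade-off between α and β can be sketched for a right-tailed one-sample z-test. The study parameters (true effect 2, σ = 10, n = 50) and the helper name `type2_error` are hypothetical and serve only to illustrate the relationship.

```python
import math
from scipy import stats

def type2_error(alpha, effect, sigma, n):
    """Beta for a right-tailed one-sample z-test when the true mean shift is `effect`."""
    z_alpha = stats.norm.ppf(1 - alpha)     # rejection threshold under H0
    shift = effect * math.sqrt(n) / sigma   # centre of the test statistic under Ha
    return stats.norm.cdf(z_alpha - shift)  # P(fail to reject H0 | Ha is true)

# Hypothetical study: true effect 2, sigma 10, n = 50
for alpha in (0.10, 0.05, 0.01):
    beta = type2_error(alpha, effect=2, sigma=10, n=50)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")

# A larger sample lowers beta without changing alpha
print(f"n = 200, alpha = 0.05 -> beta = {type2_error(0.05, 2, 10, 200):.3f}")
```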

**Q4:** Explain Bayes's theorem with an example.

**Bayes's theorem is a fundamental concept in probability theory and statistics used to update the probability of a hypothesis based on new evidence.**

**Bayes's Theorem Formula:**

P(A∣B) = (P(B∣A)⋅P(A)) / P(B)

Where:

P(A∣B) is the posterior probability: the probability that hypothesis A is true after considering evidence B. This is the quantity we want to calculate.

P(B∣A) is the likelihood: the probability of observing evidence B under the assumption that hypothesis A is true.

P(A) is the prior probability: the probability that hypothesis A is true before considering evidence B.

P(B) is the probability of the evidence: the overall probability of observing B, whether or not A is true.


**Scenario:**

Imagine you're a doctor trying to diagnose a rare medical condition called "RareX" in a patient. The disease is quite uncommon, occurring in only 2% of the population (prior probability, P(RareX)=0.02). You have a diagnostic test, TestA, that is not perfect. It correctly identifies RareX in 90% of cases when the patient has the disease (likelihood, P(TestA∣RareX)=0.90). However, it also produces false positives in 10% of cases when RareX is not present (likelihood, P(TestA∣¬RareX)=0.10).

You want to find the probability that the patient has RareX given a positive test result (denoted as P(RareX∣TestA)).

Using Bayes's theorem:

Prior Probability (P(RareX)): The prior probability of the patient having RareX before considering the test result is 2% (0.02).

Likelihood (P(TestA∣RareX)): The probability of a positive test result (TestA) when the patient has RareX is 90% (0.90).

Likelihood (P(TestA∣¬RareX)): The probability of a positive test result (TestA) when RareX is not present is 10% (0.10).

Probability of Evidence (P(TestA)):

This is the probability of getting a positive test result, considering both the presence and absence of RareX. You can calculate it using the law of total probability:


P(TestA)=P(TestA∣RareX)⋅P(RareX)+P(TestA∣¬RareX)⋅P(¬RareX)

P(TestA)=(0.90⋅0.02)+(0.10⋅0.98)=0.018+0.098=0.116

Posterior Probability (P(RareX∣TestA)):

Using Bayes's theorem, you can now calculate the probability that the patient has RareX given a positive test result:

P(RareX∣TestA) = (P(TestA∣RareX)⋅P(RareX)) / P(TestA)

P(RareX∣TestA) = (0.90⋅0.02) / 0.116 ≈ 0.155

So P(RareX∣TestA), the probability that the patient has RareX given a positive test result, is approximately 15.5%. This demonstrates how Bayes's theorem helps update our beliefs about a hypothesis (presence of RareX) based on new evidence (positive TestA result) and prior probabilities.
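The calculation can be reproduced in a few lines. This is a minimal sketch; the function name `posterior` and its parameter names are illustrative and not part of any library.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes's theorem for a binary hypothesis and a positive test result."""
    # Law of total probability: P(positive) over both disease states
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Values from the RareX scenario
p = posterior(prior=0.02, sensitivity=0.90, false_positive_rate=0.10)
print(f"P(RareX | TestA) = {p:.3f}")
```

The posterior stays low despite the 90% detection rate because the disease is rare: most positive results come from the much larger group of patients without RareX.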

**Q5:** What is a confidence interval? How to calculate the confidence interval, explain with an example.

**Confidence Interval:** A range of values, computed from sample data, that is expected to contain a population parameter with a specified level of confidence.

**Components Needed:**

Sample data from the population.

Population parameter you want to estimate (e.g., mean or proportion).

Desired confidence level (e.g., 90%, 95%, 99%).

Formula for Confidence Interval: The formula depends on the parameter and whether the population standard deviation is known.

**Example:** Calculating a Confidence Interval for a Population Mean

Collect a random sample (e.g., heights of adult males).

Identify the population parameter to estimate (e.g., mean height).

Choose a confidence level (e.g., 95%).

Formula for Mean with Known Population Standard Deviation:

**Confidence Interval = Sample Mean ± (Z-score × Population Standard Deviation / √(Sample Size))**


Calculate Z-score: Find the appropriate Z-score for the chosen confidence level (e.g., 1.96 for 95% confidence).

Plug in Values: Insert the sample mean, Z-score, population standard deviation, and sample size into the formula.

Interpretation: The result is a range (confidence interval) that provides an estimated interval for the population parameter with the specified level of confidence.

Example Outcome: The result is an interval of the form (lower bound, upper bound). At a 95% confidence level, the procedure that produced it captures the true population parameter in about 95% of repeated samples, so you can be 95% confident that the parameter falls within this range.
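The steps above can be sketched as a small function that uses the standard error σ/√n. The height figures (sample mean 175 cm, population standard deviation 7 cm, 100 men) are hypothetical values chosen for illustration, and `mean_confidence_interval` is an illustrative name.

```python
import math
from scipy import stats

def mean_confidence_interval(sample_mean, population_sd, n, confidence=0.95):
    """Z-based confidence interval for a mean with known population SD."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    margin = z * population_sd / math.sqrt(n)     # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical data: mean height 175 cm, sigma 7 cm, 100 adult males
lower, upper = mean_confidence_interval(175, 7, 100)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```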

**Q6.** Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

**Scenario:**

Imagine you're a doctor trying to diagnose a rare medical condition called "RareX" in a patient. The disease is quite uncommon, occurring in only 2% of the population (prior probability, P(RareX)=0.02). You have a diagnostic test, TestA, that is not perfect. It correctly identifies RareX in 90% of cases when the patient has the disease (likelihood, P(TestA∣RareX)=0.90). However, it also produces false positives in 10% of cases when RareX is not present (likelihood, P(TestA∣¬RareX)=0.10).

You want to find the probability that the patient has RareX given a positive test result (denoted as P(RareX∣TestA)).

Using Bayes's theorem:

Prior Probability (P(RareX)): The prior probability of the patient having RareX before considering the test result is 2% (0.02).

Likelihood (P(TestA∣RareX)): The probability of a positive test result (TestA) when the patient has RareX is 90% (0.90).

Likelihood (P(TestA∣¬RareX)): The probability of a positive test result (TestA) when RareX is not present is 10% (0.10).

Probability of Evidence (P(TestA)):

This is the probability of getting a positive test result, considering both the presence and absence of RareX. You can calculate it using the law of total probability:


P(TestA)=P(TestA∣RareX)⋅P(RareX)+P(TestA∣¬RareX)⋅P(¬RareX)

P(TestA)=(0.90⋅0.02)+(0.10⋅0.98)=0.018+0.098=0.116

Posterior Probability (P(RareX∣TestA)):

Using Bayes's theorem, you can now calculate the probability that the patient has RareX given a positive test result:

**P(RareX∣TestA)=P(TestA∣RareX)⋅P(RareX)/ P(TestA)**

P(RareX∣TestA) = (0.90⋅0.02) / 0.116 ≈ 0.155

So P(RareX∣TestA), the probability that the patient has RareX given a positive test result, is approximately 15.5%. This demonstrates how Bayes's theorem helps update our beliefs about a hypothesis (presence of RareX) based on new evidence (positive TestA result) and prior probabilities.
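The same result can be checked by counting patients instead of multiplying probabilities. The sketch below assumes a hypothetical population of 10,000 patients; the population size is arbitrary and cancels out of the final ratio.

```python
population = 10_000  # hypothetical population size, chosen for easy counting
prior = 0.02         # P(RareX)
sensitivity = 0.90   # P(TestA | RareX)
false_pos = 0.10     # P(TestA | not RareX)

with_disease = round(population * prior)        # patients who have RareX
without_disease = population - with_disease     # patients who do not

true_positives = with_disease * sensitivity     # sick patients who test positive
false_positives = without_disease * false_pos   # healthy patients who test positive

posterior = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} true positives, {false_positives:.0f} false positives")
print(f"P(RareX | TestA) = {posterior:.3f}")
```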

**Q7.** Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.


Sample Mean: 50

Standard Deviation: 5 (treated here as the population standard deviation)

Confidence Level: 95%

Sample Size (n): not given in the question; assumed to be 30

Z-score for a 95% confidence level is approximately 1.96 (standard normal distribution).

**Confidence Interval = Sample Mean ± (Z-score × Population Standard Deviation / √(Sample Size))**

Margin of Error = 1.96 × 5 / √30 ≈ 1.79

Confidence Interval ≈ 50 ± 1.79

Resulting 95% Confidence Interval: (48.21, 51.79)

**Interpretation:**

With 95% confidence, we can say that the true population mean falls within the range of 48.21 to 51.79 based on this sample of data.

More precisely, if you were to take many random samples and calculate a 95% confidence interval from each, about 95% of those intervals would contain the true population mean.

The half-width of the interval, 1.79, is the margin of error: the uncertainty associated with estimating the population mean from this sample.
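Because the question reports a sample standard deviation, a t-based interval is an arguably better fit than the z-based one. The sketch below compares the two margins, keeping the assumed sample size of 30 from the answer above; the variable names are illustrative.

```python
import math
from scipy import stats

mean, sd, n = 50, 5, 30  # n = 30 is an assumption; the question does not give it

standard_error = sd / math.sqrt(n)

# Margin using the standard normal distribution (SD treated as known)
z_margin = stats.norm.ppf(0.975) * standard_error

# Margin using the t-distribution (SD estimated from the sample)
t_margin = stats.t.ppf(0.975, df=n - 1) * standard_error

print(f"z interval: ({mean - z_margin:.2f}, {mean + z_margin:.2f})")
print(f"t interval: ({mean - t_margin:.2f}, {mean + t_margin:.2f})")
```

The t interval is slightly wider, reflecting the extra uncertainty of estimating the standard deviation from 30 observations.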

**Q8.** What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

Margin of Error (MOE):

MOE is the half-width of a confidence interval: the amount added to and subtracted from the point estimate.

It quantifies how far the true population parameter is likely to be from the estimate at the chosen confidence level.

A smaller MOE means a more precise estimate; the MOE accounts for sampling variability.

**MOE = Z × σ / √n**

Where

n = Sample size

σ = Population standard deviation

Z = Z-score for the chosen confidence level



**Factors Affecting MOE:**

**Confidence Level:**

Higher confidence levels result in larger margins of error because they require wider intervals to be more confident about capturing the true parameter value.

Common confidence levels include 90%, 95%, and 99%.

**Sample Size:**

A larger sample size leads to a smaller margin of error.

Larger samples provide more information about the population, reducing sampling variability.

Smaller samples yield larger margins of error, making the estimate less precise.

**Example Scenario:**

Consider a political survey to estimate voter support for a candidate.

**Scenario 1 (Smaller Sample Size):**

Sample size (n): 200

MOE: ±5%

**Scenario 2 (Larger Sample Size):**

Sample size (n): 1000

MOE: ±2%

In Scenario 2, the larger sample size results in a smaller MOE, indicating a more precise estimate at the same confidence level.
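The effect of sample size follows from the √n in the MOE formula: multiplying n by 5 divides the margin by √5, not by 5. The sketch below uses the two sample sizes from the scenario with a hypothetical σ of 10; `margin_of_error` is an illustrative helper name.

```python
import math
from scipy import stats

def margin_of_error(sigma, n, confidence=0.95):
    """MOE = Z * sigma / sqrt(n) for a mean with known population SD."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

sigma = 10  # hypothetical population standard deviation

moe_small = margin_of_error(sigma, n=200)
moe_large = margin_of_error(sigma, n=1000)

print(f"n = 200:  MOE = {moe_small:.3f}")
print(f"n = 1000: MOE = {moe_large:.3f}")
print(f"ratio: {moe_small / moe_large:.3f}")
```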

**Q9.** Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

Z = (X − μ) / σ = (75 − 70) / 5

Z = 1

So, the z-score for the data point with a value of 75, a population mean of 70, and a population standard deviation of 5 is 1.

**Interpretation:**

The z-score represents how many standard deviations the data point is away from the mean. In this case, with a z-score of 1, it means that the data point (75) is 1 standard deviation above the population mean (70). This indicates that the data point is relatively higher than the average value in the population.
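A short sketch of the same calculation follows. The percentile step assumes the population is normally distributed, which the question does not state.

```python
from scipy import stats

value, population_mean, population_sd = 75, 70, 5

# Number of standard deviations between the data point and the mean
z = (value - population_mean) / population_sd

# Share of the population below this value, ASSUMING a normal distribution
percentile = stats.norm.cdf(z)

print(f"z-score: {z}")
print(f"share of population below 75: {percentile:.4f}")
```

Under the normality assumption, a z-score of 1 places the data point above roughly 84% of the population.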

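**Q10.** In a study of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine whether the drug is significantly effective at a 95% confidence level using a t-test.

Hypotheses: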

Null Hypothesis: The new weight loss drug is not effective; the mean weight loss is zero (μ = 0).

Alternative Hypothesis: The new weight loss drug is effective; the mean weight loss differs from zero (μ ≠ 0).

Significance Level:

Set the significance level (α) to 0.05 (corresponding to a 95% confidence level).

Sample Information:

Sample size (n): 50 participants

Sample mean (x̄): 6 pounds lost

Sample standard deviation (s): 2.5 pounds

Calculations:

Calculate the test statistic: t = (x̄ − μ₀) / (s / √n) = (6 − 0) / (2.5 / √50) ≈ 16.97

Determine the degrees of freedom (df): df=49 (since n−1)

Calculate the p-value using the t-distribution 

Decision:

If the p-value < 0.05 (α), reject the null hypothesis.

If the p-value ≥ 0.05 (α), fail to reject the null hypothesis.

In [4]:
import numpy as np
from scipy import stats

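# Summary statistics from the problem. The raw data are not available, so
# sample_data is a placeholder array that only supplies n and the mean.
sample_data = np.array([6] * 50)
population_mean = 0  # mean weight loss under the null hypothesis
sample_stddev = 2.5  # reported sample standard deviation (not computed from sample_data)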

# Calculate the t-statistic manually
sample_mean = np.mean(sample_data)
standard_error = sample_stddev / np.sqrt(len(sample_data))
t_statistic = (sample_mean - population_mean) / standard_error

# Degrees of freedom
df = len(sample_data) - 1

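# Calculate the two-tailed p-value manually.
# For a statistic this large, 1 - cdf rounds to 0.0 in floating point;
# stats.t.sf(np.abs(t_statistic), df) would retain more precision.
p_value = 2 * (1 - stats.t.cdf(np.abs(t_statistic), df))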

# Set the significance level (alpha)
alpha = 0.05

# Print the results
print(f"Sample Mean: {sample_mean}")
print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")


# Make a decision based on the p-value
if p_value < alpha:
    print("Reject the null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. The drug's effectiveness is not significant.")


Sample Mean: 6.0
t-statistic: 16.970562748477143
p-value: 0.0
Reject the null hypothesis. The drug is significantly effective.


**Q11.** In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

In [6]:
import numpy as np
import scipy.stats as stats

sample_proportion = 0.65 
sample_size = 500

# Calculate the standard error
standard_error = np.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Set the confidence level and find the critical Z-value (for 95% confidence)
confidence_level = 0.95
alpha = 1 - confidence_level
z_critical = stats.norm.ppf(1 - alpha / 2)  # two-sided interval: alpha / 2 in each tail

# Calculate the margin of error
margin_of_error = z_critical * standard_error

# Calculate the confidence interval
lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

# Print the results
print(f"95% Confidence Interval: ({lower_bound:.4f}, {upper_bound:.4f})")


95% Confidence Interval: (0.6082, 0.6918)


**Q12.** A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

To determine if there is a significant difference in student performance between the two teaching methods, you can conduct a two-sample t-test. The question does not give the sample sizes, so the code below assumes 30 students per group.

The null hypothesis states that the two methods produce the same mean score (μA = μB), while the alternative hypothesis states that the mean scores differ (μA ≠ μB).

In [9]:
import numpy as np
import scipy.stats as stats


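# Summary statistics from the problem
mean_A = 85
stddev_A = 6
n_A = 30  # sample size is not given in the problem; 30 is an assumption
mean_B = 82
stddev_B = 5
n_B = 30  # assumed, as above

# Significance level
alpha = 0.01

# Calculate the test statistic and two-tailed p-value (pooled-variance t-test)
t_statistic, p_value = stats.ttest_ind_from_stats(mean_A, stddev_A, n_A, mean_B, stddev_B, n_B)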

# Calculate the degrees of freedom
df = n_A + n_B - 2

# Calculate the critical value
t_critical = stats.t.ppf(1 - alpha / 2, df)
print(f"t_critical: {t_critical}")
print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")

if abs(t_statistic) > t_critical:
    print("Reject the null hypothesis. There is a significant difference in student performance.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in student performance.")


t_critical: 2.6632869538098674
t-statistic: 2.1038606199548298
p-value: 0.03973697161571063
Fail to reject the null hypothesis. There is no significant difference in student performance.
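Since the two samples have different standard deviations, Welch's t-test, which does not assume equal variances, is a reasonable cross-check. The sketch below reuses the assumed sample size of 30 per group and the `equal_var` parameter of `stats.ttest_ind_from_stats`.

```python
from scipy import stats

alpha = 0.01
n = 30  # assumed sample size per group; not given in the question

# Pooled-variance test (equal variances assumed)
t_pooled, p_pooled = stats.ttest_ind_from_stats(85, 6, n, 82, 5, n, equal_var=True)

# Welch's test (equal variances not assumed)
t_welch, p_welch = stats.ttest_ind_from_stats(85, 6, n, 82, 5, n, equal_var=False)

print(f"pooled: t = {t_pooled:.4f}, p = {p_pooled:.4f}")
print(f"Welch:  t = {t_welch:.4f}, p = {p_welch:.4f}")
print("Reject H0" if p_welch < alpha else "Fail to reject H0")
```

With equal group sizes the two t-statistics coincide and only the degrees of freedom differ, so the conclusion at α = 0.01 is the same under this assumption.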


**Q13.** A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

In [10]:
import scipy.stats as stats
import math

sample_mean = 65
population_stddev = 8
sample_size = 50

# Significance level (alpha) for a 90% confidence interval
alpha = 0.10 

# Calculate the critical value (Z) for a two-sided 90% interval,
# which leaves alpha / 2 in each tail
z_critical = stats.norm.ppf(1 - alpha / 2)

standard_error = population_stddev / math.sqrt(sample_size)


margin_of_error = z_critical * standard_error

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Print the results
print(f"90% Confidence Interval: ({lower_bound:.3f}, {upper_bound:.3f})")


90% Confidence Interval: (63.139, 66.861)


**Q14.** In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

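To determine if caffeine has a significant effect on reaction time, you can conduct a one-sample t-test against a baseline (no-caffeine) mean reaction time. The null hypothesis states that the mean reaction time with caffeine equals the baseline, while the alternative hypothesis states that it differs. The question does not give a baseline, so the code below uses 0.25 seconds as a placeholder; because this equals the sample mean, the t-statistic is exactly 0 and the test cannot reject the null hypothesis.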

In [13]:
import scipy.stats as stats
import math

sample_mean = 0.25
sample_stddev = 0.05
sample_size = 30

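# Significance level (alpha) for a 90% confidence level
alpha = 0.10

# Placeholder baseline: the question gives no no-caffeine mean.
# It equals the sample mean here, so the t-statistic is 0.
null_mean = 0.25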


standard_error = sample_stddev / math.sqrt(sample_size)

t_statistic = (sample_mean - null_mean) / standard_error

# Degrees of freedom
df = sample_size - 1

# Calculate the critical value for a two-tailed test
t_critical = stats.t.ppf(1 - alpha / 2, df)


if abs(t_statistic) <= t_critical:
    print("Fail to reject the null hypothesis. Caffeine does not have a significant effect on reaction time.")
else:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")


Fail to reject the null hypothesis. Caffeine does not have a significant effect on reaction time.
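To show how the test behaves with an informative null hypothesis, the sketch below repeats it against a hypothetical no-caffeine baseline of 0.27 seconds. That baseline is an assumption made purely for illustration; the question does not supply one, so the conclusion applies only under this assumption.

```python
import math
from scipy import stats

sample_mean, sample_stddev, sample_size = 0.25, 0.05, 30
alpha = 0.10

baseline_mean = 0.27  # HYPOTHETICAL no-caffeine mean; not given in the question

standard_error = sample_stddev / math.sqrt(sample_size)
t_statistic = (sample_mean - baseline_mean) / standard_error
df = sample_size - 1

# Two-tailed p-value from the survival function of the t-distribution
p_value = 2 * stats.t.sf(abs(t_statistic), df)
reject = p_value < alpha

print(f"t-statistic: {t_statistic:.3f}")
print(f"p-value: {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```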
