# Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

### T-test:

- The t-test is used when the sample size is small (typically less than 30) or when the population standard deviation is unknown.
- It is appropriate for testing hypotheses about the mean of a single sample or the difference between the means of two independent samples.
- The test statistic follows a t-distribution.
- Example scenario: Suppose you want to compare the average test scores of students from two different schools (School A and School B). You collect a random sample of 20 students from each school and want to determine if there is a significant difference in their average scores.
### Z-test:

- The z-test is used when the population standard deviation is known, or when the sample size is large (typically greater than 30) so that the Central Limit Theorem applies and the sample standard deviation is a reliable stand-in for the population value.
- It is appropriate for testing hypotheses about the mean of a single sample or the difference between the means of two independent samples when the sample size is large.
- The test statistic follows a standard normal distribution (z-distribution).
- Example scenario: Suppose you have access to the heights of 500 randomly selected adult males in a city. You want to test if the average height of adult males in the city is significantly different from the national average height, which is 175 cm.

# Q2: Differentiate between one-tailed and two-tailed tests.

### One-tailed test:

- In a one-tailed test, the alternative hypothesis is directional, meaning it specifies that the population parameter is greater than or less than a certain value.
- The critical region for the test is on one side of the sampling distribution.
- One-tailed tests are used when there is a clear directional expectation or hypothesis.
- Example: Testing whether a new drug treatment increases the average lifespan of patients (one-tailed alternative hypothesis: the drug increases lifespan).
### Two-tailed test:

- In a two-tailed test, the alternative hypothesis is non-directional, and it specifies that the population parameter is not equal to a certain value.
- The critical region for the test is on both sides of the sampling distribution.
- Two-tailed tests are used when there is no specific directional expectation or when you want to detect any significant difference, whether it's an increase or decrease.
- Example: Testing whether a coin is biased (two-tailed alternative hypothesis: the coin's probability of landing heads is not equal to 0.5).
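
The practical difference between the two kinds of test shows up in the p-value. The sketch below is only an illustration: it assumes a z statistic has already been computed, and the value 1.8 is made up for the example. The function name `z_test_p_values` is hypothetical.

```python
from statistics import NormalDist

def z_test_p_values(z):
    """Return (upper-tailed, two-tailed) p-values for a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_upper = 1.0 - std_normal.cdf(z)             # Ha: parameter > null value
    p_two = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # Ha: parameter != null value
    return p_upper, p_two

# z = 1.8 is a made-up test statistic, used only for illustration.
p_upper, p_two = z_test_p_values(1.8)
print(f"one-tailed p = {p_upper:.4f}, two-tailed p = {p_two:.4f}")
```

With this made-up statistic, the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the alternative hypothesis should be chosen before looking at the data.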

# Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

### Type 1 error (False Positive):

- A Type 1 error occurs when you reject a true null hypothesis.
- In other words, you conclude that there is a significant effect or difference when, in reality, there is no such effect or difference in the population.
- The probability of making a Type 1 error is denoted by the symbol alpha (α), which is the significance level set before conducting the test.
- Example scenario for Type 1 error: A pharmaceutical company conducts a clinical trial for a new drug and erroneously concludes that the drug is effective in treating a disease when it actually has no effect.

### Type 2 error (False Negative):

- A Type 2 error occurs when you fail to reject a false null hypothesis.
- In other words, you conclude that there is no significant effect or difference when, in reality, there is such an effect or difference in the population.
- The probability of making a Type 2 error is denoted by the symbol beta (β).
- Example scenario for Type 2 error: A medical test for a specific disease yields a negative result for a patient who actually has the disease, leading to the incorrect conclusion that the patient is healthy.
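
The meaning of α can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original answer: data are drawn from a normal population where the null hypothesis is true by construction (mean 0, known standard deviation 1), and a two-tailed z-test is run repeatedly. The function name, sample size, trial count, and seed are all arbitrary choices.

```python
import random
from statistics import NormalDist, mean

def type1_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: data come from N(0, 1) and we test mu = 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / n ** 0.5)  # known sigma = 1
        if abs(z) > z_crit:
            rejections += 1  # a false positive (Type 1 error)
    return rejections / trials

rate = type1_error_rate()
print(f"estimated Type 1 error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen α, since every rejection in this setup is a false positive.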

# Q4:  Explain Bayes's theorem with an example.

Bayes's Theorem is a mathematical formula used to update the probability of an event occurring based on new evidence. It is expressed as:

P(A|B) = (P(B|A) * P(A)) / P(B)

Where:
P(A|B) is the probability of event A occurring given the evidence B.
P(B|A) is the probability of evidence B given that event A has occurred.
P(A) is the prior probability of event A (before considering any evidence).
P(B) is the overall (marginal) probability of observing evidence B, whether or not event A has occurred.

Example scenario: Suppose there is a rare disease that affects 1 in every 10,000 people. A diagnostic test for this disease is known to be 99% accurate, meaning it correctly identifies a person with the disease 99% of the time and correctly identifies a healthy person 99% of the time.

Let:
A = The event that a person has the disease.
B = The event that the test result is positive.

We want to calculate the probability that a person has the disease given that the test result is positive (P(A|B)).

P(A) = 1/10,000 (prior probability of having the disease)
P(B|A) = 0.99 (probability of a positive test result given the person has the disease)
P(B|¬A) = 0.01 (probability of a positive test result given the person does not have the disease)

Now, we can calculate P(B) using the law of total probability:

P(B) = P(B|A) * P(A) + P(B|¬A) * P(¬A)
P(B) = 0.99 * (1/10,000) + 0.01 * (1 - 1/10,000)
P(B) = 0.000099 + 0.009999 = 0.010098

Now, we can apply Bayes's Theorem:

P(A|B) = (P(B|A) * P(A)) / P(B)
P(A|B) = (0.99 * 0.0001) / 0.010098
P(A|B) = 0.000099 / 0.010098 ≈ 0.0098

So, given a positive test result, the probability that a person actually has the disease is only about 0.98% (0.0098). Because the disease is so rare, false positives from the large healthy population far outnumber the true positives.
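
The arithmetic for this example can be checked with a few lines of Python. This is a minimal sketch; the function name `posterior` and its parameter names are chosen here for illustration.

```python
def posterior(prior, p_pos_given_a, p_pos_given_not_a):
    """P(A|B) via Bayes's theorem, with P(B) from the law of total probability."""
    p_b = p_pos_given_a * prior + p_pos_given_not_a * (1 - prior)
    return p_pos_given_a * prior / p_b

# Disease prevalence 1 in 10,000; test is 99% sensitive and 99% specific.
p = posterior(prior=1 / 10_000, p_pos_given_a=0.99, p_pos_given_not_a=0.01)
print(f"P(disease | positive) = {p:.4f}")
```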

# Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

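A confidence interval is a range of values, computed from sample data, that is likely to contain the true population parameter. The confidence level (e.g., 95%) is the long-run share of such intervals that would contain the parameter if the sampling were repeated many times. It provides a measure of the uncertainty associated with estimating a population parameter from a sample.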

To calculate a confidence interval for a population mean, you need the following information:

- Sample mean (x̄)
- Sample standard deviation (s) or the standard error of the mean (SE)
- Sample size (n)
- Desired confidence level (typically expressed as a percentage, e.g., 95%)

The formula for the confidence interval of a population mean (assuming a large sample or when the population standard deviation is known) is:

Confidence Interval = x̄ ± Z * (σ/√n)

Where:
x̄ is the sample mean
Z is the critical value from the standard normal distribution corresponding to the desired confidence level (e.g., 1.96 for a 95% confidence level)
σ is the population standard deviation (or sample standard deviation if the population standard deviation is unknown)
n is the sample size

Example:
Suppose we have a sample of 100 students, and we want to calculate a 95% confidence interval for their average height. The sample mean height is 165 cm, and the sample standard deviation is 8 cm.

Confidence Interval = 165 ± 1.96 * (8/√100)

Calculations:
Confidence Interval = 165 ± 1.96 * 0.8
Confidence Interval = 165 ± 1.568

Lower bound of the confidence interval = 165 - 1.568 ≈ 163.432
Upper bound of the confidence interval = 165 + 1.568 ≈ 166.568

Interpretation: We can be 95% confident that the true average height of the population of students lies between approximately 163.4 cm and 166.6 cm.
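
The same calculation can be sketched in Python using only the standard library. The function name `mean_confidence_interval` is hypothetical, and the code uses the exact normal critical value rather than the rounded 1.96, so the bounds may differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, std_dev, n, confidence=0.95):
    """z-based confidence interval for a mean: x̄ ± Z * (σ/√n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * std_dev / sqrt(n)
    return sample_mean - margin, sample_mean + margin

lower, upper = mean_confidence_interval(165, 8, 100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```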

# Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.


Suppose we have the following information:

The prior probability of event A (P(A)) is 0.3 (30%).
The probability of observing evidence B given that event A has occurred (P(B|A)) is 0.8 (80%).
The probability of observing evidence B given that event A has not occurred (P(B|¬A)) is 0.2 (20%).
We want to calculate the probability of event A occurring given the new evidence B (P(A|B)).

Using Bayes's Theorem:

P(A|B) = (P(B|A) * P(A)) / P(B)

Substitute the given values:

P(A|B) = (0.8 * 0.3) / P(B)

To calculate P(B), we use the law of total probability:

P(B) = P(B|A) * P(A) + P(B|¬A) * P(¬A)
P(B) = 0.8 * 0.3 + 0.2 * (1 - 0.3)
P(B) = 0.24 + 0.14
P(B) = 0.38

Now, we can find P(A|B):

P(A|B) = (0.8 * 0.3) / 0.38
P(A|B) = 0.24 / 0.38
P(A|B) ≈ 0.6316

So, the probability of event A occurring given the new evidence B is approximately 63.16%.
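
As a cross-check, the same numbers can be run through exact rational arithmetic, which avoids rounding altogether. This is a small sketch; the variable names are chosen here for illustration.

```python
from fractions import Fraction

prior = Fraction(3, 10)        # P(A)
likelihood = Fraction(8, 10)   # P(B|A)
false_alarm = Fraction(2, 10)  # P(B|¬A)

# Law of total probability, then Bayes's theorem.
evidence = likelihood * prior + false_alarm * (1 - prior)
post = likelihood * prior / evidence

print(f"P(B) = {evidence}, P(A|B) = {post} ≈ {float(post):.4f}")
```

The exact posterior is 12/19, which rounds to the 0.6316 obtained above.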

# Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

Given:
Sample mean (x̄) = 50
Sample standard deviation (s) = 5
Sample size (n) = not provided; we'll assume it is large enough for a z-based interval (e.g., n ≥ 30)

Confidence Interval = x̄ ± Z * (s/√n)

The critical value for a 95% confidence level is approximately 1.96 regardless of the sample size. What the missing sample size prevents us from computing is the standard error (s/√n), so the interval can only be illustrated with an assumed n.

Let's assume the sample size is 100 (just for illustration purposes):

Confidence Interval = 50 ± 1.96 * (5/√100)

Calculations:
Confidence Interval = 50 ± 1.96 * (0.5)
Confidence Interval = 50 ± 0.98

Lower bound of the confidence interval = 50 - 0.98 ≈ 49.02
Upper bound of the confidence interval = 50 + 0.98 ≈ 50.98

Interpretation: We can be 95% confident that the true population mean lies between approximately 49.02 and 50.98.

# Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error (MOE) is the half-width of a confidence interval: the amount added to and subtracted from the sample estimate (e.g., sample mean or proportion) to form the interval. For a mean, MOE = Z * (σ/√n). It quantifies the uncertainty associated with estimating the population parameter from a sample.

A larger sample size results in a smaller margin of error, all else being equal. This is because the standard error shrinks in proportion to 1/√n, so the sampling distribution of the estimate becomes narrower as n grows. For example, quadrupling the sample size halves the margin of error.

Example scenario: Suppose you want to estimate the average height of students in a school. You collect two samples: Sample X with 50 students and Sample Y with 500 students. Assuming all other conditions are the same, the confidence interval for the population mean height based on Sample Y will have a smaller margin of error than the one based on Sample X.
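
The effect of sample size can be made concrete with a short sketch. The scenario gives no standard deviation, so the 8 cm used below is a hypothetical value chosen only for illustration, and `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(std_dev, n, confidence=0.95):
    """Half-width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev / sqrt(n)

STD_DEV_CM = 8.0  # hypothetical standard deviation, not given in the scenario

moe_x = margin_of_error(STD_DEV_CM, 50)   # Sample X
moe_y = margin_of_error(STD_DEV_CM, 500)  # Sample Y
print(f"n=50: ±{moe_x:.2f} cm, n=500: ±{moe_y:.2f} cm")
```

Whatever standard deviation is assumed, a tenfold increase in sample size shrinks the margin of error by a factor of √10 (about 3.16).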

# Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

The z-score measures how many standard deviations a data point is away from the population mean. The formula for calculating the z-score is:

z = (X - μ) / σ

Where:
X = The data point's value (75 in this case)
μ = The population mean (70 in this case)
σ = The population standard deviation (5 in this case)

Calculations:
z = (75 - 70) / 5
z = 1

Interpretation: The z-score of 1 indicates that the data point (75) is 1 standard deviation above the population mean (70).
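
A short sketch of the same calculation follows. The percentile step goes beyond what the question states: it assumes the population is normally distributed, which the question does not say.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

z = z_score(75, 70, 5)

# Assumption: the population is normal. Then the standard normal CDF gives
# the share of values expected to fall below this data point.
share_below = NormalDist().cdf(z)
print(f"z = {z:.1f}, share below = {share_below:.4f}")
```

Under that normality assumption, a z-score of 1 places the data point above roughly 84% of the population.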

# Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

Null Hypothesis (H0): The new weight loss drug has no effect on weight loss (μ = 0, where μ is the mean weight lost in pounds).
Alternative Hypothesis (Ha): The new weight loss drug is effective (μ > 0, one-tailed test).

Given:
Sample size (n) = 50
Sample mean (x̄) = 6 pounds
Sample standard deviation (s) = 2.5 pounds
Significance level (α) = 0.05 (corresponding to the 95% confidence level)

Assumptions: The population is approximately normally distributed.

Test statistic for t-test:
t = (x̄ - μ) / (s/√n)

Calculations:
t = (6 - 0) / (2.5/√50)
t ≈ 6 / (2.5/7.071)
t ≈ 6 / 0.3536
t ≈ 16.97

Degrees of freedom (df) = n - 1 = 50 - 1 = 49

Critical value for a one-tailed (upper-tailed) t-test at α = 0.05 with df = 49 is approximately 1.677.

Conclusion: The calculated t-value (16.97) is much larger than the critical value (1.677). Therefore, we reject the null hypothesis (H0) and conclude that the new weight loss drug is significantly effective at the 0.05 significance level in helping participants lose weight.
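
The test statistic can be recomputed with a short sketch. It assumes an upper-tailed alternative (mean weight loss greater than 0). The critical value is hard-coded from a t-table as an approximation, because the Python standard library has no t-distribution; `one_sample_t` is a hypothetical helper name.

```python
from math import sqrt

def one_sample_t(sample_mean, null_mean, std_dev, n):
    """One-sample t statistic: (x̄ - μ0) / (s/√n)."""
    return (sample_mean - null_mean) / (std_dev / sqrt(n))

t_stat = one_sample_t(sample_mean=6, null_mean=0, std_dev=2.5, n=50)

# Approximate t-table value: upper-tailed, alpha = 0.05, df = 49.
T_CRIT = 1.677
reject_h0 = t_stat > T_CRIT
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```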

# Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

Given:
Sample proportion (p̂) = 65% = 0.65
Sample size (n) = 500

The formula for the confidence interval of a proportion is:

Confidence Interval = p̂ ± Z * √(p̂(1-p̂)/n)

Where:
Z is the critical value from the standard normal distribution corresponding to the desired confidence level (e.g., 1.96 for a 95% confidence level)

Calculations:
Confidence Interval = 0.65 ± 1.96 * √(0.65 * (1 - 0.65) / 500)

Confidence Interval = 0.65 ± 1.96 * √(0.65 * 0.35 / 500)

Confidence Interval = 0.65 ± 1.96 * √(0.2275 / 500)

Confidence Interval = 0.65 ± 1.96 * 0.02133

Confidence Interval = 0.65 ± 0.0418

Lower bound of the confidence interval = 0.65 - 0.0418 ≈ 0.6082
Upper bound of the confidence interval = 0.65 + 0.0418 ≈ 0.6918

Interpretation: We can be 95% confident that the true proportion of people who are satisfied with their job lies between approximately 60.82% and 69.18%.
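
A minimal sketch of the same interval in Python follows. The function name `proportion_confidence_interval` is hypothetical, and the code uses the normal-approximation formula shown above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation interval: p̂ ± Z * √(p̂(1-p̂)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_confidence_interval(0.65, 500)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```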

# Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

Null Hypothesis (H0): There is no significant difference in student performance between the two teaching methods (μ1 - μ2 = 0).
Alternative Hypothesis (Ha): There is a significant difference in student performance between the two teaching methods (μ1 - μ2 ≠ 0, two-tailed test).

Given:
Sample A mean (x̄1) = 85
Sample A standard deviation (s1) = 6
Sample A size (n1) = not provided

Sample B mean (x̄2) = 82
Sample B standard deviation (s2) = 5
Sample B size (n2) = not provided

Assumptions: The populations are approximately normally distributed with equal variances (needed for the pooled t-test), and the samples are independent.

Pooled standard deviation (sp) for the two samples:
sp = √[((n1-1) * s1^2 + (n2-1) * s2^2) / (n1 + n2 - 2)]
sp = √[((n1-1) * 6^2 + (n2-1) * 5^2) / (n1 + n2 - 2)]

Calculate the degrees of freedom (df) for the t-test:
df = n1 + n2 - 2

Test statistic for t-test:
t = (x̄1 - x̄2) / (sp * √(1/n1 + 1/n2))

Find the critical value for a two-tailed t-test at a 0.01 significance level with the degrees of freedom computed above.

Compare the absolute value of the calculated t-value with the critical value: reject H0 if |t| exceeds it, otherwise fail to reject H0.

Note: The sample sizes for both samples are required to complete the calculations, but they are not provided in the given information.
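
To show how the procedure would finish once sample sizes are known, the sketch below plugs in n1 = n2 = 30. These sizes are purely hypothetical and are not part of the question, so the resulting t-value is an illustration only. The helper name `pooled_t` is also hypothetical.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample pooled t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled std dev
    t = (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, df

# n1 = n2 = 30 are assumed values; the question does not provide them.
t_stat, df = pooled_t(85, 6, 30, 82, 5, 30)
print(f"t = {t_stat:.2f}, df = {df}")
```

With these assumed sizes, the t-value is about 2.10 with df = 58. A t-table gives a two-tailed critical value of roughly 2.66 at the 0.01 level for that df, so H0 would not be rejected. Larger samples could change that conclusion, which is why the actual sample sizes matter.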



# Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

Given:
Population mean (μ) = 60 (stated in the question; not used in the interval formula)
Population standard deviation (σ) = 8
Sample mean (x̄) = 65
Sample size (n) = 50
Desired confidence level = 90% (corresponding Z-score is approximately 1.645)

The formula for the confidence interval of a population mean (assuming a large sample or when the population standard deviation is known) is:

Confidence Interval = x̄ ± Z * (σ/√n)

Calculations:
Confidence Interval = 65 ± 1.645 * (8/√50)

Confidence Interval = 65 ± 1.645 * (8/7.071)

Confidence Interval = 65 ± 1.645 * 1.1314

Confidence Interval = 65 ± 1.861

Lower bound of the confidence interval = 65 - 1.861 ≈ 63.14
Upper bound of the confidence interval = 65 + 1.861 ≈ 66.86

Interpretation: We can be 90% confident that the true population mean lies between approximately 63.14 and 66.86.

# Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

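Null Hypothesis (H0): Caffeine has no significant effect on reaction time (μ = 0).
Alternative Hypothesis (Ha): Caffeine has a significant effect on reaction time (μ ≠ 0, two-tailed test).

Note: The question does not give a baseline reaction time without caffeine, so μ = 0 is used here only as a placeholder null value. In practice, the null value would be the mean reaction time without caffeine, or the test would be run on the change in reaction time.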

Given:
Sample mean (x̄) = 0.25 seconds
Sample standard deviation (s) = 0.05 seconds
Sample size (n) = 30

Assumptions: The population is approximately normally distributed, and the sample is a random sample.

Test statistic for t-test:
t = (x̄ - μ) / (s/√n)

Calculations:
t = (0.25 - 0) / (0.05/√30)
t ≈ 0.25 / (0.05/5.477)
t ≈ 0.25 / 0.009129
t ≈ 27.39

Degrees of freedom (df) = n - 1 = 30 - 1 = 29

Critical value for a two-tailed t-test at the 0.10 significance level (90% confidence level) with df = 29 is approximately ±1.699.

Conclusion: The calculated t-value (27.39) is much larger in magnitude than the critical value (1.699). Therefore, we reject the null hypothesis (H0) and conclude that caffeine has a significant effect on reaction time at the 90% confidence level.