## Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

### Answer:
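The key difference is which reference distribution the test statistic follows. A **z-test** uses the standard normal distribution and applies when the population standard deviation σ is known, or when the sample is large enough (a common rule of thumb is n ≥ 30) that the sample standard deviation is a reliable estimate of σ. A **t-test** uses the t distribution (with n − 1 degrees of freedom for a one-sample test) and applies when σ is unknown and must be estimated from the sample. This matters most for small samples (typically n < 30).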

### Example scenario for t-test:
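Comparing the means of two small independent samples (for example, test scores from two classes taught with different methods) when the population standard deviation is unknown, to test whether the underlying population means differ.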

### Example scenario for z-test:
Comparing a sample mean with a known population mean, when the population standard deviation is known or the sample is large (n ≥ 30), to determine whether the difference is significant.
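To see why sample size matters for the choice, the sketch below compares two-sided 95% critical values of the t distribution (df = n − 1) with the standard normal value. The sample sizes are illustrative assumptions; it uses `scipy.stats`, as the later examples do.

```python
import scipy.stats as stats

# Two-sided 95% critical value from the standard normal distribution (z-test)
z_crit = stats.norm.ppf(0.975)

# Matching t critical values for a few illustrative sample sizes (df = n - 1)
sample_sizes = [5, 10, 30, 100]
t_crits = {n: stats.t.ppf(0.975, df=n - 1) for n in sample_sizes}

print(f"z: {z_crit:.3f}")
for n, t_crit in t_crits.items():
    print(f"t with n = {n:>3}: {t_crit:.3f}")
```

The t critical values (about 2.776, 2.262, 2.045 and 1.984) are always larger than z ≈ 1.960 and approach it as n grows. This is why n ≥ 30 works as a rough cut-off.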

## Q2: Differentiate between one-tailed and two-tailed tests

### Answer:
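- A **one-tailed test** checks for an effect in one pre-specified direction (H1: μ > μ0, or H1: μ < μ0), so the entire significance level α sits in one tail of the distribution.
- A **two-tailed test** checks for a difference in either direction (H1: μ ≠ μ0), so α is split equally between the two tails (α/2 in each).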
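The practical consequence is that the same test statistic gives different p-values under the two set-ups. The sketch below assumes a hypothetical observed z statistic of 1.8 and α = 0.05.

```python
import scipy.stats as stats

z = 1.8  # hypothetical observed z statistic
alpha = 0.05

# One-tailed (H1: mu > mu0): probability of a value at least this large
p_one_tailed = stats.norm.sf(z)
# Two-tailed (H1: mu != mu0): count equally extreme values in both tails
p_two_tailed = 2 * stats.norm.sf(abs(z))

print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

Here the one-tailed p-value (about 0.036) is below 0.05 while the two-tailed p-value (about 0.072) is not. That is why the direction of the test has to be fixed before looking at the data.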


## Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

### Answer:
- **Type 1 Error:** Rejecting the null hypothesis when it is actually true (false positive).
- **Type 2 Error:** Failing to reject the null hypothesis when it is actually false (false negative).

### Example of Type 1 Error:
Concluding a new drug is effective when it is not.

### Example of Type 2 Error:
Concluding a new drug is not effective when it is.
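A small simulation makes the two error types concrete. The sketch below assumes normally distributed data, a one-sample t-test at α = 0.05, 25 observations per simulated study, and an illustrative true effect of 0.3 standard deviations for the Type 2 case; none of these numbers come from the questions in this notebook.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed so the simulation is reproducible
alpha = 0.05
n_experiments = 2000  # number of simulated studies
n = 25                # observations per study (illustrative)

# Type 1 error: H0 (mean = 0) is true; count how often we wrongly reject it
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n))
null_p = stats.ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type1_rate = np.mean(null_p < alpha)

# Type 2 error: H0 is false (true mean = 0.3); count how often we fail to reject it
alt_samples = rng.normal(loc=0.3, scale=1.0, size=(n_experiments, n))
alt_p = stats.ttest_1samp(alt_samples, popmean=0.0, axis=1).pvalue
type2_rate = np.mean(alt_p >= alpha)

print(f"Type 1 error rate: {type1_rate:.3f} (close to alpha = {alpha})")
print(f"Type 2 error rate: {type2_rate:.3f}")
```

The Type 1 rate lands near α by construction. The Type 2 rate depends on the true effect size and the sample size, and increasing n lowers it.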

## Q4: Explain Bayes's theorem with an example.

### Answer:
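Bayes's theorem updates the probability of a hypothesis A after observing evidence B. It combines the prior probability P(A) with the likelihood P(B|A) of the evidence under that hypothesis to give the posterior probability P(A|B).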

### Formula:
\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

### Example:
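The probability that a patient has a disease (A) given a positive test result (B). The test's sensitivity P(B|A) and the disease's prevalence P(A) are usually known, but the quantity of interest is P(A|B); Q6 works through this calculation with numbers.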
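In a diagnostic-testing problem like this one, the denominator P(B), the overall probability of a positive test, is usually not given directly. The law of total probability expands it over the two cases (disease or no disease), giving the form used in the Q6 calculation:

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A)} \]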

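## Q5: What is a confidence interval? How is it calculated? Explain with an example.

### Answer:
A confidence interval is a range of values, computed from sample data, that is expected to contain the true population parameter at a stated confidence level (for example, 95%). For a population mean with known population standard deviation σ, it is

\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]

where x̄ is the sample mean, n the sample size and z_{α/2} the standard normal critical value (about 1.96 for 95% confidence). When σ is unknown, the sample standard deviation s replaces σ and the critical value comes from the t distribution with n − 1 degrees of freedom.

### Example:
A 95% confidence interval for a sample mean of 50, with standard deviation 5 and sample size 30: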
```python
# Example calculation of a 95% confidence interval for a sample mean,
# treating the standard deviation of 5 as the known population value (z-interval)
import numpy as np
import scipy.stats as stats

mean = 50
std_dev = 5
n = 30
confidence_level = 0.95
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)  # about 1.96 for 95%
margin_of_error = z_score * (std_dev / np.sqrt(n))
confidence_interval = (mean - margin_of_error, mean + margin_of_error)
confidence_interval
```

```
(48.210805856282846, 51.789194143717154)
```
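The "95%" describes the procedure, not any single interval: if we sampled repeatedly and built an interval each time, about 95% of those intervals would contain the true mean. The simulation below checks this, assuming a normal population whose true mean (50) and standard deviation (5) are known; these are illustrative values matching the example.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)  # fixed seed for reproducibility
true_mean, sigma, n = 50, 5, 30  # illustrative population and sample size
confidence_level = 0.95
n_intervals = 2000

z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_score * sigma / np.sqrt(n)

# Draw many samples, build a z-interval around each sample mean,
# and count how many of those intervals contain the true mean
sample_means = rng.normal(true_mean, sigma, size=(n_intervals, n)).mean(axis=1)
covered = np.abs(sample_means - true_mean) <= margin_of_error
coverage = covered.mean()

print(f"{coverage:.1%} of the intervals contain the true mean")
```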

## Q6: Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

### Example:
- Prior probability of having the disease (P(A)) = 0.01
- Probability of a positive test given the disease (P(B|A)) = 0.95
- Probability of a positive test given no disease (P(B|not A)) = 0.1



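### Calculation:

```python
P_A = 0.01  # Prior probability of having the disease
P_B_given_A = 0.95  # Probability of a positive test given the disease (sensitivity)
P_B_given_not_A = 0.1  # Probability of a positive test given no disease (false-positive rate)
P_not_A = 1 - P_A

# Law of total probability: overall probability of a positive test
P_B = P_B_given_A * P_A + P_B_given_not_A * P_not_A
# Bayes' theorem: probability of disease given a positive test
P_A_given_B = (P_B_given_A * P_A) / P_B
P_A_given_B
```

```
0.08755760368663594
```

Even after a positive test, the probability of actually having the disease is only about 8.8%. Because the disease is rare, false positives among the 99% of people without it (0.1 × 0.99 = 0.099) outnumber true positives (0.95 × 0.01 = 0.0095) by roughly ten to one.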
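Bayes' theorem can be applied repeatedly: the posterior after one piece of evidence becomes the prior for the next. The sketch below wraps the calculation in a hypothetical helper, `bayes_posterior`, and assumes that two repeat tests are independent given the patient's disease status.

```python
def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | evidence) using Bayes' theorem and the law of total probability."""
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


sensitivity = 0.95         # P(positive | disease)
false_positive_rate = 0.1  # P(positive | no disease)

# First positive test: start from the 1% prevalence
after_first = bayes_posterior(0.01, sensitivity, false_positive_rate)
# Second positive test: the first posterior becomes the new prior
# (assumes the two test results are independent given disease status)
after_second = bayes_posterior(after_first, sensitivity, false_positive_rate)

print(f"after one positive test:  {after_first:.3f}")
print(f"after two positive tests: {after_second:.3f}")
```

The first value reproduces the 8.8% above; a second positive result raises the probability to about 47.7%.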

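## Q7: Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

The question does not give the sample size; the calculation below assumes n = 30 and treats 5 as the known population standard deviation.

```python
import numpy as np
import scipy.stats as stats

mean = 50
std_dev = 5
n = 30  # assumed sample size (not given in the question)
confidence_level = 0.95
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_score * (std_dev / np.sqrt(n))
confidence_interval = (mean - margin_of_error, mean + margin_of_error)
confidence_interval
```

```
(48.210805856282846, 51.789194143717154)
```

We can be 95% confident that the true population mean lies between about 48.21 and 51.79. More precisely, if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true mean.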
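If 5 is instead the sample standard deviation (population σ unknown), a t-interval is the appropriate choice. The sketch below computes both intervals with SciPy's `norm.interval` and `t.interval`, keeping the assumed n = 30.

```python
import numpy as np
import scipy.stats as stats

mean, std_dev, n = 50, 5, 30  # n = 30 is an assumption, as above
standard_error = std_dev / np.sqrt(n)

# z-interval: treats 5 as the known population standard deviation
z_interval = stats.norm.interval(0.95, loc=mean, scale=standard_error)
# t-interval: treats 5 as a sample standard deviation (population SD unknown)
t_interval = stats.t.interval(0.95, n - 1, loc=mean, scale=standard_error)

print("z-interval:", z_interval)
print("t-interval:", t_interval)
```

The t-interval (about 48.13 to 51.87) is slightly wider because the t distribution with 29 degrees of freedom has heavier tails than the standard normal.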

## Q8: What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

### Answer:
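The margin of error is the half-width of a confidence interval: the interval extends that far below and above the sample statistic. For a mean with known population standard deviation σ,

\[ E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]

so the margin of error shrinks in proportion to 1/√n. Holding the confidence level and σ fixed, a larger sample gives a narrower interval.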

### Example of how sample size affects margin of error:


```python
import numpy as np
import scipy.stats as stats

std_dev = 5
confidence_level = 0.95
sample_size_small = 30
sample_size_large = 100

z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)  # about 1.96 for 95%
margin_of_error_small = z_score * (std_dev / np.sqrt(sample_size_small))
margin_of_error_large = z_score * (std_dev / np.sqrt(sample_size_large))
margin_of_error_small, margin_of_error_large
```

```
(1.7891941437171572, 0.979981992270027)
```

Increasing the sample size from 30 to 100 shrinks the margin of error from about 1.79 to about 0.98, a factor of √(100/30) ≈ 1.83.
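Rearranging the margin-of-error formula gives the sample size needed for a target margin, n = (z·σ/E)², rounded up. The sketch below applies it with the same σ = 5 and 95% confidence; `required_sample_size` and the target margins are illustrative, not part of the original question.

```python
import math
import scipy.stats as stats

std_dev = 5
confidence_level = 0.95
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)


def required_sample_size(target_margin):
    """Smallest n with z_score * std_dev / sqrt(n) <= target_margin."""
    return math.ceil((z_score * std_dev / target_margin) ** 2)


for target in (2.0, 1.0, 0.5):
    print(f"margin of error <= {target}: n >= {required_sample_size(target)}")
```

Halving the margin of error requires roughly four times as many observations, because the margin shrinks with 1/√n.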

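## Q9: Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

```python
value = 75
population_mean = 70
population_std_dev = 5
# z-score: how many standard deviations the value lies from the mean
z_score = (value - population_mean) / population_std_dev
z_score
```

```
1.0
```

A z-score of 1.0 means the value 75 lies exactly one population standard deviation above the population mean of 70.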
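If the population is additionally assumed to be normal (an assumption the question does not state), the z-score can be converted to a percentile with the standard normal CDF:

```python
import scipy.stats as stats

value, population_mean, population_std_dev = 75, 70, 5
z_score = (value - population_mean) / population_std_dev

# Share of a normal population falling below this value
percentile = stats.norm.cdf(z_score)
print(f"z = {z_score:.1f}: about {percentile:.1%} of values lie below {value}")
```

Under that assumption, roughly 84% of the population lies below 75.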

## Q10: In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.



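```python
import numpy as np
import scipy.stats as stats

sample_mean = 6
sample_std_dev = 2.5
n = 50
population_mean = 0  # Null hypothesis: the drug produces no weight loss
alpha = 1 - 0.95  # significance level for a 95% confidence level
# One-sample t statistic from the summary statistics, with n - 1 degrees of freedom
t_statistic = (sample_mean - population_mean) / (sample_std_dev / np.sqrt(n))
p_value = 2 * stats.t.sf(abs(t_statistic), df=n - 1)  # two-sided p-value
t_statistic, p_value, p_value < alpha
```

Here t = 6 / (2.5 / √50) ≈ 16.97 with 49 degrees of freedom, and the p-value is far below α = 0.05, so we reject the null hypothesis: the average weight loss of 6 pounds is statistically significant.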
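Since "effective" means weight *loss*, the question can also be framed as a one-sided test (H1: mean loss > 0) and decided with a critical value instead of a p-value. The sketch below uses the same summary statistics and α = 0.05.

```python
import numpy as np
import scipy.stats as stats

sample_mean, sample_std_dev, n = 6, 2.5, 50
population_mean = 0  # H0: no weight loss
alpha = 0.05
df = n - 1

t_statistic = (sample_mean - population_mean) / (sample_std_dev / np.sqrt(n))

# One-sided alternative H1: mean weight loss > 0
t_critical = stats.t.ppf(1 - alpha, df)  # reject H0 if t_statistic > t_critical
p_value_one_sided = stats.t.sf(t_statistic, df)

print(f"t = {t_statistic:.2f}, critical value = {t_critical:.3f}")
print(f"one-sided p-value = {p_value_one_sided:.2e}, reject H0: {t_statistic > t_critical}")
```

The critical value with 49 degrees of freedom is about 1.677, far below t ≈ 16.97, so the one-sided test reaches the same conclusion.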

## Q11: In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.



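```python
import numpy as np
import scipy.stats as stats

p_hat = 0.65  # sample proportion of satisfied respondents
n = 500
confidence_level = 0.95
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
# Standard error of a proportion: sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_score * np.sqrt((p_hat * (1 - p_hat)) / n)
confidence_interval = (p_hat - margin_of_error, p_hat + margin_of_error)
confidence_interval
```

```
(0.6081925393809212, 0.6918074606190788)
```

We can be 95% confident that the true proportion of people satisfied with their job lies between about 60.8% and 69.2%.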
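This interval relies on the normal approximation to the binomial. The sketch below checks a common rule of thumb for it (at least 10 expected successes and failures) and, as an illustrative extension, computes the sample size that would give a margin of error of 3 percentage points using p̂ = 0.65 as the planning value.

```python
import math
import scipy.stats as stats

p_hat, n = 0.65, 500
z_score = stats.norm.ppf(0.975)  # 95% two-sided

# Rule-of-thumb check for the normal approximation: at least 10 successes and failures
successes, failures = n * p_hat, n * (1 - p_hat)
approximation_ok = successes >= 10 and failures >= 10

# Sample size for a +/-3 percentage-point margin: n = z^2 * p(1 - p) / E^2, rounded up
target_margin = 0.03
n_needed = math.ceil(z_score**2 * p_hat * (1 - p_hat) / target_margin**2)

print(f"successes = {successes:.0f}, failures = {failures:.0f}, approximation OK: {approximation_ok}")
print(f"n needed for +/-3 points: {n_needed}")
```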

## Q12: A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.



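```python
import scipy.stats as stats

mean_A = 85
std_dev_A = 6
n_A = 30  # sample sizes are not given in the question; 30 per group is assumed
mean_B = 82
std_dev_B = 5
n_B = 30
alpha = 0.01  # significance level from the question
# Welch's two-sample t-test (does not assume equal variances)
t_statistic, p_value = stats.ttest_ind_from_stats(mean1=mean_A, std1=std_dev_A, nobs1=n_A, mean2=mean_B, std2=std_dev_B, nobs2=n_B, equal_var=False)
t_statistic, p_value
```

```
(2.1038606199548298, 0.03987998118234142)
```

With the assumed 30 students per group, t ≈ 2.10 and p ≈ 0.040. Because p > 0.01, we fail to reject the null hypothesis at the 1% significance level: the data do not show a significant difference between the two teaching methods (although the difference would be significant at the 5% level).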
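To show what `ttest_ind_from_stats(..., equal_var=False)` computes, the sketch below reproduces Welch's test by hand: the standard error of the difference in means, the Welch–Satterthwaite degrees of freedom, and the two-sided p-value. It keeps the assumed group sizes of 30.

```python
import numpy as np
import scipy.stats as stats

mean_A, std_dev_A, n_A = 85, 6, 30  # group sizes of 30 are an assumption
mean_B, std_dev_B, n_B = 82, 5, 30

# Welch's t statistic: difference in means over its standard error
var_A, var_B = std_dev_A**2 / n_A, std_dev_B**2 / n_B
t_statistic = (mean_A - mean_B) / np.sqrt(var_A + var_B)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (var_A + var_B) ** 2 / (var_A**2 / (n_A - 1) + var_B**2 / (n_B - 1))
p_value = 2 * stats.t.sf(abs(t_statistic), df)

# Cross-check against SciPy's implementation
_, p_scipy = stats.ttest_ind_from_stats(mean_A, std_dev_A, n_A, mean_B, std_dev_B, n_B, equal_var=False)
print(f"t = {t_statistic:.4f}, df = {df:.1f}, p = {p_value:.4f} (SciPy: {p_scipy:.4f})")
```

The degrees of freedom come out at about 56.2, slightly below the 58 a pooled-variance test would use.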

## Q13: A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.



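```python
import numpy as np
import scipy.stats as stats

sample_mean = 65
population_std_dev = 8  # the population standard deviation is known, so use z
n = 50
confidence_level = 0.90
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)  # about 1.645 for 90%
margin_of_error = z_score * (population_std_dev / np.sqrt(n))
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
confidence_interval
```

This gives approximately (63.14, 66.86): we can be 90% confident that the true population mean lies between about 63.14 and 66.86.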
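Because the population standard deviation is known, the z-interval can also be obtained in a single call with SciPy's `norm.interval`, which makes a convenient cross-check. The sketch also tests whether the stated population mean of 60 falls inside the interval.

```python
import numpy as np
import scipy.stats as stats

population_mean, population_std_dev = 60, 8
sample_mean, n = 65, 50

# 90% z-interval via SciPy's interval helper
interval = stats.norm.interval(0.90, loc=sample_mean, scale=population_std_dev / np.sqrt(n))

# Does the stated population mean fall inside the interval?
contains_population_mean = interval[0] <= population_mean <= interval[1]
print("90% interval:", interval)
print("contains 60:", contains_population_mean)
```

It does not, which suggests this sample is unlikely to come from a population with mean 60; this is equivalent to rejecting that hypothesis in a two-sided z-test at α = 0.10.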

## Q14: In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.



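```python
import numpy as np
import scipy.stats as stats

sample_mean = 0.25
sample_std_dev = 0.05
n = 30
population_mean = 0.3  # Assumed baseline reaction time under the null hypothesis (no caffeine effect)
alpha = 1 - 0.90  # significance level for a 90% confidence level
# One-sample t statistic from the summary statistics, with n - 1 degrees of freedom
t_statistic = (sample_mean - population_mean) / (sample_std_dev / np.sqrt(n))
p_value = 2 * stats.t.sf(abs(t_statistic), df=n - 1)  # two-sided p-value
t_statistic, p_value, p_value < alpha
```

Here t = (0.25 − 0.3) / (0.05 / √30) = −√30 ≈ −5.48 with 29 degrees of freedom, and the p-value is far below α = 0.10, so we reject the null hypothesis: the mean reaction time is significantly lower than the assumed 0.3-second baseline.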
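If the research question is specifically whether caffeine *lowers* reaction time, a one-sided test with a lower-tail critical value fits better. The sketch below keeps the assumed baseline of 0.3 seconds (not given in the question) and α = 0.10.

```python
import numpy as np
import scipy.stats as stats

sample_mean, sample_std_dev, n = 0.25, 0.05, 30
population_mean = 0.3  # assumed baseline reaction time under H0 (not given in the question)
alpha = 1 - 0.90       # significance level for a 90% confidence level
df = n - 1

t_statistic = (sample_mean - population_mean) / (sample_std_dev / np.sqrt(n))

# One-sided alternative H1: caffeine lowers mean reaction time
t_critical = stats.t.ppf(alpha, df)  # lower-tail critical value
p_value_one_sided = stats.t.cdf(t_statistic, df)

print(f"t = {t_statistic:.3f}, critical value = {t_critical:.3f}")
print(f"one-sided p-value = {p_value_one_sided:.2e}, reject H0: {t_statistic < t_critical}")
```

The lower-tail critical value with 29 degrees of freedom is about −1.311, and t ≈ −5.48 falls well beyond it.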