Q1: Difference between t-test and z-test:

Both t-test and z-test are hypothesis tests used to make inferences about population parameters based on sample data. However, they differ in the scenarios for which they are appropriate.

t-test: The t-test is used when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small (typically n < 30). It assumes that the data are approximately normally distributed. The t-test uses the t-distribution, with n - 1 degrees of freedom for a one-sample test, to calculate critical values and p-values.
Example scenario: You want to compare the mean test scores of two groups of students, where the sample sizes are small, and the population standard deviations are not known.

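z-test: The z-test is used when the population standard deviation is known, or when the sample size is large (typically n ≥ 30) so that the sample standard deviation is a reliable estimate of it. It assumes that the sample mean is approximately normally distributed, which holds for normal data or, by the central limit theorem, for large samples. The z-test uses the standard normal distribution (z-distribution) to calculate critical values and p-values.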
Example scenario: You want to test the claim that the average height of a population is 65 inches, and you have a large sample size with known population standard deviation.
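
The practical difference between the two tests shows up in their critical values. The following is a minimal sketch, assuming a two-sided 95% confidence level and a handful of illustrative sample sizes that are not part of the question; it compares the z critical value with t critical values as n grows.

```python
from scipy import stats

confidence_level = 0.95
upper_quantile = 1 - (1 - confidence_level) / 2  # 0.975 for a two-sided 95% level

# The z critical value does not depend on the sample size
z_critical = stats.norm.ppf(upper_quantile)

# t critical values for a few illustrative sample sizes (df = n - 1)
sample_sizes = [5, 15, 30, 100, 1000]
t_criticals = {n: stats.t.ppf(upper_quantile, df=n - 1) for n in sample_sizes}

print(f"z critical: {z_critical:.3f}")
for n, t_crit in t_criticals.items():
    print(f"n = {n:>4}: t critical = {t_crit:.3f}")
```

The t critical value is noticeably larger than the z value for small samples and approaches it as the sample size increases, which is why the two tests give nearly identical answers for large n.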

Q2: One-tailed vs. two-tailed tests:

One-tailed test: In a one-tailed test, the alternative hypothesis (Ha) is directional and predicts an effect or difference in one specific direction. It is used when researchers have a specific expectation about the outcome and are interested in knowing if the effect is larger or smaller than a certain value.
Example: A researcher predicts that a new drug will improve test scores, and the one-tailed test is used to determine if the scores are significantly higher after taking the drug.

Two-tailed test: In a two-tailed test, the alternative hypothesis (Ha) is non-directional, and it predicts a difference or effect in either direction. It is used when researchers are interested in knowing if there is a significant difference, but they do not have a specific expectation about the direction.
Example: A researcher wants to test if there is a significant difference in reaction times between two groups, but they are not sure which group will be faster or slower, so they use a two-tailed test.
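
The choice of tails changes the p-value for the same test statistic. The sketch below assumes a hypothetical z-statistic of 1.8 and a significance level of 0.05, both chosen only for illustration.

```python
from scipy import stats

z_statistic = 1.8  # hypothetical test statistic, for illustration only
alpha = 0.05

# One-tailed p-value (Ha: the effect is greater than the null value)
p_one_tailed = stats.norm.sf(z_statistic)

# Two-tailed p-value (Ha: the effect differs from the null value in either direction)
p_two_tailed = 2 * stats.norm.sf(abs(z_statistic))

print(f"One-tailed p-value: {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"Two-tailed p-value: {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

With these assumed numbers the one-tailed test rejects the null hypothesis while the two-tailed test does not, so the direction of the alternative should be fixed before looking at the data.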

Q3: Type 1 and Type 2 errors in hypothesis testing:

Type 1 error (False Positive): It occurs when the null hypothesis (H0) is true, but it is incorrectly rejected in favor of the alternative hypothesis (Ha). In other words, the test incorrectly detects a significant effect that does not exist. The probability of a Type 1 error is denoted by alpha (α) and is the chosen significance level of the test.
Example: A medical test incorrectly indicates that a healthy person has a disease.

Type 2 error (False Negative): It occurs when the null hypothesis (H0) is false, but the test fails to reject it. In other words, the test fails to detect an effect that actually exists. The probability of a Type 2 error is denoted by beta (β), and 1 - β is the power of the test.
Example: A medical test fails to detect a disease in a person who actually has the disease.
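
The two error rates can be computed directly for a simple case. This is a rough sketch for a one-tailed z-test; the null mean, true mean, standard deviation, and sample size are all hypothetical values picked for illustration.

```python
from scipy import stats

# Hypothetical setup, chosen only for illustration
mu_null = 0.0   # mean under H0
mu_true = 0.5   # true mean when Ha holds
sigma = 1.0     # known population standard deviation
n = 25
alpha = 0.05    # Type 1 error rate, fixed in advance by the analyst

standard_error = sigma / n ** 0.5

# One-tailed test (Ha: mean > mu_null): reject H0 when the sample mean exceeds this cutoff
cutoff = mu_null + stats.norm.ppf(1 - alpha) * standard_error

# Type 2 error rate: probability the sample mean stays below the cutoff when mu_true holds
beta = stats.norm.cdf((cutoff - mu_true) / standard_error)
power = 1 - beta

print(f"alpha = {alpha:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```

Lowering alpha moves the cutoff further from the null mean, which raises beta; for a fixed sample size the two error rates trade off against each other.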

Q4: Bayes's Theorem with an example:

Bayes's Theorem is a fundamental result in probability theory that updates the probability of an event in light of new evidence, by combining prior knowledge with the likelihood of that evidence.

The formula for Bayes's Theorem is:
P(A|B) = (P(B|A) * P(A)) / P(B)

Where:

P(A|B) is the conditional probability of event A happening given that event B has occurred.
P(B|A) is the conditional probability of event B happening given that event A has occurred.
P(A) is the probability of event A happening.
P(B) is the probability of event B happening.

Example: Consider a medical test for a rare disease, where the test correctly identifies the disease 95% of the time (P(positive test|disease) = 0.95), returns a false positive for 5% of people without the disease (P(positive test|no disease) = 0.05), and the probability of a person having the disease is 0.01 (P(disease) = 0.01). Calculate the probability that a person actually has the disease if they test positive.

Solution:
P(disease|positive test) = (P(positive test|disease) * P(disease)) / P(positive test)
P(disease|positive test) = (0.95 * 0.01) / P(positive test)

P(positive test) is the sum of the probabilities of a positive test in the disease and non-disease cases (the law of total probability):
P(positive test) = P(positive test|disease) * P(disease) + P(positive test|no disease) * P(no disease)
P(positive test) = (0.95 * 0.01) + (0.05 * 0.99) = 0.0095 + 0.0495 = 0.059

Now, substitute this value back into the Bayes's Theorem equation:
P(disease|positive test) = (0.95 * 0.01) / [(0.95 * 0.01) + (0.05 * 0.99)]
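P(disease|positive test) = 0.0095 / 0.059 ≈ 0.16

So, the probability that a person actually has the disease given a positive test result is approximately 0.16, or 16%. The value is low despite the accurate test because the disease is rare, so false positives among healthy people outnumber true positives.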
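
The same calculation can be written as a small function. This is a minimal sketch; the function name `posterior_probability` and its parameter names are illustrative, and the false-positive rate of 0.05 is the value used for P(positive test|no disease) in the calculation above.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes's Theorem."""
    # Law of total probability: P(positive test)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


p_disease_given_positive = posterior_probability(
    prior=0.01,                # P(disease)
    sensitivity=0.95,          # P(positive test | disease)
    false_positive_rate=0.05,  # P(positive test | no disease)
)
print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")
```

Calling the function with a larger prior shows how strongly the result depends on how common the disease is.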

Q5: Confidence interval and its calculation with an example:

A confidence interval is a range of values, computed from sample data, that is likely to contain a population parameter (e.g., mean, proportion) at a chosen level of confidence. It is used to quantify the uncertainty associated with estimating a population parameter from a sample.

Example: Let's calculate the 95% confidence interval for the population mean weight of a certain product based on a sample of weights.

Suppose we have a sample of 100 product weights with a sample mean (x̄) of 45 pounds and a sample standard deviation (s) of 3 pounds.

python
import scipy.stats as stats

# Given data
sample_mean = 45
sample_std_dev = 3
sample_size = 100
confidence_level = 0.95

# Calculate the standard error
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the margin of error using the z-critical value (approximately 1.96 at 95% confidence)
z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_critical * standard_error

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the result
print(f"95% Confidence Interval: ({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")

Interpretation: We are 95% confident that the true population mean weight of the product lies within the range of (44.41, 45.59) pounds.

Q6: Using Bayes' Theorem to calculate probability with prior knowledge and new evidence:

Let's consider a scenario with a deck of cards. You know that the deck contains 10 red cards, 15 blue cards, and 5 green cards. Now, you draw one card randomly, and you want to calculate the probability that the card is blue given the prior knowledge of the card's color distribution.

python
# Prior probabilities P(A): probability of each color before any evidence
P_red = 10 / 30
P_blue = 15 / 30
P_green = 5 / 30

# Likelihood P(B|A): the only evidence here is that a card was drawn from this deck,
# which is certain whatever the color, so the likelihood is 1 for every color
P_evidence_given_color = 1.0

# Total probability of the evidence P(B), summed over all colors
P_evidence = P_evidence_given_color * (P_red + P_blue + P_green)

# Calculate the conditional probability P(A|B) using Bayes' Theorem
P_blue_given_evidence = (P_evidence_given_color * P_blue) / P_evidence

print("Probability of drawing a blue card given the evidence:", P_blue_given_evidence)

In this example, the evidence (knowledge of the color distribution) carries no additional information about the drawn card, so the posterior equals the prior: the probability of drawing a blue card is 0.5 or 50%.

Q7: Calculating the 95% confidence interval for a sample:

Given data:
Sample mean (x̄) = 50
Sample standard deviation (s) = 5
Sample size (n) = not given

The confidence interval cannot be computed without a sample size, because the standard error is s / √n. The sample size also determines which distribution to use: the t-distribution for small samples (typically n < 30), or the z-distribution for large ones. The calculation below assumes n = 30 purely for illustration and uses the z-distribution at a 95% confidence level (z-critical value ≈ 1.96).

python
import scipy.stats as stats

# Given data
sample_mean = 50
sample_std_dev = 5
confidence_level = 0.95

# ASSUMPTION: the sample size is not given; n = 30 is an illustrative value only
sample_size = 30

# Large sample size (n ≥ 30), so use the z-distribution
z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_critical * (sample_std_dev / (sample_size ** 0.5))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the result
print(f"95% Confidence Interval: ({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")

Interpretation: Under the assumed sample size of n = 30, we are 95% confident that the true population mean lies within the range of (48.21, 51.79) units. A different sample size would change the width of the interval.

Q8: Margin of error in a confidence interval and its relation to sample size:

The margin of error is the amount added to and subtracted from the point estimate to create the confidence interval. It quantifies the sampling uncertainty in estimating a population parameter from a sample.

The formula for the margin of error in a confidence interval is:
Margin of Error = Critical Value * Standard Error

The critical value depends on the chosen confidence level and the distribution used (e.g., z-critical value for the z-distribution, t-critical value for the t-distribution).
The standard error depends on the sample standard deviation and the sample size; for a mean, it is s / √n.

As the sample size increases:

The standard error decreases in proportion to 1 / √n, which means the estimate becomes more precise.
The z-critical value remains constant for a given confidence level, while the t-critical value shrinks slightly toward it.
Therefore, a larger sample size results in a smaller margin of error, indicating a more precise estimate.

Example: Suppose you want to estimate the average age of students at a university. With a small sample size, your margin of error may be large, and the confidence interval may be wide, making the estimate less precise. However, with a larger sample size, your margin of error would decrease, resulting in a more precise estimate of the average age.
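
The relationship between sample size and margin of error can be checked numerically. The sketch below assumes a hypothetical sample standard deviation of 5 and the approximate 95% z-critical value of 1.96; the helper name `margin_of_error` is illustrative.

```python
z_critical = 1.96   # approximate z-critical value for 95% confidence
sample_std_dev = 5  # hypothetical sample standard deviation


def margin_of_error(n, s=sample_std_dev, critical=z_critical):
    """Margin of error for a mean: critical value times s / sqrt(n)."""
    return critical * s / n ** 0.5


# Each sample size is four times the previous one
margins = {n: margin_of_error(n) for n in (25, 100, 400)}
for n, margin in margins.items():
    print(f"n = {n:>3}: margin of error = {margin:.2f}")
```

Quadrupling the sample size halves the margin of error, because the standard error shrinks with the square root of n rather than with n itself.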

Q9: Calculating the z-score for a data point:

Given data:
Data point value = 75
Population mean (μ) = 70
Population standard deviation (σ) = 5

The z-score measures the number of standard deviations a data point is from the population mean.

python
# Given data
data_point = 75
population_mean = 70
population_std_dev = 5

# Calculate the z-score
z_score = (data_point - population_mean) / population_std_dev

print("Z-score:", z_score)

Interpretation: The z-score is 1.0, so the data point with a value of 75 is 1 standard deviation above the population mean.
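
A z-score can also be turned into a percentile. This is a small sketch that assumes the population is normally distributed, which the question does not state.

```python
from scipy import stats

data_point = 75
population_mean = 70
population_std_dev = 5

z_score = (data_point - population_mean) / population_std_dev

# ASSUMPTION: the population is normal, so the standard normal CDF gives the
# fraction of the population that falls below the data point
fraction_below = stats.norm.cdf(z_score)

print(f"Z-score: {z_score:.1f}")
print(f"Fraction of the population below {data_point}: {fraction_below:.4f}")
```

Under that assumption, a z-score of 1 places the data point at roughly the 84th percentile.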

Q10: Hypothesis test for the effectiveness of a weight loss drug:

Null Hypothesis (H0): The new weight loss drug has no significant effect (mean weight loss is 0).
Alternative Hypothesis (Ha): The new weight loss drug is significantly effective (mean weight loss is not 0).

Given data:
Sample mean weight loss (x̄) = 6 pounds
Sample standard deviation (s) = 2.5 pounds
Sample size (n) = 50
Significance level (alpha) = 0.05

python
import scipy.stats as stats

# Given data
sample_mean = 6
sample_std_dev = 2.5
sample_size = 50
population_mean_hypothesis = 0
alpha = 0.05

# Calculate the standard error
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean_hypothesis) / standard_error

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate the critical value (two-tailed test)
critical_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

# Compare the t-statistic with the critical value to make the decision
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. The weight loss drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence that the weight loss drug is effective.")
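
The critical-value comparison above gives a yes-or-no decision; a p-value shows how strong the evidence is. The following sketch recomputes the t-statistic from the given data and derives a two-tailed p-value with `stats.t.sf`.

```python
from scipy import stats

# Given data from the question
sample_mean = 6
sample_std_dev = 2.5
sample_size = 50
population_mean_hypothesis = 0
alpha = 0.05

standard_error = sample_std_dev / (sample_size ** 0.5)
t_statistic = (sample_mean - population_mean_hypothesis) / standard_error
degrees_of_freedom = sample_size - 1

# Two-tailed p-value: probability of a t-statistic at least this extreme under H0
p_value = 2 * stats.t.sf(abs(t_statistic), degrees_of_freedom)

print(f"t-statistic: {t_statistic:.2f}")
print(f"p-value below alpha: {p_value < alpha}")
```

The t-statistic is about 16.97, far beyond any conventional critical value, so the p-value is well below 0.05 and the null hypothesis is rejected.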

Q11: Confidence interval for the proportion of people satisfied with their job:

Given data:
Sample proportion (p̂) = 0.65 (65%)
Sample size (n) = 500
Confidence level = 95% (alpha = 0.05)

python
import scipy.stats as stats

# Given data
sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95

# Calculate the standard error
standard_error = (sample_proportion * (1 - sample_proportion) / sample_size) ** 0.5

# Calculate the margin of error (z-critical value for a 95% confidence level is approximately 1.96)
margin_of_error = 1.96 * standard_error

# Calculate the confidence interval
confidence_interval = (sample_proportion - margin_of_error, sample_proportion + margin_of_error)

# Print the result
print(f"95% Confidence Interval for proportion of people satisfied with their job: ({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")

Interpretation: We are 95% confident that the true proportion of people satisfied with their job lies within the range of (0.61, 0.69).

Q12: Hypothesis test for the difference in student performance using t-test:

Null Hypothesis (H0): There is no significant difference in student performance between the two teaching methods (μ1 - μ2 = 0).
Alternative Hypothesis (Ha): There is a significant difference in student performance between the two teaching methods (μ1 - μ2 ≠ 0).

Given data:
Sample A: Mean score (x̄1) = 85, Standard deviation (s1) = 6, Sample size (n1) = not given
Sample B: Mean score (x̄2) = 82, Standard deviation (s2) = 5, Sample size (n2) = not given
Significance level (alpha) = 0.01

The test cannot be carried out without the sample sizes, so the code below assumes n1 = n2 = 30 purely for illustration. The conclusion depends on this choice.

python
import scipy.stats as stats

# Given data for Sample A
sample_mean1 = 85
sample_std_dev1 = 6

# Given data for Sample B
sample_mean2 = 82
sample_std_dev2 = 5

alpha = 0.01

# ASSUMPTION: the sample sizes are not given; equal sizes of 30 are illustrative values only
sample_size1 = 30
sample_size2 = 30

# Pooled standard deviation (this simple average of the variances is valid because n1 = n2)
pooled_std_dev = ((sample_std_dev1**2 + sample_std_dev2**2) / 2)**0.5

# Calculate the standard error for the difference in means
standard_error_diff = pooled_std_dev * (1 / sample_size1 + 1 / sample_size2)**0.5

# Calculate the t-statistic for the difference in means
t_statistic = (sample_mean1 - sample_mean2) / standard_error_diff

# Degrees of freedom for the pooled two-sample t-test
degrees_of_freedom = sample_size1 + sample_size2 - 2

# Calculate the critical value (two-tailed test)
critical_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

# Compare the t-statistic with the critical value to make the decision
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence of a difference in student performance.")

Q13: Calculating the 90% confidence interval for the population mean:

Given data:
Sample size (n) = 50
Sample mean (x̄) = 65
Population mean (μ) = 60
Population standard deviation (σ) = 8

python
import scipy.stats as stats

# Given data
sample_size = 50
sample_mean = 65
population_mean = 60
population_std_dev = 8
confidence_level = 0.90

# Calculate the standard error
standard_error = population_std_dev / (sample_size ** 0.5)

# Calculate the z-critical value for a 90% confidence level
z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_critical * standard_error

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the result
print(f"90% Confidence Interval: ({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")

Interpretation: We are 90% confident that the true population mean lies within the range of (63.14, 66.86) units.
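
The given population mean of 60 plays no part in the confidence interval itself. One possible reading, which is an assumption and not stated in the question, is that it serves as a hypothesized value; under that reading a one-sample z-test could be sketched as follows.

```python
from scipy import stats

# Given data from the question
sample_size = 50
sample_mean = 65
population_std_dev = 8

# ASSUMPTION: the stated population mean is treated as the null-hypothesis value
hypothesized_mean = 60

standard_error = population_std_dev / (sample_size ** 0.5)
z_statistic = (sample_mean - hypothesized_mean) / standard_error

# Two-tailed p-value under the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z_statistic))

print(f"z-statistic: {z_statistic:.2f}")
print(f"p-value: {p_value:.6f}")
```

The z-statistic is about 4.42, which agrees with the confidence interval: 60 lies outside the 90% interval, so the sample mean differs significantly from it at the 10% level.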

Q14: Hypothesis test for the effect of caffeine on reaction time using t-test:

Null Hypothesis (H0): Caffeine has no significant effect on reaction time (mean reaction time difference is 0).
Alternative Hypothesis (Ha): Caffeine has a significant effect on reaction time (mean reaction time difference is not 0).

Given data:
Sample mean reaction time difference (x̄) = 0.25 seconds
Sample standard deviation (s) = 0.05 seconds
Sample size (n) = 30
Significance level (alpha) = 0.10

python
import scipy.stats as stats

# Given data
sample_mean = 0.25
sample_std_dev = 0.05
sample_size = 30
population_mean_hypothesis = 0
alpha = 0.10

# Calculate the standard error
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean_hypothesis) / standard_error

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate the critical value (two-tailed test)
critical_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

# Compare the t-statistic with the critical value to make the decision
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence that caffeine affects reaction time.")