<span style=color:red;font-size:60px>ASSIGNMENT</span>

<span style=color:pink;font-size:60px>STATISTICS ADVANCE-4</span>

<span style=color:green>Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.</span>

Ans-

## Difference between T-Test and Z-Test

In statistics, both t-tests and z-tests are used to make inferences about population parameters based on sample data. The main difference lies in whether the population standard deviation is known and in how large the sample is.

### Z-Test

- **Used when:** The population standard deviation is known.
- **Applicable for:** Large sample sizes (usually n > 30).
- **Formula:** \( Z = \frac{{\bar{X} - \mu}}{{\frac{\sigma}{\sqrt{n}}}} \)
  
### T-Test

- **Used when:** The population standard deviation is unknown and must be estimated from the sample.
- **Applicable for:** Small sample sizes (typically n < 30). It remains valid for larger samples, where its results converge to those of the z-test.
- **Formula (One-sample T-Test):** \( t = \frac{{\bar{X} - \mu}}{{\frac{s}{\sqrt{n}}}} \)

### Example Scenario

Let's consider a scenario where we want to test whether the average height of a sample of students differs significantly from the known average height of the entire student population.



In [5]:
import numpy as np
from scipy import stats
from scipy.stats import ttest_1samp

# Generate sample data: 25 heights drawn from a normal distribution (mean 170, standard deviation 5)
np.random.seed(42)
sample_heights = np.random.normal(loc=170, scale=5, size=25)

# Known population parameters
population_mean = 172
population_std_dev = 8

# Z-Test (population standard deviation is known)
standard_error = population_std_dev / np.sqrt(len(sample_heights))
z_statistic = (sample_heights.mean() - population_mean) / standard_error
p_value_z = 2 * (1 - stats.norm.cdf(np.abs(z_statistic)))  # Two-tailed test
print(f'Z-Statistic: {z_statistic:.4f}, P-Value: {p_value_z:.4f}')

# T-Test (population standard deviation is unknown and estimated from the sample)
t_statistic, p_value_t = ttest_1samp(sample_heights, popmean=population_mean)
print(f'T-Statistic: {t_statistic:.4f}, P-Value: {p_value_t:.4f}')


Z-Statistic: -1.7610, P-Value: 0.0782
T-Statistic: -2.9455, P-Value: 0.0071


<span style=color:green>Q2: Differentiate between one-tailed and two-tailed tests.</span>

Ans-

## One-Tailed vs. Two-Tailed Tests

In hypothesis testing, the choice between a one-tailed and a two-tailed test depends on the nature of the research question and the directionality of the expected effect.

### One-Tailed Test

- **Definition:** A statistical test where the critical region is on only one side of the distribution curve.
- **Use:** Applied when there is a specific directional hypothesis, and the interest is only in one side of the distribution (either an increase or a decrease).
- **Example Scenario:** Testing whether a new drug increases the average response time. The one-tailed hypothesis would be: \( H_0: \mu_{\text{old}} \geq \mu_{\text{new}} \) vs. \( H_1: \mu_{\text{old}} < \mu_{\text{new}} \).

### Two-Tailed Test

- **Definition:** A statistical test where the critical region is on both sides of the distribution curve.
- **Use:** Applied when the research question is non-directional, and the interest is in detecting any significant difference (increase or decrease).
- **Example Scenario:** Testing whether a new teaching method has a different effect on test scores. The two-tailed hypothesis would be: \( H_0: \mu_{\text{method}} = \mu_{\text{control}} \) vs. \( H_1: \mu_{\text{method}} \neq \mu_{\text{control}} \).

### Key Differences

- **Critical Region:** One-tailed tests have a critical region on only one side of the distribution, while two-tailed tests have critical regions on both sides.
- **Directionality:** One-tailed tests are used when there is a specific expected direction of the effect, while two-tailed tests are used when the direction of the effect is not specified.

These choices in hypothesis testing depend on the researcher's understanding of the problem and the specific hypotheses being tested.
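As a rough illustration of how the choice of tail changes the p-value, the sketch below converts a single t-statistic into a one-tailed and a two-tailed p-value. The statistic `t_stat = 2.0` and `df = 20` are made-up values chosen only for illustration; they do not come from any dataset in this assignment.

```python
from scipy import stats

# Hypothetical test statistic and degrees of freedom (illustrative values only)
t_stat = 2.0
df = 20

# Upper-tailed test (H1: mean is greater): area to the right of t_stat
p_one_tailed = stats.t.sf(t_stat, df)

# Two-tailed test (H1: mean is different): area in both tails beyond |t_stat|
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)

print(f"One-tailed p-value: {p_one_tailed:.4f}")
print(f"Two-tailed p-value: {p_two_tailed:.4f}")
```

With these values the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the hypothesis should be fixed before looking at the data.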


<span style=color:green>Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.</span>

Ans-

## Type I and Type II Errors in Hypothesis Testing

In hypothesis testing, errors can occur when making decisions about the null hypothesis (\(H_0\)). These errors are classified into two types: Type I and Type II errors.

### Type I Error (False Positive)

- **Definition:** Rejecting the null hypothesis when it is actually true.
- **Symbol:** \( \alpha \) (alpha)
- **Probability of Type I Error:** The significance level (\(\alpha\)) is the probability of making a Type I error.
- **Example Scenario:** A drug testing scenario where the null hypothesis (\(H_0\)) is that the drug has no effect. A Type I error occurs if we wrongly conclude that the drug is effective and approve it for use when, in reality, it has no effect.

### Type II Error (False Negative)

- **Definition:** Failing to reject the null hypothesis when it is actually false.
- **Symbol:** \( \beta \) (beta)
- **Probability of Type II Error:** \( \beta \). The power of the test is \(1 - \beta\), so a more powerful test has a lower probability of a Type II error.
- **Example Scenario:** A medical test for a rare disease where the null hypothesis (\(H_0\)) is that the patient is healthy. A Type II error occurs if we fail to detect the disease in a patient who actually has it.

### Key Concepts

- **Trade-off:** For a fixed sample size there is a trade-off between Type I and Type II errors: lowering the probability of one type raises the probability of the other. Increasing the sample size is the usual way to reduce both.
- **Significance Level (\(\alpha\)):** Researchers set the significance level to control the probability of Type I errors. Common choices include 0.05 and 0.01.
- **Power of the Test:** The ability of a test to detect a true effect and avoid Type II errors is measured by the power of the test.

Understanding and controlling Type I and Type II errors are crucial in hypothesis testing to ensure the reliability of study results and decision-making.
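The two error rates can also be estimated by simulation. The sketch below is a minimal illustration under assumed settings that are not part of the original question: samples of size 30 from a normal distribution with standard deviation 1, a one-sample t-test of \(H_0: \mu = 0\), and a true mean of 0.5 when \(H_0\) is false.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05       # significance level
n = 30             # assumed sample size per simulated study
n_trials = 2000    # number of simulated studies

def rejection_rate(true_mean):
    """Fraction of simulated samples in which H0: mu = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_trials, n))
    _, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)
    return np.mean(p_values < alpha)

# H0 is true: every rejection is a Type I error
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (assumed true mean 0.5): rejections are correct decisions
power = rejection_rate(true_mean=0.5)
type_2_rate = 1 - power  # failing to reject a false H0

print(f"Estimated Type I error rate: {type_1_rate:.3f}")
print(f"Estimated power: {power:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I error rate should land close to the chosen \(\alpha\), while the Type II error rate depends on the assumed effect size and sample size.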


<span style=color:green>Q4: Explain Bayes's theorem with an example.</span>

Ans-

## Bayes's Theorem

Bayes's theorem is a fundamental concept in probability theory that describes the probability of an event based on prior knowledge of conditions that might be related to the event. It is named after the Reverend Thomas Bayes.

### Formula

Bayes's theorem is mathematically expressed as:

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

where:
- \( P(A|B) \) is the probability of event A given that event B has occurred.
- \( P(B|A) \) is the probability of event B given that event A has occurred.
- \( P(A) \) and \( P(B) \) are the probabilities of events A and B, respectively.

### Example Scenario

Let's consider a medical example to illustrate Bayes's theorem:

- **Scenario:** Suppose there is a rare disease (Event A) that occurs in 1% of the population. A diagnostic test returns a positive result (Event B) for 90% of people who have the disease, and it produces false positives in 5% of people who do not.

- **Given:**
  - \( P(A) \): Probability of having the disease = 0.01 (1% of the population)
  - \( P(B|A) \): Probability of testing positive given the disease = 0.90 (the test's sensitivity)
  - \( P(B|\neg A) \): Probability of testing positive given no disease = 0.05 (false positive rate), so \( P(\neg B|\neg A) = 0.95 \)

- **Calculation:**
\[ P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) \]
\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]
\[ P(A|B) = \frac{0.90 \cdot 0.01}{0.90 \cdot 0.01 + 0.05 \cdot 0.99} = \frac{0.009}{0.0585} \approx 0.154 \]

- **Result:**
  - The probability of having the disease given a positive test result is about 15.4%.

### Interpretation

Bayes's theorem allows us to update our prior beliefs (prior probability of having the disease) based on new evidence (positive test result). In this example, the probability of having the disease given a positive test result (about 15.4%) is far lower than the test's 90% detection rate, because the disease is rare and false positives from the large healthy group outnumber the true positives.

Understanding Bayes's theorem is crucial in fields such as medicine, where accurate interpretation of diagnostic test results is essential.
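The calculation in the example above can be checked with a few lines of code. This is a minimal sketch; the function name `posterior` and its parameter names are illustrative choices, not part of any library.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem."""
    # P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

p = posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(A|B) = {p:.4f}")  # prints P(A|B) = 0.1538
```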


<span style=color:green>Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.</span>

Ans-

## Confidence Interval

A confidence interval is a range of values that is used to estimate the true value of a population parameter with a certain level of confidence. It provides a way to quantify the uncertainty associated with a sample estimate and helps researchers understand the precision of their findings.

### Calculation of Confidence Interval

The general formula for calculating a confidence interval for a population mean (\(\mu\)) is:

\[ \text{Confidence Interval} = \bar{X} \pm Z \left(\frac{\sigma}{\sqrt{n}}\right) \]

where:
- \(\bar{X}\) is the sample mean,
- \(Z\) is the Z-score corresponding to the desired confidence level,
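- \(\sigma\) is the population standard deviation; when it is unknown, the sample standard deviation \(s\) is used instead, and for small samples the Z-score is replaced by the critical value of the t-distribution with \(n - 1\) degrees of freedom,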
- \(n\) is the sample size.

### Example Scenario

Let's consider an example of calculating a 95% confidence interval for the average height of a population based on a sample.




In [11]:
import numpy as np
from scipy import stats

# Generate sample data
np.random.seed(42)
sample_heights = np.random.normal(loc=170, scale=5, size=25)  # Assuming a normal distribution with mean 170 and standard deviation 5

# Calculate sample statistics
sample_mean = np.mean(sample_heights)
sample_std_dev = np.std(sample_heights, ddof=1)  # Use ddof=1 for sample standard deviation
sample_size = len(sample_heights)

# Choose confidence level and find Z-score
confidence_level = 0.95
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate confidence interval
margin_of_error = z_score * (sample_std_dev / np.sqrt(sample_size))
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(f"Sample Mean: {sample_mean}")
print(f"Confidence Interval ({confidence_level*100}%): {confidence_interval}")

Sample Mean: 169.18245970670952
Confidence Interval (95.0%): (167.30765017098972, 171.05726924242933)
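Because the sample has only 25 observations and the standard deviation is estimated from the data, a t-based interval is arguably the more appropriate choice here. The sketch below recomputes the interval with the critical value of the t-distribution and places it next to the z-based interval for comparison; it reuses the same seed and sample as the cell above.

```python
import numpy as np
from scipy import stats

# Same sample as in the cell above
np.random.seed(42)
sample_heights = np.random.normal(loc=170, scale=5, size=25)

n = len(sample_heights)
mean = np.mean(sample_heights)
sem = np.std(sample_heights, ddof=1) / np.sqrt(n)  # standard error of the mean

# Critical values for a 95% two-sided interval
t_crit = stats.t.ppf(0.975, df=n - 1)
z_crit = stats.norm.ppf(0.975)

t_interval = (mean - t_crit * sem, mean + t_crit * sem)
z_interval = (mean - z_crit * sem, mean + z_crit * sem)

print(f"t-based 95% CI: {t_interval}")
print(f"z-based 95% CI: {z_interval}")
```

The t-based interval is slightly wider, reflecting the extra uncertainty from estimating the standard deviation with a small sample.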


<span style=color:green>Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.</span>

Ans-

## Using Bayes's Theorem for Probability Calculation

Bayes's Theorem is a powerful tool for updating probabilities based on new evidence. It allows us to revise our beliefs about the likelihood of an event occurring given prior information and observed data.

### Bayes's Theorem Formula

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

where:
- \( P(A|B) \) is the posterior probability of event A given evidence B.
- \( P(B|A) \) is the likelihood of evidence B given event A.
- \( P(A) \) is the prior probability of event A.
- \( P(B) \) is the probability of evidence B.

### Sample Problem

Let's consider a classic example involving a medical test for a rare disease.

- **Scenario:**
  - A rare disease (Event A) occurs in 1% of the population (\( P(A) = 0.01 \)).
  - A diagnostic test returns a positive result (Event B) for 95% of people who have the disease (\( P(B|A) = 0.95 \)).
  - The test produces false positives in 2% of people who do not have the disease (\( P(B|\neg A) = 0.02 \)).

- **Calculation:**
\[ P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) \]
\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]
\[ P(A|B) = \frac{0.95 \cdot 0.01}{0.95 \cdot 0.01 + 0.02 \cdot 0.99} = \frac{0.0095}{0.0293} \approx 0.324 \]

- **Result:**
  - The updated probability of having the disease given a positive test result is about 32.4%, up from the prior of 1%.

### Interpretation

Bayes's Theorem allows us to combine prior knowledge with new evidence to obtain a more accurate estimate of the probability of an event. In this example, the probability of having the disease is adjusted based on the accuracy of the diagnostic test and the prevalence of the disease in the population.
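One way to make the sample problem more concrete is to count people instead of multiplying probabilities. The sketch below applies the same rates to a hypothetical population of 10,000 people; the population size is an assumption chosen only to give round numbers.

```python
population = 10_000  # hypothetical population size

prevalence = 0.01
sensitivity = 0.95
false_positive_rate = 0.02

sick = population * prevalence        # about 100 people have the disease
healthy = population - sick           # about 9,900 people do not

true_positives = sick * sensitivity               # about 95 sick people test positive
false_positives = healthy * false_positive_rate   # about 198 healthy people test positive

# Share of positive tests that belong to people who are actually sick
posterior = true_positives / (true_positives + false_positives)
print(f"P(A|B) = {posterior:.4f}")
```

Out of roughly 293 positive results, only about 95 come from people who have the disease, which is where the figure of about 32% comes from.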


<span style=color:green>Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.</span>

Ans-

## Calculating and Interpreting a 95% Confidence Interval

Let's calculate the 95% confidence interval for a sample with a mean of 50 and a standard deviation of 5. The confidence interval provides a range within which we can reasonably expect the true population mean to lie.

### Given Data

- Sample Mean (\(\bar{X}\)): 50
- Sample Standard Deviation (\(s\)): 5
- Sample Size (\(n\)): The sample size is not provided, and for the calculation, we'll assume a reasonably large sample size.

### Calculation

The formula for the confidence interval is given by:

\[ \text{Confidence Interval} = \bar{X} \pm Z \left(\frac{s}{\sqrt{n}}\right) \]

where \(Z\) is the Z-score corresponding to the desired confidence level. For a 95% confidence interval, \(Z \approx 1.96\).

Let's assume a sample size of 100 for this example:


In [12]:
import numpy as np
from scipy import stats

# Given data
sample_mean = 50
sample_std_dev = 5
sample_size = 100  # Assumed sample size

# Z-score for a 95% confidence interval
z_score = stats.norm.ppf(0.975)

# Calculate confidence interval
margin_of_error = z_score * (sample_std_dev / np.sqrt(sample_size))
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

confidence_interval


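(49.02001800772997, 50.97998199227003)

### Interpretation

With the assumed sample size of 100, we are 95% confident that the true population mean lies between about 49.02 and 50.98. If the sampling were repeated many times, about 95% of the intervals built this way would contain the true mean.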

<span style=color:green>Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.</span>

Ans-

## Margin of Error in Confidence Interval

The margin of error is the half-width of a confidence interval: the amount added to and subtracted from the sample estimate to form the interval. It quantifies the precision of the estimate at the chosen confidence level.

### Formula

The margin of error (\(ME\)) in a confidence interval is given by:

\[ ME = Z \left(\frac{s}{\sqrt{n}}\right) \]

where:
- \( Z \) is the Z-score corresponding to the desired confidence level.
- \( s \) is the sample standard deviation.
- \( n \) is the sample size.

### Sample Size and Margin of Error

As the sample size (\( n \)) increases, the margin of error decreases in proportion to \( 1/\sqrt{n} \), so quadrupling the sample size halves the margin of error. Larger sample sizes lead to more precise estimates and narrower confidence intervals.

### Example Scenario

Let's consider a scenario where we want to estimate the average time it takes for customers to complete an online survey. We'll compare the margin of error for two sample sizes: 50 and 200.




In [13]:
import numpy as np
from scipy import stats

# Given data
sample_mean = 15  # Assume the average completion time is 15 minutes
sample_std_dev = 3
confidence_level = 0.95

# Z-score for a 95% confidence interval
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate margin of error for two sample sizes
sample_sizes = [50, 200]
margin_of_errors = [z_score * (sample_std_dev / np.sqrt(sample_size)) for sample_size in sample_sizes]

margin_of_errors

[0.8315422946098067, 0.41577114730490333]
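The same formula can be rearranged to ask how many respondents are needed to reach a target margin of error: \( n = (Z \cdot s / ME)^2 \), rounded up. The sketch below applies it to the survey example; the target of 0.5 minutes and the helper name `required_sample_size` are assumptions made for illustration.

```python
import math
from scipy import stats

def required_sample_size(std_dev, target_margin, confidence_level=0.95):
    """Smallest n whose z-based margin of error is at most target_margin."""
    z = stats.norm.ppf((1 + confidence_level) / 2)
    return math.ceil((z * std_dev / target_margin) ** 2)

# Standard deviation of 3 minutes, hypothetical target margin of 0.5 minutes
n_needed = required_sample_size(std_dev=3, target_margin=0.5)
print(f"Required sample size: {n_needed}")
```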

<span style=color:green>Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.</span>

Ans-

## Calculating and Interpreting Z-Score

The z-score is a measure of how many standard deviations a particular data point is from the mean of a distribution. It is calculated using the formula:

\[ Z = \frac{{X - \mu}}{{\sigma}} \]

where:
- \( X \) is the data point's value,
- \( \mu \) is the population mean,
- \( \sigma \) is the population standard deviation.

### Given Data

- Data Point Value (\( X \)): 75
- Population Mean (\( \mu \)): 70
- Population Standard Deviation (\( \sigma \)): 5

### Calculation

Let's calculate the z-score for the given data point:



In [14]:
# Given data
data_point = 75
population_mean = 70
population_std_dev = 5

# Calculate z-score
z_score = (data_point - population_mean) / population_std_dev
z_score


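1.0

### Interpretation

A z-score of 1.0 means the data point 75 lies exactly one standard deviation above the population mean of 70.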
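If the population is additionally assumed to be normally distributed (an assumption the question does not state), the z-score can be converted into a percentile with the standard normal CDF. A minimal sketch:

```python
from scipy import stats

z_score = (75 - 70) / 5

# Share of a normal population that falls below the data point
percentile = stats.norm.cdf(z_score)
print(f"z = {z_score}, share of values below 75: {percentile:.4f}")
```

Under that normality assumption, about 84% of the population lies below a value of 75.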

<span style=color:green>Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.</span>

Ans-

## Hypothesis Test for Weight Loss Drug Effectiveness

In this example, we will conduct a hypothesis test to determine if a new weight loss drug is significantly effective at a 95% confidence level using a t-test.

### Given Data

- Sample Size (\(n\)): 50
- Sample Mean (\(\bar{X}\)): 6 pounds
- Sample Standard Deviation (\(s\)): 2.5 pounds
- Confidence Level: 95%

### Hypotheses

The null hypothesis (\(H_0\)) and alternative hypothesis (\(H_1\)) are formulated as follows:

\[ H_0: \mu = 0 \]
\[ H_1: \mu \neq 0 \]

where \(\mu\) is the population mean weight loss.

### T-Test Calculation

We will use a one-sample t-test, since there is a single sample and the population standard deviation is unknown. The t-statistic is calculated as:

\[ t = \frac{{\bar{X} - \mu_0}}{{\frac{s}{\sqrt{n}}}} \]

Let's assume a significance level (\(\alpha\)) of 0.05:




In [15]:
import numpy as np
from scipy import stats

# Given data
sample_size = 50
sample_mean = 6
sample_std_dev = 2.5
confidence_level = 0.95
alpha = 0.05

# Degrees of freedom
df = sample_size - 1

# Critical t-value
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Calculate standard error of the mean (SEM)
sem = sample_std_dev / np.sqrt(sample_size)

# Calculate t-statistic
t_statistic = (sample_mean - 0) / sem

# Calculate p-value
p_value = 2 * (1 - stats.t.cdf(np.abs(t_statistic), df))

t_statistic, p_value

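(16.970562748477143, 0.0)

### Conclusion

The t-statistic of about 16.97 far exceeds the two-tailed critical value of about 2.01 (df = 49), and the p-value is so small that `1 - cdf` rounds to 0.0 at floating-point precision. We reject \(H_0\) at \(\alpha = 0.05\) and conclude that the mean weight loss is significantly different from zero.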
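Since the question asks whether the drug is effective, a one-tailed alternative (\(H_1: \mu > 0\)) is also a reasonable reading. The sketch below runs that version from the summary statistics and uses the survival function `stats.t.sf`, which avoids the loss of precision in `1 - cdf`. The helper name `one_sample_t_from_stats` is an illustrative choice, not a SciPy function.

```python
import numpy as np
from scipy import stats

def one_sample_t_from_stats(mean, std_dev, n, mu0=0.0):
    """Return (t statistic, upper-tailed p-value) for H1: mu > mu0."""
    sem = std_dev / np.sqrt(n)               # standard error of the mean
    t_stat = (mean - mu0) / sem
    p_upper = stats.t.sf(t_stat, df=n - 1)   # area to the right of t_stat
    return t_stat, p_upper

t_stat, p_upper = one_sample_t_from_stats(mean=6, std_dev=2.5, n=50)
t_crit = stats.t.ppf(0.95, df=49)  # one-tailed critical value at alpha = 0.05

print(f"t = {t_stat:.3f}, one-tailed critical value = {t_crit:.3f}")
print(f"one-tailed p-value = {p_upper:.3e}")
```

The decision is the same as in the two-tailed test: the statistic is far beyond the critical value, so \(H_0\) is rejected.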

<span style=color:green>Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.</span>

Ans-

## Calculating 95% Confidence Interval for Job Satisfaction

In this example, we will calculate the 95% confidence interval for the true proportion of people who are satisfied with their job based on survey data.

### Given Data

- Sample Size (\(n\)): 500
- Sample Proportion (\(\hat{p}\)): 65% or 0.65
- Confidence Level: 95%

### Confidence Interval Formula

The formula for calculating the confidence interval for a proportion is given by:

\[ \text{Confidence Interval} = \hat{p} \pm Z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

where:
- \(\hat{p}\) is the sample proportion,
- \(Z\) is the Z-score corresponding to the desired confidence level,
- \(n\) is the sample size.

Let's calculate the confidence interval:


In [16]:
import numpy as np
from scipy import stats

# Given data
sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95

# Z-score for a 95% confidence interval
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate confidence interval
margin_of_error = z_score * np.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)
confidence_interval = (sample_proportion - margin_of_error, sample_proportion + margin_of_error)

confidence_interval


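(0.6081925393809212, 0.6918074606190788)

### Interpretation

We are 95% confident that the true proportion of people who are satisfied with their job lies between about 60.8% and 69.2%.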

<span style=color:green>Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.</span>

Ans-

## Hypothesis Test for Difference in Teaching Methods

In this example, we will conduct a hypothesis test to determine if there is a significant difference in student performance between two teaching methods (Sample A and Sample B) using a t-test.

### Given Data

For Sample A:
- Sample Mean (\(\bar{X}_A\)): 85
- Sample Standard Deviation (\(s_A\)): 6
- Sample Size (\(n_A\)): Not provided

For Sample B:
- Sample Mean (\(\bar{X}_B\)): 82
- Sample Standard Deviation (\(s_B\)): 5
- Sample Size (\(n_B\)): Not provided

Significance Level (\(\alpha\)): 0.01

### Hypotheses

The null hypothesis (\(H_0\)) and alternative hypothesis (\(H_1\)) are formulated as follows:

\[ H_0: \mu_A = \mu_B \]
\[ H_1: \mu_A \neq \mu_B \]

where \(\mu_A\) and \(\mu_B\) are the population means for the two teaching methods.

### T-Test Calculation

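We will conduct a two-sample t-test. The t-statistic for the unequal-variance (Welch) form is:

\[ t = \frac{{\bar{X}_A - \bar{X}_B}}{{\sqrt{\frac{{s_A^2}}{{n_A}} + \frac{{s_B^2}}{{n_B}}}}} \]

The sample sizes are not given, so we assume 30 students per group. With equal sample sizes this statistic equals the pooled-variance statistic that `stats.ttest_ind_from_stats` computes by default (`equal_var=True`), which is what the code below uses; only the degrees of freedom differ. The significance level is 0.01: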



In [17]:
import numpy as np
from scipy import stats

# Given data for Sample A
mean_A = 85
std_dev_A = 6

# Given data for Sample B
mean_B = 82
std_dev_B = 5

# Assume equal sample sizes for simplicity
sample_size_A = 30
sample_size_B = 30

# Calculate t-statistic
t_statistic, p_value = stats.ttest_ind_from_stats(mean_A, std_dev_A, sample_size_A, mean_B, std_dev_B, sample_size_B)

t_statistic, p_value

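(2.1038606199548298, 0.03973697161571063)

### Conclusion

With the assumed 30 students per group, the p-value of about 0.040 is greater than \(\alpha = 0.01\), so we fail to reject \(H_0\): the data do not show a significant difference between the two teaching methods at the 1% level. This conclusion depends on the assumed sample sizes.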
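For comparison, the unequal-variance (Welch) version of the test can be requested explicitly with `equal_var=False`. The sketch below keeps the same assumed sample size of 30 per group, which is not given in the question.

```python
from scipy import stats

# Welch's t-test from summary statistics (sample sizes of 30 are assumed)
t_welch, p_welch = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=30,
    equal_var=False,
)

alpha = 0.01
reject_null = p_welch < alpha

print(f"Welch t = {t_welch:.4f}, p = {p_welch:.4f}, reject H0: {reject_null}")
```

Because the groups are the same size, the t-statistic matches the pooled result and only the degrees of freedom, and therefore the p-value, change slightly.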

<span style=color:green>Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.</span>

Ans-

## Calculating 90% Confidence Interval for Population Mean

In this example, we will calculate the 90% confidence interval for the true population mean based on a sample of 50 observations.

### Given Data

- Population Mean (\(\mu\)): 60
- Population Standard Deviation (\(\sigma\)): 8
- Sample Size (\(n\)): 50
- Sample Mean (\(\bar{X}\)): 65
- Confidence Level: 90%

### Confidence Interval Formula

The formula for calculating the confidence interval for a population mean is given by:

\[ \text{Confidence Interval} = \bar{X} \pm Z \left(\frac{\sigma}{\sqrt{n}}\right) \]

where:
- \(\bar{X}\) is the sample mean,
- \(Z\) is the Z-score corresponding to the desired confidence level,
- \(\sigma\) is the population standard deviation,
- \(n\) is the sample size.

Let's calculate the confidence interval:


In [18]:
import numpy as np
from scipy import stats

# Given data
population_mean = 60
population_std_dev = 8
sample_size = 50
sample_mean = 65
confidence_level = 0.90

# Z-score for a 90% confidence interval
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate confidence interval
margin_of_error = z_score * (population_std_dev / np.sqrt(sample_size))
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

confidence_interval


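(63.13906055411732, 66.86093944588268)

### Interpretation

We are 90% confident that the true population mean lies between about 63.14 and 66.86. The stated population mean of 60 is not used in the calculation and falls outside this interval.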

<span style=color:green>Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.</span>

Ans-

## Hypothesis Test for Caffeine's Effect on Reaction Time

In this example, we will conduct a hypothesis test to determine if caffeine has a significant effect on reaction time using a t-test at a 90% confidence level.

### Given Data

- Sample Size (\(n\)): 30
- Sample Mean Reaction Time (\(\bar{X}\)): 0.25 seconds
- Sample Standard Deviation (\(s\)): 0.05 seconds
- Confidence Level: 90%

### Hypotheses

The null hypothesis (\(H_0\)) and alternative hypothesis (\(H_1\)) are formulated as follows:

\[ H_0: \mu = 0 \]
\[ H_1: \mu \neq 0 \]

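where \(\mu\) is the population mean reaction time. The problem does not state a baseline (no-caffeine) reaction time, so \(\mu_0 = 0\) is used here as a placeholder; in practice \(\mu_0\) should be the baseline mean reaction time.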

### T-Test Calculation

We will conduct a one-sample t-test. The t-statistic is calculated as:

\[ t = \frac{{\bar{X} - \mu_0}}{{\frac{s}{\sqrt{n}}}} \]

Let's assume a significance level (\(\alpha\)) of 0.10:



In [20]:
import numpy as np
from scipy import stats

# Given data
sample_mean = 0.25
sample_std_dev = 0.05
sample_size = 30
confidence_level = 0.90
alpha = 0.10

# Degrees of freedom
df = sample_size - 1

# Critical t-value
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Calculate standard error of the mean (SEM)
sem = sample_std_dev / np.sqrt(sample_size)

# Calculate t-statistic
t_statistic = (sample_mean - 0) / sem

# Calculate p-value
p_value = 2 * (1 - stats.t.cdf(np.abs(t_statistic), df))

t_statistic, p_value


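(27.386127875258307, 0.0)

### Conclusion

The t-statistic of about 27.39 is far beyond the two-tailed critical value of about 1.70 (df = 29), so \(H_0: \mu = 0\) is rejected at \(\alpha = 0.10\). This only shows that the mean reaction time is not zero; judging caffeine's effect requires a comparison against a baseline reaction time, which the problem does not provide.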
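Because no baseline is given, a 90% confidence interval for the mean reaction time may be more informative than a test against zero: any baseline value outside the interval would be rejected by a two-tailed t-test at \(\alpha = 0.10\). The sketch below computes that interval from the summary statistics in the question.

```python
import numpy as np
from scipy import stats

sample_mean = 0.25
sample_std_dev = 0.05
sample_size = 30
alpha = 0.10

# Standard error of the mean and two-tailed critical t-value
sem = sample_std_dev / np.sqrt(sample_size)
t_critical = stats.t.ppf(1 - alpha / 2, df=sample_size - 1)

ci = (sample_mean - t_critical * sem, sample_mean + t_critical * sem)
print(f"90% CI for mean reaction time: ({ci[0]:.4f}, {ci[1]:.4f})")
```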