Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

Estimation statistics is a branch of statistics that involves using sample data to make inferences or estimates about population parameters. Estimation allows us to draw conclusions about characteristics of populations based on information obtained from samples. There are two main types of estimation in statistics: point estimation and interval estimation.

**1. Point Estimate:**
A point estimate is a single value that serves as the best guess or estimate of a population parameter. It is obtained by using sample data to calculate a statistic that is used to estimate the population parameter of interest. Common point estimates include the sample mean, sample proportion, sample median, etc.

For example, if we want to estimate the population mean (\(\mu\)) based on a sample mean (\(\bar{x}\)), then \(\bar{x}\) serves as the point estimate of \(\mu\).

**2. Interval Estimate:**
An interval estimate provides a range of values within which the population parameter is likely to lie, along with a level of confidence associated with that range. It is obtained by calculating a confidence interval based on sample data. The confidence interval consists of two values: the lower bound and the upper bound.

For example, if we want to estimate the population mean (\(\mu\)) based on a sample mean (\(\bar{x}\)), we can calculate a confidence interval around \(\bar{x}\) within which we believe the true population mean (\(\mu\)) lies with a specified level of confidence (e.g., 95% confidence interval).
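The two kinds of estimate can be computed side by side. The sketch below is only illustrative: the `sample` values are made up, and the interval uses the t-distribution because the population standard deviation is assumed to be unknown.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 10 measurements (illustrative values only)
sample = np.array([48.2, 51.5, 49.8, 50.9, 47.6, 52.3, 50.1, 49.4, 51.0, 48.7])

n = sample.size
point_estimate = sample.mean()                 # point estimate of mu
std_error = sample.std(ddof=1) / np.sqrt(n)    # estimated standard error of the mean

# 95% interval estimate based on the t-distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
lower = point_estimate - t_crit * std_error
upper = point_estimate + t_crit * std_error

print(f"Point estimate: {point_estimate:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

The point estimate is a single number, while the interval widens or narrows with the sample's variability and size.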

**Importance of Estimation:**
- Estimation allows us to make informed decisions and draw conclusions about populations based on limited sample data.
- It provides a way to quantify the uncertainty associated with our estimates by providing both point estimates and interval estimates.
- Estimation is crucial for hypothesis testing, model building, forecasting, and decision-making in various fields such as business, economics, healthcare, and social sciences.

Q2. Write a Python function to estimate the population mean using a sample mean and standard deviation.

```python
def estimate_population_mean(sample_mean, sample_std_dev, sample_size):
    """
    Estimate the population mean using a sample mean, sample standard deviation,
    and sample size.

    Parameters:
        sample_mean (float): The sample mean.
        sample_std_dev (float): The sample standard deviation.
        sample_size (int): The sample size.

    Returns:
        tuple: The estimated population mean and the standard error of the mean.
    """
    # The sample mean is the point estimate of the population mean
    population_mean = sample_mean

    # The standard error of the mean (SEM) measures the precision of that estimate
    sem = sample_std_dev / (sample_size ** 0.5)

    return population_mean, sem

# Example usage:
sample_mean = 50
sample_std_dev = 10
sample_size = 100
estimated_population_mean, sem = estimate_population_mean(sample_mean, sample_std_dev, sample_size)
print("Estimated Population Mean:", estimated_population_mean)
print("Standard Error of the Mean:", sem)
```

Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

Hypothesis testing is a statistical method used to make inferences about population parameters based on sample data. It involves formulating a hypothesis about the population parameter of interest, collecting sample data, and using statistical techniques to evaluate the likelihood of the observed sample data under the assumption of the null hypothesis.

In hypothesis testing, there are typically two competing hypotheses:
1. **Null Hypothesis (H0)**: The null hypothesis is the default assumption that there is no significant difference or effect. It represents the status quo or the absence of an effect.
2. **Alternative Hypothesis (H1 or Ha)**: The alternative hypothesis is the assertion that there is a significant difference or effect. It represents the claim or hypothesis that the researcher wants to test.

The goal of hypothesis testing is to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis, based on the observed sample data.

**Importance of Hypothesis Testing:**
1. **Decision Making**: Hypothesis testing provides a systematic framework for making decisions based on sample data. It allows researchers to evaluate competing hypotheses and draw conclusions about population parameters.
2. **Inference**: Hypothesis testing allows researchers to infer whether observed differences or effects in sample data are statistically significant and likely to generalize to the population.
3. **Scientific Research**: Hypothesis testing is widely used in scientific research to test theories, hypotheses, and predictions. It helps researchers validate or refute theories based on empirical evidence.
4. **Quality Control**: In manufacturing and quality control processes, hypothesis testing is used to evaluate product quality, monitor process performance, and detect deviations from expected standards.
5. **Policy Evaluation**: Hypothesis testing is used in policy evaluation and program assessment to determine the effectiveness of interventions, policies, or programs.
6. **Business Decision Making**: In business and economics, hypothesis testing is used to evaluate marketing strategies, assess the impact of new products or services, and make strategic decisions based on empirical evidence.
7. **Legal and Regulatory Compliance**: Hypothesis testing is used in legal and regulatory contexts to evaluate claims, assess compliance with regulations, and make judgments based on evidence.

Q4. Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.

**Null Hypothesis (H0):** The average weight of male college students is equal to or less than the average weight of female college students.

**Alternative Hypothesis (Ha):** The average weight of male college students is greater than the average weight of female college students.

In symbols:
- H0: \( \mu_{\text{male}} \le \mu_{\text{female}} \)
- Ha: \( \mu_{\text{male}} > \mu_{\text{female}} \)

Here:
- \( \mu_{\text{male}} \) represents the population mean weight of male college students.
- \( \mu_{\text{female}} \) represents the population mean weight of female college students.

This hypothesis test aims to determine if there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis, suggesting that male college students have a greater average weight than female college students.
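A one-sided hypothesis like this one could be tested as sketched below. The weights are hypothetical values invented for illustration, and the choice of Welch's t-test (`equal_var=False`) is an assumption rather than something the question specifies.

```python
from scipy import stats

# Hypothetical weights in kg (illustrative values, not real data)
male_weights = [72.5, 80.1, 68.4, 77.9, 85.3, 74.0, 79.6, 70.2]
female_weights = [58.7, 63.2, 55.9, 66.4, 60.1, 62.8, 57.3, 64.5]

# alternative="greater" tests Ha: mu_male > mu_female (requires SciPy >= 1.6)
# equal_var=False selects Welch's t-test, which does not assume equal variances
result = stats.ttest_ind(male_weights, female_weights,
                         equal_var=False, alternative="greater")

alpha = 0.05
print("t-statistic:", result.statistic)
print("one-sided p-value:", result.pvalue)
print("Reject H0:", result.pvalue < alpha)
```

Passing `alternative="greater"` matters here: the default two-sided test would answer a different question (whether the means differ in either direction).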

Q5. Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.

```python
from scipy import stats

def two_sample_t_test(sample1, sample2, alpha=0.05):
    """
    Perform a two-sample t-test to compare the means of two populations.

    Parameters:
        sample1 (array-like): Sample data from population 1.
        sample2 (array-like): Sample data from population 2.
        alpha (float): Significance level for the test (default is 0.05).

    Returns:
        tuple: t-statistic, p-value, and whether to reject the null hypothesis.
    """
    t_statistic, p_value = stats.ttest_ind(sample1, sample2)

    # Determine whether to reject the null hypothesis
    reject_null = p_value < alpha

    return t_statistic, p_value, reject_null

# Example usage:
sample1 = [85, 90, 88, 92, 87]
sample2 = [78, 80, 82, 75, 79]
alpha = 0.05

t_statistic, p_value, reject_null = two_sample_t_test(sample1, sample2, alpha)

print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Reject null hypothesis:", reject_null)
```

Q6: What is a null and alternative hypothesis? Give some examples.

In statistics, the null hypothesis (H0) and alternative hypothesis (Ha) are two competing statements or assertions about a population parameter that are subjected to hypothesis testing. These hypotheses help in making decisions based on sample data and statistical analysis.

**Null Hypothesis (H0):**
The null hypothesis represents the default assumption or the status quo. It states that there is no significant difference or effect, or no relationship between variables. The null hypothesis is typically denoted as H0 and is often formulated with an equal sign or a statement of no effect.

**Examples of Null Hypotheses:**
1. The mean score of a new teaching method is equal to the mean score of the traditional teaching method.
2. The mean lifespan of a population before and after a new drug treatment is the same.
3. There is no difference in the average income between two cities.

**Alternative Hypothesis (Ha):**
The alternative hypothesis represents the assertion that there is a significant difference, effect, or relationship between variables. It is the hypothesis for which the researcher seeks evidence; a test can support it by rejecting the null hypothesis, but it cannot prove it. The alternative hypothesis is denoted as Ha and is often formulated with inequality signs or statements of difference or effect.

**Examples of Alternative Hypotheses:**
1. The mean score of a new teaching method is greater than the mean score of the traditional teaching method.
2. The mean lifespan of a population after a new drug treatment is higher than before the treatment.
3. There is a difference in the average income between two cities.

In hypothesis testing, the goal is to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis, based on the observed sample data. This involves calculating a test statistic and either comparing it to a critical value or computing a p-value, which measures how likely data at least as extreme as the sample would be under the null hypothesis.

Q7: Write down the steps involved in hypothesis testing.

The process of hypothesis testing involves several steps to systematically evaluate competing hypotheses and make decisions based on sample data. Here are the general steps involved in hypothesis testing:

1. **Formulate the Hypotheses:**
   - Identify the null hypothesis (H0) and the alternative hypothesis (Ha) based on the research question or objective.
   - The null hypothesis typically represents the status quo or the absence of an effect, while the alternative hypothesis represents the claim or hypothesis that the researcher wants to test.

2. **Select the Significance Level (α):**
   - Determine the significance level, denoted as α, which represents the probability of committing a Type I error (incorrectly rejecting the null hypothesis when it is true).
   - Commonly used significance levels include α = 0.05, α = 0.01, or α = 0.10, depending on the desired level of confidence.

3. **Collect and Prepare Data:**
   - Collect sample data relevant to the hypothesis being tested.
   - Ensure that the data collection process is unbiased and representative of the population of interest.
   - Clean and preprocess the data as necessary to remove outliers or errors.

4. **Choose the Appropriate Test Statistic:**
   - Select a statistical test or method that is appropriate for the type of data and hypothesis being tested.
   - Common statistical tests include t-tests, chi-square tests, ANOVA, correlation analysis, and regression analysis, among others.
   - Consider factors such as the nature of the variables (e.g., categorical or continuous), the study design, and assumptions of the statistical test.

5. **Calculate the Test Statistic:**
   - Calculate the appropriate test statistic based on the sample data and the chosen statistical test.
   - The test statistic quantifies the difference or relationship between variables observed in the sample data.

6. **Determine the Critical Value or p-value:**
   - Depending on the chosen statistical test, determine either the critical value from a probability distribution or the p-value associated with the observed test statistic.
   - The critical value represents the threshold beyond which the null hypothesis is rejected, while the p-value represents the probability of obtaining the observed results or more extreme results if the null hypothesis is true.

7. **Make a Decision:**
   - Compare the test statistic to the critical value, or compare the p-value to the significance level α.
   - If the test statistic falls beyond the critical value (equivalently, if the p-value is less than α), reject the null hypothesis in favor of the alternative hypothesis.
   - Otherwise, fail to reject the null hypothesis. Failing to reject is not the same as accepting the null hypothesis; it means only that the data do not provide sufficient evidence against it.
   - Interpret the decision in the context of the research question and draw conclusions based on the results of the hypothesis test.

8. **Conclusion:**
   - Summarize the findings of the hypothesis test, including the decision made, the interpretation of the results, and any implications for the research question or objective.
   - Communicate the conclusions clearly and accurately, taking into account the limitations and assumptions of the hypothesis test.
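The steps above can be sketched in code for one simple case, a two-tailed one-sample t-test. The hypothesized mean `mu_0` and the `sample` values are hypothetical and chosen only to make the example runnable.

```python
import numpy as np
from scipy import stats

# Step 1: hypotheses. H0: mu = 100, Ha: mu != 100 (hypothetical claim)
mu_0 = 100
# Step 2: significance level
alpha = 0.05
# Step 3: data (hypothetical sample)
sample = np.array([102.1, 98.4, 105.3, 101.7, 99.2, 104.8, 103.5, 100.9])
# Step 4: test choice. One mean, population standard deviation unknown,
# so a one-sample t-test with n - 1 degrees of freedom
n = sample.size
# Step 5: test statistic
t_stat = (sample.mean() - mu_0) / (sample.std(ddof=1) / np.sqrt(n))
# Step 6: critical value and two-tailed p-value
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
# Step 7: decision
reject_null = p_value < alpha

# Step 8: report
print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

The critical-value rule and the p-value rule always agree for the same test, so in practice only one of them needs to be checked.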

Q8. Define p-value and explain its significance in hypothesis testing.

The p-value, or probability value, is a measure used in hypothesis testing to quantify the strength of the evidence against the null hypothesis. It is the probability of obtaining a test statistic at least as extreme as the one observed, under the assumption that the null hypothesis is true. In other words, it indicates how surprising the sample data would be if the null hypothesis were correct; it is not the probability that the null hypothesis is true.

**Significance of the p-value in Hypothesis Testing:**

1. **Decision Rule**: The p-value serves as a basis for decision-making in hypothesis testing. If the p-value is less than or equal to the significance level (usually denoted as α), the null hypothesis is rejected in favor of the alternative hypothesis. Conversely, if the p-value is greater than the significance level, there is insufficient evidence to reject the null hypothesis.

2. **Strength of Evidence**: The p-value provides a quantitative measure of the strength of the evidence against the null hypothesis. A smaller p-value indicates stronger evidence against the null hypothesis, suggesting that the observed results are unlikely to have occurred by chance alone. Conversely, a larger p-value suggests weaker evidence against the null hypothesis, indicating that the observed results could plausibly occur even if the null hypothesis is true.

3. **Interpretation**: The p-value helps researchers interpret the results of hypothesis tests. A low p-value (typically less than 0.05) is often interpreted as statistically significant, indicating that the observed results are unlikely to be due to random chance alone. On the other hand, a high p-value suggests that the observed results are consistent with the null hypothesis, and there is no compelling evidence to reject it.

4. **Comparisons**: The p-value allows for comparisons across different hypothesis tests and studies. Researchers can compare p-values to assess the strength of evidence against the null hypothesis and evaluate the consistency of results across different samples or populations.

5. **Adjustments**: In multiple comparisons or complex statistical analyses, the p-value can be adjusted to account for the inflation of Type I error rate (false positive rate). Adjusted p-values help maintain appropriate levels of statistical significance and control for the overall error rate in hypothesis testing.

Overall, the p-value plays a crucial role in hypothesis testing by providing a standardized measure of evidence against the null hypothesis, guiding decision-making, and facilitating the interpretation and comparison of results across studies.
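To make the definition concrete, the following sketch converts a test statistic into one-sided and two-sided p-values. The statistic `t_stat` and degrees of freedom `df` are hypothetical values, not taken from any dataset in this document.

```python
from scipy import stats

t_stat = 2.1   # hypothetical observed t-statistic
df = 20        # hypothetical degrees of freedom

# Probability of a statistic at least this large (Ha: parameter is greater)
p_right = stats.t.sf(t_stat, df)
# Probability of a statistic at least this small (Ha: parameter is smaller)
p_left = stats.t.cdf(t_stat, df)
# Probability of a statistic at least this far from zero (Ha: parameter differs)
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

print(f"right-tailed p = {p_right:.4f}")
print(f"left-tailed p  = {p_left:.4f}")
print(f"two-sided p    = {p_two_sided:.4f}")
```

Which of the three applies depends on the alternative hypothesis, so the direction of the test has to be fixed before looking at the data.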

Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Define the range of x values
x = np.linspace(-4, 4, 1000)

# Define the degrees of freedom
df = 10

# Calculate the probability density function (PDF) for the t-distribution
pdf = t.pdf(x, df)

# Plot the t-distribution
plt.plot(x, pdf, label=f'Student\'s t-Distribution (df={df})')

# Add labels and title
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title('Student\'s t-Distribution')
plt.legend()

# Display the plot
plt.grid(True)
plt.show()
```

Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.

The program below uses `scipy.stats.ttest_ind`, which by default assumes the two populations have equal variances, and applies it to two randomly generated samples of equal size:

```python
import numpy as np
from scipy import stats

def two_sample_t_test(sample1, sample2, alpha=0.05):
    """
    Perform a two-sample t-test for independent samples.

    Parameters:
        sample1 (array-like): Sample data from the first population.
        sample2 (array-like): Sample data from the second population.
        alpha (float): Significance level for the test (default is 0.05).

    Returns:
        float: t-statistic
        float: p-value
        bool: Whether to reject the null hypothesis
    """
    # Calculate the t-statistic and p-value
    t_statistic, p_value = stats.ttest_ind(sample1, sample2)
    
    # Determine whether to reject the null hypothesis
    reject_null = p_value < alpha
    
    return t_statistic, p_value, reject_null

# Generate two random samples of equal size
np.random.seed(0)  # for reproducibility
sample_size = 30
sample1 = np.random.normal(loc=5, scale=2, size=sample_size)
sample2 = np.random.normal(loc=6, scale=2, size=sample_size)

# Perform two-sample t-test
t_statistic, p_value, reject_null = two_sample_t_test(sample1, sample2)

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Reject null hypothesis:", reject_null)
```

Q11: What is Student’s t distribution? When to use the t-Distribution.

Student's t-distribution, also known as the t-distribution, is a probability distribution that is used to model the variability of sample means when the sample size is small or when the population standard deviation is unknown. It is similar to the standard normal distribution (z-distribution) but has heavier tails, which reflect the extra uncertainty introduced by estimating the standard deviation from the sample.

The t-distribution is characterized by its degrees of freedom (df), which determine the shape of the distribution. As the degrees of freedom increase, the t-distribution approaches the standard normal distribution. When the degrees of freedom are large (typically greater than 30), the t-distribution closely approximates the standard normal distribution.

**When to use the t-Distribution:**

1. **Small Sample Sizes**: The t-distribution is especially useful when dealing with small sample sizes (typically less than 30) where the population standard deviation is unknown. In such cases, the sample standard deviation is used to estimate the population standard deviation, and the t-distribution provides more accurate estimates of confidence intervals and hypothesis tests compared to the standard normal distribution.

2. **Unknown Population Standard Deviation**: When the population standard deviation is unknown, the t-distribution is used instead of the standard normal distribution. This often occurs in real-world scenarios where researchers have limited data and cannot accurately estimate the population parameters.

3. **Estimation and Inference**: The t-distribution is commonly used in estimation and inference tasks, such as constructing confidence intervals and conducting hypothesis tests for population means. It provides more conservative estimates and wider confidence intervals compared to the standard normal distribution, reflecting the increased uncertainty associated with smaller sample sizes.

4. **Quality Control and Process Improvement**: In quality control and process improvement applications, where sample sizes are often small and population parameters are unknown, the t-distribution is used to analyze data, monitor process performance, and make decisions about process improvements.
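The convergence of the t-distribution to the standard normal can be checked numerically. This sketch compares the two-sided 95% critical values for a few degrees of freedom; the particular values in `dfs` are arbitrary choices for illustration.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)   # two-sided 95% critical value of the z-distribution
dfs = (2, 5, 10, 30, 100, 1000)  # arbitrary degrees of freedom to compare

# Two-sided 95% critical value of the t-distribution for each df
t_crits = {df: stats.t.ppf(0.975, df) for df in dfs}

for df, t_crit in t_crits.items():
    print(f"df={df:>4}: t critical = {t_crit:.3f}  (z critical = {z_crit:.3f})")
```

The t critical values are always larger than the z value and shrink toward it as the degrees of freedom grow, which is why t-based intervals are wider for small samples.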

Q12: What is t-statistic? State the formula for t-statistic.

The t-statistic, also known as Student's t-statistic, is a standardized measure used in hypothesis testing: it expresses how far a sample estimate lies from its hypothesized value in units of the estimate's standard error. It is used to test a single mean against a hypothesized value or to compare the means of two groups, especially when sample sizes are small or the population standard deviation is unknown.

The formula for the t-statistic depends on the type of hypothesis test being conducted. However, the most commonly used formula for the t-statistic in the context of comparing two independent samples is given by:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

Where:
- \( \bar{x}_1 \) and \( \bar{x}_2 \) are the sample means of the two groups.
- \( s_1^2 \) and \( s_2^2 \) are the sample variances of the two groups.
- \( n_1 \) and \( n_2 \) are the sample sizes of the two groups.

This formula calculates the difference between the sample means of the two groups, adjusted for the variability within each group. The denominator represents the standard error of the difference between the sample means, which takes into account both the sample variances and sample sizes of the two groups.

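Because this formula uses separate (unpooled) variances, it is the statistic of Welch's t-test, and under the null hypothesis of equal population means it approximately follows a t-distribution whose degrees of freedom are given by the Welch-Satterthwaite formula. If the two population variances are assumed equal and a pooled variance is used in the denominator instead, the statistic follows a t-distribution with \( n_1 + n_2 - 2 \) degrees of freedom.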
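A minimal sketch of the two-sample formula above, computed by hand. The helper name `welch_t` and the example data are hypothetical; the degrees of freedom use the Welch-Satterthwaite approximation, which is the usual companion to an unpooled standard error.

```python
import numpy as np
from scipy import stats

def welch_t(sample1, sample2):
    """Return the unpooled two-sample t-statistic and its approximate df."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = x1.size, x2.size
    # s^2 / n for each group (sample variances use ddof=1)
    v1 = x1.var(ddof=1) / n1
    v2 = x2.var(ddof=1) / n2
    t_stat = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Hypothetical data for illustration
group_a = [85, 90, 88, 92, 87]
group_b = [78, 80, 82, 75, 79]

t_stat, df = welch_t(group_a, group_b)
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.3f}, df = {df:.2f}, p-value = {p_value:.4f}")
```

The result should agree with `stats.ttest_ind(group_a, group_b, equal_var=False)`, which is a convenient way to check a hand calculation.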

Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50. Estimate the population mean revenue with a 95% confidence interval.

To estimate the population mean revenue with a 95% confidence interval, we use the formula for the confidence interval of a population mean. The problem gives only the sample standard deviation, but since the sample size is large (n = 50) we can treat it as a good estimate of the population standard deviation \( \sigma \) and use the z-distribution; a t-distribution with 49 degrees of freedom would give a slightly wider interval.

The formula for the confidence interval is given by:

\[ \text{Confidence Interval} = \bar{x} \pm z \times \frac{\sigma}{\sqrt{n}} \]

Where:
- \( \bar{x} \) is the sample mean revenue,
- \( z \) is the z-score corresponding to the desired confidence level (95% confidence corresponds to \( z = 1.96 \) for a two-sided interval),
- \( \sigma \) is the population standard deviation, estimated here by the sample standard deviation, and
- \( n \) is the sample size.

Given:
- Sample mean revenue (\( \bar{x} \)) = $500
- Standard deviation (\( \sigma \), estimated from the sample) = $50
- Sample size (\( n \)) = 50
- Confidence level = 95%

Now, let's calculate the confidence interval:

\[ \text{Confidence Interval} = 500 \pm 1.96 \times \frac{50}{\sqrt{50}} \]

```python
import numpy as np

# Given data
sample_mean = 500
population_std = 50
sample_size = 50
confidence_level = 0.95

# Calculate the z-score for 95% confidence level
z_score = 1.96  # for a two-tailed test

# Calculate the margin of error
margin_of_error = z_score * (population_std / np.sqrt(sample_size))

# Calculate the lower and upper bounds of the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Print the confidence interval
print(f"95% Confidence Interval for Population Mean Revenue: (${lower_bound:.2f}, ${upper_bound:.2f})")
```

Output:
```
95% Confidence Interval for Population Mean Revenue: ($486.14, $513.86)
```

Therefore, the estimated population mean revenue with a 95% confidence interval is between $486.14 and $513.86. This means that we are 95% confident that the true average daily revenue for the coffee shop falls within this range.
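Because the $50 figure is a sample standard deviation, a t-based interval is a reasonable cross-check on the z-based one. The sketch below computes both from the same summary statistics; using the t-distribution here is an alternative added for comparison, not the method used in the answer above.

```python
import numpy as np
from scipy import stats

sample_mean = 500
sample_std = 50     # standard deviation reported for the sample
sample_size = 50

std_error = sample_std / np.sqrt(sample_size)

# Critical values for a two-sided 95% interval
z_crit = stats.norm.ppf(0.975)
t_crit = stats.t.ppf(0.975, df=sample_size - 1)

z_interval = (sample_mean - z_crit * std_error, sample_mean + z_crit * std_error)
t_interval = (sample_mean - t_crit * std_error, sample_mean + t_crit * std_error)

print(f"z-based 95% CI: ({z_interval[0]:.2f}, {z_interval[1]:.2f})")
print(f"t-based 95% CI: ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
```

With 49 degrees of freedom the t-based interval is only slightly wider, which is why the z approximation is commonly accepted for samples of this size.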

Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

To test the hypothesis that the new drug decreases blood pressure by 10 mmHg, we can use a one-sample t-test. Here are the steps to conduct the hypothesis test:

1. **Formulate Hypotheses:**
   - Null Hypothesis (H0): The mean decrease in blood pressure (\( \mu \)) is equal to 10 mmHg.
   - Alternative Hypothesis (Ha): The mean decrease in blood pressure (\( \mu \)) is not equal to 10 mmHg.

   Mathematically:
   - H0: \( \mu = 10 \)
   - Ha: \( \mu \neq 10 \)

2. **Select Significance Level:** 
   - Given significance level (alpha) = 0.05.

3. **Calculate Test Statistic:**
   - We will use the formula for the one-sample t-test:

   \[ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \]

   - Where \( \bar{x} \) is the sample mean, \( \mu_0 \) is the hypothesized population mean (10 mmHg), \( s \) is the sample standard deviation, and \( n \) is the sample size.

4. **Determine Critical Value or P-value:**
   - With a two-tailed test and a significance level of 0.05, we will find the critical t-value or p-value.

5. **Make Decision:**
   - If the absolute value of the calculated t-statistic is greater than the critical t-value or if the p-value is less than alpha, we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.



```python
import numpy as np
from scipy import stats

# Given data
sample_mean = 8  # Sample mean decrease in blood pressure
mu_0 = 10        # Hypothesized population mean decrease in blood pressure
sample_std_dev = 3  # Sample standard deviation
sample_size = 100   # Sample size
alpha = 0.05      # Significance level

# Calculate the t-statistic
t_statistic = (sample_mean - mu_0) / (sample_std_dev / np.sqrt(sample_size))

# Determine the critical t-value
critical_t_value = stats.t.ppf(1 - alpha / 2, df=sample_size - 1)

# Determine the p-value
p_value = 2 * (1 - stats.t.cdf(abs(t_statistic), df=sample_size - 1))

# Make decision
if abs(t_statistic) > critical_t_value or p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```
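As a rough check on the code above, the given values can be substituted into the test-statistic formula by hand (the critical value is rounded and taken from the t-distribution with 99 degrees of freedom):

\[ t = \frac{8 - 10}{3 / \sqrt{100}} = \frac{-2}{0.3} \approx -6.67 \]

Since \( |t| \approx 6.67 \) is far larger than the two-tailed critical value of about 1.98, the null hypothesis is rejected at the 0.05 level: the data are not consistent with a mean decrease of exactly 10 mmHg, and the observed decrease is smaller than hypothesized.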

Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01.

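To test the hypothesis that the true mean weight of the products is less than 5 pounds, we use a one-sample t-test, treating the stated standard deviation of 0.5 pounds as the standard deviation \( s \) in the formula. Because the problem gives this value for the production process rather than for the sample, a one-sample z-test would strictly be the more appropriate choice; with these numbers both tests lead to the same decision. Here are the steps to conduct the hypothesis test: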

1. **Formulate Hypotheses:**
   - Null Hypothesis (H0): The true mean weight of the products (\( \mu \)) is equal to 5 pounds.
   - Alternative Hypothesis (Ha): The true mean weight of the products (\( \mu \)) is less than 5 pounds.

   Mathematically:
   - H0: \( \mu = 5 \)
   - Ha: \( \mu < 5 \)

2. **Select Significance Level:** 
   - Given significance level (alpha) = 0.01.

3. **Calculate Test Statistic:**
   - We will use the formula for the one-sample t-test:

   \[ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \]

   - Where \( \bar{x} \) is the sample mean, \( \mu_0 \) is the hypothesized population mean (5 pounds), \( s \) is the sample standard deviation, and \( n \) is the sample size.

4. **Determine Critical Value or P-value:**
   - With a one-tailed test and a significance level of 0.01, we will find the critical t-value or p-value.

5. **Make Decision:**
   - If the calculated t-statistic is less than the critical t-value or if the p-value is less than alpha, we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.


```python
import numpy as np
from scipy import stats

# Given data
sample_mean = 4.8       # Sample mean weight of products
mu_0 = 5                # Hypothesized population mean weight of products
sample_std_dev = 0.5    # Sample standard deviation
sample_size = 25        # Sample size
alpha = 0.01            # Significance level

# Calculate the t-statistic
t_statistic = (sample_mean - mu_0) / (sample_std_dev / np.sqrt(sample_size))

# Determine the critical t-value
critical_t_value = stats.t.ppf(alpha, df=sample_size - 1)

# Determine the p-value
p_value = stats.t.cdf(t_statistic, df=sample_size - 1)

# Make decision
if t_statistic < critical_t_value or p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```
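Since the problem states the standard deviation of the production process itself, the same test can also be run as a one-sample z-test. The sketch below is that alternative, under the assumption that 0.5 pounds is the known population standard deviation; the variable names are chosen for this example.

```python
import numpy as np
from scipy import stats

# Given data
sample_mean = 4.8          # Sample mean weight of products
mu_0 = 5                   # Hypothesized population mean weight
population_std_dev = 0.5   # Assumed known population standard deviation
sample_size = 25           # Sample size
alpha = 0.01               # Significance level

# z-statistic for a one-sample test with known sigma
z_statistic = (sample_mean - mu_0) / (population_std_dev / np.sqrt(sample_size))

# Left-tailed critical value and p-value from the standard normal distribution
critical_z_value = stats.norm.ppf(alpha)
p_value = stats.norm.cdf(z_statistic)

reject_null = p_value < alpha
print(f"z = {z_statistic:.2f}, critical value = {critical_z_value:.2f}, p-value = {p_value:.4f}")
print("Reject the null hypothesis." if reject_null else "Fail to reject the null hypothesis.")
```

Here the z-statistic is -2.0, which does not fall below the critical value of about -2.33 (p-value about 0.023), so the null hypothesis is not rejected at the 0.01 level.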

Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 =30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two groups are equal with a significance level of 0.01.

To test the hypothesis that the population means for the two groups are equal, we can use a two-sample t-test for independent samples. Here are the steps to conduct the hypothesis test:

1. **Formulate Hypotheses:**
   - Null Hypothesis (H0): The population means of the two groups are equal.
   - Alternative Hypothesis (Ha): The population means of the two groups are not equal.

   Mathematically:
   - H0: \( \mu_1 = \mu_2 \)
   - Ha: \( \mu_1 \neq \mu_2 \)

2. **Select Significance Level:** 
   - Given significance level (alpha) = 0.01.

3. **Calculate Test Statistic:**
   - We will use the formula for the two-sample t-test:

   \[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

   - Where \( \bar{x}_1 \) and \( \bar{x}_2 \) are the sample means of the two groups, \( s_1^2 \) and \( s_2^2 \) are the sample variances of the two groups, and \( n_1 \) and \( n_2 \) are the sample sizes of the two groups.

4. **Determine Critical Value or P-value:**
   - With a two-tailed test and a significance level of 0.01, we will find the critical t-value or p-value.

5. **Make Decision:**
   - If the absolute value of the calculated t-statistic is greater than the critical t-value or if the p-value is less than alpha, we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.


```python
import numpy as np
from scipy import stats

# Given data for group 1
n1 = 30                 # Sample size for group 1
sample_mean1 = 80      # Sample mean for group 1
sample_std_dev1 = 10   # Sample standard deviation for group 1

# Given data for group 2
n2 = 40                 # Sample size for group 2
sample_mean2 = 75      # Sample mean for group 2
sample_std_dev2 = 8    # Sample standard deviation for group 2

alpha = 0.01            # Significance level

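# Calculate the t-statistic (unpooled standard error, i.e. Welch's t-test)
var_term1 = sample_std_dev1**2 / n1
var_term2 = sample_std_dev2**2 / n2
t_statistic = (sample_mean1 - sample_mean2) / np.sqrt(var_term1 + var_term2)

# Degrees of freedom from the Welch-Satterthwaite approximation,
# which matches the unpooled standard error used above
df = (var_term1 + var_term2)**2 / (var_term1**2 / (n1 - 1) + var_term2**2 / (n2 - 1))

# Determine the critical t-value
critical_t_value = stats.t.ppf(1 - alpha / 2, df=df)

# Determine the p-value
p_value = 2 * stats.t.sf(abs(t_statistic), df=df)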

# Make decision
if abs(t_statistic) > critical_t_value or p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```
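SciPy can run the same test directly from summary statistics, which gives a quick cross-check of the manual calculation. This sketch uses `stats.ttest_ind_from_stats` with `equal_var=False` (Welch's test); choosing the unequal-variance version is an assumption, made because the two sample standard deviations differ.

```python
from scipy import stats

alpha = 0.01  # Significance level

# Two-sided Welch t-test computed from the group means, standard deviations, and sizes
result = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,
)

reject_null = result.pvalue < alpha
print(f"t-statistic: {result.statistic:.3f}")
print(f"p-value: {result.pvalue:.4f}")
print("Reject the null hypothesis." if reject_null else "Fail to reject the null hypothesis.")
```

The t-statistic is roughly 2.25 and the p-value is above 0.01, so at this significance level the data do not provide enough evidence that the two population means differ.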

Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard deviation of 1.5. Estimate the population mean with a 99% confidence interval.

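To estimate the population mean number of ads watched by viewers during a TV program with a 99% confidence interval, we use the formula for the confidence interval of a population mean. The problem gives only the sample standard deviation, but because the sample is large (n = 50) we treat it as an estimate of the population standard deviation \( \sigma \) and use the z-distribution: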

\[ \text{Confidence Interval} = \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}} \right) \]

Where:
- \( \bar{x} \) is the sample mean,
- \( \sigma \) is the population standard deviation, estimated here by the sample standard deviation,
- \( n \) is the sample size,
- \( Z \) is the Z-score corresponding to the desired level of confidence.

Given data:
- Sample mean (\( \bar{x} \)) = 4
- Standard deviation (\( \sigma \), estimated from the sample) = 1.5
- Sample size (\( n \)) = 50
- Confidence level = 99%

First, we need to find the Z-score corresponding to a 99% confidence level. Because the interval is two-sided, we need the Z-score that leaves 0.5% ((100% - 99%) / 2) in each tail, i.e. the 99.5th percentile of the standard normal distribution. We can find this value using a Z-table or the `stats.norm.ppf()` function from scipy.stats.


```python
import numpy as np
from scipy import stats

# Given data
sample_mean = 4       # Sample mean
population_std_dev = 1.5   # Population standard deviation
sample_size = 50      # Sample size
confidence_level = 0.99   # Confidence level

# Calculate the Z-score corresponding to the confidence level
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_score * (population_std_dev / np.sqrt(sample_size))

# Calculate the lower and upper bounds of the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Print the confidence interval
print("99% Confidence Interval for the population mean number of ads watched:")
print("Lower Bound:", lower_bound)
print("Upper Bound:", upper_bound)
```
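Worked by hand with the rounded Z-score \( Z \approx 2.576 \), the calculation in the code above comes out as follows:

\[ \text{Confidence Interval} = 4 \pm 2.576 \times \frac{1.5}{\sqrt{50}} \approx 4 \pm 0.546 \]

This gives an interval of approximately 3.45 to 4.55, so we are 99% confident that the true average number of ads watched per viewer lies in that range.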