# 1: What is Estimation Statistics? Explain point estimate and interval estimate.
ANSWER:

Estimation statistics is a branch of statistics that deals with the process of estimating unknown population parameters based on sample data. When we collect data from a sample, we use estimation techniques to make inferences about the population from which the sample was drawn.

Point Estimate:
A point estimate is a single value that is used to estimate an unknown population parameter. It provides a best guess or approximation of the parameter based on the available sample data. For example, if we want to estimate the population mean, we can use the sample mean as a point estimate. Similarly, the sample proportion can be used as a point estimate for the population proportion.

Interval Estimate:
An interval estimate, also known as a confidence interval, is a range of values within which the true population parameter is likely to lie. Instead of providing a single point estimate, an interval estimate provides a range of plausible values. The interval is constructed from the sample data and a chosen confidence level, such as 95% or 99%, which is the long-run proportion of such intervals that would capture the true parameter if the sampling were repeated many times.

For example, a 95% confidence interval for the population mean represents a range of values within which we can be 95% confident that the true population mean lies. It is typically constructed by taking the point estimate (e.g., sample mean) and adding and subtracting a margin of error based on the variability of the data and the desired level of confidence.

Interval estimates provide more information than point estimates because they incorporate the uncertainty associated with the estimation process. They give a sense of the precision and reliability of the estimate by considering the variability in the data.
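
As a rough illustration of the difference between the two kinds of estimate, the sketch below computes both from a small sample. The data values and variable names are hypothetical, and the interval uses a t critical value, which is one common choice when the population standard deviation is unknown.

```python
import math
import statistics
from scipy.stats import t

# Hypothetical sample (made-up values for illustration only)
sample = [72, 75, 78, 74, 77, 76, 73, 79]

n = len(sample)
point_estimate = statistics.mean(sample)   # point estimate of the population mean
sample_std = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)

# 95% interval estimate: point estimate +/- critical value * standard error
t_critical = t.ppf(0.975, n - 1)
margin_of_error = t_critical * sample_std / math.sqrt(n)
interval_estimate = (point_estimate - margin_of_error,
                     point_estimate + margin_of_error)

print("Point estimate:", point_estimate)
print("Interval estimate:", interval_estimate)
```

The point estimate is a single number, while the interval estimate widens or narrows with the sample's variability and size.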

# 2. Write a Python function to estimate the population mean using a sample mean and standard deviation.
ANSWER:

    import math

    def estimate_population_mean(sample_mean, sample_std_dev, sample_size):
        # Calculate the standard error (standard deviation of the sampling distribution)
        standard_error = sample_std_dev / math.sqrt(sample_size)
    
        # Calculate the margin of error (critical value times the standard error)
        # For a 95% confidence level, the critical value is approximately 1.96
        margin_of_error = 1.96 * standard_error
    
        # Calculate the lower and upper bounds of the confidence interval
        lower_bound = sample_mean - margin_of_error
        upper_bound = sample_mean + margin_of_error
    
        # Return the estimated population mean and the confidence interval
        return sample_mean, (lower_bound, upper_bound)

EXAMPLE USAGE:

    sample_mean = 75.2
    sample_std_dev = 4.8
    sample_size = 100

    population_mean, confidence_interval = estimate_population_mean(sample_mean, sample_std_dev, sample_size)

    print("Estimated Population Mean:", population_mean)
    print("Confidence Interval:", confidence_interval)



# 3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.
ANSWER:

Hypothesis testing is a statistical method used to make inferences and draw conclusions about a population based on sample data. It involves formulating a hypothesis about a population parameter, collecting sample data, and assessing the evidence against the hypothesis.

The main purpose of hypothesis testing is to determine whether the observed sample data provides enough evidence to support or reject a specific claim or hypothesis about a population parameter. It allows us to make objective and data-driven decisions by providing a framework for testing assumptions and evaluating the statistical significance of results.

The importance of hypothesis testing lies in its ability to:

1. Make Informed Decisions: Hypothesis testing provides a systematic approach for making decisions based on evidence from data. By testing hypotheses, we can determine whether there is enough evidence to support a claim or if it is more reasonable to reject it.

2. Evaluate Statistical Significance: Hypothesis testing helps us assess the statistical significance of our findings. It allows us to quantify the likelihood of observing the obtained results under the assumption that the null hypothesis is true. This helps in understanding the reliability and generalizability of the findings.

3. Test Research Questions and Theories: Hypothesis testing plays a crucial role in scientific research. It allows researchers to test their research questions, theories, or predictions against empirical evidence. By conducting hypothesis tests, researchers can validate or refute their hypotheses and contribute to the advancement of knowledge.

4. Control Type I and Type II Errors: Hypothesis testing helps control the risks of making incorrect decisions. It distinguishes between Type I errors (rejecting a true null hypothesis) and Type II errors (failing to reject a false null hypothesis). By setting appropriate significance levels (alpha) and sample sizes, researchers can manage and minimize these errors.

5. Provide Objective Evidence: Hypothesis testing provides a rigorous and objective approach to decision-making. It requires researchers to specify their hypotheses in advance, collect data, and analyze it using appropriate statistical tests. This helps reduce bias and subjectivity, providing a more reliable basis for drawing conclusions.

# 4. Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.
ANSWER:

Hypothesis:
The average weight of male college students is greater than the average weight of female college students.

Null hypothesis (H₀):
The average weight of male college students is equal to or less than the average weight of female college students.

Alternative hypothesis (H₁):
The average weight of male college students is greater than the average weight of female college students.
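
A one-sided hypothesis like this could be tested along the following lines. This is only a sketch: the weights are invented for illustration, and it assumes SciPy 1.6 or later, where `ttest_ind` accepts an `alternative` argument.

```python
from scipy.stats import ttest_ind

# Hypothetical weights in kg (made-up values, not real measurements)
male_weights = [72, 80, 68, 75, 83, 77, 70, 79]
female_weights = [60, 65, 58, 70, 62, 66, 59, 64]

# alternative="greater" tests H1: mean(male) > mean(female)
result = ttest_ind(male_weights, female_weights, alternative="greater")
t_statistic, p_value = result.statistic, result.pvalue

alpha = 0.05
reject_null = p_value < alpha

print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Reject H0" if reject_null else "Fail to reject H0")
```

Because the alternative hypothesis is directional, only a large positive t-statistic counts as evidence against the null hypothesis.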

# 5. Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.
ANSWER:

    import scipy.stats as stats

    # Sample data from the two populations
    sample1 = [15, 17, 18, 19, 22, 25, 26, 27, 28, 30]
    sample2 = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30]

    # Set the significance level (alpha)
    alpha = 0.05

    # Perform independent t-test
    t_statistic, p_value = stats.ttest_ind(sample1, sample2)

    # Print the results
    print("T-Statistic:", t_statistic)
    print("P-Value:", p_value)

    # Compare the p-value to the significance level
    if p_value < alpha:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")


# 6: What is a null and alternative hypothesis? Give some examples.
ANSWER:

In statistics, the null hypothesis (H0) and the alternative hypothesis (H1 or Ha) are two competing statements about a population parameter or the relationship between variables. These hypotheses are used in hypothesis testing to make inferences and draw conclusions from sample data.

The null hypothesis (H0) represents the status quo or a default assumption. It states that there is no difference, relationship, or effect in the population or between variables. The alternative hypothesis (H1 or Ha), on the other hand, contradicts the null hypothesis and states that there is a difference, relationship, or effect in the population or between variables. Both are statements about the population; whether the sample provides statistically significant evidence against H0 is what the test then decides.

Here are some examples to illustrate null and alternative hypotheses:

    Example: Coin Toss
    Null hypothesis (H0): The coin is fair (i.e., the probability of getting heads is 0.5).
    Alternative hypothesis (H1): The coin is biased (i.e., the probability of getting heads is not 0.5).

    Example: Drug Efficacy
    Null hypothesis (H0): The new drug has no effect on reducing blood pressure.
    Alternative hypothesis (H1): The new drug reduces blood pressure.

    Example: Gender and Income
    Null hypothesis (H0): There is no difference in average income between men and women.
    Alternative hypothesis (H1): There is a difference in average income between men and women.

    Example: Website Conversion Rate
    Null hypothesis (H0): The changes made to the website have no impact on the conversion rate.
    Alternative hypothesis (H1): The changes made to the website increase the conversion rate.

    Example: Climate Change
    Null hypothesis (H0): The average global temperature has not increased over the past century.
    Alternative hypothesis (H1): The average global temperature has increased over the past century.
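
To make the coin toss example concrete, here is a minimal sketch of an exact two-sided test of H0: P(heads) = 0.5. The function name and the observed count (60 heads in 100 tosses) are hypothetical, chosen only for illustration.

```python
from math import comb

def fair_coin_p_value(heads, tosses):
    """Exact two-sided p-value for H0: the coin is fair (P(heads) = 0.5)."""
    observed_ways = comb(tosses, heads)
    # Under H0 every sequence is equally likely, so an outcome's probability
    # is proportional to the number of ways it can occur. Sum every outcome
    # that is no more likely than the observed one.
    extreme_ways = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if comb(tosses, k) <= observed_ways
    )
    return extreme_ways / 2 ** tosses

# Hypothetical observation: 60 heads in 100 tosses
p_value = fair_coin_p_value(heads=60, tosses=100)
print("p-value:", p_value)
```

A small p-value would count as evidence for the alternative hypothesis that the coin is biased.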

# 7: Write down the steps involved in hypothesis testing.
ANSWER:

Hypothesis testing involves a series of steps to make statistical inferences and draw conclusions about population parameters or relationships between variables. Here are the general steps involved in hypothesis testing:

1. Formulate the null and alternative hypotheses: Clearly state the null hypothesis (H0) and the alternative hypothesis (H1 or Ha) based on the research question or problem being investigated.

2. Set the significance level (α): Determine the desired level of significance, denoted by α, which represents the maximum probability of committing a Type I error (rejecting the null hypothesis when it is true). Common significance levels include 0.05 (5%) and 0.01 (1%).

3. Select the appropriate test statistic: Choose a statistical test that is appropriate for the research question, data type, and assumptions. This selection depends on factors such as the nature of the data (e.g., categorical or continuous) and the specific research design (e.g., one-sample, two-sample, paired samples).

4. Collect and analyze the sample data: Gather data from the sample or samples involved in the study. Perform the necessary calculations or statistical analyses based on the chosen test statistic. This may involve calculating sample means, proportions, differences, or correlation coefficients, depending on the test.

5. Calculate the test statistic and p-value: Compute the test statistic using the sample data. The test statistic quantifies the distance between the observed data and the null hypothesis. From the test statistic, calculate the p-value, which represents the probability of obtaining results as extreme as or more extreme than the observed data, assuming the null hypothesis is true.

6. Make a decision: Compare the p-value to the significance level (α) set in step 2. If the p-value is less than or equal to α, the results are considered statistically significant, and the null hypothesis is rejected in favor of the alternative hypothesis. If the p-value is greater than α, there is insufficient evidence to reject the null hypothesis.

7. Draw conclusions: Based on the decision in step 6, draw conclusions about the research question or problem. If the null hypothesis is rejected, interpret the results in the context of the alternative hypothesis. If the null hypothesis is not rejected, interpret the results in terms of the lack of evidence to support the alternative hypothesis.

8. Consider limitations and report results: Discuss any limitations or assumptions of the hypothesis test. Report the test statistic, p-value, and decision in a clear and concise manner. Provide additional information, such as effect sizes or confidence intervals, if applicable.
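
The steps above can be sketched in code roughly as follows. The hypothesized mean and the sample values are made up for illustration, and a one-sample t-test is assumed to be the appropriate test.

```python
from scipy.stats import ttest_1samp

# Step 1: H0: mu = 50, H1: mu != 50 (hypothetical claim)
hypothesized_mean = 50

# Step 2: set the significance level
alpha = 0.05

# Steps 3-4: choose a one-sample t-test and collect the data (made-up values)
sample = [52, 48, 55, 51, 49, 53, 54, 50, 56, 52]

# Step 5: compute the test statistic and the two-sided p-value
t_statistic, p_value = ttest_1samp(sample, popmean=hypothesized_mean)

# Step 6: compare the p-value with alpha
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

# Steps 7-8: report the result
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}, decision: {decision}")
```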


# 8. Define p-value and explain its significance in hypothesis testing.
ANSWER:

The p-value is a statistical measure that quantifies the strength of evidence against the null hypothesis in hypothesis testing. It represents the probability of obtaining results as extreme as or more extreme than the observed data, assuming the null hypothesis is true.

In hypothesis testing, the p-value is compared to the pre-determined significance level (α) to make a decision about the null hypothesis. The significance level, typically set at 0.05 (5%) or 0.01 (1%), represents the threshold below which the null hypothesis is rejected. The steps involved in interpreting the p-value and making a decision are as follows:

1. If the p-value is less than or equal to α: This indicates that the observed data is unlikely to have occurred if the null hypothesis were true. In other words, there is strong evidence against the null hypothesis, and it is rejected in favor of the alternative hypothesis. The results are considered statistically significant, and there is confidence that the observed effect or relationship exists in the population.

2. If the p-value is greater than α: This suggests that the observed data is reasonably likely to occur even if the null hypothesis were true. In this case, there is insufficient evidence to reject the null hypothesis. The results are not considered statistically significant, and it is concluded that there is not enough evidence to support the alternative hypothesis.

The p-value provides a quantitative measure of the strength of evidence against the null hypothesis. A smaller p-value indicates stronger evidence against the null hypothesis, while a larger p-value suggests weaker evidence. It is important to note that the p-value does not provide information about the size or importance of the effect, but rather indicates the statistical significance.

It is crucial to interpret the p-value correctly and avoid misinterpretation. A p-value above the significance level does not prove that the null hypothesis is true or that the effect or relationship is absent; it simply means that there is not enough evidence to reject the null hypothesis based on the observed data. Similarly, a p-value below the significance level does not guarantee the practical or real-world significance of the effect or relationship; it only indicates statistical significance.

Therefore, the p-value is a key factor in hypothesis testing as it helps researchers make decisions about the null hypothesis and draw conclusions about the presence or absence of significant effects or relationships in the population based on the observed data.
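
For a t-test, the definition above translates into a tail-area calculation. The helper below is a small sketch (the function name is hypothetical) that turns a t-statistic and its degrees of freedom into a two-sided p-value using SciPy's survival function.

```python
from scipy.stats import t

def two_sided_p_value(t_statistic, degrees_of_freedom):
    """Probability of a t value at least as extreme as the one observed, under H0."""
    # t.sf is the survival function (1 - cdf), i.e. the upper-tail area.
    # Doubling it accounts for both tails of the symmetric t-distribution.
    return 2 * t.sf(abs(t_statistic), degrees_of_freedom)

print(two_sided_p_value(2.228, 10))   # close to 0.05
print(two_sided_p_value(0.5, 10))     # large: data are consistent with H0
```

The further the statistic is from zero, the smaller the tail area and hence the p-value.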

# 9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.
ANSWER:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import t

    # Define the degrees of freedom
    df = 10

    # Generate x values for the plot
    x = np.linspace(-4, 4, 500)

    # Calculate the corresponding y values from the t-distribution
    y = t.pdf(x, df)

    # Create the plot
    plt.plot(x, y, label=f't-distribution (df = {df})')

    # Set the plot title and labels
    plt.title("Student's t-distribution")
    plt.xlabel('x')
    plt.ylabel('Probability Density')

    # Display a legend
    plt.legend()

    # Show the plot
    plt.show()


# 10. Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.
ANSWER:

    import numpy as np
    from scipy.stats import ttest_ind

    # Define the two random samples
    sample1 = np.array([1, 2, 3, 4, 5])
    sample2 = np.array([6, 7, 8, 9, 10])

    # Perform the two-sample t-test
    t_statistic, p_value = ttest_ind(sample1, sample2)

    # Print the results
    print(f"t-statistic: {t_statistic}")
    print(f"p-value: {p_value}")


# 11: What is Student’s t distribution? When to use the t-Distribution.
ANSWER:

Student's t-distribution, often referred to as the t-distribution, is a probability distribution that is widely used in statistics for making inferences about population parameters when the sample size is small or when the population standard deviation is unknown. It is named after William Sealy Gosset, who published under the pseudonym "Student."

The t-distribution is similar in shape to the standard normal distribution (Z-distribution), but it has thicker tails. The shape of the t-distribution depends on the degrees of freedom (df), which is related to the sample size. As the degrees of freedom increase, the t-distribution approaches the shape of the standard normal distribution.

The t-distribution is typically used in the following scenarios:

1. Small Sample Sizes: When the sample size is small (a common rule of thumb is fewer than 30 observations) and the population standard deviation is unknown, the t-distribution is used instead of the normal distribution to estimate population parameters. Its heavier tails account for the extra uncertainty that comes from estimating the standard deviation from only a few observations.

2. Unknown Population Standard Deviation: When the population standard deviation is unknown, the t-distribution is used to construct confidence intervals and perform hypothesis tests. The t-distribution incorporates the sample standard deviation to estimate the population parameters.

3. Comparing Means of Independent Samples: When comparing the means of two independent samples, such as in a two-sample t-test, the t-distribution is used to determine if there is a significant difference between the sample means.

4. Regression Analysis: In linear regression analysis, the t-distribution is used to test the significance of regression coefficients and to construct confidence intervals for the coefficients.

The t-distribution provides critical values and probabilities that are used in hypothesis testing, confidence interval construction, and estimating the uncertainty in parameter estimates. It is a valuable tool when dealing with small sample sizes or when the population standard deviation is unknown.
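
The convergence toward the normal distribution can be seen by comparing critical values. The sketch below prints the two-sided 95% critical value for a few arbitrarily chosen degrees of freedom alongside the standard normal value.

```python
from scipy.stats import norm, t

# Two-sided 95% critical value from the standard normal distribution
z_critical = norm.ppf(0.975)

# The same critical value from t-distributions with increasing degrees of freedom
t_criticals = {df: t.ppf(0.975, df) for df in (5, 10, 30, 100, 1000)}

for df, value in t_criticals.items():
    print(f"df = {df:>4}: t critical = {value:.3f}")
print(f"normal   : z critical = {z_critical:.3f}")
```

The t critical values are always larger than the normal one, which is what makes t-based intervals wider for small samples.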

# 12: What is t-statistic? State the formula for t-statistic.
ANSWER:

The t-statistic is a measure used in hypothesis testing to assess whether an observed difference, either between a sample mean and a hypothesized value or between two sample means, is statistically significant. It expresses that difference in units of its estimated standard error, so larger absolute values indicate data that are less consistent with the null hypothesis.

The formula for the t-statistic depends on the specific scenario and hypothesis being tested. Here are the formulas for the t-statistic in two common scenarios:

1. One-Sample t-Test:
   The one-sample t-test is used when comparing the mean of a single sample to a known or hypothesized population mean.
   
   The formula for the t-statistic in a one-sample t-test is:
   
   t = (x̄ - μ) / (s / sqrt(n))
   
   Where:
   - t: The t-statistic
   - x̄: The sample mean
   - μ: The hypothesized population mean
   - s: The sample standard deviation
   - n: The sample size

2. Two-Sample t-Test (Independent Samples):
   The two-sample t-test is used when comparing the means of two independent samples to determine if there is a significant difference between them.
   
   The formula for the t-statistic in a two-sample t-test is:
   
   t = (x̄1 - x̄2) / sqrt((s1^2 / n1) + (s2^2 / n2))
   
   Where:
   - t: The t-statistic
   - x̄1: The mean of the first sample
   - x̄2: The mean of the second sample
   - s1: The standard deviation of the first sample
   - s2: The standard deviation of the second sample
   - n1: The size of the first sample
   - n2: The size of the second sample

In both cases, the t-statistic measures the difference between the sample means, adjusted for the variability within the samples and the sample sizes. It is then compared to a critical value from the t-distribution or used to calculate a p-value to assess the statistical significance of the difference.
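
Both formulas can be written directly as small functions. This is a minimal sketch with hypothetical function names; the example inputs are the summary statistics from questions 14 and 16 below.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_std, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))"""
    return (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(n))

def two_sample_t(mean1, mean2, std1, std2, n1, n2):
    """t = (mean1 - mean2) / sqrt(s1^2 / n1 + s2^2 / n2)"""
    return (mean1 - mean2) / math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)

# Question 14: mean 8, hypothesized mean 10, s = 3, n = 100
print(one_sample_t(8, 10, 3, 100))

# Question 16: means 80 and 75, standard deviations 10 and 8, sizes 30 and 40
print(two_sample_t(80, 75, 10, 8, 30, 40))
```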

# 13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50. Estimate the population mean revenue with a 95% confidence interval.
ANSWER:

To estimate the population mean revenue with a 95% confidence interval, we can use the sample data provided. Here's how you can calculate it:

Given:
Sample size (n) = 50
Sample mean (x̄) = $500
Sample standard deviation (s) = $50
Confidence level = 95% (which corresponds to a significance level of α = 0.05)

To calculate the confidence interval, we can use the formula for the confidence interval of the mean:

Confidence interval = x̄ ± (t * (s / sqrt(n)))

First, we need to determine the critical value, t, from the t-distribution table or using statistical software. Since the sample size is 50, the degrees of freedom will be n-1 = 49. For a 95% confidence level with 49 degrees of freedom, the critical value will be approximately 2.009.

Substituting the values into the formula, we have:

Confidence interval = $500 ± (2.009 * ($50 / sqrt(50)))

Evaluating the square root:
= $500 ± (2.009 * ($50 / 7.071))  # sqrt(50) = 7.071

Dividing to get the standard error:
= $500 ± (2.009 * $7.071)

Finally, calculating the margin of error:
= $500 ± $14.21

Therefore, the 95% confidence interval estimate for the average daily revenue is:
$500 - $14.21 to $500 + $14.21

The 95% confidence interval is approximately $485.79 to $514.21. This means we can be 95% confident that the true average daily revenue for the coffee shop falls within this interval.
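
The hand calculation can be checked with a few lines of Python. This sketch takes the critical value from `scipy.stats.t` rather than a table; the variable names are illustrative.

```python
import math
from scipy.stats import t

n = 50
sample_mean = 500.0
sample_std = 50.0
confidence = 0.95

# Two-sided critical value with n - 1 degrees of freedom
t_critical = t.ppf(1 - (1 - confidence) / 2, n - 1)

standard_error = sample_std / math.sqrt(n)
margin_of_error = t_critical * standard_error
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"t critical: {t_critical:.3f}")
print(f"Margin of error: {margin_of_error:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```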

# 14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.
ANSWER:

To test the hypothesis that the new drug decreases blood pressure by 10 mmHg, we can perform a one-sample t-test. Here's how you can calculate it:

Given:
Sample size (n) = 100
Sample mean (x̄) = 8 mmHg (decrease in blood pressure)
Sample standard deviation (s) = 3 mmHg (decrease in blood pressure)
Hypothesized mean (μ) = 10 mmHg (decrease in blood pressure)
Significance level (α) = 0.05

We can use the one-sample t-test formula to calculate the t-statistic:

t = (x̄ - μ) / (s / sqrt(n))

Substituting the values:
t = (8 - 10) / (3 / sqrt(100))

Calculating inside the parentheses:
t = -2 / (3 / 10)

Simplifying:
t = -20 / 3

The t-value for this test is approximately -6.67 (rounded to two decimal places).

Next, we compare the calculated t-value to the critical t-value from the t-distribution table or using statistical software. Since we are using a significance level (α) of 0.05, the test is two-tailed (the true mean decrease could be smaller or larger than 10 mmHg), and there are n - 1 = 99 degrees of freedom, the critical t-value is approximately ±1.984.

Since -6.67 is outside the range of -1.984 to 1.984, we have strong evidence to reject the null hypothesis.

Therefore, we can conclude that there is a statistically significant difference between the sample mean decrease in blood pressure (8 mmHg) and the hypothesized mean decrease (10 mmHg) at a significance level of 0.05.
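
The same test can be reproduced from the summary statistics in Python. The sketch below assumes a two-sided one-sample t-test, as in the calculation above, and also reports the p-value.

```python
import math
from scipy.stats import t

n = 100
sample_mean = 8.0
sample_std = 3.0
hypothesized_mean = 10.0
alpha = 0.05

t_statistic = (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(n))
df = n - 1

# Two-sided critical value and p-value
t_critical = t.ppf(1 - alpha / 2, df)
p_value = 2 * t.sf(abs(t_statistic), df)

reject_null = abs(t_statistic) > t_critical

print(f"t = {t_statistic:.2f}, critical = ±{t_critical:.3f}, p = {p_value:.2e}")
print("Reject H0" if reject_null else "Fail to reject H0")
```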

# 15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01.
ANSWER:

To test the hypothesis that the true mean weight of the products is less than 5 pounds, we can perform a one-sample z-test. A z-test is appropriate here, rather than a t-test, because the population standard deviation is given. Here's how you can calculate it:

Given:
Sample size (n) = 25
Sample mean (x̄) = 4.8 pounds
Population standard deviation (σ) = 0.5 pounds
Hypothesized mean (μ) = 5 pounds
Significance level (α) = 0.01

We can use the one-sample z-test formula to calculate the z-statistic:

z = (x̄ - μ) / (σ / sqrt(n))

Substituting the values:
z = (4.8 - 5) / (0.5 / sqrt(25))

Calculating inside the parentheses:
z = -0.2 / (0.5 / 5)

Simplifying:
z = -0.2 / 0.1

The z-value for this test is -2.

Next, we compare the calculated z-value to the critical z-value from the standard normal table or using statistical software. Since we are using a one-tailed test with a significance level (α) of 0.01 and testing for a smaller mean weight, the critical z-value will be approximately -2.326 (for a left-tailed test).

Since -2 is greater than -2.326, we fail to reject the null hypothesis.

Therefore, we do not have sufficient evidence to conclude that the true mean weight of the products is less than 5 pounds at a significance level of 0.01.
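
Because the problem gives the population standard deviation, the calculation can also be checked against the standard normal distribution. The sketch below uses only the Python standard library (`statistics.NormalDist`, available from Python 3.8) and assumes a left-tailed test.

```python
import math
from statistics import NormalDist

n = 25
sample_mean = 4.8
population_std = 0.5
hypothesized_mean = 5.0
alpha = 0.01

z_statistic = (sample_mean - hypothesized_mean) / (population_std / math.sqrt(n))

standard_normal = NormalDist()
z_critical = standard_normal.inv_cdf(alpha)   # left-tail critical value
p_value = standard_normal.cdf(z_statistic)    # left-tail area below z

reject_null = z_statistic < z_critical

print(f"z = {z_statistic:.2f}, critical = {z_critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

The p-value is larger than 0.01, which agrees with the decision not to reject the null hypothesis.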

# 16. Two groups of students are given different study materials to prepare for a test. The first group (n1 = 30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two groups are equal with a significance level of 0.01.
ANSWER:

To test the hypothesis that the population means for the two groups are equal, we can perform a two-sample t-test for independent samples. Here's how you can calculate it:

Given:
First group sample size (n1) = 30
First group sample mean (x̄1) = 80
First group standard deviation (s1) = 10

Second group sample size (n2) = 40
Second group sample mean (x̄2) = 75
Second group standard deviation (s2) = 8

Significance level (α) = 0.01

We can use the two-sample t-test formula to calculate the t-statistic:

t = (x̄1 - x̄2) / sqrt((s1^2 / n1) + (s2^2 / n2))

Substituting the values:
t = (80 - 75) / sqrt((10^2 / 30) + (8^2 / 40))

Calculating inside the square root:
t = 5 / sqrt((100 / 30) + (64 / 40))

Simplifying further:
t = 5 / sqrt(3.333 + 1.6)

Calculating the square root:
t = 5 / sqrt(4.933)

t ≈ 5 / 2.221

The t-value for this test is approximately 2.25 (rounded to two decimal places).

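Next, we compare the calculated t-value to the critical t-value from the t-distribution table or using statistical software. Since we are using a significance level (α) of 0.01 and the test is two-tailed (we are testing for inequality), the critical t-value is approximately ±2.67. This uses the Welch-Satterthwaite approximation of about 54 degrees of freedom, which goes with the unequal-variance formula above; pooling the variances, with n1 + n2 - 2 = 68 degrees of freedom, gives a similar value of about ±2.65.

Since 2.25 is within the range of -2.67 to 2.67, we fail to reject the null hypothesis.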

Therefore, we do not have sufficient evidence to conclude that the population means for the two groups are different at a significance level of 0.01.
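
SciPy can run this test directly from the summary statistics. The sketch below uses `scipy.stats.ttest_ind_from_stats` with `equal_var=False`, which is an assumption on my part: it selects Welch's t-test, matching the unpooled standard error in the formula above.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the problem statement
result = ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,   # Welch's t-test (variances not pooled)
)
t_statistic, p_value = result.statistic, result.pvalue

alpha = 0.01
reject_null = p_value < alpha

print(f"t = {t_statistic:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Setting `equal_var=True` instead would pool the two variances, which changes both the statistic and the degrees of freedom slightly.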

# 17. A marketing company wants to estimate the average number of ads watched by viewers during a TV program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard deviation of 1.5. Estimate the population mean with a 99% confidence interval.
ANSWER:

To estimate the population mean number of ads watched by viewers during a TV program with a 99% confidence interval, we can use the sample data provided. Here's how you can calculate it:

Given:
Sample size (n) = 50
Sample mean (x̄) = 4
Sample standard deviation (s) = 1.5
Confidence level = 99% (which corresponds to a significance level of α = 0.01)

To calculate the confidence interval, we can use the formula for the confidence interval of the mean:

Confidence interval = x̄ ± (t * (s / sqrt(n)))

First, we need to determine the critical value, t, from the t-distribution table or using statistical software. Since the sample size is 50, the degrees of freedom will be n-1 = 49. For a 99% confidence level with 49 degrees of freedom, the critical value will be approximately 2.68.

Substituting the values into the formula, we have:

Confidence interval = 4 ± (2.68 * (1.5 / sqrt(50)))

Evaluating the square root:
= 4 ± (2.68 * (1.5 / 7.071))  # sqrt(50) = 7.071

Dividing to get the standard error:
= 4 ± (2.68 * 0.2121)

Finally, calculating the confidence interval:
= 4 ± 0.568

Therefore, the 99% confidence interval estimate for the average number of ads watched by viewers during a TV program is:
4 - 0.568 to 4 + 0.568

The 99% confidence interval is approximately 3.432 to 4.568. This means we can be 99% confident that the true average number of ads watched falls within this interval.
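
As a cross-check, the interval can be computed with `scipy.stats.t.interval`. In this sketch the confidence level is passed positionally, since the name of that argument has differed between SciPy versions.

```python
import math
from scipy.stats import t

n = 50
sample_mean = 4.0
sample_std = 1.5

standard_error = sample_std / math.sqrt(n)

# Arguments: confidence level, degrees of freedom, then center and scale
lower, upper = t.interval(0.99, n - 1, loc=sample_mean, scale=standard_error)

print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```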