## Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

Ans= Estimation statistics is a branch of statistics that deals with estimating population parameters based on sample data. When the entire population cannot be measured or observed directly, estimation statistics allows us to make inferences and approximate the unknown parameters using information from a representative sample.

Point Estimate:
A point estimate is a single value that is used to estimate an unknown population parameter. It provides a "best guess" or approximation of the parameter based on the sample data. For example, if we want to estimate the population mean, we can use the sample mean as a point estimate. Similarly, if we want to estimate the population proportion, we can use the sample proportion as a point estimate. Point estimates are simple and easy to calculate, but they do not provide information about the precision or reliability of the estimate.

Interval Estimate:
An interval estimate provides a range of values within which the population parameter is likely to lie. It takes into account the variability of the sample data and provides a measure of uncertainty associated with the estimate. The interval is constructed as the point estimate plus or minus a margin of error, where the margin of error depends on the chosen confidence level (for example, 95%) and the standard error of the estimate. A wider interval reflects more sampling uncertainty.

## Q2. Write a Python function to estimate the population mean using a sample mean and standard deviation.

In [1]:
import numpy as np
from scipy import stats

def estimate_population_mean(sample, confidence_level=0.95):
    sample_size = len(sample)
    sample_mean = np.mean(sample)
    sample_stddev = np.std(sample, ddof=1)  # ddof=1 for sample standard deviation

    margin_of_error = stats.t.ppf((1 + confidence_level) / 2, df=sample_size-1) * (sample_stddev / np.sqrt(sample_size))
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    return lower_bound, upper_bound

sample = [10, 12, 11, 9, 10, 13, 11, 12, 9, 11]

lower_bound, upper_bound = estimate_population_mean(sample)

print("Estimated population mean: {:.2f}".format(np.mean(sample)))
print("Confidence interval: [{:.2f}, {:.2f}]".format(lower_bound, upper_bound))


Estimated population mean: 10.80
Confidence interval: [9.86, 11.74]


## Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

Ans= Hypothesis testing is a statistical procedure used to make inferences and draw conclusions about a population based on sample data. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (H1 or Ha), and then testing the data against these hypotheses to assess the strength of evidence.

The importance of hypothesis testing lies in its ability to provide a structured framework for making decisions and drawing conclusions based on evidence. Here are a few key reasons why hypothesis testing is significant:

Statistical Inference: Hypothesis testing allows researchers to make inferences about population parameters using sample data. It provides a way to generalize findings from a sample to the larger population.

Evidence-Based Decision Making: By setting up hypotheses and performing hypothesis tests, we can objectively evaluate the strength of evidence for or against a specific claim or hypothesis. This helps in making informed decisions based on data rather than relying on intuition or subjective opinions.

## Q4. Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.

Ans= Hypothesis: The average weight of male college students is greater than the average weight of female college students.

Null Hypothesis (H0): The average weight of male college students is equal to or less than the average weight of female college students.

Alternative Hypothesis (Ha): The average weight of male college students is greater than the average weight of female college students.

Symbolically:

H0: μ_male ≤ μ_female

Ha: μ_male > μ_female

In this hypothesis, we are comparing the average weight (μ) of male college students to that of female college students. The null hypothesis assumes that the average weight of male students is not greater than that of female students, while the alternative hypothesis states that the average weight of male college students is greater. Because the alternative specifies a direction, this is a one-sided (right-tailed) test. By conducting hypothesis testing and analyzing the data, we can determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.
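The one-sided test described above could be run roughly as follows. This is only a sketch: the weights are invented for illustration, and it assumes SciPy 1.6 or later, where `ttest_ind` accepts the `alternative` argument. Welch's version (`equal_var=False`) is chosen here so that equal variances do not have to be assumed.

```python
import numpy as np
from scipy import stats

# Hypothetical weights in kg, invented purely for illustration
male_weights = np.array([72.0, 80.5, 68.2, 77.1, 85.3, 74.8, 79.0, 70.6])
female_weights = np.array([58.4, 63.1, 55.9, 67.2, 60.5, 62.8, 59.3, 65.0])

# One-sided Welch t-test: Ha is mu_male > mu_female
t_stat, p_value = stats.ttest_ind(
    male_weights, female_weights, equal_var=False, alternative="greater"
)

alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}, reject H0: {reject_h0}")
```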



## Q5. Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.

In [2]:
import numpy as np
from scipy import stats

def hypothesis_test(sample1, sample2, alpha):
    # Perform a two-sample independent t-test
    t_stat, p_value = stats.ttest_ind(sample1, sample2)

    # Compare p-value with the significance level (alpha)
    if p_value < alpha:
        print("Reject the null hypothesis. There is evidence to support a significant difference between the population means.")
    else:
        print("Fail to reject the null hypothesis. There is not enough evidence of a difference between the population means.")

    # Print the test statistic and p-value
    print("Test statistic:", t_stat)
    print("P-value:", p_value)


# Example usage
sample1 = [1, 2, 3, 4, 5]
sample2 = [6, 7, 8, 9, 10]
alpha = 0.05

hypothesis_test(sample1, sample2, alpha)


Reject the null hypothesis. There is evidence to support a significant difference between the population means.
Test statistic: -5.0
P-value: 0.001052825793366539


## Q6: What is a null and alternative hypothesis? Give some examples.

Ans=

1) The Null Hypothesis (H0):
The null hypothesis represents the default or baseline assumption. It states that there is no significant effect, difference, or relationship in the population. It assumes that any observed differences or relationships in the sample are due to random chance or sampling variability. The null hypothesis is typically denoted as H0.

Examples of null hypotheses:

The average test scores of students who received tutoring and those who did not receive tutoring are equal.

The proportion of defective products produced by Machine A is the same as the proportion of defective products produced by Machine B.

2) The Alternative Hypothesis (Ha or H1):
The alternative hypothesis represents an alternative claim or hypothesis that contradicts the null hypothesis. It suggests that there is a significant effect, difference, or relationship in the population. It posits that the observed differences or relationships in the sample are not due to random chance but are representative of a real effect or relationship in the population. The alternative hypothesis can be one-sided (directional) or two-sided (non-directional).

Examples of alternative hypotheses:

The average test scores of students who received tutoring are higher than those who did not receive tutoring.

The proportion of defective products produced by Machine A is different from the proportion of defective products produced by Machine B.

## Q7: Write down the steps involved in hypothesis testing.

Ans= Hypothesis testing involves several steps to make a statistical inference and draw conclusions about a population based on sample data. Here are the general steps involved in hypothesis testing:

1) Formulate the Null Hypothesis (H0) and Alternative Hypothesis (Ha): Clearly define the null hypothesis, which represents the default assumption of no effect or no difference in the population. Also, specify the alternative hypothesis, which represents the claim or hypothesis that contradicts the null hypothesis.

2) Set the Significance Level (α): Determine the desired significance level, denoted as α (alpha). It represents the maximum acceptable probability of making a Type I error (rejecting the null hypothesis when it is true). Commonly used significance levels are 0.05 (5%) or 0.01 (1%).

3) Collect and Analyze Sample Data: Gather a representative sample from the population of interest and perform the necessary data collection and preparation. Calculate relevant statistics (e.g., sample mean, sample proportion) and assess the characteristics of the sample.

4) Choose a Statistical Test: Select an appropriate statistical test based on the nature of the data and the hypothesis being tested. Common tests include t-tests, chi-square tests, ANOVA, regression analysis, etc. 

5) Calculate the Test Statistic: Compute the test statistic, which is a measure that quantifies the difference or relationship between the sample data and the null hypothesis. The test statistic depends on the chosen statistical test and the specific hypothesis being tested.

6) Determine the Critical Region and p-value: Identify the critical region, which is the range of values that would lead to rejection of the null hypothesis. Alternatively, calculate the p-value, which is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed value, assuming the null hypothesis is true.

7) Make a Decision: Compare the test statistic (or p-value) to the critical region (or significance level α). If the test statistic falls within the critical region or the p-value is less than α, reject the null hypothesis. If the test statistic does not fall within the critical region or the p-value is greater than or equal to α, fail to reject the null hypothesis.
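The seven steps can be traced in a few lines of code. The sketch below reuses the sample from Q2 and tests it against a hypothesized mean of 10, which is an assumed value chosen only for illustration. It shows that the critical-region rule and the p-value rule lead to the same decision.

```python
import numpy as np
from scipy import stats

# Step 1: H0: mu = 10 versus Ha: mu != 10 (hypothesized value is an assumption)
mu0 = 10.0
# Step 2: significance level
alpha = 0.05
# Step 3: sample data (the sample used in Q2)
sample = np.array([10, 12, 11, 9, 10, 13, 11, 12, 9, 11], dtype=float)
n = sample.size
# Steps 4-5: one-sample t-test statistic
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
# Step 6: two-sided critical value and p-value
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
# Step 7: decision (both rules agree)
reject_by_region = abs(t_stat) > t_crit
reject_by_p = p_value < alpha
print(f"t = {t_stat:.3f}, critical = ±{t_crit:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_by_p else "Fail to reject H0")
```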

## Q8. Define p-value and explain its significance in hypothesis testing.

Ans= In hypothesis testing, the p-value is a probability value that measures the strength of evidence against the null hypothesis. It represents the probability of observing a test statistic as extreme as, or more extreme than, the one computed from the sample data, assuming that the null hypothesis is true.

Here's a breakdown of the significance of the p-value in hypothesis testing:

1) Assessing Statistical Significance: The p-value provides a quantitative measure of the statistical significance of the observed data. If the p-value is very small (typically below the chosen significance level α), it suggests strong evidence against the null hypothesis and indicates that the observed effect or difference is unlikely to have occurred by chance alone.

2) Decision Rule: The p-value helps in making a decision about the null hypothesis. If the p-value is less than the significance level α, we reject the null hypothesis in favor of the alternative hypothesis. Conversely, if the p-value is greater than or equal to α, we fail to reject the null hypothesis.

3) Strength of the Evidence: The smaller the p-value, the stronger the evidence against the null hypothesis. A very small p-value means the observed data would be very unlikely if the null hypothesis were true. Note that the p-value is not the probability that the null hypothesis is true, and it does not measure the size of the effect.
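To make the definition concrete, the p-value can be computed directly from a test statistic and its degrees of freedom. The sketch below uses the test statistic reported in the Q5 output (t = -5.0) with 8 degrees of freedom (two samples of size 5), and should reproduce the p-value printed there up to floating-point error.

```python
from scipy import stats

t_stat = -5.0  # test statistic reported in the Q5 output
df = 8         # n1 + n2 - 2 for two samples of size 5

# Two-sided p-value: probability of |T| >= |t_stat| when H0 is true
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)
# One-sided (left-tail) p-value: probability of T <= t_stat when H0 is true
p_left = stats.t.cdf(t_stat, df)

print("Two-sided p-value:", p_two_sided)
print("Left-tail p-value:", p_left)
```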


## Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Generate x values for the range of the t-distribution
x = np.linspace(-4, 4, 500)

# Calculate the probability density function (PDF) for the t-distribution with 10 degrees of freedom
pdf = t.pdf(x, df=10)

# Create the plot
plt.plot(x, pdf, label='t-distribution (df=10)')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title("Student's t-distribution")
plt.legend()
plt.grid(True)

# Show the plot
plt.show()


## Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.

In [None]:
import numpy as np
from scipy import stats

sample1 = np.array([1, 2, 3, 4, 5])
sample2 = np.array([6, 7, 8, 9, 10])

t_stat, p_value = stats.ttest_ind(sample1, sample2)

print("T-statistic:", t_stat)
print("P-value:", p_value)


## Q11: What is Student’s t distribution? When to use the t-Distribution.

Ans= The Student's t-distribution, also known as the t-distribution, is a probability distribution that is used in statistical inference for situations where the sample size is small or the population standard deviation is unknown.

The t-distribution is used in various statistical procedures, including:

Confidence Intervals: When the population standard deviation is unknown and the sample size is small, the t-distribution is used to construct confidence intervals for population parameters (e.g., mean, difference of means).

Hypothesis Testing: When comparing means between two independent samples with small sample sizes, the t-distribution is used in hypothesis testing. It allows for testing the null hypothesis and determining whether the observed difference is statistically significant.

One-Sample t-test: When testing the mean of a single sample against a known or hypothesized value, the t-distribution is used to assess the statistical significance.

The t-distribution is particularly useful when the sample size is small (typically less than 30) and the population standard deviation is unknown. As the degrees of freedom increase, it approaches the standard normal distribution.
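The effect of sample size can be seen by comparing critical values. The following sketch prints the two-sided 95% critical value of the t-distribution for a few arbitrarily chosen degrees of freedom next to the corresponding normal value; the t values are larger for small samples and shrink toward the normal value as the degrees of freedom grow.

```python
from scipy import stats

dfs = [5, 10, 30, 100, 1000]  # degrees of freedom chosen for illustration
z_crit = stats.norm.ppf(0.975)  # two-sided 95% critical value, standard normal
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df = {df:>4}: t critical = {t_crit:.3f} (normal: {z_crit:.3f})")
```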

## Q12: What is t-statistic? State the formula for t-statistic.

Ans= The t-statistic is a measure that quantifies the difference between the sample mean and the hypothesized population mean, relative to the variability within the sample. It is commonly used in hypothesis testing and confidence interval estimation when the population standard deviation is unknown and must be estimated from the sample.

The formula for the t-statistic depends on the specific scenario being analyzed. Here is the formula for the most common case, the one-sample t-test:

One-Sample t-Test:
The t-statistic for a one-sample t-test compares the sample mean (x̄) to the hypothesized population mean (μ0). It is calculated as:

t = (x̄ - μ0) / (s / √n)

where:

x̄ is the sample mean,

μ0 is the hypothesized population mean,

s is the sample standard deviation,

n is the sample size.
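As a quick check of the formula, the sketch below computes the t-statistic by hand and compares it with `scipy.stats.ttest_1samp`. The helper name `one_sample_t` and the hypothesized mean of 10 are assumptions made for illustration; the data is the sample from Q2.

```python
import numpy as np
from scipy import stats

def one_sample_t(sample, mu0):
    """Return t = (x̄ - μ0) / (s / √n), with s the sample standard deviation."""
    x = np.asarray(sample, dtype=float)
    return (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(x.size))

sample = [10, 12, 11, 9, 10, 13, 11, 12, 9, 11]  # sample from Q2
mu0 = 10  # hypothesized population mean (assumed for illustration)

t_manual = one_sample_t(sample, mu0)
t_scipy = stats.ttest_1samp(sample, popmean=mu0).statistic

print("Manual t-statistic:", t_manual)
print("SciPy t-statistic: ", t_scipy)
```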

## Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50. Estimate the population mean revenue with a 95% confidence interval.

Ans= Determine the critical value:

Since we want a 95% confidence interval, we need to find the critical value associated with a 95% confidence level and the given sample size (50). The degrees of freedom for a sample size of 50 are 50 - 1 = 49. We can find the critical value using the t-distribution or refer to a t-table. For a 95% confidence level and 49 degrees of freedom, the critical value is approximately 2.01.

Calculate the margin of error:

The margin of error is the product of the critical value and the standard error. The standard error is calculated by dividing the sample standard deviation by the square root of the sample size.

Margin of Error = Critical Value * (Standard Deviation / √(Sample Size))
Margin of Error = 2.01 * (50 / √(50))

Calculate the confidence interval:

The confidence interval is obtained by subtracting and adding the margin of error to the sample mean.

Lower Bound = Sample Mean - Margin of Error
Upper Bound = Sample Mean + Margin of Error

Lower Bound = 500 - Margin of Error
Upper Bound = 500 + Margin of Error

Substituting the values into the equations:

Margin of Error = 2.01 * (50 / √50) ≈ 14.21

Lower Bound = 500 - 14.21 ≈ 485.79
Upper Bound = 500 + 14.21 ≈ 514.21

Therefore, the 95% confidence interval for the average daily revenue is approximately $485.79 to $514.21. We can be 95% confident that the true population mean revenue falls within this interval based on the given sample.
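The same calculation can be done in code from the summary statistics given in the question. This is a minimal sketch using `scipy.stats.t.ppf` for the critical value, so the result may differ slightly from a hand calculation that uses a rounded table value.

```python
import math
from scipy import stats

n, mean, sd, conf = 50, 500.0, 50.0, 0.95  # values from the question

t_crit = stats.t.ppf((1 + conf) / 2, df=n - 1)  # two-sided critical value
margin = t_crit * sd / math.sqrt(n)             # critical value * standard error
lower, upper = mean - margin, mean + margin

print(f"Critical value: {t_crit:.4f}")
print(f"Margin of error: {margin:.2f}")
print(f"95% confidence interval: [{lower:.2f}, {upper:.2f}]")
```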

## Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

Ans= Formulate the hypotheses:

H0: The mean decrease in blood pressure is equal to 10 mmHg.

Ha: The mean decrease in blood pressure is different from 10 mmHg.

Set the significance level:

The significance level (α) is given as 0.05. This represents the probability of making a Type I error, which is rejecting the null hypothesis when it is true.

Calculate the test statistic:

The test statistic for a one-sample t-test is calculated as the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean.

t = (Sample Mean - Hypothesized Mean) / (Standard Deviation / √Sample Size)
t = (8 - 10) / (3 / √100)

t = -2 / 0.3

t ≈ -6.67

Determine the critical value:

Since the alternative hypothesis is two-sided, we need to find the critical values for a two-tailed test at the specified significance level (α = 0.05) and degrees of freedom (n - 1 = 100 - 1 = 99). Using a t-table or statistical software, the critical values are approximately ±1.984.

Make a decision:

Compare the test statistic to the critical values. If the test statistic falls in the critical region (below -1.984 or above 1.984), we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

In this case, the test statistic (t ≈ -6.67) is far below -1.984, so it falls in the critical region. Thus, we reject the null hypothesis.

Draw a conclusion:

At the 0.05 significance level, the data provide evidence that the mean decrease in blood pressure is different from the hypothesized 10 mmHg. Since the sample mean decrease is 8 mmHg, the drug appears to lower blood pressure by less than the researcher claimed.
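The test can be reproduced from the summary statistics in the question. The following is a sketch of the two-sided one-sample t-test, with the variable names chosen here for illustration.

```python
import math
from scipy import stats

n, xbar, s, mu0, alpha = 100, 8.0, 3.0, 10.0, 0.05  # values from the question

t_stat = (xbar - mu0) / (s / math.sqrt(n))
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # two-sided critical value
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-sided p-value
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, critical values = ±{t_crit:.3f}, p-value = {p_value:.2e}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```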

## Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01.

Ans= Formulate the hypotheses:

H0: The true mean weight of the products is equal to 5 pounds.

Ha: The true mean weight of the products is less than 5 pounds.

Set the significance level:

The significance level (α) is given as 0.01. This represents the probability of making a Type I error, which is rejecting the null hypothesis when it is true.

Calculate the test statistic:

The test statistic for a one-sample t-test is calculated as the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean.

t = (Sample Mean - Hypothesized Mean) / (Standard Deviation / √Sample Size)

t = (4.8 - 5) / (0.5 / √25)

t = -0.2 / 0.1

t = -2

Determine the critical value:

Since the alternative hypothesis is one-sided (less than), we need to find the critical value for a one-tailed test at the specified significance level (α = 0.01) and degrees of freedom (n - 1 = 25 - 1 = 24). Using a t-table or statistical software, the critical value is approximately -2.492.

Make a decision:

Compare the test statistic to the critical value. If the test statistic falls in the critical region (to the left of the critical value), we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

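In this case, the test statistic (t = -2) does not fall in the critical region (-∞ to -2.492), because -2 is greater than -2.492. Thus, we fail to reject the null hypothesis. (If the 0.5 pounds is treated as a known population standard deviation, a z-test applies instead; its critical value is approximately -2.326, and the decision is the same.)

Draw a conclusion:

Based on the results, there is not enough evidence at the 0.01 significance level to conclude that the true mean weight of the products is less than 5 pounds. The sample mean of 4.8 pounds is lower than 5 pounds, but the difference is not statistically significant at this level.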
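The comparison between the test statistic and the critical value can be checked numerically. The sketch below computes the left-tailed critical value from the t-distribution with 24 degrees of freedom and, as an additional check that assumes the stated 0.5 pounds is a known population standard deviation, the corresponding z critical value.

```python
import math
from scipy import stats

n, xbar, sd, mu0, alpha = 25, 4.8, 0.5, 5.0, 0.01  # values from the question

stat = (xbar - mu0) / (sd / math.sqrt(n))  # same value for the t and z versions
t_crit = stats.t.ppf(alpha, df=n - 1)      # left-tail critical value, t(24)
z_crit = stats.norm.ppf(alpha)             # left-tail critical value, normal

reject_t = stat < t_crit
reject_z = stat < z_crit

print(f"Test statistic: {stat:.3f}")
print(f"t critical value: {t_crit:.3f}, reject H0: {reject_t}")
print(f"z critical value: {z_crit:.3f}, reject H0: {reject_z}")
```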

## Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 = 30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two groups are equal with a significance level of 0.01.

Ans= Formulate the hypotheses:

H0: The population means of the two groups are equal.

Ha: The population means of the two groups are different.

Set the significance level:

The significance level (α) is given as 0.01. This represents the probability of making a Type I error, which is rejecting the null hypothesis when it is true.

Calculate the test statistic:

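The test statistic for a two-sample t-test is calculated as the difference between the sample means, divided by the standard error of that difference. The formula below uses the unpooled standard error, which does not assume equal variances in the two groups.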

t = (Mean1 - Mean2) / √((S1^2 / n1) + (S2^2 / n2))

where:

Mean1 and Mean2 are the sample means of the two groups,
S1 and S2 are the sample standard deviations of the two groups,
n1 and n2 are the sample sizes of the two groups.

t = (80 - 75) / √((10^2 / 30) + (8^2 / 40))

t = 5 / √((100 / 30) + (64 / 40))

t = 5 / √(3.33 + 1.6)

t = 5 / √4.93

t ≈ 5 / 2.22

t ≈ 2.25

Determine the critical value:

Since the alternative hypothesis is two-sided, we need to find the critical values for a two-tailed test at the specified significance level (α = 0.01) and degrees of freedom. The degrees of freedom can be approximated as (n1 + n2 - 2) = (30 + 40 - 2) = 68. Using a t-table or statistical software, the critical values are approximately ±2.650.

Make a decision:

Compare the test statistic to the critical values. If the test statistic falls in the critical region (below -2.650 or above 2.650), we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

In this case, the test statistic (t ≈ 2.25) lies between -2.650 and 2.650, so it does not fall in the critical region. Thus, we fail to reject the null hypothesis.

Draw a conclusion:

Based on the results, there is not enough evidence to suggest that the population means of the two groups are different at the 0.01 significance level. The sample does not provide statistically significant evidence to reject the null hypothesis.
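SciPy can run this test directly from the summary statistics with `scipy.stats.ttest_ind_from_stats`. The sketch below uses `equal_var=False` (Welch's test), which matches the unpooled formula for t used above; note that Welch's test computes its own degrees of freedom rather than n1 + n2 - 2, so the p-value is based on a slightly different reference distribution than the hand calculation.

```python
from scipy import stats

# Summary statistics from the question
res = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,  # Welch's t-test (unpooled standard error)
)
t_stat, p_value = res.statistic, res.pvalue

alpha = 0.01
reject_h0 = p_value < alpha

print(f"t = {t_stat:.3f}, two-sided p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```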

## Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard deviation of 1.5. Estimate the population mean with a 99% confidence interval.

Ans= Determine the critical value:

Since we want a 99% confidence interval, we need to find the critical value associated with a 99% confidence level and the given sample size (50). The degrees of freedom for a sample size of 50 are 50 - 1 = 49. We can find the critical value using the t-distribution or refer to a t-table. For a 99% confidence level and 49 degrees of freedom, the critical value is approximately 2.678.

Calculate the margin of error:

The margin of error is the product of the critical value and the standard error. The standard error is calculated by dividing the sample standard deviation by the square root of the sample size.

Margin of Error = Critical Value * (Standard Deviation / √(Sample Size))

Margin of Error = 2.678 * (1.5 / √(50))

Calculate the confidence interval:

The confidence interval is obtained by subtracting and adding the margin of error to the sample mean.

Lower Bound = Sample Mean - Margin of Error

Upper Bound = Sample Mean + Margin of Error

Lower Bound = 4 - Margin of Error

Upper Bound = 4 + Margin of Error

Substituting the values into the equations:

Margin of Error = 2.678 * (1.5 / √50) ≈ 0.568

Lower Bound = 4 - 0.568 ≈ 3.432

Upper Bound = 4 + 0.568 ≈ 4.568

Therefore, the 99% confidence interval for the average number of ads watched by viewers during a TV program is approximately 3.432 to 4.568. We can be 99% confident that the true population mean falls within this interval based on the given sample.
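As a cross-check, the interval can also be obtained in one call with `scipy.stats.t.interval`, passing the sample mean as `loc` and the standard error as `scale`. This is a sketch from the summary statistics in the question; small differences from a hand calculation come from rounding the critical value.

```python
import math
from scipy import stats

n, mean, sd = 50, 4.0, 1.5      # values from the question
std_error = sd / math.sqrt(n)   # standard error of the mean

# The first argument is the confidence level, passed positionally
lower, upper = stats.t.interval(0.99, df=n - 1, loc=mean, scale=std_error)

print(f"99% confidence interval: [{lower:.3f}, {upper:.3f}]")
```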