Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

Answer = Estimation statistics involves the use of sample data to make inferences about population parameters. There are two main types of estimates: point estimates and interval estimates.

Point Estimate:

A point estimate is a single value that is used to approximate a population parameter. It's essentially the best guess or most likely value for the parameter based on the available sample data.
For example, if you calculate the mean (average) of a sample and use it as an estimate for the population mean, that calculated mean is a point estimate.
Interval Estimate:

An interval estimate, on the other hand, provides a range of values within which the true population parameter is likely to fall, together with a stated level of confidence, rather than a single value.
The most common type of interval estimate is a confidence interval. A confidence interval specifies a range of values and a level of confidence that the parameter is within that range. For instance, you might say, "I am 95% confident that the population mean falls between X and Y."
Example:
Let's say you are estimating the average height of a certain population. A point estimate might be the mean height of a sample you've measured (e.g., 170 cm). An interval estimate, on the other hand, might be a 95% confidence interval for the mean height, such as 165 cm to 175 cm. Strictly, the 95% refers to the procedure: if the same sampling were repeated many times, about 95% of the intervals built this way would contain the true average height of the population.
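
A minimal sketch of both kinds of estimate, assuming NumPy and SciPy are available. The heights below are made-up illustrative values, not real measurements; `stats.t.interval` builds the 95% interval from the t-distribution with n - 1 degrees of freedom around the sample mean.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of heights in cm (illustrative values only)
heights = np.array([168.0, 172.5, 165.2, 174.1, 169.8, 171.3, 166.7, 173.9, 170.4, 167.6])

n = len(heights)
point_estimate = heights.mean()                    # point estimate of the population mean
standard_error = heights.std(ddof=1) / np.sqrt(n)  # estimated standard error of the mean

# Interval estimate: 95% confidence interval based on the t-distribution
lower, upper = stats.t.interval(0.95, n - 1, loc=point_estimate, scale=standard_error)

print(f"Point estimate: {point_estimate:.2f} cm")
print(f"95% CI: {lower:.2f} cm to {upper:.2f} cm")
```

The interval is symmetric around the point estimate, so the point estimate is always its midpoint.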



Q2. Write a Python function to estimate the population mean using a sample mean and standard
deviation.

In [None]:
import numpy as np

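def estimate_population_mean(sample_mean, sample_std, sample_size):
    """Return the point estimate of the population mean and an approximate 95% confidence interval."""
    # Calculate the standard error of the sample mean
    standard_error = sample_std / np.sqrt(sample_size)

    # Calculate the margin of error (1.96 is the z critical value for 95% confidence,
    # a large-sample normal approximation)
    margin_of_error = 1.96 * standard_error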

    # Construct the confidence interval
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    # Estimate the population mean
    estimated_population_mean = sample_mean

    return estimated_population_mean, lower_bound, upper_bound
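
For small samples, the fixed value 1.96 understates the uncertainty. The variant below is a hypothetical alternative (the name `estimate_population_mean_t` and the `confidence` parameter are illustrative, not part of the original answer) that takes the critical value from the t-distribution with n - 1 degrees of freedom via `scipy.stats.t.ppf`.

```python
import math
from scipy import stats

def estimate_population_mean_t(sample_mean, sample_std, sample_size, confidence=0.95):
    """Point estimate and t-based confidence interval (hypothetical variant)."""
    standard_error = sample_std / math.sqrt(sample_size)
    # Two-tailed critical value with n - 1 degrees of freedom
    t_critical = stats.t.ppf(1 - (1 - confidence) / 2, sample_size - 1)
    margin_of_error = t_critical * standard_error
    return sample_mean, sample_mean - margin_of_error, sample_mean + margin_of_error

# With n = 10 the t critical value (about 2.26) is well above 1.96
point, low, high = estimate_population_mean_t(170, 5, 10)
print(point, low, high)
```

As the sample size grows, the t critical value approaches 1.96 and the two functions give nearly the same interval.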


Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

Answer = Hypothesis testing is a statistical method used to make inferences about population parameters based on a sample of data. The process involves formulating hypotheses about the population parameter, collecting and analyzing data, and then deciding whether to reject the null hypothesis or fail to reject it.

Here's a basic outline of the hypothesis testing process:

Formulate Hypotheses:

Null Hypothesis (H0): A statement that there is no difference or effect. It represents the status quo or a default assumption.
Alternative Hypothesis (H1 or Ha): A statement that contradicts the null hypothesis, asserting that a difference or effect exists.
Collect Data:

Gather data through experiments, surveys, or observations.
Statistical Analysis:

Use statistical methods to analyze the data and calculate a test statistic.
Make a Decision:

Compare the test statistic to a critical value, or compare the p-value to the significance level (α), to determine whether to reject the null hypothesis.
Draw Conclusions:

Based on the analysis, draw conclusions about the population parameter.
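
The outline above can be sketched in code. This is a minimal illustration, assuming SciPy's `stats.ttest_1samp`; the fill-volume scenario, the 500 ml target, and the measurements are all hypothetical.

```python
import numpy as np
from scipy import stats

# 1. Formulate hypotheses: H0: mu = 500 ml, H1: mu != 500 ml (hypothetical example)
mu_0 = 500.0
alpha = 0.05

# 2. Collect data (hypothetical measurements, in ml)
data = np.array([502.1, 498.7, 503.4, 501.9, 499.5, 504.2, 500.8, 502.6])

# 3. Statistical analysis: one-sample t-test
result = stats.ttest_1samp(data, popmean=mu_0)

# 4. Make a decision by comparing the p-value with alpha
reject_h0 = result.pvalue < alpha

# 5. Draw a conclusion
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```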
Importance of Hypothesis Testing:

Scientific Rigor:

Hypothesis testing provides a structured and rigorous method for evaluating theories and hypotheses in scientific research.
Decision-Making:

It helps in making informed decisions by providing a systematic way to test whether an observed effect is statistically significant or could have occurred by chance.
Inference about Population:

Hypothesis testing allows researchers to make inferences about population parameters based on a sample of data. This is crucial when it's impractical to study an entire population.
Problem Solving:

In various fields such as medicine, business, and social sciences, hypothesis testing is used to address practical problems and answer specific questions.
Quality Control:

Industries use hypothesis testing for quality control purposes. For example, testing whether a manufacturing process is producing items within specified tolerances.
Policy and Decision Evaluation:

Governments and organizations use hypothesis testing to evaluate the effectiveness of policies and decisions.
Scientific Progress:

Hypothesis testing is fundamental to the scientific method, contributing to the cumulative progress of scientific knowledge.

Q4. Create a hypothesis that states whether the average weight of male college students is greater than
the average weight of female college students.


Answer = A hypothesis stating that the average weight of male college students is greater than the average weight of female college students:

H1: The average weight of male college students is greater than the average weight of female college students.

This is a directional hypothesis, as it specifies the direction of the difference between the two groups.

The null hypothesis (H0) would state that there is no difference in the average weight between male and female college students:

H0: The average weight of male college students is equal to the average weight of female college students.

To test these hypotheses, we would need to collect data on the weights of a sample of male and female college students and then use a one-tailed two-sample t-test to determine whether the average weight of the male students is significantly greater than that of the female students.
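
A minimal sketch of that one-tailed test, assuming SciPy 1.6 or later (needed for the `alternative` argument of `stats.ttest_ind`). The weights are invented for illustration, and `equal_var=False` selects Welch's version, which does not assume equal population variances.

```python
import numpy as np
from scipy import stats

# Hypothetical weights in kg (illustrative only, not real survey data)
male = np.array([78.2, 82.5, 75.9, 88.1, 80.4, 77.3, 84.6, 79.8])
female = np.array([62.4, 68.1, 59.7, 65.3, 70.2, 61.8, 66.5, 63.9])

# H0: mu_male = mu_female  vs  H1: mu_male > mu_female (one-tailed)
result = stats.ttest_ind(male, female, equal_var=False, alternative="greater")

alpha = 0.05
print(f"t = {result.statistic:.2f}, one-tailed p = {result.pvalue:.2g}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```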

Q5. Write a Python script to conduct a hypothesis test on the difference between two population means,
given a sample from each population.

In [None]:
import scipy.stats as stats

def two_sample_t_test(sample1, sample2, alpha=0.05):
    # Perform a two-sample t-test (Student's pooled-variance version by default;
    # pass equal_var=False to stats.ttest_ind for Welch's test)
    t_statistic, p_value = stats.ttest_ind(sample1, sample2)

    # Compare p-value to the significance level (alpha) to make a decision
    if p_value < alpha:
        print("Reject the null hypothesis. There is significant evidence of a difference.")
    else:
        print("Fail to reject the null hypothesis. There is not enough evidence of a difference.")

    # Return the results
    return t_statistic, p_value

# Example usage:
# Replace these with your actual sample data
sample1 = [65, 68, 72, 70, 74, 71, 73, 68, 75, 72]
sample2 = [58, 63, 60, 67, 65, 62, 59, 61, 60, 64]

# Set the significance level (alpha)
alpha = 0.05

# Perform the two-sample t-test
t_statistic, p_value = two_sample_t_test(sample1, sample2, alpha)

# Display the results
print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")


Q6: What is a null and alternative hypothesis? Give some examples.


Answer = In statistical hypothesis testing, the null hypothesis (H0) and alternative hypothesis (Ha) are two mutually exclusive statements about a population. They are used to frame a research question and guide the analysis of data.

Null hypothesis (H0): The null hypothesis is the default assumption, which states that there is no effect or relationship between the variables being studied. It is typically expressed as a statement of equality or no difference.

Alternative hypothesis (Ha): The alternative hypothesis is the opposite of the null hypothesis and represents the research question or claim that the researcher wants to investigate. It is typically expressed as a statement of inequality or difference.

Examples of null and alternative hypotheses:

Research question: Does a new fertilizer increase crop yield?

Null hypothesis (H0): There is no difference in crop yield between using the new fertilizer and the old fertilizer.

Alternative hypothesis (Ha): Crop yield is higher with the new fertilizer than with the old fertilizer.

Research question: Do people who exercise regularly have a lower risk of heart disease than those who do not exercise regularly?

Null hypothesis (H0): There is no difference in the risk of heart disease between people who exercise regularly and those who do not exercise regularly.

Alternative hypothesis (Ha): The risk of heart disease is lower for people who exercise regularly than for those who do not exercise regularly.

The null hypothesis is always the default assumption, and the burden of proof lies on the researcher to reject it in favor of the alternative hypothesis. To reject the null hypothesis, the researcher must collect data and conduct statistical tests to show that there is a statistically significant difference between the groups being studied.
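
Whether Ha is directional (like the fertilizer example) or non-directional changes the test that is run. A minimal sketch, assuming SciPy 1.6+ for the `alternative` argument; the crop yields are hypothetical values chosen for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical crop yields (tonnes per hectare) for the fertilizer example
new_fertilizer = np.array([5.8, 6.1, 5.9, 6.4, 6.0, 6.3])
old_fertilizer = np.array([5.5, 5.7, 5.9, 5.6, 5.8, 5.4])

# Directional Ha (yield is higher with the new fertilizer): one-tailed test
one_tailed = stats.ttest_ind(new_fertilizer, old_fertilizer, alternative="greater")
# Non-directional Ha (yields differ): two-tailed test
two_tailed = stats.ttest_ind(new_fertilizer, old_fertilizer, alternative="two-sided")

print(f"one-tailed p = {one_tailed.pvalue:.4f}, two-tailed p = {two_tailed.pvalue:.4f}")
```

Because the t-distribution is symmetric, when the t-statistic points in the direction Ha predicts, the one-tailed p-value is exactly half the two-tailed one.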

Q7: Write down the steps involved in hypothesis testing.

Answer = Hypothesis testing involves a series of steps to systematically evaluate a hypothesis about a population parameter based on sample data. Here are the general steps involved in hypothesis testing:

Formulate the Hypotheses:

Null Hypothesis (H0): State the default assumption, often asserting no effect or no difference.
Alternative Hypothesis (H1 or Ha): State the research hypothesis, asserting that an effect or difference exists.
Choose the Significance Level (α):

Define the significance level (α), typically set at 0.05. This is the probability of rejecting the null hypothesis when it is true (a Type I error).
Collect and Analyze Data:

Collect relevant data through experiments, surveys, or observations.
Analyze the data using appropriate statistical methods to calculate a test statistic.
Determine the Test Statistic's Probability Distribution:

Based on the sample size and assumptions, determine the appropriate probability distribution for the test statistic (e.g., t-distribution, z-distribution).
Set the Decision Rule:

Establish a critical region (rejection region) in the distribution based on the significance level (α).
Calculate the Test Statistic:

Calculate the test statistic using the sample data and relevant formulas for the chosen statistical test.
Make a Decision:

Compare the calculated test statistic to the critical value or use the p-value to determine whether to reject the null hypothesis.
If the test statistic falls in the critical region, or equivalently the p-value is less than α, reject the null hypothesis. Otherwise, fail to reject the null hypothesis.
Draw Conclusions:

Based on the decision, draw conclusions about the population parameter.
If the null hypothesis is rejected, interpret the results in the context of the research question.
Report Results:

Clearly communicate the findings, including the test statistic, p-value, and the decision regarding the null hypothesis.
Provide information about the practical significance of the results.
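
The steps above can be sketched with the critical-region approach, computing the statistic by hand and checking that the p-value route gives the same decision. This is an illustration under assumptions: a two-tailed one-sample t-test, a hypothetical null value of 50, and made-up data.

```python
import numpy as np
from scipy import stats

# Hypotheses and significance level: H0: mu = 50 vs H1: mu != 50 (hypothetical)
mu_0, alpha = 50.0, 0.05

# Collected data (hypothetical values)
data = np.array([51.2, 49.8, 52.5, 50.9, 53.1, 48.7, 52.0, 51.6, 50.4, 52.8])
n = len(data)

# Distribution of the statistic under H0: t with n - 1 degrees of freedom
df = n - 1

# Decision rule: reject H0 if |t| > t_critical
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Test statistic
t_stat = (data.mean() - mu_0) / (data.std(ddof=1) / np.sqrt(n))

# Decision via the critical region and via the p-value (they lead to the same decision)
p_value = 2 * stats.t.sf(abs(t_stat), df)
reject_by_region = abs(t_stat) > t_critical
reject_by_p = p_value < alpha

# Report
print(f"t = {t_stat:.3f}, critical = ±{t_critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_by_region else "Fail to reject H0")
```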

Q8. Define p-value and explain its significance in hypothesis testing.

Answer = The p-value is a measure that helps in determining the strength of the evidence against the null hypothesis in hypothesis testing. It quantifies the probability of obtaining test results as extreme as, or more extreme than, the observed results under the assumption that the null hypothesis is true. In other words, the p-value indicates how surprising the observed data would be if the null hypothesis were actually correct; it is not the probability that the null hypothesis is true.

Here's a more detailed explanation of the significance of the p-value in hypothesis testing:

Low p-value (Typically ≤ 0.05):

If the p-value is small, typically less than or equal to the chosen significance level (α, often 0.05), it suggests that the observed data is unlikely to have occurred by random chance alone under the assumption that the null hypothesis is true.
This leads to the rejection of the null hypothesis in favor of the alternative hypothesis.
High p-value (> 0.05):

If the p-value is large, it indicates that the observed data is not unusual under the null hypothesis. There is no strong evidence to reject the null hypothesis.
In this case, the null hypothesis is not rejected, and the data is consistent with the assumption that there is no effect or difference.
Interpretation:

A small p-value suggests that the results are statistically significant, supporting the alternative hypothesis.
A large p-value suggests that the results are not statistically significant, and there is insufficient evidence to reject the null hypothesis.
Significance Level (α):

The p-value is compared to the chosen significance level (α) to make a decision. If the p-value is less than or equal to α, the null hypothesis is rejected.
Caution in Interpretation:

A small p-value does not prove that the alternative hypothesis is true. It only provides evidence against the null hypothesis.
Similarly, a large p-value does not prove that the null hypothesis is true. It simply suggests that the data is consistent with the null hypothesis.
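
The definition of the p-value can be checked by simulation: generate many datasets in a world where H0 is true and count how often the statistic is at least as extreme as the observed one. A minimal sketch, assuming a one-sample t-test of H0: μ = 0 on hypothetical data drawn from a normal population (under normality the null distribution of t does not depend on σ, so σ = 1 is used for the simulation).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Observed data (hypothetical) and the null hypothesis H0: mu = 0
observed = np.array([0.8, -0.2, 1.1, 0.5, 0.9, 0.3, 1.4, -0.1, 0.7, 0.6])
n = len(observed)
result = stats.ttest_1samp(observed, popmean=0.0)
t_obs, p_analytic = result.statistic, result.pvalue

# Simulate many datasets from a population where H0 is true
n_sims = 100_000
sims = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
t_sims = sims.mean(axis=1) / (sims.std(axis=1, ddof=1) / np.sqrt(n))

# p-value = share of null datasets with |t| at least as large as the observed |t|
p_simulated = np.mean(np.abs(t_sims) >= abs(t_obs))
print(f"analytic p = {p_analytic:.4f}, simulated p = {p_simulated:.4f}")
```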


Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom
parameter set to 10.

In [None]:
import math

import matplotlib.pyplot as plt
import numpy as np

# Define the degrees of freedom
df = 10

# Generate x-values
x = np.linspace(-5, 5, 1000)

# Student's t probability density:
# f(x) = Γ((df + 1) / 2) / (Γ(df / 2) · √(df·π)) · (1 + x² / df)^(-(df + 1) / 2)
# math.gamma replaces np.math.gamma (np.math was deprecated in NumPy 1.25 and removed in NumPy 2.0)
normalizing_constant = math.gamma((df + 1) / 2) / (math.gamma(df / 2) * math.sqrt(df * math.pi))
t_values = normalizing_constant * (1 + x**2 / df) ** (-(df + 1) / 2)

# Create the plot
plt.plot(x, t_values)
plt.xlabel('x')
plt.ylabel('Probability density f(x)')
plt.title("Student's t-distribution with df = 10")
plt.grid(True)
plt.show()
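
A quick numerical check of the hand-written density, assuming SciPy is available: it compares the closed-form formula with `scipy.stats.t.pdf`, and then compares tail probabilities with the standard normal to show the heavier tails.

```python
import math
import numpy as np
from scipy import stats

df = 10
x = np.linspace(-5, 5, 1001)

# Closed-form t density, written with math.gamma
const = math.gamma((df + 1) / 2) / (math.gamma(df / 2) * math.sqrt(df * math.pi))
manual_pdf = const * (1 + x**2 / df) ** (-(df + 1) / 2)

# SciPy's implementation of the same density
scipy_pdf = stats.t.pdf(x, df)
print("max abs difference:", np.max(np.abs(manual_pdf - scipy_pdf)))

# Heavier tails: P(|X| > 3) under t with 10 df vs the standard normal
p_t_tail = 2 * stats.t.sf(3, df)
p_z_tail = 2 * stats.norm.sf(3)
print(f"P(|T| > 3) = {p_t_tail:.4f}, P(|Z| > 3) = {p_z_tail:.4f}")
```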


Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two
random samples of equal size and a null hypothesis that the population means are equal.

In [None]:
import numpy as np
from scipy import stats

def two_sample_t_test(sample1, sample2, equal_variance=True):
    """
    Performs a two-sample t-test for independent samples.

    Args:
        sample1: The first sample data.
        sample2: The second sample data.
        equal_variance: If True, use the pooled-variance (Student's) test;
            otherwise use Welch's test.

    Returns:
        t_statistic: The t-statistic of the test.
        p_value: The two-tailed p-value of the test.
    """
    sample1 = np.asarray(sample1, dtype=float)
    sample2 = np.asarray(sample2, dtype=float)
    n1, n2 = len(sample1), len(sample2)

    # Sample means and sample variances (ddof=1 gives the n - 1 denominator)
    mean1, mean2 = sample1.mean(), sample2.mean()
    var1, var2 = sample1.var(ddof=1), sample2.var(ddof=1)

    if equal_variance:
        # Pooled variance, standard error of the difference, and degrees of freedom
        pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
        standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch's standard error and Welch-Satterthwaite degrees of freedom
        a, b = var1 / n1, var2 / n2
        standard_error = np.sqrt(a + b)
        df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

    # Calculate t-statistic
    t_statistic = (mean1 - mean2) / standard_error

    # Two-tailed p-value from the t-distribution's survival function
    # (np.math has no betainc, so SciPy is used instead)
    p_value = 2 * stats.t.sf(abs(t_statistic), df)

    return t_statistic, p_value

# Example usage
sample1 = np.array([10, 12, 15, 13, 11])
sample2 = np.array([8, 9, 11, 10, 12])

t_statistic, p_value = two_sample_t_test(sample1, sample2)

print("t-statistic:", t_statistic)
print("p-value:", p_value)


Q11: What is Student’s t distribution? When to use the t-Distribution.

Answer = Student's t-distribution, also known as the t-distribution, is a probability distribution that is used in statistics for making inferences about the population mean when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small. It is similar to the standard normal distribution (Z-distribution), but it has heavier tails, meaning that it is more likely to generate extreme values.

The t-distribution was developed by W.S. Gosset, who published his findings under the pen name "Student." He was working as a brewer at Guinness when he developed the distribution to analyze small samples of barley.

When to use the t-distribution:

The t-distribution should be used in the following situations:

The sample size is small (n ≤ 30).
The population standard deviation (σ) is unknown.
The population distribution is approximately normally distributed.
Limitations of the t-distribution:

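The t-distribution assumes that the population is approximately normally distributed; for small samples from strongly non-normal populations, results based on it can be unreliable. As the sample size (and hence the degrees of freedom) grows, the t-distribution approaches the standard normal distribution, so for large samples the normal distribution gives nearly identical results and is often used instead.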
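
The convergence toward the normal distribution can be seen in the critical values. A minimal sketch, assuming SciPy: it prints two-tailed 95% critical values of the t-distribution for a few degrees of freedom alongside the normal value of about 1.96.

```python
from scipy import stats

# Two-tailed 95% critical values: t shrinks toward the normal's 1.96 as df grows
z_critical = stats.norm.ppf(0.975)
t_criticals = {df: stats.t.ppf(0.975, df) for df in (5, 10, 30, 100, 1000)}

for df, t_critical in t_criticals.items():
    print(f"df = {df:>4}: t = {t_critical:.3f} vs z = {z_critical:.3f}")
```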

Here is a table summarizing when to use the t-distribution and when to use the normal distribution:



Q12: What is t-statistic? State the formula for t-statistic.

Answer = The t-statistic is a test statistic used in hypothesis testing to determine whether there is a statistically significant difference between two means (or between a sample mean and a hypothesized population mean). For two samples, it is calculated by taking the difference between the two sample means and dividing it by the standard error of the difference, which measures how much the difference between sample means would vary from sample to sample.

The formula for the pooled (equal-variance) two-sample t-statistic is:

t = (x̄1 - x̄2) / (s_p × √(1/n1 + 1/n2))
where:

x̄1 is the sample mean of the first group
x̄2 is the sample mean of the second group
s_p is the pooled standard deviation of the two groups
n1 and n2 are the sample sizes of the two groups
When n1 = n2 = n, the denominator simplifies to s_p × √(2/n), and the statistic has n1 + n2 - 2 degrees of freedom. For a one-sample test against a hypothesized mean μ0, the formula is t = (x̄ - μ0) / (s / √n).
The t-statistic is then compared to a critical value from a t-distribution table. The critical value is the cutoff that, if the null hypothesis were true, would be exceeded by chance only with probability equal to the significance level (usually 0.05 or 0.01; in a two-tailed test this probability is split between the two tails). If the absolute value of the t-statistic is greater than the critical value, then the difference between the two means is statistically significant at that level of significance.

Here is an example of how to calculate the t-statistic:

A researcher wants to compare the average weight of male and female college students. He collects a sample of 30 male students and finds that their average weight is 170 pounds. He also collects a sample of 30 female students and finds that their average weight is 150 pounds.

The researcher wants to test the hypothesis that there is no difference in the average weight between male and female college students. He calculates the t-statistic using the formula above, where s_p is the pooled standard deviation computed from the two samples (its value is not given here):

t = (170 - 150) / (s_p × √(2/30))
With 30 + 30 - 2 = 58 degrees of freedom, the two-tailed critical value at the 0.05 level is about 2.00. He finds that the t-statistic is greater than this critical value.

The researcher concludes that there is a statistically significant difference in the average weight between male and female college students.
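
The pooled formula can be applied directly to summary statistics. A minimal sketch: `pooled_t_statistic` is a hypothetical helper written for illustration, the inputs are borrowed from Q16 (n1 = 30, mean 80, SD 10; n2 = 40, mean 75, SD 8), and the result is cross-checked against SciPy's `stats.ttest_ind_from_stats`.

```python
import math
from scipy import stats

def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t-statistic and its degrees of freedom (hypothetical helper)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2

# Summary statistics borrowed from Q16
t_stat, df = pooled_t_statistic(80, 10, 30, 75, 8, 40)

# Cross-check against SciPy's summary-statistics version of the pooled test
reference = stats.ttest_ind_from_stats(80, 10, 30, 75, 8, 40, equal_var=True)
print(f"t = {t_stat:.3f} with df = {df}; SciPy: t = {reference.statistic:.3f}")
```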

Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random
sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50.
Estimate the population mean revenue with a 95% confidence interval.



In [None]:
import math

# Given values
sample_mean = 500
sample_std = 50
sample_size = 50
confidence_level = 0.95

# Calculate the margin of error (1.96 is the z critical value for 95% confidence,
# a large-sample normal approximation)
margin_of_error = 1.96 * (sample_std / math.sqrt(sample_size))

# Calculate the confidence interval
lower_limit = sample_mean - margin_of_error
upper_limit = sample_mean + margin_of_error

# Display the results
print(f"Confidence Interval: ${lower_limit:.2f} to ${upper_limit:.2f}")


Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a
clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a
standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

Answer = Here is the result of the hypothesis test (H0: the mean decrease is 10 mmHg; Ha: the mean decrease is not 10 mmHg):

Quantity                                   Value
t-statistic                                -6.67
Degrees of freedom                         99
Critical t-value (two-tailed, α = 0.05)    ±1.98
p-value                                    < 0.001


We can reject the null hypothesis at a significance level of 0.05. This means that there is evidence that the average decrease in blood pressure is not 10 mmHg. Because the sample mean decrease (8 mmHg) is below the hypothesized value, the data suggest that the drug lowers blood pressure by less than 10 mmHg on average.

Here are the steps we followed to conduct the hypothesis test:

We first stated the null and alternative hypotheses. The null hypothesis was that the average decrease in blood pressure is 10 mmHg. The alternative hypothesis was that the average decrease in blood pressure is not 10 mmHg.
We then calculated the t-statistic, which measures how many standard errors the sample mean lies from the hypothesized population mean: t = (8 - 10) / (3 / √100) = -2 / 0.3 ≈ -6.67, with 100 - 1 = 99 degrees of freedom.
We next calculated the p-value. The p-value is the probability of getting a t-statistic at least as extreme as the one we observed, assuming that the null hypothesis is true. In this case, the p-value is far below 0.001.
Finally, we compared the p-value to the significance level. The significance level is the probability that we are willing to accept of making a Type I error (rejecting the null hypothesis when it is actually true). In this case, the significance level was 0.05.
Since the p-value was less than the significance level, we rejected the null hypothesis. This means that there is evidence that the drug's average effect differs from the hypothesized 10 mmHg decrease; the sample points to a smaller decrease of about 8 mmHg.
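
The one-sample t-test for this question can be computed directly from the summary statistics. A minimal sketch, assuming SciPy and a two-tailed test at α = 0.05:

```python
import math
from scipy import stats

# Summary statistics from the trial
n, sample_mean, sample_sd = 100, 8.0, 3.0
mu_0, alpha = 10.0, 0.05  # H0: mean decrease = 10 mmHg (two-tailed)

standard_error = sample_sd / math.sqrt(n)       # 3 / 10 = 0.3 mmHg
t_stat = (sample_mean - mu_0) / standard_error  # (8 - 10) / 0.3
df = n - 1
t_critical = stats.t.ppf(1 - alpha / 2, df)
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(f"t = {t_stat:.2f}, critical = ±{t_critical:.3f}, p = {p_value:.2e}")
print("Reject H0" if abs(t_stat) > t_critical else "Fail to reject H0")
```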

Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a
standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight
is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5
pounds with a significance level of 0.01.



In [None]:
import math
from scipy.stats import norm

# Given values (the mean of 5 pounds and standard deviation of 0.5 pounds describe
# the population, so sigma is known and a one-sample z-test applies)
sample_mean = 4.8
population_mean_hypothesized = 5
population_std = 0.5
sample_size = 25
significance_level = 0.01

# Calculate the z-statistic: (4.8 - 5) / (0.5 / 5) = -2.0
z_statistic = (sample_mean - population_mean_hypothesized) / (population_std / math.sqrt(sample_size))

# Critical value for a lower-tailed (one-tailed) test, about -2.326 at alpha = 0.01
critical_value = norm.ppf(significance_level)

# One-tailed p-value
p_value = norm.cdf(z_statistic)

print(f"z-statistic: {z_statistic:.3f}, critical value: {critical_value:.3f}, p-value: {p_value:.4f}")

# Compare the z-statistic to the critical value
if z_statistic < critical_value:
    print("Reject the null hypothesis. There is significant evidence that the true mean weight is less than 5 pounds.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to conclude a weight less than 5 pounds.")


Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 =
30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean
score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two
groups are equal with a significance level of 0.01.


Answer = To test the hypothesis that the population means for the two groups are equal, you can use a two-sample t-test. The null and alternative hypotheses for this test are as follows:

Null Hypothesis (H0: μ1 = μ2): The population means are equal.
Alternative Hypothesis (H1: μ1 ≠ μ2): The population means are not equal.
The significance level (α) is given as 0.01.

The formula for the two-sample t-statistic used below is:

t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)

In [None]:
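import math
from scipy.stats import t

# Given values for Group 1
mean1 = 80
std1 = 10
size1 = 30

# Given values for Group 2
mean2 = 75
std2 = 8
size2 = 40

# Significance level
significance_level = 0.01

# Squared standard errors of each group mean
se1_sq = std1**2 / size1
se2_sq = std2**2 / size2

# Calculate the two-sample (unpooled, Welch) t-statistic
t_statistic = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)

# Welch-Satterthwaite degrees of freedom (consistent with the unpooled standard error)
degrees_of_freedom = (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (size1 - 1) + se2_sq**2 / (size2 - 1))

# Calculate the critical value for a two-tailed test
critical_value = t.ppf(1 - significance_level / 2, degrees_of_freedom)

print(f"t-statistic: {t_statistic:.3f}, critical value: ±{critical_value:.3f}")

# Compare the t-statistic to the critical value
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. There is significant evidence that the population means are different.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence that the population means are different.")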


Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV
program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard
deviation of 1.5. Estimate the population mean with a 99% confidence interval.


In [None]:
import math

# Given values
sample_mean = 4
sample_std = 1.5
sample_size = 50
confidence_level = 0.99

# Calculate the margin of error (2.576 is the z critical value for 99% confidence,
# a large-sample normal approximation)
margin_of_error = 2.576 * (sample_std / math.sqrt(sample_size))

# Calculate the confidence interval
lower_limit = sample_mean - margin_of_error
upper_limit = sample_mean + margin_of_error

# Display the results
print(f"Confidence Interval: {lower_limit:.2f} to {upper_limit:.2f}")
