Q1: Estimation Statistics:

Estimation statistics is a branch of statistics that deals with estimating population parameters (e.g., mean, proportion, variance) based on sample data. When it is not feasible or practical to collect data from an entire population, researchers take a sample from that population and use statistical techniques to make inferences about the unknown population parameters.

Point Estimate: A point estimate is a single value that is used to estimate the population parameter. It is calculated based on the sample data and serves as the best guess for the unknown population parameter. For example, the sample mean is often used as a point estimate for the population mean.
Interval Estimate: An interval estimate provides a range of values within which the true population parameter is likely to lie, together with a confidence level. It is derived from the sample data. The most common type is the confidence interval: a 95% confidence interval, for example, comes from a procedure that captures the true parameter in about 95% of repeated samples.
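
To make the distinction concrete, here is a minimal sketch that computes both kinds of estimate from one sample. The data values are hypothetical, and the interval is built with the t-distribution on the assumption that the population standard deviation is unknown.

```python
import numpy as np
from scipy import stats

# Hypothetical sample; the values are made up for illustration only.
sample = np.array([12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2])

# Point estimate: a single best guess for the population mean.
point_estimate = sample.mean()

# Interval estimate: a 95% confidence interval from the t-distribution,
# using the sample standard deviation (ddof=1) to form the standard error.
standard_error = sample.std(ddof=1) / np.sqrt(len(sample))
lower, upper = stats.t.interval(0.95, len(sample) - 1,
                                loc=point_estimate, scale=standard_error)

print(f"Point estimate: {point_estimate:.3f}")
print(f"95% interval estimate: ({lower:.3f}, {upper:.3f})")
```

The point estimate always sits at the center of this interval; the width reflects how much the estimate could vary from sample to sample.
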
Q2: Python function to estimate the population mean:

def population_mean_estimate(sample_mean, sample_std_dev, sample_size):
    import math

    standard_error = sample_std_dev / math.sqrt(sample_size)
    z_critical = 1.96  # For a 95% confidence interval (you can change this for different confidence levels)

    lower_bound = sample_mean - z_critical * standard_error
    upper_bound = sample_mean + z_critical * standard_error

    return (lower_bound, upper_bound)

# Example usage:
sample_mean = 500
sample_std_dev = 50
sample_size = 50
confidence_interval = population_mean_estimate(sample_mean, sample_std_dev, sample_size)
print("Population mean estimate (95% confidence interval):", confidence_interval)
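
The hard-coded 1.96 only applies to a 95% confidence level. As a minimal sketch, the critical value for other levels can be looked up from the standard normal distribution; the helper name `z_critical_for` is hypothetical and not part of the function above.

```python
from scipy import stats

def z_critical_for(confidence_level):
    """Two-sided z critical value for a confidence level such as 0.95."""
    # Split the leftover probability equally between the two tails.
    return stats.norm.ppf(1 - (1 - confidence_level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical_for(level):.3f}")
```
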
Q3: Hypothesis Testing:

Hypothesis testing is a statistical method used to make inferences about a population based on sample data. It involves setting up two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then using sample data to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.

The null hypothesis (H0) typically represents the status quo or the hypothesis of no effect or no difference, while the alternative hypothesis (Ha) represents what the researcher is trying to show or prove.

Hypothesis testing is used to make decisions and draw conclusions about population parameters or characteristics when only sample data is available.

Q4: Hypothesis about the average weight of male college students vs. female college students:

Null Hypothesis (H0): The average weight of male college students is equal to or less than the average weight of female college students.

Alternative Hypothesis (Ha): The average weight of male college students is greater than the average weight of female college students.
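
A one-sided alternative like this one might be tested as sketched below. The weights are invented for illustration, and the choice of Welch's test (no equal-variance assumption) is an assumption rather than something the question specifies. The `alternative` argument of `scipy.stats.ttest_ind` requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical weights in kg; these numbers are invented for illustration.
male_weights = np.array([72.5, 80.1, 68.4, 77.0, 74.3, 81.6, 70.2, 76.8])
female_weights = np.array([60.3, 65.1, 58.7, 62.9, 66.4, 59.8, 63.5, 61.2])

# alternative="greater" tests Ha: mean(male) > mean(female).
# equal_var=False selects Welch's t-test, which does not assume equal variances.
result = stats.ttest_ind(male_weights, female_weights,
                         equal_var=False, alternative="greater")

alpha = 0.05
print(f"t = {result.statistic:.3f}, one-sided p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```
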

Q5: Python script to conduct a hypothesis test on the difference between two population means:

def two_sample_t_test(sample1, sample2, alpha=0.05):
    import numpy as np
    import scipy.stats as stats

    # Convert to arrays so that plain Python lists are accepted
    sample1 = np.asarray(sample1, dtype=float)
    sample2 = np.asarray(sample2, dtype=float)
    n1, n2 = len(sample1), len(sample2)

    # Calculate means and sample standard deviations (ddof=1)
    mean1 = sample1.mean()
    mean2 = sample2.mean()
    std_dev1 = sample1.std(ddof=1)
    std_dev2 = sample2.std(ddof=1)

    # Calculate the pooled standard deviation (assumes equal population variances)
    pooled_std_dev = (((n1 - 1) * std_dev1**2 + (n2 - 1) * std_dev2**2) / (n1 + n2 - 2))**0.5

    # Calculate the t-statistic
    t_statistic = (mean1 - mean2) / (pooled_std_dev * (1 / n1 + 1 / n2)**0.5)

    # Calculate the degrees of freedom
    degrees_of_freedom = n1 + n2 - 2

    # Calculate the critical value (two-tailed test)
    critical_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

    # Compare the t-statistic with the critical value to make the decision
    if abs(t_statistic) > critical_value:
        print("Reject the null hypothesis. There is enough evidence to support the alternative hypothesis.")
    else:
        print("Fail to reject the null hypothesis. There is not enough evidence to support the alternative hypothesis.")

# Example usage:
sample1 = [65, 68, 72, 70, 67]
sample2 = [60, 62, 59, 65, 61]
two_sample_t_test(sample1, sample2)
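
As a cross-check, the same two samples can be passed to SciPy's built-in two-sample test. This is a minimal sketch that assumes equal population variances (the pooled-variance form) and reports a p-value instead of comparing against a critical value.

```python
from scipy import stats

sample1 = [65, 68, 72, 70, 67]
sample2 = [60, 62, 59, 65, 61]

# equal_var=True requests the pooled-variance (Student) two-sample t-test.
t_statistic, p_value = stats.ttest_ind(sample1, sample2, equal_var=True)

alpha = 0.05
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```
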
Q6: Null and Alternative Hypothesis:

Null Hypothesis (H0): It is the default assumption or statement of no effect, no difference, or no relationship between variables. It represents the status quo and is assumed to be true unless there is sufficient evidence to reject it.
Example: The average commute time by car to work is equal to 30 minutes.

Alternative Hypothesis (Ha): It is the statement that contradicts the null hypothesis. It represents what the researcher wants to show or prove and is often expressed as a specific direction or effect.
Example: The average commute time by car to work is less than 30 minutes.

Q7: Steps involved in hypothesis testing:

1. Formulate the null hypothesis (H0) and alternative hypothesis (Ha).
2. Choose the appropriate statistical test and significance level (alpha).
3. Collect data through sampling.
4. Calculate the test statistic based on the sample data.
5. Determine the critical value(s) based on the chosen significance level and distribution of the test statistic (e.g., t-distribution, z-distribution).
6. Compare the test statistic with the critical value(s) to make a decision:
   - If the test statistic falls in the critical region (rejection region), reject the null hypothesis in favor of the alternative hypothesis.
   - If the test statistic does not fall in the critical region, fail to reject the null hypothesis (insufficient evidence to support the alternative hypothesis).
7. Interpret the results and draw conclusions based on the decision made.
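
The steps above can be sketched end to end for a one-sample, two-tailed t-test. The hypothesized mean of 30 and the sample values are assumptions chosen only to illustrate the flow.

```python
import numpy as np
from scipy import stats

# Step 1: H0: mu = 30, Ha: mu != 30 (hypothetical example values).
hypothesized_mean = 30
# Step 2: choose a one-sample t-test and alpha = 0.05.
alpha = 0.05
# Step 3: collect data (made-up sample for illustration).
sample = np.array([31.2, 29.8, 33.5, 30.9, 32.1, 28.7, 34.0, 31.6, 30.4, 32.8])
# Step 4: compute the test statistic.
n = len(sample)
t_statistic = (sample.mean() - hypothesized_mean) / (sample.std(ddof=1) / np.sqrt(n))
# Step 5: find the two-tailed critical value for n - 1 degrees of freedom.
critical_value = stats.t.ppf(1 - alpha / 2, n - 1)
# Step 6: compare the statistic with the critical value and decide.
reject_null = abs(t_statistic) > critical_value
# Step 7: interpret the decision.
print(f"t = {t_statistic:.3f}, critical value = {critical_value:.3f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```
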
Q8: P-value and its significance in hypothesis testing:

The p-value is a probability value that helps determine the strength of evidence against the null hypothesis. It is the probability of obtaining a test statistic as extreme or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.

If the p-value is small (typically smaller than the chosen significance level, alpha), it suggests strong evidence against the null hypothesis. In this case, the null hypothesis is rejected in favor of the alternative hypothesis.
If the p-value is large, it suggests weak evidence against the null hypothesis. In this case, the null hypothesis is not rejected (fail to reject).
The p-value provides a way to quantify the uncertainty in the hypothesis testing process and helps researchers make decisions based on the strength of evidence in the data.
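
As a minimal sketch, a two-sided p-value can be computed from a t-statistic and its degrees of freedom. The helper name `two_sided_p_value` and the example statistics are hypothetical.

```python
from scipy import stats

def two_sided_p_value(t_statistic, degrees_of_freedom):
    """P(|T| >= |t|) under H0 for a t-distributed test statistic."""
    # sf is the survival function, 1 - cdf; doubling covers both tails.
    return 2 * stats.t.sf(abs(t_statistic), degrees_of_freedom)

alpha = 0.05
for t_value in (0.5, 2.0, 3.5):  # illustrative statistics with 20 degrees of freedom
    p = two_sided_p_value(t_value, 20)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"t = {t_value}: p = {p:.4f} -> {decision}")
```

Comparing the p-value with alpha leads to the same decision as comparing the test statistic with the critical value.
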

Q9: Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate t-distribution data
df = 10
x = np.linspace(-5, 5, 500)  # Range of values for x-axis
y = stats.t.pdf(x, df)       # Probability density function values for the t-distribution

# Plot the t-distribution
plt.plot(x, y, label='t-distribution (df=10)')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title('Student\'s t-distribution')
plt.legend()
plt.grid(True)
plt.show()
Q10: Python program to calculate the two-sample t-test for independent samples:

def two_sample_t_test(sample1, sample2, null_hypothesis=0, alpha=0.05):
    import numpy as np
    import scipy.stats as stats

    # Convert to arrays so that plain Python lists are accepted
    sample1 = np.asarray(sample1, dtype=float)
    sample2 = np.asarray(sample2, dtype=float)
    n1, n2 = len(sample1), len(sample2)

    # Calculate means and sample standard deviations (ddof=1)
    mean1 = sample1.mean()
    mean2 = sample2.mean()
    std_dev1 = sample1.std(ddof=1)
    std_dev2 = sample2.std(ddof=1)

    # Calculate the pooled standard deviation (assumes equal population variances)
    pooled_std_dev = (((n1 - 1) * std_dev1**2 + (n2 - 1) * std_dev2**2) / (n1 + n2 - 2))**0.5

    # Calculate the t-statistic; null_hypothesis is the hypothesized difference in means
    t_statistic = (mean1 - mean2 - null_hypothesis) / (pooled_std_dev * (1 / n1 + 1 / n2)**0.5)

    # Calculate the degrees of freedom
    degrees_of_freedom = n1 + n2 - 2

    # Calculate the critical value (two-tailed test)
    critical_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

    # Compare the t-statistic with the critical value to make the decision
    if abs(t_statistic) > critical_value:
        print("Reject the null hypothesis. There is enough evidence to support the alternative hypothesis.")
    else:
        print("Fail to reject the null hypothesis. There is not enough evidence to support the alternative hypothesis.")

# Example usage:
sample1 = [65, 68, 72, 70, 67]
sample2 = [60, 62, 59, 65, 61]
two_sample_t_test(sample1, sample2, null_hypothesis=0, alpha=0.05)
Q11: Student’s t-Distribution and when to use it:

The Student's t-distribution (often just called the t-distribution) is a probability distribution used in statistical inference when the population standard deviation is unknown and must be estimated from the sample. It matters most when the sample size is small (typically n < 30), and it is used primarily for hypothesis testing and constructing confidence intervals for a mean.

The t-distribution is similar in shape to the standard normal distribution (z-distribution), but it has heavier tails, which accounts for the increased uncertainty associated with small sample sizes.

Use the t-distribution:

When the sample size is small (typically n < 30).
When the population standard deviation is unknown.

For larger sample sizes (typically n ≥ 30), the t-distribution is closely approximated by the standard normal distribution (z-distribution), so the z-distribution is often used instead. When the population standard deviation is known, the z-distribution applies directly.
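
A short numeric sketch of both points: the t critical value approaches the z critical value as the degrees of freedom grow, and the t-distribution puts more probability in its tails. The specific degrees of freedom and the cutoff of 3 are arbitrary choices for illustration.

```python
from scipy import stats

# Two-sided 95% critical values: t approaches z as degrees of freedom grow.
z_critical = stats.norm.ppf(0.975)
for df in (5, 10, 30, 100, 1000):
    t_critical = stats.t.ppf(0.975, df)
    print(f"df={df:>4}: t critical = {t_critical:.3f} (z = {z_critical:.3f})")

# Heavier tails: probability of a value more than 3 units above the center.
tail_t = stats.t.sf(3, 10)
tail_z = stats.norm.sf(3)
print(f"P(T > 3) with df=10: {tail_t:.5f}; P(Z > 3): {tail_z:.5f}")
```
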

Q12: t-statistic and its formula:

The t-statistic is a value used in hypothesis testing to quantify how far the sample estimate (e.g., sample mean) deviates from the hypothesized population parameter (e.g., population mean) under the null hypothesis.

For a one-sample t-test, the formula for the t-statistic is:

t = (sample_mean - hypothesized_mean) / (sample_standard_deviation / sqrt(sample_size))

Where:

sample_mean is the mean of the sample data.
hypothesized_mean is the hypothesized value for the population mean under the null hypothesis.
sample_standard_deviation is the standard deviation of the sample data.
sample_size is the number of observations in the sample.
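
The formula translates directly into code. The sketch below uses a hypothetical helper, `one_sample_t_statistic`, and a made-up sample, and compares the result with `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

def one_sample_t_statistic(sample_mean, hypothesized_mean,
                           sample_standard_deviation, sample_size):
    """t = (sample_mean - hypothesized_mean) / (s / sqrt(n))."""
    standard_error = sample_standard_deviation / np.sqrt(sample_size)
    return (sample_mean - hypothesized_mean) / standard_error

# Made-up sample, used only to compare against SciPy's one-sample t-test.
sample = np.array([5.1, 4.9, 5.3, 5.0, 4.8, 5.2])
manual_t = one_sample_t_statistic(sample.mean(), 5.0, sample.std(ddof=1), len(sample))
scipy_t = stats.ttest_1samp(sample, popmean=5.0).statistic

print(f"manual t = {manual_t:.4f}, scipy t = {scipy_t:.4f}")
```

Note that the sample standard deviation uses `ddof=1`; NumPy's default (`ddof=0`) would give a slightly different statistic.
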
Q13: Estimating the population mean revenue with a 95% confidence interval:

Given data:
Sample mean (x̄) = $500
Sample standard deviation (s) = $50
Sample size (n) = 50
Confidence level = 95% (alpha = 0.05)

import scipy.stats as stats

# Given data
sample_mean = 500
sample_std_dev = 50
sample_size = 50
confidence_level = 0.95

# Calculate the standard error
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the margin of error (the z-critical value for a 95% confidence level is about 1.96)
z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_critical * standard_error

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the result
print(f"Population mean revenue estimate with a {confidence_level * 100:.0f}% confidence interval: ${confidence_interval[0]:.2f} to ${confidence_interval[1]:.2f}")

The population mean revenue is estimated to be between $486.14 and $513.86 at the 95% confidence level.
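
The calculation above uses a z critical value. Because the standard deviation here is a sample estimate, a t critical value with n - 1 degrees of freedom is a common alternative. The sketch below shows that variant for the same given data; it is an alternative under that assumption, not a replacement for the answer above.

```python
from scipy import stats

sample_mean, sample_std_dev, sample_size = 500, 50, 50
confidence_level = 0.95

standard_error = sample_std_dev / sample_size ** 0.5
# Critical value from the t-distribution with n - 1 degrees of freedom.
t_critical = stats.t.ppf(1 - (1 - confidence_level) / 2, sample_size - 1)
margin_of_error = t_critical * standard_error
t_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(f"t critical value: {t_critical:.4f}")
print(f"t-based 95% interval: ${t_interval[0]:.2f} to ${t_interval[1]:.2f}")
```

The t-based interval is slightly wider than the z-based one, and the gap shrinks as the sample size grows.
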

Q14: Hypothesis test for the new drug's effect on blood pressure:

Null Hypothesis (H0): The new drug has no effect on blood pressure (mean decrease in blood pressure is 0).

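Alternative Hypothesis (Ha): The new drug changes blood pressure (mean decrease is not 0). The claimed 10 mmHg decrease is not itself the value being tested here; the two-tailed test below only asks whether the mean decrease differs from 0.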

Given data:
Sample mean decrease in blood pressure (x̄) = 8 mmHg
Sample standard deviation (s) = 3 mmHg
Sample size (n) = 100
Significance level (alpha) = 0.05

import scipy.stats as stats

# Given data
sample_mean = 8
sample_std_dev = 3
sample_size = 100
null_hypothesis = 0  # The hypothesized mean decrease is 0
alpha = 0.05

# Calculate the standard error
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the t-statistic
t_statistic = (sample_mean - null_hypothesis) / standard_error

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate the critical value (two-tailed test)
critical_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

# Compare the t-statistic with the critical value to make the decision
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. There is enough evidence to support the alternative hypothesis.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to support the alternative hypothesis.")
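
If the claim under test is instead read as "the mean decrease equals 10 mmHg", the same data can be tested against 10 rather than 0. This is a sketch under that assumed reading (H0: mean decrease = 10, Ha: mean decrease ≠ 10); the variable name `claimed_decrease` is introduced here for illustration.

```python
from scipy import stats

sample_mean, sample_std_dev, sample_size = 8, 3, 100
claimed_decrease = 10  # assumption: the claim under test is a 10 mmHg mean decrease
alpha = 0.05

standard_error = sample_std_dev / sample_size ** 0.5
t_statistic = (sample_mean - claimed_decrease) / standard_error
# Two-sided p-value from the t-distribution with n - 1 degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_statistic), sample_size - 1)

print(f"t = {t_statistic:.3f}, p = {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```
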
Q15: Hypothesis test for the product weight:

Null Hypothesis (H0): The true mean weight of the products is 5 pounds.

Alternative Hypothesis (Ha): The true mean weight of the products is less than 5 pounds.

Given data:
Sample mean weight (x̄) = 4.8 pounds
Sample standard deviation (s) = 0.5 pounds
Sample size (n) = 25
Significance level (alpha) = 0.01

import scipy.stats as stats

# Given data
sample_mean = 4.8
sample_std_dev = 0.5
sample_size = 25
population_mean = 5
alpha = 0.01

# Calculate the standard error
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / standard_error

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate the critical value (one-tailed test)
critical_value = stats.t.ppf(1 - alpha, degrees_of_freedom)

# Compare the t-statistic with the critical value to make the decision
if t_statistic < -critical_value:
    print("Reject the null hypothesis. There is enough evidence to support the alternative hypothesis.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to support the alternative hypothesis.")
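
The same decision can be reached through a p-value. The sketch below reuses the given data and computes the left-tail probability, since the alternative hypothesis is "less than".

```python
from scipy import stats

sample_mean, sample_std_dev, sample_size = 4.8, 0.5, 25
population_mean = 5
alpha = 0.01

t_statistic = (sample_mean - population_mean) / (sample_std_dev / sample_size ** 0.5)
# Left-tailed test: the p-value is the probability of a t value this low or lower.
p_value = stats.t.cdf(t_statistic, sample_size - 1)

print(f"t = {t_statistic:.3f}, one-sided p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```
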
Q16: Hypothesis test for two groups of students:

Null Hypothesis (H0): The population means for the two groups are equal (μ1 = μ2).

Alternative Hypothesis (Ha): The population means for the two groups are not equal (μ1 ≠ μ2).

Given data:
Group 1 (n1 = 30): Mean score (x̄1) = 80, Standard deviation (s1) = 10
Group 2 (n2 = 40): Mean score (x̄2) = 75, Standard deviation (s2) = 8
Significance level (alpha) = 0.01

import scipy.stats as stats

# Given data for group 1
sample_mean1 = 80
sample_std_dev1 = 10
sample_size1 = 30

# Given data for group 2
sample_mean2 = 75
sample_std_dev2 = 8
sample_size2 = 40

alpha = 0.01

# Calculate the pooled standard deviation (assumes equal population variances)
pooled_variance = ((sample_size1 - 1) * sample_std_dev1**2 + (sample_size2 - 1) * sample_std_dev2**2) / (sample_size1 + sample_size2 - 2)
pooled_std_dev = pooled_variance ** 0.5

# Calculate the t-statistic
t_statistic = (sample_mean1 - sample_mean2) / (pooled_std_dev * ((1 / sample_size1) + (1 / sample_size2)) ** 0.5)

# Calculate the degrees of freedom
degrees_of_freedom = sample_size1 + sample_size2 - 2

# Calculate the critical value (two-tailed test)
critical_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

# Compare the t-statistic with the critical value to make the decision
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. There is enough evidence to support the alternative hypothesis.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to support the alternative hypothesis.")
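
When only summary statistics are available, SciPy's `ttest_ind_from_stats` runs the two-sample test directly. This is a minimal sketch using the given data; it assumes equal population variances and treats the given standard deviations as sample standard deviations.

```python
from scipy import stats

result = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=True,  # pooled-variance test with n1 + n2 - 2 degrees of freedom
)

alpha = 0.01
print(f"t = {result.statistic:.3f}, two-sided p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```
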
Q17: Estimating the average number of ads watched by viewers:

Given data:
Sample mean (x̄) = 4
Sample standard deviation (s) = 1.5
Sample size (n) = 50
Confidence level = 99% (alpha = 0.01)

import scipy.stats as stats

# Given data
sample_mean = 4
sample_std_dev = 1.5
sample_size = 50
confidence_level = 0.99

# Calculate the standard error
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the margin of error (the z-critical value for a 99% confidence level is about 2.576)
z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_critical * standard_error

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the result
print(f"Estimated average number of ads watched with a {confidence_level * 100:.0f}% confidence interval: {confidence_interval[0]:.2f} to {confidence_interval[1]:.2f}")

The average number of ads watched by viewers is estimated to be between 3.45 and 4.55 at the 99% confidence level.