# Q1: What is Estimation Statistics? Explain point estimate and interval estimate.
Estimation Statistics involves the process of using sample data to estimate population parameters. There are two types of estimation:

Point Estimate: A single value given as an estimate of an unknown population parameter (e.g., the sample mean $\bar{x}$ is a point estimate for the population mean $\mu$).
Interval Estimate: An estimate that provides a range of values within which the population parameter is likely to fall, with a specified level of confidence (e.g., a confidence interval).
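
To make the distinction concrete, here is a minimal sketch that reports both kinds of estimate from the same summary statistics. The numbers are illustrative assumptions, and the interval uses a normal (z) approximation; with a small sample and an estimated standard deviation, a t-based interval would usually be the more careful choice.

```python
import math
from statistics import NormalDist

# Summary statistics (assumed values, for illustration only)
sample_mean = 50
sample_std_dev = 10
sample_size = 30

# Point estimate: a single value, here the sample mean itself
point_estimate = sample_mean

# Interval estimate: 95% confidence interval under a normal approximation
confidence_level = 0.95
z_crit = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
standard_error = sample_std_dev / math.sqrt(sample_size)
margin_of_error = z_crit * standard_error
ci_lower = point_estimate - margin_of_error
ci_upper = point_estimate + margin_of_error

print(f"Point estimate: {point_estimate}")
print(f"95% interval estimate: ({ci_lower:.2f}, {ci_upper:.2f})")
```

The point estimate is one number, while the interval estimate adds a margin of error that shrinks as the sample size grows.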

In [None]:
# Q2: Write a Python function to estimate the population mean using a sample mean and standard deviation.

import math

def estimate_population_mean(sample_mean, sample_std_dev, sample_size):
    standard_error = sample_std_dev / math.sqrt(sample_size)
    return sample_mean, standard_error

# Example usage:
sample_mean = 50
sample_std_dev = 10
sample_size = 30
mean_estimate, std_error = estimate_population_mean(sample_mean, sample_std_dev, sample_size)
print(f"Estimated Population Mean: {mean_estimate}")
print(f"Standard Error: {std_error}")


# Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.
Hypothesis Testing is a statistical method used to determine whether there is enough evidence to reject a null hypothesis based on sample data. It is used to test claims or assumptions about a population.

Importance: Helps to make decisions based on data and provides a structured approach to testing claims about population parameters.



# Q4: Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.

Null Hypothesis (H₀): The average weight of male college students is equal to the average weight of female college students.

$H_0: \mu_{\text{male}} = \mu_{\text{female}}$

Alternative Hypothesis (H₁): The average weight of male college students is greater than the average weight of female college students.

$H_1: \mu_{\text{male}} > \mu_{\text{female}}$

In [None]:
# Q5: Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.

import math

import scipy.stats as stats

def hypothesis_test(mean1, mean2, std1, std2, n1, n2, alpha=0.05):
    # Standard error of the difference in means (unpooled variances)
    standard_error = math.sqrt((std1**2 / n1) + (std2**2 / n2))
    t_stat = (mean1 - mean2) / standard_error
    
    # Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1
    df = min(n1 - 1, n2 - 1)
    
    # One-tailed p-value for H1: mean1 > mean2 (upper tail only, so no abs())
    p_value = stats.t.sf(t_stat, df)
    
    if p_value < alpha:
        return "Reject the null hypothesis"
    else:
        return "Fail to reject the null hypothesis"

# Example usage
mean1, mean2 = 80, 75
std1, std2 = 10, 8
n1, n2 = 30, 40
print(hypothesis_test(mean1, mean2, std1, std2, n1, n2))

# Q6: What is a null and alternative hypothesis? Give some examples.
Null Hypothesis (H₀): A statement that there is no effect or no difference. It is the hypothesis that is tested.
Alternative Hypothesis (H₁): A statement that contradicts the null hypothesis, indicating that there is an effect or a difference.
Example:

Null Hypothesis (H₀): The mean salary of employees in company A is equal to the mean salary of employees in company B.
Alternative Hypothesis (H₁): The mean salary of employees in company A is not equal to the mean salary of employees in company B.
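
As a rough illustration of how this pair of hypotheses could be tested, the sketch below runs a two-sided Welch's t-test with `scipy.stats.ttest_ind`. The salary figures are made up for the example, and the choice of Welch's test (`equal_var=False`) is an assumption, not something the question prescribes.

```python
from scipy import stats

# Hypothetical annual salaries in thousands (made-up values)
company_a = [52, 58, 61, 49, 55, 60]
company_b = [50, 54, 57, 48, 53, 56]

# Two-sided Welch's t-test: H0 is mu_A == mu_B, H1 is mu_A != mu_B
t_stat, p_value = stats.ttest_ind(company_a, company_b, equal_var=False)

alpha = 0.05
decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.3f}: {decision}")
```

Because the alternative is "not equal", the test is two-sided: a large difference in either direction counts as evidence against H₀.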


# Q7: Write down the steps involved in hypothesis testing.
1. State the hypotheses: Define the null and alternative hypotheses.
2. Select the significance level (α): Typically 0.05 or 0.01.
3. Choose the appropriate test: t-test, z-test, etc.
4. Calculate the test statistic: Based on the sample data.
5. Determine the p-value: The probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true.
6. Make a decision: If p-value < α, reject the null hypothesis; otherwise, fail to reject it.
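
The steps above can be sketched as a one-sample z-test. This is only an illustration: the hypothesized mean, the sample values, and the assumption that the population standard deviation is known are all made up for the example.

```python
import math
from statistics import NormalDist

# Step 1: hypotheses. H0: mu == 100, H1: mu != 100 (two-sided)
hypothesized_mean = 100

# Step 2: significance level
alpha = 0.05

# Step 3: choose the test. A z-test, assuming the population sd is known
population_std = 15   # assumed known (illustrative)
sample_mean = 106     # illustrative
sample_size = 36      # illustrative

# Step 4: test statistic
standard_error = population_std / math.sqrt(sample_size)
z_stat = (sample_mean - hypothesized_mean) / standard_error

# Step 5: two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Step 6: decision
if p_value < alpha:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

print(f"z = {z_stat:.2f}, p = {p_value:.4f}: {decision}")
```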


# Q8: Define p-value and explain its significance in hypothesis testing.
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis is true. A smaller p-value indicates stronger evidence against the null hypothesis.

Significance: A p-value less than the chosen significance level (e.g., 0.05) leads to rejecting the null hypothesis.
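
What counts as "at least as extreme" depends on the alternative hypothesis. The sketch below, using a hypothetical t statistic and degrees of freedom, computes a one-sided and a two-sided p-value for the same observation with `scipy.stats.t.sf`.

```python
from scipy import stats

t_stat = 2.1   # hypothetical observed t statistic
df = 15        # hypothetical degrees of freedom
alpha = 0.05

# One-sided (H1: mean is greater): probability in the upper tail only
p_one_sided = stats.t.sf(t_stat, df)

# Two-sided (H1: mean is different): probability in both tails
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

print(f"one-sided p = {p_one_sided:.4f}, reject: {p_one_sided < alpha}")
print(f"two-sided p = {p_two_sided:.4f}, reject: {p_two_sided < alpha}")
```

With these values the one-sided p-value falls below 0.05 while the two-sided one does not, which is why the alternative hypothesis should be fixed before looking at the data.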

In [None]:
# Q9: Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# Generate data
x = np.linspace(-5, 5, 1000)
y = stats.t.pdf(x, df=10)

# Plot the t-distribution
plt.plot(x, y)
plt.title("Student's t-distribution (df=10)")
plt.xlabel("t-value")
plt.ylabel("Density")
plt.grid(True)
plt.show()


In [None]:
# Q10: Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.

from scipy import stats

def two_sample_t_test(sample1, sample2):
    t_stat, p_value = stats.ttest_ind(sample1, sample2)
    return t_stat, p_value

# Example samples
sample1 = [85, 78, 92, 88, 91]
sample2 = [80, 77, 89, 85, 90]

t_stat, p_value = two_sample_t_test(sample1, sample2)
print(f"t-statistic: {t_stat}, p-value: {p_value}")


# Q11: What is Student’s t distribution? When to use the t-Distribution?
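The Student's t-distribution is used for inference about a mean when the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample size is small. It accounts for the extra variability that comes with estimating the standard deviation from limited data. The t-distribution is bell-shaped and similar to the normal distribution but with heavier tails, and it approaches the standard normal distribution as the degrees of freedom increase.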
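
One way to see the heavier tails is to compare two-sided 95% critical values. The sketch below uses `scipy.stats.t.ppf` and `scipy.stats.norm.ppf`; the particular degrees of freedom are arbitrary choices for illustration.

```python
from scipy import stats

# 97.5th percentile gives the two-sided 95% critical value
z_crit = stats.norm.ppf(0.975)

dfs = [5, 10, 30, 100]  # arbitrary degrees of freedom for comparison
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df = {df:>3}: t critical = {t_crit:.3f}")
print(f"normal  : z critical = {z_crit:.3f}")
```

The t critical values are larger than the normal one and shrink toward it as the degrees of freedom grow, so small samples produce wider intervals.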

In [None]:
# Q13: Estimate the population mean revenue with a 95% confidence interval.

import scipy.stats as stats
import math

def confidence_interval(mean, std_dev, sample_size, confidence_level=0.95):
    standard_error = std_dev / math.sqrt(sample_size)
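    # z critical value (normal approximation; a t critical value is more exact when std_dev comes from the sample)
    z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)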
    margin_of_error = z_score * standard_error
    return mean - margin_of_error, mean + margin_of_error

# Example
mean_revenue = 500
std_dev_revenue = 50
sample_size_revenue = 50
lower, upper = confidence_interval(mean_revenue, std_dev_revenue, sample_size_revenue)
print(f"95% Confidence Interval: ({lower}, {upper})")


In [None]:
# Q14: Test the hypothesis with a significance level of 0.05.

import math

import scipy.stats as stats

def test_hypothesis(sample_mean, sample_size, sample_std, hypothesized_mean, alpha):
    standard_error = sample_std / math.sqrt(sample_size)
    t_stat = (sample_mean - hypothesized_mean) / standard_error
    df = sample_size - 1
    # Two-tailed p-value for H0: population mean == hypothesized_mean
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    
    if p_value < alpha:
        return "Reject the null hypothesis"
    else:
        return "Fail to reject the null hypothesis"

# Example
sample_mean = 8
sample_size = 100
sample_std = 3
hypothesized_mean = 10
alpha = 0.05
print(test_hypothesis(sample_mean, sample_size, sample_std, hypothesized_mean, alpha))
