
1: Estimation statistics is a branch of inferential statistics that uses data
from a sample to estimate characteristics (parameters) of a population. The
goal is to make informed estimates of population parameters from sample data,
rather than attempting to measure the entire population directly.

There are two main types of estimation in statistics:

1. Point Estimate
A point estimate is a single value that serves as an estimate of a population
parameter. It is calculated from sample data and provides the best guess at the
value of the unknown parameter.

Example: The sample mean (𝑥̄) is a point estimate of the population mean (μ).

Use: It's straightforward and easy to compute, but it doesn’t provide
information about how accurate or reliable the estimate is.

2. Interval Estimate
An interval estimate gives a range of values (an interval) that is likely to
contain the population parameter, along with a confidence level (usually 90%, 95%, or 99%).

Example: A 95% confidence interval for the population mean might be [48.5, 51.5].

Use: It provides not only an estimate but also a measure of uncertainty or
reliability about the estimate.

2:
import scipy.stats as stats
import math

def estimate_population_mean(sample_mean, sample_std_dev, sample_size,
                             confidence_level=0.95):
    """
    Estimate the population mean with a confidence interval.

    Parameters:
    - sample_mean (float): Mean of the sample
    - sample_std_dev (float): Standard deviation of the sample
    - sample_size (int): Number of observations in the sample
    - confidence_level (float): Confidence level for the interval (default is 0.95)

    Returns:
    - point_estimate (float): The sample mean (point estimate of population mean)
    - confidence_interval (tuple): Lower and upper bounds of the confidence interval
    """

    # Standard error
    std_error = sample_std_dev / math.sqrt(sample_size)

    # t-score for the given confidence level
    t_score = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1)

    # Margin of error
    margin_of_error = t_score * std_error

    # Confidence interval
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    return sample_mean, (lower_bound, upper_bound)
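
As a cross-check of the calculation above, the same interval can be built from raw data and compared with SciPy's `stats.t.interval` helper. This is a minimal sketch: the sample values are made up for illustration, and the 95% level is an assumption matching the function's default.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, for illustration only
sample = np.array([48.2, 51.1, 49.5, 50.3, 50.9, 49.0, 51.4, 49.6])

n = sample.size
mean = sample.mean()       # point estimate of the population mean
sem = stats.sem(sample)    # standard error: sample std dev (ddof=1) / sqrt(n)

# Manual interval: mean +/- t* x standard error
t_crit = stats.t.ppf(0.975, df=n - 1)
manual_ci = (mean - t_crit * sem, mean + t_crit * sem)

# The same interval from SciPy's helper
scipy_ci = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

print(f"Point estimate:  {mean:.3f}")
print(f"95% CI (manual): ({manual_ci[0]:.3f}, {manual_ci[1]:.3f})")
print(f"95% CI (scipy):  ({scipy_ci[0]:.3f}, {scipy_ci[1]:.3f})")
```

Both intervals should agree, since `stats.t.interval` applies the same t-based formula.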

3: Hypothesis testing is a statistical method used to make decisions about a
population parameter based on sample data. It involves formulating two competing
statements (hypotheses) and using the sample to decide whether there is enough
statistical evidence to reject the default statement (the null hypothesis) in
favor of the alternative.

4: To determine whether the average weight of male college students is greater
than that of female college students, we can formulate the hypotheses as follows:

H₀: The average weight of male college students is less than or equal to the
average weight of female college students (μ_male ≤ μ_female).
H₁: The average weight of male college students is greater than the average
weight of female college students (μ_male > μ_female).

This is a one-tailed (right-tailed) independent two-sample t-test, assuming:

The samples are independent.

The weights are normally distributed.

Variances may be equal or unequal (Welch’s t-test can be used if unequal).
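
A right-tailed test of this kind could be run roughly as follows. The weights are invented for illustration, and the sketch assumes SciPy 1.6 or later, where `ttest_ind` accepts the `alternative` argument.

```python
from scipy import stats

# Hypothetical weights in kg, for illustration only
male_weights = [72.5, 80.1, 68.4, 77.0, 74.3, 81.6, 70.2, 76.8]
female_weights = [61.0, 58.7, 65.2, 63.4, 59.9, 66.1, 62.3, 60.5]

alpha = 0.05

# Welch's t-test (equal_var=False), right-tailed:
# H1 states that mean(male) > mean(female)
result = stats.ttest_ind(
    male_weights, female_weights, equal_var=False, alternative="greater"
)
t_stat, p_value = result.statistic, result.pvalue

print(f"t-statistic: {t_stat:.4f}")
print(f"one-tailed p-value: {p_value:.6f}")
if p_value < alpha:
    print("Reject H0: evidence that male students weigh more on average.")
else:
    print("Fail to reject H0.")
```

Passing `alternative="greater"` makes the p-value one-tailed, which matches the direction stated in H₁.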

5:
import numpy as np
from scipy import stats

def hypothesis_test_two_means(sample1, sample2, alpha=0.05, equal_var=False):
    """
    Conduct a two-sample t-test to compare the means of two populations.

    Parameters:
    - sample1 (array-like): Sample from population 1
    - sample2 (array-like): Sample from population 2
    - alpha (float): Significance level (default = 0.05)
    - equal_var (bool): Assume equal variance? (default = False, Welch's t-test)

    Returns:
    - t_stat (float): Calculated t-statistic
    - p_value (float): Corresponding p-value
    - decision (str): Whether to reject or fail to reject the null hypothesis
    """

    # Perform two-sample t-test
    t_stat, p_value = stats.ttest_ind(sample1, sample2, equal_var=equal_var)

    # Interpret result
    if p_value < alpha:
        decision = "Reject the null hypothesis (significant difference)"
    else:
        decision = "Fail to reject the null hypothesis (no significant difference)"

    return t_stat, p_value, decision

6: In hypothesis testing, we use two competing statements about a population to
make statistical decisions:

1. Null Hypothesis (H₀):
The null hypothesis is the default or starting assumption. It typically states
that there is no effect, no difference, or no relationship between variables.

It is assumed to be true unless evidence suggests otherwise.

We aim to test this assumption using sample data.

2. Alternative Hypothesis (H₁ or Ha):
The alternative hypothesis is the claim the researcher is seeking evidence for.
It states that there is an effect, a difference, or a relationship.

If there is strong evidence against the null hypothesis, we reject it in favor
of the alternative.

7:
Step  Description
1     Formulate H₀ and H₁
2     Choose significance level (α)
3     Choose the right statistical test
4     Calculate the test statistic
5     Compute p-value or critical value
6     Compare and make a decision
7     Draw a conclusion
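
The seven steps can be traced in code with a one-sample t-test. This is only a sketch: the data, the hypothesized mean `mu_0`, and the two-sided alternative are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical fill volumes (ml) from a machine claimed to average 500 ml
sample = np.array([498.2, 501.5, 499.1, 497.8, 500.4, 498.9, 499.6, 497.5])

# Step 1: formulate H0: mu = 500 and H1: mu != 500 (two-sided)
mu_0 = 500.0

# Step 2: choose the significance level
alpha = 0.05

# Step 3: choose the test (one-sample t-test: sigma unknown, small n)
n = sample.size
df = n - 1

# Step 4: calculate the test statistic
std_error = sample.std(ddof=1) / np.sqrt(n)
t_stat = (sample.mean() - mu_0) / std_error

# Step 5: compute the p-value and the critical value
p_value = 2 * stats.t.sf(abs(t_stat), df)
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Step 6: compare and decide
reject = p_value < alpha

# Step 7: draw a conclusion
print(f"t = {t_stat:.4f}, critical value = ±{t_crit:.4f}, p = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Comparing the p-value with α and comparing |t| with the critical value are equivalent ways of reaching the same decision.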

8: The p-value (probability value) is the probability of obtaining test results
at least as extreme as the observed results, assuming that the null hypothesis
(H₀) is true.

It quantifies the evidence against the null hypothesis:

A small p-value indicates strong evidence against H₀.

A large p-value suggests weak evidence against H₀.
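
As a small illustration, a p-value can be read off the t-distribution once a test statistic is known. The statistic and degrees of freedom below are hypothetical values, not results from any data in this notebook.

```python
from scipy import stats

t_observed = 2.3   # hypothetical test statistic
df = 10            # hypothetical degrees of freedom

# Right-tailed: probability of a value at least as large as t_observed
p_right = stats.t.sf(t_observed, df)

# Left-tailed: probability of a value at most as large as t_observed
p_left = stats.t.cdf(t_observed, df)

# Two-tailed: probability of a value at least as extreme in either direction
p_two = 2 * stats.t.sf(abs(t_observed), df)

print(f"right-tailed p = {p_right:.4f}")
print(f"left-tailed  p = {p_left:.4f}")
print(f"two-tailed   p = {p_two:.4f}")
```

Which tail to use depends on the direction stated in the alternative hypothesis.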

9:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Degrees of freedom
df = 10

# x values for the plot
x = np.linspace(-4, 4, 500)

# Compute t-distribution PDF values
y = t.pdf(x, df)

# Plot
plt.figure(figsize=(8, 5))
plt.plot(x, y, label=f"t-distribution (df={df})", color='blue')
plt.title(f"Student's t-Distribution (df = {df})")
plt.xlabel("t-value")
plt.ylabel("Probability Density")
plt.grid(True)
plt.legend()
plt.show()


10:
import numpy as np
from scipy import stats

def two_sample_t_test(sample1, sample2, alpha=0.05):
    """
    Perform a two-sample t-test for independent samples of equal size.

    Parameters:
    - sample1, sample2: Lists or arrays of sample data
    - alpha: Significance level (default = 0.05)

    Returns:
    - t_statistic: Calculated t-value
    - p_value: Associated p-value
    - decision: Hypothesis test result
    """

    # Check if samples have equal size
    if len(sample1) != len(sample2):
        raise ValueError("Samples must have equal size.")

    # Perform the pooled-variance (Student's) two-sample t-test;
    # equal_var=True assumes the two populations have equal variances
    t_statistic, p_value = stats.ttest_ind(sample1, sample2, equal_var=True)

    # Decision
    if p_value < alpha:
        decision = "Reject the null hypothesis (means are significantly different)"
    else:
        decision = "Fail to reject the null hypothesis (no significant difference)"

    return t_statistic, p_value, decision

# Example usage
sample1 = [12, 14, 15, 13, 16, 15, 14]
sample2 = [10, 11, 13, 12, 10, 11, 12]

t_stat, p_val, result = two_sample_t_test(sample1, sample2)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_val:.4f}")
print(f"Conclusion: {result}")
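
To see what `ttest_ind` computes with `equal_var=True`, the pooled-variance statistic can be reproduced by hand on the same two samples. This is a sketch for checking the result, not a replacement for the library call; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

sample1 = np.array([12, 14, 15, 13, 16, 15, 14])
sample2 = np.array([10, 11, 13, 12, 10, 11, 12])

n1, n2 = sample1.size, sample2.size
df = n1 + n2 - 2

# Pooled variance: weighted average of the two sample variances
pooled_var = (
    (n1 - 1) * sample1.var(ddof=1) + (n2 - 1) * sample2.var(ddof=1)
) / df

# Standard error of the difference in means
std_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (sample1.mean() - sample2.mean()) / std_error
p_manual = 2 * stats.t.sf(abs(t_manual), df)   # two-sided p-value

t_scipy, p_scipy = stats.ttest_ind(sample1, sample2, equal_var=True)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```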


11: Student’s t-distribution (commonly known as the t-distribution) is a
probability distribution used when estimating population parameters (such as the
mean) from small samples, typically when the population standard deviation is
unknown. It was first introduced by William Sealy Gosset under the pseudonym
"Student".

The t-distribution is similar to the standard normal distribution but has heavier
tails, which account for the extra uncertainty of estimating the standard
deviation from a small sample. As the degrees of freedom increase, it approaches
the standard normal distribution.
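
The heavier tails can be checked numerically by comparing upper-tail probabilities with those of the standard normal distribution. The threshold of 2.0 and the list of degrees of freedom are arbitrary choices for illustration.

```python
from scipy import stats

threshold = 2.0                      # compare P(X > 2) across distributions
dfs = (2, 5, 10, 30, 100)            # degrees of freedom to try

normal_tail = stats.norm.sf(threshold)
t_tails = {df: stats.t.sf(threshold, df) for df in dfs}

print(f"standard normal: P(Z > {threshold}) = {normal_tail:.4f}")
for df, tail in t_tails.items():
    print(f"t (df = {df:>3}):    P(T > {threshold}) = {tail:.4f}")
```

The tail probability shrinks toward the normal value as the degrees of freedom grow.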

12: The t-statistic is a measure used in hypothesis testing to determine whether
a sample mean differs significantly from a hypothesized population mean, or
whether the means of two independent samples differ. In the one-sample case, it
tells you how many standard errors the sample mean lies from the hypothesized
population mean.
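
For reference, the standard forms of the statistic can be written as follows. The symbols are chosen here for illustration: x̄ is a sample mean, s a sample standard deviation, n a sample size, and μ₀ the hypothesized population mean. The two-sample form shown is Welch's version, which does not assume equal variances.

```latex
% One-sample t-statistic
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

% Two-sample t-statistic (Welch, unequal variances)
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

In both forms the denominator is a standard error, so t expresses the observed difference in units of standard errors.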