# DA Session 10 – Hypothesis Testing Assignment

This Jupyter notebook contains **complete, self-contained problem statements**, each paired
with a one-sample t-test and a plain-language interpretation of its result.

Every test:
- clearly states the real-world problem
- defines all statistical inputs in code
- explains what rejecting or failing to reject H₀ actually means
- contributes automatically to a final summary table


In [1]:
import math
import pandas as pd
from scipy import stats

# List to collect results for the final summary table
summary_results = []

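def one_sample_t_test(sample_mean, population_mean, std_dev, n, alpha, tail, question_title, context):
    """
    Performs a one-sample t-test from summary statistics, prints a full
    statistical interpretation, and appends the result to summary_results.

    tail: 'right' or 'left' for a one-tailed test; any other value
    (for example 'two') runs a two-tailed test.
    """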
    df = n - 1

    # Calculate t statistic using the one-sample t-test formula
    t_calculated = (sample_mean - population_mean) / (std_dev / math.sqrt(n))

    # Obtain t-critical value depending on test type
    if tail == 'right':
        t_critical = stats.t.ppf(1 - alpha, df)
    elif tail == 'left':
        t_critical = stats.t.ppf(alpha, df)
    else:  # two-tailed test
        t_critical = stats.t.ppf(1 - alpha/2, df)

    print("====================================================")
    print(question_title)
    print(context)
    print("====================================================")
    print(f"Sample Mean (x̄)               = {sample_mean}")
    print(f"Hypothesized Population Mean μ = {population_mean}")
    print(f"Sample Standard Deviation (s) = {std_dev}")
    print(f"Sample Size (n)               = {n}")
    print(f"Degrees of Freedom (df)       = {df}")
    print(f"Significance Level (α)        = {alpha}")
    print(f"Calculated t-value            = {round(t_calculated, 4)}")
    print(f"Critical t-value (table)      = {round(t_critical, 4)}")

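    # Decide in the direction of the alternative hypothesis
    if tail == 'right':
        reject = t_calculated > t_critical
    elif tail == 'left':
        reject = t_calculated < t_critical
    else:  # two-tailed test
        reject = abs(t_calculated) > t_critical

    if reject: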
        decision = "Reject H₀"
        interpretation = (
            "The sample evidence is statistically significant. "
            "The observed difference is unlikely to be caused by random sampling variation alone."
        )
    else:
        decision = "Fail to Reject H₀"
        interpretation = (
            "The sample evidence is not statistically significant. "
            "The observed difference can reasonably be attributed to random sampling variation."
        )

    print(f"Decision      : {decision}")
    print("Interpretation:")
    print(interpretation)
    print("====================================================")

    # Save results for final summary table
    summary_results.append({
        "Question": question_title,
        "Sample Mean": sample_mean,
        "Population Mean": population_mean,
        "t Calculated": round(t_calculated, 4),
        "t Critical": round(t_critical, 4),
        "α": alpha,
        "Decision": decision
    })
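
The function above decides by comparing the t statistic with a critical value. An equivalent route compares a p-value with α. The sketch below shows one way to get that p-value from the same inputs; `p_value_from_t` is a hypothetical helper that is not part of the notebook, and the numbers passed to it are illustrative only.

```python
from scipy import stats

def p_value_from_t(t_value, df, tail):
    """Return the p-value for a t statistic (hypothetical helper)."""
    if tail == 'right':
        return stats.t.sf(t_value, df)        # P(T >= t)
    elif tail == 'left':
        return stats.t.cdf(t_value, df)       # P(T <= t)
    else:  # two-tailed test
        return 2 * stats.t.sf(abs(t_value), df)

# Illustrative values only: t = 2.0 with 20 degrees of freedom
p_right = p_value_from_t(2.0, 20, 'right')
p_left = p_value_from_t(2.0, 20, 'left')
p_two = p_value_from_t(2.0, 20, 'two')
print(p_right, p_left, p_two)
```

Rejecting H₀ when the p-value is below α gives the same decision as the critical-value comparison, provided the same tail is used in both.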


## Question 1: Customer Satisfaction After New Feedback Process

**Complete Problem Statement:**  
A company has historically recorded an average customer satisfaction score of **72**.
After introducing a new customer feedback process, management wants to evaluate whether
this process has led to an improvement in satisfaction levels. A random sample of **30 customers**
was surveyed after implementation, resulting in an average satisfaction score of **78**
with a sample standard deviation of **10**. At the **5% significance level**, test whether
the new feedback process has **significantly improved** customer satisfaction.
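
Because the question asks only about improvement, this is a right-tailed test. Writing μ for the true mean satisfaction score after the change (notation assumed here), the hypotheses and test statistic work out as:

$$
H_0: \mu = 72 \qquad H_1: \mu > 72
$$

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{78 - 72}{10 / \sqrt{30}} \approx 3.2863
$$

H₀ is rejected when $t$ exceeds the right-tail critical value $t_{0.05,\,29} \approx 1.6991$.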


In [2]:
one_sample_t_test(
    78, 72, 10, 30, 0.05, 'right',
    "Q1: Customer Satisfaction Improvement",
    "Testing whether the new feedback process has increased the mean satisfaction score"
)

Q1: Customer Satisfaction Improvement
Testing whether the new feedback process has increased the mean satisfaction score
Sample Mean (x̄)               = 78
Hypothesized Population Mean μ = 72
Sample Standard Deviation (s) = 10
Sample Size (n)               = 30
Degrees of Freedom (df)       = 29
Significance Level (α)        = 0.05
Calculated t-value            = 3.2863
Critical t-value (table)      = 1.6991
Decision      : Reject H₀
Interpretation:
The sample evidence is statistically significant. The observed difference is unlikely to be caused by random sampling variation alone.


## Question 2: School Performance Compared to National Average

**Complete Problem Statement:**  
A school claims that its students perform better than the national average mathematics score
of **75**. To verify this claim, a random sample of **50 students** was selected, producing
an average score of **77** with a standard deviation of **8**. Using a **1% significance level**,
test whether the data provides sufficient evidence to conclude that the school’s average
math score is **higher than the national average**.
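
When raw scores are available, the same test can be run directly with `scipy.stats.ttest_1samp`. The sketch below is a cross-check under an assumption: the individual scores are not given in the problem, so it builds hypothetical data rescaled to match the stated mean and standard deviation exactly.

```python
import math
import numpy as np
from scipy import stats

# Hypothetical scores: random draws rescaled so the sample has exactly
# mean 77 and sample standard deviation 8 (the summary figures above).
rng = np.random.default_rng(0)
raw = rng.normal(size=50)
scores = 77 + 8 * (raw - raw.mean()) / raw.std(ddof=1)

# Right-tailed one-sample t-test on the raw data
# (the `alternative` argument requires SciPy 1.6 or later)
result = stats.ttest_1samp(scores, popmean=75, alternative="greater")

# Same statistic from the summary-statistics formula
t_manual = (77 - 75) / (8 / math.sqrt(50))

print(f"t from ttest_1samp = {result.statistic:.4f}")
print(f"t from formula     = {t_manual:.4f}")
print(f"one-sided p-value  = {result.pvalue:.4f}")
```

The two t values agree because the statistic depends on the data only through the sample mean, the sample standard deviation, and n.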


In [3]:
one_sample_t_test(
    77, 75, 8, 50, 0.01, 'right',
    "Q2: School Math Score Claim",
    "Testing whether the school's mean math score exceeds the national average"
)

Q2: School Math Score Claim
Testing whether the school's mean math score exceeds the national average
Sample Mean (x̄)               = 77
Hypothesized Population Mean μ = 75
Sample Standard Deviation (s) = 8
Sample Size (n)               = 50
Degrees of Freedom (df)       = 49
Significance Level (α)        = 0.01
Calculated t-value            = 1.7678
Critical t-value (table)      = 2.4049
Decision      : Fail to Reject H₀
Interpretation:
The sample evidence is not statistically significant. The observed difference can reasonably be attributed to random sampling variation.


## Question 3: Effectiveness of a New Retail Sales Strategy

**Complete Problem Statement:**  
A retail store reports that its historical average daily sales revenue is **₹10,000**.
After implementing a new sales strategy, management wants to assess whether the strategy
has increased daily sales. Sales data collected over **15 consecutive days** shows an
average daily revenue of **₹11,200** with a standard deviation of **₹1,500**. At the
**5% significance level**, test whether the new sales strategy has resulted in a
**statistically significant increase** in average daily sales.
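
A right-tailed test at α = 0.05 is equivalent to checking whether the hypothesized mean falls below a one-sided 95% lower confidence bound for the true mean. The following sketch computes that bound from the figures in the problem statement; the variable names are illustrative.

```python
import math
from scipy import stats

sample_mean, hypothesized_mean = 11200, 10000
std_dev, n, alpha = 1500, 15, 0.05

standard_error = std_dev / math.sqrt(n)
t_critical = stats.t.ppf(1 - alpha, n - 1)

# One-sided 95% lower confidence bound for the mean daily revenue
lower_bound = sample_mean - t_critical * standard_error

print(f"Lower 95% confidence bound = {lower_bound:.2f}")
print("Bound exceeds hypothesized mean:", lower_bound > hypothesized_mean)
```

If the bound lies above ₹10,000, the right-tailed test rejects H₀ at the same significance level.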


In [4]:
one_sample_t_test(
    11200, 10000, 1500, 15, 0.05, 'right',
    "Q3: Retail Sales Strategy",
    "Testing whether the new strategy increases average daily sales"
)

Q3: Retail Sales Strategy
Testing whether the new strategy increases average daily sales
Sample Mean (x̄)               = 11200
Hypothesized Population Mean μ = 10000
Sample Standard Deviation (s) = 1500
Sample Size (n)               = 15
Degrees of Freedom (df)       = 14
Significance Level (α)        = 0.05
Calculated t-value            = 3.0984
Critical t-value (table)      = 1.7613
Decision      : Reject H₀
Interpretation:
The sample evidence is statistically significant. The observed difference is unlikely to be caused by random sampling variation alone.


## Question 4: Accuracy of Advertised Light Bulb Lifespan

**Complete Problem Statement:**  
A light bulb manufacturer advertises that its bulbs have an average lifespan of
**1200 hours**. To verify the accuracy of this claim, a random sample of **40 bulbs** was
tested, yielding an average lifespan of **1180 hours** with a standard deviation of
**50 hours**. Using a **5% significance level**, test whether there is sufficient evidence
to conclude that the actual average lifespan **differs from the advertised value**.
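
For a two-tailed test, the decision can also be read from a confidence interval: H₀ is rejected at the 5% level exactly when the advertised mean lies outside the 95% interval for the true mean. A minimal sketch using `scipy.stats.t.interval`, with illustrative variable names:

```python
import math
from scipy import stats

sample_mean, advertised_mean = 1180, 1200
std_dev, n = 50, 40

standard_error = std_dev / math.sqrt(n)

# 95% two-sided confidence interval for the mean lifespan; the confidence
# level is passed positionally because its keyword name differs between
# SciPy versions
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=standard_error)

print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print("Advertised mean inside interval:", ci_low <= advertised_mean <= ci_high)
```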


In [5]:
one_sample_t_test(
    1180, 1200, 50, 40, 0.05, 'two',
    "Q4: Light Bulb Lifespan Accuracy",
    "Testing whether the true mean lifespan differs from the advertised claim"
)

Q4: Light Bulb Lifespan Accuracy
Testing whether the true mean lifespan differs from the advertised claim
Sample Mean (x̄)               = 1180
Hypothesized Population Mean μ = 1200
Sample Standard Deviation (s) = 50
Sample Size (n)               = 40
Degrees of Freedom (df)       = 39
Significance Level (α)        = 0.05
Calculated t-value            = -2.5298
Critical t-value (table)      = 2.0227
Decision      : Reject H₀
Interpretation:
The sample evidence is statistically significant. The observed difference is unlikely to be caused by random sampling variation alone.


## Question 5: Average Height Compared to National Standard

**Complete Problem Statement:**  
A researcher is studying whether the average height of adult males in a particular city
differs from the national average height of **5.8 feet**. A random sample of **25 adult males**
from the city shows an average height of **5.7 feet** with a standard deviation of **0.3 feet**.
At the **1% significance level**, test whether the average height in the city is
**significantly different** from the national average.
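
The question asks about a difference in either direction, so the test is two-tailed and α is split across both tails. Writing μ for the true mean height in the city (notation assumed here), the calculation is:

$$
H_0: \mu = 5.8 \qquad H_1: \mu \neq 5.8
$$

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{5.7 - 5.8}{0.3 / \sqrt{25}} = \frac{-0.1}{0.06} \approx -1.6667
$$

H₀ is rejected only if $|t|$ exceeds $t_{0.005,\,24} \approx 2.7969$.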


In [6]:
one_sample_t_test(
    5.7, 5.8, 0.3, 25, 0.01, 'two',
    "Q5: Average Height Comparison",
    "Testing whether the city's average height differs from the national average"
)

Q5: Average Height Comparison
Testing whether the city's average height differs from the national average
Sample Mean (x̄)               = 5.7
Hypothesized Population Mean μ = 5.8
Sample Standard Deviation (s) = 0.3
Sample Size (n)               = 25
Degrees of Freedom (df)       = 24
Significance Level (α)        = 0.01
Calculated t-value            = -1.6667
Critical t-value (table)      = 2.7969
Decision      : Fail to Reject H₀
Interpretation:
The sample evidence is not statistically significant. The observed difference can reasonably be attributed to random sampling variation.


## Final Automatic Summary Table

In [7]:
summary_df = pd.DataFrame(summary_results)
summary_df

| | Question | Sample Mean | Population Mean | t Calculated | t Critical | α | Decision |
|---|---|---|---|---|---|---|---|
| 0 | Q1: Customer Satisfaction Improvement | 78.0 | 72.0 | 3.2863 | 1.6991 | 0.05 | Reject H₀ |
| 1 | Q2: School Math Score Claim | 77.0 | 75.0 | 1.7678 | 2.4049 | 0.01 | Fail to Reject H₀ |
| 2 | Q3: Retail Sales Strategy | 11200.0 | 10000.0 | 3.0984 | 1.7613 | 0.05 | Reject H₀ |
| 3 | Q4: Light Bulb Lifespan Accuracy | 1180.0 | 1200.0 | -2.5298 | 2.0227 | 0.05 | Reject H₀ |
| 4 | Q5: Average Height Comparison | 5.7 | 5.8 | -1.6667 | 2.7969 | 0.01 | Fail to Reject H₀ |
