# Assignment 4: Hypothesis Testing

This notebook covers two hypothesis testing problems:
1. **One-Sample Z-Test** - Testing weekly operating costs
2. **Chi-Square Test** - Testing association between device type and satisfaction

**Topics Covered:**
- Null and Alternative Hypotheses
- Z-Test for means
- Chi-Square Test for independence
- P-values and Critical Values

---
# Problem 1: Weekly Operating Cost Hypothesis Test

## Background
Bombay Hospitality Ltd. operates a franchise model. The theoretical weekly operating cost is:
$$W = \$1000 + \$5X$$
where $X$ is the number of units produced per week.

## Given Data
- Theoretical mean units per week (X) = 600
- Sample size (n) = 25 restaurants
- Sample mean weekly cost = Rs. 3,050
- Population standard deviation = Rs. 125 (assumed)

## Question
Restaurant owners claim that actual weekly operating costs are higher than the theoretical model suggests. Does the sample data support this claim?

In [3]:
# Import libraries
import numpy as np
from scipy import stats

## Step 1: State the Hypotheses

**Null Hypothesis (H0):** The mean weekly operating cost equals the theoretical cost, $W = 1000 + 5 \times 600 = 4000$.
$$H_0: \mu = 4000$$

**Alternative Hypothesis (H1):** The mean weekly operating cost is higher than the theoretical cost.
$$H_1: \mu > 4000$$

This is a **one-tailed (right-tailed)** test.
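The direction of H1 decides which tail of the standard normal distribution holds the p-value. A minimal sketch, assuming SciPy; the helper name `z_test_p_value` and its `alternative` labels are our own (they mirror the labels SciPy uses in functions such as `ttest_1samp`, but this helper is not a SciPy API), and the z value 1.8 is purely illustrative.

```python
from scipy import stats


def z_test_p_value(z, alternative="greater"):
    """Return the p-value of a z statistic for the chosen alternative.

    alternative: "greater"   -> right-tailed, H1: mu > mu0
                 "less"      -> left-tailed,  H1: mu < mu0
                 "two-sided" -> H1: mu != mu0
    """
    if alternative == "greater":
        return stats.norm.sf(z)            # P(Z >= z)
    if alternative == "less":
        return stats.norm.cdf(z)           # P(Z <= z)
    if alternative == "two-sided":
        return 2 * stats.norm.sf(abs(z))   # probability in both tails
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Illustrative z value (not taken from this problem's data)
for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_test_p_value(1.8, alt), 4))
```

A right-tailed test, as in this problem, uses the `"greater"` branch.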

In [4]:
# Define the given values

# Theoretical weekly cost calculation
# W = 1000 + 5X, where X = 600 units
X = 600  # units per week
theoretical_cost = 1000 + (5 * X)

print("=== Given Information ===")
print("Theoretical mean units per week (X):", X)
print("Theoretical weekly cost (W = 1000 + 5*600):", theoretical_cost)

# Sample data
sample_size = 25
sample_mean = 3050
population_std = 125  # Given standard deviation

print("\nSample size (n):", sample_size)
print("Sample mean:", sample_mean)
print("Population std:", population_std)

=== Given Information ===
Theoretical mean units per week (X): 600
Theoretical weekly cost (W = 1000 + 5*600): 4000

Sample size (n): 25
Sample mean: 3050
Population std: 125


In [5]:
# State hypotheses
print("=== Hypotheses ===")
print("")
print("Null Hypothesis (H0): The mean weekly cost equals theoretical cost")
print("H0: mu =", theoretical_cost)
print("")
print("Alternative Hypothesis (H1): The mean weekly cost is HIGHER than theoretical")
print("H1: mu >", theoretical_cost)
print("")
print("This is a RIGHT-TAILED test")

=== Hypotheses ===

Null Hypothesis (H0): The mean weekly cost equals theoretical cost
H0: mu = 4000

Alternative Hypothesis (H1): The mean weekly cost is HIGHER than theoretical
H1: mu > 4000

This is a RIGHT-TAILED test


## Step 2: Calculate the Test Statistic (Z-score)

**Formula:**
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Where:
- $\bar{X}$ = sample mean
- $\mu$ = population mean (theoretical)
- $\sigma$ = population standard deviation
- $n$ = sample size
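With the given data the formula works out as $SE = 125/\sqrt{25} = 25$ and $Z = (3050 - 4000)/25 = -38$. The same arithmetic as a small reusable function; the name `one_sample_z` is hypothetical, and it assumes σ is known, as the z-test requires.

```python
import math


def one_sample_z(sample_mean, mu0, sigma, n):
    """Return (standard_error, z) for a one-sample z-test with known sigma."""
    standard_error = sigma / math.sqrt(n)      # sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error   # distance from mu0 in SE units
    return standard_error, z


# Values from Problem 1: sample mean 3050, mu0 = 4000, sigma = 125, n = 25
se, z = one_sample_z(3050, 4000, 125, 25)
print(se, z)  # 25.0 -38.0
```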

In [6]:
# Step 2: Calculate Z-statistic

# Calculate standard error
standard_error = population_std / (sample_size ** 0.5)
print("Standard Error = sigma / sqrt(n)")
print("Standard Error =", population_std, "/", "sqrt(", sample_size, ")")
print("Standard Error =", standard_error)

# Calculate Z-statistic
z_statistic = (sample_mean - theoretical_cost) / standard_error
print("")
print("Z-statistic = (sample_mean - theoretical_cost) / standard_error")
print("Z-statistic = (", sample_mean, "-", theoretical_cost, ") /", standard_error)
print("Z-statistic =", round(z_statistic, 4))

Standard Error = sigma / sqrt(n)
Standard Error = 125 / sqrt( 25 )
Standard Error = 25.0

Z-statistic = (sample_mean - theoretical_cost) / standard_error
Z-statistic = ( 3050 - 4000 ) / 25.0
Z-statistic = -38.0


## Step 3: Determine Critical Value and P-value

In [7]:
# Step 3: Get critical value and p-value

# Significance level
alpha = 0.05
print("Significance level (alpha):", alpha)

# Critical value for right-tailed test at alpha = 0.05
z_critical = stats.norm.ppf(1 - alpha)
print("Critical value (z-critical):", round(z_critical, 4))

# Calculate p-value (right-tailed): P(Z >= z)
# norm.sf is the survival function, 1 - cdf computed without cancellation
p_value = stats.norm.sf(z_statistic)
print("P-value:", round(p_value, 6))

Significance level (alpha): 0.05
Critical value (z-critical): 1.6449
P-value: 1.0
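Writing a right-tail p-value as `1 - cdf(z)` loses all precision once `cdf(z)` rounds to exactly 1.0 in double precision, while `scipy.stats.norm.sf` (the survival function) computes the tail directly. A small illustration with an arbitrary, deliberately extreme z = 10; for this problem's z = -38 both forms return 1.0, so the result above is unaffected.

```python
from scipy import stats

z = 10.0  # illustrative extreme z value, not from this problem

naive = 1 - stats.norm.cdf(z)   # cdf(10) rounds to exactly 1.0 in float64
stable = stats.norm.sf(z)       # upper-tail probability computed directly

print(naive)   # 0.0
print(stable)  # a tiny positive number, about 7.6e-24
```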


## Step 4: Make Decision

In [8]:
# Step 4: Make decision
print("=== Decision ===")
print("")
print("Z-statistic:", round(z_statistic, 4))
print("Critical value:", round(z_critical, 4))
print("P-value:", round(p_value, 6))
print("Alpha:", alpha)
print("")

# Decision based on critical value
print("Method 1: Comparing Z-statistic with Critical Value")
if z_statistic > z_critical:
    print("Since Z-statistic (", round(z_statistic, 4), ") > Critical value (", round(z_critical, 4), ")")
    print("Decision: REJECT the null hypothesis")
else:
    print("Since Z-statistic (", round(z_statistic, 4), ") <= Critical value (", round(z_critical, 4), ")")
    print("Decision: FAIL TO REJECT the null hypothesis")

print("")

# Decision based on p-value
print("Method 2: Comparing P-value with Alpha")
if p_value < alpha:
    print("Since P-value (", round(p_value, 6), ") < Alpha (", alpha, ")")
    print("Decision: REJECT the null hypothesis")
else:
    print("Since P-value (", round(p_value, 6), ") >= Alpha (", alpha, ")")
    print("Decision: FAIL TO REJECT the null hypothesis")

=== Decision ===

Z-statistic: -38.0
Critical value: 1.6449
P-value: 1.0
Alpha: 0.05

Method 1: Comparing Z-statistic with Critical Value
Since Z-statistic ( -38.0 ) <= Critical value ( 1.6449 )
Decision: FAIL TO REJECT the null hypothesis

Method 2: Comparing P-value with Alpha
Since P-value ( 1.0 ) >= Alpha ( 0.05 )
Decision: FAIL TO REJECT the null hypothesis
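The two decision methods always agree: `sf` is strictly decreasing and `sf(z_critical)` equals alpha, so "z > z_critical" and "p-value < alpha" describe the same rejection region. A quick numerical check over a grid of z values (the grid itself is an arbitrary choice for illustration):

```python
import numpy as np
from scipy import stats

alpha = 0.05
z_critical = stats.norm.ppf(1 - alpha)  # right-tailed critical value, about 1.6449

# Evaluate both decision rules on many z values
z_grid = np.linspace(-5, 5, 1001)
by_critical = z_grid > z_critical
by_p_value = stats.norm.sf(z_grid) < alpha

print("Methods agree on every z in the grid:", bool(np.all(by_critical == by_p_value)))
```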


In [9]:
# Conclusion
print("=== CONCLUSION ===")
print("")
if z_statistic > z_critical:
    print("There is STRONG EVIDENCE that the weekly operating costs")
    print("are higher than the theoretical model suggests.")
    print("")
    print("The restaurant owners' claim is SUPPORTED by the data.")
else:
    print("There is NOT ENOUGH EVIDENCE to conclude that weekly")
    print("operating costs are higher than the theoretical model.")
    print("")
    print("The restaurant owners' claim is NOT SUPPORTED by the data.")

=== CONCLUSION ===

There is NOT ENOUGH EVIDENCE to conclude that weekly
operating costs are higher than the theoretical model.

The restaurant owners' claim is NOT SUPPORTED by the data.
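Failing to reject H0 here only means the data give no evidence that costs are *higher* than the model. The statistic z = -38 lies far in the left tail: the sample mean of 3,050 is 950 below the theoretical 4,000. A minimal exploratory sketch looks the other way, computing the left-tailed p-value on a log scale and a 95% confidence interval for the mean cost, under the same assumption of a known σ = 125. This supplements, and does not replace, the right-tailed test the question asks for; a test chosen after seeing the data should be reported as exploratory.

```python
import math
from scipy import stats

# Problem 1 values (sigma = 125 treated as known, as in the notebook)
sample_mean, mu0, sigma, n = 3050, 4000, 125, 25
se = sigma / math.sqrt(n)
z = (sample_mean - mu0) / se  # -38.0

# Left-tailed p-value P(Z <= z) on the log scale, because the value itself
# is at or below the smallest magnitudes float64 can represent
log_p_left = stats.norm.logcdf(z)
print("log P(Z <= z):", log_p_left)

# 95% two-sided confidence interval for the true mean weekly cost
z95 = stats.norm.ppf(0.975)
ci = (sample_mean - z95 * se, sample_mean + z95 * se)
print("95% CI:", tuple(round(float(v), 1) for v in ci))  # (3001.0, 3099.0)
```

The whole interval lies below 4,000, which is consistent with the very negative z statistic.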


---
# Problem 2: Chi-Square Test for Independence

## Background
Mizzare Corporation wants to know if there's an association between:
- Type of smart home device (Smart Thermostat vs Smart Light)
- Customer satisfaction level

## Data (Contingency Table)
| Satisfaction | Smart Thermostat | Smart Light |
|--------------|-----------------|-------------|
| Very Satisfied | 50 | 70 |
| Satisfied | 80 | 100 |
| Neutral | 60 | 90 |
| Unsatisfied | 30 | 50 |
| Very Unsatisfied | 20 | 50 |

In [10]:
# Import pandas for contingency table
import pandas as pd
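pandas can hold the same contingency table with labelled rows and columns and add the margin totals used later for expected frequencies. A minimal sketch; the variable names `table` and `with_totals` are illustrative and the counts are copied from the table above.

```python
import pandas as pd

# Observed counts from the contingency table
table = pd.DataFrame(
    {"Smart Thermostat": [50, 80, 60, 30, 20],
     "Smart Light": [70, 100, 90, 50, 50]},
    index=["Very Satisfied", "Satisfied", "Neutral", "Unsatisfied", "Very Unsatisfied"],
)

# Add row totals as a column, then column totals as a final row
with_totals = table.copy()
with_totals["Total"] = with_totals.sum(axis=1)
with_totals.loc["Total"] = with_totals.sum(axis=0)
print(with_totals)
```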

## Step 1: State the Hypotheses

**Null Hypothesis (H0):** There is NO association between device type and satisfaction level.
(They are independent)

**Alternative Hypothesis (H1):** There IS an association between device type and satisfaction level.
(They are not independent)

In [11]:
# Step 1: State hypotheses
print("=== Hypotheses ===")
print("")
print("Null Hypothesis (H0):")
print("Device type and customer satisfaction are INDEPENDENT")
print("(No association between them)")
print("")
print("Alternative Hypothesis (H1):")
print("Device type and customer satisfaction are NOT INDEPENDENT")
print("(There IS an association between them)")

=== Hypotheses ===

Null Hypothesis (H0):
Device type and customer satisfaction are INDEPENDENT
(No association between them)

Alternative Hypothesis (H1):
Device type and customer satisfaction are NOT INDEPENDENT
(There IS an association between them)


In [12]:
# Create the contingency table (observed frequencies)
observed = [
    [50, 70],    # Very Satisfied
    [80, 100],   # Satisfied
    [60, 90],    # Neutral
    [30, 50],    # Unsatisfied
    [20, 50]     # Very Unsatisfied
]

# Convert to numpy array
observed = np.array(observed)

# Display the table
satisfaction_levels = ['Very Satisfied', 'Satisfied', 'Neutral', 'Unsatisfied', 'Very Unsatisfied']
devices = ['Smart Thermostat', 'Smart Light']

print("=== Observed Frequencies (Contingency Table) ===")
print("")
print("Satisfaction Level    | Smart Thermostat | Smart Light")
print("-" * 60)
for i in range(len(satisfaction_levels)):
    print(satisfaction_levels[i].ljust(20), "|", str(observed[i][0]).center(16), "|", str(observed[i][1]).center(11))

=== Observed Frequencies (Contingency Table) ===

Satisfaction Level    | Smart Thermostat | Smart Light
------------------------------------------------------------
Very Satisfied       |        50        |      70    
Satisfied            |        80        |     100    
Neutral              |        60        |      90    
Unsatisfied          |        30        |      50    
Very Unsatisfied     |        20        |      50    


## Step 2: Compute the Chi-Square Statistic

**Formula:**
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:
- O = Observed frequency
- E = Expected frequency

**Expected frequency formula:**
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
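For example, the "Very Satisfied" × "Smart Thermostat" cell has $E = 120 \times 240 / 600 = 48$. Because every cell uses the product of its row and column totals, NumPy's outer product computes the whole expected table at once. A minimal vectorized sketch, assuming the same observed counts:

```python
import numpy as np

observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])

row_totals = observed.sum(axis=1)   # one total per satisfaction level
col_totals = observed.sum(axis=0)   # one total per device type
grand_total = observed.sum()

# E_ij = row_total_i * col_total_j / grand_total, for every cell at once
expected = np.outer(row_totals, col_totals) / grand_total
print(expected)
```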

In [13]:
# Step 2: Calculate expected frequencies and Chi-Square manually

# Row totals: sum across device types for each satisfaction level
row_totals = observed.sum(axis=1).tolist()
print("Row totals:", row_totals)

# Column totals: sum across satisfaction levels for each device type
col_totals = observed.sum(axis=0).tolist()
print("Column totals:", col_totals)

# Grand total: all observations in the table
grand_total = int(observed.sum())
print("Grand total:", grand_total)

Row totals: [120, 180, 150, 80, 70]
Column totals: [240, 360]
Grand total: 600


In [14]:
# Calculate expected frequencies
print("=== Expected Frequencies ===")
print("Formula: Expected = (Row Total * Column Total) / Grand Total")
print("")

expected = []
for i in range(len(observed)):
    row_expected = []
    for j in range(2):
        exp_value = (row_totals[i] * col_totals[j]) / grand_total
        row_expected.append(exp_value)
    expected.append(row_expected)

# Display expected frequencies
print("Satisfaction Level    | Smart Thermostat | Smart Light")
print("-" * 60)
for i in range(len(satisfaction_levels)):
    print(satisfaction_levels[i].ljust(20), "|", str(round(expected[i][0], 2)).center(16), "|", str(round(expected[i][1], 2)).center(11))

=== Expected Frequencies ===
Formula: Expected = (Row Total * Column Total) / Grand Total

Satisfaction Level    | Smart Thermostat | Smart Light
------------------------------------------------------------
Very Satisfied       |       48.0       |     72.0   
Satisfied            |       72.0       |    108.0   
Neutral              |       60.0       |     90.0   
Unsatisfied          |       32.0       |     48.0   
Very Unsatisfied     |       28.0       |     42.0   


In [15]:
# Calculate Chi-Square statistic
print("=== Calculating Chi-Square Statistic ===")
print("Formula: Chi-Square = Sum of (O - E)^2 / E")
print("")

chi_square = 0
for i in range(len(observed)):
    for j in range(2):
        O = observed[i][j]
        E = expected[i][j]
        contribution = ((O - E) ** 2) / E
        chi_square = chi_square + contribution
        print("Cell [", i, ",", j, "]: O=", O, ", E=", round(E, 2), ", Contribution=", round(contribution, 4))

print("")
print("Chi-Square Statistic:", round(chi_square, 4))

=== Calculating Chi-Square Statistic ===
Formula: Chi-Square = Sum of (O - E)^2 / E

Cell [ 0 , 0 ]: O= 50 , E= 48.0 , Contribution= 0.0833
Cell [ 0 , 1 ]: O= 70 , E= 72.0 , Contribution= 0.0556
Cell [ 1 , 0 ]: O= 80 , E= 72.0 , Contribution= 0.8889
Cell [ 1 , 1 ]: O= 100 , E= 108.0 , Contribution= 0.5926
Cell [ 2 , 0 ]: O= 60 , E= 60.0 , Contribution= 0.0
Cell [ 2 , 1 ]: O= 90 , E= 90.0 , Contribution= 0.0
Cell [ 3 , 0 ]: O= 30 , E= 32.0 , Contribution= 0.125
Cell [ 3 , 1 ]: O= 50 , E= 48.0 , Contribution= 0.0833
Cell [ 4 , 0 ]: O= 20 , E= 28.0 , Contribution= 2.2857
Cell [ 4 , 1 ]: O= 50 , E= 42.0 , Contribution= 1.5238

Chi-Square Statistic: 5.6382
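The same statistic can be computed without loops, and Pearson residuals $(O - E)/\sqrt{E}$ show which cells drive it: here the "Very Unsatisfied" row contributes most (2.2857 + 1.5238). A minimal sketch; the residual view is an extra diagnostic not used in the assignment, and reading |residual| > 2 as notable is only a common rule of thumb.

```python
import numpy as np

observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

contributions = (observed - expected) ** 2 / expected  # one (O - E)^2 / E term per cell
chi_square = contributions.sum()
print("Chi-Square:", round(chi_square, 4))

# Pearson residuals: the sign shows whether a cell is over- or under-represented
residuals = (observed - expected) / np.sqrt(expected)
print(np.round(residuals, 2))
```

All residuals here are below 2 in absolute value, in line with the non-significant result.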


## Step 3: Determine the Critical Value

**Degrees of Freedom:**
$$df = (\text{rows} - 1) \times (\text{columns} - 1)$$
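For a 5 × 2 table this gives $df = (5 - 1)(2 - 1) = 4$. A short sketch that reads the dimensions from the array's shape and lists right-tail critical values for a few common significance levels (the extra alpha values 0.10 and 0.01 are only for comparison):

```python
import numpy as np
from scipy import stats

observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])

rows, cols = observed.shape       # 5 satisfaction levels, 2 device types
df = (rows - 1) * (cols - 1)      # (5 - 1) * (2 - 1) = 4

# Right-tail critical values: reject H0 if chi-square exceeds these
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(stats.chi2.ppf(1 - alpha, df), 4))
```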

In [16]:
# Step 3: Calculate degrees of freedom and critical value

# Degrees of freedom
rows, cols = observed.shape  # 5 satisfaction levels, 2 device types
degrees_of_freedom = (rows - 1) * (cols - 1)

print("Degrees of Freedom = (rows - 1) * (columns - 1)")
print("Degrees of Freedom = (", rows, "- 1) * (", cols, "- 1)")
print("Degrees of Freedom =", degrees_of_freedom)

# Significance level
alpha = 0.05
print("")
print("Significance level (alpha):", alpha)

# Critical value from chi-square distribution
chi_critical = stats.chi2.ppf(1 - alpha, degrees_of_freedom)
print("Critical value:", round(chi_critical, 4))

Degrees of Freedom = (rows - 1) * (columns - 1)
Degrees of Freedom = ( 5 - 1) * ( 2 - 1)
Degrees of Freedom = 4

Significance level (alpha): 0.05
Critical value: 9.4877


In [17]:
# Calculate p-value
p_value = stats.chi2.sf(chi_square, degrees_of_freedom)  # P(chi-square >= statistic)
print("P-value:", round(p_value, 6))

P-value: 0.227844
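For $df = 4$ the chi-square upper tail has a closed form, $P(\chi^2 \ge x) = e^{-x/2}\left(1 + \tfrac{x}{2}\right)$, which allows a hand check of the p-value. A minimal sketch using the rounded statistic; the closed form applies only to $df = 4$.

```python
import math
from scipy import stats

chi_square = 5.6382  # statistic from the manual calculation, rounded
df = 4

# Closed form valid only for df = 4: P(X >= x) = exp(-x/2) * (1 + x/2)
p_closed = math.exp(-chi_square / 2) * (1 + chi_square / 2)
p_scipy = stats.chi2.sf(chi_square, df)
print(round(p_closed, 6), round(p_scipy, 6))
```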


## Step 4: Make Decision

In [18]:
# Step 4: Make decision
print("=== Decision ===")
print("")
print("Chi-Square Statistic:", round(chi_square, 4))
print("Critical Value:", round(chi_critical, 4))
print("P-value:", round(p_value, 6))
print("Alpha:", alpha)
print("")

# Decision
print("Comparing Chi-Square with Critical Value:")
if chi_square > chi_critical:
    print("Since Chi-Square (", round(chi_square, 4), ") > Critical value (", round(chi_critical, 4), ")")
    print("Decision: REJECT the null hypothesis")
else:
    print("Since Chi-Square (", round(chi_square, 4), ") <= Critical value (", round(chi_critical, 4), ")")
    print("Decision: FAIL TO REJECT the null hypothesis")

print("")
print("Comparing P-value with Alpha:")
if p_value < alpha:
    print("Since P-value (", round(p_value, 6), ") < Alpha (", alpha, ")")
    print("Decision: REJECT the null hypothesis")
else:
    print("Since P-value (", round(p_value, 6), ") >= Alpha (", alpha, ")")
    print("Decision: FAIL TO REJECT the null hypothesis")

=== Decision ===

Chi-Square Statistic: 5.6382
Critical Value: 9.4877
P-value: 0.227844
Alpha: 0.05

Comparing Chi-Square with Critical Value:
Since Chi-Square ( 5.6382 ) <= Critical value ( 9.4877 )
Decision: FAIL TO REJECT the null hypothesis

Comparing P-value with Alpha:
Since P-value ( 0.227844 ) >= Alpha ( 0.05 )
Decision: FAIL TO REJECT the null hypothesis


In [19]:
# Conclusion
print("=== CONCLUSION ===")
print("")
if chi_square > chi_critical:
    print("There IS a SIGNIFICANT ASSOCIATION between device type")
    print("and customer satisfaction level.")
    print("")
    print("The type of device purchased (Smart Thermostat vs Smart Light)")
    print("is related to the customer's satisfaction level.")
else:
    print("There is NO SIGNIFICANT ASSOCIATION between device type")
    print("and customer satisfaction level.")
    print("")
    print("The type of device purchased does NOT appear to be")
    print("related to the customer's satisfaction level.")

=== CONCLUSION ===

There is NO SIGNIFICANT ASSOCIATION between device type
and customer satisfaction level.

The type of device purchased does NOT appear to be
related to the customer's satisfaction level.


In [20]:
# Verify with scipy's chi2_contingency function
print("=== Verification using scipy ===")
from scipy.stats import chi2_contingency

chi2, p, dof, expected_scipy = chi2_contingency(observed)
print("Chi-Square (scipy):", round(chi2, 4))
print("P-value (scipy):", round(p, 6))
print("Degrees of freedom:", dof)

=== Verification using scipy ===
Chi-Square (scipy): 5.6382
P-value (scipy): 0.227844
Degrees of freedom: 4
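The scipy result matches the manual calculation. Two follow-up checks are often reported alongside this test; both go beyond the assignment and are shown as a sketch. The smallest expected count checks the usual rule of thumb that expected counts should be about 5 or more, and Cramér's V, $\sqrt{\chi^2 / (n(\min(r, c) - 1))}$, measures the strength of association. Cramér's V is computed here from its formula; recent SciPy versions also offer `scipy.stats.contingency.association` for it.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])
chi2, p, dof, expected = chi2_contingency(observed)  # no Yates correction, since dof > 1

# Rule-of-thumb check on the expected counts
print("Smallest expected count:", expected.min())

# Cramer's V: 0 = no association, 1 = perfect association
n = observed.sum()
cramers_v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))
print("Cramer's V:", round(cramers_v, 4))
```

A value near 0.1 indicates at most a weak association, which agrees with the non-significant test.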


---
## Summary

### Problem 1: Z-Test
- Used for testing claims about a population mean
- Requires a known population standard deviation
- Compare the Z-statistic with the critical value, or the P-value with alpha

### Problem 2: Chi-Square Test
- Used for testing association between categorical variables
- Calculate expected frequencies from row and column totals
- Compare the Chi-Square statistic with the critical value, or the P-value with alpha

### Key Concepts:
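1. **P-value < alpha**: reject the null hypothesis (statistically significant result)
2. **P-value >= alpha**: fail to reject the null hypothesis; this does not prove that H0 is true
3. **Degrees of freedom** determine the shape of the chi-square distribution, and therefore its critical value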