#Hypothesis Testing



#Background:
Bombay Hospitality Ltd. operates a franchise model for producing exotic Norwegian dinners throughout New England. The weekly operating cost (W) of a franchise is given by the equation W = $1,000 + $5X, where X is the number of units produced in that week. Recent feedback from restaurant owners suggests that this cost model may no longer be accurate, because their observed weekly operating costs are higher.


#Objective:
To investigate the restaurant owners' claim about the increase in weekly operating costs using hypothesis testing.

Given:

- The theoretical weekly operating cost model: W = $1,000 + $5X

- Sample of 25 restaurants with a mean weekly cost of Rs. 3,050

- Number of units produced in a week (X) follows a normal distribution with a mean (μ) of 600 units and a standard deviation (σ) of 25 units


#Assignment Tasks:

1. State the hypotheses
   - Null Hypothesis (H0): The mean weekly operating cost is equal to or less than the theoretical mean weekly cost, i.e., μ ≤ 4000.

   - Alternative Hypothesis (H1): The mean weekly operating cost is greater than the theoretical mean weekly cost, i.e., μ > 4000.

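2. Calculate the test statistic

- Sample mean weekly cost (x̄) = 3050

- Theoretical mean weekly cost (μ) = 1000 + 5X with X = 600, i.e., 1000 + 5(600) = 4000

- Standard deviation of units produced (σ) = 25

- Cost per unit = 5

- Standard deviation of cost (σ_cost) = 5 * 25 = 125, because the fixed $1,000 adds no variability and the per-unit cost scales σ by 5

- Sample size (n) = 25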


#Test statistic (Z) = (x̄ - μ) / (σ_cost / √n)
                    = (3050 - 4000) / (125 / √25)
                    = -950 / 25
                    = -38

In [9]:
# z = (x̄ - μ) / (σ_cost / √n), with √25 = 5
(3050 - 4000) / (125 / 5)

-38.0
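
The same figure can be derived from the given quantities instead of typing the intermediate numbers by hand. This is a minimal sketch: the helper `z_statistic` and the variable names are illustrative and are not part of the original assignment.

```python
import math


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """One-sample z statistic: (x̄ - μ) / (σ / √n)."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Values taken from the "Given" section above.
fixed_cost = 1000        # fixed weekly cost in the model W = 1000 + 5X
cost_per_unit = 5        # cost per unit produced
mean_units = 600         # mean of X
sd_units = 25            # standard deviation of X
sample_mean_cost = 3050  # observed mean weekly cost of the sample
n = 25                   # number of restaurants sampled

# Mean and standard deviation of W follow from W being linear in X.
theoretical_mean_cost = fixed_cost + cost_per_unit * mean_units
sd_cost = cost_per_unit * sd_units

z = z_statistic(sample_mean_cost, theoretical_mean_cost, sd_cost, n)
print(theoretical_mean_cost, sd_cost, z)  # 4000 125 -38.0
```

Keeping the inputs as named variables makes it easy to rerun the test if the sample mean or sample size changes.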

In [10]:
# Import the necessary library for statistical distributions.
from scipy.stats import norm



# Define the significance level (alpha) for the hypothesis test.
alpha = 0.05



# Since the alternative hypothesis is H1: μ > 4000, this is a one-tailed
# (right-tailed) test, so the critical value is the z-score at the 1 - alpha quantile.
critical_value = norm.ppf(1 - alpha)



# Print the calculated critical value.
print(f"The critical value for a one-tailed test with alpha = {alpha} is: {critical_value:.4f}")

The critical value for a one-tailed test with alpha = 0.05 is: 1.6449


In [11]:
# Define the calculated test statistic.
# This value was calculated based on the sample mean, theoretical mean,
# standard deviation of the cost, and sample size as described in the markdown comments
test_statistic = -38



# Compare the test statistic with the critical value to make a decision
# about the null hypothesis.
if test_statistic > critical_value:
  print("The test statistic is greater than the critical value.")
  print("Reject the null hypothesis (H0).")
  print("There is sufficient evidence to conclude that the mean weekly operating cost is greater than the theoretical mean weekly cost.")
else:
  print("The test statistic is less than or equal to the critical value.")
  print("Fail to reject the null hypothesis (H0).")
  print("There is not enough evidence to conclude that the mean weekly operating cost is greater than the theoretical mean weekly cost.")

The test statistic is less than or equal to the critical value.
Fail to reject the null hypothesis (H0).
There is not enough evidence to conclude that the mean weekly operating cost is greater than the theoretical mean weekly cost.
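
The same decision can also be reached with a p-value instead of a critical value. The sketch below is one way to do that, under the assumption that the test statistic is -38 as computed above. It uses `statistics.NormalDist` from the Python standard library in place of `scipy.stats.norm`, a swap made here only to keep the example free of third-party dependencies; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z = -38.0  # test statistic computed earlier

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Right-tailed p-value: P(Z >= z) under the standard normal distribution.
p_value = 1 - standard_normal.cdf(z)

# Critical value at the 1 - alpha quantile, for comparison with the cell above.
critical_value = standard_normal.inv_cdf(1 - alpha)

print(f"p-value = {p_value:.4f}")
print(f"critical value = {critical_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

A p-value this close to 1 reflects a sample mean that lies far on the opposite side of the hypothesized mean from the alternative, which is why the right-tailed test cannot reject H0.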



#Conclusions:

- The test statistic was calculated as -38.

- The critical value for a one-tailed test with an alpha level of 0.05 (for the alternative hypothesis H1: μ > 4000) is approximately 1.6449.


Since the calculated test statistic (-38) is far below the critical value (1.6449), we fail to reject the null hypothesis (H0). Therefore, there is not enough evidence to support the restaurant owners' claim that the weekly operating costs are higher than the model suggests.

The sample mean of 3,050 is considerably lower than the theoretical mean of $4,000, so the sample points in the opposite direction of the owners' claim. A right-tailed test cannot reject H0 on such a sample, however large the gap is.

##Chi-square Test

#Background:
Mizzare Corporation has collected data on customer satisfaction levels for two types of smart home devices: Smart Thermostats and Smart Lights. They want to determine if there's a significant association between the type of device purchased and the customer's satisfaction level.

#Data Provided:
The data is summarized in a contingency table showing the counts of customers in each satisfaction level for both types of devices:

In [3]:
import pandas as pd
# Observed counts per satisfaction level, including the marginal totals
data = {'Satisfaction': ['Very Satisfied', 'Satisfied', 'Neutral', 'Unsatisfied', 'Very unsatisfied', 'Total'],
        'Smart Thermostat': [50, 80, 60, 30, 20, 240],
        'Smart Light': [70, 100, 90, 50, 50, 360],
        'Total': [120, 180, 150, 80, 70, 600]}
df = pd.DataFrame(data)
df

       Satisfaction  Smart Thermostat  Smart Light  Total
0    Very Satisfied                50           70    120
1         Satisfied                80          100    180
2           Neutral                60           90    150
3       Unsatisfied                30           50     80
4  Very unsatisfied                20           50     70
5             Total               240          360    600


#Assignment Tasks



STEP 1: State the hypothesis

- Null Hypothesis (H0): There is no association between the type of smart home device purchased and the customer's satisfaction level (the two variables are independent).

- Alternative Hypothesis (H1): There is an association between the type of smart home device purchased and the customer's satisfaction level.


STEP 2: BUILD THE CONTINGENCY TABLE

In [8]:
import numpy as np
from scipy.stats import chi2_contingency



# Create a contingency table using the relevant columns (excluding 'Total' row and column)
contingency_table = df.iloc[:-1, 1:3]
contingency_table

   Smart Thermostat  Smart Light
0                50           70
1                80          100
2                60           90
3                30           50
4                20           50


STEP 3: COMPUTE THE CHI-SQUARE STATISTIC

In [14]:
# Run the chi-square test of independence on contingency_table

from scipy.stats import chi2_contingency
chi2, p, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2}")

Chi-square statistic: 5.638227513227513


In [15]:
print(f"P-value: {p}")

P-value: 0.22784371130697179


In [16]:
print(f"Degrees of freedom: {dof}")

Degrees of freedom: 4


In [17]:
print(f"Expected frequencies: \n{expected}")

Expected frequencies: 
[[ 48.  72.]
 [ 72. 108.]
 [ 60.  90.]
 [ 32.  48.]
 [ 28.  42.]]
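
The expected frequencies and the statistic reported above can be checked by hand: under independence each expected count is (row total * column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The following is a minimal sketch of that check in plain Python; the names `observed`, `expected_manual`, and `chi2_manual` are illustrative. It applies no continuity correction, which matches `chi2_contingency` here because that function applies Yates' correction only when there is 1 degree of freedom.

```python
# Observed counts: rows are satisfaction levels, columns are
# (Smart Thermostat, Smart Light), copied from the table above.
observed = [
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected_manual = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2_manual = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected_manual)
    for o, e in zip(obs_row, exp_row)
)

print(expected_manual[0])     # [48.0, 72.0]
print(round(chi2_manual, 4))  # 5.6382
```

Most of the statistic comes from the "Satisfied" and "Very unsatisfied" rows, where the observed counts differ from the expected counts by 8.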


STEP 4: MAKE A DECISION

To determine the critical value, we need to know the degrees of freedom and the significance level (alpha).

The Chi-Square distribution for this test is defined by:

- Degrees of freedom (df) = (Number of rows - 1) * (Number of columns - 1)
- Significance level (alpha) = 0.05

For the 5x2 contingency table (5 satisfaction levels, 2 device types):

df = (5 - 1) * (2 - 1) = 4

Using a Chi-Square distribution table or calculator, we can find the critical value:

Critical Value = χ²(4, 0.05) = 9.488

This means that if our calculated Chi-Square statistic is greater than 9.488, we reject the null hypothesis.
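
The critical value does not have to come from a printed table. The sketch below recovers it numerically using only the standard library, under one stated assumption: it relies on a closed form for the chi-square upper-tail probability that holds only for an even number of degrees of freedom (true here, since df = 4). The helper names `chi2_sf_even_dof` and `chi2_critical_value` are hypothetical. With SciPy installed, `scipy.stats.chi2.ppf(1 - alpha, dof)` gives the same value for any df.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form valid only for even dof:
    exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2
    terms = sum(half ** j / math.factorial(j) for j in range(dof // 2))
    return math.exp(-half) * terms


def chi2_critical_value(alpha, dof, lo=0.0, hi=100.0, tol=1e-10):
    """Find x with P(X > x) = alpha by bisection (the tail is decreasing in x)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_even_dof(mid, dof) > alpha:
            lo = mid  # tail still too large, move right
        else:
            hi = mid  # tail at or below alpha, move left
    return (lo + hi) / 2


critical = chi2_critical_value(alpha=0.05, dof=4)
print(f"{critical:.3f}")  # 9.488

# The same tail function gives the p-value for the observed statistic.
print(f"{chi2_sf_even_dof(5.638227513227513, 4):.4f}")  # 0.2278
```

Both printed values agree with the table lookup above and with the p-value returned by `chi2_contingency`.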

In [26]:

# Define the significance level (alpha)
alpha = 0.05

# Compare the p-value with the significance level
if p < alpha:
  print("The p-value is less than the significance level (alpha).")
  print("Reject the null hypothesis (H0).")
  print("There is a significant association between the type of smart home device purchased and the customer's satisfaction level.")
else:
  print("The p-value is greater than or equal to the significance level (alpha).")
  print("Fail to reject the null hypothesis (H0).")
  print("There is no significant association between the type of smart home device purchased and the customer's satisfaction level.")

# Alternatively, compare the chi-square statistic with the critical value
# Critical value for df=4 and alpha=0.05 is 9.488
critical_value_chi2 = 9.488

if chi2 > critical_value_chi2:
  print(f"The Chi-square statistic ({chi2:.4f}) is greater than the critical value ({critical_value_chi2:.4f}).")
  print("Reject the null hypothesis (H0).")
  print("There is a significant association between the type of smart home device purchased and the customer's satisfaction level.")
else:
  print(f"The Chi-square statistic ({chi2:.4f}) is less than or equal to the critical value ({critical_value_chi2:.4f}).")
  print("Fail to reject the null hypothesis (H0).")
  print("There is no significant association between the type of smart home device purchased and the customer's satisfaction level.")

The p-value is greater than or equal to the significance level (alpha).
Fail to reject the null hypothesis (H0).
There is no significant association between the type of smart home device purchased and the customer's satisfaction level.
The Chi-square statistic (5.6382) is less than or equal to the critical value (9.4880).
Fail to reject the null hypothesis (H0).
There is no significant association between the type of smart home device purchased and the customer's satisfaction level.


#Interpretation
If the Chi-Square statistic is smaller than the critical value and the p-value is greater than the significance level (alpha), we:

- Fail to reject the null hypothesis (H0)
- Conclude that there is no significant association between the variables

In the context of the smart home device example:

- We would conclude that the type of smart home device purchased is not significantly associated with customer satisfaction

This doesn't necessarily mean that there is no relationship at all, but rather that the data doesn't provide sufficient evidence to support the alternative hypothesis.



- Calculated Chi-Square statistic: 5.6382

- Degrees of freedom (dof): 4

- p-value: 0.2278

- Significance level (alpha): 0.05

- Critical value for Chi-square test at alpha=0.05 with 4 degrees of freedom: 9.4880


Since the calculated Chi-square statistic (5.6382) is less than the critical value (9.4880), and the p-value (0.2278) is greater than the significance level (0.05), we fail to reject the null hypothesis.

**Conclusion:**

There is no statistically significant association between the type of smart home device purchased (Smart Thermostat vs. Smart Light) and the customer's satisfaction level at a 0.05 significance level. The observed differences in satisfaction levels between the two device types are not large enough to conclude that they are dependent.