### Assignment: Chi-Squared Test of Association Between Device Type and Customer Satisfaction

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
#Given Data

df = pd.DataFrame({
    "Satisfaction" : ["Very Satisfied","Satisfied","Neutral","Unsatisfied","Very Unsatisfied"],
    "Smart Thermostat": [50,80,60,30,20],
    "Smart Light": [70,100,90,50,50],
})
df

Unnamed: 0,Satisfaction,Smart Thermostat,Smart Light
0,Very Satisfied,50,70
1,Satisfied,80,100
2,Neutral,60,90
3,Unsatisfied,30,50
4,Very Unsatisfied,20,50


In [6]:
df['Total'] = df['Smart Thermostat'] + df['Smart Light']
df

Unnamed: 0,Satisfaction,Smart Thermostat,Smart Light,Total
0,Very Satisfied,50,70,120
1,Satisfied,80,100,180
2,Neutral,60,90,150
3,Unsatisfied,30,50,80
4,Very Unsatisfied,20,50,70


**Use case:** Use the chi-square test for independence to determine whether
there is a significant association between the type of smart home device
purchased (Smart Thermostats vs. Smart Lights) and the customer satisfaction level.

In [9]:
#Performing Basic EDA
df.shape

(5, 4)

In [11]:
df.columns

Index(['Satisfaction', 'Smart Thermostat', 'Smart Light', 'Total'], dtype='object')

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Satisfaction      5 non-null      object
 1   Smart Thermostat  5 non-null      int64 
 2   Smart Light       5 non-null      int64 
 3   Total             5 non-null      int64 
dtypes: int64(3), object(1)
memory usage: 292.0+ bytes


In [15]:
df.describe()

Unnamed: 0,Smart Thermostat,Smart Light,Total
count,5.0,5.0,5.0
mean,48.0,72.0,120.0
std,23.874673,22.803509,46.368092
min,20.0,50.0,70.0
25%,30.0,50.0,80.0
50%,50.0,70.0,120.0
75%,60.0,90.0,150.0
max,80.0,100.0,180.0


In [17]:
df.isnull().sum()

Satisfaction        0
Smart Thermostat    0
Smart Light         0
Total               0
dtype: int64

#### 1. State The Hypothesis

* **H0/Null-Hypothesis :** There is no significant association between the type of smart home device purchased (Smart Thermostats vs. Smart Lights) and the customer satisfaction level.

* **HA/Alternate-Hypothesis :** There is a significant association between the type of smart home device purchased (Smart Thermostats vs. Smart Lights) and the customer satisfaction level.

In [20]:
# Separating Observed Value

observed = df[['Smart Thermostat','Smart Light']].values
print(observed)

[[ 50  70]
 [ 80 100]
 [ 60  90]
 [ 30  50]
 [ 20  50]]


#### 2. Computing the Chi-Square Statistic:

In [23]:
import scipy.stats as stats
from scipy.stats import chi2_contingency

#chi2_contingency takes the observed counts as its argument and returns 4 values: chi-square statistic, p-value, degrees of freedom, expected counts
#it is the most direct way of performing the chi-squared test

chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)
chi2_stat, p_value, dof, expected

(5.638227513227513,
 0.22784371130697179,
 4,
 array([[ 48.,  72.],
        [ 72., 108.],
        [ 60.,  90.],
        [ 32.,  48.],
        [ 28.,  42.]]))

**Formula to calculate the Expected value of a cell:** E = (Row_total * Column_total)/Grand_total

**Formula to calculate the Chi-Squared Statistic :** 
chi_square = summation[(Observed_value - Expected_value)**2/Expected_value], summed over every cell of the table
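
The expected-count formula can be checked by hand against the `expected` array returned by `chi2_contingency`. The following is a minimal sketch, assuming the same observed table as above; the names `row_totals`, `col_totals`, and `expected_manual` are illustrative and do not appear in the original notebook.

```python
import numpy as np

# Observed counts (rows: satisfaction levels,
# columns: Smart Thermostat, Smart Light).
observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])

row_totals = observed.sum(axis=1)   # one total per satisfaction level
col_totals = observed.sum(axis=0)   # one total per device type
grand_total = observed.sum()        # all respondents

# E[i, j] = row_total[i] * col_total[j] / grand_total
expected_manual = np.outer(row_totals, col_totals) / grand_total
print(expected_manual)
```

Each entry is the count that cell would have if satisfaction level and device type were independent, given the observed margins.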

In [25]:
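#Manual way of calculating the chi-square statistic

#zip pairs each observed row with its expected row, so the sum runs over rows
#and returns one partial sum per column (device type)
chi_square = sum([(o-e)**2/e for o,e in zip(observed, expected)])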
chi_square

array([3.38293651, 2.25529101])

In [26]:
chi_square_statistics = chi_square[0]+chi_square[1]
chi_square_statistics

5.638227513227513

Both the manual calculation and `chi2_contingency` give the same chi-square statistic (5.638227513227513).
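
The two-step manual calculation can also be written as a single array expression. This is a sketch using NumPy only, under the assumption that no continuity correction is wanted (`chi2_contingency` applies Yates' correction only when there is one degree of freedom, which is not the case here); `contributions` and `chi_square_vectorized` are illustrative names.

```python
import numpy as np

observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])

# Expected counts under independence, from the row and column totals.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Per-cell contribution (O - E)^2 / E, then the sum over every cell.
contributions = (observed - expected) ** 2 / expected
chi_square_vectorized = contributions.sum()

print(contributions.sum(axis=0))   # per-column partial sums
print(chi_square_vectorized)       # overall chi-square statistic
```

Inspecting `contributions` cell by cell also shows which cells deviate most from independence.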

#### 3. Determining the Critical Value:

**To determine Critical Value :** 

**Degrees of freedom (DOF) =** (number_of_rows - 1)*(number_of_columns - 1)

**Significance Level (alpha) =** 0.05

In [32]:
#Finding Degrees of freedom manually

rows, cols = observed.shape         #Number of rows (5) and columns (2) in the observed table
dof = (rows-1)*(cols-1)             #manual way of finding Degrees of freedom
alpha = 0.05                        #Significance level

In [34]:
#Finding the Critical Value
from scipy.stats import chi2

critical_value = chi2.ppf(q=1-alpha, df=dof)      #q is the quantile value which is equal to 1-significance_level
print("The Critical Value is : ",critical_value)

The Critical Value is :  9.487729036781154
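
As a sanity check on the quantile, the critical value should leave exactly `1 - alpha` of the chi-square distribution below it. The sketch below assumes the same `alpha` and degrees of freedom as above; `area_below` and `critical_value_isf` are illustrative names.

```python
from scipy.stats import chi2

alpha = 0.05
dof = 4   # (5 - 1) * (2 - 1) for the 5 x 2 table

critical_value = chi2.ppf(1 - alpha, df=dof)

# Probability mass below the critical value; should equal 1 - alpha.
area_below = chi2.cdf(critical_value, df=dof)

# isf is the inverse survival function, so isf(alpha) gives the same cut-off.
critical_value_isf = chi2.isf(alpha, df=dof)

print(critical_value, area_below, critical_value_isf)
```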


#### 4. Making a Decision:

In [39]:
if critical_value <= chi_square_statistics:
    print("Reject Null Hypothesis")
else:
    print("Failed to Reject Null Hypothesis")

Failed to Reject Null Hypothesis
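
The same decision can be reached from the p-value instead of the critical value. This is a minimal sketch, assuming the statistic and degrees of freedom computed earlier are hard-coded; `decision` is an illustrative name.

```python
from scipy.stats import chi2

chi_square_statistics = 5.638227513227513   # statistic computed earlier
dof = 4
alpha = 0.05

# Survival function (1 - CDF): probability of a statistic at least
# this large if the null hypothesis is true.
p_value = chi2.sf(chi_square_statistics, dof)
print(p_value)

# Reject H0 only when the p-value falls below the significance level.
if p_value < alpha:
    decision = "Reject Null Hypothesis"
else:
    decision = "Failed to Reject Null Hypothesis"
print(decision)
```

Comparing the p-value with alpha and comparing the statistic with the critical value are equivalent rules, so the two approaches always agree.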


Since the chi-square statistic (5.638) is less than the critical value (9.488), we fail to reject the null hypothesis (H0). The p-value returned by `chi2_contingency` (0.228) is likewise greater than the significance level of 0.05.

Hence, based on this hypothesis test, there is not enough evidence at the 5% significance level to conclude that the type of smart home device purchased (Smart Thermostats vs. Smart Lights) is associated with the customer satisfaction level.