In [1]:
import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.stats import chi2_contingency

#### Q1 A F&B manager wants to determine whether there is any significant difference in cutlet diameter between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and the tests you carried out to check the validity of those assumptions.

In [2]:
q1 = pd.read_csv('Cutlets.csv')
q1

,Unit A,Unit B
0,6.809,6.7703
1,6.4376,7.5093
2,6.9157,6.73
3,7.3012,6.7878
4,7.4488,7.1522
5,7.3871,6.811
6,6.8755,7.2212
7,7.0621,6.6606
8,6.684,7.2402
9,6.8236,7.0503


- Null hypothesis (H0): The mean cutlet diameter is the same for Unit A and Unit B.
- Alternate hypothesis (H1): The mean cutlet diameters of the two units differ.
- Test: two-tailed, two-sample (independent) t-test
- Significance level: alpha = 0.05
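To make the mechanics of the pooled (Student's) t-test concrete, here is a minimal sketch that computes the statistic from the textbook formula and compares it with `st.ttest_ind`, which pools the variances by default (`equal_var=True`). The helper `pooled_t_test` is a hypothetical name introduced for illustration, and the data are only the 10 rows displayed above, so the numbers are not expected to reproduce the p-value computed on the full file.

```python
import numpy as np
from scipy import stats


def pooled_t_test(a, b):
    """Hypothetical helper: two-sample Student's t-test with pooled variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances (ddof=1)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Two-tailed p-value: probability in both tails beyond |t|
    p = 2 * stats.t.sf(abs(t), df)
    return t, p


# Only the 10 rows of Cutlets.csv displayed above (illustration, not the full data)
unit_a = [6.809, 6.4376, 6.9157, 7.3012, 7.4488, 7.3871, 6.8755, 7.0621, 6.684, 6.8236]
unit_b = [6.7703, 7.5093, 6.73, 6.7878, 7.1522, 6.811, 7.2212, 6.6606, 7.2402, 7.0503]

t_manual, p_manual = pooled_t_test(unit_a, unit_b)
t_scipy, p_scipy = stats.ttest_ind(unit_a, unit_b)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

Both lines should agree, which confirms that `st.ttest_ind` with default arguments is the pooled-variance test.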

In [3]:
stat1, p_value1 = st.ttest_ind(q1['Unit A'],q1['Unit B'])
print(p_value1)

0.4722394724599501


In [4]:
if (p_value1>0.05):
    print("Since p_value is greater than alpha")
    print("Failed to reject null hypothesis")
    print("No significant difference in mean cutlet diameter between the two units")
else:
    print("Since alpha is greater than p_value")
    print("Reject null hypothesis")
    print("Mean cutlet diameters of the two units differ significantly")

Since p_value is greater than alpha
Failed to reject null hypothesis
No significant difference in mean cutlet diameter between the two units
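Q1 also asks for the assumptions behind the test and how they were checked. `st.ttest_ind` as called above assumes independent random samples, approximately normal diameters within each unit, and equal variances in the two units. Independence follows from the sampling design and cannot be verified from the numbers alone. One common way to check the other two is a Shapiro-Wilk test per unit and a Levene test across units, switching to Welch's t-test (`equal_var=False`) when equal variances are rejected. The sketch below is one such workflow, not the notebook's original method; `check_t_test_assumptions` is a hypothetical helper, run here only on the displayed rows. In the notebook you would pass `q1['Unit A']` and `q1['Unit B']`.

```python
import numpy as np
from scipy import stats


def check_t_test_assumptions(a, b, alpha=0.05):
    """Hypothetical helper: check normality and equal variances, then run the matching t-test."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shapiro-Wilk H0: the sample comes from a normal distribution
    normal_a = bool(stats.shapiro(a).pvalue > alpha)
    normal_b = bool(stats.shapiro(b).pvalue > alpha)
    # Levene H0: the two groups have equal variances (SciPy's default center='median')
    levene_p = float(stats.levene(a, b).pvalue)
    equal_var = levene_p > alpha
    # Student's pooled t-test if equal variances are not rejected, otherwise Welch's t-test
    t_pvalue = float(stats.ttest_ind(a, b, equal_var=equal_var).pvalue)
    return {
        "normal_a": normal_a,
        "normal_b": normal_b,
        "levene_p": levene_p,
        "equal_var": equal_var,
        "t_pvalue": t_pvalue,
    }


# Only the 10 rows displayed above (illustration, not the full data)
unit_a = [6.809, 6.4376, 6.9157, 7.3012, 7.4488, 7.3871, 6.8755, 7.0621, 6.684, 6.8236]
unit_b = [6.7703, 7.5093, 6.73, 6.7878, 7.1522, 6.811, 7.2212, 6.6606, 7.2402, 7.0503]

report = check_t_test_assumptions(unit_a, unit_b)
print(report)
```

If either Shapiro-Wilk test rejects normality, a rank-based alternative such as `st.mannwhitneyu` is another option worth considering.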


#### Q2 A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports from the laboratories on its preferred list. It collected a random sample and recorded the TAT for reports from 4 laboratories. TAT is defined as the time from sample collection to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the laboratories at the 5% significance level.

In [5]:
q2 = pd.read_csv('LabTAT.csv')
q2

,Laboratory 1,Laboratory 2,Laboratory 3,Laboratory 4
0,185.35,165.53,176.70,166.13
1,170.49,185.91,198.45,160.79
2,192.77,194.92,201.23,185.18
3,177.33,183.00,199.61,176.42
4,193.41,169.57,204.63,152.60
...,...,...,...,...
115,178.49,170.66,193.80,172.68
116,176.08,183.98,215.25,177.64
117,202.48,174.54,203.99,170.27
118,182.40,197.18,194.52,150.87


- Null hypothesis (H0): The mean TAT is the same for all four laboratories.
- Alternate hypothesis (H1): At least one laboratory has a different mean TAT.
- Test: one-way ANOVA
- Significance level: alpha = 0.05
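One-way ANOVA compares the variation between laboratory means with the variation within laboratories; F is the ratio of the two mean squares, and it assumes independent samples, roughly normal TAT within each laboratory, and equal variances across laboratories. A minimal sketch that computes F from sums of squares and checks it against `st.f_oneway`, using only the five rows per laboratory displayed above (illustration only; `one_way_anova` is a hypothetical helper):

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """Hypothetical helper: one-way ANOVA F statistic and p-value from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    p = stats.f.sf(f, k - 1, n - k)
    return f, p


# Only the first five rows of LabTAT.csv displayed above (illustration, not the full data)
lab1 = [185.35, 170.49, 192.77, 177.33, 193.41]
lab2 = [165.53, 185.91, 194.92, 183.00, 169.57]
lab3 = [176.70, 198.45, 201.23, 199.61, 204.63]
lab4 = [166.13, 160.79, 185.18, 176.42, 152.60]

f_manual, p_manual = one_way_anova(lab1, lab2, lab3, lab4)
f_scipy, p_scipy = stats.f_oneway(lab1, lab2, lab3, lab4)
print("manual:", f_manual, p_manual)
print("scipy: ", f_scipy, p_scipy)
```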

In [6]:
stat2, p_value2 = st.f_oneway(q2['Laboratory 1'],q2['Laboratory 2'],q2['Laboratory 3'],q2['Laboratory 4'])

In [7]:
if (p_value2>0.05):
    print("Since p_value is greater than alpha")
    print("Failed to reject null hypothesis")
    print("No significant difference in mean TAT among the laboratories")
else:
    print("Since alpha is greater than p_value")
    print("Reject null hypothesis")
    print("TAT of at least one laboratory is different")

Since alpha is greater than p_value
Reject null hypothesis
TAT of at least one laboratory is different
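Rejecting H0 only says that at least one laboratory differs, not which one. A post-hoc test such as Tukey's HSD compares every pair of laboratories while controlling the family-wise error rate. This sketch assumes SciPy 1.8 or later, where `scipy.stats.tukey_hsd` is available, and again uses only the five displayed rows per laboratory, so its pairwise conclusions may differ from the full data. In the notebook you would pass `q2['Laboratory 1']` through `q2['Laboratory 4']` instead.

```python
import numpy as np
from scipy import stats

# Only the first five rows per laboratory displayed above (illustration only)
lab1 = [185.35, 170.49, 192.77, 177.33, 193.41]
lab2 = [165.53, 185.91, 194.92, 183.00, 169.57]
lab3 = [176.70, 198.45, 201.23, 199.61, 204.63]
lab4 = [166.13, 160.79, 185.18, 176.42, 152.60]
names = ["Laboratory 1", "Laboratory 2", "Laboratory 3", "Laboratory 4"]
alpha = 0.05

# Tukey's HSD: all pairwise comparisons with family-wise error control
res = stats.tukey_hsd(lab1, lab2, lab3, lab4)
print(res)

# Collect the pairs whose adjusted p-value falls below alpha
significant_pairs = [
    (names[i], names[j], float(res.pvalue[i, j]))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if res.pvalue[i, j] < alpha
]
print(significant_pairs)
```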


#### Q3 Sales of products in four different regions are tabulated for males and females. Find out whether male-female buyer ratios are similar across regions.

In [8]:
q3 = pd.read_csv('BuyerRatio.csv', index_col='Observed Values')
q3

Observed Values,East,West,North,South
Males,50,142,131,70
Females,435,1523,1356,750


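- Null hypothesis (H0): The male-female buyer ratio is the same in all four regions (buyer gender is independent of region).
- Alternate hypothesis (H1): At least one region has a different male-female buyer ratio.
- Test: chi-square test of independence
- Significance level: alpha = 0.05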

In [9]:
# Observed counts as a 2-D array (rows: Males, Females; columns: East, West, North, South)
q3_array = q3.to_numpy()

In [10]:
statistics3, p_value3, dof3, array3 = chi2_contingency(q3_array)

In [11]:
statistics3

1.595945538661058

In [12]:
p_value3

0.6603094907091882

In [13]:
dof3

3

In [14]:
array3

array([[  42.76531299,  146.81287862,  131.11756787,   72.30424052],
       [ 442.23468701, 1518.18712138, 1355.88243213,  747.69575948]])
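The expected frequencies in `array3` follow from the independence hypothesis: each expected cell is row total * column total / grand total, and the statistic sums (observed - expected)^2 / expected over all cells, with (rows - 1)(columns - 1) degrees of freedom. A minimal sketch reproducing `statistics3`, `p_value3`, `dof3` and `array3` by hand (the helper `chi_square_independence` is a hypothetical name). `chi2_contingency` applies Yates' continuity correction only when there is one degree of freedom, so it does not affect this 2x4 table. The sketch also checks the common rule of thumb that every expected count should be at least 5 for the chi-square approximation to be reliable.

```python
import numpy as np
from scipy import stats
from scipy.stats import chi2_contingency


def chi_square_independence(observed):
    """Hypothetical helper: chi-square test of independence computed from the definitions."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected count under independence: row total * column total / grand total
    expected = row_totals * col_totals / observed.sum()
    stat = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    p = stats.chi2.sf(stat, dof)
    return stat, p, dof, expected


# Observed buyer counts from BuyerRatio.csv (rows: Males, Females)
observed = np.array([[50, 142, 131, 70],
                     [435, 1523, 1356, 750]])

stat, p, dof, expected = chi_square_independence(observed)
stat_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(observed)
print("manual:", stat, p, dof)
print("scipy: ", stat_scipy, p_scipy, dof_scipy)
# Rule of thumb: all expected counts should be at least 5
print("All expected counts >= 5:", bool((expected >= 5).all()))
```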

In [15]:
if (p_value3>0.05):
    print("Since p_value is greater than alpha")
    print("Failed to reject null hypothesis")
    print("No significant difference in male-female buyer ratios across regions")
else:
    print("Since alpha is greater than p_value")
    print("Reject null hypothesis")
    print("At least one male-female buyer ratio is different")

Since p_value is greater than alpha
Failed to reject null hypothesis
No significant difference in male-female buyer ratios across regions


#### Q4 TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and the form has to be reworked before processing. The manager wants to check whether the defective % varies by center. Please analyze the data at the 5% significance level and help the manager draw appropriate inferences.

In [16]:
q4 = pd.read_csv('Costomer+OrderForm.csv')
q4

,Phillippines,Indonesia,Malta,India
0,Error Free,Error Free,Defective,Error Free
1,Error Free,Error Free,Error Free,Defective
2,Error Free,Defective,Defective,Error Free
3,Error Free,Error Free,Error Free,Error Free
4,Error Free,Error Free,Defective,Error Free
...,...,...,...,...
295,Error Free,Error Free,Error Free,Error Free
296,Error Free,Error Free,Error Free,Error Free
297,Error Free,Error Free,Defective,Error Free
298,Error Free,Error Free,Error Free,Error Free


- Null hypothesis (H0): The defective % is the same for all four centers.
- Alternate hypothesis (H1): At least one center has a different defective %.
- Test: chi-square test of independence
- Significance level: alpha = 0.05

In [17]:
# First, count the Error Free and Defective values in each center's column

In [18]:
q4['Phillippines'].value_counts()

Error Free    271
Defective      29
Name: Phillippines, dtype: int64

In [19]:
q4['Indonesia'].value_counts()

Error Free    267
Defective      33
Name: Indonesia, dtype: int64

In [20]:
q4['Malta'].value_counts()

Error Free    269
Defective      31
Name: Malta, dtype: int64

In [21]:
q4['India'].value_counts()

Error Free    280
Defective      20
Name: India, dtype: int64
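Rather than calling `value_counts()` once per column and typing the counts into an array by hand, the whole contingency table can be built in one step. The sketch below rebuilds a hypothetical stand-in for `Costomer+OrderForm.csv` from the counts printed above (same totals per center; row order does not affect the counts), builds the table with `DataFrame.apply` and `reindex`, reports the defective percentage per center, and runs `chi2_contingency` on it. In the notebook, `q4` would take the place of the stand-in `forms`.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical stand-in for the order-form data, rebuilt from the counts shown above
defective_counts = {"Phillippines": 29, "Indonesia": 33, "Malta": 31, "India": 20}
n_forms = 300
forms = pd.DataFrame({
    center: ["Defective"] * d + ["Error Free"] * (n_forms - d)
    for center, d in defective_counts.items()
})

# Count both categories in every column at once; reindex fixes the row order
table = forms.apply(lambda col: col.value_counts()).reindex(["Error Free", "Defective"])
print(table)

# Defective percentage per center
defective_pct = 100 * table.loc["Defective"] / table.sum()
print(defective_pct.round(2))

stat, p, dof, expected = chi2_contingency(table)
print(stat, p, dof)
```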

In [22]:
# Contingency table: row 0 = Error Free, row 1 = Defective; columns = Phillippines, Indonesia, Malta, India
q4_array = np.array([[271,267,269,280],[29,33,31,20]])

In [23]:
statistics4, p_value4, dof4, array4 = chi2_contingency(q4_array)

In [24]:
statistics4

3.858960685820355

In [25]:
p_value4

0.2771020991233135

In [26]:
dof4

3

In [27]:
array4

array([[271.75, 271.75, 271.75, 271.75],
       [ 28.25,  28.25,  28.25,  28.25]])

In [28]:
if (p_value4>0.05):
    print("Since p_value is greater than alpha")
    print("Failed to reject null hypothesis")
    print("No significant difference in defective % across centers")
else:
    print("Since alpha is greater than p_value")
    print("Reject null hypothesis")
    print("Defective % of at least one center is different")

Since p_value is greater than alpha
Failed to reject null hypothesis
No significant difference in defective % across centers
