### Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats

### Ques1. 

A physician is evaluating a new diet for her patients with a family history of heart disease. To test the effectiveness of this diet, 16 patients are placed on the diet for 6 months. Their weights and triglyceride levels are measured before and after the study, and the physician wants to know if either set of measurements has changed. (Data set: dietstudy.csv)


In [2]:
dietstudy = pd.read_csv('dietstudy.csv')
dietstudy.head()

Unnamed: 0,patid,age,gender,tg0,tg1,tg2,tg3,tg4,wgt0,wgt1,wgt2,wgt3,wgt4
0,1,45,Male,180,148,106,113,100,198,196,193,188,192
1,2,56,Male,139,94,119,75,92,237,233,232,228,225
2,3,50,Male,152,185,86,149,118,233,231,229,228,226
3,4,46,Female,112,145,136,149,82,179,181,177,174,172
4,5,64,Male,156,104,157,79,97,219,217,215,213,214


To test the effectiveness of the diet, we run a paired-sample t-test on the weights and on the triglyceride levels measured before and after the study.

Test1 for weights:
    
    H0: mean wgt0 == mean wgt4
        (The mean weight is the same before and after the study, i.e. no change in weight)
    H1: mean wgt0 <> mean wgt4
        (The mean weight differs before and after the study, i.e. the weight has changed)
   

Test2 for triglyceride levels:
         
    H0: mean tg0 == mean tg4
        (The mean triglyceride level is the same before and after the study, i.e. no change)
    H1: mean tg0 <> mean tg4
        (The mean triglyceride level differs before and after the study, i.e. it has changed)
    
    
Both are two-tailed tests.
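
A paired t-test is equivalent to a one-sample t-test of the per-patient differences against zero. The sketch below illustrates that equivalence using only the five `wgt0`/`wgt4` rows printed by `head()` above, so its numbers will not match the full 16-patient result:

```python
import numpy as np
import scipy.stats as stats

# First five rows of wgt0 / wgt4 as printed by dietstudy.head();
# illustration only, not the full 16-patient data set.
wgt0 = np.array([198, 237, 233, 179, 219])
wgt4 = np.array([192, 225, 226, 172, 214])

# Paired t-test on the two related samples.
paired = stats.ttest_rel(wgt0, wgt4)

# One-sample t-test of the differences against a mean of zero.
diff = wgt0 - wgt4
one_sample = stats.ttest_1samp(diff, popmean=0)

# Both calls report the same statistic and the same p-value.
print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```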

In [9]:
# Test1 for weights:

# step 1 : 

before_diet = dietstudy.wgt0
after_diet  = dietstudy.wgt4


print("The average weights before the diet is {}.".format(before_diet.mean()))
print("The average weights after the diet is {}.".format(after_diet.mean()))

print(' ')

# Step 2: Perform the paired (related-samples) t-test:

t_stat1 = stats.ttest_rel(a=before_diet, b=after_diet)

print("The T-score is {} and p-value is {}.".format(t_stat1.statistic, t_stat1.pvalue))

The average weights before the diet is 198.375.
The average weights after the diet is 190.3125.
 
The T-score is 11.174521688532522 and p-value is 1.137689414996614e-08.


In [10]:
# checking if p-value more than 5% alpha for Test 1:

t_stat1.pvalue > 0.05

False

In [12]:
# Test2 for triglyceride levels:

# Step 1: 

before_diet_tg = dietstudy.tg0
after_diet_tg  = dietstudy.tg4

print("The average triglyceride levels before diet is {}".format(before_diet_tg.mean()))
print("The average triglyceride levels after diet is {}.".format(after_diet_tg.mean()))

print(' ')

# Step 2: Perform the paired (related-samples) t-test:

t_Stat2 = stats.ttest_rel(a=before_diet_tg, b=after_diet_tg)

print("The T-score is {} and pvalue is {}.".format(t_Stat2.statistic, t_Stat2.pvalue))

The average triglyceride levels before diet is 138.4375
The average triglyceride levels after diet is 124.375.
 
The T-score is 1.2000008533342437 and pvalue is 0.24874946576903698.


In [13]:
# Checking if p-value is greater than 5% alpha for Test2:

t_Stat2.pvalue > 0.05

True

#### Conclusion:

Test1 on weights: The p-value is below the 5% alpha, so we reject the null hypothesis. The mean weight fell from 198.4 before the diet to 190.3 after it, and this change is statistically significant.
    
    
Test2 on triglyceride levels: The p-value (0.249) is above the 5% alpha, so we fail to reject the null hypothesis. Although the mean fell from 138.4 to 124.4, the data do not provide sufficient evidence that triglyceride levels changed over the course of the diet.

-----------------

### Ques2. 

An analyst at a department store wants to evaluate a recent credit card promotion. To this end, 500 cardholders were randomly selected. Half received an ad promoting a reduced interest rate on purchases made over the next three months, and half received a standard seasonal ad. Is the promotion effective to increase sales? (Data set: creditpromo.csv)


In [16]:
creditpromo = pd.read_csv('creditpromo.csv')
creditpromo.head()

Unnamed: 0,id,insert,dollars
0,148,Standard,2232.771979
1,572,New Promotion,1403.807542
2,973,Standard,2327.092181
3,1096,Standard,1280.030541
4,1541,New Promotion,1513.5632


To test the effectiveness of the promotion, we conduct a two-sample independent t-test:
    
    H0: mean sales with New Promotion == mean sales with Standard ad
        (The promotion does not increase sales)

    H1: mean sales with New Promotion > mean sales with Standard ad
        (The promotion increases sales)

    This is a one-tailed (right-tailed) test.


In [27]:
# step 1: 

std       = creditpromo['dollars'].loc[creditpromo['insert'] == 'Standard']
new_promo = creditpromo['dollars'].loc[creditpromo['insert'] == 'New Promotion']


# Step 2: Performing Two Sample Independent T-test:

# Testing under equal variance:

equal_var_ad = stats.ttest_ind(a=std, b=new_promo, equal_var = True)
equal_var_ad

Ttest_indResult(statistic=-2.2604227264649963, pvalue=0.024225996894147814)

In [28]:
# Testing under not having equal variance:

unequal_var_ad = stats.ttest_ind(a=std, b= new_promo, equal_var = False)
unequal_var_ad

Ttest_indResult(statistic=-2.260422726464996, pvalue=0.024226348191648994)

In [29]:
# difference b/w equal var statistic and unequal var statistic:

diff = equal_var_ad.statistic - unequal_var_ad.statistic
diff

-4.440892098500626e-16

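The two statistics are practically identical. This is expected: both groups contain 250 cardholders, and with equal group sizes the pooled and unequal-variance (Welch) statistics always coincide, so the match by itself says nothing about the variances. The two p-values are also almost the same, so the conclusion does not depend on the choice, and we proceed under the equal variance assumption.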
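
Whether the variances are equal can be checked directly with Levene's test instead of comparing t-statistics. A minimal sketch on synthetic data (the samples `group_a` and `group_b` are hypothetical and unrelated to `creditpromo.csv`):

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

# Synthetic, hypothetical samples: equal sizes, very different spreads.
group_a = rng.normal(loc=100, scale=1, size=250)
group_b = rng.normal(loc=100, scale=10, size=250)

pooled = stats.ttest_ind(group_a, group_b, equal_var=True)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# With equal group sizes the two t-statistics coincide even though
# the variances clearly differ; only the degrees of freedom change.
print(pooled.statistic, welch.statistic)

# Levene's test checks equality of variances directly
# (H0: the groups have equal variances).
levene = stats.levene(group_a, group_b)
print(levene.statistic, levene.pvalue)
```

A small Levene p-value is evidence against equal variances, in which case `equal_var=False` is the safer choice.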

In [31]:
# Checking if p-value is more or less than 5% alpha:

equal_var_ad.pvalue > 0.05

False
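
`stats.ttest_ind` reports a two-sided p-value by default, while H1 for this question is one-sided. The sketch below converts the printed two-sided p-value into a one-sided one; the helper name `one_sided_pvalue` is hypothetical and not part of SciPy:

```python
def one_sided_pvalue(statistic, two_sided_p, expect_negative):
    """Convert a two-sided t-test p-value into a one-sided one.

    expect_negative: True when H1 predicts a negative statistic.
    """
    in_expected_direction = (statistic < 0) == expect_negative
    half = two_sided_p / 2
    return half if in_expected_direction else 1 - half


# Values printed by the equal-variance test above. The statistic is
# std minus new_promo, so H1 (new_promo > std) predicts a negative value.
t_value = -2.2604227264649963
p_two_sided = 0.024225996894147814

p_one_sided = one_sided_pvalue(t_value, p_two_sided, expect_negative=True)
print(p_one_sided)
```

SciPy 1.6 and later also accept an `alternative` argument in `ttest_ind`, which avoids the manual conversion.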

#### Conclusion:

The test above is two-sided, while H1 is one-sided. The t-statistic is negative (Standard minus New Promotion), which is the direction H1 predicts, so the one-sided p-value is half the two-sided one: about 0.012. This is below the 5% alpha, so we reject the null hypothesis and conclude that the New Promotion increased sales.

To see how large the improvement is, compare the average sales for the Standard ad with those for the New Promotion.

In [33]:
# Checking how much improvement on Sales:

print("The average of Sales for people who received Standard Seasonal Ad is {}.".format(std.mean()))
print("The average of Sales for people who received New Promotion is {}.".format(new_promo.mean()))

The average of Sales for people who received Standard Seasonal Ad is 1566.3890309659348.
The average of Sales for people who received New Promotion is 1637.4999830647992.


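Average sales are about 71 dollars (roughly 4.5%) higher for cardholders who received the New Promotion, so the promotion has been effective: the increase is statistically significant, though modest in size.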

-----------------

### Ques3. 

An experiment is conducted to study the hybrid seed production of bottle gourd under open field conditions. The main aim of the investigation is to compare natural pollination and hand pollination. The data are collected on 10 randomly selected plants from each of natural pollination and hand pollination. The data are collected on fruit weight (kg), seed yield/plant (g) and seedling length (cm). (Data set: pollination.csv)



In [35]:
pollination = pd.read_csv('pollination.csv')
pollination.head()

Unnamed: 0,Group,Fruit_Wt,Seed_Yield_Plant,Seedling_length
0,Natural,1.85,147.7,16.86
1,Natural,1.86,136.86,16.77
2,Natural,1.83,149.97,16.35
3,Natural,1.89,172.33,18.26
4,Natural,1.8,144.46,17.9


### a. Is the overall population mean of Seed yield/plant (g) equal to 200?


To test whether the population mean equals 200, conduct a one-sample t-test:
    
    
    H0: population mean of Seed yield/plant (g) == 200
    H1: population mean of Seed yield/plant (g) <> 200
        
    It is a two-tailed test.
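
The one-sample t-score is the distance between the sample mean and the hypothesised mean, measured in standard errors. A hand computation as a sketch, using only the five `Seed_Yield_Plant` values printed by `head()` above (the variable names are illustrative, and the result will differ from the full 20-plant test):

```python
import math
import statistics

# First five Seed_Yield_Plant values printed by pollination.head();
# illustration only, not the full 20-plant data set.
seed_yield = [147.7, 136.86, 149.97, 172.33, 144.46]
hypothesised_mean = 200

n = len(seed_yield)
sample_mean = statistics.mean(seed_yield)
sample_sd = statistics.stdev(seed_yield)      # n - 1 in the denominator
standard_error = sample_sd / math.sqrt(n)

# t = (sample mean - hypothesised mean) / standard error, with n - 1 df
t_score = (sample_mean - hypothesised_mean) / standard_error
print(t_score, n - 1)
```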

In [52]:
# Step 1:

seed = pollination.Seed_Yield_Plant

# Step 2: Perform One - Sample t-Test:

one_sample_seed = stats.ttest_1samp(a=seed, popmean = 200)
one_sample_seed

Ttest_1sampResult(statistic=-2.3009121248548645, pvalue=0.032891040921283025)

In [53]:
print("The T-score is {} and p-value is {}.".format(one_sample_seed.statistic, one_sample_seed.pvalue))

The T-score is -2.3009121248548645 and p-value is 0.032891040921283025.


In [54]:
# Checking if p-value is higher than 5% alpha:

one_sample_seed.pvalue > 0.05

False

#### Conclusion:

The p-value is below the 5% alpha, so we reject the null hypothesis and conclude that the mean Seed yield/plant (g) differs significantly from 200. The negative t-score shows that the sample mean lies below 200.

-----------------

### b. Test whether the natural pollination and hand pollination under open field conditions are equally effective or are significantly different.

To compare natural and hand pollination, we conduct a two-sample independent t-test on each of the three measured variables:
  
For Fruit_Wt:

    H0: Fruit Weight from Natural Pollination == Fruit Weight from Hand Pollination
        (There is no difference in fruit weight)

    H1: Fruit Weight from Natural Pollination <> Fruit Weight from Hand Pollination
        (There is significant difference in fruit weight)

In [69]:
# step 1: Fetch the respective values:

natural_fruit_weight = pollination.Fruit_Wt.loc[pollination.Group == 'Natural']
hand_fruit_weight    = pollination.Fruit_Wt.loc[pollination.Group == 'Hand']

# step 2: Perform Two Sample T-test:

# Under equal variance

test1a = stats.ttest_ind(a= natural_fruit_weight, b= hand_fruit_weight, equal_var = True)

print("The T-score is {} and p-value is {}.".format(test1a.statistic, test1a.pvalue))

The T-score is -17.669989614440286 and p-value is 8.078362076486221e-13.


In [70]:
# Under unequal variance

test1b = stats.ttest_ind(a=natural_fruit_weight, b= hand_fruit_weight, equal_var=False)

print("The T-score is {} and p-value is {}.".format(test1b.statistic, test1b.pvalue))

The T-score is -17.669989614440286 and p-value is 4.306871213074868e-09.


In [71]:
# Difference between equal variance and unequal variance statistic:

test1a.statistic - test1b.statistic

0.0

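The two statistics are identical because both groups contain 10 plants; with equal group sizes the pooled and Welch statistics always coincide, so this does not by itself show that the variances are equal. Both p-values are far below 0.05, so the conclusion is the same either way, and we proceed under the equal variance assumption.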

In [73]:
# Checking if p-value under the equal variance assumption is higher than 5% alpha:

test1a.pvalue > 0.05

False

#### Conclusion for Fruit Weight test:

The p-value is below the 5% alpha, so we reject the null hypothesis and conclude that mean fruit weight differs significantly between Natural and Hand pollination. The negative t-score (Natural minus Hand) indicates a higher mean under Hand pollination.

For Seed_Yield_Plant:

    H0: Seed Yield from Natural Pollination == Seed Yield from Hand Pollination
        (There is no difference in Seed Yield)

    H1: Seed Yield from Natural Pollination <> Seed Yield from Hand Pollination
        (There is significant difference in Seed Yield)

In [75]:
# step 1: Fetch the respective values:

natural_seed_yield = pollination.Seed_Yield_Plant.loc[pollination.Group == 'Natural']
hand_seed_yield    = pollination.Seed_Yield_Plant.loc[pollination.Group == 'Hand']


# step 2: Perform Two Sample T-test:

# Under equal variance

test2a = stats.ttest_ind(a= natural_seed_yield, b= hand_seed_yield, equal_var = True)

print("The T-score is {} and p-value is {}.".format(test2a.statistic, test2a.pvalue))

The T-score is -13.958260515902547 and p-value is 4.2714815854843853e-11.


In [77]:
# Under unequal variance

test2b = stats.ttest_ind(a=natural_seed_yield, b= hand_seed_yield, equal_var=False)

print("The T-score is {} and p-value is {}.".format(test2b.statistic, test2b.pvalue))

The T-score is -13.958260515902547 and p-value is 5.136161282685624e-11.


In [79]:
# Difference between equal variance and unequal variance statistic:

test2a.statistic - test2b.statistic

0.0

As with fruit weight, the identical statistics follow from the equal group sizes. Both p-values are far below 0.05, so we proceed under the equal variance assumption.

In [84]:
# Checking if p-value under the equal variance assumption is higher than 5% alpha:

test2a.pvalue > 0.05

False

#### Conclusion for Seed Yield test:

The p-value is below the 5% alpha, so we reject the null hypothesis and conclude that mean seed yield differs significantly between Natural and Hand pollination. The negative t-score (Natural minus Hand) indicates a higher mean under Hand pollination.

For Seedling length:

    H0: Seedling length from Natural Pollination == Seedling length from Hand Pollination
        (There is no difference in Seedling length)

    H1: Seedling length from Natural Pollination <> Seedling length from Hand Pollination
        (There is significant difference in Seedling length)

In [76]:
# step 1: Fetch the respective values:

natural_seedling_length = pollination.Seedling_length.loc[pollination.Group == 'Natural']
hand_seedling_length    = pollination.Seedling_length.loc[pollination.Group == 'Hand']

# step 2: Perform Two Sample T-test:

# Under equal variance

test3a = stats.ttest_ind(a= natural_seedling_length, b= hand_seedling_length, equal_var = True)

print("The T-score is {} and p-value is {}.".format(test3a.statistic, test3a.pvalue))

The T-score is -2.542229999657055 and p-value is 0.020428817064110226.


In [78]:
# Under unequal variance

test3b = stats.ttest_ind(a=natural_seedling_length, b= hand_seedling_length, equal_var=False)

print("The T-score is {} and p-value is {}.".format(test3b.statistic, test3b.pvalue))

The T-score is -2.542229999657055 and p-value is 0.021430608378161634.


In [80]:
# Difference between equal variance and unequal variance statistic:

test3a.statistic - test3b.statistic

0.0

Again the identical statistics follow from the equal group sizes. Both p-values (0.020 and 0.021) are below 0.05, so we proceed under the equal variance assumption.

In [85]:
# Checking if p-value under the equal variance assumption is higher than 5% alpha:

test3a.pvalue > 0.05

False

#### Conclusion for Seedling Length test:

The p-value is below the 5% alpha, so we reject the null hypothesis and conclude that mean seedling length differs significantly between Natural and Hand pollination. The negative t-score (Natural minus Hand) indicates a higher mean under Hand pollination.

-----------------

### Ques4. 

An electronics firm is developing a new DVD player in response to customer requests. Using a prototype, the marketing team has collected focus data for different age groups viz. Under 25; 25-34; 35-44; 45-54; 55-64; 65 and above. Do you think that consumers of various ages rated the design differently? (Data set: dvdplayer.csv).


In [43]:
dvdplayer = pd.read_csv('dvdplayer.csv')
dvdplayer.head()

Unnamed: 0,agegroup,dvdscore
0,65 and over,38.454803
1,55-64,17.669677
2,65 and over,31.704307
3,65 and over,25.92446
4,Under 25,30.450007


To test whether age group influences dvdscore, we conduct a one-way ANOVA test:
    
    H0: mean of agegroup (Under 25) == mean of agegroup (25-34) == mean of agegroup (35-44)
          == mean of agegroup (45-54) == mean of agegroup (55-64) == mean of agegroup (65 and over)
            
         (Age group has no influence on dvdscore, i.e. consumers of various ages did not rate the design differently.)
        
    H1: at least one age group mean differs from the others
            
         (Age group has an influence on dvdscore, i.e. consumers of various ages rated the design differently.)
        
    This is a one-tailed test: ANOVA is based on the F-test, and only large values of F (in the right tail of the asymmetrical F distribution) count as evidence against H0.
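
The F-score compares the variation between group means with the variation within groups. A small sketch of that computation on hypothetical scores (not taken from `dvdplayer.csv`), checked against `stats.f_oneway`:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical scores for three groups (not taken from dvdplayer.csv).
groups = [
    np.array([30.0, 32.0, 35.0, 31.0]),
    np.array([25.0, 27.0, 24.0, 26.0]),
    np.array([38.0, 36.0, 40.0, 37.0]),
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k = len(groups)          # number of groups
n = all_scores.size      # total number of observations

# Between-group and within-group sums of squares.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within.
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

f_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy.statistic)
```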

In [49]:
a1 = dvdplayer.dvdscore.loc[dvdplayer.agegroup == 'Under 25']
a2 = dvdplayer.dvdscore.loc[dvdplayer.agegroup == '25-34']
a3 = dvdplayer.dvdscore.loc[dvdplayer.agegroup == '35-44']
a4 = dvdplayer.dvdscore.loc[dvdplayer.agegroup == '45-54']
a5 = dvdplayer.dvdscore.loc[dvdplayer.agegroup ==  '55-64']
a6 = dvdplayer.dvdscore.loc[dvdplayer.agegroup ==  '65 and over']

In [51]:
# Perform ANOVA

anova_dvd = stats.f_oneway(a1, a2, a3, a4, a5, a6)
anova_dvd

F_onewayResult(statistic=6.992526962676518, pvalue=3.087324905679639e-05)

In [55]:
print("The F-score is {} and p-value is {}.".format(anova_dvd.statistic, anova_dvd.pvalue))

The F-score is 6.992526962676518 and p-value is 3.087324905679639e-05.


In [56]:
# checking if pvalue is more or less than 5% alpha:

anova_dvd.pvalue > 0.05

False

#### Conclusion:

The p-value is below the 5% alpha, so we reject the null hypothesis: at least one age group's mean dvdscore differs from the others. Hence, consumers of various ages rated the design differently. ANOVA alone does not identify which age groups differ.

-----------------

### Ques5. 

A survey was conducted among 2800 customers on several demographic characteristics. Working status, sex, age, age-group, race, happiness, no. of child, marital status, educational qualifications, income group etc. had been captured for that purpose. (Data set: sample_survey.csv).


In [2]:
sample_survey = pd.read_csv('sample_survey.csv')
sample_survey.head()

Unnamed: 0,id,wrkstat,marital,childs,age,educ,paeduc,maeduc,speduc,degree,...,agecat,childcat,news1,news2,news3,news4,news5,car1,car2,car3
0,1,Working full time,Divorced,2.0,60.0,12.0,12.0,12.0,,High school,...,55 to 64,1-2,No,No,No,No,No,American,Japanese,Japanese
1,2,Working part-time,Never married,0.0,27.0,17.0,20.0,,,Junior college,...,25 to 34,,No,No,Yes,No,No,American,German,Japanese
2,3,Working full time,Married,2.0,36.0,12.0,12.0,12.0,16.0,High school,...,35 to 44,1-2,No,No,No,Yes,Yes,American,American,
3,4,Working full time,Never married,0.0,21.0,13.0,,12.0,,High school,...,Less than 25,,No,No,No,Yes,Yes,American,Other,
4,5,Working full time,Never married,0.0,35.0,16.0,,12.0,,Bachelor,...,35 to 44,,No,No,No,No,No,American,American,Korean


### a. Is there any relationship in between labour force status with marital status?


To test the relationship between labour force status and marital status, conduct a chi-square test of independence, as both are categorical variables.

    H0: Observed values == Expected values
           (There is no relationship b/w labour force status and marital status)
           
    H1: Observed values <> Expected values
            (There is a relationship b/w labour force status and marital status)
            
    This is a one-tailed (right-tailed) test: only large values of the chi-square statistic count as evidence against H0.
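
The expected values in these hypotheses are the counts that independence would produce: row total times column total divided by the grand total. A sketch on a hypothetical 2 x 3 table (not from `sample_survey.csv`), checked against `stats.chi2_contingency`:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical 2 x 3 table of observed counts (not from sample_survey.csv).
observed = np.array([
    [30, 50, 20],
    [20, 30, 50],
])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total.
expected = row_totals * col_totals / grand_total

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

chi2, p_value, dof_scipy, expected_scipy = stats.chi2_contingency(observed)
print(chi2_manual, chi2, dof, dof_scipy)
```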

In [105]:
# Step1: Create the CrossTab of the observed values:

labour_marital = pd.crosstab(index=sample_survey.wrkstat, columns=sample_survey.marital)
labour_marital

wrkstat \ marital,Divorced,Married,Never married,Separated,Widowed
Keeping house,25,200,35,13,55
Other,12,16,14,4,8
Retired,53,168,17,6,150
School,7,9,60,2,1
Temporarily not working,9,23,11,1,2
"Unemployed, laid off",10,13,32,0,3
Working full time,295,778,392,58,44
Working part-time,35,138,102,9,20


In [106]:
# Step2 : Perform Chi-Square Test:

chisqtest_1 = stats.chi2_contingency(observed = labour_marital)

print('The Chi-Square Test is {} and p-value is {}.'.format(chisqtest_1[0], chisqtest_1[1]))

The Chi-Square Test is 729.2421426572284 and p-value is 1.4875268409067568e-135.


In [107]:
# checking if p-value is greater or less than 5% alpha:

chisqtest_1[1] > 0.05

False

#### Conclusion:

The p-value is below the 5% alpha, so we reject the null hypothesis and conclude that there is a statistically significant relationship between labour force status and marital status.

-----------------

### b. Do you think educational qualification is somehow controlling the marital status?


To test the relationship between educational qualification (degree) and marital status, conduct a chi-square test of independence, as both are categorical variables.

    H0: Observed values == Expected values
           (There is no relationship b/w educational qualification (degree) and marital status)
           
    H1: Observed values <> Expected values
            (There is a relationship b/w educational qualification (degree) and marital status)
            
    This is a one-tailed (right-tailed) test: only large values of the chi-square statistic count as evidence against H0.

In [108]:
# Step1: Create CrossTab of the observed values:

edu_marital = pd.crosstab(index=sample_survey.degree, columns= sample_survey.marital)
edu_marital

degree \ marital,Divorced,Married,Never married,Separated,Widowed
Bachelor,58,251,129,12,28
Graduate,29,123,41,3,9
High school,241,686,367,58,148
Junior college,45,108,46,3,6
LT High school,70,174,77,17,92


In [114]:
# Step2 : Perform Chi-Square test:

chisqtest_2 = stats.chi2_contingency(observed = edu_marital)

print('The Chi Sq Score is {} and p-value is {}.'.format(chisqtest_2[0], chisqtest_2[1]))

The Chi Sq Score is 122.68449020508541 and p-value is 1.6707923432360119e-18.


In [115]:
# checking if pvalue is more or less than 5% alpha

chisqtest_2[1] > 0.05

False

#### Conclusion:

The p-value is below the 5% alpha, so we reject the null hypothesis and conclude that there is a statistically significant relationship between educational qualification and marital status. The test shows an association only; it does not establish that educational qualification controls marital status.

-----------------

### c. Is happiness driven by earnings or marital status?

To test whether happiness is related to earnings or to marital status, we conduct a chi-square test for each pair, as all three variables are categorical.

#### Performing Chi Square test between happiness and earnings

To test the relationship between happiness and earnings, conduct a chi-square test.

    H0: Observed values == Expected values
           (There is no relationship b/w happy and earnings)
           
    H1: Observed values <> Expected values
            (There is a relationship b/w happy and earnings)
            
    This is a one-tailed (right-tailed) test: only large values of the chi-square statistic count as evidence against H0.

In [10]:
# Step 1: Create Crosstab b/w happy and earnings

happy_earnings = pd.crosstab(index = sample_survey['happy'], columns = sample_survey['income'])
happy_earnings

happy \ income,$1000 TO 2999,$10000 - 14999,$15000 - 19999,$20000 - 24999,$25000 or more,$3000 TO 3999,$4000 TO 4999,$5000 TO 5999,$6000 TO 6999,$7000 TO 7999,$8000 TO 9999,LT $1000
Not too happy,7,39,33,40,113,9,9,6,14,12,9,11
Pretty happy,20,107,119,155,888,11,13,18,13,21,30,13
Very happy,5,44,26,50,571,4,10,11,6,14,19,11


In [13]:
# Step 2 : Perform Chi-Square Test:

chisqtest_3 = stats.chi2_contingency(observed=happy_earnings)
chisqtest_3

print("The Chi Sq Score is {} and p-value is {}.".format(chisqtest_3[0], chisqtest_3[1]))

The Chi Sq Score is 178.9505306121643 and p-value is 1.4107677273473057e-26.


In [15]:
# Check if p-value is more or less than 5% alpha

chisqtest_3[1] > 0.05

False

#### Conclusion:

The p-value is below the 5% alpha, so we reject the null hypothesis and conclude that there is a statistically significant relationship between happiness and earnings. Hence, happiness is associated with earnings; the test by itself does not show that earnings cause happiness.

#### Performing Chi Square test between happiness and marital status

To test the relationship between happiness and marital status, conduct a chi-square test.

    H0: Observed values == Expected values
           (There is no relationship b/w happy and marital status)
           
    H1: Observed values <> Expected values
            (There is a relationship b/w happy and marital status)
            
    This is a one-tailed (right-tailed) test: only large values of the chi-square statistic count as evidence against H0.

In [18]:
# Step 1: Creating CrossTab b/w Happiness and Marital Status

happy_marital = pd.crosstab(index=sample_survey.happy, columns= sample_survey.marital)
happy_marital

happy \ marital,Divorced,Married,Never married,Separated,Widowed
Not too happy,72,71,108,30,59
Pretty happy,278,684,426,49,137
Very happy,93,582,120,13,83


In [21]:
# Step 2: Perform Chi - Sq Test:

chisqtest_4 = stats.chi2_contingency(observed = happy_marital)

print("The Chi - Sq Score is {} and p-value is {}.".format(chisqtest_4[0], chisqtest_4[1]))

The Chi - Sq Score is 260.6894389418282 and p-value is 9.3147261197964e-52.


In [22]:
# Checking if p-value is less than or more than 5% alpha

chisqtest_4[1] > 0.05

False

#### Conclusion:

The p-value is below the 5% alpha, so we reject the null hypothesis and conclude that there is a statistically significant relationship between happiness and marital status. Hence, happiness is associated with marital status as well as with earnings; neither test by itself shows which factor drives happiness.
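
Both tests reject H0, so the p-values alone do not say which association is stronger. One common effect-size measure for contingency tables is Cramér's V; the sketch below applies it to the counts printed in the two crosstabs above. This is an illustrative addition rather than part of the original analysis, and `cramers_v` is a hypothetical helper, not a SciPy function:

```python
import numpy as np
import scipy.stats as stats


def cramers_v(table):
    """Cramér's V for a contingency table (0 = no association, 1 = perfect)."""
    table = np.asarray(table)
    chi2 = stats.chi2_contingency(table)[0]
    n = table.sum()
    smaller_dim = min(table.shape) - 1
    return np.sqrt(chi2 / (n * smaller_dim))


# Observed counts copied from the two crosstabs printed above.
happy_earnings = [
    [7, 39, 33, 40, 113, 9, 9, 6, 14, 12, 9, 11],
    [20, 107, 119, 155, 888, 11, 13, 18, 13, 21, 30, 13],
    [5, 44, 26, 50, 571, 4, 10, 11, 6, 14, 19, 11],
]
happy_marital = [
    [72, 71, 108, 30, 59],
    [278, 684, 426, 49, 137],
    [93, 582, 120, 13, 83],
]

v_earnings = cramers_v(happy_earnings)
v_marital = cramers_v(happy_marital)
print(v_earnings, v_marital)
```

With these counts the value for marital status comes out somewhat larger than the one for income, which suggests the association with marital status is the stronger of the two.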

-----------------