# Hypothesis Test

Null hypothesis H0: the sentiments (Joy, Sadness, Surprise, Anger, Love, Fear) are INDEPENDENT of continent.

Alternative hypothesis H1: the sentiments (Joy, Sadness, Surprise, Anger, Love, Fear) are DEPENDENT on continent.

The chi-square statistic measures the difference between the observed and expected frequencies. The p-value is the probability of obtaining a chi-square statistic at least as large as the one observed, assuming that the null hypothesis is true. The degrees of freedom measure the number of independent ways in which the observed frequencies can vary; for a table with r rows and c columns they equal (r - 1)(c - 1). The expected frequencies are the frequencies that would be expected if the variables were independent.
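The quantities described above can be computed by hand and compared with SciPy. The following is a minimal sketch, assuming the four-continent table used in this notebook; the variable names (`observed`, `expected`, `chi2_stat`) are illustrative and not part of the original analysis.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are America, Europe, Asia, Australia;
# columns are Joy, Sadness, Surprise, Anger, Love, Fear.
observed = np.array([
    [1256, 238, 45, 120, 86, 63],
    [490, 89, 18, 52, 31, 24],
    [74, 11, 1, 2, 6, 2],
    [36, 6, 1, 4, 1, 1],
])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (4, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 6)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total.
expected = row_totals @ col_totals / grand_total

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Reference values from SciPy for comparison.
ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(observed)
print(chi2_stat, ref_stat, dof, ref_dof)
```

The manual values should agree with SciPy here because `chi2_contingency` only applies its continuity correction when there is a single degree of freedom.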

To interpret the result, compare the p-value to a significance level (usually 0.05). If the p-value is less than the significance level, you can reject the null hypothesis and conclude that there is a significant association between the variables. If the p-value is greater than the significance level, you cannot reject the null hypothesis: the data provide no significant evidence of an association between the variables. This is not the same as proving that the variables are independent.
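The decision rule described above can be sketched as a small helper. The function name `independence_decision` and its `alpha` parameter are hypothetical and only serve as an illustration; the table is the four-continent table from this notebook.

```python
from scipy.stats import chi2_contingency


def independence_decision(table, alpha=0.05):
    """Run a chi-square test of independence and apply the p-value rule.

    Hypothetical helper: returns (statistic, p_value, dof, reject), where
    `reject` is True when the null hypothesis of independence is rejected.
    """
    stat, p_value, dof, _expected = chi2_contingency(table)
    reject = bool(p_value < alpha)
    return stat, p_value, dof, reject


table = [
    [1256, 238, 45, 120, 86, 63],  # America
    [490, 89, 18, 52, 31, 24],     # Europe
    [74, 11, 1, 2, 6, 2],          # Asia
    [36, 6, 1, 4, 1, 1],           # Australia
]

stat, p_value, dof, reject = independence_decision(table)
if reject:
    print(f"p = {p_value:.3f} < 0.05: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= 0.05: cannot reject the null hypothesis")
```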


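| Continent | Joy | Sadness | Surprise | Anger | Love | Fear |
|---|---|---|---|---|---|---|
| America | 1256 | 238 | 45 | 120 | 86 | 63 |
| Europe | 490 | 89 | 18 | 52 | 31 | 24 |
| Asia | 74 | 11 | 1 | 2 | 6 | 2 |
| Australia | 36 | 6 | 1 | 4 | 1 | 1 |
| Africa | 1 | 0 | 0 | 0 | 0 | 0 |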

In [1]:
from scipy.stats import chi2_contingency

# Observed counts per continent; columns are Joy, Sadness, Surprise, Anger, Love, Fear.
# Africa (a single observation) is not included in the test.
tab_data = [
    [1256, 238, 45, 120, 86, 63],  # America
    [490, 89, 18, 52, 31, 24],     # Europe
    [74, 11, 1, 2, 6, 2],          # Asia
    [36, 6, 1, 4, 1, 1],           # Australia
]

chi2_contingency(tab_data)

Chi2ContingencyResult(statistic=7.897248614086874, pvalue=0.9278359731177406, dof=15, expected_freq=array([[1.26294618e+03, 2.34080542e+02, 4.42303350e+01, 1.21123071e+02,
        8.43778698e+01, 6.12420023e+01],
       [4.91766654e+02, 9.11464057e+01, 1.72224313e+01, 4.71629658e+01,
        3.28550997e+01, 2.38464434e+01],
       [6.70590892e+01, 1.24290553e+01, 2.34851336e+00, 6.43131351e+00,
        4.48024087e+00, 3.25178773e+00],
       [3.42280768e+01, 6.34399699e+00, 1.19872036e+00, 3.28264960e+00,
        2.28678961e+00, 1.65976665e+00]]))

# Critical value table 

![chisquare.jpg](attachment:chisquare.jpg)
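The entries of a critical value table can also be reproduced numerically. This is a minimal sketch, assuming the critical value is taken as the upper-tail quantile of the chi-square distribution via `scipy.stats.chi2.ppf`; the chosen significance levels and the dictionary layout are illustrative. The degrees of freedom 5 and 15 are the ones that occur in this notebook.

```python
from scipy.stats import chi2

alphas = [0.10, 0.05, 0.01]  # significance levels (illustrative choice)
dofs = [5, 15]               # 2x6 pairwise tables and the 4x6 table

# critical_values[(dof, alpha)] is the value that a chi-square statistic
# must exceed for the test to be significant at level alpha.
critical_values = {
    (d, a): chi2.ppf(1 - a, d) for d in dofs for a in alphas
}

for d in dofs:
    row = "  ".join(f"alpha={a}: {critical_values[(d, a)]:.3f}" for a in alphas)
    print(f"dof={d:>2}  {row}")
```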

# America and Europe

In [2]:
tab_data = [[1231, 257, 49, 121, 97, 53],[488, 89, 20, 51, 38, 18]]
chi2_contingency(tab_data)

Chi2ContingencyResult(statistic=1.5315393641346333, pvalue=0.9094017334607224, dof=5, expected_freq=array([[1237.24203822,  249.03184713,   49.66242038,  123.79617834,
          97.1656051 ,   51.10191083],
       [ 481.75796178,   96.96815287,   19.33757962,   48.20382166,
          37.8343949 ,   19.89808917]]))

1.5315393641346333 is the chi-squared statistic, which measures how far the observed frequencies are from the frequencies expected under independence.

0.9094017334607224 is the p-value: the probability of obtaining a chi-squared statistic at least as large as the one observed, assuming that the null hypothesis is true.

A p-value of 0.909 means that, if continent and sentiment were independent, about 91% of samples would produce a chi-squared statistic at least as large as 1.53. Because the p-value is greater than 0.05, we CANNOT reject the null hypothesis of independence: the data show no significant association between continent (America vs Europe) and sentiment.

5 is the degrees of freedom, the number of independent ways in which the observed frequencies can vary. For a table with 2 rows and 6 columns this is (2 - 1) × (6 - 1) = 5.

The critical value of the chi-squared distribution with df = 5 at a significance level of 0.05 is 11.07. The observed chi-squared statistic of 1.5315393641346333 does not exceed this critical value, so the observed frequencies are not significantly different from the expected frequencies. This again means that the null hypothesis of independence cannot be rejected.
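The p-value rule and the critical value rule always lead to the same decision. The sketch below checks this for the America and Europe result, assuming the statistic and degrees of freedom reported in the output above; the variable names are illustrative.

```python
from scipy.stats import chi2

alpha = 0.05
dof = 5
statistic = 1.5315393641346333  # America vs Europe, taken from the output above

# Critical value: upper-tail quantile of the chi-square distribution.
critical_value = chi2.ppf(1 - alpha, dof)

# p-value: upper-tail probability of the observed statistic.
p_value = chi2.sf(statistic, dof)

reject_by_critical_value = statistic > critical_value
reject_by_p_value = p_value < alpha

print(f"critical value = {critical_value:.2f}, p-value = {p_value:.4f}")
print(f"reject H0: {reject_by_p_value}")
```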

# America and Asia

In [3]:
tab_data = [[1231, 257, 49, 121, 97, 53],[73, 13, 2, 1, 5, 3]]
chi2_contingency(tab_data)

Chi2ContingencyResult(statistic=5.502536364949115, pvalue=0.357667839383391, dof=5, expected_freq=array([[1237.60209974,  256.2519685 ,   48.40314961,  115.78792651,
          96.80629921,   53.14855643],
       [  66.39790026,   13.7480315 ,    2.59685039,    6.21207349,
           5.19370079,    2.85144357]]))

# America and Australia

In [4]:
tab_data = [[1231, 257, 49, 121, 97, 53],[36, 4, 1, 4, 3, 1]]
chi2_contingency(tab_data)

Chi2ContingencyResult(statistic=1.8592283981462594, pvalue=0.8682603886341339, dof=5, expected_freq=array([[1233.56812062,  254.11308562,   48.68066774,  121.70166936,
          97.36133549,   52.57512116],
       [  33.43187938,    6.88691438,    1.31933226,    3.29833064,
           2.63866451,    1.42487884]]))

# Europe and Asia

In [5]:
tab_data = [[488, 89, 20, 51, 38, 18],[73, 13, 2, 1, 5, 3]]
chi2_contingency(tab_data)

Chi2ContingencyResult(statistic=5.82916998028012, pvalue=0.3231979538617116, dof=5, expected_freq=array([[493.06367041,  89.64794007,  19.33583021,  45.70287141,
         37.79275905,  18.45692884],
       [ 67.93632959,  12.35205993,   2.66416979,   6.29712859,
          5.20724095,   2.54307116]]))

# Europe and Australia

In [6]:
tab_data = [[488, 89, 20, 51, 38, 18],[36, 4, 1, 4, 3, 1]]
chi2_contingency(tab_data)

Chi2ContingencyResult(statistic=1.108151975707207, pvalue=0.9533795694591899, dof=5, expected_freq=array([[489.90172643,  86.94820717,  19.63346614,  51.42098274,
         38.33200531,  17.76361222],
       [ 34.09827357,   6.05179283,   1.36653386,   3.57901726,
          2.66799469,   1.23638778]]))

# Asia and Australia

In [7]:
tab_data = [[73, 13, 2, 1, 5, 3],[36, 4, 1, 4, 3, 1]]
chi2_contingency(tab_data)

Chi2ContingencyResult(statistic=5.804215313288845, pvalue=0.3257382021400811, dof=5, expected_freq=array([[72.41780822, 11.29452055,  1.99315068,  3.32191781,  5.31506849,
         2.65753425],
       [36.58219178,  5.70547945,  1.00684932,  1.67808219,  2.68493151,
         1.34246575]]))
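The six pairwise comparisons above repeat the same call with different rows. As a minimal sketch, they could be run in one loop; the `counts` dictionary and the `results` structure are illustrative, and the counts are the ones used in the pairwise cells above.

```python
from itertools import combinations

from scipy.stats import chi2_contingency

# Counts per continent as used in the pairwise cells above;
# columns are Joy, Sadness, Surprise, Anger, Love, Fear.
counts = {
    "America": [1231, 257, 49, 121, 97, 53],
    "Europe": [488, 89, 20, 51, 38, 18],
    "Asia": [73, 13, 2, 1, 5, 3],
    "Australia": [36, 4, 1, 4, 3, 1],
}

alpha = 0.05
results = {}
for a, b in combinations(counts, 2):
    # Each pair forms a 2x6 contingency table.
    stat, p_value, dof, _expected = chi2_contingency([counts[a], counts[b]])
    results[(a, b)] = (stat, p_value, dof)
    verdict = "reject H0" if p_value < alpha else "cannot reject H0"
    print(f"{a} vs {b}: chi2 = {stat:.3f}, p = {p_value:.3f}, dof = {dof} ({verdict})")
```

With the outputs shown above, every pair has a p-value greater than 0.05, so none of the pairwise tests rejects the null hypothesis of independence.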