This test is invalid when the observed or expected frequencies in each category are too small.
A typical rule is that all of the observed and expected frequencies should be at least 5.
According to [3], the total number of samples is recommended to be greater than 13, otherwise exact tests (such as Barnard’s Exact test) should be used because they do not overreject.
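
As a rough illustration of the frequency rule above, the sketch below inspects the expected counts that `scipy.stats.chi2_contingency` returns and reports the share of cells below 5. The table is invented for the example and is not taken from the data used in this notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical 3x3 contingency table; the last row is deliberately sparse.
table = np.array([
    [30, 25, 20],
    [28, 22, 26],
    [2, 1, 3],
])

chi2, p, dof, expected = stats.chi2_contingency(table)

# Share of cells whose expected count falls below the usual threshold of 5.
low_share = (expected < 5).mean()
print(f"dof = {dof}, share of cells with expected count < 5: {low_share:.0%}")
```

If a large share of cells falls below the threshold, the p-value should be read with caution, or sparse categories can be merged before testing.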

Also, the sum of the observed and expected frequencies must be the same for the test to be valid; chisquare raises an error if the sums do not agree within a relative tolerance of 1e-8.
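
One simple way to satisfy the equal-sums requirement is to specify the null hypothesis as proportions and scale them by the observed total. The counts and proportions below are assumptions made up for this sketch.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for k = 4 categories.
f_obs = np.array([18, 22, 30, 30])

# Hypothetical null proportions, rescaled so that f_exp sums to the observed total.
proportions = np.array([0.25, 0.25, 0.25, 0.25])
f_exp = proportions * f_obs.sum()

result = stats.chisquare(f_obs, f_exp=f_exp)
print(result.statistic, result.pvalue)
```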

The default degrees of freedom, k-1, are for the case when no parameters of the distribution are estimated. If p parameters are estimated by efficient maximum likelihood then the correct degrees of freedom are k-1-p.
If the parameters are estimated in a different way, then the dof can be between k-1-p and k-1.
However, it is also possible that the asymptotic distribution is not chi-square, in which case this test is not appropriate.
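
The degrees-of-freedom adjustment described above corresponds to the `ddof` argument of `scipy.stats.chisquare`. The sketch below assumes, purely for illustration, that one parameter (p = 1) was estimated from the data. The counts are hypothetical.

```python
import numpy as np
from scipy import stats

f_obs = np.array([18, 22, 30, 30])    # hypothetical counts, k = 4
f_exp = np.full(4, f_obs.sum() / 4)   # hypothetical expected counts

default = stats.chisquare(f_obs, f_exp=f_exp)            # dof = k - 1 = 3
adjusted = stats.chisquare(f_obs, f_exp=f_exp, ddof=1)   # dof = k - 1 - p = 2

# The statistic is identical; only the reference distribution changes.
print(default.statistic, default.pvalue)
print(adjusted.statistic, adjusted.pvalue)
```

When the estimation method leaves the degrees of freedom somewhere between k-1-p and k-1, the two p-values can be read as a bracket around the true one.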

In [145]:
import bundle.baselines.st3 as st3
import bundle.baselines.st2 as st2
from models.helpers import *
import pandas as pd
from sklearn.metrics import classification_report as report


In [146]:
languages_with_train = ["en", "fr", "ge", "it", "po", "ru"]

train = pd.DataFrame()
test = pd.DataFrame()
train_outer = pd.DataFrame()

for lang in languages_with_train:
    paths_st3 = get_paths(lang)
    paths_st2 = get_paths(lang, subtask=2)

    st3_train = st3.make_dataframe(paths_st3["train_folder"], paths_st3["train_labels"])
    st3_test = st3.make_dataframe(paths_st3["dev_folder"], paths_st3["dev_labels"])

    st2_train = st2.make_dataframe(paths_st2["train_folder"], paths_st2["train_labels"])
    st2_test = st2.make_dataframe(paths_st2["dev_folder"], paths_st2["dev_labels"])

    st3_train = st3_train.reset_index()
    st3_test = st3_test.reset_index()
    st2_train = st2_train.reset_index()
    st2_test = st2_test.reset_index()

    st3_train['id'] = st3_train['id'].astype(int)
    st2_train['id'] = st2_train['id'].astype(int)
    st3_test['id'] = st3_test['id'].astype(int)
    st2_test['id'] = st2_test['id'].astype(int)

    train_temp = st3_train.merge(st2_train, how='inner', on='id')
    train_temp_outer = st3_train.merge(st2_train, how='outer', on='id')

    test_temp = st3_test.merge(st2_test, how='inner', on='id')

    train = pd.concat([train, train_temp])
    train_outer = pd.concat([train_outer, train_temp_outer])

    test = pd.concat([test, test_temp])


446it [00:00, 5568.17it/s]
90it [00:00, 6236.16it/s]
433it [00:00, 7017.71it/s]
83it [00:00, 7038.70it/s]
158it [00:00, 5441.33it/s]
53it [00:00, 3640.41it/s]
158it [00:00, 6445.24it/s]
53it [00:00, 5302.15it/s]
132it [00:00, 6280.61it/s]
45it [00:00, 5993.77it/s]
132it [00:00, 7230.14it/s]
45it [00:00, 6528.44it/s]
227it [00:00, 6263.29it/s]
76it [00:00, 7172.49it/s]
227it [00:00, 7535.53it/s]
76it [00:00, 5900.58it/s]
145it [00:00, 5803.24it/s]
49it [00:00, 4874.32it/s]
145it [00:00, 6718.52it/s]
49it [00:00, 5949.54it/s]
143it [00:00, 5935.36it/s]
48it [00:00, 5315.69it/s]
143it [00:00, 5574.78it/s]
48it [00:00, 7131.40it/s]


In [147]:
train = train.drop(columns=['id', "line", 'text_x', 'text_y'])
test = test.drop(columns=['id', "line", 'text_x', 'text_y'])

In [148]:
train

Unnamed: 0,labels,frames
0,Doubt,"Health_and_safety,Quality_of_life"
1,Appeal_to_Authority,"Health_and_safety,Quality_of_life"
2,Repetition,"Health_and_safety,Quality_of_life"
3,Appeal_to_Fear-Prejudice,"Health_and_safety,Quality_of_life"
4,Appeal_to_Fear-Prejudice,"Health_and_safety,Quality_of_life"
...,...,...
1240,Loaded_Language,"Economic,Political,Security_and_defense"
1241,"Doubt,Loaded_Language","Economic,Political,Security_and_defense"
1242,"Doubt,Loaded_Language","Economic,Political,Security_and_defense"
1243,Doubt,"Economic,Political,Security_and_defense"


In [149]:
test

Unnamed: 0,labels,frames
0,"False_Dilemma-No_Choice,Loaded_Language","Political,External_regulation_and_reputation,P..."
1,"False_Dilemma-No_Choice,Loaded_Language,Name_C...","Political,External_regulation_and_reputation,P..."
2,Conversation_Killer,"Political,External_regulation_and_reputation,P..."
3,"Conversation_Killer,Red_Herring","Political,External_regulation_and_reputation,P..."
4,Obfuscation-Vagueness-Confusion,"Political,External_regulation_and_reputation,P..."
...,...,...
305,Doubt,"Quality_of_life,Security_and_defense"
306,Doubt,"Quality_of_life,Security_and_defense"
307,Doubt,"Quality_of_life,Security_and_defense"
308,Loaded_Language,"Quality_of_life,Security_and_defense"


In [150]:
# let's figure out how many rows we lost in the inner merge compared to the outer merge
print(f"we have lost {train_outer.shape[0] - train.shape[0]} rows")

# some ids are present in only one of the two dataframes, which explains the lost rows
print(f"we have {train.shape[0]} rows in the merged dataframe")

# the loss is small relative to the dataset, so enough data remains for the chi-squared test

we have lost 198 rows
we have 10769 rows in the merged dataframe
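
To see which side the unmatched rows come from, one option is the `indicator=True` argument of `DataFrame.merge`. The frames below are small hypothetical stand-ins for the two subtask dataframes, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for the subtask 3 (labels) and subtask 2 (frames) dataframes.
left = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "labels": ["Doubt", "Slogans", "Doubt", "Repetition"],
})
right = pd.DataFrame({
    "id": [1, 2, 4],
    "frames": ["Political", "Economic", "Morality"],
})

outer = left.merge(right, how="outer", on="id", indicator=True)

# "_merge" records where each row came from: both, left_only, or right_only.
counts = outer["_merge"].value_counts()
print(counts)

# Rows not marked "both" are exactly the rows an inner merge would drop.
lost = int((outer["_merge"] != "both").sum())
print(f"rows missing from the inner merge: {lost}")
```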


# Chi squared test

### Train data

In [151]:
import scipy.stats as stats

# let's perform the chi squared test

chi2, p, dof, ex = stats.chi2_contingency(pd.crosstab(train['labels'], train['frames']))

In [152]:
H0 = "There is no significant relationship between the labels and the frames"
H1 = "There is a significant relationship between the labels and the frames"

if p < 0.05:
    print(H1)
else:
    print(H0)

print(f"The values of the chi squared test are: chi2 = {chi2}, p = {p}, dof = {dof}")

There is a significant relationship between the labels and the frames
The values of the chi squared test are: chi2 = 1290420.859997719, p = 0.0, dof = 1179821


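Above, each distinct combination of labels was treated as a single category. With 1,179,821 degrees of freedom and only 10,769 rows, nearly all expected cell counts are far below 5, so that result is unreliable.
Let's see what happens when we explode the comma-separated labels into separate rows.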

Example:

Exaggeration-Minimisation,Slogans | Morality,Fairness_and_equality

turns into:

Exaggeration-Minimisation | Morality  
Exaggeration-Minimisation | Fairness_and_equality  
Slogans                   | Morality  
Slogans                   | Fairness_and_equality  
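
The same transformation can be checked on that single row in isolation. This is a minimal sketch using a one-row dataframe built for the illustration. Note that a row with m labels and n frames contributes m × n rows, so the exploded rows are not independent observations.

```python
import pandas as pd

# One-row dataframe reproducing the example above.
row = pd.DataFrame({
    "labels": ["Exaggeration-Minimisation,Slogans"],
    "frames": ["Morality,Fairness_and_equality"],
})

exploded = (
    row.assign(
        labels=row["labels"].str.split(","),
        frames=row["frames"].str.split(","),
    )
    .explode("labels")   # one row per label
    .explode("frames")   # one row per (label, frame) pair
)

pairs = list(exploded.itertuples(index=False, name=None))
print(pairs)
```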

In [153]:
train_exploded = train.copy()

train_exploded['labels'] = train_exploded['labels'].str.split(',')
train_exploded["frames"] = train_exploded["frames"].str.split(',')

train_exploded = train_exploded.explode('labels')
train_exploded = train_exploded.explode('frames')

train_exploded

Unnamed: 0,labels,frames
0,Doubt,Health_and_safety
0,Doubt,Quality_of_life
1,Appeal_to_Authority,Health_and_safety
1,Appeal_to_Authority,Quality_of_life
2,Repetition,Health_and_safety
...,...,...
1244,Consequential_Oversimplification,Political
1244,Consequential_Oversimplification,Security_and_defense
1244,Flag_Waving,Economic
1244,Flag_Waving,Political


Let's perform the chi squared test on the exploded dataset

In [154]:
chi2, p, dof, ex = stats.chi2_contingency(pd.crosstab(train_exploded['labels'], train_exploded['frames']))

H0 = "There is no significant relationship between the labels and the frames"
H1 = "There is a significant relationship between the labels and the frames"

if p < 0.05:
    print(H1)
else:
    print(H0)

print(f"The values of the chi squared test are: chi2 = {chi2}, p = {p}, dof = {dof}")

There is a significant relationship between the labels and the frames
The values of the chi squared test are: chi2 = 2131.474432096029, p = 5.175338807045749e-279, dof = 286


### Test data

In [155]:
test_exploded = test.copy()

test_exploded['labels'] = test_exploded['labels'].str.split(',')
test_exploded["frames"] = test_exploded["frames"].str.split(',')

test_exploded = test_exploded.explode('labels')
test_exploded = test_exploded.explode('frames')

test_exploded

Unnamed: 0,labels,frames
0,False_Dilemma-No_Choice,Political
0,False_Dilemma-No_Choice,External_regulation_and_reputation
0,False_Dilemma-No_Choice,Policy_prescription_and_evaluation
0,False_Dilemma-No_Choice,Legality_Constitutionality_and_jurisprudence
0,False_Dilemma-No_Choice,Economic
...,...,...
308,Loaded_Language,Security_and_defense
309,Doubt,Quality_of_life
309,Doubt,Security_and_defense
309,Loaded_Language,Quality_of_life


In [156]:
chi2, p, dof, ex = stats.chi2_contingency(pd.crosstab(test_exploded['labels'], test_exploded['frames']))

H0 = "There is no significant relationship between the labels and the frames"
H1 = "There is a significant relationship between the labels and the frames"

if p < 0.05:
    print(H1)
else:
    print(H0)

print(f"The values of the chi squared test are: chi2 = {chi2}, p = {p}, dof = {dof}")

There is a significant relationship between the labels and the frames
The values of the chi squared test are: chi2 = 942.5922277629127, p = 4.476645064059252e-71, dof = 286


The chi-squared tests on the exploded train and test data both indicate a statistically significant association between the two sets of labels (p < 0.05), which makes them a good candidate for transfer learning!