A second take on the analysis of the utility questions, performed to check whether the results are consistent with the previous analysis.

In [34]:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import fisher_exact, chi2_contingency

In [35]:
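# Forward slashes work on Windows, macOS and Linux alike.
anonymized_survey_data = pd.read_excel('data/anonymized_data.xlsx')
# read_excel already returns a DataFrame, so no extra conversion is needed.
election_results = pd.read_excel('data/public_data_resultsT.xlsx')
election_df = election_results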

(A) Is there a significant difference between the political preferences as expressed in the survey and the election results for both electronic and polling station votes?

In [40]:
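# Number of survey respondents per party.
party_counts = anonymized_survey_data['party'].value_counts()

# 2x2 contingency table: survey counts (first row) vs. election totals (second row).
# Row 5 of the results sheet holds the total votes for each party; the survey's
# 'Party A' / 'Party B' are paired with the sheet's 'Red' / 'Green' columns.
observed = [[party_counts.get('Party A', 0), party_counts.get('Party B', 0)],
            [election_df.loc[5, 'Red'], election_df.loc[5, 'Green']]]
print(f"Observed votes (contingency table):  {observed}")

# Fisher's exact test on the 2x2 table (two-sided by default).
odds_ratio, p = fisher_exact(observed)

if p < 0.05:
    print(f"p-value: {p} - unlikely to be due to chance")
else:
    # A large p-value does not show the difference is due to chance,
    # only that the test found no significant difference.
    print(f"p-value: {p} - no significant difference detected at the 0.05 level")
print(f"Odds ratio: {odds_ratio}")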

Observed votes (contingency table):  [[133, 61], [390, 685]]
p-value: 7.430799425311284e-17 - unlikely to be due to chance
Odds ratio: 3.8295502311895753
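
As a sanity check on the output above, the odds ratio can be recomputed by hand from the printed contingency table. This is a minimal sketch that hard-codes the four counts from the output rather than re-reading the Excel files; the variable names are illustrative. It also prints each source's share for the first party, which is easier to interpret than the odds ratio alone. It keeps the pairing used in the cell above (Party A with Red, Party B with Green).

```python
# Counts copied from the contingency table printed above.
survey = (133, 61)      # 'Party A', 'Party B' in the survey
election = (390, 685)   # 'Red', 'Green' totals in the election results

# Sample odds ratio of a 2x2 table [[a, b], [c, d]] is (a * d) / (b * c).
a, b = survey
c, d = election
odds_ratio_by_hand = (a * d) / (b * c)

# Share of the first party in each source, for a more intuitive comparison.
survey_share = a / (a + b)
election_share = c / (c + d)

print(f"Odds ratio by hand: {odds_ratio_by_hand:.4f}")
print(f"First-party share: survey {survey_share:.1%}, election {election_share:.1%}")
```

The hand-computed value should agree with the odds ratio reported by fisher_exact, which for a 2x2 table returns the sample odds ratio by default.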


(B) Is there a significant difference in the political preferences of voters depending on their demographic attributes recorded in the survey (that is, age, gender, education level…)?

In [47]:
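df = anonymized_survey_data

# Binary demographic variables only: fisher_exact is applied here to 2x2 tables.
categorical_vars = ['sex', 'citizenship', 'marital_status']
for var in categorical_vars:
    # Rows: party, columns: levels of the demographic variable.
    crosstab = pd.crosstab(df['party'], df[var])
    # Drop invalid votes from the crosstab (no error if the row is absent)
    crosstab = crosstab.drop('Invalid vote', axis=0, errors='ignore')

    print(f"contingency table for party and {var}:")
    print(crosstab)

    odds_ratio, p = fisher_exact(crosstab)

    # "correlation" in the message below means association between party and var.
    significance = "SIGNIFICANT" if p < 0.05 else "NOT SIGNIFICANT"
    print(f"p-value: {p} - party and {var} correlation is {significance}")
    print(f"Odds ratio: {odds_ratio}\n")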

contingency table for party and sex:
sex      Female  Male
party                
Party A      57    76
Party B      40    21
p-value: 0.005157802698565871 - party and sex correlation is SIGNIFICANT
Odds ratio: 0.39375

contingency table for party and citizenship:
citizenship  Danish  Not Danish
party                          
Party A         123          10
Party B          52           9
p-value: 0.12536580649097923 - party and citizenship correlation is NOT SIGNIFICANT
Odds ratio: 2.128846153846154

contingency table for party and marital_status:
marital_status  Married  Not Married
party                               
Party A              69           64
Party B              40           21
p-value: 0.08720134467863372 - party and marital_status correlation is NOT SIGNIFICANT
Odds ratio: 0.566015625
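
Because three tests are run against the same 0.05 threshold, it may be worth checking whether the conclusions survive a correction for multiple comparisons. The sketch below applies a Bonferroni correction to the p-values printed above; the correction is an optional extra check rather than part of the original analysis, and the p-values are hard-coded from the output rather than recomputed.

```python
# p-values copied from the three Fisher tests printed above.
p_values = {
    "sex": 0.005157802698565871,
    "citizenship": 0.12536580649097923,
    "marital_status": 0.08720134467863372,
}

alpha = 0.05
# Bonferroni: compare each p-value against alpha divided by the number of tests.
bonferroni_alpha = alpha / len(p_values)

still_significant = {var: p < bonferroni_alpha for var, p in p_values.items()}

for var, flag in still_significant.items():
    print(f"{var}: p={p_values[var]:.4f}, significant after Bonferroni: {flag}")
```

With three tests the corrected threshold is 0.05 / 3, so only sex remains below it, which matches the uncorrected conclusions.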



(C) Is there a significant difference in voters' choice of voting channel (that is, whether they decide to vote online or in person) depending on their demographic attributes recorded in the survey?