# Political Engagement: Univariate Analysis

This notebook contains univariate analysis for the features that we have placed in the 'political engagement' category. The category is divided into labeled subtopics, and the specific survey question for each feature is noted above its graphic output.

The first step will be to import the necessary tools and load the preprocessed dataset.


In [15]:
import joblib

import plotly.express as px

import pandas as pd

data = joblib.load('GroupedAndUngroupedData.pkl')
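Before plotting, it can help to confirm which features have a matching `_Groups` column, since the two plotting helpers in this notebook differ on exactly that point. The following is a minimal sketch on a toy frame; the helper name `grouped_pairs` and the toy values are hypothetical and are not part of the preprocessed dataset.

```python
import pandas as pd

def grouped_pairs(df, suffix="_Groups"):
    """Split feature columns by whether a '<col>_Groups' companion exists."""
    cols = set(df.columns)
    base = [c for c in df.columns if not c.endswith(suffix)]
    with_groups = [c for c in base if f"{c}{suffix}" in cols]
    without_groups = [c for c in base if f"{c}{suffix}" not in cols]
    return with_groups, without_groups

# Toy stand-in for the loaded dataset (values are illustrative only).
toy = pd.DataFrame({
    "politicalInterest": [1, 3, 5],
    "politicalInterest_Groups": ["NoInterest", "LowInterest", "HighInterest"],
    "groupPoliticalIssue": [0, 1, 0],
})

with_groups, without_groups = grouped_pairs(toy)
print(with_groups)     # features that have a grouped companion column
print(without_groups)  # features that do not
```

On the real data, the same call would be `grouped_pairs(data)`, assuming `data` is a pandas DataFrame.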

Next, we create a function that outputs the histogram for each of our features. It plots the original (ungrouped) feature values and shades each bar by the grouped version of the same feature created during data carpentry. This lets us confirm that appropriate groupings were created during the grouping process while also giving a sense of the feature's distribution.
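The visual check described here can also be approximated numerically: if the grouping is consistent, every raw value should map to exactly one group label. The sketch below runs on a toy frame; the helper name `groups_per_value` and the toy values are assumptions made for illustration.

```python
import pandas as pd

def groups_per_value(df, col, suffix="_Groups"):
    """Count how many distinct group labels each raw value is paired with."""
    return df.groupby(col)[f"{col}{suffix}"].nunique()

# Toy data: a 1-5 scale plus a 99 code, paired with group labels.
toy = pd.DataFrame({
    "politicsInfluenceSelf": [1, 2, 3, 4, 5, 99, 4],
    "politicsInfluenceSelf_Groups": ["neg", "neg", "neutral", "pos", "pos", "N/A", "pos"],
})

counts = groups_per_value(toy, "politicsInfluenceSelf")
inconsistent = counts[counts > 1]  # raw values paired with more than one group
print(inconsistent.empty)
```

A non-empty `inconsistent` result would point to raw values that were assigned to more than one group.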

In [16]:
def ShowHistogram(data, col, sortorder='ascending', **kwargs):
    """
    Display a histogram and percentage breakdowns for a column that has a
    matching '<col>_Groups' column in the dataframe.
    Parameters:
        data                        -  dataframe
        col                         -  column name within dataframe
        sortorder                   -  'ascending' or 'descending'
        **kwargs                    -  optional additional arguments to pass into the histogram call. For example, adding
                                       facet_col = 'gender_Groups' will additionally facet the histogram by gender
    """
    ascending = sortorder == 'ascending'
    # Sort so that bars (and facets, if requested) appear in a stable order.
    if 'facet_col' in kwargs:
        data = data.sort_values([col, kwargs['facet_col']], ascending=ascending)
    else:
        data = data.sort_values([col], ascending=ascending)
    group_col = f'{col}_Groups'
    # Assign one palette color per group label, in sorted label order.
    colorDict = {}
    for val in sorted(set(data[group_col])):
        colorDict[val] = px.colors.qualitative.G10[len(colorDict)]
    p = px.histogram(data, x=col, color=data[group_col], histnorm='density',
                     color_discrete_map=colorDict, **kwargs)
    # Keep only the value part of each legend entry; [-1] also works when the name contains no '='.
    p = p.for_each_trace(lambda t: t.update(name=t.name.split('=')[-1]))
    categoryorder = f'category {sortorder}'
    p = p.update_xaxes(type='category', categoryorder=categoryorder)
    if 'facet_col' in kwargs:
        # Shorten facet titles from 'column=value' to 'value'.
        p = p.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    p.show()
    print('Grouped Columns Distribution:\n')
    print(data[group_col].value_counts(normalize=True) * 100)
    print('\n')
    print('Ungrouped Columns Distribution:\n')
    print(data[col].value_counts(normalize=True) * 100)

We also need a second function that creates histograms for features that have no corresponding grouped column.

In [53]:
def ShowHistogramUnivariate(data, col, sortorder='ascending', **kwargs):
    """
    Display a histogram and percentage breakdown for a column that has no
    corresponding '<col>_Groups' column in the dataframe.
    Parameters:
        data                        -  dataframe
        col                         -  column name within dataframe
        sortorder                   -  'ascending' or 'descending'
        **kwargs                    -  optional additional arguments to pass into the histogram call. For example, adding
                                       facet_col = 'gender_Groups' will additionally facet the histogram by gender
    """
    ascending = sortorder == 'ascending'
    # Sort so that bars (and facets, if requested) appear in a stable order.
    if 'facet_col' in kwargs:
        data = data.sort_values([col, kwargs['facet_col']], ascending=ascending)
    else:
        data = data.sort_values([col], ascending=ascending)
    # Assign one palette color per distinct value, in sorted order.
    colorDict = {}
    for val in sorted(set(data[col])):
        colorDict[val] = px.colors.qualitative.G10[len(colorDict)]
    p = px.histogram(data, x=col, color=col, histnorm='density',
                     color_discrete_map=colorDict, **kwargs)
    # Keep only the value part of each legend entry; [-1] also works when the name contains no '='.
    p = p.for_each_trace(lambda t: t.update(name=t.name.split('=')[-1]))
    categoryorder = f'category {sortorder}'
    p = p.update_xaxes(type='category', categoryorder=categoryorder)
    if 'facet_col' in kwargs:
        # Shorten facet titles from 'column=value' to 'value'.
        p = p.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    p.show()

    print('Columns Distribution:\n')
    print(data[col].value_counts(normalize=True) * 100)
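Both helpers build their color map by indexing into `px.colors.qualitative.G10`, which holds ten colors, so a column with more than ten distinct values would raise an `IndexError`. One possible way to guard against that is to cycle through the palette. The sketch below uses a short hypothetical palette in place of G10 so that it runs without plotly; `build_color_map` is an illustrative name, not a function used elsewhere in this notebook.

```python
from itertools import cycle

def build_color_map(values, palette):
    """Map each distinct value (in sorted order) to a palette color, reusing colors if needed."""
    return dict(zip(sorted(set(values)), cycle(palette)))

# Hypothetical three-color palette standing in for px.colors.qualitative.G10.
palette = ["#3366CC", "#DC3912", "#FF9900"]

color_map = build_color_map(["pos", "neg", "neutral", "N/A", "pos"], palette)
print(color_map)
```

With four distinct labels and three colors, the fourth label reuses the first color instead of failing.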

# Feature Distribution

Now we will use the functions above to generate histograms of the features within this topic area. Subtopic groupings are given as headings, and the specific survey questions are listed in the comments of each cell along with any observations.

# Political Involvement

In [17]:
#Respondents were asked to indicate how much they agree/disagree with
#statements about getting involved with politics, in this case:
#
#"It’s easier to influence politics by doing it yourself, rather than relying
#on organisations"
#
#This data is concentrated in the middle (neutral) but skews slightly to
# the right (positive)

ShowHistogram (data,'politicsInfluenceSelf')

Grouped Columns Distribution:

neutral    38.192182
pos        33.469055
neg        22.475570
N/A         5.863192
Name: politicsInfluenceSelf_Groups, dtype: float64


Ungrouped Columns Distribution:

3     38.192182
4     24.267101
2     15.309446
5      9.201954
1      7.166124
99     5.863192
Name: politicsInfluenceSelf, dtype: float64
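The "slightly to the right" reading can be put on a number by taking the mean response on the 1-5 scale, leaving out the 99 code. This is a sketch under two assumptions inferred from the output above rather than from a codebook: that 99 is the N/A code, and that higher values mean stronger agreement. The helper name `mean_score` is hypothetical.

```python
def mean_score(percentages, missing_codes=(99,)):
    """Percentage-weighted mean of the response codes, ignoring missing codes."""
    valid = {code: pct for code, pct in percentages.items() if code not in missing_codes}
    total = sum(valid.values())
    return sum(code * pct for code, pct in valid.items()) / total

# Ungrouped percentages copied from the politicsInfluenceSelf output above.
politics_influence_self = {
    3: 38.192182, 4: 24.267101, 2: 15.309446,
    5: 9.201954, 1: 7.166124, 99: 5.863192,
}

score = mean_score(politics_influence_self)
print(round(score, 2))
```

A mean above the scale midpoint of 3 is consistent with a mild lean toward agreement.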


In [18]:
#Respondents were asked to indicate how much they agree/disagree with
#statements about getting involved with politics, in this case:
#
#"People like me can best influence politics at the local level"
#
#This data skews right (positive)

ShowHistogram (data,'politicsInfluenceLocal')

Grouped Columns Distribution:

pos        39.657980
neutral    36.563518
neg        19.218241
N/A         4.560261
Name: politicsInfluenceLocal_Groups, dtype: float64


Ungrouped Columns Distribution:

3     36.563518
4     29.153094
2     10.749186
5     10.504886
1      8.469055
99     4.560261
Name: politicsInfluenceLocal, dtype: float64
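The grouped and ungrouped breakdowns can be cross-checked against each other: summing the ungrouped percentages within each group should reproduce the grouped percentages. The code-to-group mapping below is inferred from the printed numbers and is an assumption, not something taken from the data carpentry code.

```python
import math

# Percentages copied from the politicsInfluenceLocal output above.
ungrouped = {3: 36.563518, 4: 29.153094, 2: 10.749186,
             5: 10.504886, 1: 8.469055, 99: 4.560261}
grouped = {"pos": 39.657980, "neutral": 36.563518, "neg": 19.218241, "N/A": 4.560261}

# Assumed mapping from raw codes to group labels (inferred, not documented).
code_to_group = {1: "neg", 2: "neg", 3: "neutral", 4: "pos", 5: "pos", 99: "N/A"}

# Rebuild the grouped percentages by summing the ungrouped ones.
rebuilt = {}
for code, pct in ungrouped.items():
    group = code_to_group[code]
    rebuilt[group] = rebuilt.get(group, 0.0) + pct

matches = all(math.isclose(rebuilt[g], grouped[g], abs_tol=1e-5) for g in grouped)
print(matches)
```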


In [19]:
#Respondents were asked to indicate how much they agree/disagree with
#statements about getting involved with politics, in this case:
#
#"Politics only appeals to me if it is going to be fun"
#
#This data skews left (negative)

ShowHistogram (data,'politicsFunImportance')

Grouped Columns Distribution:

neg        37.052117
neutral    35.097720
pos        23.859935
N/A         3.990228
Name: politicsFunImportance_Groups, dtype: float64


Ungrouped Columns Distribution:

3     35.097720
2     24.267101
4     17.671010
1     12.785016
5      6.188925
99     3.990228
Name: politicsFunImportance, dtype: float64


In [20]:
#Respondents were asked to indicate how much they agree/disagree with
#statements about getting involved with politics, in this case:
#
#"I would like to engage in politics occasionally, not as a long term commitment"
#
#This data skews right (positive)

ShowHistogram (data,'politicsEngagementOccassional')

Grouped Columns Distribution:

pos        39.739414
neutral    32.654723
neg        24.267101
N/A         3.338762
Name: politicsEngagementOccassional_Groups, dtype: float64


Ungrouped Columns Distribution:

3     32.654723
4     31.596091
2     13.599349
1     10.667752
5      8.143322
99     3.338762
Name: politicsEngagementOccassional, dtype: float64


In [21]:
#Respondents were asked to indicate how much they agree/disagree with
#statements about getting involved with politics, in this case:
#
#"I would only get involved if it is for something that I believe will really
# change society"
#
#This data skews right (positive)

ShowHistogram (data,'politicsInvovledOnlyImportant')

Grouped Columns Distribution:

pos        61.482085
neutral    26.302932
neg         9.446254
N/A         2.768730
Name: politicsInvovledOnlyImportant_Groups, dtype: float64


Ungrouped Columns Distribution:

4     38.762215
3     26.302932
5     22.719870
2      5.700326
1      3.745928
99     2.768730
Name: politicsInvovledOnlyImportant, dtype: float64


# Statements About Self And Politics

In [22]:
#Respondents were asked to indicate how much they agree/disagree with
#statements that some people use to describe themselves, in this case:
#
#"I feel that I have a pretty good understanding of important political issues
# facing our country."
#
#This data skews right (positive)

ShowHistogram (data,'selfUnderstandPolitics')

Grouped Columns Distribution:

pos        48.371336
neutral    28.990228
neg        19.218241
N/A         3.420195
Name: selfUnderstandPolitics_Groups, dtype: float64


Ungrouped Columns Distribution:

4     36.563518
3     28.990228
2     13.517915
5     11.807818
1      5.700326
99     3.420195
Name: selfUnderstandPolitics, dtype: float64


In [23]:
#Respondents were asked to indicate how much they agree/disagree with
#statements that some people use to describe themselves, in this case:
#
#"I think that I am better informed about politics and government than most people."
#
#This data is split almost evenly between positive, neutral and negative,
#leaning only very slightly to the right (positive)


ShowHistogram (data,'selfMoreInvolvedPolitics')

Grouped Columns Distribution:

pos        32.410423
neutral    32.003257
neg        31.351792
N/A         4.234528
Name: selfMoreInvolvedPolitics_Groups, dtype: float64


Ungrouped Columns Distribution:

3     32.003257
4     23.208469
2     19.706840
1     11.644951
5      9.201954
99     4.234528
Name: selfMoreInvolvedPolitics, dtype: float64


In [24]:
#Respondents were asked to indicate how much they agree/disagree with
#statements that some people use to describe themselves, in this case:
#
#"Sometimes politics seems so complicated that people like me can’t understand
# what’s really going on."
#
#This data skews right (positive)

ShowHistogram (data,'selfPoliticsTooComplicated')

Grouped Columns Distribution:

pos        47.231270
neutral    27.442997
neg        22.312704
N/A         3.013029
Name: selfPoliticsTooComplicated_Groups, dtype: float64


Ungrouped Columns Distribution:

4     34.201954
3     27.442997
2     15.960912
5     13.029316
1      6.351792
99     3.013029
Name: selfPoliticsTooComplicated, dtype: float64


# The Internet And Political Information

In [25]:
#Respondents were asked to indicate how much they agree/disagree with
#statements regarding the use of the internet for information about
#politics and social issues, in this case:
#
#"I consider myself skillful in using the Internet to search for information on
# politics and issues that I care about."
#
#This data skews right (positive)

ShowHistogram (data,'selfInternetFindPoliticalInfo')

Grouped Columns Distribution:

pos        63.436482
neutral    23.208469
neg         9.771987
N/A         3.583062
Name: selfInternetFindPoliticalInfo_Groups, dtype: float64


Ungrouped Columns Distribution:

4     39.169381
5     24.267101
3     23.208469
2      6.596091
99     3.583062
1      3.175896
Name: selfInternetFindPoliticalInfo, dtype: float64


In [26]:
#Respondents were asked to indicate how much they agree/disagree with
#statements regarding the use of the internet for information about
#politics and social issues, in this case:
#
#"I am comfortable with my ability to discuss politics using the Internet."
#
#This data skews right (positive)

ShowHistogram (data,'selfInternetDiscussPolitics')

Grouped Columns Distribution:

pos        46.335505
neutral    29.885993
neg        19.869707
N/A         3.908795
Name: selfInternetDiscussPolitics_Groups, dtype: float64


Ungrouped Columns Distribution:

4     32.003257
3     29.885993
5     14.332248
2     14.087948
1      5.781759
99     3.908795
Name: selfInternetDiscussPolitics, dtype: float64


In [27]:
#Respondents were asked to indicate how much they agree/disagree with
#statements regarding the use of the internet for information about
#politics and social issues, in this case:
#
#"If I am concerned about a particular issue, I feel confident that I could use
# the Internet to express my concerns."
#
#This data skews right (positive)

ShowHistogram (data,'selfInternetExpressConcern')

Grouped Columns Distribution:

pos        53.175896
neutral    27.280130
neg        15.798046
N/A         3.745928
Name: selfInternetExpressConcern_Groups, dtype: float64


Ungrouped Columns Distribution:

4     35.912052
3     27.280130
5     17.263844
2     11.644951
1      4.153094
99     3.745928
Name: selfInternetExpressConcern, dtype: float64


# Involvement In Political Issues

In [28]:
#Respondents were asked to indicate whether or not they had engaged
#in any of the listed activities and how (online/offline), in this case:
#
#"Contacted an elected leader or government organization"
#
#This data skews left (no)

ShowHistogram (data,'pyContacedPolitican')

Grouped Columns Distribution:

no         73.127036
online     19.788274
offline     7.084691
Name: pyContacedPolitican_Groups, dtype: float64


Ungrouped Columns Distribution:

1    73.127036
4    10.749186
2     9.039088
3     7.084691
Name: pyContacedPolitican, dtype: float64


In [29]:
#Respondents were asked to indicate whether or not they had engaged
#in any of the listed activities and how (online/offline), in this case:
#
#"Written to a newspaper, or commented on a news organization’s website"
#
#This data skews left (no)

ShowHistogram (data,'pyCommentedNews')

Grouped Columns Distribution:

no         80.700326
online     13.599349
offline     5.700326
Name: pyCommentedNews_Groups, dtype: float64


Ungrouped Columns Distribution:

1    80.700326
2     7.410423
4     6.188925
3     5.700326
Name: pyCommentedNews, dtype: float64


In [30]:
#Respondents were asked to indicate whether or not they had engaged
#in any of the listed activities and how (online/offline), in this case:
#
#"Voted"
#
#This data skews right (yes/offline)

ShowHistogram (data,'pyVoted')

Grouped Columns Distribution:

offline    47.964169
no         37.377850
online     14.657980
Name: pyVoted_Groups, dtype: float64


Ungrouped Columns Distribution:

3    47.964169
1    37.377850
2    11.237785
4     3.420195
Name: pyVoted, dtype: float64


In [31]:
#Respondents were asked to indicate whether or not they had engaged
#in any of the listed activities and how (online/offline), in this case:
#
#"Tried to encourage others how to vote in an election"
#
#This data skews left (no)

ShowHistogram (data,'pyEncourageVote')

Grouped Columns Distribution:

no         59.201954
online     25.000000
offline    15.798046
Name: pyEncourageVote_Groups, dtype: float64


Ungrouped Columns Distribution:

1    59.201954
2    19.951140
3    15.798046
4     5.048860
Name: pyEncourageVote, dtype: float64


In [32]:
#Respondents were asked to indicate whether or not they had engaged
#in any of the listed activities and how (online/offline), in this case:
#
#"Signed a petition related to a political or social cause"
#
#This data skews left (no)

ShowHistogram (data,'pySignedPetition')

Grouped Columns Distribution:

no         53.664495
online     33.387622
offline    12.947883
Name: pySignedPetition_Groups, dtype: float64


Ungrouped Columns Distribution:

1    53.664495
4    19.706840
2    13.680782
3    12.947883
Name: pySignedPetition, dtype: float64


In [33]:
#Respondents were asked to indicate whether or not they had engaged
#in any of the listed activities and how (online/offline), in this case:
#
#"Attended a demonstration or rally"
#
#This data skews left (no)

ShowHistogram (data,'pyAttendedRally')

Grouped Columns Distribution:

no         79.071661
offline    11.726384
online      9.201954
Name: pyAttendedRally_Groups, dtype: float64


Ungrouped Columns Distribution:

1    79.071661
3    11.726384
2     6.596091
4     2.605863
Name: pyAttendedRally, dtype: float64


In [34]:
#Respondents were asked to indicate whether or not they had engaged
#in any of the listed activities and how (online/offline), in this case:
#
#"Discussed politics with friends or family"
#
#This data skews right (yes/offline)

ShowHistogram (data,'pyDiscussedPoliticsFriendsFamily')

Grouped Columns Distribution:

offline    38.517915
online     35.097720
no         26.384365
Name: pyDiscussedPoliticsFriendsFamily_Groups, dtype: float64


Ungrouped Columns Distribution:

3    38.517915
2    31.433225
1    26.384365
4     3.664495
Name: pyDiscussedPoliticsFriendsFamily, dtype: float64


# Political Discussions

In [35]:
#Respondents were asked to indicate how often they had engaged
#in discussions about politics or social issues with various people, in this case:
#
#"Your friends"
#
#This data skews right (rarely)

ShowHistogram (data,'discussPoliticsFriends')

Grouped Columns Distribution:

Rarely         30.700326
1-2PerMonth    21.172638
1-2PerWeek     17.833876
Never          15.553746
3-4PerWeek      9.039088
EveryDay        5.700326
Name: discussPoliticsFriends_Groups, dtype: float64


Ungrouped Columns Distribution:

5    30.700326
4    21.172638
3    17.833876
6    15.553746
2     9.039088
1     5.700326
Name: discussPoliticsFriends, dtype: float64
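`value_counts` sorts by frequency, so the breakdown above does not follow the order of the frequency scale. Reindexing by an explicit order can make the shape of the distribution easier to read. The sketch below uses toy responses; the order in `scale_order` is inferred by matching the grouped labels to the ungrouped codes 1 to 6 in the output above and should be treated as an assumption.

```python
import pandas as pd

# Assumed scale order, from most to least frequent discussion.
scale_order = ["EveryDay", "3-4PerWeek", "1-2PerWeek", "1-2PerMonth", "Rarely", "Never"]

# Toy responses (illustrative only, not the survey data).
toy = pd.Series(
    ["Rarely", "Never", "Rarely", "EveryDay", "1-2PerWeek",
     "1-2PerMonth", "Rarely", "3-4PerWeek"],
    name="discussPoliticsFriends_Groups",
)

# Percentages per label, presented in scale order rather than frequency order.
pct = toy.value_counts(normalize=True).mul(100).reindex(scale_order)
print(pct)
```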


In [36]:
#Respondents were asked to indicate how often they had engaged
#in discussions about politics or social issues with various people, in this case:
#
#"Your family"
#
#This data skews right (rarely)

ShowHistogram (data,'discussPoliticsFamily')

Grouped Columns Distribution:

Rarely         28.827362
1-2PerMonth    25.162866
1-2PerWeek     17.752443
Never          12.947883
3-4PerWeek      9.934853
EveryDay        5.374593
Name: discussPoliticsFamily_Groups, dtype: float64


Ungrouped Columns Distribution:

5    28.827362
4    25.162866
3    17.752443
6    12.947883
2     9.934853
1     5.374593
Name: discussPoliticsFamily, dtype: float64


In [37]:
#Respondents were asked to indicate how often they had engaged
#in discussions about politics or social issues with various people, in this case:
#
#"Other people (e.g., coworkers, acquaintances, classmates or peers)"
#
#This data skews right (rarely)

ShowHistogram (data,'discussPoliticsOthers')

Grouped Columns Distribution:

Rarely         32.247557
Never          22.312704
1-2PerMonth    19.218241
1-2PerWeek     16.368078
3-4PerWeek      6.758958
EveryDay        3.094463
Name: discussPoliticsOthers_Groups, dtype: float64


Ungrouped Columns Distribution:

5    32.247557
6    22.312704
4    19.218241
3    16.368078
2     6.758958
1     3.094463
Name: discussPoliticsOthers, dtype: float64


# How Interested In Politics Are You?

In [38]:
#Respondents were asked to indicate how interested in politics they are,
#in this case:
#
#"Some people are interested in politics all the time, even when there isn’t an
# election going on. Thinking about yourself, how interested in politics would 
# you say that you are?"
#
#This data skews left (Low Interest)

ShowHistogram (data,'politicalInterest')

Grouped Columns Distribution:

LowInterest     54.967427
HighInterest    23.859935
NoInterest      21.172638
Name: politicalInterest_Groups, dtype: float64


Ungrouped Columns Distribution:

3    28.827362
2    26.140065
1    21.172638
4    15.716612
5     8.143322
Name: politicalInterest, dtype: float64
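The percentages above are consistent with a grouping in which code 1 becomes NoInterest, codes 2 and 3 become LowInterest, and codes 4 and 5 become HighInterest. How the grouping was actually implemented during data carpentry is not shown here, so the following is only one possible sketch of such a binning, using `pd.cut` on toy values.

```python
import pandas as pd

# Toy raw responses on the 1-5 interest scale (illustrative only).
raw = pd.Series([1, 2, 3, 4, 5, 3, 2], name="politicalInterest")

# Bins are right-inclusive by default: (0, 1], (1, 3], (3, 5].
groups = pd.cut(
    raw,
    bins=[0, 1, 3, 5],
    labels=["NoInterest", "LowInterest", "HighInterest"],
)
print(groups.value_counts(normalize=True) * 100)
```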


# Group Political Activities

In [56]:
#Respondents were asked whether they had engaged in a list of group-based
#activities, in this case:
#
#"Joined, worked on, or volunteered for, a group taking a stand on political
# issues"
#
#0=No
#1=Yes
#
#This data skews left (No)

ShowHistogramUnivariate(data,'groupPoliticalIssue')

Columns Distribution:

0    83.957655
1    16.042345
Name: groupPoliticalIssue, dtype: float64


In [57]:
#Respondents were asked whether they had engaged in a list of group-based
#activities, in this case:
#
#"Joined, worked on, or volunteered for, a nonpolitical or charitable group"
#
#0=No
#1=Yes
#
#This data skews left (No)

ShowHistogramUnivariate(data,'groupSocialIssue')

Columns Distribution:

0    60.586319
1    39.413681
Name: groupSocialIssue, dtype: float64


In [58]:
#Respondents were asked whether they had engaged in a list of group-based
#activities, in this case:
#
#"Worked for or volunteered on a local community project"
#
#0=No
#1=Yes
#
#This data skews left (No)

ShowHistogramUnivariate(data,'groupLocalCommunity')

Columns Distribution:

0    65.716612
1    34.283388
Name: groupLocalCommunity, dtype: float64


In [59]:
#Respondents were asked whether they had engaged in a list of group-based
#activities, in this case:
#
#"Worked or volunteered for political groups or candidates"
#
#0=No
#1=Yes
#
#This data skews left (No)

ShowHistogramUnivariate(data,'groupPolitican')

Columns Distribution:

0    87.62215
1    12.37785
Name: groupPolitican, dtype: float64


In [61]:
#Respondents were asked whether they had engaged in a list of group-based
#activities, in this case:
#
#"Worked or volunteered with an election campaign"
#
#0=No
#1=Yes
#
#This data skews left (No)

ShowHistogramUnivariate (data,'groupElectionCampaign')

Columns Distribution:

0    88.680782
1    11.319218
Name: groupElectionCampaign, dtype: float64


# Focus Of Political Activity: (Local, National, International)

In [62]:
#Respondents were asked whether the previously mentioned activities had a
#particular focus, in this case:
#
#"Local issues?"
#
#0=No
#1=Yes
#
#This data skews right (Yes)

ShowHistogramUnivariate (data,'groupFocusLocal')

Columns Distribution:

1    57.003257
0    30.211726
     12.785016
Name: groupFocusLocal, dtype: float64
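The unlabeled row in this output (12.785016) appears to be a blank response category; reading it as "no answer recorded" is an assumption. If the blanks are set aside, the shares can be recomputed among respondents who did answer, which for this feature would put "Yes" at roughly 65% rather than 57%. A minimal sketch on toy data, assuming the column holds strings:

```python
import pandas as pd

# Toy responses with blank entries (illustrative only, not the survey data).
toy = pd.Series(["1", "1", "1", "0", "", " "], name="groupFocusLocal")

# Keep only non-blank answers; blanks become NaN and are dropped by value_counts.
answered = toy.where(toy.str.strip().ne(""))
pct = answered.value_counts(normalize=True).mul(100)
print(pct)
```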


In [63]:
#Respondents were asked whether the previously mentioned activities had a
#particular focus, in this case:
#
#"National issues?"
#
#0=No
#1=Yes
#
#This data skews right (Yes)

ShowHistogramUnivariate (data,'groupFocusNational')

Columns Distribution:

1    55.944625
0    31.270358
     12.785016
Name: groupFocusNational, dtype: float64


In [64]:
#Respondents were asked whether the previously mentioned activities had a
#particular focus, in this case:
#
#"International / Global issues?"
#
#0=No
#1=Yes
#
#This data skews left (No)

ShowHistogramUnivariate (data,'groupFocusInternational')

Columns Distribution:

0    47.394137
1    39.820847
     12.785016
Name: groupFocusInternational, dtype: float64
