# Analysis of Information Habits

This notebook explores how participants consume and trust different sources of political information.

We analyze both:
- Frequency of use (CH01_01 to CH01_05)
- Trust in different sources (CH02_01 to CH02_05)

### Load preprocessed data

To start, we'll import the preprocessing helper and load the cleaned data.

In [1]:
from data_preprocessing import load_and_preprocess

# Load data
df = load_and_preprocess('data/cleaned_file.csv')
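Because the analysis below depends on the item columns CH01_01 to CH01_05 and CH02_01 to CH02_05, it can help to confirm they are present right after loading. The following is a minimal sketch of such a check. The helper `missing_items` and the toy frame are hypothetical and not part of `data_preprocessing`; what `load_and_preprocess` actually returns is not shown in this notebook.

```python
import pandas as pd

# Item codes named in this notebook: frequency (CH01) and trust (CH02).
EXPECTED_ITEMS = [
    f"CH{block:02d}_{item:02d}" for block in (1, 2) for item in range(1, 6)
]


def missing_items(frame: pd.DataFrame, expected=EXPECTED_ITEMS) -> list:
    """Return the expected item columns that are absent from the frame."""
    return [col for col in expected if col not in frame.columns]


# Toy frame standing in for the real data (hypothetical values).
toy = pd.DataFrame({"CH01_01": [1, 2], "CH02_05": [1, 2]})
print(missing_items(toy))
```

On the real data, `missing_items(df)` would be expected to return an empty list as long as it is called before the rename cells are run.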

### Descriptive statistics of media consumption

Items CH01_01 to CH01_05 asked participants about their information habits, namely how frequently they consume political content through:
- CH01_01: friends/family
- CH01_02: social media
- CH01_03: news website
- CH01_04: television
- CH01_05: newspaper

Before calculating the mean, the columns will be renamed for clarity.

In [2]:
# Rename the frequency items to descriptive column names
df = df.rename(columns={
    "CH01_01": "freq_friends_family",
    "CH01_02": "freq_social_media",
    "CH01_03": "freq_news_website",
    "CH01_04": "freq_television",
    "CH01_05": "freq_newspaper",
})

In [6]:
# Media usage frequency
freq_cols = ['freq_friends_family', 'freq_social_media', 'freq_news_website', 'freq_television', 'freq_newspaper']
print("Mean frequency of political information consumption:")
print(df[freq_cols].mean())

Mean frequency of political information consumption:
freq_friends_family    2.571429
freq_social_media      2.982143
freq_news_website      2.410714
freq_television        1.857143
freq_newspaper         1.714286
dtype: float64
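A mean on its own hides how much answers vary and how many participants actually answered each item. As a hedged sketch, the summary could be extended with the standard deviation and the number of valid answers. The helper `describe_items` and the toy values are assumptions for illustration, not results from this study.

```python
import pandas as pd


def describe_items(frame: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Mean, standard deviation and valid-answer count per column, sorted by mean."""
    summary = frame[cols].agg(["mean", "std", "count"]).T
    return summary.sort_values("mean", ascending=False)


# Toy data (hypothetical); None marks a skipped question.
toy = pd.DataFrame({
    "freq_social_media": [4, 3, 2, 3],
    "freq_television": [1, 2, 2, None],
})
summary = describe_items(toy, list(toy.columns))
print(summary)
```

With the real data, the call would be `describe_items(df, freq_cols)`.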


### Analyze trust in media sources

Items CH02_01 to CH02_05 asked participants about their trust in the following sources of political information:
- CH02_01: public news broadcaster
- CH02_02: private news broadcaster
- CH02_03: social media
- CH02_04: online news
- CH02_05: friends/family

Before counting the mentions, the columns will be renamed for clarity.

In [7]:
# Rename the trust items to descriptive column names
df = df.rename(columns={
    "CH02_01": "trust_public_news",
    "CH02_02": "trust_private_news",
    "CH02_03": "trust_social_media",
    "CH02_04": "trust_online_news",
    "CH02_05": "trust_friends_family",
})

In [9]:
# Trusted sources (checkboxes)
trust_cols = ['trust_public_news', 'trust_private_news', 'trust_social_media', 'trust_online_news', 'trust_friends_family']
print("Trust in political information sources (number of mentions):")
print(df[trust_cols].sum().sort_values(ascending=False))

Trust in political information sources (number of mentions):
trust_public_news       104.0
trust_online_news        86.0
trust_friends_family     71.0
trust_social_media       70.0
trust_private_news       59.0
dtype: float64
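Summing a checkbox column yields the number of mentions only if a checked box is coded 1 and an unchecked box 0. Some survey exports use other codes, so it may be worth inspecting the values before interpreting the totals. The sketch below assumes, purely for illustration, a coding of 1 = not checked and 2 = checked; the helper `count_mentions` and the toy data are hypothetical, and the actual coding in `cleaned_file.csv` is not shown in this notebook.

```python
import pandas as pd


def count_mentions(frame: pd.DataFrame, cols: list, checked_value) -> pd.Series:
    """Count, per column, how many rows equal the code that means 'checked'."""
    return (frame[cols] == checked_value).sum().sort_values(ascending=False)


# Toy data with a hypothetical 1 = not checked, 2 = checked coding.
toy = pd.DataFrame({
    "trust_public_news": [2, 2, 1, 2],
    "trust_social_media": [1, 2, 1, 1],
})
cols = list(toy.columns)

# Step 1: look at which codes actually occur in each column.
codes = {col: sorted(toy[col].dropna().unique().tolist()) for col in cols}
print(codes)

# Step 2: count mentions explicitly instead of summing raw codes.
mentions = count_mentions(toy, cols, checked_value=2)
print(mentions)
```

In this toy example the raw sum for `trust_public_news` is 7, while the number of checked boxes is 3, which shows how the two can diverge when the coding is not 0/1.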


### Conclusion

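Social media and personal networks (friends/family) were the most frequently used sources of political information, with mean frequencies of 2.98 and 2.57. Trust, however, was highest for news outlets: public news broadcasters (104) and online news (86) led the trust totals, while social media (70) ranked fourth of five.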
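The contrast between use and trust can be made explicit by ranking the sources within each block and lining up those that appear in both. The sketch below copies the values from the two outputs printed above. It treats only `social_media` and `friends_family` as shared sources, which is an assumption based on the column names; whether other items (for example news website and online news) refer to the same source is not stated here.

```python
import pandas as pd

# Values copied from the two outputs printed in this notebook.
freq_mean = pd.Series({
    "freq_friends_family": 2.571429,
    "freq_social_media": 2.982143,
    "freq_news_website": 2.410714,
    "freq_television": 1.857143,
    "freq_newspaper": 1.714286,
})
trust_sum = pd.Series({
    "trust_public_news": 104.0,
    "trust_online_news": 86.0,
    "trust_friends_family": 71.0,
    "trust_social_media": 70.0,
    "trust_private_news": 59.0,
})


def ranks_by_source(values: pd.Series, name: str) -> pd.Series:
    """Rank values within a block (1 = highest) and drop the freq_/trust_ prefix."""
    ranked = values.rank(ascending=False).astype(int)
    ranked.index = [label.split("_", 1)[1] for label in ranked.index]
    return ranked.rename(name)


# Keep only the sources that occur in both blocks.
comparison = pd.concat(
    [ranks_by_source(freq_mean, "usage_rank"),
     ranks_by_source(trust_sum, "trust_rank")],
    axis=1,
    join="inner",
)
print(comparison)
```

Ranks are used rather than raw numbers because the two blocks are on different scales (means versus totals), so only the ordering within each block is comparable.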