congresstweets-analysis

Analyzing the tweets of U.S. Congress members (2017-2023) in relation to political affiliation and current issues, considering the sentiment, frequency, and trends of related statements to understand the priorities and characteristics of the two parties.

Link to Analysis: https://html-preview.github.io/?url=https://github.com/kennethkn/congresstweets-analysis/blob/main/analysis.html

Project Description

This project aims to foster a broader understanding of the two-party U.S. political landscape by analyzing the tweets of U.S. Congress members, including Democratic and Republican senators and representatives. Its value lies in the potential to reveal patterns and trends in the political discourse of recent years, which can provide insights into the priorities and strategies of the two parties. Additionally, analyzing the sentiment of tweets can reveal each party's stance on current issues.

The dataset for this project comes from the GitHub repository congresstweets. It contains a comprehensive collection of tweets from U.S. Congress members since 2017, making it a rich resource for varied analyses.

Questions to Answer

What are the trends in the most common words and hashtags used by Democratic and Republican members of Congress in their tweets?
What sentiment do Democratic and Republican members of Congress express in their tweets on significant issues such as COVID-19, climate change, abortion, and gun control?
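One way to approach the second question is to tag each tweet with the issues it mentions and then average a sentiment score per issue and party. The sketch below assumes a sentiment score in [-1, 1] has already been computed for each tweet, and it uses hypothetical keyword lists for topic tagging; it is an illustration, not the method used in this project's scripts.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical keyword lists for illustration; the project's own topic assignment may differ.
TOPIC_KEYWORDS = {
    "covid-19": ("covid", "coronavirus", "pandemic", "vaccine"),
    "climate change": ("climate", "emissions", "clean energy"),
    "abortion": ("abortion", "roe v. wade", "pro-life", "pro-choice"),
    "gun control": ("gun", "firearm", "second amendment"),
}

# \b before each keyword: "gun" matches "guns" but not "begun".
TOPIC_PATTERNS = {
    topic: re.compile("|".join(r"\b" + re.escape(word) for word in words))
    for topic, words in TOPIC_KEYWORDS.items()
}


@dataclass
class Tweet:
    party: str        # assumed encoding, e.g. "D" or "R"
    text: str
    sentiment: float  # assumed precomputed polarity score in [-1, 1]


def topics_for(text: str) -> list[str]:
    """Return every topic with at least one keyword in the lowercased text."""
    lowered = text.lower()
    return [topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(lowered)]


def sentiment_by_topic_and_party(tweets) -> dict[tuple[str, str], float]:
    """Average sentiment per (topic, party); a tweet can count toward several topics."""
    scores = defaultdict(list)
    for tweet in tweets:
        for topic in topics_for(tweet.text):
            scores[(topic, tweet.party)].append(tweet.sentiment)
    return {key: mean(values) for key, values in scores.items()}
```

Keyword matching is crude (it misses paraphrases and can misfire on unrelated uses of a word), which is one reason inferring topics with a language model is listed under Room for Improvement.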

Methodology

Given the size of the dataset (~4M entries), I chose a database approach to store and query the data. The database is hosted locally on my computer via PostgreSQL, but you can reproduce it by running the Python scripts in the scripts folder, which handle both database construction and text mining.
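To illustrate why a database helps at this scale, here is a possible two-table layout and the kind of aggregation behind "Tweet Count by Party and Year". This is a hypothetical schema, not the one defined in models.py, and it uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL so that it runs without a server.

```python
import sqlite3

# Hypothetical schema; the real tables are created by models.py against PostgreSQL.
SCHEMA = """
CREATE TABLE members (
    user_id  TEXT PRIMARY KEY,   -- Twitter account ID
    name     TEXT NOT NULL,
    party    TEXT NOT NULL,      -- e.g. 'D' or 'R'
    chamber  TEXT NOT NULL       -- 'house' or 'senate'
);
CREATE TABLE tweets (
    id         TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL REFERENCES members(user_id),
    created_at TEXT NOT NULL,    -- ISO-8601 timestamp
    text       TEXT NOT NULL
);
CREATE INDEX tweets_user_id_idx ON tweets(user_id);
"""

# Join and aggregate inside the database instead of loading millions of rows into memory.
COUNT_BY_PARTY_AND_YEAR = """
SELECT m.party, substr(t.created_at, 1, 4) AS year, COUNT(*) AS n_tweets
FROM tweets AS t
JOIN members AS m ON m.user_id = t.user_id
GROUP BY m.party, year
ORDER BY year, m.party;
"""


def tweet_count_by_party_and_year(conn: sqlite3.Connection) -> list[tuple]:
    """Return (party, year, tweet count) rows, ordered by year then party."""
    return conn.execute(COUNT_BY_PARTY_AND_YEAR).fetchall()
```

In PostgreSQL, with created_at stored as a timestamp, EXTRACT(YEAR FROM t.created_at) would take the place of the substr() call.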

1. Set up a PostgreSQL server (brew install postgresql && brew services start postgresql if you are using macOS).
2. Create a database named congresstweets (createdb congresstweets).
3. Clone the repository.
4. Note that the data/tweets/ folder is empty. Download the tweets data from the congresstweets repo, as well as here for the older 2017 data, and place the downloaded JSON files (e.g. 2020-03-24.json) in the data/tweets/ folder.
5. Set up a virtual environment and activate it (python -m venv venv && source venv/bin/activate).
6. Install the required packages (pip install -r requirements.txt).
7. Open .env and replace YOUR_USERNAME with your PostgreSQL username (DATABASE_URL=postgresql://YOUR_USERNAME@localhost:5432/congresstweets).
8. Run models.py to create the tables.
9. Run db_insert_members.py to populate the members table.
10. Run db_insert_tweets.py to populate the tweets table.
11. Run text_mining.py to populate the text-mining result columns in the tweets table.
12. Open analysis.rmd in RStudio and knit it to generate the analysis.
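The loading step performed by db_insert_tweets.py can be approximated as streaming the daily JSON files and inserting rows in batches. This is a sketch under stated assumptions, not the actual script: it assumes each daily file is a JSON array of tweet objects with the field names listed in FIELDS, which should be checked against the downloaded data.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator

# Assumed field names for each tweet object in the daily files; verify against the downloaded JSON.
FIELDS = ("id", "user_id", "screen_name", "time", "text")


def iter_tweets(folder: Path) -> Iterator[tuple]:
    """Yield one tuple per tweet from every daily file in `folder`, oldest file first."""
    # File names are ISO dates (e.g. 2020-03-24.json), so a lexical sort is chronological.
    for path in sorted(folder.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            for record in json.load(f):  # each file is assumed to hold a JSON array
                # Missing keys become None rather than raising KeyError.
                yield tuple(record.get(field) for field in FIELDS)


def batched(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Group rows into lists of at most `size`, e.g. for bulk INSERTs."""
    rows = iter(rows)  # islice on a list would restart from the beginning, so force an iterator
    while batch := list(islice(rows, size)):
        yield batch
```

Each batch could then be handed to a DB-API cursor's executemany() with an INSERT statement. Streaming the files this way keeps memory use bounded even though the full dataset holds millions of tweets.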

Table of Contents

Tweet Count by Party and Year
Tweet Count by Chamber and Year
Top Tweeters by Year
Top Hashtags
Top Hashtags by Party
Top Hashtags by Party and Year
Top Hashtags by Chamber
Top Words
Top Words by Party
Top Words by Party and Year
Top Words by Chamber
Sentiment Analysis by Party and Year
Sentiment Analysis by Chamber and Year
Sentiment Analysis by Topic and Party
Top Accounts Retweeted
Top Accounts Retweeted by Party
Top Accounts Quoted
Top Accounts Quoted by Party
Top Accounts Mentioned
Top Accounts Mentioned by Party
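The counting behind sections such as Top Hashtags by Party and Year and Top Accounts Retweeted by Party can be sketched in plain Python. This minimal sketch is not the project's text_mining.py: it assumes each tweet is available as a (party, year, text) tuple and that retweets keep the "RT @handle:" prefix in their text.

```python
import re
from collections import Counter, defaultdict

HASHTAG_RE = re.compile(r"#(\w+)")
# Assumes retweets are stored with the classic "RT @handle:" prefix in the tweet text.
RETWEET_RE = re.compile(r"^RT @(\w+):")


def top_hashtags_by_party_and_year(tweets, k=3):
    """tweets: iterable of (party, year, text). Returns {(party, year): [(tag, count), ...]}."""
    counts = defaultdict(Counter)
    for party, year, text in tweets:
        # Lowercase so #COVID19 and #covid19 are counted as the same hashtag.
        counts[(party, year)].update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return {key: counter.most_common(k) for key, counter in counts.items()}


def top_retweeted_by_party(tweets, k=3):
    """Returns {party: [(handle, count), ...]} for accounts retweeted by each party."""
    counts = defaultdict(Counter)
    for party, _year, text in tweets:
        match = RETWEET_RE.match(text)
        if match:
            counts[party][match.group(1).lower()] += 1
    return {party: counter.most_common(k) for party, counter in counts.items()}
```

Counter.most_common breaks ties by first appearance, so for a stable ranking across runs the input order should be fixed (for example, sorted by tweet time).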

Room for Improvement

Use BERT or GPT models to infer topics from tweets.
Add more breakdowns, such as top words by chamber and year, sentiment of tweets by topic and chamber, etc.

Citation

Many thanks to Alex Litel for providing the dataset: https://github.com/alexlitel/congresstweets
