05 eda twitter + text analysis + sentiment analysis #17

Open
wants to merge 6 commits into
base: dev

Conversation

sofiapinto
Contributor

@sofiapinto sofiapinto commented May 8, 2024


Description

Hey Liz,

Here are the scripts I created to analyse Twitter data on HPs and boilers.
Closes #5 #11

Analysis scripts:

  • asf_online_data_exploration/analysis/twitter/00.prep_for_processing_data/Processing Twitter data.py - a step-by-step walkthrough of the processing pipeline, in case you have any questions about it.
  • asf_online_data_exploration/analysis/twitter/01.exploring_data/Exploring data tables.py - a quick notebook exploring all the tables I create from the original Twitter data, with some summary stats from YData Profiling.
  • asf_online_data_exploration/analysis/twitter/02.EDA_and_text_analysis/01. Analysis users.py - notebook exploring user data; this is where I geocode the user locations (I then save the result and analyse it in a separate notebook).
  • asf_online_data_exploration/analysis/twitter/02.EDA_and_text_analysis/02. Analysis tweets.py - mostly time trends and distributions.
  • asf_online_data_exploration/analysis/twitter/02.EDA_and_text_analysis/04. HP tweets - text analysis.py - top words, hashtags, bigrams, trigrams etc. for the HP data.
  • asf_online_data_exploration/analysis/twitter/02.EDA_and_text_analysis/05. HP tweets - users.py - this is where I analyse the geocoded data.
  • asf_online_data_exploration/analysis/twitter/03.sentiment_analysis - where the sentiment analysis scripts live. I used VADER, so the results aren't great.

There are a few other scripts, but I think they're less relevant for you. Each one has a description at the top.
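For a rough idea of what the top words / hashtags / bigrams / trigrams step in the text analysis script involves, here is a minimal standard-library sketch. The function names (`top_ngrams`, `top_hashtags`) and the sample tweets are made up for illustration and are not the code in this PR.

```python
import re
from collections import Counter


def top_ngrams(texts, n=2, k=3):
    """Return the k most common n-grams (as tuples of tokens) across texts."""
    counts = Counter()
    for text in texts:
        # Very simple tokenizer: lowercase runs of letters, digits and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Zip n shifted copies of the token list to get consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


def top_hashtags(texts, k=3):
    """Return the k most common hashtags, compared case-insensitively."""
    counts = Counter(
        tag.lower() for text in texts for tag in re.findall(r"#\w+", text)
    )
    return counts.most_common(k)


# Hypothetical sample data, not taken from the real dataset.
sample = [
    "Our heat pump works well #heatpump",
    "New heat pump installed today #HeatPump #netzero",
    "Gas boiler broke again",
]

most_common_bigram = top_ngrams(sample, n=2, k=1)
most_common_hashtag = top_hashtags(sample, k=1)
```

Setting `n=1` gives top words and `n=3` gives trigrams, so one helper covers all three counts.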

Getters:

  • asf_online_data_exploration/getters/twitter.py - I think you can use these for your data too!

Utils:

  • Utils live here: asf_online_data_exploration/utils/ - the text analysis ones are very rudimentary, but they cover the important basics such as removing mentions and URLs, tokenizing, etc.
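As an illustration of the kind of basic cleaning described above (removing mentions and URLs, then tokenizing), here is a small sketch. The names `clean_tweet` and `tokenize` and the regular expressions are assumptions for this example and may not match what is in the utils folder.

```python
import re

# Assumed patterns: a mention is "@" plus word characters; a URL starts with
# http(s):// or www. and runs until the next whitespace.
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def clean_tweet(text):
    """Remove URLs and @mentions, then collapse repeated whitespace."""
    # URLs are removed first so an "@" inside a URL is not treated as a mention.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


def tokenize(text):
    """Lowercase the cleaned tweet and split it into simple word tokens."""
    # Hashtags are kept as single tokens (e.g. "#heatpump").
    return re.findall(r"[a-z0-9#']+", clean_tweet(text).lower())


cleaned = clean_tweet("@bob check https://example.com/x heat pumps!")
tokens = tokenize("@bob check https://example.com/x heat pumps! #HeatPump")
```

Keeping the cleaning and tokenizing steps as separate functions makes it easy to reuse the cleaned text as-is, for example as input to sentiment scoring.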

Checklist:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained this PR above
  • I have requested a code review

@sofiapinto sofiapinto self-assigned this May 8, 2024
This was linked to issues May 8, 2024
@sofiapinto sofiapinto marked this pull request as ready for review May 8, 2024 12:20
@sofiapinto sofiapinto requested a review from lizgzil May 8, 2024 12:20
Development

Successfully merging this pull request may close these issues.

Sentiment Analysis EDA Twitter
1 participant