05 eda twitter + text analysis + sentiment analysis #17
Description
Hey Liz,
Here are the scripts I created to analyse Twitter data on HPs and boilers.
Closes #5 #11
Analysis scripts:
asf_online_data_exploration/analysis/twitter/00.prep_for_processing_data/Processing Twitter data.py
- if you have any questions about the processing pipeline, this notebook is a step-by-step version of it.
asf_online_data_exploration/analysis/twitter/01.exploring_data/Exploring data tables.py
- quick notebook to explore all the tables I create from the original Twitter data, together with some summary stats using YData Profiling.
asf_online_data_exploration/analysis/twitter/02.EDA_and_text_analysis/01. Analysis users.py
- notebook to explore user data; this is where I apply the geocoding of user location (I then save it and analyse it in a separate notebook).
asf_online_data_exploration/analysis/twitter/02.EDA_and_text_analysis/02. Analysis tweets.py
- mostly time trends and distributions.
asf_online_data_exploration/analysis/twitter/02.EDA_and_text_analysis/04. HP tweets - text analysis.py
- top words, hashtags, bigrams, trigrams etc. for HP data.
asf_online_data_exploration/analysis/twitter/02.EDA_and_text_analysis/05. HP tweets - users.py
- this is where I analyse the geocoded data.
asf_online_data_exploration/analysis/twitter/03.sentiment_analysis
- where the sentiment analysis scripts live. I used VADER, so the results are not great.
There are a few other scripts, but they are less relevant for you, I think. There's a description at the top of each one.
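For reference, the kind of counting done in the HP text analysis notebook (top hashtags, bigrams, trigrams) can be sketched with the standard library alone. This is only a rough illustration and not the code in the notebook: the helper names `ngrams`, `top_hashtags` and `top_ngrams`, and the sample tweets, are made up for the example.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a list of tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_hashtags(texts, k=10):
    """Count lowercased hashtags across texts and return the k most common."""
    counts = Counter(
        tag.lower() for text in texts for tag in HASHTAG_RE.findall(text)
    )
    return counts.most_common(k)


def top_ngrams(token_lists, n=2, k=10):
    """Count n-grams across tokenised texts and return the k most common."""
    counts = Counter(
        gram for tokens in token_lists for gram in ngrams(tokens, n)
    )
    return counts.most_common(k)


# Made-up sample tweets, purely for illustration
tweets = [
    "Got a heat pump installed #HeatPump #netzero",
    "Our heat pump works fine in winter #heatpump",
]

# Very rough tokenisation: drop hashtags, lowercase, keep alphabetic runs
token_lists = [
    re.findall(r"[a-z]+", HASHTAG_RE.sub(" ", tweet.lower())) for tweet in tweets
]

print(top_hashtags(tweets, k=2))
print(top_ngrams(token_lists, n=2, k=1))
```

Passing `n=3` to `top_ngrams` gives trigrams in the same way.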
Getters:
asf_online_data_exploration/getters/twitter.py
- I think you can use these for your data too!
Utils:
asf_online_data_exploration/utils/
- the text analysis ones are very rudimentary, but they do the basic important stuff such as removing mentions and URLs, tokenizing etc.
Checklist:
notebooks/
pre-commit and addressed any issues not automatically fixed
dev
READMEs
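For anyone reusing the text utils mentioned above, the basic steps they cover (removing mentions and URLs, then tokenizing) look roughly like the sketch below. The function names `clean_tweet` and `tokenize` and the regular expressions are assumptions for illustration, not the actual contents of asf_online_data_exploration/utils/.

```python
import re

# Illustrative patterns; the real utils may use different ones
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_tweet(text):
    """Strip URLs and @mentions, then collapse repeated whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


# Made-up example tweet
raw = "@liz check https://example.com heat pumps!"
print(tokenize(clean_tweet(raw)))
```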