EDA + initial text analysis for MSE #11
Merged
3 tasks
sofiapinto
commented
Dec 1, 2023
does not require review
sofiapinto
commented
Dec 1, 2023
does not require review
helloaidank
reviewed
Dec 19, 2023
asf_public_discourse_home_decarbonisation/pipeline/data_processing_flows/flow_utils.py
…ls to the flow utils
…e_category_data.py Removing EDA notebook as it's out of date
…l_text_analysis_category_data.py Removing text analysis notebook as it's out of date
Renaming flow utils
Description
This PR adds scripts to perform EDA (exploratory data analysis) and initial text analysis (looking at top words and ngrams) for Money Saving Expert data.
Closes #1
Closes #5
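For reviewers unfamiliar with the "top words and ngrams" step, here is a minimal sketch of the idea using only the standard library. The function names (`tokenise`, `top_ngrams`) and the sample posts are hypothetical and are not taken from `text_processing_utils.py`; the scripts in this PR may tokenise and count differently.

```python
from collections import Counter
import re


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(texts, n=1, k=10):
    """Return the k most common n-grams (as tuples of tokens) across texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenise(text)
        # Slide a window of length n over each post separately,
        # so that no n-gram spans two different posts.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


# Made-up example posts, for illustration only.
posts = [
    "Heat pump installation cost",
    "Is a heat pump worth the cost?",
    "Solar panels and a heat pump",
]
print(top_ngrams(posts, n=1, k=3))  # most frequent single words
print(top_ngrams(posts, n=2, k=1))  # most frequent bigram
```

Counting per post rather than over the concatenated text is a deliberate choice in this sketch: it avoids spurious n-grams formed from the end of one post and the start of the next.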
Instructions for Reviewer
Setup
In order to test the code in this PR you need to:
git clone git@github.com:nestauk/asf_public_discourse_home_decarbonisation.git
cd asf_public_discourse_home_decarbonisation
git checkout 01_initial_analysis_mse
make install
direnv allow
conda activate asf_public_discourse_web_scraping
Review
Hey @helloaidank and @lizgzil, thanks a lot for taking the time to review this PR. @crispy-wonton, I've also tagged you as you said you'd like to take a look. Feel free to spend as much or as little time on this as you like; I really appreciate it.
Scripts to be reviewed:
The following scripts need to be reviewed in this PR:
asf_public_discourse_home_decarbonisation/getters/getter_utils.py
asf_public_discourse_home_decarbonisation/getters/mse_getters.py
asf_public_discourse_home_decarbonisation/utils/plotting_utils.py
asf_public_discourse_home_decarbonisation/utils/text_processing_utils.py
asf_public_discourse_home_decarbonisation/analysis/mse/eda_mse_category_data.py
asf_public_discourse_home_decarbonisation/analysis/mse/initial_text_analysis_category_data.py
Note that the two files in notebooks/ do not need to be reviewed. They serve only as a helper: you can open and run them if any step in the analysis scripts is unclear. To open the files as notebooks, follow the instructions below (also given at the top of the notebook files):
- Run jupytext --to notebook asf_public_discourse_home_decarbonisation/notebooks/mse/name_of_notebook.py
- If the correct kernel (asf_public_discourse_home_decarbonisation) does not come up, run the following in your terminal: python -m ipykernel install --user --name=asf_public_discourse_home_decarbonisation
Code to run
Could you also please run:
python asf_public_discourse_home_decarbonisation/analysis/mse/eda_mse_category_data.py
python asf_public_discourse_home_decarbonisation/analysis/mse/initial_text_analysis_category_data.py
and let me know if they run smoothly.
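If it helps, the two commands above can be wrapped in a small smoke-run helper that reports each script's exit code. This is only a sketch: `run_scripts` is a hypothetical helper, not part of this repository, and it assumes you run it from the repository root with the conda environment activated.

```python
import subprocess
import sys


def run_scripts(script_paths):
    """Run each script with the current interpreter; return {path: exit code}."""
    results = {}
    for path in script_paths:
        completed = subprocess.run(
            [sys.executable, path], capture_output=True, text=True
        )
        results[path] = completed.returncode
        if completed.returncode != 0:
            # Print only the tail of stderr so a failure is easy to paste into a review comment.
            print(f"{path} failed:\n{completed.stderr[-500:]}")
    return results


if __name__ == "__main__":
    # Paths as listed in this PR; assumed to be relative to the repository root.
    scripts = [
        "asf_public_discourse_home_decarbonisation/analysis/mse/eda_mse_category_data.py",
        "asf_public_discourse_home_decarbonisation/analysis/mse/initial_text_analysis_category_data.py",
    ]
    print(run_scripts(scripts))
```

An exit code of 0 for both scripts means they ran to completion; any other value indicates a failure, with the end of the error output printed above the summary.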
Things to pay special attention to:
Note that
Thanks a lot in advance!
Checklist:
notebooks/
pre-commit and addressed any issues not automatically fixed
dev
READMEs