This is an example project for the Good Research Code Handbook. It's a reinterpretation of the Zipf's law project from Research Software Engineering in Python. It reuses and modifies some of the code from the original project, which was licensed under a CC-BY license. For this reason, this repo is under a CC-BY 4.0 license.
Make a copy of this repo (e.g. with git clone), cd into the root folder of the repo, and run:
pip install -e .
The project is organized into folders:
- zipf contains the main module code that runs the analysis
- scripts contains scripts that glue the module code together
- tests contains tests of the module code
- data contains the data for the analysis
- results will contain the output of the analysis
cd into the scripts folder and run run_analysis.py via:
python run_analysis.py --in_folder ../data --out_folder ../results
You can then open visualize_results.ipynb in Jupyter to visualize the results.
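For orientation, here is a minimal sketch of the kind of work a script like run_analysis.py does: read every text file in an input folder, count words, and write ranked counts to an output folder. It is not the actual code in this repo. The function names (`count_words`, `analyze_folder`, `main`), the tokenization rule, and the CSV file naming are assumptions made for illustration; only the `--in_folder` and `--out_folder` flags come from the command above.

```python
import argparse
import csv
import re
from collections import Counter
from pathlib import Path


def count_words(text):
    """Count lowercase word occurrences (crude tokenizer: runs of a-z only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def analyze_folder(in_folder, out_folder):
    """Write one <name>_counts.csv per .txt file, most frequent words first."""
    out_dir = Path(out_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for txt_path in sorted(Path(in_folder).glob("*.txt")):
        text = txt_path.read_text(encoding="utf-8", errors="ignore")
        out_path = out_dir / f"{txt_path.stem}_counts.csv"  # hypothetical naming
        with out_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["rank", "word", "count"])
            ranked = count_words(text).most_common()
            for rank, (word, count) in enumerate(ranked, start=1):
                writer.writerow([rank, word, count])
        written.append(out_path)
    return written


def main(argv=None):
    """Parse the same two flags as the command above and run the analysis."""
    parser = argparse.ArgumentParser(description="Count word frequencies per book.")
    parser.add_argument("--in_folder", required=True)
    parser.add_argument("--out_folder", required=True)
    args = parser.parse_args(argv)
    for path in analyze_folder(args.in_folder, args.out_folder):
        print(f"wrote {path}")
```

Under Zipf's law, the count column should fall off roughly in inverse proportion to the rank column. Calling `main(["--in_folder", "../data", "--out_folder", "../results"])` from the scripts folder would mirror the command line above.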
cd into the tests folder and run pytest.
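pytest collects functions whose names start with `test_` from files named `test_*.py` and reports any failed `assert`. As a rough illustration of what such a test file can look like, here is a self-contained sketch. The `count_words` function is a stand-in defined inline so the example runs on its own; the real tests in this repo import from the zipf module, whose API may differ.

```python
from collections import Counter


def count_words(text):
    # Stand-in for the function under test (hypothetical, not the repo's API).
    return Counter(text.lower().split())


def test_count_words_is_case_insensitive():
    # "The", "the" and "THE" should all be counted as the same word.
    assert count_words("The the THE")["the"] == 3


def test_count_words_empty_text():
    # An empty document should produce no counts at all.
    assert count_words("") == Counter()
```

Saved as a file such as test_example.py (a hypothetical name) in the tests folder, these two functions would be picked up by the same pytest command.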
I've pre-populated the data folder with these books from Project Gutenberg:
- Dracula → data/dracula.txt
- Frankenstein → data/frankenstein.txt
- Jane Eyre → data/jane_eyre.txt
You can add more documents to the folder as you wish.