This is a Data Science project looking into the Peer Review Taxonomy.
It uses open data sources: the analysis draws on reviews from F1000 and is validated against BMJ Open.
The project provides configuration for using either a Python virtual environment or Docker.
- Python 3
make dev-venv
make dev-test
make dev-jupyter-start
- Docker and Docker Compose
make jupyter-start
This will build the Jupyter image and start it via Docker.
make test
The project is structured in the following directories:
- data: data downloaded from other sources
- LDA: Notebooks related to LDA
- notebooks: Notebooks related to preprocessing the text (sentence and token splitting)
- peertax: Python package used by the notebooks
- pickles: Intermediate output files
- scripts: Other scripts (to run LDA models in parallel)
- tests: Python tests for the peertax package
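
To give a rough idea of the sentence and token splitting mentioned for the notebooks directory, here is a minimal sketch using only the Python standard library. The function names `split_sentences` and `tokenize` are hypothetical and are not part of the peertax package; the actual notebooks may well use a dedicated NLP library and different rules.

```python
import re

# Hypothetical helpers for illustration only; these are NOT the peertax API.
# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Assumption: a token is a run of letters/digits, optionally with an apostrophe suffix.
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def split_sentences(text):
    """Split text on whitespace that follows sentence-ending punctuation."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Return lowercase word tokens, dropping punctuation."""
    return [t.lower() for t in _TOKEN.findall(sentence)]


review = "The methods are sound. However, the sample size is small!"
sentences = split_sentences(review)
tokens = [tokenize(s) for s in sentences]
```

A regex-based splitter like this will mis-split abbreviations such as "e.g." or "et al.", so it is only a starting point for understanding the preprocessing step, not a substitute for it.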