Rehearser is an application for aiding in the reading of research papers.
(Image includes text from the "Demo Paper" [1])
To use the application, the only requirements are Docker and Docker Compose.
To run the server, execute the command docker-compose up -d in the root directory.
Once the server is running, open the application at localhost:8000. Note that the application needs ports 8000 (used by http-server) and 5000 (used by Flask). If either port is already in use, the application will not run.
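Because a busy port stops the application from starting, it can help to check both ports before running Docker Compose. The following is a minimal sketch, not part of Rehearser itself: the function names are hypothetical, and it only detects a listener that accepts connections on 127.0.0.1.

```python
import socket

# Ports named in this README: 8000 (http-server) and 5000 (Flask).
REQUIRED_PORTS = {8000: "http-server", 5000: "flask"}


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts a TCP connection on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=REQUIRED_PORTS):
    """Return the ports from `ports` that already have a listener."""
    return [port for port in ports if not port_is_free(port)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        for port in taken:
            print(f"Port {port} ({REQUIRED_PORTS[port]}) is already in use")
    else:
        print("Ports 8000 and 5000 look free")
```

Run it before starting the server. If it reports a busy port, stop the process that holds the port first.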
- Scrape a research paper to extract the narrational text.
- Convert scraped text to audio with segmented text-audio pairs for the original text.
- Provide a minimalistic front end for uploading documents, downloading text, and viewing the text-aligned audio with minimal playback controls.
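To illustrate what "segmented text-audio pairs" can look like, here is a minimal sketch under stated assumptions. The `Segment` class, the regex sentence splitter, and the audio file naming scheme are all hypothetical and are not taken from Rehearser's code. The real application may chunk text differently.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    index: int       # position of the chunk in the original text
    text: str        # the text chunk that gets spoken
    audio_file: str  # hypothetical name of the audio file for this chunk


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def build_segments(text: str, paper_id: str) -> List[Segment]:
    """Pair each sentence with the audio file a TTS step would produce."""
    return [
        Segment(index=i, text=sentence, audio_file=f"{paper_id}_{i:04d}.wav")
        for i, sentence in enumerate(split_sentences(text))
    ]


if __name__ == "__main__":
    for segment in build_segments("First sentence. Second one! Third?", "demo"):
        print(segment.index, segment.audio_file, segment.text)
```

Keeping the index, text, and audio file together in one record is one way a front end could highlight the text that matches the audio being played.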
The application uses a few key components:
- GROBID - The current library for parsing research papers. Docker Compose starts a GROBID Docker image, to which PDFs are sent for extraction into XML.
- BeautifulSoup - Used for parsing the XML generated by GROBID to find the parts of the research paper which are most relevant.
- ESPNet - Framework used for executing TTS functionality in Python, currently used to convert text chunks into audio files.
- ljspeech_fastspeech2 - The FastSpeech2 model, trained on the LJSpeech dataset, used for speech conversion.
- PyTorch - Machine learning framework responsible for running the ESPNet models.
- nltk - Natural language processing framework necessary for grapheme-to-phoneme conversion in TTS.
- Flask - API framework used to host endpoints in Python.
- SQLite3 - Lightweight database used to store meta information and processing status on uploaded papers.
- Celery - Asynchronous task queue used to queue paper processing requests and run them asynchronously.
- Redis - In-memory data store used to store tasks for Celery workers.
- http-server - Static HTTP server used for serving the frontend application.
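The XML that GROBID produces is in the TEI format, so the extraction step amounts to walking that XML and keeping the body paragraphs. The sketch below shows the idea under stated assumptions. It uses Python's standard `xml.etree.ElementTree` instead of BeautifulSoup so that it runs without extra packages. The `extract_paragraphs` function and the sample document are hypothetical and are not Rehearser's actual parsing code.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny hand-written TEI sample (hypothetical, not real GROBID output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Demo</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div><head>Introduction</head>
      <p>First   paragraph.</p>
      <p>Second <hi>styled</hi> paragraph.</p>
    </div>
    <figure><head>Figure 1</head><figDesc>A chart.</figDesc></figure>
  </body></text>
</TEI>"""


def extract_paragraphs(tei_xml):
    """Return the text of each <p> that sits directly inside a body <div>."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:text/tei:body", TEI_NS)
    if body is None:
        return []
    paragraphs = []
    for p in body.iterfind(".//tei:div/tei:p", TEI_NS):
        # Join nested text nodes and collapse runs of whitespace.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE_TEI):
        print(paragraph)
```

Selecting only paragraphs inside section `div` elements leaves out the header metadata and the figure block in this sample. Real GROBID output is more varied, so a filter like this is a starting point, not a complete solution.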
This tool is still in early iterations. Please exercise caution with use.
- The parsing may miss text (e.g., a few lines at the top of a page) or add undesired text (e.g., the contents of a figure).
- Citations are currently missing from the extracted text and audio.
The "demo paper" used in this project is:
[1] Meuschke, Norman, et al. "A benchmark of PDF information extraction tools using a multi-task and multi-domain evaluation framework for academic documents." International Conference on Information. Cham: Springer Nature Switzerland, 2023.
