Merge pull request #52 from elshimone/add_grobid_concurrency_to_readme
Added note on grobid concurrency configuration to README.
davidmezzetti committed Dec 3, 2023
2 parents 1a09f28 + 36397dd commit 88119cc
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -73,6 +73,8 @@ necessary for PDF files.
- [GROBID install instructions](https://grobid.readthedocs.io/en/latest/Install-Grobid/)
- [GROBID start service](https://grobid.readthedocs.io/en/latest/Grobid-service/)

Note that the GROBID service's concurrency setting defaults to 10. Depending on the number of CPUs in your system, paperetl may exhaust the GROBID engine pool, resulting in a 503 Service Unavailable error response when parsing PDFs. You can avoid this by increasing the concurrency setting in the GROBID configuration file, as described in the [service configuration section](https://grobid.readthedocs.io/en/latest/Configuration/#service-configuration) of the GROBID documentation.
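
If changing the GROBID configuration is not an option, a caller can also wait and retry when it receives a 503. The following is a minimal sketch of that idea and is not part of paperetl; `post_with_retry` and the `send` callable are hypothetical names, and `send` is assumed to wrap whatever HTTP request is made to GROBID and to return a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, str]],
    retries: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it returns a status other than 503.

    send is a hypothetical callable that submits one PDF to GROBID and
    returns (HTTP status code, response body).
    """
    for attempt in range(retries):
        status, body = send()
        if status != 503:
            if status != 200:
                raise RuntimeError(f"Unexpected HTTP status {status}")
            return body
        # Engine pool is exhausted: back off a little longer each time.
        sleep(delay * (attempt + 1))
    raise RuntimeError(f"Service still busy after {retries} attempts")
```

The `sleep` parameter is injectable only so the wait can be replaced in tests. Retrying lowers the chance of a failed parse but does not raise throughput, so increasing the concurrency setting remains the more direct fix.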

### Docker

A Dockerfile with commands to install paperetl, all dependencies and scripts is available in this repository.