
Error #74

Open
NeoH2333 opened this issue Apr 13, 2024 · 3 comments

Comments

@NeoH2333

Hello

I trust you are all well. I've been encountering an error for the past few days while attempting to process full text from a batch of PDFs using the Python client. Despite my efforts, the error persists. My system specifications are an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz with 8GB RAM. I've tried adjusting parameters such as concurrency in the grobid.yaml file, but unfortunately this hasn't resolved the issue. I'm reaching out to see if there are any additional steps I can take to address this problem. Thank you for your assistance.

ERROR [2024-04-13 20:31:33,322] org.grobid.service.process.GrobidRestProcessFiles: Could not get an engine from the pool within configured time. Sending service unavailable.

@lfoppiano
Collaborator

Hi @NeoH2333,
the default config.json of the client uses a batch_size of 100, which is too big. This number should be consistent with the concurrency value in grobid.yaml.
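For reference, the two settings under discussion live in different files. The snippet below is illustrative only: field names follow the client's and server's sample configs, and the values are placeholders, not recommendations.

```
# config.json (grobid_client, client side)
{
  "grobid_server": "http://localhost:8070",
  "batch_size": 100
}

# grobid.yaml (server side, excerpt)
concurrency: 10
```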

If this does not solve the problem, could you share more information, including both config.json and grobid.yaml files?

@kermitt2
Owner

Hello !

@NeoH2333 8GB is not enough to apply processFulltextDocument to more than one PDF at a time in a safe manner, especially if you are using Deep Learning models on CPU only. Consider using 16GB if possible. Otherwise, set the client-side --n argument to 1.
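Assuming the standard grobid_client_python CLI (the --n flag is named in the comment above; the input/output paths are placeholders), limiting the client to a single concurrent request would look something like this:

```
grobid_client --input ./pdfs --output ./out --n 1 processFulltextDocument
```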

@lfoppiano batch_size only manages how files are fed to the ThreadPoolExecutor; it is not related to server load or to concurrency in grobid.yaml. It can stay at 100 or 1000 without any impact on the server (it will just use a bit more memory on the client side to store the list of paths to the PDFs).
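The distinction can be sketched roughly as follows. This is a toy illustration of the pattern, not the actual grobid_client code; `fake_process` stands in for the real HTTP request to the server:

```python
# Sketch: batch_size only controls how many file paths are sliced off the
# input list at a time; max_workers (the --n argument) is what bounds the
# number of simultaneous requests the server ever sees.
from concurrent.futures import ThreadPoolExecutor


def fake_process(path):
    # Stand-in for the real processFulltextDocument request.
    return f"processed:{path}"


def process_in_batches(paths, batch_size=100, n=1):
    """Submit `paths` in slices of `batch_size`; at most `n` run concurrently."""
    results = []
    with ThreadPoolExecutor(max_workers=n) as executor:
        for start in range(0, len(paths), batch_size):
            batch = paths[start:start + batch_size]
            # Only this slice of paths is held in client memory at once;
            # the server still handles at most `n` requests in parallel.
            results.extend(executor.map(fake_process, batch))
    return results


print(process_in_batches([f"doc{i}.pdf" for i in range(5)], batch_size=2, n=1))
```

So raising batch_size costs a little client memory, while only --n (max_workers) changes the load on the GROBID server.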

@lfoppiano
Collaborator

Ahh, sorry indeed, the batch_size does not impact the number of concurrent requests... 🙏
@NeoH2333 ignore my comment please.
