Fast scanning ADF with long post-processing steps will consume all resources #4

jarrodsfarrell · 2019-05-31T17:54:08Z

Every page spawns a new instance of the scan_perpage script (unless verbose logging is enabled), so if the scanner is scanning pages rapidly, it'll spawn too many processes and consume all resources as a result.

Perhaps the number of concurrent scripts should be limited to the number of CPU cores the host has.
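
A rough sketch of what that limit could look like in plain bash, assuming bash 4.3 or newer for `wait -n`. The `process_page` function is a hypothetical stand-in for the per-page script, and none of this is taken from the project's code:

```shell
#!/usr/bin/env bash
# Sketch: cap concurrent per-page jobs at the number of CPU cores.
# process_page is a hypothetical placeholder, not the real scan_perpage script.

max_jobs=$(nproc 2>/dev/null || echo 2)   # fall back to 2 if nproc is unavailable

process_page() {
  # Stand-in for slow post-processing (OCR, deskew, ...).
  sleep 0.1
  echo "done $1"
}

run_limited() {
  local running=0 page
  for page in "$@"; do
    if (( running >= max_jobs )); then
      # Pool is full: wait -n returns once any one background job exits.
      wait -n
      running=$((running - 1))
    fi
    process_page "$page" &
    running=$((running + 1))
  done
  wait   # let the remaining jobs finish
}

run_limited page1 page2 page3 page4 page5 page6
```

Pages may finish in any order, but no more than `max_jobs` of them are launched at once.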

rocketraman · 2019-06-01T04:07:15Z

Yup, that was the intended behavior to parallelize the processing. Has running out of resources actually been an issue for you, or is this more of an academic concern? I find it difficult to believe a scanner could scan pages fast enough to cause a problem.

jarrodsfarrell · 2019-06-01T05:03:33Z

Yeah. We have a Fujitsu that can scan up to 60 PPM. I was doing some testing on a laptop with the scanner on duplex, producing ~78 pages, and it'd spawn an absurd number of tesseract processes that consumed two-thirds of the laptop's 16GB of RAM, kept the CPU pegged at 100%, and left all the tesseract processes working at a crawl.

rocketraman · 2019-06-01T06:15:31Z

Nice scanner :-) Ok, good thing to fix.

rocketraman · 2019-08-06T19:21:42Z

@jarrodsfarrell Probably the easiest way I've found to do this is to use sem from the GNU parallel project, but it will introduce another (optional) dependency. It's widely available so I don't have a problem with adding this, but would that work for your situation?
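
For readers unfamiliar with sem, here is a minimal sketch of the idea, not the change that ended up in the project. The semaphore name, the temporary directory, and the `touch` command standing in for per-page work are all hypothetical. Because sem is optional, the sketch falls back to unthrottled background jobs when it is not installed:

```shell
#!/usr/bin/env bash
# Sketch only: throttle per-page work with sem when GNU parallel is installed.
# The semaphore name and the touch command are hypothetical stand-ins.

sem_id="scan-demo-$$"
workdir=$(mktemp -d)

queue_page() {
  local marker="$workdir/$1.done"
  if command -v sem >/dev/null 2>&1; then
    # -j+0 allows one job per CPU core; sem blocks until a slot is free,
    # then runs the command in the background.
    sem --id "$sem_id" -j+0 touch "$marker"
  else
    # sem is optional: without it the job runs unthrottled in the background.
    touch "$marker" &
  fi
}

for page in page1 page2 page3; do
  queue_page "$page"
done

# Wait for queued jobs: sem --wait for the semaphore, plain wait for the fallback.
if command -v sem >/dev/null 2>&1; then
  sem --id "$sem_id" --wait
fi
wait

ls "$workdir"
```

The appeal of this approach is that the throttling lives entirely in the wrapper command, so the per-page work itself does not need to change.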

jarrodsfarrell · 2019-08-06T21:19:29Z

Taking a look at the project's man page, it seems perfectly fine to use, and having another dependency is a non-issue.

rocketraman · 2019-08-08T14:34:36Z

@jarrodsfarrell Can you grab the changes in pull #5 and see if that solves your problem? If it works for you, I'll merge it.

jarrodsfarrell · 2019-08-08T17:10:12Z

Currently at work, but I will give it a try whenever I can.

jarrodsfarrell · 2019-08-08T21:31:33Z

Unfortunately we don't have the 60 PPM scanner like before, so I'm using a 25 PPM model instead.

Regardless, it seems like using sem is an overall good change. I think it's even letting the OCR step work a bit faster than running all the tesseract processes at once (less task-switching?), and pauses between scans are noticeably shorter (the scan process doesn't have to fight as much for resources?). An additional bonus is that the movement of the console is a good indicator that work is still being done, instead of it staying still until the tesseract processes begin quitting.

~~Anyways, should the last argument be erroring like this?~~

USER@HOST:~/Workspace/sane-scan-pdf$ ./scan -d -m color --crop --deskew --ocr out.pdf
Unknown argument: out.pdf

Never mind. It'd help if I read the documentation.

rocketraman · 2019-08-09T00:55:53Z

Thanks for reporting and testing. I'll merge this.

MoD01 · 2020-04-05T07:03:59Z

> Has running out of resources actually been an issue for you, or is this more of an academic concern?

I use my Raspberry Pi 4 because my ScanSnap has no WebDAV or FTP feature. The Pi's resources run out very quickly.

@rocketraman Can you please add sem as an additional requirement in the README? The lack of this information cost me some time debugging the bottleneck, until I found this closed ticket, which pointed me to the "if sem is installed" code that solves the problem :)

rocketraman · 2020-04-05T07:23:23Z

> @rocketraman Can you please add sem as an additional requirement in the README? The lack of this information cost me some time debugging the bottleneck, until I found this closed ticket, which pointed me to the "if sem is installed" code that solves the problem :)

It's already listed under optional requirements, but perhaps this issue deserves a more extensive call out.

rocketraman · 2020-04-05T21:16:54Z

@MoD01 I added an explanatory line in features for future people in your situation...

rocketraman self-assigned this Aug 6, 2019

rocketraman mentioned this issue Aug 8, 2019: Constrain per page processing CPU usage #5 (Merged)

rocketraman closed this as completed Aug 9, 2019

rocketraman added the enhancement label Apr 5, 2020
