Automatically downscale large input images #15

Open
robertknight opened this issue Jan 7, 2024 · 8 comments
@robertknight
Owner

robertknight commented Jan 7, 2024

Input images from cameras etc. often have a much higher resolution than is needed to read the text. Downscaling the image can often produce the same output in much less time, because every step in the pipeline that operates directly on the input image has less memory to move around and less computation to do when the image is smaller.

As an example, I ran ocrs on an invoice I'd received from a tradesman recently. The photo of the invoice was 2479 x 3337 pixels and ocrs takes about 1.5s to process it on my Intel Mac. Downsizing to 30% of the original input size produces the same extracted output but runs nearly twice as fast (800-900ms).

In some cases the input image really does need high resolution to make the text legible, so some mechanism to control this would be useful.
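One simple mechanism would be to cap the longest image side and downscale only when the input exceeds that cap. The sketch below is a minimal illustration of that idea; `downscale_factor` and the `max_side` threshold are hypothetical names for this example, not part of the ocrs API.

```rust
/// Compute a downscale factor so that the longest image side does not
/// exceed `max_side`. Returns 1.0 if the image is already small enough,
/// so small images are never upscaled.
fn downscale_factor(width: u32, height: u32, max_side: u32) -> f32 {
    let longest = width.max(height);
    if longest <= max_side {
        1.0
    } else {
        max_side as f32 / longest as f32
    }
}

fn main() {
    // The invoice photo from the example above: 2479 x 3337 pixels.
    let (w, h) = (2479u32, 3337u32);
    let factor = downscale_factor(w, h, 1200);
    let new_w = (w as f32 * factor).round() as u32;
    let new_h = (h as f32 * factor).round() as u32;
    println!("scale by {factor:.3} -> {new_w} x {new_h}");
}
```

A user-facing flag could then override the cap (or disable it) for inputs that genuinely need full resolution.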

@robertknight robertknight mentioned this issue Jan 8, 2024
@DravidVaishnav

Hey @robertknight, thanks for creating the ocrs Rust crate. I'm using it to detect text on cards (e.g. licenses), but detection takes 70-120 seconds. I've tried downscaling, which brings it down to 30-40 seconds, but I can only go so far: beyond a certain point the text is no longer detected accurately, since the original card image is already small. Is there any way I can speed up the detection process?

@robertknight
Owner Author

Can you provide one or more images representative of the ones you are trying to extract data from, along with some details of the system you are running the extraction on (what CPU? how many cores? etc.)? Make sure not to include identifiable information about a real person.

How many images are you processing in total in the time period that you quoted?

@DravidVaishnav

DravidVaishnav commented Feb 20, 2024

[attached image: aadharsample]

I have 16 GB RAM and an 8-core i7 CPU, and I'm processing a single image.

@robertknight
Owner Author

That single image takes 850 milliseconds on my i5 laptop with `ocrs image.png`. Are you using a release build, or at least building the `rten-*` dependencies in release mode? Debug builds of those crates are extremely slow in comparison.
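For reference, a standard Cargo technique (not specific to this project) is to optimize all dependencies even in dev builds, so the heavy `rten-*` crates run at release speed while the application itself still compiles quickly. A sketch of the relevant `Cargo.toml` fragment:

```toml
# In the Cargo.toml of the application that uses the ocrs library crate.
# Build all dependencies (including rten-*) with full optimizations,
# even when the application itself is built in the dev profile.
[profile.dev.package."*"]
opt-level = 3
```

Alternatively, just build and run the whole binary in release mode with `cargo run --release`.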

@DravidVaishnav

Yes, it is super fast with the CLI; I was trying it via the library. I just saw #7. Thanks for the help, @robertknight!

@SkylerA

SkylerA commented May 2, 2024

> As an example, I ran ocrs on an invoice I'd received from a tradesman recently. The photo of the invoice was 2479 x 3337 pixels and ocrs takes about 1.5s to process it on my Intel Mac. Downsizing to 30% of the original input size produces the same extracted output but runs nearly twice as fast (800-900ms).

Is there a known ideal image size or target text height? I'm testing on single-line text images and found that I can vertically concatenate them and run detect/recognize on the concatenated image for a huge speed gain. I can concatenate about 30-40 slices before recognition accuracy starts to drop, so I imagine at some point I'm triggering internal scaling when the image is passed to the model, which then hurts detection or accuracy.

Looking at some of the model code, am I correct in thinking that the expected image size for detection is 800x600 and the individual detected lines will be scaled to 64px high for recognition?

@robertknight
Owner Author

robertknight commented May 2, 2024

> Looking at some of the model code, am I correct in thinking that the expected image size for detection is 800x600 and the individual detected lines will be scaled to 64px high for recognition?

That's correct. The input is padded to 800x600 if smaller or resized down if larger. In future I'd like to avoid the fixed input size for the detection model, as it is wasteful for small images. In the meantime vertically stacking small images is a good trick for better efficiency.
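The 30-40 slice limit observed above is consistent with back-of-the-envelope arithmetic on the 600px-tall detector input described in this thread. The sketch below is an illustration only; `effective_slice_height` is a hypothetical helper, and the exact resize behavior inside ocrs may differ.

```rust
/// Approximate on-model height of each slice after the detector resizes
/// a vertical stack of `n` slices (each `slice_h` px tall) to fit the
/// assumed 600 px input height. Stacks shorter than 600 px are padded,
/// not upscaled, so the scale factor is capped at 1.0.
fn effective_slice_height(n: u32, slice_h: u32) -> f32 {
    let total = (n * slice_h) as f32;
    let scale = (600.0 / total).min(1.0);
    slice_h as f32 * scale
}

fn main() {
    // With 40 px slices, ~35 slices shrink each line to roughly 17 px
    // in the detector input, which lines up with the reported accuracy
    // drop-off around 30-40 concatenated slices.
    for n in [10u32, 20, 30, 40] {
        println!(
            "{n} slices of 40 px -> {:.1} px per line in the detector input",
            effective_slice_height(n, 40)
        );
    }
}
```

Under these assumptions, stacking pays off until the per-line height falls below whatever the detection model can still localize reliably.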

@SkylerA

SkylerA commented May 3, 2024

Excellent, thanks! Really appreciate this project and excited to explore using RTen with custom ONNX models too.
