Automatically downscale large input images #15

Open
robertknight opened this issue Jan 7, 2024 · 8 comments
@robertknight
Owner

robertknight commented Jan 7, 2024

Input images from cameras etc. often have a much higher resolution than is needed to read the text. Downscaling the image can often produce the same output in much less time, because every step in the pipeline that operates directly on the input image has less memory to move around and less computation to do when the image is smaller.

As an example, I ran ocrs on an invoice I'd received from a tradesman recently. The photo of the invoice was 2479 x 3337 pixels and ocrs takes about 1.5s to process it on my Intel Mac. Downsizing to 30% of the original input size produces the same extracted output but runs nearly twice as fast (800-900ms).

In some cases the input image really does need high resolution to make the text legible, so some mechanism to control this would be useful.
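One simple mechanism would be to cap the longest image side and downscale only when the input exceeds that cap. The sketch below is a minimal illustration of that idea; `downscale_factor` and the `max_side` threshold are hypothetical names for this example, not part of the ocrs API.

```rust
/// Compute a downscale factor so that the longest image side does not
/// exceed `max_side`. Returns 1.0 if the image is already small enough,
/// so small images are never upscaled.
fn downscale_factor(width: u32, height: u32, max_side: u32) -> f32 {
    let longest = width.max(height);
    if longest <= max_side {
        1.0
    } else {
        max_side as f32 / longest as f32
    }
}

fn main() {
    // The invoice photo from the example above: 2479 x 3337 pixels.
    let (w, h) = (2479u32, 3337u32);
    let factor = downscale_factor(w, h, 1200);
    let new_w = (w as f32 * factor).round() as u32;
    let new_h = (h as f32 * factor).round() as u32;
    println!("scale by {factor:.3} -> {new_w} x {new_h}");
}
```

A user-facing flag could then override the cap (or disable it) for inputs that genuinely need full resolution.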

@robertknight robertknight mentioned this issue Jan 8, 2024
@DravidVaishnav

Hey @robertknight, thanks for creating the ocrs Rust crate. I'm using it to detect text on cards (e.g. licenses), but detection takes 70-120 seconds. I've tried downscaling, which brings it down to 30-40 seconds, but I can only go so far: beyond a certain point the text is no longer detected accurately, since the original card image is already small. Is there any way I can speed up the detection process?

@robertknight
Owner Author

Can you provide one or more images representative of the ones you are trying to extract data from, along with some details of the system you are running the extraction on (what CPU? how many cores? etc.)? Make sure not to include identifiable information about a real person.

How many images are you processing in total in the time period that you quoted?

@DravidVaishnav

DravidVaishnav commented Feb 20, 2024

[attached image: aadharsample]

I have 16 GB RAM and an 8-core i7 CPU, and I'm processing a single image.

@robertknight
Owner Author

That single image takes 850 milliseconds on my i5 laptop with `ocrs image.png`. Are you using a release build, or at least building the `rten-*` dependencies in release mode? Debug builds of those crates are extremely slow in comparison.
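For reference, a standard Cargo technique (not specific to this project) is to optimize all dependencies even in dev builds, so the heavy `rten-*` crates run at release speed while the application itself still compiles quickly. A sketch of the relevant `Cargo.toml` fragment:

```toml
# In the Cargo.toml of the application that uses the ocrs library crate.
# Build all dependencies (including rten-*) with full optimizations,
# even when the application itself is built in the dev profile.
[profile.dev.package."*"]
opt-level = 3
```

Alternatively, just build and run the whole binary in release mode with `cargo run --release`.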

@DravidVaishnav

Yes, it is super fast with the CLI; I was trying it via the library. I just saw #7. Thanks for the help, @robertknight!

@SkylerA

SkylerA commented May 2, 2024

> As an example, I ran ocrs on an invoice I'd received from a tradesman recently. The photo of the invoice was 2479 x 3337 pixels and ocrs takes about 1.5s to process it on my Intel Mac. Downsizing to 30% of the original input size produces the same extracted output but runs nearly twice as fast (800-900ms).

Is there a known ideal image size or target text height? I'm testing on single-line text images and found that I can vertically concatenate them and run detect/recognize on the concatenated image for a huge speed gain. I can concatenate about 30-40 slices before recognition accuracy starts to drop, so I imagine at some point I'm triggering internal scaling when the image is passed to the model, which then hurts detection or accuracy.

Looking at some of the model code, am I correct in thinking that the expected image size for detection is 800x600 and the individual detected lines will be scaled to 64px high for recognition?

@robertknight
Owner Author

robertknight commented May 2, 2024

> Looking at some of the model code, am I correct in thinking that the expected image size for detection is 800x600 and the individual detected lines will be scaled to 64px high for recognition?

That's correct. The input is padded to 800x600 if smaller or resized down if larger. In future I'd like to avoid the fixed input size for the detection model, as it is wasteful for small images. In the meantime vertically stacking small images is a good trick for better efficiency.
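The 30-40 slice limit observed above is consistent with back-of-the-envelope arithmetic on the 600px-tall detector input described in this thread. The sketch below is an illustration only; `effective_slice_height` is a hypothetical helper, and the exact resize behavior inside ocrs may differ.

```rust
/// Approximate on-model height of each slice after the detector resizes
/// a vertical stack of `n` slices (each `slice_h` px tall) to fit the
/// assumed 600 px input height. Stacks shorter than 600 px are padded,
/// not upscaled, so the scale factor is capped at 1.0.
fn effective_slice_height(n: u32, slice_h: u32) -> f32 {
    let total = (n * slice_h) as f32;
    let scale = (600.0 / total).min(1.0);
    slice_h as f32 * scale
}

fn main() {
    // With 40 px slices, ~35 slices shrink each line to roughly 17 px
    // in the detector input, which lines up with the reported accuracy
    // drop-off around 30-40 concatenated slices.
    for n in [10u32, 20, 30, 40] {
        println!(
            "{n} slices of 40 px -> {:.1} px per line in the detector input",
            effective_slice_height(n, 40)
        );
    }
}
```

Under these assumptions, stacking pays off until the per-line height falls below whatever the detection model can still localize reliably.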

@SkylerA

SkylerA commented May 3, 2024

Excellent, thanks! Really appreciate this project and excited to explore using RTen with custom ONNX models too.
