Effect of resolution #223

Open
with-him777 opened this issue Dec 7, 2022 · 4 comments

Comments

@with-him777

Sorry to bother you again. I found that the resolution of the input image has a big impact on recognition quality. For example, if some input images are scaled down to 80% or up to 120% of their original resolution, the recognition results change significantly, which makes the output too unpredictable.

@lukas-blecher
Owner

I've noticed that too, which is why I trained a small classification model to determine what resolution the input image should have.
I've noted it in the README, and I also include the train_resizer.py script for completeness.

Did you use the CLI or the GUI for your experiment? By default, the images should be resized there.

@uniartisan

I'm quite interested in this problem. I'm wondering whether there is enough image augmentation during training.
To be honest, I still don't know this project very well. But from reading the source code, I think the images could be resized randomly or periodically during training. I've seen something similar in quite a few image tasks, where the resolution varies from 224x224 all the way up to 1024x1024.
In my opinion, LaTeX-OCR could try the same approach. I will try out a new idea sometime after the new year. By the way, what kind of graphics card do I need to train this model? My desktop may not be able to handle it.

train_transform = alb.Compose(

@lukas-blecher
Owner

When creating the dataset, I already varied the resolution of the formulas to some extent. The model only supports images whose dimensions are multiples of the patch size, so I tried to create a diverse dataset from the beginning and augment it at training time with the suggestions you mentioned.
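
To illustrate the patch-size constraint, here is a minimal sketch that rounds an image size up to the next multiple of the patch size and reports how much padding that would take. The helper name `round_up_to_patch` and the default `patch_size=16` are assumptions made for this example, not values read from the project's config.

```python
import math


def round_up_to_patch(width, height, patch_size=16):
    """Return (new_width, new_height, pad_right, pad_bottom).

    Each dimension is rounded up to the nearest multiple of patch_size.
    patch_size=16 is a placeholder; use the value from your model config.
    """
    new_w = math.ceil(width / patch_size) * patch_size
    new_h = math.ceil(height / patch_size) * patch_size
    return new_w, new_h, new_w - width, new_h - height


if __name__ == "__main__":
    # A 100x33 crop would need 12 px of padding on the right and 15 px below.
    print(round_up_to_patch(100, 33))
```

Padding (rather than stretching) keeps the glyph proportions unchanged, which is one possible choice here; resizing to the rounded size would be the alternative.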

@uniartisan

> When creating the dataset, I already varied the resolution of the formulas to some extent. The model only supports images whose dimensions are multiples of the patch size, so I tried to create a diverse dataset from the beginning and augment it at training time with the suggestions you mentioned.

Sorry, my English may not be very good. If I understand correctly (and from reading the code), you diversify the data when creating the dataset, but you do not periodically and/or randomly change the image resolution during training.
What I mean is to change the image resolution again during training: first make some small changes to the aspect ratio of the original image, then resize the entire image, say enlarging it 2-3 times or shrinking it to half the original size.

albumentations.augmentations.geometric.resize.RandomScale
albumentations.augmentations.geometric.resize.SmallestMaxSize

https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/#resizing-transforms-augmentationsgeometricresize
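
To make the idea concrete, here is a minimal, library-free sketch of how a target size could be sampled per training image: a small random aspect-ratio change followed by a random overall scale between half and three times the original size. The function name `sample_target_size` and the default ranges are my own assumptions for illustration; the actual resize could then be done by any image library, for example the albumentations transforms listed above.

```python
import random


def sample_target_size(width, height, rng,
                       scale_range=(0.5, 3.0), aspect_jitter=0.1):
    """Pick a random output size for one training image.

    scale_range: overall resize factor (0.5 = half size, 3.0 = triple size).
    aspect_jitter: the width is additionally stretched by a factor drawn
    from [1 - aspect_jitter, 1 + aspect_jitter].
    Both defaults are illustrative assumptions, not tuned values.
    """
    scale = rng.uniform(*scale_range)
    stretch = rng.uniform(1.0 - aspect_jitter, 1.0 + aspect_jitter)
    new_w = max(1, round(width * scale * stretch))
    new_h = max(1, round(height * scale))
    return new_w, new_h


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    for _ in range(3):
        print(sample_target_size(320, 64, rng))
```

Since the model only accepts dimensions that are multiples of the patch size, the sampled size would still have to be rounded or padded before it is fed to the network.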

I also saw that in PaddleOCR, once a text box is detected, a reference size is set for character recognition. I wonder whether a reference size could also be set for formula recognition, so that the different image resolutions are multiples of the reference size.

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml#L95
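
A rough sketch of what I mean by a reference size: scale every input so that its height matches a fixed reference height, keeping the aspect ratio, and fall back to a width cap for very long formulas. The name `fit_to_reference` and the defaults `ref_height=64` and `max_width=672` are hypothetical values chosen for this example; they are not taken from PaddleOCR or from this project.

```python
def fit_to_reference(width, height, ref_height=64, max_width=672):
    """Return (new_width, new_height, scale) for a reference-height resize.

    The image is scaled so that its height equals ref_height. If the
    resulting width would exceed max_width, the image is scaled to fit
    max_width instead. Both defaults are placeholders.
    """
    scale = ref_height / height
    new_w = max(1, round(width * scale))
    new_h = ref_height
    if new_w > max_width:
        # Too wide at the reference height: fit the width cap instead.
        scale = max_width / width
        new_w = max_width
        new_h = max(1, round(height * scale))
    return new_w, new_h, scale


if __name__ == "__main__":
    print(fit_to_reference(200, 100, ref_height=50, max_width=1000))
```

For multi-line formulas the overall image height is probably not a good stand-in for the character height, so this is only a starting point for the discussion.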
