Effect of resolution #223

with-him777 · 2022-12-07T05:58:56Z

Sorry to bother you again. I found that the resolution of an input image has a big impact on recognition quality. For example, if some input images are scaled down to 80% or up to 120% of their original resolution, the output changes significantly, which makes the recognition results unreliable.

lukas-blecher · 2022-12-08T15:24:04Z

I've noticed that too, which is why I trained a small classification model to determine what resolution the input image should have.
I've noted it in the README, and I also include the train_resizer.py script for completeness.

Did you use the CLI or the GUI for your experiment? By default, the images should be resized there.

uniartisan · 2022-12-30T13:03:25Z

I'm quite interested in this problem. I'm wondering whether there is enough image augmentation during training.
To be honest, I'm still fairly new to this project. But from reading the source code, I think the images could be resized randomly or periodically during training. I've seen something similar in quite a few image tasks, where the resolution varies from 224x224 all the way up to 1024x1024.
In my opinion, LaTeX-OCR could try the same approach. I will try this idea a few weeks after the new year. By the way, what kind of graphics card do I need to train this task? My desktop may not be able to handle it.

LaTeX-OCR/pix2tex/dataset/transforms.py

Line 4 in 44d70eb

train_transform = alb.Compose(

lukas-blecher · 2022-12-30T16:26:52Z

When creating the dataset, I already varied the resolution of the formulas to some extent. The model only supports images whose dimensions are multiples of the patch size, so I tried to create a diverse dataset from the beginning and to enhance it during training with the suggestions you mentioned.
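To illustrate the patch-size constraint mentioned above, here is a minimal sketch that rounds an image size up to the next multiple of the patch size. The patch size of 16 and the function name `padded_size` are assumptions for illustration only; the actual value comes from the model configuration, and this is not taken from the pix2tex code.

```python
import math
from typing import Tuple

PATCH_SIZE = 16  # assumed value for illustration; check the model config


def padded_size(width: int, height: int, patch: int = PATCH_SIZE) -> Tuple[int, int]:
    """Round each dimension up to the next multiple of `patch`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (
        math.ceil(width / patch) * patch,
        math.ceil(height / patch) * patch,
    )


print(padded_size(130, 33))  # (144, 48)
print(padded_size(128, 32))  # (128, 32), already aligned
```

An image would then be padded (or resized) to the returned size before being fed to the model.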

uniartisan · 2022-12-31T02:35:23Z

> When creating the dataset I already varied the resolution of the formulas to some extent. The model supports only images with dimensions that are multiples of the patch size, so I tried to create a diverse dataset from the beginning and enhance it during training time with the suggestions you mentioned.

Sorry, my English may not be very good. If I understand correctly (plus reading the code), you are diversifying the data when creating the dataset, but not periodically and/or randomly changing the data resolution during training.
What I mean is, change the image resolution again during training, mainly by making some small changes in the aspect ratio of the initial image, and then resizing the entire image, say 2-3 times larger or reduced to half the original size.

albumentations.augmentations.geometric.resize.RandomScale
albumentations.augmentations.geometric.resize.SmallestMaxSize

https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/#resizing-transforms-augmentationsgeometricresize
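As a rough sketch of the idea described above (a small aspect-ratio change followed by a random resize between half and three times the original size), the target size could be sampled like this. It uses only the standard library so as not to assume any albumentations signatures; the function name, the default ranges, and the jitter value are all hypothetical choices, not the project's settings.

```python
import random
from typing import Optional, Tuple


def random_rescaled_size(
    width: int,
    height: int,
    scale_range: Tuple[float, float] = (0.5, 3.0),  # assumed: half size up to 3x
    aspect_jitter: float = 0.1,                     # assumed: +/-10% width stretch
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Sample a new (width, height) for one training image."""
    rng = rng or random.Random()
    # One global scale factor applied to both dimensions.
    scale = rng.uniform(*scale_range)
    # A small extra stretch on the width only, which changes the aspect ratio.
    stretch = rng.uniform(1.0 - aspect_jitter, 1.0 + aspect_jitter)
    new_w = max(1, round(width * scale * stretch))
    new_h = max(1, round(height * scale))
    return new_w, new_h


rng = random.Random(0)  # seeded only to make the demo repeatable
for _ in range(3):
    print(random_rescaled_size(200, 64, rng=rng))
```

In a real pipeline the sampled size would be passed to the resize step of the training transform, and would still need to respect the model's patch-size constraint.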

At the same time, I saw that PaddleOCR sets a reference size for character recognition once a text box has been detected. I wonder whether a reference size could also be set for formula recognition, so that the different image resolutions are multiples of the reference size.

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml#L95
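One possible reading of the reference-size idea is sketched below: the image height is snapped to the nearest whole multiple of a reference height, and the width is scaled by the same factor. The reference height of 32 and the function name `fit_to_reference` are assumptions for illustration; this is neither PaddleOCR's nor LaTeX-OCR's implementation.

```python
from typing import Tuple

REF_HEIGHT = 32  # hypothetical reference height in pixels


def fit_to_reference(width: int, height: int, ref_height: int = REF_HEIGHT) -> Tuple[int, int]:
    """Scale an image so its height becomes a multiple of `ref_height`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Nearest whole multiple of the reference height, at least one.
    k = max(1, round(height / ref_height))
    target_h = k * ref_height
    # Same factor for the width, so the aspect ratio is preserved.
    scale = target_h / height
    target_w = max(1, round(width * scale))
    return target_w, target_h


print(fit_to_reference(300, 100))  # (288, 96)
print(fit_to_reference(50, 10))    # (160, 32)
```

A single-line formula would then land near one reference height, while taller inputs such as fractions or matrices would map to two or more.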
