Detection model performs poorly when image is scaled (e.g. 1.5x in both dims) #1535
Comments
Hey @ajkdrag 👋, thanks for the feedback. :) With the last runs we have already extended the applied augmentations, but there is still room for additions like zoom (in/out), quality compression, etc. Out of interest, have you also tried the newly trained FAST models from the main branch? :)
I have yet to try the FAST models. I saw the links were updated. Will give it a shot today.
Also, in DBNet the preprocessing resizes the image to 1024x1024. Is it possible that for large rectangular docs like bank checks, resizing to a square will mess things up?
@ajkdrag With keep_aspect_ratio=True (default) and symmetric_pad=True (default) it shouldn't. https://mindee.github.io/doctr/using_doctr/using_models.html#advanced-options
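For reference, a minimal sketch of passing these options to the predictor; in recent docTR versions the documented keyword arguments are `preserve_aspect_ratio` and `symmetric_pad` (check the linked "Advanced options" page for your version, as exact names may differ):

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load a page (e.g. a scanned bank check); the file name is just a placeholder
doc = DocumentFile.from_images("check.jpg")

# Keep the original aspect ratio when resizing to the model's square input,
# padding symmetrically instead of stretching the page
predictor = ocr_predictor(
    det_arch="db_resnet50",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    preserve_aspect_ratio=True,  # default in recent versions
    symmetric_pad=True,          # default in recent versions
)

result = predictor(doc)
```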
I tried the FAST model and it works pretty well, but I expected the FAST model to be "fast" :D It takes about a second per image, whereas Papers with Code mentions it as almost real-time. I am using the main branch with
Hey, yeah 😃 All the papers (DB / FAST) were built for scene text detection in the wild and tested on datasets like IC15, etc.
Got it. FAST works well, but I think I understand the issue now. For images that are "long", i.e. with an aspect ratio of say 1176 x 256, the bin_thresh is really tricky to work with. In my use case (scanned bank checks), I get images of this aspect ratio, and for some batches setting bin_thresh to 0.2 works well, while for others I have to go down to 0.08. Could you suggest some tricks/workarounds for such use cases?
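For context, a hedged sketch of how the binarization threshold mentioned above can be adjusted at inference time. The attribute path follows the docTR documentation on post-processing outputs, and `fast_base` is assumed to be the FAST architecture available on the main branch; both may differ by version:

```python
from doctr.models import ocr_predictor

predictor = ocr_predictor(det_arch="fast_base", reco_arch="crnn_vgg16_bn", pretrained=True)

# Lower the binarization threshold for pages with unusual aspect ratios;
# values like 0.2 or 0.08 (as discussed above) can be set per batch
predictor.det_predictor.model.postprocessor.bin_thresh = 0.2
```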
Hey, sorry I totally missed your message.
Moved to #1604
Bug description
Detection model performs poorly when image is scaled (e.g. 1.5x in both dims)
Code snippet to reproduce the bug
If I do something like the following to my dataset, the detection model performs poorly.
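The original snippet was not preserved here; a minimal illustrative example of the kind of scaling described in the title (1.5x in both dimensions, file name hypothetical):

```python
import cv2

# Illustrative only: upscale an input image by 1.5x in both dimensions
# before passing it to the detection model
img = cv2.imread("sample_check.jpg")
scaled = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_LINEAR)
```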
I am using:
Error traceback
No error, but poor bboxes.
Environment
DocTR version: 0.8.1
TensorFlow version: N/A
PyTorch version: N/A (torchvision N/A)
OpenCV version: N/A
OS: Debian GNU/Linux 11 (bullseye)
Python version: 3.8.18
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): N/A
CUDA runtime version: 11.8.89
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 525.105.17
cuDNN version: Could not collect
Deep Learning backend
is_tf_available: False
is_torch_available: True