This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
mismatch in sequence of words in result.export() #528
Comments
Thank you for your interest in docTR! If I understand correctly, your problem is the ordering of boxes in the output (boxes are not mapped to the correct lines/blocks and/or blocks are not ordered). We use box coordinates to reconstruct lines, and hierarchical clustering of lines to find blocks, but this is not a very robust approach, especially when the page has many columns. Since I don't have access to the document, could you plot or list the content of the different lines and/or blocks to help me with this? Thanks a lot 🙏
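One way to list the content of each line and block, as requested above, is to walk the dictionary returned by result.export(). This is only a minimal sketch: it assumes the pages → blocks → lines → words nesting used in the report below, and `list_lines` and `sample_export` are hypothetical names with illustrative values, not part of docTR.

```python
from typing import Any, Dict, List


def list_lines(export: Dict[str, Any]) -> List[str]:
    """Return one string per line, tagged with its page, block and line index."""
    rows = []
    for p, page in enumerate(export["pages"]):
        for b, block in enumerate(page["blocks"]):
            for l, line in enumerate(block["lines"]):
                text = " ".join(word["value"] for word in line["words"])
                rows.append(f"page {p} | block {b} | line {l}: {text}")
    return rows


# Hypothetical stand-in for result.export(); the values are illustrative only.
sample_export = {
    "pages": [
        {
            "blocks": [
                {"lines": [{"words": [{"value": "HINDI"}, {"value": "100"}]}]},
                {"lines": [{"words": [{"value": "ENGLISH"}, {"value": "33"}]}]},
            ]
        }
    ]
}

for row in list_lines(sample_export):
    print(row)
```

With a real document, passing result.export() instead of `sample_export` would show which words ended up in which line and block.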
Hi @charlesmindee, thanks for replying. Here is a duplicate of the document I used; I hope it helps you in solving the issue. Image source: Google Images. The above image can be found at the link below: Thanks a lot
The option to resolve page lines and blocks is not activated by default; you need to activate it in the … I activated the option, and it does not work well with your document: as I mentioned above, our lines/blocks resolution algorithm is not very robust. What you can do is try to modify the geometrical parameters of the line resolution function in the builder, or directly use the coordinates of the boxes in the output to reorder the boxes as you wish. I am sorry for this malfunction; we are going to work on table comprehension/reconstruction, as suggested in #524, in the coming weeks, and it may help you with this! 😄 Best
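The second suggestion above, reordering the boxes from their coordinates, could look roughly like the sketch below. It is not docTR's own resolution algorithm. It assumes each exported word carries a `geometry` entry of the form ((xmin, ymin), (xmax, ymax)) in relative page coordinates. The helper names and the `y_tol` threshold are hypothetical and would need tuning per document.

```python
from typing import Any, Dict, List

Word = Dict[str, Any]


def flatten_words(page: Dict[str, Any]) -> List[Word]:
    """Collect every word of an exported page, ignoring blocks and lines."""
    return [w for block in page["blocks"] for line in block["lines"] for w in line["words"]]


def y_center(word: Word) -> float:
    (_, ymin), (_, ymax) = word["geometry"]
    return (ymin + ymax) / 2


def x_min(word: Word) -> float:
    return word["geometry"][0][0]


def group_into_rows(words: List[Word], y_tol: float = 0.01) -> List[List[Word]]:
    """Group words whose vertical centers are close, then sort each row left to right."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=y_center):
        # A word joins the current row if it is vertically close to the last word added.
        if rows and abs(y_center(word) - y_center(rows[-1][-1])) <= y_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [sorted(row, key=x_min) for row in rows]


def rows_to_text(rows: List[List[Word]]) -> List[str]:
    return [" ".join(w["value"] for w in row) for row in rows]


# Illustrative words in scrambled order (coordinates are made up).
sample_words = [
    {"value": "100", "geometry": ((0.50, 0.10), (0.55, 0.12))},
    {"value": "HINDI", "geometry": ((0.10, 0.101), (0.20, 0.121))},
    {"value": "ENGLISH", "geometry": ((0.10, 0.15), (0.22, 0.17))},
    {"value": "33", "geometry": ((0.60, 0.149), (0.63, 0.169))},
    {"value": "100", "geometry": ((0.50, 0.151), (0.55, 0.171))},
]
print(rows_to_text(group_into_rows(sample_words)))
```

On a real result, `flatten_words(result.export()["pages"][0])` would supply the word list. Because the grouping compares each word only with the previous one in the row, skewed scans may need a larger tolerance or a deskewing step first.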
Thanks for the suggestion. Looking forward to table comprehension/reconstruction. Regards
Actually, I am looking to extract information from the table. I initially proceeded with regex, but because of the current mismatch in the alignment of words, the same regex might not be suitable in the long run once the issue is resolved. Could you please let me know whether there is any chance of including key information extraction (KIE) models in the pipeline at present, or suggest an alternative approach to build our own custom KIE that can be added as postprocessing for docTR? Thanks and Regards
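As a rough illustration of the regex-based postprocessing mentioned above, the sketch below parses one correctly ordered table row into a subject, a run of numeric columns and trailing words. The pattern, the `parse_row` helper and the field names are hypothetical; the report does not say what each column means, so the numbers are kept as an unlabelled list of strings.

```python
import re
from typing import Dict, List, Optional, Union

# Subject text (no digits), then whitespace-separated numbers, then any trailing words.
ROW_PATTERN = re.compile(
    r"^(?P<subject>\D+?)\s+(?P<numbers>\d+(?:\s+\d+)*)\s*(?P<remarks>.*)$"
)


def parse_row(row: str) -> Optional[Dict[str, Union[str, List[str]]]]:
    """Split a row into subject, numeric columns and trailing words, or return None."""
    match = ROW_PATTERN.match(row.strip())
    if match is None:
        return None
    return {
        "subject": match.group("subject"),
        # Numbers stay as strings so leading zeros such as "078" are preserved.
        "numbers": match.group("numbers").split(),
        "remarks": match.group("remarks").split(),
    }


# Row taken from the expected output in the report.
print(parse_row("HINDI (SPECIALI) 100 33 078 078 SEVEN EIGHT DISTN"))
```

A pattern like this only works once the words of a row are in reading order, which is why the row reconstruction has to come first.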
@PoornaSaiNagendra
@felixdittrich92
I am moving this to a discussion so that we can keep discussing it there and close the bug issue.
🐛 Bug
The sequence of words output by result.export() does not match the order of the words in the input image: the columns are swapped.
To Reproduce
Steps to reproduce the behavior:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images("/path/to/the/image")
result = model(doc)
result.show(doc)
json_output = result.export()
# Collect the words of the first line of the first block on the first page
words_dic = json_output['pages'][0]['blocks'][0]['lines'][0]['words']
num_words = len(words_dic)
words_list = []
for word in range(num_words):
    res = words_dic[word]['value']
    words_list.append(res)
total_text = ' '.join(words_list)
I can't provide the complete image due to privacy issues but I am providing the desired part of the image for my use case.
Expected behavior
The output I am getting is:
HINDI (SPECIALI EVEN EIGHT 100 078 DISTIN 33 078 ENGLISH GENERAL) HIVE TWO 33 100 052 052 SANSKRIT GENERAL AIVEN TWO 100 33 072 072 MATHEMATICS 100 33 SUK ONE 061 061 SCIENCE 100 25 08 20 040 060 Sus ZERO SOCIAL SCIENCE 33 100 062 TWO 062
And the expected output is:
HINDI (SPECIALI) 100 33 078 078 SEVEN EIGHT DISTN ENGLISH (GENERAL) 100 33 052 052 FIVE TWO SANSKRIT GENERAL TWO 100 33 072 072 SEVEN TWO MATHEMATICS 100 33 061 061 SIX ONE SCIENCE 100 25 08 040 20 060 SIX ZERO SOCIAL SCIENCE 100 33 062 062 SIX TWO
Environment
I am using Google Colab free version
Collecting environment information...
DocTR version: 0.4.0
TensorFlow version: 2.6.0
PyTorch version: 1.9.0+cu111 (torchvision 0.10.0+cu111)
OpenCV version: 4.5.3
OS: Ubuntu 18.04.5 LTS
Python version: 3.7
Is CUDA available (TensorFlow): No
Is CUDA available (PyTorch): No
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
Additional context
The above image is the cropped output from result.show(doc).
Thanks for any help you can provide in resolving this issue.