Please consider explaining some details in your documentation:
What are WER and CER? If you are not familiar with these terms, you don't grasp them immediately, although the concept itself is easy.
Does dinglehopper automatically recognize the import format?
How are text files and XML files compared? Are the XML files simply stripped down to their text representation? How do you ensure that no additional (or missing) empty paragraph skews the evaluation?
Also, please provide a --verbose parameter. I just ran a comparison of a text file with an XML file (both not part of any OCR-D process) and waited forever. Finally, I aborted the process. It would be nice to know what the issue was internally. (Moved to #30)
And, despite this critique, thank you for providing such a handy tool! :)
Edit:
I found even more:
How can I process a bunch of ground truth files that are not part of the OCR-D mets.xml? Or, how can I assign them to their corresponding pages in the mets.xml? There should be some way!
What are WER and CER? If you are not familiar with these terms, you don't grasp them immediately, although the concept itself is easy.
Does dinglehopper automatically recognize the import format?
How are text files and XML files compared? Are the XML files simply stripped down to their text representation? How do you ensure that no additional (or missing) empty paragraph skews the evaluation?
I'll address this in the documentation. Answering briefly here:
Word and character error rates; e.g., if 1 in 10 characters is wrong (due to insertion, deletion, or substitution), then CER = 1/10 = 0.1.
Yes, it detects whether a file is ALTO or PAGE and falls back to plain text.
It's always the extracted text that is compared. You can (and should) have a look at the visual comparison of the text. (If you have any specific data, please send it and I'll have a look.)
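To make the CER definition above concrete, here is a minimal, illustrative sketch (not dinglehopper's actual implementation) of CER as edit distance divided by reference length:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, ocr: str) -> float:
    """Character error rate: edits needed, relative to the reference length."""
    return levenshtein(reference, ocr) / len(reference)

# 1 wrong character out of 10 → CER = 0.1
print(cer("0123456789", "012345678X"))
```

WER works the same way, only on word tokens instead of characters.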
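The format detection mentioned above can be pictured roughly like this; `sniff_format` is a hypothetical helper for illustration, not dinglehopper's actual code:

```python
from xml.etree import ElementTree as ET

def sniff_format(filename: str) -> str:
    """Guess whether a file is ALTO, PAGE, or plain text by its XML root element."""
    try:
        root = ET.parse(filename).getroot()
    except ET.ParseError:
        return "text"  # not well-formed XML → treat as plain text
    tag = root.tag.lower()  # includes the namespace, e.g. "{...alto...}alto"
    if "alto" in tag:
        return "alto"
    if "pcgts" in tag:  # the PAGE XML root element is PcGts
        return "page"
    return "text"
```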
Also, please provide a --verbose parameter. I just ran a comparison of a text file with an XML file (both not part of any OCR-D process) and waited forever. Finally, I aborted the process. It would be nice to know what the issue was internally.
Heh, yeah I'll consider a progress bar. Could you please open a second issue for this?
And, despite this critique, thank you for providing such a handy tool! :)
Thanks 🥇 ;) Consider giving the project a star here on GitHub!
How can I process a bunch of ground truth files that are not part of the OCR-D mets.xml? Or, how can I assign them to their corresponding pages in the mets.xml? There should be some way!
This is out of scope for this tool, but this workshop material by @kba should help: https://ocr-d.de/slides/2019-03-25-dhd/praxis-new-mets – You need to add the GT to the METS with the right page id (matching the OCR and image files). So for a GT XML with page id P0015, it should be (untested): `ocrd workspace add -g P0015 -G OCR-D-GT-PAGE -m application/vnd.prima.page+xml -i OCR-D-GT-PAGE_0015 OCR-D-GT/OCR-D-GT-PAGE_0015.xml`