hocr-check complains assert doc.xpath("//meta[@name='ocr-id']")!=[] #59
Comments
I would recommend using hocr-spec-python. It's an explicit replacement for hocr-check, which is more or less nonsensical at the moment.
You can try it online here: http://digi.bib.uni-mannheim.de/ocr-fileformat/#validate
Every file I try to upload to the online hocr-spec-python validator (http://digi.bib.uni-mannheim.de/ocr-fileformat/#validate) ends with "NameError: global name 'KeyErrora' is not defined". This happens with hOCR files from both tesseract and gImageReader.
Should be fixed and deployed. BTW: If you have any samples, ideally complicated ones, you're welcome to contribute them to ocr-fileformat-samples.
However, the main problem here we should also fix in
I can make this change, but I just want to make sure that we agree here.
Before you implement this further: I have a branch somewhere where I've done that, let me check. Several, actually.
Ah, no, I did not actually fix it. So yes: should we maybe use hocr-spec-python as a base to reorganize this (#42)? The parsing of
I agree that
Checking whether What's more important is that |
Fix metadata check: ocr-id -> ocr-system, #59
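For illustration, here is a minimal sketch of what such a metadata check could look like if it looked for `ocr-system` and reported problems instead of stopping at a failed `assert`. This is not the actual hocr-check code: it uses the standard library's `html.parser` rather than the XPath query from the issue title, and the names `MetaNameCollector` and `check_metadata` are hypothetical.

```python
from html.parser import HTMLParser


class MetaNameCollector(HTMLParser):
    """Collect the `name` attribute of every <meta> element."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def check_metadata(hocr_text, required=("ocr-system",)):
    """Return a list of problems rather than aborting on the first one.

    `required` defaults to the single name discussed in this issue;
    extending it is an assumption left to the caller.
    """
    collector = MetaNameCollector()
    collector.feed(hocr_text)
    return [
        "missing <meta name='%s'>" % name
        for name in required
        if name not in collector.names
    ]
```

Returning a list lets the caller print every problem and continue with the remaining checks, whereas a bare `assert` raises on the first failure and ends the script.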
Can be reproduced with both tesseract and gImageReader hOCR files.
manisandro/gImageReader#101
Does the script end with this error or is it still checking the other issues?