Feature: Even better re-do of OCR #1451
…tion common. Actually creates updated file now
Pull Request Test Coverage Report for Build 2916177984 (Coveralls)
Oh hey, this fixes some confusion I had yesterday with this feature. I was trying to manually fix the page orientation of some originals that had already been parsed, and the redo OCR didn't seem to be doing anything at all. If the redo OCR button will now regenerate all the previews shown in the web interface, that would be super!
Nice, as advertised.
> If the redo OCR button will now regenerate all the previews shown in the web interface, that would be super!
It does exactly that from what I can tell in testing.
The best way I could tell something was happening was with the archive file checksum displayed in the metadata. Content never seemed to change, which makes sense if the OCR was good the first time.
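For anyone who wants to run the same check outside the web interface, here is a minimal sketch of comparing an archive file's checksum before and after a redo. The helper names (`file_checksum`, `archive_changed`) are hypothetical, and SHA-256 is an assumption used for illustration; the project may store a different hash in its metadata.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the hex digest of a file, read in chunks to keep memory use low.

    SHA-256 is an assumption here; swap in whichever algorithm the
    metadata view actually reports.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_changed(path, previous_checksum):
    """True if the file at `path` no longer matches the recorded checksum."""
    return file_checksum(path) != previous_checksum
```

Record the checksum before triggering the redo, then call `archive_changed` afterwards. A changed checksum shows the archive file was rewritten, even when the extracted text content stays the same.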
The checksum works too, I didn't think of that. I uploaded an upside-down image, noticed the garbled OCR and upside-down preview, rotated it upright in the filesystem
Yea, this worked well for me too and updated the thumbnail etc.
Proposed change
When first creating the redo OCR functionality, I didn't realize the `document_archiver` command already existed to re-parse AND remake the archive file. My attempt only updated the OCR content in the database. With that knowledge, this change improves the functionality so that it not only redoes OCR but also makes a new archive file, all accessible from the frontend as before. The functionality is moved out of the management command and into the tasks, where either an async task or the command can call it.
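The restructuring described above can be sketched roughly as follows. This is an illustrative outline only, not the code in this PR: `Document`, `redo_ocr`, `fake_parse`, `run_from_command`, and `run_from_queue` are all hypothetical names, the parser is a stand-in for real OCR, and the SHA-256 checksum is an assumption. The point is just that one shared function updates both the content and the archive file, and both entry points call it.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Document:
    # Hypothetical stand-in for the project's document model.
    pk: int
    original: bytes
    content: str = ""
    archive: Optional[bytes] = None
    archive_checksum: Optional[str] = None


# A parser takes the original bytes and returns (text, archive bytes).
Parser = Callable[[bytes], Tuple[str, bytes]]


def redo_ocr(document: Document, parse: Parser) -> Document:
    """Shared task body: re-parse the original, then refresh the
    content AND the archive file, not the content alone."""
    text, archive_bytes = parse(document.original)
    document.content = text
    document.archive = archive_bytes
    # Checksum algorithm is an assumption for this sketch.
    document.archive_checksum = hashlib.sha256(archive_bytes).hexdigest()
    return document


def fake_parse(original: bytes) -> Tuple[str, bytes]:
    # Placeholder for real OCR: upper-case the text, wrap the bytes.
    return original.decode("utf-8").upper(), b"ARCHIVE:" + original


def run_from_command(documents: List[Document]) -> List[Document]:
    # Management-command path: call the shared function synchronously.
    return [redo_ocr(doc, fake_parse) for doc in documents]


def run_from_queue(document: Document, enqueue: Callable) -> Document:
    # Async path: hand the very same function to a task queue.
    return enqueue(redo_ocr, document, fake_parse)
```

Keeping the logic in a plain function means the frontend-triggered async task and the command-line path cannot drift apart, since neither owns its own copy of the re-parse steps.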