Feature: Even better re-do of OCR #1451

Merged
merged 1 commit into from Aug 26, 2022

stumpylog
Member

Proposed change

When I first implemented the redo OCR functionality, I didn't realize that the document_archiver command already existed to re-parse a document AND remake its archive file. My implementation only updated the OCR content stored in the database.

With that knowledge, this change improves the feature so that it not only redoes OCR but also creates a new archive file, all accessible from the frontend as before. The logic is moved out of the management command and into the tasks module, where either an async task or the command can call it.
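The restructuring described above, one shared task function with two callers, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `Document`, `update_archive`, `run_command`, and `queue_redo_ocr` are hypothetical names, the OCR and archive steps are placeholders, and a `ThreadPoolExecutor` stands in for whatever task queue the project actually uses.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a stored document record.
    pk: int
    original_text: str
    content: str = ""
    archive_file: Optional[str] = None


def update_archive(doc_id: int, store: Dict[int, Document]) -> Document:
    """Shared task: re-parse one document, refreshing content AND archive."""
    doc = store[doc_id]
    # Placeholder for the real parsing/OCR step.
    doc.content = doc.original_text.strip().lower()
    # Placeholder for writing a freshly generated archive file.
    doc.archive_file = f"archive/{doc.pk:07d}.pdf"
    return doc


def run_command(doc_ids: Iterable[int], store: Dict[int, Document]) -> None:
    """Command-style caller: processes documents one after another."""
    for doc_id in doc_ids:
        update_archive(doc_id, store)


def queue_redo_ocr(
    doc_ids: Iterable[int],
    store: Dict[int, Document],
    executor: ThreadPoolExecutor,
) -> List[Future]:
    """Frontend-style caller: hands each document to a background worker."""
    return [executor.submit(update_archive, doc_id, store) for doc_id in doc_ids]
```

Because both callers go through the same function, the command and the frontend action cannot drift apart in what they regenerate, which appears to be the gap the original redo OCR implementation had.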

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Other (please explain)

Checklist:

  • I have read & agree with the contributing guidelines.
  • If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
  • If applicable, I have checked that all tests pass, see documentation.
  • I have run all pre-commit hooks, see documentation.
  • I have made corresponding changes to the documentation as needed.
  • I have checked my modifications for any breaking changes.

…tion common. Actually creates updated file now
@stumpylog stumpylog requested a review from a team as a code owner August 24, 2022 03:02
@paperless-ngx-secretary paperless-ngx-secretary bot added the non-trivial (Requires approval by several team members) label Aug 24, 2022
@coveralls

Pull Request Test Coverage Report for Build 2916177984

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 9 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.6%) to 92.548%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| documents/bulk_edit.py | 2 | 95.31% |
| documents/management/commands/document_archiver.py | 3 | 90.0% |
| documents/tasks.py | 4 | 97.08% |

| Totals | |
|---|---|
| Change from base Build 2900990598 | 0.6% |
| Covered Lines | 4831 |
| Relevant Lines | 5220 |

💛 - Coveralls

@ocelotsloth
Contributor

ocelotsloth commented Aug 24, 2022

Oh hey, this clears up some confusion I had yesterday with this feature.

I was trying to manually fix the page orientation of some originals that had already been parsed, and redo OCR didn't seem to do anything at all.

If the redo ocr button will now regenerate all the previews shown in the web interface that would be super!

@shamoon shamoon added this to the v1.8.1 milestone Aug 25, 2022
@stumpylog stumpylog changed the title from "Feature Even better re-do of OCR" to "Feature: Even better re-do of OCR" Aug 25, 2022
Member

@qcasey qcasey left a comment

Nice, as advertised.

> If the redo ocr button will now regenerate all the previews shown in the web interface that would be super!

It does exactly that from what I can tell in testing.

@stumpylog
Member Author

The best way I found to tell that something was happening was the archive file checksum displayed in the document metadata. The content never seemed to change, which makes sense if the OCR was good the first time.
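For anyone who wants to run the same check outside the UI, here is a minimal sketch of comparing an archive file's checksum before and after redoing OCR. The helper names `file_checksum` and `archive_changed` are hypothetical, and the MD5 default is an assumption; use whichever digest the metadata view actually reports.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_changed(path, previous_checksum, algorithm="md5"):
    """True if the file at `path` no longer matches the recorded checksum."""
    return file_checksum(path, algorithm) != previous_checksum
```

Record the checksum first, trigger the redo, then call `archive_changed` with the recorded value. A changed digest only shows that the archive file was rewritten; it says nothing about whether the extracted text differs.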

@qcasey
Member

qcasey commented Aug 25, 2022

The checksum works too, I didn't think of that.

I uploaded an upside-down image and noticed the garbled OCR and upside-down preview. I then rotated the file upright on the filesystem in media/originals, redid OCR, and saw correct OCR and the correct preview orientation.

@shamoon
Member

shamoon commented Aug 25, 2022

Yeah, this worked well for me too, and it updated the thumbnail etc.

@qcasey qcasey merged commit 44e596b into dev Aug 26, 2022
@qcasey qcasey deleted the feature-better-redo-ocr branch August 26, 2022 00:01
@github-actions
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2023
Labels: backend, enhancement (New feature), non-trivial (Requires approval by several team members)
5 participants