Fix: Better error handling and checking when parsing documents via Tika #3617

Merged
merged 1 commit into from Jun 18, 2023

Conversation

stumpylog
Member

Proposed change

My little library for Tika's API got some updates to refine its handling of Tika's response for document types it hasn't encountered, and our code did too. It now double-checks that there is content, in a couple of different ways.
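
For readers who have not seen the diff, here is a minimal sketch of what "double-checking that there is content" could look like. It is not the code from this PR: `FakeResponse`, `ParseError` and `extract_text` are hypothetical names invented for illustration, and the real client library's response type may differ. The idea is only that a missing attribute, a `None` value, and whitespace-only text each produce a clear error.

```python
from dataclasses import dataclass
from typing import Optional


class ParseError(Exception):
    """Hypothetical error raised when a parsed document yields no usable text."""


@dataclass
class FakeResponse:
    # Hypothetical stand-in for a Tika client response; not the real library type.
    content: Optional[str] = None


def extract_text(response: object) -> str:
    # Check 1: some response types may not carry a `content` attribute at all,
    # so use getattr with a default instead of plain attribute access.
    content = getattr(response, "content", None)

    # Check 2: the attribute may be absent or explicitly None.
    if content is None:
        raise ParseError("Tika returned no content")

    # Check 3: the attribute may hold only whitespace.
    text = content.strip()
    if not text:
        raise ParseError("Tika returned empty content")

    return text
```

With this shape, a response object lacking `content` raises the illustrative `ParseError` rather than an `AttributeError` like the one reported in the linked issue.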

Fixes #3614

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Other (please explain)

Checklist:

  • I have read & agree with the contributing guidelines.
  • If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
  • If applicable, I have checked that all tests pass, see documentation.
  • I have run all pre-commit hooks, see documentation.
  • I have made corresponding changes to the documentation as needed.
  • I have checked my modifications for any breaking changes.

@stumpylog stumpylog requested a review from a team as a code owner June 18, 2023 04:10
@paperless-ngx-secretary paperless-ngx-secretary bot added backend non-trivial Requires approval by several team members labels Jun 18, 2023
@github-actions github-actions bot added the bug Bug report or a Bug-fix label Jun 18, 2023
@stumpylog
Member Author

I should probably add a "live" test for a .doc file as well.

@stumpylog stumpylog marked this pull request as draft June 18, 2023 04:19
…ent via Tika

Signed-off-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com>
@stumpylog stumpylog force-pushed the fix/3614-doc-document-parse branch from 3546ead to e943e86 Compare June 18, 2023 14:05
@codecov

codecov bot commented Jun 18, 2023

Codecov Report

Merging #3617 (e943e86) into dev (328c879) will increase coverage by 1.32%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##              dev    #3617      +/-   ##
==========================================
+ Coverage   93.79%   95.11%   +1.32%     
==========================================
  Files         157      331     +174     
  Lines        6723    12496    +5773     
  Branches        0     1093    +1093     
==========================================
+ Hits         6306    11886    +5580     
- Misses        417      605     +188     
- Partials        0        5       +5     
Flag Coverage Δ
backend 93.76% <75.00%> (-0.04%) ⬇️
frontend 96.70% <ø> (?)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/paperless_tika/parsers.py 83.60% <71.42%> (-2.11%) ⬇️
src/paperless_mail/parsers.py 96.29% <80.00%> (-0.42%) ⬇️

... and 174 files with indirect coverage changes

@stumpylog stumpylog marked this pull request as ready for review June 18, 2023 14:24
@shamoon shamoon force-pushed the dev branch 2 times, most recently from be97585 to 4693632 Compare June 18, 2023 15:06
Member

@shamoon shamoon left a comment

LGTM, test makes it super clear too!

@stumpylog stumpylog merged commit 4782b4d into dev Jun 18, 2023
27 checks passed
@stumpylog stumpylog deleted the fix/3614-doc-document-parse branch June 18, 2023 15:39
@github-actions
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 19, 2023
Labels
backend bug Bug report or a Bug-fix non-trivial Requires approval by several team members
Development

Successfully merging this pull request may close these issues.

[BUG] Paperless does not consume Word file ('BaseResponse' object has no attribute 'content')
2 participants