Chore: Includes OCRMyPdf logging into the log file #5947

Merged 1 commit on Feb 28, 2024

stumpylog
Member

Proposed change

Closes #(issue or discussion)

Type of change

  • Bug fix: non-breaking change which fixes an issue.
  • New feature / Enhancement: non-breaking change which adds functionality. Please read the important note above.
  • Breaking change: fix or feature that would cause existing functionality to not work as expected.
  • Documentation only.
  • Other. Please explain:

Checklist:

  • I have read & agree with the contributing guidelines.
  • If applicable, I have included testing coverage for new code in this PR, for backend and / or front-end changes.
  • If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
  • If applicable, I have checked that all tests pass, see documentation.
  • I have run all pre-commit hooks, see documentation.
  • I have made corresponding changes to the documentation as needed.
  • I have checked my modifications for any breaking changes.

Member

@shamoon shamoon left a comment

Oh interesting thanks, got a sample? I wonder if this will help make it apparent to people where OCR happens...

(probably not)

@stumpylog
Member Author

It's already included in the console output, so in docker logs, for example. It would look like this:

[2024-02-28 10:00:01,842] [INFO] [ocrmypdf._pipelines.ocr] Start processing 2 pages concurrently
[2024-02-28 10:00:01,844] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2024-02-28 10:00:01,845] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2024-02-28 10:00:01,860] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...
[2024-02-28 10:00:02,099] [ERROR] [ocrmypdf._exec.ghostscript] GPL Ghostscript 10.02.1 (2023-11-01)
Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 2.
Page 1
Page 2

The following errors were encountered at least once while processing this file:
	stream inherited a resource

   
[2024-02-28 10:00:02,100] [ERROR] [ocrmypdf._exec.ghostscript]  This file had errors that were repaired or ignored.
   
[2024-02-28 10:00:02,100] [ERROR] [ocrmypdf._exec.ghostscript]  The file was produced by: 
   
[2024-02-28 10:00:02,101] [ERROR] [ocrmypdf._exec.ghostscript]  >>>> KUBRA Data Transfer Ltd. via ABCpdf <<<<
   
[2024-02-28 10:00:02,101] [ERROR] [ocrmypdf._exec.ghostscript]  Please notify the author of the software that produced this
   
[2024-02-28 10:00:02,101] [ERROR] [ocrmypdf._exec.ghostscript]  file that it does not conform to Adobe's published PDF
   
[2024-02-28 10:00:02,102] [ERROR] [ocrmypdf._exec.ghostscript]  specification.

The formatting isn't amazing; I think it tends to just dump the output from the subprocess.
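One plausible reason for the uneven layout above is that Python's `logging` formatter prefixes only the first line of a multi-line message. The sketch below illustrates that behaviour with the standard library only. The logger name `demo.subprocess`, the shortened format string, and the sample text are assumptions made for illustration; it does not claim to show how OCRmyPDF forwards Ghostscript output.

```python
import io
import logging


def format_multiline(message):
    """Log one multi-line message and return the formatted output lines."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # Assumed format, loosely modelled on the sample (timestamp omitted).
    handler.setFormatter(
        logging.Formatter("[{levelname}] [{name}] {message}", style="{")
    )

    logger = logging.getLogger("demo.subprocess")  # hypothetical logger name
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(logging.INFO)

    # A single record whose message contains embedded newlines.
    logger.error(message)
    return stream.getvalue().splitlines()


lines = format_multiline("Tool banner\nProcessing pages 1 through 2.\nPage 1")
for line in lines:
    print(line)
```

Only the first printed line carries the `[ERROR] [demo.subprocess]` prefix; the continuation lines appear bare, much like the Ghostscript banner in the sample.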

@shamoon
Member

shamoon commented Feb 28, 2024

Cool, thanks. Do you think it should be debug level? (Or maybe I'm misunderstanding that parameter.)

@stumpylog
Member Author

I considered it, but it was very chatty about things that didn't seem relevant

@shamoon
Member

shamoon commented Feb 28, 2024

Oh yeah, I think I'm misunderstanding. As is, will it only show if the user has set the DEBUG level (not the default INFO)? That's what I meant.

@stumpylog
Member Author

Ah, no, that just controls which levels will be included in the file. So anything at INFO or more severe from ocrmypdf will be included in the file. The log already includes DEBUG and more severe from paperless.
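A minimal sketch of the per-logger level behaviour described here, using only the standard library. It is not the project's actual configuration: the in-memory stream stands in for the log file, and the format string and sample messages are assumptions modelled on the excerpt above.

```python
import io
import logging

# Assumed format, modelled on the sample log lines in this thread.
FORMAT = "[{asctime}] [{levelname}] [{name}] {message}"


def build_loggers(stream):
    """Attach one shared handler to two loggers with different levels."""
    handler = logging.StreamHandler(stream)  # stand-in for a file handler
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(logging.Formatter(FORMAT, style="{"))

    app_logger = logging.getLogger("paperless")
    app_logger.setLevel(logging.DEBUG)  # DEBUG and more severe are kept

    ocr_logger = logging.getLogger("ocrmypdf")
    ocr_logger.setLevel(logging.INFO)  # DEBUG from this library is dropped

    for logger in (app_logger, ocr_logger):
        logger.handlers = [handler]
        logger.propagate = False
    return app_logger, ocr_logger


def demo():
    stream = io.StringIO()
    app_logger, ocr_logger = build_loggers(stream)

    app_logger.debug("consumer starting")            # kept
    ocr_logger.debug("chatty internal detail")       # filtered out
    ocr_logger.info("Start processing 2 pages concurrently")  # kept
    # Child loggers inherit the effective level of "ocrmypdf".
    logging.getLogger("ocrmypdf._exec.ghostscript").error(
        "This file had errors that were repaired or ignored."
    )                                                # kept
    return stream.getvalue().splitlines()


for line in demo():
    print(line)
```

The level set on a logger acts as a floor, so what lands in the file depends on each logger's own level and not on a single global setting.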

@shamoon
Member

shamoon commented Feb 28, 2024

Ah gotcha, thanks

@stumpylog stumpylog merged commit 86263a5 into dev Feb 28, 2024
31 checks passed
@stumpylog stumpylog deleted the chore/log-ocrmypdf branch February 28, 2024 22:39
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns. See our contributing guidelines for more details.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2024