
[BUG] Assigned tags etc. are missing after splitting by PATCHT pages #8604

@Disane87

Description

I have created a workflow that assigns tags, permissions and so on when documents are consumed from the consume folder. The subfolders are named after the owner of the document.

The folder is fed by a ScanSnap IX1600.

If I scan a single document, it works perfectly:
[screenshot]

But if I enable PATCHT splitting via PAPERLESS_CONSUMER_ENABLE_BARCODES, all of the split documents lose the tags from my scan workflow.
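For reference, the split visible in the webserver logs (a 5-page PDF with PATCHT on page 1, producing a 1-page and a 3-page document) can be illustrated with a small Python sketch. The helper `split_on_separators` is hypothetical and is not paperless-ngx's actual implementation; it assumes 0-based page indices and that separator pages are dropped from the output.

```python
def split_on_separators(page_count: int, separator_pages: set[int]) -> list[list[int]]:
    """Group 0-based page indices into documents, dropping separator pages.

    Hypothetical illustration only, not the paperless-ngx barcode code.
    """
    documents: list[list[int]] = []
    current: list[int] = []
    for page in range(page_count):
        if page in separator_pages:
            # A separator page closes the current document and is itself discarded.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    # Flush the last document, if any pages remain after the final separator.
    if current:
        documents.append(current)
    return documents


print(split_on_separators(5, {1}))  # [[0], [2, 3, 4]]
```

With PATCHT detected on page 1 only, this yields one document with page 0 and one with pages 2 to 4, matching the "pdf no:0 has 1 pages" and "pdf no:1 has 3 pages" log lines.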

My workflow definition:

Trigger:
[screenshot]

Action:
[screenshot]

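As the logs show, the workflow should fire, but it never runs for the original document containing all pages (including the PATCHT page). It is only evaluated for the split documents, whose temporary paths (e.g. /tmp/paperless/paperless-barcode-split-3cplzarc/05012025085111_document_0.pdf) do not match *marco*.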
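The "Document path ... does not match *marco*" log lines suggest that the trigger's path filter is compared against the temporary path of each split file rather than the original path in the consume folder. The following minimal Python sketch assumes shell-style glob matching comparable to Python's `fnmatch`, applied case-insensitively; the helper `matches` is hypothetical and only approximates the workflow filter. It shows why only the original path would match.

```python
from fnmatch import fnmatch

# Path filter from the "Scan Marco" workflow trigger.
FILTER_PATH = "*marco*"

# Paths taken from the webserver logs.
original_path = "/usr/src/paperless/consume/marco/05012025085111.pdf"
split_path = "/tmp/paperless/paperless-barcode-split-3cplzarc/05012025085111_document_0.pdf"


def matches(path: str, pattern: str = FILTER_PATH) -> bool:
    # Hypothetical approximation of the workflow path filter:
    # case-insensitive glob match where "*" also spans "/" characters.
    return fnmatch(path.lower(), pattern.lower())


print(matches(original_path))  # True
print(matches(split_path))     # False
```

Because the split files live under a temporary directory that no longer contains the owner folder name, a filter like `*marco*` cannot match them.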

Steps to reproduce

  1. Enable PAPERLESS_CONSUMER_ENABLE_BARCODES and PAPERLESS_CONSUMER_RECURSIVE
  2. Create a folder "marco" underneath the consume folder
  3. Create a workflow that assigns tags when documents from the import folder *marco* are processed
  4. Scan two or more documents separated by PATCHT pages
  5. Observe that none of the resulting documents have the assignments from the workflow

Webserver logs

[2025-01-05 08:51:17,140] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/marco/05012025085111.pdf to the task queue.

[2025-01-05 08:51:17,213] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin

[2025-01-05 08:51:17,214] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin

[2025-01-05 08:51:17,215] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR

[2025-01-05 08:51:17,219] [DEBUG] [paperless.barcodes] PDF has 5 pages

[2025-01-05 08:51:17,219] [DEBUG] [paperless.barcodes] Processing page 0

[2025-01-05 08:51:17,585] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpmq1fvrsu/barcodeogcik94g/63f1b3f2-e317-49b1-adbd-6b8b3ef8858c-1.ppm

[2025-01-05 08:51:17,778] [DEBUG] [paperless.barcodes] Barcode of type I25 found: 0002999741403616

[2025-01-05 08:51:17,784] [DEBUG] [paperless.barcodes] Processing page 1

[2025-01-05 08:51:18,136] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpmq1fvrsu/barcodeogcik94g/2bbc347d-ffcd-4fc1-8447-8a667633d16b-2.ppm

[2025-01-05 08:51:18,451] [DEBUG] [paperless.barcodes] Barcode of type CODE39 found: PATCHT

[2025-01-05 08:51:18,458] [DEBUG] [paperless.barcodes] Processing page 2

[2025-01-05 08:51:18,734] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpmq1fvrsu/barcodeogcik94g/57aee759-4074-400e-b480-e0dce019c4a3-3.ppm

[2025-01-05 08:51:18,858] [DEBUG] [paperless.barcodes] Processing page 3

[2025-01-05 08:51:19,180] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpmq1fvrsu/barcodeogcik94g/fde38e2e-bf5e-40f9-bbb2-b2a3a3271e8b-4.ppm

[2025-01-05 08:51:19,384] [DEBUG] [paperless.barcodes] Barcode of type I25 found: 9802149753500572

[2025-01-05 08:51:19,387] [DEBUG] [paperless.barcodes] Barcode of type I25 found: 0002999541222453

[2025-01-05 08:51:19,391] [DEBUG] [paperless.barcodes] Processing page 4

[2025-01-05 08:51:19,654] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpmq1fvrsu/barcodeogcik94g/f960e8b3-c2e9-492b-bb7d-f069dd76fdb5-5.ppm

[2025-01-05 08:51:19,785] [DEBUG] [paperless.barcodes] Starting new document at idx 1

[2025-01-05 08:51:19,787] [DEBUG] [paperless.barcodes] Split into 2 new documents

[2025-01-05 08:51:19,789] [DEBUG] [paperless.barcodes] pdf no:0 has 1 pages

[2025-01-05 08:51:19,793] [DEBUG] [paperless.barcodes] pdf no:1 has 3 pages

[2025-01-05 08:51:19,825] [INFO] [paperless.barcodes] Created new task 097475fe-0ea9-4526-967b-44ce2905901c for 05012025085111_document_0.pdf

[2025-01-05 08:51:19,844] [INFO] [paperless.barcodes] Created new task 8bc649d7-8014-4006-a532-244437be3236 for 05012025085111_document_1.pdf

[2025-01-05 08:51:19,848] [INFO] [paperless.tasks] BarcodePlugin requested task exit: Barcode splitting complete!

[2025-01-05 08:51:22,919] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin

[2025-01-05 08:51:22,921] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin

[2025-01-05 08:51:22,922] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR

[2025-01-05 08:51:22,923] [DEBUG] [paperless.barcodes] PDF has 1 pages

[2025-01-05 08:51:22,924] [DEBUG] [paperless.barcodes] Processing page 0

[2025-01-05 08:51:23,172] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpn4qios97/barcodeyy_vysjn/e74dd798-1d9d-457e-8148-283eac3fdb36-1.ppm

[2025-01-05 08:51:23,367] [DEBUG] [paperless.barcodes] Barcode of type I25 found: 0002999741403616

[2025-01-05 08:51:23,373] [INFO] [paperless.tasks] BarcodePlugin completed with no message

[2025-01-05 08:51:23,375] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin

[2025-01-05 08:51:23,436] [INFO] [paperless.matching] Document did not match Workflow: Scan Marco

[2025-01-05 08:51:23,436] [DEBUG] [paperless.matching] ('Document path /tmp/paperless/paperless-barcode-split-3cplzarc/05012025085111_document_0.pdf does not match *marco*',)

[2025-01-05 08:51:23,440] [INFO] [paperless.matching] Document did not match Workflow: Scan Lielie

[2025-01-05 08:51:23,441] [DEBUG] [paperless.matching] ('Document path /tmp/paperless/paperless-barcode-split-3cplzarc/05012025085111_document_0.pdf does not match *lielie*',)

[2025-01-05 08:51:23,442] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with:

[2025-01-05 08:51:23,443] [DEBUG] [paperless.tasks] Executing plugin ConsumeTaskPlugin

[2025-01-05 08:51:23,457] [INFO] [paperless.consumer] Consuming 05012025085111_document_0.pdf

[2025-01-05 08:51:23,460] [DEBUG] [paperless.consumer] Detected mime type: application/pdf

[2025-01-05 08:51:23,461] [INFO] [paperless.consumer] Executing pre-consume script /usr/src/paperless/scripts/pre-consume.sh

[2025-01-05 08:51:23,727] [INFO] [paperless.consumer] /usr/src/paperless/scripts/pre-consume.sh exited 0

[2025-01-05 08:51:23,729] [INFO] [paperless.consumer] /usr/src/paperless/scripts/pre-consume.sh stderr:

[2025-01-05 08:51:23,730] [WARNING] [paperless.consumer] + /usr/src/paperless/scripts/remove-blank-pages.sh

[2025-01-05 08:51:23,731] [WARNING] [paperless.consumer] Total pages 1

[2025-01-05 08:51:23,732] [WARNING] [paperless.consumer] Color-sum in page 1 is 9.68534: Page added to document

[2025-01-05 08:51:23,739] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser

[2025-01-05 08:51:23,743] [DEBUG] [paperless.consumer] Parsing 05012025085111_document_0.pdf...

[2025-01-05 08:51:23,750] [INFO] [paperless.parsing.tesseract] pdftotext exited 0

[2025-01-05 08:51:23,844] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngx2z13v6ja/05012025085111_document_0.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-d994ed3i/archive.pdf'), 'use_threads': True, 'jobs': 16, 'language': 'deu+eng', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-d994ed3i/sidecar.txt')}

[2025-01-05 08:51:25,309] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.11 - rotation appears correct

[2025-01-05 08:51:34,264] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...

[2025-01-05 08:51:34,938] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.26 savings: 20.5%

[2025-01-05 08:51:34,940] [INFO] [ocrmypdf._pipeline] Total file size ratio: 1.25 savings: 20.2%

[2025-01-05 08:51:34,943] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)

[2025-01-05 08:51:35,264] [DEBUG] [paperless.parsing.tesseract] Using text from sidecar file

[2025-01-05 08:51:35,265] [DEBUG] [paperless.consumer] Generating thumbnail for 05012025085111_document_0.pdf...

[2025-01-05 08:51:35,268] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-d994ed3i/archive.pdf[0] /tmp/paperless/paperless-d994ed3i/convert.webp

[2025-01-05 08:51:36,278] [INFO] [paperless.parsing] convert exited 0

[2025-01-05 08:51:36,591] [DEBUG] [paperless.consumer] Saving record to database

[2025-01-05 08:51:36,592] [DEBUG] [paperless.consumer] Creation date from parse_date: 2019-01-14 00:00:00+01:00

[2025-01-05 08:51:36,978] [INFO] [paperless.handlers] Assigning document type Rechnung to 2019-01-14 05012025085111_document_0

[2025-01-05 08:51:36,988] [INFO] [paperless.handlers] Tagging "2019-01-14 05012025085111_document_0" with "Bezahlt"

[2025-01-05 08:51:37,122] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx2z13v6ja/05012025085111_document_0.pdf

[2025-01-05 08:51:37,133] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-d994ed3i

[2025-01-05 08:51:37,135] [INFO] [paperless.consumer] Document 2019-01-14 05012025085111_document_0 consumption finished

[2025-01-05 08:51:37,142] [INFO] [paperless.tasks] ConsumeTaskPlugin completed with: Success. New document id 3412 created

[2025-01-05 08:51:37,452] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin

[2025-01-05 08:51:37,453] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin

[2025-01-05 08:51:37,454] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR

[2025-01-05 08:51:37,456] [DEBUG] [paperless.barcodes] PDF has 3 pages

[2025-01-05 08:51:37,457] [DEBUG] [paperless.barcodes] Processing page 0

[2025-01-05 08:51:37,691] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpb6voz9zn/barcode9a3f_akb/c347a6cb-1280-4d2a-b774-6ea029ceaa7c-1.ppm

[2025-01-05 08:51:37,822] [DEBUG] [paperless.barcodes] Processing page 1

[2025-01-05 08:51:38,073] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpb6voz9zn/barcode9a3f_akb/3d550872-d54b-4051-ace6-1e0d11cd2485-2.ppm

[2025-01-05 08:51:38,280] [DEBUG] [paperless.barcodes] Barcode of type I25 found: 9802149753500572

[2025-01-05 08:51:38,280] [DEBUG] [paperless.barcodes] Barcode of type I25 found: 0002999541222453

[2025-01-05 08:51:38,285] [DEBUG] [paperless.barcodes] Processing page 2

[2025-01-05 08:51:38,515] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpb6voz9zn/barcode9a3f_akb/785b4e6a-773b-4c8a-94b4-067500515cbb-3.ppm

[2025-01-05 08:51:38,639] [INFO] [paperless.tasks] BarcodePlugin completed with no message

[2025-01-05 08:51:38,641] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin

[2025-01-05 08:51:38,690] [INFO] [paperless.matching] Document did not match Workflow: Scan Marco

[2025-01-05 08:51:38,691] [DEBUG] [paperless.matching] ('Document path /tmp/paperless/paperless-barcode-split-3cplzarc/05012025085111_document_1.pdf does not match *marco*',)

[2025-01-05 08:51:38,694] [INFO] [paperless.matching] Document did not match Workflow: Scan Lielie

[2025-01-05 08:51:38,695] [DEBUG] [paperless.matching] ('Document path /tmp/paperless/paperless-barcode-split-3cplzarc/05012025085111_document_1.pdf does not match *lielie*',)

[2025-01-05 08:51:38,695] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with:

[2025-01-05 08:51:38,696] [DEBUG] [paperless.tasks] Executing plugin ConsumeTaskPlugin

[2025-01-05 08:51:38,708] [INFO] [paperless.consumer] Consuming 05012025085111_document_1.pdf

[2025-01-05 08:51:38,711] [DEBUG] [paperless.consumer] Detected mime type: application/pdf

[2025-01-05 08:51:38,712] [INFO] [paperless.consumer] Executing pre-consume script /usr/src/paperless/scripts/pre-consume.sh

[2025-01-05 08:51:39,114] [INFO] [paperless.consumer] /usr/src/paperless/scripts/pre-consume.sh exited 0

[2025-01-05 08:51:39,116] [INFO] [paperless.consumer] /usr/src/paperless/scripts/pre-consume.sh stderr:

[2025-01-05 08:51:39,117] [WARNING] [paperless.consumer] + /usr/src/paperless/scripts/remove-blank-pages.sh

[2025-01-05 08:51:39,118] [WARNING] [paperless.consumer] Total pages 3

[2025-01-05 08:51:39,118] [WARNING] [paperless.consumer] Color-sum in page 1 is 0.10829: Page removed from document

[2025-01-05 08:51:39,119] [WARNING] [paperless.consumer] Color-sum in page 2 is 13.78236: Page added to document

[2025-01-05 08:51:39,120] [WARNING] [paperless.consumer] Color-sum in page 3 is 0.08683: Page removed from document

[2025-01-05 08:51:39,127] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser

[2025-01-05 08:51:39,130] [DEBUG] [paperless.consumer] Parsing 05012025085111_document_1.pdf...

[2025-01-05 08:51:39,137] [INFO] [paperless.parsing.tesseract] pdftotext exited 0

[2025-01-05 08:51:39,230] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngxd1jr5u45/05012025085111_document_1.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-_83x9tbe/archive.pdf'), 'use_threads': True, 'jobs': 16, 'language': 'deu+eng', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-_83x9tbe/sidecar.txt')}

[2025-01-05 08:51:40,683] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 11.27 - no change

[2025-01-05 08:51:50,914] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...

[2025-01-05 08:51:51,658] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.19 savings: 16.0%

[2025-01-05 08:51:51,660] [INFO] [ocrmypdf._pipeline] Total file size ratio: 1.18 savings: 15.1%

[2025-01-05 08:51:51,662] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)

[2025-01-05 08:51:52,046] [DEBUG] [paperless.parsing.tesseract] Using text from sidecar file

[2025-01-05 08:51:52,048] [DEBUG] [paperless.consumer] Generating thumbnail for 05012025085111_document_1.pdf...

[2025-01-05 08:51:52,051] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-_83x9tbe/archive.pdf[0] /tmp/paperless/paperless-_83x9tbe/convert.webp

[2025-01-05 08:51:53,162] [INFO] [paperless.parsing] convert exited 0

[2025-01-05 08:51:53,474] [DEBUG] [paperless.consumer] Saving record to database

[2025-01-05 08:51:53,475] [DEBUG] [paperless.consumer] Creation date from parse_date: 2019-03-23 00:00:00+01:00

[2025-01-05 08:51:53,877] [INFO] [paperless.handlers] Assigning correspondent MediaMarkt Onlineshop to 2019-03-23 05012025085111_document_1

[2025-01-05 08:51:53,887] [INFO] [paperless.handlers] Assigning document type Rechnung to 2019-03-23 MediaMarkt Onlineshop 05012025085111_document_1

[2025-01-05 08:51:54,928] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngxd1jr5u45/05012025085111_document_1.pdf

[2025-01-05 08:51:54,939] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-_83x9tbe

[2025-01-05 08:51:54,940] [INFO] [paperless.consumer] Document 2019-03-23 MediaMarkt Onlineshop 05012025085111_document_1 consumption finished

[2025-01-05 08:51:54,944] [INFO] [paperless.tasks] ConsumeTaskPlugin completed with: Success. New document id 3413 created

Browser logs

No response

Paperless-ngx version

2.13.5

Host OS

UnraidOS 6.12.14

Installation method

Docker - official image

System status

{
    "pngx_version": "2.13.5",
    "server_os": "Linux-6.1.106-Unraid-x86_64-with-glibc2.36",
    "install_type": "docker",
    "storage": {
        "total": 1999919296512,
        "available": 1637246083072
    },
    "database": {
        "type": "postgresql",
        "url": "paperless",
        "status": "OK",
        "error": null,
        "migration_status": {
            "latest_migration": "paperless_mail.0028_alter_mailaccount_password_and_more",
            "unapplied_migrations": []
        }
    },
    "tasks": {
        "redis_url": "redis://broker:6379",
        "redis_status": "OK",
        "redis_error": null,
        "celery_status": "OK",
        "index_status": "OK",
        "index_last_modified": "2025-01-05T08:51:54.895543+01:00",
        "index_error": null,
        "classifier_status": "OK",
        "classifier_last_trained": "2025-01-05T07:05:04.078656Z",
        "classifier_error": null
    }
}

Browser

Edge

Configuration changes

Added the following environment variables to the Docker stack:

PAPERLESS_CONSUMER_RECURSIVE: true
PAPERLESS_CONSUMER_ENABLE_BARCODES: true

Please confirm the following

  • I believe this issue is a bug that affects all users of Paperless-ngx, not something specific to my installation.
  • This issue is not about the OCR or archive creation of a specific file(s). Otherwise, please see above regarding OCR tools.
  • I have already searched for relevant existing issues and discussions before opening this report.
  • I have updated the title field above with a concise description.

Metadata

Assignees

No one assigned

Labels

not a bug (not a bug in paperless-ngx)