Support for multi-frame image files #266

@AdventurousGui

Files in GIF, TIFF, and WebP formats can contain multiple frames, but the service currently processes only the first frame of such files. It is possible to load the file once and pass each frame to Tesseract individually, but changing the workflow raises the following considerations:

  • Either a single task_page_ocr() call loads the file once and processes every frame, which negates the advantage of distributing tasks across worker processes...
  • ... or each task_page_ocr() call opens the file and seeks to its target frame, in which case the original file does not need to be split into a separate file per frame.
  • Without a separate stored file for each frame, the browser client cannot display each page in the interfaces for editing the layout and results.
  • It must be checked whether PIL.ImageDraw can draw on individual frames of the loaded image, so that the current support for ignoring parts of the image is preserved.
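
The "open and seek" option, together with drawing on a single frame, could look roughly like the sketch below. It is only an illustration under assumptions: count_frames(), load_frame(), and mask_region() are hypothetical helper names, not existing functions of the service, and how task_page_ocr() would call them is left open. The sketch avoids the question of drawing on a frame of the still-open file by drawing on a detached copy of the frame instead.

```python
from PIL import Image, ImageDraw


def count_frames(source):
    """Return the number of frames in an image file (1 for single-frame files)."""
    with Image.open(source) as img:
        # Multi-frame plugins (GIF, TIFF, WebP) expose n_frames; others may not.
        return getattr(img, "n_frames", 1)


def load_frame(source, index):
    """Open the file, seek to one frame, and return it as a detached RGB image."""
    with Image.open(source) as img:
        img.seek(index)  # raises EOFError if the index is past the last frame
        # convert() loads the pixel data and returns a new image object,
        # so the result stays usable after the file is closed.
        return img.convert("RGB")


def mask_region(frame, box, fill=(255, 255, 255)):
    """Paint over a rectangle (x0, y0, x1, y1) on a detached frame, in place."""
    ImageDraw.Draw(frame).rectangle(box, fill=fill)
    return frame
```

With helpers like these, each worker would only need the file path and a frame index, so the tasks could still be distributed per page. The detached frame returned by load_frame() could also be saved as a separate image if the browser client needs one file per page, which is a design decision this sketch does not settle.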

Labels: enhancement (New feature or request)
