This repository was archived by the owner on Apr 11, 2025. It is now read-only.

Network (+Hybrid) parser infinite execution #245

@bosd

Description

The network parser runs indefinitely

When parsing a table with many different alignments, the parser never terminates.
This happens in the Network parser; because the Hybrid parser depends on it, the Hybrid parser hangs as well.

Steps to reproduce the bug

  1. Parse the 4th page of tabula/schools.pdf with the network or hybrid parser.
  2. The call never returns.

Expected behavior

The parser should terminate. I was expecting either a parsing error or a returned table.

Code


import pypdf_table_extraction

# Page 4 of tabula/schools.pdf triggers the hang
pdf_file, kwargs = "tabula/schools.pdf", {"pages": "4"}

tables = pypdf_table_extraction.read_pdf(pdf_file, flavor="network", debug=True, **kwargs)
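
Since the call above never returns, it may help to reproduce it with a hard time limit. The following is only a sketch: it runs a snippet in a separate Python process and gives up after a timeout. The helper name `run_snippet_with_timeout`, the `REPRO` string, and the 60-second limit are my own choices for illustration, not part of the library.

```python
import subprocess
import sys


def run_snippet_with_timeout(snippet: str, timeout_s: float) -> str:
    """Run `snippet` in a fresh Python process.

    Returns "ok" if it exits with status 0, "error" if it exits with a
    non-zero status, and "timeout" if it is still running after `timeout_s`
    seconds (the child process is killed in that case).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            timeout=timeout_s,
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


# Hypothetical wrapper around the reproduction from this issue.
REPRO = """
import pypdf_table_extraction
tables = pypdf_table_extraction.read_pdf(
    "tabula/schools.pdf", flavor="network", debug=True, pages="4"
)
print(tables)
"""

if __name__ == "__main__":
    # 60 seconds is an arbitrary limit chosen for this sketch.
    print(run_snippet_with_timeout(REPRO, timeout_s=60))
```

If this prints `timeout`, the parser did not finish within the limit, which matches the hang described here. An `error` result means the child process failed for another reason (for example, the package or the PDF could not be found).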
    

PDF

Screenshots

image

Environment

  • OS: [e.g. macOS]
  • Python version: 3.10
  • Numpy version: 1.5.3
  • OpenCV version:
  • Ghostscript version: 0.7
  • pypdf_table_extraction version: from repo, between release 0.0.2 and 1.0.0

Additional context

Metadata

Labels: bug (Something isn't working)