AssertionError on using fitz.IRect #3163

Closed
psambit9791 opened this issue Feb 15, 2024 · 4 comments
Labels
fix developed release schedule to be determined Fixed in next release

psambit9791 commented Feb 15, 2024

Background

I am parsing PDF text column by column, as described in this link from the PyMuPDF repository. One segment of that code calls fitz.IRect with bounding-box values. Running the code raises an AssertionError from the convert(x) function.

To Recreate:

import fitz  # PyMuPDF 1.23.22

# A text block as returned by the page's dict-style text extraction; the bbox coordinates are floats.
b = {'number': 0, 'type': 0, 'bbox': (403.3577880859375, 330.8871765136719, 541.2731323242188, 349.5766296386719), 'lines': [{'spans': [{'size': 14.0, 'flags': 4, 'font': 'SFHello-Medium', 'color': 1907995, 'ascender': 1.07373046875, 'descender': -0.26123046875, 'text': 'Inclusion and diversity', 'origin': (403.3577880859375, 345.9194030761719), 'bbox': (403.3577880859375, 330.8871765136719, 541.2731323242188, 349.5766296386719)}], 'wmode': 0, 'dir': (1.0, 0.0), 'bbox': (403.3577880859375, 330.8871765136719, 541.2731323242188, 349.5766296386719)}]}
# Raises AssertionError in 1.23.22 because the coordinates are not whole numbers.
bbox = fitz.IRect(b["bbox"])

Stacktrace:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-eb9bdcc8f1ee> in <cell line: 2>()
      1 b = {'number': 0, 'type': 0, 'bbox': (403.3577880859375, 330.8871765136719, 541.2731323242188, 349.5766296386719), 'lines': [{'spans': [{'size': 14.0, 'flags': 4, 'font': 'SFHello-Medium', 'color': 1907995, 'ascender': 1.07373046875, 'descender': -0.26123046875, 'text': 'Inclusion and diversity', 'origin': (403.3577880859375, 345.9194030761719), 'bbox': (403.3577880859375, 330.8871765136719, 541.2731323242188, 349.5766296386719)}], 'wmode': 0, 'dir': (1.0, 0.00;3...
----> 2 bbox = fitz.IRect(b["bbox"])

2 frames
/usr/local/lib/python3.10/dist-packages/fitz/__init__.py in __init__(self, p0, p1, x0, y0, x1, y1, *args)
  13101 
  13102     def __init__(self, *args, p0=None, p1=None, x0=None, y0=None, x1=None, y1=None):
> 13103         self.x0, self.y0, self.x1, self.y1 = util_make_irect( *args, p0=p0, p1=p1, x0=x0, y0=y0, x1=x1, y1=y1)
  13104 
  13105     def __len__(self):

/usr/local/lib/python3.10/dist-packages/fitz/__init__.py in util_make_irect(p0, p1, x0, y0, x1, y1, *args)
  20330         assert ret == x
  20331         return ret
> 20332     a = convert(a)
  20333     b = convert(b)
  20334     c = convert(c)

/usr/local/lib/python3.10/dist-packages/fitz/__init__.py in convert(x)
  20328     def convert(x):
  20329         ret = int(x)
> 20330         assert ret == x
  20331         return ret
  20332     a = convert(a)

AssertionError:

PyMuPDF version

1.23.22

Operating system

MacOS

Python version

3.10

@julian-smith-artifex-com
Collaborator

Thanks for the report.

This looks like an assert that is too strict - it fails if we try to construct an IRect from floating point values. I have a fix in my tree.
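
The failure can be reproduced without PyMuPDF. The following is a minimal sketch, assuming only the `convert(x)` body shown in the traceback above; `strict_convert` and `try_convert` are hypothetical names introduced for illustration and are not part of the library.

```python
def strict_convert(x):
    """Mirror of the convert() helper from the traceback: int() followed by an equality assert."""
    ret = int(x)
    assert ret == x  # fails whenever x has a fractional part
    return ret


def try_convert(values):
    """Hypothetical helper: return the converted tuple, or None if the assert fires."""
    try:
        return tuple(strict_convert(v) for v in values)
    except AssertionError:
        return None


# Coordinates taken from the bbox in the report.
bbox = (403.3577880859375, 330.8871765136719, 541.2731323242188, 349.5766296386719)

# Floats that hold whole numbers pass the check.
integral = try_convert((403.0, 330.0, 542.0, 350.0))

# The reported bbox has fractional coordinates, so the assert fires.
fractional = try_convert(bbox)

print(integral)
print(fractional)
```

Because `int()` truncates toward zero, `int(403.3577880859375)` is `403`, which does not equal the input, so any non-integral coordinate trips the assertion.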

julian-smith-artifex-com added the fix developed release schedule to be determined label Feb 15, 2024
psambit9791 added a commit to psambit9791/PyMuPDF that referenced this issue Feb 16, 2024
Instead of simply typecasting, it seems more optimal to round to the nearest decimal and then return as a typecasted int.
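
The difference between plain typecasting and rounding, as raised in the commit message above, can be sketched in plain Python. This is an illustration under stated assumptions, not the code from the referenced commit or from the 1.23.23 fix, neither of which is shown in this thread. The function names `truncate`, `round_nearest`, and `enclose` are hypothetical.

```python
import math

# Coordinates taken from the bbox in the report: (x0, y0, x1, y1).
bbox = (403.3577880859375, 330.8871765136719, 541.2731323242188, 349.5766296386719)


def truncate(rect):
    """Plain typecast: int() drops the fractional part (toward zero)."""
    return tuple(int(v) for v in rect)


def round_nearest(rect):
    """Round each coordinate to the nearest integer (Python rounds exact halves to even)."""
    return tuple(round(v) for v in rect)


def enclose(rect):
    """Smallest integer rectangle that still contains the float rectangle."""
    x0, y0, x1, y1 = rect
    return (math.floor(x0), math.floor(y0), math.ceil(x1), math.ceil(y1))


truncated = truncate(bbox)
rounded = round_nearest(bbox)
enclosing = enclose(bbox)

print(truncated)
print(rounded)
print(enclosing)
```

The three strategies give different rectangles for the same input: truncation and rounding can both cut off part of the text block, while the enclosing variant never does. Within PyMuPDF, the documented `Rect.irect` property returns the smallest IRect containing a rectangle, so `fitz.Rect(b["bbox"]).irect` may serve as a workaround on affected versions.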
@psambit9791
Author

Hi @julian-smith-artifex-com, I am trying to submit PR #3167 to improve on this, but I am unsure how to sign the CLA. Can you please help?

@julian-smith-artifex-com
Collaborator

Hi @psambit9791. You need to post a comment on the PR that says:

I have read the CLA Document and I hereby sign the CLA

@julian-smith-artifex-com
Collaborator

Fixed in 1.23.23.
