Unable to identify cropped region in images #3005

@abe-mxff

Hi, I'm not able to detect cropped images using the get_bboxlog() method (fitz version 1.23.7).

I generated the attached PDF with two cropped images (one rotated 90°), but the extraction gives me the bounding boxes of the non-cropped images:

Image 0 - bbox: Rect(266.25, 157.2283935546875, 328.5, 608.3989868164062)
Image 1 - bbox: Rect(73.5, 73.5, 568.5, 142.5)
1 - Type: 'fill-path', width=595.5 height=842.25 (raw = (0.0, 0.0, 595.5, 842.25))
2 - Type: 'fill-image', width=62.25 height=451.17059326171875 (raw = (266.25, 157.2283935546875, 328.5, 608.3989868164062))
3 - Type: 'fill-image', width=495.0 height=69.0 (raw = (73.5, 73.5, 568.5, 142.5))

Below are the rendered PDF page and the script used to reproduce the result. What am I doing wrong?
[Image: rendered_page]

import fitz

fn_in = "test_page.pdf"

# Open the document directly from its path
doc = fitz.open(fn_in)

page = doc.load_page(0)

# Extract images
imgs = []
for i, img in enumerate(page.get_image_info(xrefs=True)):
    xref = img["xref"]
    img["bbox"] = fitz.Rect(img["bbox"])
    print(f"Image {i} - bbox: {img['bbox']}")
    img["transform"] = fitz.Matrix(img["transform"])
    imgs.append(img)

# Get bbox_log
for i, (kind, raw) in enumerate(page.get_bboxlog()):  # 'kind' avoids shadowing the builtin 'type'
    rect = fitz.Rect(raw)
    print(f"{i+1} - Type: '{kind}', width={rect.width} height={rect.height} (raw = {raw})")

# There are three elements:
# 1) A rectangle covering the full page (I don't know why it is there)
# 2) The first image
# 3) The second image (rotation correctly detected)
# PROBLEM: none of the reported image bboxes reflect the cropping

# In the rendered page, however, the images are correctly cropped
# page.get_pixmap().save('rendered_page.png')
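
For illustration only: the reported bboxes look like the full placement rectangles of the images, and one common way for a PDF producer to crop an image is to draw it inside a clip path. Whether this particular file does so is an assumption, not something the output above confirms. Under that assumption, the visible region of an image would be its bbox intersected with the clip rectangles active when it is drawn. The sketch below shows just that intersection step in plain Python. The names `intersect` and `visible_bbox` are hypothetical helpers, and the clip rectangle is a made-up value; how to obtain the real clip rectangles from the page is left open here.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def visible_bbox(image_bbox: Box, clip_boxes: Iterable[Box]) -> Optional[Box]:
    """Shrink image_bbox by each clip box in turn (hypothetical helper)."""
    visible: Optional[Box] = image_bbox
    for clip in clip_boxes:
        if visible is None:
            break  # already fully clipped away
        visible = intersect(visible, clip)
    return visible


# Image 1 bbox taken from the output above; the clip box is invented for illustration.
image_bbox = (73.5, 73.5, 568.5, 142.5)
hypothetical_clip = (100.0, 73.5, 400.0, 142.5)
print(visible_bbox(image_bbox, [hypothetical_clip]))
```

Passing an empty list of clip boxes returns the bbox unchanged, which matches the uncropped values reported above.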

Originally posted by @abe-mxff in #1312 (comment)
