Get image inside table's cell #3587

Answered by JorjMcKie
vinniec2 asked this question in Q&A
Jun 16, 2024 · 2 comments · 4 replies
Converting this from "Issues" to "Discussions".

Thanks for your interest in PyMuPDF!

Your idea is the only way to match this information. You can extract a list of images via page.get_image_info(). This returns metadata for all images on the page, including each image's bbox, without extracting the image binaries themselves.

It is not really difficult to do:

imglist = page.get_image_info()

# copy of the table's text content:
tab_text = tab.extract()[:]
# the table's cell bboxes as Rect objects:
tab_cells = [[pymupdf.Rect(c) for c in r.cells] for r in tab.rows]

Both of the above are lists of lists with matching (row, col) indices.

for img_idx, img in enumerate(imglist):
    if img["bbox"] not in pymupdf.Rect(tab.bbox):
        c…
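
The snippet above is cut off, so what follows is not its continuation. It is only a minimal sketch of the matching idea: skip images outside the table, then look for the cell whose bbox contains the image. To stay runnable without a PDF, it uses plain (x0, y0, x1, y1) tuples. With PyMuPDF these would presumably come from img["bbox"], tab.bbox and the cells of tab.rows, as shown earlier. The helper names, the tolerance value, the sample coordinates and the guard against None cells are all assumptions made for illustration.

```python
# Hypothetical helpers: pure-Python stand-ins for the Rect containment checks.

def contains(outer, inner, tol=1.0):
    """True if bbox `inner` lies inside bbox `outer`, with `tol` points of slack."""
    return (
        outer[0] - tol <= inner[0]
        and outer[1] - tol <= inner[1]
        and inner[2] <= outer[2] + tol
        and inner[3] <= outer[3] + tol
    )


def match_images_to_cells(img_bboxes, tab_bbox, tab_cells, tol=1.0):
    """Map image index -> (row, col) of the first cell containing that image."""
    matches = {}
    for img_idx, img_bbox in enumerate(img_bboxes):
        if not contains(tab_bbox, img_bbox, tol):
            continue  # image lies outside the table
        for row_idx, row in enumerate(tab_cells):
            for col_idx, cell in enumerate(row):
                # Assumption: a row may report an empty cell as None.
                if cell is not None and contains(cell, img_bbox, tol):
                    matches[img_idx] = (row_idx, col_idx)
                    break
            if img_idx in matches:
                break
    return matches


# Made-up coordinates for a 2 x 2 table and three images.
tab_bbox = (0, 0, 200, 100)
tab_cells = [
    [(0, 0, 100, 50), (100, 0, 200, 50)],
    [(0, 50, 100, 100), (100, 50, 200, 100)],
]
img_bboxes = [
    (110, 10, 190, 40),    # inside row 0, col 1
    (10, 60, 90, 90),      # inside row 1, col 0
    (300, 300, 350, 350),  # outside the table
]

matches = match_images_to_cells(img_bboxes, tab_bbox, tab_cells)
print(matches)  # {0: (0, 1), 1: (1, 0)}
```

Because the text list and the cell list share (row, col) indices, each resulting (row, col) pair can also be used to look up the text of the cell that holds the image. The small tolerance is there because an image bbox may overlap a cell border by a fraction of a point. Whether that is needed depends on the document.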

Answer selected by vinniec2
This discussion was converted from issue #3586 on June 16, 2024 12:11.