PdfReader - Extract images from specific pages #2535

FrsECM · 2024-03-23T18:40:31Z


Environment

Python 3.8
WSL Ubuntu 22.04
Windows 11
pypdf 4.1.0

Issue

I generated a very simple PDF with LibreOffice Writer:
test_image.pdf

The PDF has two pages: one contains a short text, the other contains an image.

I want to go through the pages and get the image only from the second page.

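The code to reproduce the issue:

from pypdf import PdfReader

pdf_file = "test_image.pdf"  # the PDF attached above
pdf = PdfReader(pdf_file)
for i, page in enumerate(pdf.pages):
    # Print the page text, then the number of images pypdf reports for it
    print(f'--- Extracting page {i}')
    print(page.extract_text())
    print(len(page.images))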

The result is below:

--- Extracting page 0
Test page 1
1
--- Extracting page 1
1

I expected page 0 to report 0 images, so that I could extract the image only from the second page.
I don't know whether this is normal behaviour.

How can I achieve this?
Thanks,
Regards


pubpub-zz · 2024-03-23T19:29:07Z

Your PDF has the image attached to both pages.

pypdf does not check whether the images are actually "called" in the page content.
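One possible workaround, sketched here under assumptions rather than taken from the thread: scan the page's content stream for `Do` operators and keep only the images whose resource name is actually invoked. The helper names `xobjects_drawn` and `images_drawn_on_page` are hypothetical. The sketch assumes `page.get_contents()` returns an object whose `operations` attribute is a list of `(operands, operator)` pairs with the operator given as bytes, and that `page.images` supports `keys()` and lookup by name. It only considers images referenced directly from the page; images nested inside form XObjects and inline images are skipped.

```python
def xobjects_drawn(operations):
    """Return the names of XObjects invoked with the `Do` operator.

    `operations` is an iterable of (operands, operator) pairs, the shape
    assumed here for a pypdf content stream's `operations` attribute.
    """
    return {
        str(operands[0])
        for operands, operator in operations
        if operator == b"Do" and operands
    }


def images_drawn_on_page(page):
    """Return the images in `page.images` that the page content draws.

    Hypothetical helper: `page` is expected to behave like a pypdf
    PageObject. Only top-level image names (plain strings) are checked.
    """
    contents = page.get_contents()
    if contents is None:
        # A page without a content stream draws nothing.
        return []
    drawn = xobjects_drawn(contents.operations)
    return [
        page.images[key]
        for key in page.images.keys()
        if isinstance(key, str) and key in drawn
    ]
```

With the reproduction code above, replacing `len(page.images)` by `len(images_drawn_on_page(page))` would be expected to report 0 for a page that lists the image in its resources but never draws it. This has not been verified against test_image.pdf.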

stefan6419846 added the workflow-images label (From a users perspective, image handling is the affected feature/workflow) Mar 23, 2024

py-pdf locked and limited conversation to collaborators Mar 23, 2024

stefan6419846 converted this issue into discussion #2536 Mar 23, 2024
