This issue was moved to a discussion.

PdfReader - Extract images from specific pages #2535

Closed
FrsECM opened this issue Mar 23, 2024 · 1 comment
Labels
workflow-images From a users perspective, image handling is the affected feature/workflow

Comments

FrsECM commented Mar 23, 2024

Environment

Python 3.8
WSL Ubuntu 22.04
Windows 11
pypdf 4.1.0

Issue

I generated a very simple PDF with LibreOffice Writer:
test_image.pdf

This PDF has two pages: one containing a short text, the other containing an image.

I want to go through the PDF page by page and get the image only from the second page.

The code to reproduce the issue:

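from pypdf import PdfReader

pdf_file = "test_image.pdf"  # path to the attached sample PDF
pdf = PdfReader(pdf_file)
for i, page in enumerate(pdf.pages):
    print(f'--- Extracting page {i}')
    print(page.extract_text())
    # Number of images pypdf reports for this page
    print(len(page.images))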

The result is below:

--- Extracting page 0
Test page 1
1
--- Extracting page 1
1

I expected page 0 to report 0 images, so that the image is extracted only from the second page.
I don't know whether this is normal behaviour.

How can I get the result I want?
Thanks,
Regards

stefan6419846 added the workflow-images label Mar 23, 2024
@pubpub-zz
Collaborator

Your PDF has the image attached to both pages:
[image]

pypdf does not check whether the images are actually "called" in the page content.
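
One possible workaround, given the reply above, is to check which image names the page's content stream actually paints with the `Do` operator, and keep only those. The following is a minimal sketch, not part of pypdf: the helper names `names_drawn` and `images_drawn_on_page` are hypothetical, and it assumes the page's /Resources entry sits directly on the page object. It also ignores images drawn inside form XObjects and inline images.

```python
from typing import Iterable, Set, Tuple


def names_drawn(operations: Iterable[Tuple[list, bytes]]) -> Set[str]:
    """Return the XObject names painted by `Do` operators.

    `operations` is a sequence of (operands, operator) pairs, the shape
    exposed by pypdf's ContentStream.operations.
    """
    return {
        str(operands[0])
        for operands, operator in operations
        if operator == b"Do" and operands
    }


def images_drawn_on_page(page) -> dict:
    """Map resource name -> image XObject for images the page really draws.

    Hypothetical helper. Assumes `page` is a pypdf PageObject with
    /Resources stored directly on the page (no inheritance handling).
    """
    contents = page.get_contents()
    if contents is None or "/Resources" not in page:
        return {}
    resources = page["/Resources"]
    if "/XObject" not in resources:
        return {}
    xobjects = resources["/XObject"]
    drawn = names_drawn(contents.operations)
    result = {}
    for name in xobjects:
        xobject = xobjects[name]
        # Keep only image XObjects that a `Do` operator refers to.
        if str(name) in drawn and xobject.get("/Subtype") == "/Image":
            result[str(name)] = xobject
    return result
```

To try it on the sample file, something like `for i, page in enumerate(PdfReader("test_image.pdf").pages): print(i, list(images_drawn_on_page(page)))` should list, per page, only the image names that the page content references.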

py-pdf locked and limited conversation to collaborators Mar 23, 2024
stefan6419846 converted this issue into discussion #2536 Mar 23, 2024
