PdfReader.get_form_text_fields() not returning dropdown fields #391

Closed
akallai opened this issue Feb 6, 2018 · 7 comments · Fixed by #1114
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
PdfReader: The PdfReader component is affected
workflow-forms: From a user's perspective, forms is the affected feature/workflow

Comments

akallai commented Feb 6, 2018

I have a PDF form with multiple form fields in it. The function "getFormTextFields()" returns a dictionary that omits the dropdown fields. The dictionary contains the information for all other fields, but the dropdown fields are simply missing.

Minimal Example

from PyPDF2 import PdfReader

reader = PdfReader("sample-files/012-libreoffice-form/libreoffice-form.pdf")
reader.get_form_text_fields()

gives

{"First Name": "Alice", "Last Name": "", "Birthday": "", "First Name_2": "Bob"}

The "Nationality" part is missing.

On the other hand:

>>> reader.get_fields()
{'First Name': {'/FT': '/Tx', '/T': 'First Name', '/V': 'Alice', '/DV': 'Alice'},
 'Last Name': {'/FT': '/Tx', '/T': 'Last Name', '/V': '', '/DV': ''},
 'female': {'/FT': '/Btn', '/Kids': [IndirectObject(7, 0), IndirectObject(9, 0)], '/T': 'female', '/Ff': 49152, '/V': '/Off', '/DV': '/Off'},
 'Birthday': {'/FT': '/Tx', '/T': 'Birthday', '/V': '', '/DV': ''},
 'gdpr': {'/FT': '/Btn', '/T': 'gdpr', '/V': '/Off', '/DV': '/Off'},
 'other': {'/FT': '/Btn', '/T': 'other', '/V': '/Off', '/DV': '/Off'},
 'First Name_2': {'/FT': '/Tx', '/T': 'First Name_2', '/Ff': 4096, '/V': 'Bob', '/DV': 'Bob'},
 'Nationality': {'/FT': '/Ch', '/T': 'Nationality', '/Ff': 131072, '/V': '', '/DV': ''}}

thenineteen commented Oct 10, 2018

If you want to get all the values, you can use the following method instead of .getFormTextFields() after opening the PDF file with PyPDF2.PdfFileReader():

.getFields()

  • getFormTextFields() reads the text fields of a PDF form but doesn't read the drop-down menus or tick boxes
  • whereas getFields() gets all the data, but the result needs cleaning up
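
The "cleaning up" mentioned above could be sketched roughly as follows. The helper name flatten_field_values is hypothetical (it is not part of PyPDF2), and the sample dictionary is an abridged copy of the get_fields() output quoted earlier in this issue, so the snippet runs without a PDF file. It assumes each field behaves like a dictionary with an optional "/V" entry holding the current value.

```python
from typing import Any, Dict, Mapping, Optional


def flatten_field_values(
    fields: Optional[Mapping[str, Mapping[str, Any]]],
) -> Dict[str, Any]:
    """Map each field name to its current value (/V), whatever the field type.

    Hypothetical helper: accepts the mapping returned by get_fields(),
    or None when the document has no form fields.
    """
    if not fields:
        return {}
    # Fields without a /V entry are reported as None.
    return {name: field.get("/V") for name, field in fields.items()}


# Abridged sample copied from the get_fields() output in this issue.
sample = {
    "First Name": {"/FT": "/Tx", "/T": "First Name", "/V": "Alice", "/DV": "Alice"},
    "gdpr": {"/FT": "/Btn", "/T": "gdpr", "/V": "/Off", "/DV": "/Off"},
    "Nationality": {"/FT": "/Ch", "/T": "Nationality", "/Ff": 131072, "/V": "", "/DV": ""},
}

values = flatten_field_values(sample)
print(values)
```

With a real file, the same helper would presumably be called as flatten_field_values(reader.get_fields()). Unlike the text-only method, the result includes the "Nationality" dropdown and the "gdpr" checkbox.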

@dr-linetco

According to the official documentation, getFormTextFields() can indeed be used to extract the contents of a drop down.

[image: github_issue_pypdf2]

So I guess the documentation is just wrong on this.

Sorry for being the issue-necromancer here.

MartinThoma added the workflow-forms label Apr 16, 2022
MartinThoma added the is-bug and Has MCVE labels Jun 26, 2022
@MartinThoma
Member

I've just added https://github.com/py-pdf/sample-files/tree/main/012-libreoffice-form to confirm the issue

@MartinThoma
Member

@akallai I've added a minimal example to your question. Thank you for raising the issue!

MartinThoma changed the title from "getFormTextFields() not returning Values from Dropdown fields." to "PdfReader.get_form_text_fields() not returning Values from Dropdown fields." Jun 26, 2022
MartinThoma added the PdfReader label Jun 26, 2022
MartinThoma changed the title from "PdfReader.get_form_text_fields() not returning Values from Dropdown fields." to "PdfReader.get_form_text_fields() not returning dropdown fields" Jun 26, 2022
MartinThoma pushed a commit that referenced this issue Jul 21, 2022
Added /Opt to the checked field_attributes within reader.get_fields

Closes #391
@shanimuffin

Hi

I'm running version 3.0.1 and still have the same issue when using get_form_text_fields(), not seeing dropdown data. Was this fix released then removed? Or am I missing something?

Thanks

Shani

MartinThoma reopened this Mar 30, 2023
@MartinThoma
Member

You need to use get_fields:

>>> from pypdf import PdfReader
>>> reader = PdfReader("sample-files/012-libreoffice-form/libreoffice-form.pdf")
>>> reader.get_form_text_fields()
{'First Name': 'Alice', 'Last Name': '', 'Birthday': '', 'First Name_2': 'Bob'}
>>> reader.get_fields()
{'First Name': {'/T': 'First Name', '/FT': '/Tx', '/V': 'Alice', '/DV': 'Alice'}, 'Last Name': {'/T': 'Last Name', '/FT': '/Tx', '/V': '', '/DV': ''}, 'female': {'/T': 'female', '/FT': '/Btn', '/Ff': 49152, '/V': '/Off', '/DV': '/Off', '/Kids': [IndirectObject(7, 0, 139872300641424), IndirectObject(9, 0, 139872300641424)]}, 'Birthday': {'/T': 'Birthday', '/FT': '/Tx', '/V': '', '/DV': ''}, 'gdpr': {'/T': 'gdpr', '/FT': '/Btn', '/V': '/Off', '/DV': '/Off'}, 'other': {'/T': 'other', '/FT': '/Btn', '/V': '/Off', '/DV': '/Off'}, 'First Name_2': {'/T': 'First Name_2', '/FT': '/Tx', '/Ff': 4096, '/V': 'Bob', '/DV': 'Bob'}, 'Nationality': {'/T': 'Nationality', '/FT': '/Ch', '/Ff': 131072, '/V': '', '/DV': '', '/Opt': ['Unknown', 'German', 'Indonesian', 'US-American', 'French', 'Spanish', 'Italian']}}
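
To pull only the dropdown data out of that result, something like the following sketch may help. The helper name choice_fields is hypothetical and not part of pypdf. It assumes that dropdowns are the fields whose "/FT" entry is "/Ch" and that their options sit under "/Opt", as in the output above. The sample dictionary is copied from that output so the snippet runs without the PDF. In other documents, "/Opt" entries may be pairs of export value and display text rather than plain strings; this sketch returns them unchanged.

```python
from typing import Any, Dict, Mapping


def choice_fields(fields: Mapping[str, Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return the current value and option list of every choice (/Ch) field.

    Hypothetical helper operating on the mapping returned by get_fields().
    """
    result: Dict[str, Dict[str, Any]] = {}
    for name, field in fields.items():
        if field.get("/FT") != "/Ch":
            continue  # skip text fields (/Tx), buttons (/Btn), and so on
        result[name] = {
            "value": field.get("/V"),
            "options": list(field.get("/Opt", [])),
        }
    return result


# Two entries copied from the get_fields() output above.
sample = {
    "First Name": {"/T": "First Name", "/FT": "/Tx", "/V": "Alice", "/DV": "Alice"},
    "Nationality": {
        "/T": "Nationality", "/FT": "/Ch", "/Ff": 131072, "/V": "", "/DV": "",
        "/Opt": ["Unknown", "German", "Indonesian", "US-American",
                 "French", "Spanish", "Italian"],
    },
}

dropdowns = choice_fields(sample)
print(dropdowns)
```

Against the real file, the call would presumably be choice_fields(reader.get_fields()).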

@MartinThoma
Member

@shanimuffin You've discovered a documentation issue. Would you mind expanding https://pypdf.readthedocs.io/en/latest/user/forms.html with such an example / explanation?
