# Reading Data from PDF Form Fields Using pypdf

This section demonstrates how to read data from interactive form fields in a PDF using **pypdf**. The library can extract the values of text boxes, checkboxes, list boxes, and other input fields commonly found in interactive PDFs. This is useful for forms that contain user-entered data, such as contracts, applications, or surveys.

## Use Cases
- Extracting user input or predefined data from interactive PDF forms.
- Automating the process of reading form fields from multiple PDFs for data analysis or record-keeping.
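
As a rough illustration of the record-keeping use case, the sketch below writes field values collected from several PDFs to a CSV file, one row per PDF. It uses only the standard library and assumes the values have already been extracted into plain dictionaries. The function name `write_form_records` and the `file` column are hypothetical names chosen for this example, not part of pypdf.

```python
import csv


def write_form_records(records, csv_path):
    """Write one CSV row per PDF (hypothetical helper for illustration).

    records maps a file name to a dict of {field name: value}.
    Returns the list of column names that was written as the header.
    """
    # Collect field names in first-seen order so the header is stable.
    # Assumes no form field is itself named "file".
    fieldnames = ["file"]
    for values in records.values():
        for name in values:
            if name not in fieldnames:
                fieldnames.append(name)

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills in columns that a given PDF does not have
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for filename, values in records.items():
            row = {"file": filename}
            # Fields without a value (None) are written as empty strings
            row.update({k: "" if v is None else str(v) for k, v in values.items()})
            writer.writerow(row)

    return fieldnames
```

Because the header is the union of all field names, PDFs with different forms can share one file; cells for fields a PDF lacks are simply left empty.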

In [1]:
from pypdf import PdfReader

## Read in the PDF file

In [2]:
# Define the path to the PDF file
path = r"C:\Users\Quynh Pham\Desktop\Import pdf\PdfFormExample.pdf"

In [3]:
# Create a PdfReader object to read the PDF
reader = PdfReader(path)

In [4]:
# Get the metadata and print out the result
info = reader.metadata
print(info)

{'/CreationDate': "D:20130629204853+02'00'", '/Creator': 'Writer', '/Keywords': 'PDF Form', '/Producer': 'OpenOffice.org 3.4', '/Title': 'PDF Form Example'}
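
The `/CreationDate` entry above is a raw PDF date string. A minimal sketch of turning it into a `datetime` with the standard library follows. It assumes the full form with an explicit UTC offset, as in this file; shorter variants allowed by the PDF format are not handled. The helper name `parse_pdf_date` is hypothetical. pypdf's metadata object also exposes parsed date properties such as `creation_date`, which may be the simpler route when available.

```python
from datetime import datetime


def parse_pdf_date(raw):
    """Parse a PDF date string like "D:20130629204853+02'00'" (hypothetical helper).

    Assumes the full YYYYMMDDHHmmSS form followed by a UTC offset.
    """
    text = raw[2:] if raw.startswith("D:") else raw
    # Drop the apostrophes so the offset reads +0200, which %z understands
    text = text.replace("'", "")
    return datetime.strptime(text, "%Y%m%d%H%M%S%z")


created = parse_pdf_date("D:20130629204853+02'00'")
print(created.isoformat())  # 2013-06-29T20:48:53+02:00
```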


In [5]:
# Print the number of pages in the PDF
print(f"There are {len(reader.pages)} pages in the PDF.")

There are 1 pages in the PDF.


In [6]:
# Extract the text from every page in the PDF and print it
for page in reader.pages:
    print(page.extract_text())

PDF Form Example
This is an example of a user fillable PDF form. Normally PDF is used as a final publishing format. 
However PDF has an option to be used as an entry form that can be edited and saved by the user.
The fields of this form have been selected to demonstrate as many as possible of the common 
entry fields.
This document and PDF form have been created with OpenOffice (version 3.4.0).
To fill out the form, make sure the PDF file is not read-only. If the file is read-only save it first to a 
folder or computer desktop. Close this file and open the saved file.
Please fill out the following fields. Important fields are marked yellow.
Given Name:
Family Name:
Address 1:   House nr:
Address 2:
Postcode: City:  
Country:
Gender:
Height (cm):
Driving License:
I speak and understand (tick all that apply): 
      
Favourite colour:
Important: Save the completed PDF form (use menu File - Save).
Deutsch English Français Esperanto Latin


## Extract data from form fields

In [8]:
# Function to extract form field values from a PDF file
def extract_form_values(path):
    """
    This function extracts form field data from a PDF file.

    Parameters:
    path (str): The file path to the PDF document.

    Returns:
    dict: A dictionary containing form field names as keys and their respective values. 
          If no form fields are present, returns None.
    """
    
    with open(path, 'rb') as file:
        reader = PdfReader(file)  
        
        # Read all form fields once; get_fields() returns None when the PDF has no form fields
        form_fields = reader.get_fields()
        if not form_fields:
            return None

        field_values = {}  # Maps each field name to its extracted value

        for field_name, field_data in form_fields.items():
            # '/V' holds the field's current value; fields without a value map to None
            field_values[field_name] = field_data.get('/V')
        
    return field_values

In [9]:
# Extract form values from the PDF file
form_values = extract_form_values(path)

# extract_form_values returns None when the PDF contains no form fields
if form_values is not None:
    for field, value in form_values.items():
        print(f"Field Name: {field}, Value: {value}")
else:
    print("No form fields found in the PDF.")

Field Name: Given Name Text Box, Value: Quynh Dinh Hảai 
Field Name: Family Name Text Box, Value: Pham
Field Name: House nr Text Box, Value: 20
Field Name: Address 2 Text Box, Value: 
Field Name: Postcode Text Box, Value: 10315
Field Name: Country Combo Box, Value: Germany
Field Name: Height Formatted Field, Value: 158
Field Name: City Text Box, Value: Berlin
Field Name: Driving License Check Box, Value: /Off
Field Name: Favourite Colour List Box, Value: Green
Field Name: Language 1 Check Box, Value: /Yes
Field Name: Language 2 Check Box, Value: /Yes
Field Name: Language 3 Check Box, Value: /Off
Field Name: Language 4 Check Box, Value: /Off
Field Name: Language 5 Check Box, Value: /Off
Field Name: Gender List Box, Value: Woman
Field Name: Address 1 Text Box, Value: Allee der Kosmonauten
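
In the output above, checkbox fields report their state as PDF names (`/Yes` or `/Off`) rather than booleans. For analysis it can be convenient to convert them. The sketch below does this on a plain dictionary of extracted values. It assumes `/Yes` is the "on" state, as in this particular form; other PDFs may use a different name, so the on and off values are parameters. The function `normalize_checkboxes` is a hypothetical helper, not part of pypdf.

```python
def normalize_checkboxes(form_values, on_values=("/Yes",), off_values=("/Off",)):
    """Return a copy of form_values with checkbox states converted to booleans.

    Only values listed in on_values or off_values are converted;
    everything else (text, choices, None) is passed through unchanged.
    """
    normalized = {}
    for name, value in form_values.items():
        if value in on_values:
            normalized[name] = True
        elif value in off_values:
            normalized[name] = False
        else:
            normalized[name] = value
    return normalized


sample = {
    "Driving License Check Box": "/Off",
    "Language 1 Check Box": "/Yes",
    "City Text Box": "Berlin",
}
print(normalize_checkboxes(sample))
```

Matching on explicit values keeps ordinary text that happens to start with a slash from being mistaken for a checkbox state.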
