# Data Extraction from Multilingual PDF using Gemini's Multimodal Function Calling


| | |
|-|-|
|Author(s) | [Laxmi Harikumar](https://github.com/lharikumar) |

## Overview

### Introduction to Multimodal Function Calling with Gemini

This notebook demonstrates a powerful [Function Calling](https://cloud.google.com/vertex-ai/docs/generative-ai/multimodal/function-calling) capability of the Gemini model: support for multimodal inputs. With multimodal function calling, you can go beyond traditional text inputs, enabling Gemini to understand your intent and predict function calls and function parameters based on various inputs like images, audio, video, and PDFs. Function calling can also be referred to as *function calling with controlled generation*, which guarantees that output generated by the model always adheres to a specific schema so that you receive consistently formatted responses.

### How It Works

1. **Define Functions and Tools:** Describe your functions, then group them into `Tool` objects for Gemini to use.
2. **Send Inputs and Prompt:** Provide Gemini with multimodal input (image, audio, PDF, etc.) and a prompt describing your request.
3. **Gemini Predicts Action:** Gemini analyzes the multimodal input and prompt to predict the best function to call and its parameters.
4. **Execute and Return:** Use Gemini's prediction to make API calls, then send the results back to Gemini.
5. **Generate Response:** Gemini uses the API results to provide a final, natural language response to the user.
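
Step 4 is the part your own code is responsible for. The following is a minimal sketch of it, assuming the predicted call has already been reduced to a function name and a dict of arguments. `HANDLERS`, `dispatch`, and `save_contracts` are hypothetical names used only for illustration, and `save_contracts` is a stand-in for a real API call.

```python
def save_contracts(contract_details):
    """Hypothetical stand-in for a real API call: reports how many records arrived."""
    return {"saved": len(contract_details)}


# Hypothetical registry: declared function name -> local Python handler.
HANDLERS = {"get_company_information": save_contracts}


def dispatch(function_name, function_args):
    """Run the local handler that matches a predicted function call."""
    handler = HANDLERS.get(function_name)
    if handler is None:
        raise ValueError(f"Model requested an unknown function: {function_name}")
    return handler(**function_args)


# Illustrative values, shaped like a predicted function call.
predicted_name = "get_company_information"
predicted_args = {"contract_details": [{"supplier": "ABC Supply Co."}]}

result = dispatch(predicted_name, predicted_args)
print(result)
```

Keeping the registry explicit means an unexpected function name fails loudly instead of being silently ignored.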



### About this notebook

This notebook will guide you through extracting information from contractual documents in PDF format using Gemini's multimodal function calling.

### Objectives

In this tutorial, you will learn how to use the Vertex AI Gemini API with the Vertex AI SDK for Python to make function calls with multimodal inputs, using the Gemini 1.5 Pro (`gemini-1.5-pro`) model. You'll explore how Gemini can process and understand PDFs to predict and execute functions.

You will complete the following tasks:

- Install the Vertex AI SDK for Python.
- Define functions that can be called by Gemini.
- Package functions into tools.
- Send multimodal inputs (PDFs) and prompts to Gemini.
- Extract predicted function calls and their parameters from Gemini's response.
- Save the output to a CSV file for further analysis or downstream processing.

### Costs

This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.


## Getting Started


### Install Vertex AI SDK for Python


In [1]:
!pip3 install --upgrade --user --quiet google-cloud-aiplatform

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [2]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [2]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Pre-requisites

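Upload the 10 contracts provided in the `contracts` folder to the root of a Cloud Storage (GCS) bucket, keeping the file names `contract1.pdf` through `contract10.pdf`, because the code below reads them from those paths.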
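
One possible way to do the upload from Python is sketched below. It assumes the `google-cloud-storage` client library is available in your environment (this notebook does not install it explicitly) and that the PDFs sit in a local folder. `plan_uploads` and `upload_contracts` are hypothetical helper names, not part of any SDK.

```python
from pathlib import Path


def plan_uploads(local_dir, bucket_name):
    """Map each local PDF to the gs:// URI it would be uploaded to."""
    pdfs = sorted(Path(local_dir).glob("*.pdf"))
    return {pdf: f"gs://{bucket_name}/{pdf.name}" for pdf in pdfs}


def upload_contracts(local_dir, bucket_name):
    """Upload every PDF in local_dir to the root of the bucket."""
    # Imported lazily so plan_uploads() works without the client library.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    for pdf, uri in plan_uploads(local_dir, bucket_name).items():
        bucket.blob(pdf.name).upload_from_filename(str(pdf))
        print(f"Uploaded {pdf} to {uri}")


# Example call (needs credentials and an existing bucket, so it is commented out):
# upload_contracts("contracts", "<REPLACE_WITH_YOUR_GCS_BUCKET_NAME>")
```

Calling `plan_uploads` first lets you confirm the target paths before anything is written to the bucket.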

### Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [3]:
PROJECT_ID = "<REPLACE_WITH_YOUR_PROJECT_ID>"  # @param {type:"string"}
LOCATION = "<REPLACE_WITH_YOUR_LOCATION>"  # @param {type:"string"}
GCS_BUCKET = "<REPLACE_WITH_YOUR_GCS_BUCKET_NAME>"  # @param {type:"string"}

import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

## Multimodal Function Calling in Action

### Import libraries


In [4]:
from IPython.display import Markdown, display
from vertexai.generative_models import (
    Content,
    FunctionDeclaration,
    GenerationConfig,
    GenerativeModel,
    Part,
    Tool,
)
import pandas as pd

### PDF-Based Function Calling: Extracting Contract Data from PDFs

This example demonstrates how to use Gemini's multimodal function calling to process PDF documents. You'll work with a set of contracts and extract the details of the contract.

The 10 contracts are taken from [this Kaggle dataset](https://www.kaggle.com/datasets/eduardbalamatiuc/contracts-to-invoices-conversion-pdf-to-json?resource=download).

For convenience, these 10 documents are also available in the `contracts` folder.

<img src="https://storage.cloud.google.com/contract_docs_kaggle/helper_stuff/contract_docs_screenshotpng" width="1000px">

Define a function declaration, stored in the variable `get_contract_information` and exposed to Gemini under the name `get_company_information`, that describes the details to extract from a list of contracts:

In [5]:
get_contract_information = FunctionDeclaration(
    name="get_company_information",
    description="Get information about a list of contracts",
    parameters={
        "type": "object",
        "properties": {
            "contract_details": {
                "type": "array",
                "description": "A list of contract details",
                "items": {
                    "type": "object",
                    "properties": {
                        "supplier": {
                            "type": "string",
                            "description": "The name of the supplier company"
                        },
                        "client": {
                            "type": "string",
                            "description": "The name of the client company"
                        },
                        "contract_number": {
                            "type": "string",
                            "description": "Contract Number"
                        },
                        "issue_date": {
                            "type": "string",
                            "description": "Issue date of the contract"
                        },
                        "due_date": {
                            "type": "string",
                            "description": "Due date of the contract"
                        },
                        "payment_terms": {
                            "type": "string",
                            "description": "Payment terms of the agreement. Include all of text related to Payment Terms"
                        },
                        "delivery_terms": {
                            "type": "string",
                            "description": "Delivery terms of the agreement. Include all of text related to Delivery Terms"
                        },
                        "contract_value": {
                            "type": "string",
                            "description": "Total amount of the contract"
                        },
                        "bank_name": {
                            "type": "string",
                            "description": "Name of the bank"
                        },
                        "account_number": {
                            "type": "string",
                            "description": "Bank Account Number"
                        }
                    },
                    "required": ["supplier", "client", "contract_number", "issue_date", "due_date", "payment_terms", "delivery_terms", "contract_value"]
                }
            },
        },
        "required": ["contract_details"]
    },
)

Package your newly defined function into a tool:

In [6]:
data_extraction_tool = Tool(
    function_declarations=[
        get_contract_information,
    ],
)

Now you can provide Gemini with multiple PDF contracts and ask it to extract the contract information:

In [7]:
model = GenerativeModel("gemini-1.5-pro-002")
generation_config = GenerationConfig(temperature=0,
                                     max_output_tokens=8192)

# Set up the input data: the 10 contract PDFs, followed by the text prompt
contents = [
    Part.from_uri(
        f"gs://{GCS_BUCKET}/contract{i}.pdf",
        mime_type="application/pdf",
    )
    for i in range(1, 11)
] + [
    "Inspect all of the 10 PDF files of contracts and retrieve information about the contract",
]


In [9]:
# Get the response
response = model.generate_content(
    contents,
    generation_config=generation_config,
    tools=[data_extraction_tool],
)

As expected, Gemini predicted the `get_company_information` function:

In [10]:
function_name = response.candidates[0].function_calls[0].name
function_name

'get_company_information'
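
Indexing `function_calls[0]` raises an error if the model answers with plain text instead of a function call. A defensive variant is sketched below. It relies only on the attributes used in this notebook (`candidates`, `function_calls`, `name`, `args`). `first_function_call` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for a real response so the sketch runs without calling the API.

```python
from types import SimpleNamespace


def first_function_call(response):
    """Return (name, args) for the first predicted function call, or None."""
    if not response.candidates:
        return None
    calls = response.candidates[0].function_calls
    if not calls:
        return None
    call = calls[0]
    return call.name, dict(call.args)


# Stand-in objects that mimic the shape of a response (not real SDK types).
fake_call = SimpleNamespace(
    name="get_company_information", args={"contract_details": []}
)
with_call = SimpleNamespace(candidates=[SimpleNamespace(function_calls=[fake_call])])
without_call = SimpleNamespace(candidates=[SimpleNamespace(function_calls=[])])

print(first_function_call(with_call))
print(first_function_call(without_call))
```

Returning `None` lets the caller decide whether to retry, adjust the prompt, or fall back to the text response.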

The function arguments contain the list of contract details extracted from the PDF contracts:

In [11]:
function_args = {
    key: value for key, value in response.candidates[0].function_calls[0].args.items()
}
function_args

{'contract_details': [{'client': 'XYZ Corp.',
   'supplier': 'ABC Supply Co.',
   'contract_number': 'ABC-2023-001',
   'issue_date': 'October 1, 2023',
   'due_date': 'October 15, 2023',
   'contract_value': '$216.00',
   'payment_terms': 'Clients are to ensure that the balance due of $216.00 is paid in full by October 15, 2023. In the case of payment\\ndelays extending beyond the due date, a 5% penalty will be applied to the amount due. The Supplier may levy this\\nafter a period of 5 days from the due date without further notice.',
   'delivery_terms': 'Delivery of said services will occur within ten (10) calendar days following the execution of this Contract amendment\\nunless otherwise stated or unless unforeseen circumstances arise which could delay delivery. Please keep in mind\\nthat once the Supplier dispatches the goods, the Client has five (5) days to inspect and notify the Supplier of any\\ndiscrepancies.\\n\\nIn the event of a delay beyond the standard delivery period due 
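
The notebook does not verify the extracted records, so it can be worth checking them against the fields the declaration marks as required before using them. The sketch below assumes each record behaves like a plain dict. `REQUIRED_FIELDS` mirrors the `required` list in the function declaration, `find_missing_fields` is a hypothetical helper, and the sample records use shortened, illustrative values rather than the full model output.

```python
REQUIRED_FIELDS = [
    "supplier",
    "client",
    "contract_number",
    "issue_date",
    "due_date",
    "payment_terms",
    "delivery_terms",
    "contract_value",
]


def find_missing_fields(contract_details):
    """Return {record index: [missing required fields]} for incomplete records."""
    problems = {}
    for index, record in enumerate(contract_details):
        # Absent keys and empty strings are both treated as missing.
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            problems[index] = missing
    return problems


# Illustrative records; the long text fields are shortened placeholders.
complete = {
    "supplier": "ABC Supply Co.",
    "client": "XYZ Corp.",
    "contract_number": "ABC-2023-001",
    "issue_date": "October 1, 2023",
    "due_date": "October 15, 2023",
    "payment_terms": "(shortened)",
    "delivery_terms": "(shortened)",
    "contract_value": "$216.00",
}
incomplete = {key: value for key, value in complete.items() if key != "due_date"}

print(find_missing_fields([complete, incomplete]))
```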

Let's create a function that takes these extracted arguments and writes them to a CSV file for further processing:

In [12]:
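import os

def write_data_to_csv(extracted_args):
    # Create the output directory first; to_csv fails if it does not exist
    os.makedirs("outputs", exist_ok=True)
    df = pd.DataFrame(extracted_args["contract_details"])
    df.to_csv("outputs/data.csv", index=False)

write_data_to_csv(function_args)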
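
Every field in the schema is declared as a string, so amounts and dates usually need converting before numeric analysis. The sketch below is one way to do that, under two loudly stated assumptions: amounts use US-style formatting (comma for thousands, dot for decimals) and dates look like `October 1, 2023`. Multilingual contracts may use other conventions, in which case these hypothetical helpers (`parse_amount`, `parse_date`) would need adapting.

```python
import re
from datetime import datetime
from decimal import Decimal


def parse_amount(text):
    """Extract a Decimal from strings such as '$216.00'; None if no number is found."""
    # Assumption: comma = thousands separator, dot = decimal separator.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def parse_date(text, fmt="%B %d, %Y"):
    """Parse dates such as 'October 1, 2023'; None if the format does not match."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


# Values taken from the sample output shown earlier in this notebook.
print(parse_amount("$216.00"))
print(parse_date("October 15, 2023"))
```

Returning `None` for values that do not parse keeps unexpected formats visible instead of guessing at them.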

This example shows how Gemini can process documents and extract structured data from them.

## Conclusions

In this notebook, you explored the powerful capabilities of Gemini's multimodal function calling. You learned how to:

- Define functions and package them into tools.
- Send PDFs and prompts to Gemini.
- Extract predicted function calls and their parameters.
- Use the predicted function arguments downstream. In this notebook they are written to a CSV file rather than passed to an API call.
