[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ruifcruz/WCARS_2025/blob/main/1.Automatize_interactions_with_an_LLM/1_Automatize_interactions_with_an_LLM.ipynb)


# Generative AI applications with Python
@Francisco Cruz - francisco.cruz@iscte-iul.pt

# App 1 - Automatize interactions with an LLM

OpenAI is the company behind ChatGPT. OpenAI's models and applications are not open source. This notebook demonstrates how a Python script can communicate with the service through its API.

The objective of the code in this notebook is to communicate with OpenAI, repeatedly requesting it to extract covenants from different contracts.

**Note**: There are several alternatives to OpenAI that provide LLMs of similar quality (Claude, Mistral, Gemini, DeepSeek, models hosted on Hugging Face, etc.).

# 0. Preparation steps

In [1]:
%%capture
# Download and install the openai and PyPDF2 packages (%%capture must be the first line of the cell to hide the output)
!pip install openai==1.76.0 PyPDF2==3.0.1

In [2]:
# Import the packages that will be needed throughout the notebook
# The pandas package is needed to store data in a table
import pandas as pd
# openai and pydantic packages are needed to communicate with OpenAI API
import openai
from pydantic import BaseModel
from typing import Optional
# IPython and json packages are needed to print easier to read outputs
from IPython.display import display, Markdown
import pprint
import json
# google.colab package is needed to access client secrets (explained in the next cells) and download output files
from google.colab import userdata, files
# requests, PyPDF2 and io packages are needed to download and read content from pdf files
import requests
import PyPDF2
import io
import os

In [3]:
# Create a helper function that renders text as Markdown; it will be useful later
def print_text(text):
  display(Markdown(text))

To access OpenAI's services, you need an API key, which works like a password.

To obtain one, you must create an account at https://platform.openai.com/, go to 'API keys' and create a key. You will need to have a payment method associated with your account.

The key can be placed directly in the code, but that is not considered secure. Therefore, in Colab you can use the 'Secrets' tab (on the left), where you can add a line with:
- 'Name' equal to OPENAI_API_KEY
- 'Value' equal to the key

In [4]:
# The computer will read and store the key that was pasted on the tab "Secrets"
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
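The cell above only works inside Colab. As a minimal sketch, assuming the same notebook might also be run locally with the key stored in an environment variable, the lookup could fall back from Colab Secrets to the environment. The helper name `load_api_key` is hypothetical and not part of the original notebook.

```python
import os


def load_api_key(name="OPENAI_API_KEY"):
    """Return the key from Colab Secrets if possible, else from the environment."""
    try:
        # google.colab only exists inside Colab; elsewhere this import fails
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with the notebook
        key = None
    # Fall back to an environment variable defined outside the notebook
    key = key or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} not found in Colab Secrets or environment variables")
    return key
```

With this helper, the assignment above could be written as `os.environ["OPENAI_API_KEY"] = load_api_key()`.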

# 1. Basic API call to OpenAI

In [5]:
# To avoid repeating code, we can create a function that sends a text to ChatGPT and returns its response
def get_answer(prompt, model="gpt-4o-mini"):
    # When we send a message to ChatGPT, we need to send it as a list of dictionaries, one per message
    # In this case, the variable 'messages' contains a single message whose content is the "prompt" parameter of the function
    messages = [{"role": "user", "content": prompt}]
    # The function openai.chat.completions.create is used to send (via API) a message to ChatGPT
    response = openai.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content
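The `messages` list can hold more than one entry. As an illustrative sketch (the helper `build_messages` is hypothetical and not part of the notebook), a `system` message with general instructions can be placed before the `user` message:

```python
def build_messages(prompt, system=None):
    """Build the list of role/content dictionaries sent to the chat endpoint."""
    messages = []
    if system:
        # Optional general instructions that frame how the model should behave
        messages.append({"role": "system", "content": system})
    # The actual question or task always goes in a user message
    messages.append({"role": "user", "content": prompt})
    return messages


messages = build_messages(
    "Is auditing a respected profession?",
    system="You are a concise career advisor.",
)
print(messages)
```

The resulting list could be passed as the `messages` argument in `get_answer` instead of the single-entry list built there.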

In [6]:
# Let's then send a message to OpenAI and print the response
print_text(get_answer("I want to become an Auditor. Is it a fancy profession?"))

Becoming an auditor can be considered a prestigious and respected profession, especially in the fields of finance and accounting. Auditors play a crucial role in ensuring the accuracy and integrity of financial statements, compliance with regulations, and the overall financial health of organizations. Here are a few points to consider regarding the profession:

1. **Professional Recognition**: Auditors, especially those who hold certifications like CPA (Certified Public Accountant) or CA (Chartered Accountant), are often highly regarded in the business community.

2. **Career Opportunities**: There are various paths within auditing, including internal auditing, external auditing, and specialized areas like forensic auditing. This diversity can lead to numerous career opportunities.

3. **Impact**: Auditors help organizations maintain transparency and accountability, which is vital for stakeholders, including investors, regulators, and the public.

4. **Skill Development**: The profession requires strong analytical skills, attention to detail, and a solid understanding of accounting principles and regulations. These skills are valuable and transferable to other roles in finance and business.

5. **Work Environment**: Auditors often work in a variety of settings, including public accounting firms, corporate finance departments, and government agencies. The work can involve travel, especially for external auditors.

6. **Compensation**: Auditing can be financially rewarding, with competitive salaries and benefits, particularly for those with advanced certifications and experience.

While the profession may not have the same level of glamour as some other fields, it is certainly respected and can be very fulfilling for those who enjoy working with numbers, analyzing data, and ensuring compliance. If you have an interest in finance and accounting, pursuing a career as an auditor could be a great choice.

In [7]:
# Let's now create a fake financial contract
fake_contract = """
FINANCIAL CONTRACT
Date: April 25, 2025

PARTIES:
This Contract is entered into by and between:
Lender: Nova Capital Investments, Inc., with registered office at 100 Market St., Suite 400, San Francisco, CA 94105, represented by James Turner, CEO.
Borrower: GreenTech Solutions Ltd., with registered office at 45 Innovation Drive, Austin, TX 78701, represented by Laura Chen, CFO.

1. OBJECT OF THE CONTRACT
The Lender agrees to provide a financial loan to the Borrower to support the development of its solar panel manufacturing facility.

2. LOAN AMOUNT AND TERMS

    Total Loan Amount: $500,000.00 USD

    Interest Rate: 4.5% per annum

    Repayment Term: 24 months

    First repayment due: June 1, 2025

3. PAYMENT SCHEDULE
Repayments shall be made in equal monthly installments of $21,730.00, including principal and interest, paid on the 1st of each month.

4. BORROWER’S OBLIGATIONS
The Borrower agrees to:
a) Use funds exclusively for the development of the solar facility.
b) Provide quarterly financial reports to the Lender.
c) Notify the Lender of any significant changes in ownership or management.
d) Maintain a dedicated bank account for loan-related transactions.

5. DEFAULT
Failure to make two consecutive payments shall constitute a default, granting the Lender the right to demand immediate repayment of the outstanding balance.

6. GOVERNING LAW
This Contract shall be governed by the laws of the State of California.

Signed on April 25, 2025

James Turner
CEO, Nova Capital Investments, Inc.

Laura Chen
CFO, GreenTech Solutions Ltd.
"""

In [8]:
# Now sending the contract as a message to OpenAI
print_text(get_answer(fake_contract))

**FINANCIAL CONTRACT**

**Date:** April 25, 2025

**PARTIES:**
This Contract is entered into by and between:
- **Lender:** Nova Capital Investments, Inc., with registered office at 100 Market St., Suite 400, San Francisco, CA 94105, represented by James Turner, CEO.
- **Borrower:** GreenTech Solutions Ltd., with registered office at 45 Innovation Drive, Austin, TX 78701, represented by Laura Chen, CFO.

**1. OBJECT OF THE CONTRACT**  
The Lender agrees to provide a financial loan to the Borrower to support the development of its solar panel manufacturing facility.

**2. LOAN AMOUNT AND TERMS**  
- **Total Loan Amount:** $500,000.00 USD  
- **Interest Rate:** 4.5% per annum  
- **Repayment Term:** 24 months  
- **First repayment due:** June 1, 2025  

**3. PAYMENT SCHEDULE**  
Repayments shall be made in equal monthly installments of $21,730.00, including principal and interest, paid on the 1st of each month.

**4. BORROWER’S OBLIGATIONS**  
The Borrower agrees to:  
a) Use funds exclusively for the development of the solar facility.  
b) Provide quarterly financial reports to the Lender.  
c) Notify the Lender of any significant changes in ownership or management.  
d) Maintain a dedicated bank account for loan-related transactions.

**5. DEFAULT**  
Failure to make two consecutive payments shall constitute a default, granting the Lender the right to demand immediate repayment of the outstanding balance.

**6. GOVERNING LAW**  
This Contract shall be governed by the laws of the State of California.

**Signed on April 25, 2025**

_________________________  
James Turner  
CEO, Nova Capital Investments, Inc.

_________________________  
Laura Chen  
CFO, GreenTech Solutions Ltd.

What is ChatGPT answering? We didn't ask a question; we simply sent the contract. With no clear question or instruction, the model had to guess what we wanted, and in this case it simply returned the same contract reformatted in Markdown.

We need to be more precise and make the model understand what we want from it. This is called **prompt engineering**.

# 2. Call OpenAI API with a prompt

In [9]:
# Create a function that receives a financial contract and returns it concatenated with prompting instructions
def get_contract_analysis_prompt(contract):
  prompt = f"""
    The following content delimited with triple quotes is a financial contract.
    You are an Auditor who should find 5 covenants in the contract and summarize them in bullet points.

    Provide your answer in English. Only provide the bullet points and nothing else.
    If there is an article related with the covenant, provide its name or number in parenthesis at the beginning of the bullet point.

    Financial Contract: '''{contract}'''
    """
  return prompt

In [10]:
# Let's see what would be the concatenation of the base prompt with the contract
prompt_for_fake_contract = get_contract_analysis_prompt(fake_contract)
print_text(prompt_for_fake_contract)


    The following content delimited with triple quotes is a financial contract.
    You are an Auditor who should find 5 covenants in the contract and summarize them in bullet points.

    Provide your answer in English. Only provide the bullet points and nothing else.
    If there is an article related with the covenant, provide its name or number in parenthesis at the beginning of the bullet point.

    Financial Contract: '''
FINANCIAL CONTRACT
Date: April 25, 2025

PARTIES:
This Contract is entered into by and between:
Lender: Nova Capital Investments, Inc., with registered office at 100 Market St., Suite 400, San Francisco, CA 94105, represented by James Turner, CEO.
Borrower: GreenTech Solutions Ltd., with registered office at 45 Innovation Drive, Austin, TX 78701, represented by Laura Chen, CFO.

1. OBJECT OF THE CONTRACT
The Lender agrees to provide a financial loan to the Borrower to support the development of its solar panel manufacturing facility.

2. LOAN AMOUNT AND TERMS

    Total Loan Amount: $500,000.00 USD

    Interest Rate: 4.5% per annum

    Repayment Term: 24 months

    First repayment due: June 1, 2025

3. PAYMENT SCHEDULE
Repayments shall be made in equal monthly installments of $21,730.00, including principal and interest, paid on the 1st of each month.

4. BORROWER’S OBLIGATIONS
The Borrower agrees to:
a) Use funds exclusively for the development of the solar facility.
b) Provide quarterly financial reports to the Lender.
c) Notify the Lender of any significant changes in ownership or management.
d) Maintain a dedicated bank account for loan-related transactions.

5. DEFAULT
Failure to make two consecutive payments shall constitute a default, granting the Lender the right to demand immediate repayment of the outstanding balance.

6. GOVERNING LAW
This Contract shall be governed by the laws of the State of California.

Signed on April 25, 2025

James Turner
CEO, Nova Capital Investments, Inc.

Laura Chen
CFO, GreenTech Solutions Ltd.
'''
    

In [11]:
# Sending the prompt to OpenAI
print_text(get_answer(prompt_for_fake_contract))

- (4) The Borrower agrees to use funds exclusively for the development of the solar facility.
- (4) The Borrower must provide quarterly financial reports to the Lender.
- (4) The Borrower is required to notify the Lender of any significant changes in ownership or management.
- (4) The Borrower must maintain a dedicated bank account for loan-related transactions.
- (5) Failure to make two consecutive payments constitutes a default, allowing the Lender to demand immediate repayment of the outstanding balance.
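Because the prompt asks for bullet points with the article in parentheses at the start, the answer is regular enough to be split into fields. The following is a rough sketch under that assumption; the function `parse_covenants` and the pattern are illustrative, and a model may not always follow the requested format.

```python
import re

# A dash, an optional "(article)" prefix, then the covenant text
BULLET_PATTERN = re.compile(r"^\s*-\s*(?:\((?P<article>[^)]*)\)\s*)?(?P<text>.+)$")


def parse_covenants(answer):
    """Split a bulleted answer into a list of {'article', 'text'} dictionaries."""
    rows = []
    for line in answer.splitlines():
        match = BULLET_PATTERN.match(line)
        if match:  # lines that are not bullets are ignored
            rows.append({
                "article": match.group("article"),  # None when no "(...)" prefix
                "text": match.group("text").strip(),
            })
    return rows


# Two bullets copied from the answer above
sample_answer = """- (4) The Borrower must provide quarterly financial reports to the Lender.
- (5) Failure to make two consecutive payments constitutes a default, allowing the Lender to demand immediate repayment of the outstanding balance."""

for row in parse_covenants(sample_answer):
    print(row["article"], "|", row["text"])
```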

# 3. Apply to a real dataset of contracts

We will use four financial contracts which are publicly available:

- Addendum to Agreement for [Rede Nacional de Cuidados Continuados Integrados e Rede Nacional de Cuidados Paliativos](https://github.com/ruifcruz/WCARS_2025/blob/a3f268f4fdab530eda3989c4d76bc9f946b639fe/PRR-Adenda-ao-Contrato-Financeiro-27-setembro.pdf) (National Network of Integrated Continuing Care and National Network of Palliative Care)
- Service Provision Contract with the [Instituto de Financiamento da Agricultura e Pesca](https://raw.githubusercontent.com/ruifcruz/WCARS_2025/main/1986189.pdf) (Institute for Financing Agriculture and Fisheries)
- Financing contract for [Empresa Municipal de Ambiente do Porto](https://raw.githubusercontent.com/ruifcruz/WCARS_2025/main/2173664.pdf) (Municipal Environment Company of Porto)
- Financing contract for [Ambiolhão](https://raw.githubusercontent.com/ruifcruz/WCARS_2025/main/2245726.pdf) (Municipal Environment Company of Olhão)

## 3.1. Get the data

In [12]:
# List of pdf URLs
pdf_urls = [
    "https://raw.githubusercontent.com/ruifcruz/WCARS_2025/main/1.Automatize_interactions_with_an_LLM/source_data/PRR-Adenda-ao-Contrato-Financeiro-27-setembro.pdf",
    "https://raw.githubusercontent.com/ruifcruz/WCARS_2025/main/1.Automatize_interactions_with_an_LLM/source_data/1986189.pdf",
    "https://raw.githubusercontent.com/ruifcruz/WCARS_2025/main/1.Automatize_interactions_with_an_LLM/source_data/2173664.pdf",
    "https://raw.githubusercontent.com/ruifcruz/WCARS_2025/main/1.Automatize_interactions_with_an_LLM/source_data/2245726.pdf",
    ]

# Creating a loop in which a file is downloaded, read and stored in memory
data_files = []
for url in pdf_urls:
    try:
        # A timeout prevents the loop from hanging if a download stalls
        response = requests.get(url, timeout=30)
        response.raise_for_status()

        pdf_file = io.BytesIO(response.content)
        reader = PyPDF2.PdfReader(pdf_file)

        text = ""
        for page in reader.pages:
            text += page.extract_text() or ""

        file_name = url.split("/")[-1]
        data_files.append({"file_name": file_name, "content": text.strip()})

    except Exception as e:
        print(f"Error processing {url}: {e}")

# Create table with the data (when using pandas, a table is called a DataFrame)
df = pd.DataFrame(data_files)
# Create two copies of the dataframe to use later
df_simple_extraction = df.copy()
df_structured_extraction = df.copy()

In [13]:
# Let's see the contents of the table
df

| | file_name | content |
|---|---|---|
| 0 | PRR-Adenda-ao-Contrato-Financeiro-27-setembro.pdf | Página 1 de 8 \n \nADENDA AO CONTRATO DE FINAN... |
| 1 | 1986189.pdf | NÌVEL DE SEGURANÇA: \nINFORMAÇÃO PÚBLICA \n ... |
| 2 | 2173664.pdf | \n\t\n\n\r\n\t\r... |
| 3 | 2245726.pdf | AMBIOLHAO EMPRESA MUNICIPAL AMBIENTE \nOLHAO,E... |
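The previews above show that text extracted from PDFs carries many stray spaces, tabs and line breaks. As a sketch, assuming that collapsing this whitespace does not remove anything the model needs, a small cleanup step could run before the text is sent to the API. The function `normalize_whitespace` is hypothetical and not part of the notebook.

```python
import re


def normalize_whitespace(text):
    """Collapse runs of whitespace left over from PDF text extraction."""
    # Turn runs of spaces, tabs and carriage returns into a single space
    text = re.sub(r"[ \t\r\f\v]+", " ", text)
    # Drop spaces around line breaks, then keep at most one blank line
    text = re.sub(r" *\n *", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


raw = "Página 1 de 8 \n \nADENDA AO CONTRATO\t\t DE FINANCIAMENTO\r\n\n\n\n"
print(repr(normalize_whitespace(raw)))
```

An empty result after cleanup would suggest that no text could be extracted from the file, for example because the PDF contains only scanned images.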


## 3.2 Call OpenAI API for every contract

In [14]:
# Create a new column on the table named "covenants".
# For each row, it should read the "content" column, generate a prompt and send it to OpenAI
df_simple_extraction['covenants'] = df_simple_extraction['content'].apply(lambda x: get_answer(get_contract_analysis_prompt(x)))
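When one API call is made per row, a single temporary failure (for example a rate limit or a network error) interrupts the whole `apply`. The sketch below shows one possible way to retry a call with increasing waits. The helper `call_with_retries` and its parameters are hypothetical, and the `openai` client may already retry some errors on its own.

```python
import time


def call_with_retries(func, *args, attempts=3, base_delay=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff on errors."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as error:
            if attempt == attempts:
                raise  # give up after the last attempt
            # Wait base_delay, then 2x, 4x, ... before trying again
            wait = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {wait:.1f}s")
            time.sleep(wait)
```

In this notebook it could wrap the existing call, for instance `call_with_retries(get_answer, get_contract_analysis_prompt(x))` inside the `lambda`.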

In [15]:
# Let's see the contents of the table with the new column (while hiding the "content" column for easier visualization of the result)
df_simple_extraction[['file_name', 'covenants']]

| | file_name | covenants |
|---|---|---|
| 0 | PRR-Adenda-ao-Contrato-Financeiro-27-setembro.pdf | - (CLÁUSULA PRIMEIRA) The contract provides fi... |
| 1 | 1986189.pdf | - (Cláusula 3.ª) The contract is valid for fiv... |
| 2 | 2173664.pdf | - (8) The Lessee must maintain sufficient fund... |
| 3 | 2245726.pdf | - (10.1) The Lessee must maintain sufficient f... |


# 4. Extract structured information

In [16]:
# In order to get the information in a structured manner, we need to provide OpenAI with a class, ContractData, that describes the expected structure
class ContractData(BaseModel):
    contract_date: Optional[str]
    first_party: Optional[str]
    second_party: Optional[str]
    other_parties: Optional[list[str]]
    loan_amount: Optional[float]
    term_date: Optional[str]
    governing_law: Optional[str]
    covenants: Optional[str]

# Create a similar function to communicate with OpenAI. In this case it also provides the ContractData class as the expected response format
def get_structured_answer(prompt, model="gpt-4.1"):
    # The instructions go in a system message; the contract text is sent as the user message
    system_prompt = """Extract the contract information in english. When extracting covenants, separate them in bullet points and
        if there is an article related with the covenant, provide its name or number in parenthesis at the beginning of the bullet point."""
    response = openai.OpenAI().beta.chat.completions.parse(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
        temperature=0,
        response_format=ContractData
    )
    return json.loads(response.choices[0].message.content)

In [17]:
# Let's first try this new function with the fake contract
result = get_structured_answer(fake_contract)

In [18]:
# The result of the extraction
print(json.dumps(result, indent=2))

{
  "contract_date": "April 25, 2025",
  "first_party": "Nova Capital Investments, Inc.",
  "second_party": "GreenTech Solutions Ltd.",
  "other_parties": null,
  "loan_amount": 500000,
  "term_date": null,
  "governing_law": "State of California",
  "covenants": "- (Article 4a) Use funds exclusively for the development of the solar facility.\n- (Article 4b) Provide quarterly financial reports to the Lender.\n- (Article 4c) Notify the Lender of any significant changes in ownership or management.\n- (Article 4d) Maintain a dedicated bank account for loan-related transactions."
}
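Since `contract_date` and `term_date` are declared as plain strings, the model is free to choose the date format (here it returned "April 25, 2025"). As a minimal sketch, assuming only a couple of formats need to be recognised, the dates could be converted to ISO format afterwards. The function `normalize_date` and the list of formats are illustrative.

```python
from datetime import datetime

# Formats tried in order; extend this tuple if other formats show up
KNOWN_FORMATS = ("%Y-%m-%d", "%B %d, %Y")


def normalize_date(value):
    """Return the date as 'YYYY-MM-DD', or None if it cannot be parsed."""
    if not isinstance(value, str) or not value.strip():
        return None
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None  # unknown format: leave it for manual review


print(normalize_date("April 25, 2025"))
```

Values that mix a date with free text are returned as `None` rather than guessed, so they can be checked by hand.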


In [19]:
# Let's now apply the same logic to the contracts in the table
new_cols = df_structured_extraction['content'].apply(lambda x: get_structured_answer(x)).apply(pd.Series)

In [20]:
# Concatenate the result with the existing table
df_with_extracted_date = pd.concat([df_structured_extraction, new_cols], axis=1)

In [21]:
# We can see that the new table includes the information extracted by the LLM
df_with_extracted_date

| | file_name | content | contract_date | first_party | second_party | other_parties | loan_amount | term_date | governing_law | covenants |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PRR-Adenda-ao-Contrato-Financeiro-27-setembro.pdf | Página 1 de 8 \n \nADENDA AO CONTRATO DE FINAN... | 2023-02-02 (original contract), addendum date ... | ADMINISTRAÇÃO CENTRAL DO SISTEMA DE SAÚDE, I.P... | HOSPITAL DISTRITAL DA FIGUEIRA DA FOZ, E.P.E. ... | | 840000.0 | 2025-12-31 | Portuguese Law (referenced by various Decree-L... | - (Clause 1) The contract grants financial sup... |
| 1 | 1986189.pdf | NÌVEL DE SEGURANÇA: \nINFORMAÇÃO PÚBLICA \n ... | 2023-04-05 | IFAP, IP – Instituto de Financiamento da Agric... | NLS, New Link Solutions – Consultoria e Engenh... | | | 2028-12-31 | Portuguese law (Código dos Contratos Públicos ... | - (Clause 7) The Second Party must fully compl... |
| 2 | 2173664.pdf | \n\t\n\n\r\n\t\r... | 2023-08-01 | Caixa Geral de Depósitos, S.A. | Empresa Municipal Ambiente Porto, E.M., S.A. | | 784000.0 | 2031-08-01 | Portuguese law; jurisdiction of Lisbon | - (General Conditions 4th) The Lessee must pay... |
| 3 | 2245726.pdf | AMBIOLHAO EMPRESA MUNICIPAL AMBIENTE \nOLHAO,E... | 2025-03-24 | Caixa Geral de Depósitos, S.A. | AMBIOLHÃO EMPRESA MUNICIPAL AMBIENTE OLHÃO, E.M. | [MAN TRUCK & BUS PORTUGAL SOCIEDADE UNIPESSOAL... | 91950.0 | 2030-03-24 | Portuguese law; jurisdiction: Lisbon | - (Clause 4) The Lessee must pay 60 monthly re... |
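Some cells in the table are empty (for example, `loan_amount` for the second contract), and every extracted value comes from a model, so a manual check is still advisable. As a rough sketch, rows with missing key fields could be flagged for review. The function names and the choice of required fields are assumptions; the records could be obtained with `df_with_extracted_date.to_dict("records")`.

```python
import math

# Hypothetical choice of fields an auditor would always expect to be filled
REQUIRED_FIELDS = ("contract_date", "first_party", "second_party", "loan_amount", "term_date")


def is_missing(value):
    """True for None, NaN (how pandas stores empty numbers) and blank strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def fields_to_review(record, required=REQUIRED_FIELDS):
    """Return the names of required fields that are empty in one extracted record."""
    return [name for name in required if is_missing(record.get(name))]


# Illustrative record with shortened values, modelled on the second row of the table
record = {
    "file_name": "1986189.pdf",
    "contract_date": "2023-04-05",
    "first_party": "IFAP, IP",
    "second_party": "NLS, New Link Solutions",
    "loan_amount": float("nan"),
    "term_date": "2028-12-31",
}
print(fields_to_review(record))
```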


In [22]:
# Export the DataFrame to an Excel file, excluding the 'content' column
df_with_extracted_date.drop(columns=['content']).to_excel('contract_extracted_data.xlsx', index=False)

# Download the file
files.download('contract_extracted_data.xlsx')

Exercise: try it for yourself! Go back to the cell that defines the ContractData class and add new fields to it. Then run the remaining cells and check whether the new information is extracted correctly.