<a href="https://colab.research.google.com/github/eduardoplima/artists-expenditure-llm/blob/main/artists.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Identifying artists in public expenditure using LLMs

## Author: Eduardo P. Lima

## Summary

Brazilian Audit Courts are constitutionally responsible for monitoring what the government departments under their jurisdiction spend on cultural events and artistic performances. To this end, the departments report their expenditures of this kind to the Audit Courts.

However, the reported information is not structured in a way that makes the hired artists easy to identify. Natural Language Processing techniques are therefore needed to extract the artists' names, so that the regularity of the payments made under these contracts can be assessed.

This notebook demonstrates techniques for this purpose, in particular the use of Large Language Models (LLMs).


In [18]:
!pip install gdown langchain_openai langgraph

Collecting langgraph
  Downloading langgraph-0.2.59-py3-none-any.whl.metadata (15 kB)
Collecting langgraph-checkpoint<3.0.0,>=2.0.4 (from langgraph)
  Downloading langgraph_checkpoint-2.0.9-py3-none-any.whl.metadata (4.6 kB)
Collecting langgraph-sdk<0.2.0,>=0.1.42 (from langgraph)
  Downloading langgraph_sdk-0.1.45-py3-none-any.whl.metadata (1.8 kB)
Downloading langgraph-0.2.59-py3-none-any.whl (135 kB)
Downloading langgraph_checkpoint-2.0.9-py3-none-any.whl (37 kB)
Downloading langgraph_sdk-0.1.45-py3-none-any.whl (40 kB)
Installing collected packages: langgraph-sdk, langgraph-checkpoint, langgraph
Successfully installed langgraph-0.2.59 langgraph-checkpoint-2.0.9 langgraph-sdk-0.1.45


In [20]:
import os
import gdown
import getpass

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import langchain
from langchain_core.prompts import ChatPromptTemplate
# Use Pydantic directly; langchain_core.pydantic_v1 is a deprecated compatibility shim
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

from langgraph.graph import StateGraph, END

## Dataset loading

We load the dataset from an xlsx file. It has three text columns, which describe the procurement process, the contract, and the subsequent prepayment. We have to look for the artist's identification in these columns.

In [9]:
url = "https://github.com/eduardoplima/artists-expenditure-llm/raw/refs/heads/main/festas_juninas.xlsx"
output = "artists.xlsx"
gdown.download(url, output)

Downloading...
From: https://github.com/eduardoplima/artists-expenditure-llm/raw/refs/heads/main/festas_juninas.xlsx
To: /content/artists.xlsx
100%|██████████| 1.14M/1.14M [00:00<00:00, 84.9MB/s]


'artists.xlsx'

In [10]:
#df_art = pd.read_csv('artists.csv', on_bad_lines='skip')
df_art = pd.read_excel('artists.xlsx', engine='openpyxl')

In [11]:
df_art.head(10)

| | contract | prepayment | procurement |
|---|---|---|---|
| 0 | contratação da empresa A. NUNES DE ARAÚJO PROD... | Despesa com diária em favor da servidora, NAYA... | Contratação de empresa especializada no fornec... |
| 1 | contratação da empresa A. NUNES DE ARAÚJO PROD... | Ref. empenho estimativo de diárias nacionais p... | Contratação de empresa especializada no fornec... |
| 2 | contratação da empresa A. NUNES DE ARAÚJO PROD... | Ref. empenho estimativo de diárias internacion... | Contratação de empresa especializada no fornec... |
| 3 | | Referente despesa com 4º termo aditivo empenho... | |
| 4 | contratação da empresa A. NUNES DE ARAÚJO PROD... | Referente despesa do 4º termo aditivo empenho ... | Contratação de empresa especializada no fornec... |
| 5 | | Ref. serviço de fornecimento de passagens aére... | |
| 6 | | Ref. serviço de fornecimento de passagens aére... | |
| 7 | | Referente despesa com participação no lounge m... | |
| 8 | contratação da empresa A. NUNES DE ARAÚJO PROD... | Referente empenho com participação no evento s... | Contratação de empresa especializada no fornec... |
| 9 | | Despesa com participação Expoturismo Paraná d... | |
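
Because the artist may be mentioned in any of the three columns, and several rows in the preview have only `prepayment` filled in, it can help to merge the non-empty fields of a row into one labelled text before it is analysed. The following is a minimal sketch of that idea, not part of the original notebook: the helper names `is_missing` and `combine_fields` and the example row are hypothetical, and only the column names come from the dataset.

```python
import math

# Column names taken from the dataset preview; the ordering is an arbitrary choice.
TEXT_COLUMNS = ("procurement", "contract", "prepayment")


def is_missing(value):
    """Return True for None, NaN, or blank strings (typical forms of an empty cell)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def combine_fields(row):
    """Join the non-empty description fields of one row into a labelled text block."""
    parts = []
    for column in TEXT_COLUMNS:
        value = row.get(column)
        if not is_missing(value):
            parts.append(f"{column}: {str(value).strip()}")
    return "\n".join(parts)


# Hypothetical row shaped like index 3 of the preview: only `prepayment` has text.
example_row = {
    "contract": float("nan"),
    "prepayment": "Referente despesa com 4º termo aditivo empenho...",
    "procurement": None,
}
combined = combine_fields(example_row)
print(combined)
```

With the DataFrame loaded above, the same helper could presumably be applied row by row, for example `df_art.apply(lambda r: combine_fields(r.to_dict()), axis=1)`.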


## Model creation

We create the models that our agents will use.

In [15]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key:··········


In [17]:
model = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")
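
Each call to `model` is billed, and the dataset preview shows the same `contract` and `procurement` text repeated across many rows. One way to avoid paying for duplicate work is to query the model once per distinct text. The sketch below illustrates the idea under stated assumptions: `extract_unique` is a hypothetical helper, and `fake_extract` is a stand-in that only counts invocations, since a real call needs an API key and network access.

```python
from typing import Callable, Dict, Iterable, List


def extract_unique(
    texts: Iterable[str], extract: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Call `extract` once per distinct text and return a text -> result mapping."""
    results: Dict[str, List[str]] = {}
    for text in texts:
        if text not in results:
            results[text] = extract(text)
    return results


# Stand-in for a real model call (assumption): it records each invocation
# and returns an empty list of artist names.
calls: List[str] = []


def fake_extract(text: str) -> List[str]:
    calls.append(text)
    return []


descriptions = ["contract A", "contract A", "contract B", "contract A"]
cache = extract_unique(descriptions, fake_extract)
print(len(calls))  # 2: one call per distinct description
```

In the notebook, `extract` would be replaced by a function that sends the text to `model` and parses the reply; how that reply is structured is left open here.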