<a href="https://colab.research.google.com/github/eduardoplima/artists-expenditure-llm/blob/main/artists.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Identifying artists in public expenditure using LLMs

## Author: Eduardo P. Lima

## Summary

Among their constitutional duties, the Brazilian Audit Courts are responsible for monitoring what the government departments under their jurisdiction spend on cultural events and artistic performances. To this end, those departments report their expenditures of this nature to the Audit Courts.

However, this information is not structured in a way that makes it easy to identify the artists hired. Natural Language Processing techniques are therefore needed to extract the artists' names, so that the regularity of the payments made under these contracts can be assessed.

This notebook demonstrates techniques for this purpose, with an emphasis on Large Language Models (LLMs).

### Key points

* Point 1




In [3]:
!pip install gdown



In [6]:
import gdown

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import langchain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
#from langchain_openai import

We load our dataset from the xlsx file. It has three text columns, which describe the procurement process (`objeto_licitacao`), the contract (`objeto_contrato`), and the justification for the subsequent prepayment (`justificativa`). We have to look for the artist's identification in those columns.
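Looking for an artist in a free-text description can be framed as a structured extraction task: build a prompt around the description, then parse the model's reply into a fixed schema. The block below is a minimal, framework-agnostic sketch of that idea using only the standard library. The `ArtistMention` schema, the prompt wording, and the helper names `build_prompt` and `parse_reply` are assumptions made for illustration and are not taken from the notebook, which imports LangChain's `ChatPromptTemplate` and a pydantic `BaseModel` for a similar purpose.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArtistMention:
    """Hypothetical target schema: the artist named in one description."""
    artist_name: Optional[str]  # None when no artist is named in the text


# Hypothetical prompt wording; doubled braces are literal braces for str.format.
PROMPT_TEMPLATE = (
    "You are given the description of a public expenditure written in "
    "Brazilian Portuguese. If it names an artist or band hired to perform, "
    'reply with JSON like {{"artist_name": "<name>"}}. '
    'Otherwise reply with {{"artist_name": null}}.\n\n'
    "Description: {description}"
)


def build_prompt(description: str) -> str:
    """Fill the template with one expenditure description."""
    return PROMPT_TEMPLATE.format(description=description.strip())


def parse_reply(reply: str) -> ArtistMention:
    """Parse the model's JSON reply; malformed output counts as 'no artist'."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return ArtistMention(artist_name=None)
    name = data.get("artist_name") if isinstance(data, dict) else None
    if isinstance(name, str) and name.strip():
        return ArtistMention(artist_name=name.strip())
    return ArtistMention(artist_name=None)


# Usage with a made-up reply (no model is called in this sketch).
prompt = build_prompt("  Contratação de show musical  ")
mention = parse_reply('{"artist_name": "Banda Exemplo"}')
```

Treating an unparseable reply as "no artist" keeps a batch run from failing on a single bad response. Whether such rows should instead be retried or flagged for manual review depends on how the results will be audited.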

In [43]:
url = "https://github.com/eduardoplima/artists-expenditure-llm/raw/refs/heads/main/festas_juninas.xlsx"
output = "artists.xlsx"
gdown.download(url, output)

Downloading...
From: https://github.com/eduardoplima/artists-expenditure-llm/raw/refs/heads/main/festas_juninas.xlsx
To: /content/artists.xlsx
100%|██████████| 1.14M/1.14M [00:00<00:00, 25.2MB/s]


'artists.xlsx'

In [44]:
#df_art = pd.read_csv('artists.csv', on_bad_lines='skip')
df_art = pd.read_excel('artists.xlsx', engine='openpyxl')

In [45]:
df_art.head(10)

Unnamed: 0,objeto_contrato,justificativa,objeto_licitacao
0,contratação da empresa A. NUNES DE ARAÚJO PROD...,"Despesa com diária em favor da servidora, NAYA...",Contratação de empresa especializada no fornec...
1,contratação da empresa A. NUNES DE ARAÚJO PROD...,Ref. empenho estimativo de diárias nacionais p...,Contratação de empresa especializada no fornec...
2,contratação da empresa A. NUNES DE ARAÚJO PROD...,Ref. empenho estimativo de diárias internacion...,Contratação de empresa especializada no fornec...
3,,Referente despesa com 4º termo aditivo empenho...,
4,contratação da empresa A. NUNES DE ARAÚJO PROD...,Referente despesa do 4º termo aditivo empenho ...,Contratação de empresa especializada no fornec...
5,,Ref. serviço de fornecimento de passagens aére...,
6,,Ref. serviço de fornecimento de passagens aére...,
7,,Referente despesa com participação no lounge m...,
8,contratação da empresa A. NUNES DE ARAÚJO PROD...,Referente empenho com participação no evento s...,Contratação de empresa especializada no fornec...
9,,Despesa com participação Expoturismo Paraná d...,
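Several rows in the preview above have empty `objeto_contrato` and `objeto_licitacao` cells, so one possible preprocessing step is to join whatever text is present in the three columns into a single string per row before searching it for artist names. The following is a minimal sketch under that assumption. The helper `build_search_text`, the `texto` column, the separator, and the placeholder rows are all hypothetical. Only the three column names come from the dataset.

```python
import pandas as pd

# Column names as they appear in the dataset preview.
TEXT_COLUMNS = ["objeto_contrato", "justificativa", "objeto_licitacao"]


def build_search_text(df, columns=TEXT_COLUMNS, sep=" | "):
    """Join the non-empty text columns of each row into one string."""
    def join_row(row):
        # Skip missing cells (NaN/None) and cells that are only whitespace.
        parts = [str(v).strip() for v in row if pd.notna(v) and str(v).strip()]
        return sep.join(parts)

    return df[columns].apply(join_row, axis=1)


# Placeholder rows for illustration only; these are not real records.
sample = pd.DataFrame({
    "objeto_contrato": ["contract text", None],
    "justificativa": ["justification text", "only justification"],
    "objeto_licitacao": ["procurement text", None],
})
sample["texto"] = build_search_text(sample)
```

On the notebook's data the same call would be `df_art["texto"] = build_search_text(df_art)`, assuming `df_art` has been loaded as shown above.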
