In [1]:
from langchain_community.llms import Ollama
llm = Ollama(model="llama2")

In [2]:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert data scientist."),
    ("user", "{input}")
])

chain = prompt | llm 

In [3]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [4]:
chain = prompt | llm | output_parser

In [5]:
chain.invoke({"input": "define data"})

'\nAs an expert in data science, I can tell you that data refers to any information or facts that are collected, stored, and analyzed to gain insights or make informed decisions. Data can come in various forms, including numerical, textual, image, audio, and video. It can be structured or unstructured, and it can originate from a variety of sources, such as databases, spreadsheets, files, social media platforms, sensors, devices, and more.\n\nData can be used for various purposes, including:\n\n1. Analysis: Data is analyzed to identify patterns, trends, and relationships within the data itself or in comparison to other datasets.\n2. Modeling: Data is used to build models that can simulate real-world phenomena or make predictions about future events.\n3. Decision-making: Data is used to inform decisions based on its analysis and interpretation.\n4. Visualization: Data is visualized to help communicate insights and findings in a more accessible and engaging manner.\n5. Optimization: Data

In [6]:
from langchain_community.embeddings import OllamaEmbeddings
import os, json
import pandas as pd
from langchain_community.document_loaders.csv_loader import CSVLoader

path_to_json = 'jsons/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
##print(json_files)

for json_file in json_files:
    json_file_path = os.path.join(path_to_json, json_file)
    
    df = pd.read_json(json_file_path)

    csv_file_path = os.path.join(path_to_json, json_file.replace('.json', '.csv'))

    df.to_csv(csv_file_path, index=False)

csv_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.csv')]
print(csv_files)

loader = CSVLoader(
    file_path='jsons/01-02-2023.csv',
    csv_args={
        'delimiter': ',',
        'quotechar': '"',
        'fieldnames': ['DATA', 'CADERNO', 'PAGINA', 'NOME', 'PUBLICACAO', 'TEXTO', 'LISTANEGRITO'],
    },
)

data = loader.load()

embeddings = OllamaEmbeddings()

['03-06-2024.csv', '02-03-2023.csv', '02-01-2024.csv', '01-08-2024.csv', '03-10-2023.csv', '03-01-2023.csv', '04-04-2023.csv', '03-05-2024.csv', '02-04-2024.csv', '01-07-2024.csv', '04-01-2023.csv', '01-04-2024.csv', '03-03-2023.csv', '01-09-2023.csv', '04-03-2024.csv', '04-08-2023.csv', '03-08-2023.csv', '02-01-2023.csv', '01-08-2023.csv', '02-08-2023.csv', '03-02-2023.csv', '02-02-2023.csv', '04-05-2023.csv', '01-11-2023.csv', '02-09-2024.csv', '02-05-2023.csv', '04-07-2023.csv', '04-01-2024.csv', '01-02-2023.csv', '01-02-2024.csv', '03-04-2023.csv', '02-07-2024.csv', '01-03-2023.csv', '03-09-2024.csv', '02-08-2024.csv', '03-05-2023.csv', '02-10-2023.csv', '04-07-2024.csv', '03-07-2023.csv', '02-05-2024.csv', '03-07-2024.csv', '02-06-2023.csv', '02-02-2024.csv', '03-01-2024.csv', '03-04-2024.csv', '01-06-2023.csv', '01-03-2024.csv', '01-12-2023.csv', '04-04-2024.csv', '04-06-2024.csv']
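
A note on the `fieldnames` argument passed to `CSVLoader` in this cell: the loader forwards `csv_args` to Python's `csv.DictReader`, and when `fieldnames` is given the reader treats the first line of the file as data rather than as a header. Since `df.to_csv` writes a header line, that line would likely be loaded as a document of its own. The sketch below uses only the standard library; the two-line sample and the `read_rows` helper are hypothetical and exist only to show the behavior.

```python
import csv
import io

# Hypothetical sample: a header line followed by one data row,
# similar in shape to what DataFrame.to_csv(index=False) writes.
SAMPLE = "DATA,NOME\n01-02-2023,Example\n"


def read_rows(text, fieldnames=None):
    """Return the rows csv.DictReader yields, optionally with explicit fieldnames."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=fieldnames,
        delimiter=",",
        quotechar='"',
    )
    return list(reader)


rows_with_header = read_rows(SAMPLE)
rows_with_fieldnames = read_rows(SAMPLE, fieldnames=["DATA", "NOME"])

print(len(rows_with_header))      # header consumed as column names
print(len(rows_with_fieldnames))  # header line counted as a data row
print(rows_with_fieldnames[0])
```

If the header row turns out to be unwanted in the index, dropping `fieldnames` (so the reader takes column names from the file) is one option, provided the CSV header already carries the desired names.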


In [7]:
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter


text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(data)
vector = FAISS.from_documents(documents, embeddings)
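
To make the indexing step in this cell more concrete, here is a toy sketch of what a vector store lookup does conceptually: each chunk is stored next to a vector, and a query returns the chunks whose vectors lie closest to the query vector. This is not how FAISS is implemented. The vectors, the texts, and the `top_k` helper are invented for illustration, and Euclidean distance is assumed as the metric.

```python
import math

# Hypothetical 2-D "embeddings"; real embedding vectors have many more dimensions.
INDEX = [
    ("chunk about contracts", [0.9, 0.1]),
    ("chunk about tenders", [0.7, 0.3]),
    ("chunk about weather", [0.0, 1.0]),
]


def top_k(query_vector, index, k=2):
    """Return the k texts whose vectors are closest to query_vector (Euclidean distance)."""
    scored = sorted(index, key=lambda item: math.dist(query_vector, item[1]))
    return [text for text, _ in scored[:k]]


nearest = top_k([1.0, 0.0], INDEX, k=2)
print(nearest)
```

Only the `k` closest chunks are returned, so any answer built on top of a retriever reflects a small slice of the indexed data rather than all of it.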

In [8]:
# generation of generic regexes with query grouping

from langchain.chains import create_retrieval_chain

from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
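
Conceptually, the "stuff" strategy used in this cell places the text of all retrieved documents into the `{context}` slot of the prompt before the model is called. The sketch below imitates that with plain string formatting. The `stuff_prompt` helper and the sample texts are hypothetical, and the blank-line separator is an assumption rather than something shown in this notebook.

```python
TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(template, texts, question, separator="\n\n"):
    """Join all document texts and substitute them into the template."""
    context = separator.join(texts)
    return template.format(context=context, input=question)


filled = stuff_prompt(
    TEMPLATE,
    ["First retrieved chunk.", "Second retrieved chunk."],
    "what is the content of this database?",
)
print(filled)
```

Because every retrieved chunk is pasted into a single prompt, the combined text has to fit within the model's context window, which is one reason the splitter and the retriever's result count matter.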

In [9]:
from langchain_core.documents import Document

print(document_chain.invoke({
    "input": "what insights can you give about this data?",
    "context": [Document(page_content="This is a database about the government")]
}))


Based on the provided context, I can infer that the data in the database is related to the government or governance. However, without more specific information or context, it's difficult to provide detailed insights into the data. Some possible insights could include:

1. The database may contain information about the structure and organization of the government, such as the hierarchy of officials, departments, or agencies.
2. The database may include data on government policies, laws, and regulations, as well as their implementation and enforcement.
3. The database may provide information on government spending and budgeting, including allocations for various programs and initiatives.
4. The database may contain data on the demographic makeup of the population served by the government, such as age, gender, race, or socioeconomic status.
5. The database may offer insights into the public's perceptions and attitudes towards the government, including levels of trust, satisfaction, or di

In [10]:
response = retrieval_chain.invoke({"input": "what is the content of this database?"})
print(response["answer"])


Based on the provided context, the content of the database appears to be related to various contracts and purchases made by the Prefeitura Municipal de Alcântaras in Ceara, Brazil. The database includes information such as:

1. Data: Dates of the contracts and purchases.
2. Caderno: Serial numbers of the contracts and purchases.
3. Pagina: Page numbers of the contracts and purchases.
4. Nome: Names of the contracting parties or the entities involved in the contracts and purchases.
5. Publicacao: Amounts of the public tenders and purchases.
6. Texto: Texts of the contracts and purchases, including the object, scope, and other details.
7. Listanegrito: Lists of the contracts and purchases, including the titles or references of the documents.

The database also includes information related to the selection of better proposals for future contracts and purchases, as well as the evaluation of tenderers and suppliers.


In [11]:
response = retrieval_chain.invoke({"input": "what is the general idea of this data?"})
print(response["answer"])


Based on the provided context, the general idea of this data appears to be related to a tender or bidding process for a construction project in the city of Russas, CE, Brazil. The data includes a list of registered companies and individuals who have expressed interest in participating in the project, as well as their respective capacities and limitations.

The context also mentions the presence of a president of a permanent commission, which suggests that the tender process is being overseen by a formal committee or board. Additionally, the use of terms such as "licitation," "portaria," and "sub-rogação" suggest that the project is being managed through a formal legal framework.

Overall, the data appears to be related to a specific construction project in Russas, CE, and provides information on the registered participants and their capabilities for the tender process.
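
The answers in cells 10 and 11 describe different municipalities, which is consistent with the retriever passing only a handful of chunks to the model on each call. `create_retrieval_chain` returns the retrieved documents under the `context` key alongside `answer`, so one way to check what the model actually saw is to list them. A minimal sketch, assuming each document exposes `page_content` and a `metadata` dict with `source` and `row` entries (as `CSVLoader` sets). The `summarize_context` helper is hypothetical, and the stand-in response exists only so the snippet runs without a model.

```python
from types import SimpleNamespace


def summarize_context(response, preview_chars=60):
    """Return one line per retrieved document: source, row, and a text preview."""
    lines = []
    for doc in response.get("context", []):
        source = doc.metadata.get("source", "?")
        row = doc.metadata.get("row", "?")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"{source} (row {row}): {preview}")
    return lines


# Stand-in for a real chain response; in the notebook this would be
# the dict returned by retrieval_chain.invoke({...}).
fake_response = {
    "answer": "...",
    "context": [
        SimpleNamespace(
            page_content="DATA: 01-02-2023\nNOME: Example",
            metadata={"source": "jsons/01-02-2023.csv", "row": 3},
        )
    ],
}

for line in summarize_context(fake_response):
    print(line)
```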


In [12]:
from langchain_community.chat_models import ChatOllama

chat_model = ChatOllama(
    base_url="http://localhost:11434",  # Default value
    model="mistral",
)