# 1.0 - Document Loading and QA Retrieval

In the previous notebooks, we used pre-trained LLMs for some introductory experiments, from running a fixed prompt to building a chatbot with enough memory to store the conversation.
In this notebook, we move to the next level: using an LLM to answer questions about our own documents. The model is not retrained; instead, relevant passages are retrieved from the documents and passed to it as context.

For this, we need a text file to load and ask questions about. Here we use the text of "President Biden’s State of the Union Address", saved as `documents/doc.txt`, but feel free to use any document you prefer.

```python
# Import paths from older (0.0.x) LangChain releases
from langchain import OpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

import warnings
from dotenv import load_dotenv

# Load environment variables (e.g. OPENAI_API_KEY) from a .env file
load_dotenv()
# Hide library warnings to keep the notebook output clean
warnings.filterwarnings('ignore')
```
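`OpenAI` and `OpenAIEmbeddings` read the API key from the `OPENAI_API_KEY` environment variable, which is why `load_dotenv()` runs before anything else. As a rough illustration of what that call does, here is a minimal sketch of a `.env` parser. `parse_dotenv` and `apply_env` are hypothetical helpers, and the real python-dotenv package handles much more syntax (quoting rules, `export` prefixes, variable expansion). Like `load_dotenv()` with its default settings, the sketch does not overwrite variables that are already set.

```python
import os


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines, comments, and malformed lines.

    Hypothetical helper: a simplified stand-in for python-dotenv.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="value" -> value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values: dict[str, str], override: bool = False) -> None:
    """Copy parsed values into os.environ, keeping existing variables unless override=True."""
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value


# Placeholder content, not a real key
env_file = """
# .env
OPENAI_API_KEY="sk-placeholder"
"""
apply_env(parse_dotenv(env_file))
print("OPENAI_API_KEY" in os.environ)
```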

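```python
# Load the whole text file as a list of Documents; with autodetect_encoding=True,
# the loader retries with a detected encoding if the default decoding fails
loader = TextLoader('documents/doc.txt', autodetect_encoding=True)
documents = loader.load()
```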
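`TextLoader.load()` returns a list containing a single `Document` whose `page_content` is the whole file and whose `metadata` records the `source` path. The sketch below mimics that shape with a hypothetical `Doc` dataclass and `load_text` function. One assumption to flag: instead of the encoding detection LangChain performs when `autodetect_encoding=True`, it simply tries a fixed list of encodings in order.

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encodings: tuple[str, ...] = ("utf-8", "cp1252", "latin-1")) -> list[Doc]:
    """Read the whole file into one Doc, trying each encoding in order.

    Assumption: a fixed fallback list replaces LangChain's encoding detection.
    """
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        return [Doc(page_content=text, metadata={"source": path})]
    # Only reachable with a custom list, since latin-1 can decode any byte sequence
    raise ValueError(f"could not decode {path} with any of {encodings}")


# Usage: a Windows-1252 file whose curly apostrophe is not valid UTF-8
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("President Biden’s address".encode("cp1252"))
    sample_path = tmp.name

docs = load_text(sample_path)
os.remove(sample_path)
print(len(docs), docs[0].page_content)
```

Here UTF-8 decoding fails on the apostrophe byte, so the sketch falls back to `cp1252` and recovers the original text.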

#### 1.1 - Split the document into smaller chunks, since long texts can exceed the model's context window.

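```python
# Split into chunks of at most ~1000 characters, repeating up to 200 characters
# between consecutive chunks so context is not lost at the boundaries
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)
```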
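`RecursiveCharacterTextSplitter` tries to cut on paragraph breaks first, then line breaks, then spaces, and merges the pieces into chunks of at most `chunk_size` characters, repeating up to `chunk_overlap` characters between neighbours. The sketch below deliberately swaps in a plainer technique, a fixed-size sliding window, just to make the size/overlap arithmetic visible: with `chunk_size=1000` and `chunk_overlap=200`, each new chunk starts 800 characters after the previous one. `sliding_window_chunks` is a hypothetical name, not part of LangChain.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of two chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
print(len(sliding_window_chunks("x" * 2500)))  # default 1000/200 settings
# → 3
```

Unlike this sketch, the real splitter avoids cutting words in half whenever a coarser separator is available, which usually gives the embedding model cleaner chunks.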

```python
# Embed each chunk with OpenAI and index the vectors in a Chroma collection
embeddings = OpenAIEmbeddings()
store = Chroma.from_documents(texts, embeddings, collection_name="state-of-the-union")
```
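Once the chunks are embedded, answering a question starts with a similarity search: the question is embedded too, and the chunks whose vectors lie closest to it are returned. The toy sketch below illustrates that idea with two clearly labeled substitutions: word-count vectors instead of OpenAI embeddings, and cosine similarity instead of whatever distance the Chroma collection is configured with. `ToyVectorStore`, `embed`, and `cosine` are hypothetical, and the three sample sentences are made up for illustration, not quoted from the speech.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs and ranks them against a query."""

    def __init__(self, texts: list[str]):
        self.texts = list(texts)
        self.vectors = [embed(t) for t in self.texts]

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        """Return the k texts most similar to the query, best first."""
        q = embed(query)
        ranked = sorted(range(len(self.texts)), key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.texts[i] for i in ranked[:k]]


# Made-up sample chunks, for illustration only
chunks = [
    "The Brent Spence Bridge crosses the Ohio River.",
    "Inflation is the top economic concern.",
    "Ironworkers in Ohio are ready to rebuild the bridge.",
]
toy_store = ToyVectorStore(chunks)
print(toy_store.similarity_search("What did Biden say about Ohio and the bridge?", k=2))
```

The first and third sentences share "ohio", "the", and "bridge" with the query, so they outrank the inflation sentence. Real embeddings capture meaning rather than exact word overlap, so they can also match paraphrases.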

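```python
# temperature=0 makes the answers as deterministic as possible
llm = OpenAI(temperature=0)
# RetrievalQA fetches relevant chunks from the store and passes them to the LLM
chain = RetrievalQA.from_chain_type(llm, retriever=store.as_retriever())
```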
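`RetrievalQA.from_chain_type` uses the `"stuff"` chain type unless told otherwise: the retrieved chunks are concatenated ("stuffed") into a single prompt together with the question, and the LLM answers from that context. The sketch below shows the idea. The template wording is an illustrative approximation, not LangChain's exact default prompt, and `stuff_prompt` is a hypothetical helper.

```python
# Illustrative template (an approximation, not LangChain's exact default prompt)
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(question: str, retrieved_chunks: list[str], separator: str = "\n\n") -> str:
    """Join all retrieved chunks into one context block and fill the template."""
    context = separator.join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = stuff_prompt("What did Biden say about Ohio?", ["Chunk A text.", "Chunk B text."])
print(prompt)
```

Because everything goes into one prompt, the number of retrieved chunks times `chunk_size` has to fit within the LLM's context window; this is one reason the document is split into chunks in section 1.1.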

```python
print(chain.run("What did Biden say about Ohio?"))
```

```text
Biden talked about the Brent Spence Bridge in Kentucky, which crosses the Ohio River and is in need of repairs. He also mentioned meeting a young woman named Saria from Ohio who is a member of the Iron workers Local 44 and is excited to work on the bridge project. Additionally, Biden mentioned the $1.6 billion commitment for the project and the importance of investing in American infrastructure and creating jobs.
```
