## **Simple RAG Process**

We'll read in a PDF file and ask a question about it.

In [3]:
import openai
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file
openai.api_key = os.environ["OPENAI_API_KEY"]

In [9]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from src.langchainHelpers import PdfLoad, ChromaDB, ChromaRetriever

## **Set Constants**

In [4]:
PDF_FILEPATH = "big-book-of-machine-learning-use-cases-2nd-edition.pdf"
DIRECTORY_LOCATION = 'vectorDB'
EMBEDDING = OpenAIEmbeddings(model="text-embedding-ada-002")
LLM = OpenAI()

Now we can extract the document and chunk it up.

In [5]:
texts = PdfLoad(PDF_FILEPATH).characterSplitter(chunk_size=1000, chunk_overlap=200)
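
The internals of `PdfLoad.characterSplitter` are not shown here, so the following is only a minimal sketch of what fixed-size chunking with overlap typically looks like, assuming the helper splits on raw character counts. The function name `chunk_text` is hypothetical and is not part of `src.langchainHelpers`.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap.

    Hypothetical illustration only; the project's helper may split differently
    (for example on separators such as newlines).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

With `chunk_size=1000` and `chunk_overlap=200`, each chunk repeats the last 200 characters of the previous one, which reduces the chance that a sentence relevant to the query is cut in half at a chunk boundary.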

We can either create the vector store for the first time or, if we have already persisted it, simply load it.

In [6]:
# Uncomment on the first run to build and persist the vector store
# vectordb = ChromaDB(persist_directory=DIRECTORY_LOCATION, embedding=EMBEDDING).initiate(texts=texts, persist=True)
vectordb = ChromaDB(persist_directory=DIRECTORY_LOCATION, embedding=EMBEDDING).load()
retriever = ChromaRetriever(vectordb, k=3)
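
Instead of commenting lines in and out, the create-or-load decision could be made automatically. The sketch below is one possible approach, assuming that a non-empty persist directory means the store has already been built. `load_or_create` is a hypothetical helper, not part of `src.langchainHelpers`.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_create(persist_directory: str, create: Callable[[], T], load: Callable[[], T]) -> T:
    """Call load() if the persist directory exists and is non-empty, else create().

    Assumption: any file in the directory means a store was persisted earlier.
    """
    has_store = os.path.isdir(persist_directory) and bool(os.listdir(persist_directory))
    return load() if has_store else create()


# Possible usage with the helpers from this notebook (not executed here):
# vectordb = load_or_create(
#     DIRECTORY_LOCATION,
#     create=lambda: ChromaDB(persist_directory=DIRECTORY_LOCATION, embedding=EMBEDDING).initiate(texts=texts, persist=True),
#     load=lambda: ChromaDB(persist_directory=DIRECTORY_LOCATION, embedding=EMBEDDING).load(),
# )
```

Passing the two branches as callables means the embedding calls for `initiate` are only made when the store actually needs to be built.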

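Now let's ask a question. We retrieve the three most relevant chunks (`k=3`) and pass them to a `map_rerank` question-answering chain.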

In [7]:
chain = load_qa_chain(
    LLM,
    chain_type="map_rerank",
    return_intermediate_steps=True,
)

query = "what is MLOps?"
docs = retriever.get_relevant_documents(query)
results = chain({"input_documents": docs, "question": query}, return_only_outputs=True)
results["output_text"]



' MLOps is a combination of software engineering and machine learning practices that are used to optimize the machine learning production lifecycle. It is a set of processes and tools that enable data teams to manage and deploy machine learning models. '
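
To make the `map_rerank` strategy more concrete: the chain asks the LLM to answer the question against each retrieved document separately, has it attach a confidence score to each answer, and returns the answer with the highest score. The sketch below mimics that selection step in plain Python. It is a simplified illustration, not LangChain's implementation, and `answer_with_score` is a hypothetical stand-in for the LLM call.

```python
from typing import Callable, List, Tuple

ScoredAnswer = Tuple[str, int]


def map_rerank(
    docs: List[str],
    question: str,
    answer_with_score: Callable[[str, str], ScoredAnswer],
) -> Tuple[str, List[ScoredAnswer]]:
    """Answer the question per document, then keep the highest-scored answer.

    answer_with_score(doc, question) is a hypothetical callable that returns
    an (answer, score) pair; in the real chain an LLM produces both.
    """
    if not docs:
        raise ValueError("at least one document is required")

    # "Map" step: one independent answer and score per document.
    steps = [answer_with_score(doc, question) for doc in docs]

    # "Rerank" step: pick the answer whose score is highest.
    best_answer, _ = max(steps, key=lambda step: step[1])
    return best_answer, steps
```

Because `return_intermediate_steps=True` is set in the cell above, the per-document answers and scores should also be available under `results["intermediate_steps"]`, which is useful for checking why a particular answer won.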