Document Parser RAG Pipeline
- Parse a PDF, split it into chunks, and embed each chunk with an OpenAI embeddings model
- Store the embeddings in a FAISS vector index that retrieves relevant context for each user query
- Use the OpenAI API to answer questions specifically about the document
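The third step, answering "specifically about the document", comes down to placing the retrieved chunks in the prompt and telling the model to stay within them. The sketch below shows that idea in plain Python. `build_grounded_prompt` and its wording are hypothetical illustrations, not the prompt LangChain actually sends.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical helper: number the retrieved chunks and ask the model to answer only from them."""
    context = "\n\n".join(f"[{i + 1}] {chunk.strip()}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Example chunks paraphrase the document output shown further below
prompt = build_grounded_prompt(
    "What does BioBloom produce?",
    [
        "BioBloom uses algae-based technology to produce biofuel.",
        "It targets untapped rural and coastal areas.",
    ],
)
print(prompt)
```

Numbering the chunks makes it easy to ask the model to cite which passage supports its answer, if that is useful later.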

In [6]:
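# Set the OpenAI API key (read it from the environment or prompt for it instead of hard-coding it)
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")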

In [4]:
# Import and define LLM
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")
# llm.invoke("What can you do for me?")

In [5]:
# Parse the PDF (one Document per page) and split the pages into smaller chunks
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("./RAG_BioBloom.pdf")
pages = loader.load_and_split()
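`load_and_split()` loads one Document per PDF page and then splits the pages into chunks; with no splitter passed, LangChain uses `RecursiveCharacterTextSplitter`, which prefers to break on paragraph and line boundaries. The sketch below is a deliberately simplified fixed-window splitter (the name `chunk_text` and its defaults are hypothetical) that shows why chunks overlap: text cut at a boundary still appears whole in the neighbouring chunk.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```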

In [7]:
# Embed the chunks, store them in a FAISS vector index, and save the index to disk
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())
faiss_index.save_local('biobloom')
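As far as I know, LangChain's FAISS wrapper builds an exact (flat) index with Euclidean distance by default, so a retriever query embeds the question and returns the stored chunks whose vectors are closest to it (4 by default for `similarity_search`). The brute-force sketch below computes the same ranking in plain Python on toy 2-D vectors; `top_k_l2` is a hypothetical name, and real embeddings have far more dimensions.

```python
def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Return (index, squared L2 distance) pairs for the k stored vectors closest to `query`."""
    scored = [
        (i, sum((q - v) ** 2 for q, v in zip(query, vec)))
        for i, vec in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


stored = [[0, 0], [1, 1], [3, 4]]
print(top_k_l2([1, 2], stored, k=2))  # [(1, 1), (0, 5)]
```

A flat index compares the query against every stored vector, which is fine for a single PDF; approximate FAISS indexes trade some accuracy for speed on much larger collections.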

In [8]:
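# Load the saved FAISS index
from langchain.chains import ConversationalRetrievalChain

# allow_dangerous_deserialization is required because part of the index is stored with pickle;
# only enable it for index files you created yourself
loaded_vectors = FAISS.load_local("biobloom", OpenAIEmbeddings(), allow_dangerous_deserialization=True)

# Conversational retrieval chain: reuse the ChatOpenAI model defined above
# instead of the deprecated langchain.llms.OpenAI wrapper
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=loaded_vectors.as_retriever(),
)

chat_history = []

def rag(query):
    """Answer a question about the document, passing earlier turns as chat history."""
    response = qa.invoke({"question": query, "chat_history": chat_history})
    chat_history.append((query, response["answer"]))
    return response["answer"].strip()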


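When `chat_history` is not empty, `ConversationalRetrievalChain` first asks the LLM to rewrite the follow-up as a standalone question, then retrieves chunks for that rewritten question. This is what lets a follow-up such as "Where do they operate?" still find the right passages. The sketch below only assembles that rephrasing input; the function names and prompt wording are hypothetical, not LangChain's built-in template.

```python
def format_history(chat_history: list[tuple[str, str]]) -> str:
    """Render (question, answer) pairs as alternating Human/Assistant lines."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def standalone_question_input(question: str, chat_history: list[tuple[str, str]]) -> str:
    """Return the text used for retrieval: the question itself, or a rephrasing prompt if there is history."""
    if not chat_history:
        return question  # nothing to resolve against, so retrieve with the question as-is
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{format_history(chat_history)}\n\n"
        f"Follow-up question: {question}\nStandalone question:"
    )


history = [("What are BioBloom's solutions?", "Algae-based biofuel production.")]
print(standalone_question_input("Where do they operate?", history))
```

To customise the real rephrasing step, `ConversationalRetrievalChain.from_llm` accepts a `condense_question_prompt` argument.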
In [11]:
rag("What are BioBloom's solutions?")

"Sure, BioBloom's solutions are focused on innovative and sustainable biofuel production, which includes using algae-based technology to minimize waste and maximize yield. They also have a sustainable processing mechanism in place, which converts harmful substances into useful biofuels. This approach results in a high-quality final product with reduced emissions. Additionally, BioBloom has strategic operational and marketing advantages, such as utilizing India's year-long algae production, in-house algae production with vertical farming, and targeting untapped rural and coastal areas. They also have a diverse customer base, including industries like automobile, agriculture, and aviation. In terms of socioeconomic and environmental impact, BioBloom creates employment opportunities and economic growth in rural India, provides subsidized biofuels and byproducts for local farmers, and reduces village reliance on high-emission fuels. They also prioritize biodiversity conservation and ecosys
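Because `rag()` appends every turn to a plain list, the history, and therefore the rephrasing prompt, grows without limit over a long session. One option, sketched below under the assumption that only recent turns matter, is a `collections.deque` with `maxlen`, which silently drops the oldest turn. `MAX_TURNS` and `record_turn` are hypothetical names.

```python
from collections import deque

MAX_TURNS = 3  # hypothetical cap on remembered (question, answer) turns

chat_history = deque(maxlen=MAX_TURNS)


def record_turn(question: str, answer: str) -> None:
    """Store a turn; once MAX_TURNS is exceeded the oldest turn is discarded."""
    chat_history.append((question, answer))


for i in range(5):
    record_turn(f"q{i}", f"a{i}")

print(list(chat_history))  # [('q2', 'a2'), ('q3', 'a3'), ('q4', 'a4')]
```

Inside `rag()`, passing `list(chat_history)` to `qa.invoke` keeps the chain input a plain list of tuples, as in the original code.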