<a href="https://colab.research.google.com/github/navneet-g/google_collab_langchain_session/blob/main/LangChain_RAG_OpenAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install dependencies

In [9]:
!pip install -q -U langchain langchain_community \
openai langchain-openai selenium unstructured \
langchain-text-splitters faiss-cpu

Import packages

In [10]:
from langchain_community.document_loaders import SeleniumURLLoader  # loading documents
from langchain_text_splitters import CharacterTextSplitter  # splitting text
from langchain_community.vectorstores import (
    FAISS,
)  # creating vector store from embeddings; can use chromadb instead as well
from langchain.chains import RetrievalQA  # creating qa system
from google.colab import userdata

from langchain_openai import ChatOpenAI  # using llm for qa system
from langchain_openai import OpenAIEmbeddings  # embedding text with openai


Initialize LLM

In [11]:
openai_api_key = userdata.get("OPENAI_API_KEY")

# docs: https://python.langchain.com/docs/integrations/chat/openai/
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", openai_api_key=openai_api_key)
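
The cell above reads the key through `google.colab.userdata`, which is only available inside Colab. A minimal sketch of a fallback for running the notebook elsewhere is shown below. The helper name `resolve_api_key` is hypothetical (it is not part of LangChain or Colab), and it assumes the key is exported as an environment variable when Colab secrets are unavailable.

```python
import os


def resolve_api_key(name="OPENAI_API_KEY"):
    """Return the secret from Colab if possible, else from the environment.

    Hypothetical helper: tries google.colab.userdata first and falls back
    to os.environ when not running in Colab or when the secret is missing.
    """
    try:
        from google.colab import userdata  # only importable inside Colab

        key = userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing / access was not granted.
        key = None
    return key or os.environ.get(name)
```

With this in place, `openai_api_key = resolve_api_key()` could stand in for the direct `userdata.get(...)` call.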

Load custom data

In [12]:

# load url
urls = [
    "https://en.wikipedia.org/wiki/96th_Academy_Awards",
]

loader = SeleniumURLLoader(urls=urls)
data = loader.load()


# configure a splitter: break on newlines, ~1000-character chunks, 200 characters of overlap
print("Splitting document by character...")
text_splitter = CharacterTextSplitter(
    separator="\n", chunk_size=1000, chunk_overlap=200
)

# apply the splitter to the loaded page, producing many smaller documents
print("Splitting into multiple documents...")
docs = text_splitter.split_documents(data)




Splitting document by character...
Splitting into multiple documents...
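
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of newline-based chunking with overlap in plain Python. It is not LangChain's implementation: `split_with_overlap` is a hypothetical function that only illustrates the idea of merging lines up to a size limit and carrying trailing lines into the next chunk.

```python
def split_with_overlap(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Greedily merge separator-delimited pieces into overlapping chunks."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Keep only trailing pieces that fit in the overlap budget
            # and still leave room for the incoming piece.
            while current and (
                len(separator.join(current)) > chunk_overlap
                or len(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Toy input (made up for illustration): ten short lines, small limits.
sample = "\n".join(f"line {i:02d}" for i in range(10))
for chunk in split_with_overlap(sample, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

In this toy run, each chunk starts with the last line of the previous one. That repetition is what the overlap setting buys: a sentence near a chunk boundary is less likely to lose its context.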


Create Vector Store

In [13]:
print("Creating vector store...")
# embed every chunk with OpenAI and index the resulting vectors in FAISS
db = FAISS.from_documents(docs, OpenAIEmbeddings(openai_api_key=openai_api_key))


Creating vector store...
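
Conceptually, the vector store answers one question: which stored vectors are closest to the query vector? The sketch below shows that nearest-neighbour lookup in plain Python with made-up 2-D vectors. It uses Euclidean distance, which is, as far as I know, the default for LangChain's FAISS wrapper; treat that as an assumption. Real OpenAI embeddings have far more dimensions, and FAISS uses optimized indexes rather than a linear scan.

```python
import math


def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k vectors closest to query_vec (Euclidean distance)."""
    dists = [(math.dist(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    return [i for _, i in sorted(dists)[:k]]


# Toy "embeddings" invented for illustration only.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.05], toy_vectors, k=2))  # [0, 1]
```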


In [14]:

# build a RetrievalQA chain: fetch the 10 most similar chunks and pass them to the LLM
print("Creating retriever...")
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 10}),
)

Creating retriever...
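
The `"stuff"` chain type places all retrieved chunks into a single prompt. With `k=10` and `chunk_size=1000`, that is up to roughly 10,000 characters of context per question. The sketch below illustrates the idea only. `build_stuff_prompt` and its template wording are hypothetical and do not reproduce LangChain's actual prompt.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Concatenate all retrieved chunks into one prompt (the 'stuff' idea)."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for retriever output.
demo_prompt = build_stuff_prompt(
    "What date did the ceremony happen?",
    ["chunk one text", "chunk two text"],
)
print(demo_prompt)
```

Because everything goes into one call, this approach is simple but bounded by the model's context window. Lowering `k` or `chunk_size` is the usual lever if prompts get too long.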


In [15]:
def ask_question(question):
    print("Asking question: " + question)
    print(qa.invoke(question))
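
As the outputs below show, `qa.invoke` returns a dict with `query` and `result` keys, so printing it directly echoes the question a second time. A small variant that returns only the answer text might look like this. `answer_only` is a hypothetical helper, and `StubChain` is a stand-in used here so the example runs without an API key.

```python
class StubChain:
    """Stand-in for the RetrievalQA chain; mimics the shape of its return value."""

    def invoke(self, question):
        return {"query": question, "result": "stub answer"}


def answer_only(chain, question):
    """Run the chain and return just the answer string."""
    return chain.invoke(question)["result"]


print(answer_only(StubChain(), "Who were the academy awards winners?"))
```

In the notebook itself, `answer_only(qa, "...")` would be called with the real chain.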


In [17]:
ask_question("Who were the academy awards winners?")
ask_question("What date did the academy awards happen?")
ask_question("What date did the 96th academy awards happen?")

Asking question: Who were the academy awards winners?
{'query': 'Who were the academy awards winners?', 'result': "The winners of the 96th Academy Awards included Oppenheimer, which won 7 awards, Poor Things, which won 4 awards, and The Zone of Interest, which won 2 awards. Some of the winners in various categories were Hayao Miyazaki and Toshio Suzuki for Best Animated Feature, Ludwig Göransson for Best Original Score, and Billie Eilish and Finneas O'Connell for Best Original Song."}
Asking question: List the names of all academy awards nominees
{'query': 'List the names of all academy awards nominees', 'result': "I don't have access to the full list of nominees for the 96th Academy Awards. You may want to check the official Academy of Motion Picture Arts and Sciences website or other reputable sources for the complete list of nominees."}
Asking question: What date did the academy awards happen?
{'query': 'What date did the academy awards happen?', 'result': 'The 96th Academy Awards c