## LangChain RAG prototyping



In [None]:
# %pip install poetry

In [None]:
# !poetry install

In [10]:
from dotenv import find_dotenv, load_dotenv
from pydantic_settings import BaseSettings, SettingsConfigDict

load_dotenv(find_dotenv(".env"))


class BaseConfig(BaseSettings):
    """Base settings."""

    OPENAI_API_KEY: str | None = None
    model_config = SettingsConfigDict(env_file=find_dotenv(".env"), extra="ignore")


config = BaseConfig()
OPENAI_API_KEY = config.OPENAI_API_KEY

In [2]:
from sklearn.datasets import fetch_20newsgroups
from langchain_core.documents import Document

newsgroups_data = fetch_20newsgroups(subset='train', categories=['sci.space'])  # restrict to one category to keep the test corpus small
documents = [Document(page_content=text) for text in newsgroups_data["data"]]

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=128)

splits = text_splitter.split_documents(documents)
# vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings(), persist_directory="chroma_db")  # first run
vectorstore = Chroma(persist_directory="chroma_db", embedding_function=OpenAIEmbeddings())  # subsequent runs
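The cell above switches between building and loading the index by commenting lines in and out. A minimal sketch of automating that choice follows. It assumes that a non-empty `chroma_db` directory means the index was already built, which is a heuristic and not a check that Chroma itself provides. The helper name `index_exists` is hypothetical.

```python
from pathlib import Path


def index_exists(persist_directory: str) -> bool:
    """Return True if the directory exists and contains at least one entry."""
    path = Path(persist_directory)
    return path.is_dir() and any(path.iterdir())


# Hypothetical usage with the names from the cell above:
# if index_exists("chroma_db"):
#     vectorstore = Chroma(persist_directory="chroma_db", embedding_function=OpenAIEmbeddings())
# else:
#     vectorstore = Chroma.from_documents(
#         documents=splits, embedding=OpenAIEmbeddings(), persist_directory="chroma_db"
#     )
```

A non-empty directory does not guarantee a complete or compatible index, so deleting `chroma_db` remains the simplest way to force a rebuild.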

In [4]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})

In [5]:
out_docs = retriever.invoke("How does one become an astronaut?")

print(len(out_docs))
for item in out_docs:
    print(item)

5
page_content='From: leech@cs.unc.edu (Jon Leech)
Subject: Space FAQ 14/15 - How to Become an Astronaut
Keywords: Frequently Asked Questions
Article-I.D.: cs.astronaut_733694515
Expires: 6 May 1993 20:01:55 GMT
Distribution: world
Organization: University of North Carolina, Chapel Hill
Lines: 313
Supersedes: <astronaut_730956661@cs.unc.edu>
NNTP-Posting-Host: mahler.cs.unc.edu

Archive-name: space/astronaut
Last-modified: $Date: 93/04/01 14:39:02 $

HOW TO BECOME AN ASTRONAUT

    First the short form, authored by Henry Spencer, then an official NASA
    announcement.

    Q. How do I become an astronaut?'
page_content='Q. How do I become an astronaut?

    A. We will assume you mean a NASA astronaut, since it's probably
    impossible for a non-Russian to get into the cosmonaut corps (paying
    passengers are not professional cosmonauts), and the other nations have
    so few astronauts (and fly even fewer) that you're better off hoping to
    win a lottery. Becoming a shuttle pilot
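
The line "Q. How do I become an astronaut?" closes the first retrieved chunk and opens the second, which is consistent with the `chunk_overlap=128` setting used when splitting. The sketch below illustrates the idea of overlapping chunks with a plain fixed-width sliding window. It is a simplification for illustration only and not how `RecursiveCharacterTextSplitter` works internally (that splitter breaks on separators first). The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

Each chunk starts with the last two characters of the previous one, so text near a boundary stays retrievable from either side.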

In [8]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


llm = ChatOpenAI(model="gpt-4o-mini")

prompt = PromptTemplate.from_template("""
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
""")

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
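
To make the data flow of `rag_chain` easier to follow, here is a rough plain-Python sketch of what the pipeline does with a question. Everything except `format_docs` is a hypothetical stand-in: `FakeDoc`, `fake_retriever`, `fake_llm`, `TEMPLATE`, and `run_chain` are not LangChain objects, and the template is a shortened version of the prompt above.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document; only page_content is needed here."""
    page_content: str


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question: str) -> list[FakeDoc]:
    # A real retriever would run a similarity search; this returns fixed documents.
    return [
        FakeDoc("Astronauts are selected by NASA."),
        FakeDoc("Candidates train for a year."),
    ]


def fake_llm(prompt_text: str) -> str:
    # A real chat model would generate an answer; this only reports the prompt size.
    return f"(model output for a prompt of {len(prompt_text)} characters)"


TEMPLATE = "Question: {question}\nContext: {context}\nAnswer:"


def run_chain(question: str) -> tuple[str, str]:
    # Step 1: the dict stage feeds the same question to both branches.
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }
    # Step 2: the prompt stage fills the template.
    prompt_text = TEMPLATE.format(**inputs)
    # Step 3: the model stage produces the answer (already a string here).
    return prompt_text, fake_llm(prompt_text)


prompt_text, answer = run_chain("How does one become an astronaut?")
print(prompt_text)
```

In the real chain, `StrOutputParser()` plays the role of the final step by turning the chat model's message into a plain string.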

In [9]:
answer = rag_chain.invoke("How does one become an astronaut?")  # the chain returns a string, not documents

In [11]:
print(answer)

To become a NASA astronaut, you must be a U.S. citizen with a bachelor's degree in engineering, biological science, physical science, or mathematics, along with at least three years of relevant professional experience. Applicants must also pass a NASA class II space physical and undergo a year of training and evaluation if selected. Due to high competition, simply meeting qualifications is not enough; you must also avoid disqualifications.

