# RAG (Retrieval-Augmented Generation) Mockup

This is a mockup of a RAG system that generates context-aware prompts to get the best out of LLMs.
This implementation is based on the LangChain Python framework, which supports RAG with different LLMs such as GPT, Gemini or Claude.

Consider the following implementation as an illustration of how a GenAI-based application can extract and retrieve highly detailed content from specific documents. All the unstructured data (in this case a financial PDF document) is embedded and saved in a vector DB, so that relevant context can easily be retrieved for each user prompt.
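To make the "embed, store, retrieve" idea concrete without any API keys, here is a minimal, dependency-free sketch. It assumes a toy bag-of-words vector in place of a real embedding model (the notebook uses Cohere) and a hypothetical `InMemoryVectorStore` class in place of Chroma; only the shape of the process (embed every text, embed the query, rank by cosine similarity, return the top `k`) is meant to carry over.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": term-frequency counts of lowercase words.
    # Assumption: a real system would call an embedding model here.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB such as Chroma."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, texts: list[str]) -> None:
        # Embed each text once, at indexing time.
        for text in texts:
            self._items.append((embed(text), text))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        # Embed the query and return the k most similar stored texts.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


# Toy sample texts, loosely paraphrasing the survey content used later on.
store = InMemoryVectorStore()
store.add([
    "Respondents in Armenia rate their financial knowledge as high.",
    "Inflation affects the value of savings over time.",
    "Kazakhstan respondents put themselves around the average.",
])
print(store.similarity_search("financial knowledge in Armenia", k=1))
```

A real vector DB adds persistence and approximate nearest-neighbour indexes so the search stays fast on large collections, but the ranking principle is the same.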

In [1]:
import dotenv
from langchain import hub
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [2]:
import os
from langchain_anthropic import ChatAnthropic

# Load the API_KEYS to access LLMs and Embeddings models
dotenv.load_dotenv()
# ANTHROPIC_API_KEY = os.getenv('ANTHROPIC_API_KEY')
# COHERE_API_KEY = os.getenv('COHERE_API_KEY')
# print(ANTHROPIC_API_KEY, COHERE_API_KEY) # Check API keys are correctly loaded from .env file

# Build the ChatAnthropic object to interact with the Claude 3 Sonnet LLM
llm = ChatAnthropic(model="claude-3-sonnet-20240229")
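Printing the keys (as in the commented-out check above) leaks secrets into the notebook output. A safer sketch only reports which variables are missing. It assumes the two variable names from the commented lines, `ANTHROPIC_API_KEY` and `COHERE_API_KEY`, which are the environment variables that `ChatAnthropic` and `CohereEmbeddings` read by default; `missing_api_keys` is a hypothetical helper, not part of LangChain.

```python
import os

# Environment variables this notebook relies on (assumed from the comments above).
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "COHERE_API_KEY")


def missing_api_keys(required=REQUIRED_KEYS, env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set")
```

Run it right after `dotenv.load_dotenv()`: the key values never reach the output, only the names of the absent ones.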

In [9]:
# Load the PDF document and split it into chunks (load_and_split applies a
# RecursiveCharacterTextSplitter by default).
# Use the Chroma vector store and CohereEmbeddings to embed all the chunks of the document.
# Finally, test the vector store to check that it correctly retrieves the parts of the
# document that are related to the input query.

from langchain_cohere import CohereEmbeddings
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("./knowledge_base/financial-literacy-cis-countries-survey-EN.pdf")
pages = loader.load_and_split()
vectorstore = Chroma.from_documents(documents=pages, embedding=CohereEmbeddings())

response = vectorstore.similarity_search('Can you tell me something about the financial educational level in armenia?')
print(response)

[Document(page_content='FINANCIAL KNOWLEDGE  │ 17 \n \nLEVELS OF FINANCIAL LITERACY IN CIS COUN TRIES \n  3.4. Self-reported financial knowledge  \nThere is a consistent pattern in terms of self -reported knowledge across the c ountries with \nthe majority of people reporting that they believe they are about average  (Figure 3.5).  \nHowever, respondents in Armenia are less likely  than those elsewhere  to consider \nthemselves to have lower than average financial knowledge (just 8% did so) and more \nlikely to rate themselv es as high  (20%). Respondents in Kazakhstan (74%) are \nconsiderably more likely than the Russian respondents (59%) to put themselves around \nthe average , whilst some 26% of respondents in the Russian Federation reported that their \nknowledge was lower than av erage.    \nFigure  3.5. Self-reported financial knowledge  \nBase: all respondents (excluding non -responses). % of respondents reporting that their financial knowledge is \nlower, about average  or high
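Before embedding, `load_and_split()` cuts the pages into smaller, overlapping chunks so that each vector covers a focused piece of text. The sketch below is a simplified, hypothetical `split_with_overlap` helper using fixed-size character windows; unlike LangChain's `RecursiveCharacterTextSplitter`, it does not try to break on paragraph, line or word boundaries, so treat it only as an illustration of `chunk_size` and `chunk_overlap`.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified sketch: ignores natural boundaries such as paragraphs or sentences.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk more often, at the cost of storing some text twice.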

In [10]:
# Retrieve and generate using the relevant snippets of the document.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# The pipe (|) operator composes these steps into a RunnableSequence, a LangChain
# Expression Language construct that chains the context retrieval, the prompt, the LLM and the output parser.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# To obtain the response, call the chain's invoke method with the input question
response = rag_chain.invoke("Can you tell me something about the financial educational level in armenia?")
print(response)

Based on the provided context, here are a few key points about financial educational level in Armenia:

1) Respondents in Armenia were less likely than those in other countries surveyed to consider themselves as having lower than average financial knowledge (only 8% did so). 

2) Around 20% of respondents in Armenia rated themselves as having higher than average financial knowledge, which was one of the highest percentages across the countries surveyed.

3) However, the data shows that in Armenia, only those believing they had below average knowledge made a fair self-assessment of their actual financial knowledge levels compared to others in their country. Those who rated themselves as having average or high knowledge tended to overestimate their knowledge levels.
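In the chain above, the leading dict is turned into a RunnableParallel that feeds the same input question to both branches: the retriever (whose documents `format_docs` joins into one string) and `RunnablePassthrough()` (which forwards the question unchanged). The following plain-Python sketch mirrors that data flow with no LangChain dependency. The template, `make_rag_chain`, and the stub retriever and LLM are all hypothetical; the real `rlm/rag-prompt` wording on LangChain Hub is different.

```python
from typing import Callable


def format_docs(docs: list[str]) -> str:
    # Same joining rule as the notebook's format_docs, applied to plain strings.
    return "\n\n".join(docs)


# Hypothetical prompt template with the same two slots as the chain's dict.
TEMPLATE = (
    "Use the following context to answer the question.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def make_rag_chain(
    retrieve: Callable[[str], list[str]], llm: Callable[[str], str]
) -> Callable[[str], str]:
    def chain(question: str) -> str:
        # Step 1: fan the same input out to the "context" and "question" branches.
        inputs = {"context": format_docs(retrieve(question)), "question": question}
        # Step 2: fill the prompt template.
        prompt = TEMPLATE.format(**inputs)
        # Steps 3-4: call the model and return its output as a plain string.
        return str(llm(prompt))

    return chain


def stub_retrieve(question: str) -> list[str]:
    # Stand-in for vectorstore.as_retriever(); ignores the question.
    return ["Armenia: 8% rate their knowledge below average.", "Armenia: 20% rate it high."]


def echo_llm(prompt: str) -> str:
    # Stand-in for Claude: returns the prompt so the filled template is visible.
    return prompt


rag = make_rag_chain(stub_retrieve, echo_llm)
print(rag("What is the financial literacy level in Armenia?"))
```

Swapping `echo_llm` for a real model call is the only change needed to turn the echo into an answer, which is the same separation of concerns the LangChain pipeline provides.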


In [11]:
# Cleanup vector DB
vectorstore.delete_collection()