# RAG Using LangChain & SingleStore

### Install libraries & dependencies

In [31]:
!pip install langchain --quiet
!pip install --upgrade openai==0.28.1 --quiet
!pip install pdf2image --quiet
!pip install pdfminer.six --quiet
!pip install singlestoredb --quiet
!pip install tiktoken --quiet
!pip install --upgrade unstructured==0.10.14 --quiet
!pip install -qU pypdf langchain_community

### Import the libraries

In [38]:
from langchain.document_loaders import PyPDFLoader
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
import os

### Load your custom document

In [40]:
from langchain_community.document_loaders import PyPDFLoader

file_path = "https://unctad.org/system/files/official-document/wesp2023_en.pdf"
loader = PyPDFLoader(file_path)

data = loader.load()

### Split the document into chunks

In [46]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# PyPDFLoader returns one Document per PDF page
print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in the first document")

# Split each page into chunks of at most 2000 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

print(f"You have {len(texts)} chunks")

You have 178 document(s) in your data
There are 44 characters in the first document
You have 379 chunks
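
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraph and line breaks); the function name `split_fixed` is hypothetical and exists only to illustrate the two parameters.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
print(split_fixed(sample, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(split_fixed(sample, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With `chunk_overlap = 0`, as in the cell above, a sentence that straddles a chunk boundary is split between two chunks; a small overlap is a common way to keep such sentences retrievable.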


### Use the OpenAI API to generate embeddings for the document chunks

In [47]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

OpenAI API Key:  ········


### Store the document embeddings in a SingleStore database

In [48]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import SingleStoreDB

# Embedding model used to vectorize each chunk (reads OPENAI_API_KEY from the environment)
embedding = OpenAIEmbeddings()

# Embed every chunk and write its text, vector, and metadata to the pdf_un table
docsearch = SingleStoreDB.from_documents(
    texts,
    embedding,
    table_name="pdf_un",
)
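
The cell above passes no connection details, so it presumably ran in an environment where the SingleStore connection was already configured. Outside such an environment, one option is to set the `SINGLESTOREDB_URL` environment variable before calling `from_documents`. The sketch below is an assumption about your setup: the helper `build_singlestore_url` is hypothetical, and every value is a placeholder.

```python
import os


def build_singlestore_url(user: str, password: str, host: str,
                          port: int, database: str) -> str:
    """Assemble a 'user:password@host:port/database' connection string."""
    return f"{user}:{password}@{host}:{port}/{database}"


# Placeholder values only; replace them with your own workspace details.
os.environ["SINGLESTOREDB_URL"] = build_singlestore_url(
    "admin", "example-password", "localhost", 3306, "demo_db"
)
print(os.environ["SINGLESTOREDB_URL"])
```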


### Check the text chunks and embeddings stored in the database

In [50]:
%%sql
select * from pdf_un limit 1;

id,content,vector,metadata
2251799813685252,"III  Foreword This 2023 edition of the United Nations flagship report, World Economic Situation and Prospects, comes at a pivotal moment for the global economy. The growth of the world’s population to 8 billion people is a testament to improved nutrition, public health and sanitation. But as our human family grows larger, it is more unequal and divided than ever. Billions of people are struggling; hundreds of millions face hunger and even famine. People in the richest countries can expect to live up to 30 years longer than those in the poorest. Countries in the Global South are drowning in debt, with poverty and hunger increasing as they face the growing impacts of the climate crisis – a case study in inequality. Vast swathes of the world have no chance of investing in a sustainable recovery from the pandemic, a transition to renewable energy, or education and training so their people can benefit from the digital revolution. Against this backdrop, World Economic Situation and Prospects presents a grim economic outlook for the near-term. A broad-based and severe slowdown of the global economy looms large amid high inflation, aggressive monetary tightening, and heightened uncertainties. Many economies are at risk of falling into recession, having barely recovered from the shock of the pandemic. The fiscal space of developing countries is under siege from exchange rate depreciation, high borrowing costs and rising debt distress. Tightening global monetary conditions will make it even more difficult to finance investments in the 2030 Agenda and the Sustainable Development Goals. While taming inflation remains a key near-term objective, policymakers must also consider trade- offs with slower growth, employment losses and international spillovers. 
This is not the time for short-term thinking or knee-jerk fiscal austerity that exacerbates inequality, increases suffering and could put the SDGs farther out of reach.","b'\xc4c\x18\xbb\x93\x04\xd5\xbc\xb8Q\xf7:\xffK\xa4\xbc\xfa\x90\x14;\xa8\r\x9f;\xe5t\xa1<9\x9d\xdf<\xf9\xbb\x97<\x8b\xfa\x82\xbc\xf2\x81\x91\xbc_ix;\xb9\xfbp<\x03pc;\xb6\xd7\xb1\xbc\x158c\xbc\x16\xbc\n\xc1\xbcY\xde\x9c\xbb\xcdG\x18=q1\xf8;\xd2\x02(<\x9c\xed\x85\xbc\xccm\xea\xbb\xad\xe0\x88\xbc\x02\xc6\xe9\xb9\xc2\xf7J<1{\xb3\xbc\xb8V\xa8\xbc\x00!\xa1\xbc\r\x03\x8e\xce;\xa4\x0f\xb2\xbb\x91\xb0\xe1;x\x88\t=K:\\<+\xd3\xcc;\x0c)`\xbc\xf5\xaa\x01=\xb4\x1a\x8f\xbb\xd5iD\xbb\xf8\xf9C<\x8b\r,\xbbU\xf3\xd8<\\2\x90\xbc\xfa\x90\x14\xbb\xec\xc6\x81\xbbdg\xe5\xbc\xb3\x96g\xbcV\xb0\xfb\xbb7\x1e\xe9]\xa0;s\x9d\xc5\xbc<\xb3\xa6;%0\x17\xbb\xeekJ\xbaWZ\xf5:m8\xbc2\x1d<\xa4""\xdb\xbc:L\x8a\xbc\xbf\xa8\x88\xbaSa\xb9=Y\xde\x1c=\x89c2;\x0bl\xbd<\x0b\x7f\xe6<\x19#\'\xbc\xf9\xb6f\xba6#\x1a\xbd\xc8o}:\x12\xe4\xef\xbb\xa5\xb9\xab;T\xf8\t\xba\xca\xf3$<\x14P=\xbc8\xcd\x93\xbcN{\xa6\xbc\xf9\xbb\x97\x07\x81y\xba\x1b\xa2\x1d<\x98\xd7\xbe<\xfe\xa1*<\x8f\x06h<\xfd\xdfV=\x07s\x81\xbc\x80\xaa5;\x0c)\xe0:\xb4-\xb8\xbc\xccm\xea<\x10RP\xbchx{;\xa4\xf7W\xbcV\x8a\xa9;a\x13r\xbd3\r\xd3\xbc\x1c\x8a\xc3\xbc\x9b\x18\x89<\xc4K\xbe\xbc8\xe0<\xbcr\x0b\xa6\xbcj\xd1\x9f\xbc\x1cw\x9a<\xaa\x8c\x95\xbc\xfa\xa3\xbd\xbc\x1cw\x1a\xe8\xbb@\xd7e\xbc}V\xc2\xbc8\xcd\x93\xb9\xaba\x929c\x97\x99\xbc&\x05\x94\xbc\x07n\xd0<\x9egK\xbb\xac\x1e54<\x05\x07\xb4;\x9c\xd5+\xbc\x8b\xfa\x02;!\x1f\x81<\x8d\x9fK\xbc\xc0e\xab<\xa6\x8e(;\xabt\xbb\xbc\xe8\xdb=\xbc\xeb\x1c\x88] 
\xbc\x0fj\xaa\xbc\xa7c%;\xce\xf1\x91\xbc\x05\xd7\x7f\xbc\xf5\xb8\xf9;\x05\xc9\x07\xbc\xba\xbdD;a\x18#\xbc\x01\xf6\x1d<9w\r\xbcdA\x93<\xcdG\x98\xba\t\xda\x9d:\x9f$\xee;\x12\xe4\xef8\x8aK\xd8;)Y\x87\xbc?-\xec<\xa0\x11E\xbc\x9d\xd0z<\x8e\\n;s\xc8\xc8;R\xa4\x16<\xdc\xa3J\xbbhx{<}V\xc2:!X\xfc;V\x8a)<\x1b\xb5F<\x19\x0b\xcd\xbc\xeam]\xba\xf8\xf9\xc3<\x05\xefY<\xb4E\x92;""\x1aP<3\xe7\x00=\xd2\x1a\x02<\x1e\t\xba\xbb@\xc4\xbc\xbbm b<*.\x84;cl\x16\xbb*\x11\xf9<\x06\xc4\xd6\xbb\xeaG\x8b;\r\x16\xb7;\xca\x19w:\xd8\x92\xb4<\xafr(\xba\x1f\xb33\xbdn\xe2\xb5;\x80\x97\x0c\xbc\xeaZ4\xbc\x18f\x04\xbe\x9f)\x9f\xbb0\x93\r\xbbRy\x13=\x89{\x0c=s\xc8H<\xc6\xe2\x8e\xba\x8c\xcaN\xbbn\xe2\xb5\xbc\x80\x97\x8c<\xeb\x04.\xbc\xdd\x8b\xf0\xbc\xa2xa\xbb_n\xa9\xb4;\xe3\xf0y\xbc3\xfa\xa9;;\xf6\x83A<\xb6\xac.<\x8b\xf5\xd1\xbc\xed\x83$;\x84\xa3q\xbd\xf5\xbd*=\xbf}\x85\xbb;!\x87\xbc>2\x9d\xbc\x8b\r\xac\xbb\xbd)\x12\xbb\xa9\xddj\xbc>] \xbc\x0b\x84\x17<=\x83\xf2\xbc\xfa\x90\x94<\xcb\xc3\xf0;\xd2(z\xbc\x89c2\xbdH\xeb\x99\xbb\x8d\x8c\xa2<>2\x9d\xb8\xc6\x08\xe1;\x89P\x89<5to;\x8dtH\xba\xdc\xbb\xa4h\xbc\x8c\xb2t\xbc\x05\xf4\x8a\xbc\x81l\x89;\xc7\x9f1\xbc'","{'page': 4, 'source': 'https://unctad.org/system/files/official-document/wesp2023_en.pdf'}"
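
The `vector` column in the output above holds raw bytes rather than a readable list of numbers. Read as little-endian 32-bit floats, the leading bytes decode to small values of the size one would expect from an embedding, which suggests the vector is stored as packed float32. The sketch below works under that assumption and uses only the standard library; `unpack_vector` is a hypothetical helper name.

```python
import struct


def unpack_vector(blob: bytes) -> list[float]:
    """Interpret a bytes object as consecutive little-endian float32 values."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length must be a multiple of 4 bytes")
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


# First eight bytes of the vector shown in the query output above.
head = b"\xc4c\x18\xbb\x93\x04\xd5\xbc"
print(unpack_vector(head))  # two small negative floats

# Round trip with made-up values to show the packing direction.
packed = struct.pack("<3f", 0.5, -1.0, 2.0)
print(unpack_vector(packed))  # [0.5, -1.0, 2.0]
```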


### Query your custom data (the PDF you loaded) using similarity search alone to retrieve the top-k closest chunks

In [51]:
query = "What India's GDP growth is projected to be?"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)

99 CHAPTER III. REGIONAL DEVELOPMENTS AND OUTLOOK
South Asia: A challenging road 
ahead amid global headwinds
 ● South Asia’s outlook has deteriorated 
amid challenging domestic and 
global conditions.
 ● Rising global food and energy prices are 
intensifying pressure on food security 
and undermining progress on the SDGs.
 ● The economic impact of the conflict 
in Ukraine is exacerbating existing 
vulnerabilities across the region.
The outlook for South Asia has deteriorated and 
is subject to multiple downside risks amid global 
monetary tightening, fiscal vulnerabilities, rising 
inflation and extreme weather events. Regional 
GDP growth is expected to slow to 4.8 per cent 
in 2023 from an estimated 5.6 per cent expansion 
in 2022. Overall, weaker global demand, tighter 
monetary policy, additional supply disruptions, 
further escalation in commodity prices and 
the emergence of new COVID-19 variants pose 
significant risks in 2023.
India’s GDP growth rate is projected to moderat
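
Conceptually, `similarity_search` embeds the query and ranks the stored chunks by how close their vectors are to it. The sketch below shows that ranking with a dot product over toy three-dimensional vectors. The numbers are invented for illustration, real embeddings have far more dimensions, and in the notebook the ranking runs inside the database; `dot` and `top_k` are hypothetical names.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors score highest against query_vec."""
    ranked = sorted(store, key=lambda text: dot(query_vec, store[text]), reverse=True)
    return ranked[:k]


# Toy "embeddings": invented numbers, not real model output.
store = {
    "India GDP outlook": [0.9, 0.1, 0.0],
    "Sri Lanka and the IMF": [0.1, 0.9, 0.0],
    "Energy prices in Europe": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a question about India's GDP

print(top_k(query_vec, store, k=2))  # ['India GDP outlook', 'Sri Lanka and the IMF']
```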

### Generate the augmented response to the user query

In [52]:
import openai

prompt = f"The user asked: {query}. The most similar text from the document is: {docs[0].page_content}"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)

print(response['choices'][0]['message']['content'])

India's GDP growth rate is projected to moderate to 5.8 per cent in 2023 from an estimated 6.4 per cent in 2022.
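
The prompt above includes only `docs[0]`, although `similarity_search` returns several documents. A possible refinement, sketched here under assumptions, is to pass more than one retrieved chunk as numbered context. `build_prompt`, the `Doc` stand-in class, and the `max_chars` budget are hypothetical and not part of the original notebook; the sample passages are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def build_prompt(query: str, docs: list[Doc], max_chars: int = 6000) -> str:
    """Combine the question with numbered context passages, capped at max_chars."""
    parts = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        passage = doc.page_content.strip()
        if used + len(passage) > max_chars:
            break  # stop before the context budget is exceeded
        parts.append(f"[{i}] {passage}")
        used += len(passage)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Placeholder passages, used only to show the prompt layout.
docs = [Doc("Regional GDP growth is expected to slow."), Doc("Inflation remains elevated.")]
print(build_prompt("What is the regional outlook?", docs))
```

The resulting string can be sent as the `user` message in place of `prompt` in the cell above.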


### Test the model without the knowledge base (no custom documents such as the PDF)

In [53]:
from langchain.llms import OpenAI
llm = OpenAI(temperature=0.8)


In [54]:
llm.predict("What India's GDP growth is projected to be in 2024?")


"\n\nAccording to the International Monetary Fund (IMF), India's GDP growth is projected to be 6.3% in 2024."

In [66]:
query = "SriLanka sought financial assistance from whom?"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)

As the United States Federal Reserve raised its 
policy rate and international investors reduced 
their exposures to developing markets in 2022, 
South Asian currencies weakened significantly 
against the dollar. In response, central banks in 
the region accelerated their interest rate hikes 
and intervened strongly to prevent further 
currency depreciation, particularly during the 
second half of the year. Fiscal and balance-of-
payments financing needs were exacerbated 
across the region.
Existing high levels of sovereign debt and 
unsustainable debt-servicing burdens 
prompted several South Asian countries to seek 
multilateral financial support in the second 
half of 2022 (figure III.15). After defaulting on 
its sovereign debt in April, Sri Lanka reached a 
staff-level agreement with the IMF under the 
Extended Fund Facility in early September. The 
IMF programme is expected to help boost tax 
revenues and reduce fiscal deficits in coming 
years (IMF, 2022k). Pakistan and Banglade

In [67]:
import openai

prompt = f"The user asked: {query}. The most similar text from the document is: {docs[0].page_content}"

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)

print(response['choices'][0]['message']['content'])

Sri Lanka sought financial assistance from the International Monetary Fund (IMF) after defaulting on its sovereign debt in April. In early September, Sri Lanka reached a staff-level agreement with the IMF under the Extended Fund Facility. The IMF programme is expected to help boost Sri Lanka's tax revenues and reduce fiscal deficits in the coming years.
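
The two question cells repeat the same retrieve-then-ask steps. One way to package them, sketched here, is a function that takes the search step and the completion step as callables, so the flow can be exercised without a database or an API key. `rag_answer` and both stub functions are hypothetical.

```python
from typing import Callable


def rag_answer(
    query: str,
    search: Callable[[str], list[str]],
    complete: Callable[[str], str],
) -> str:
    """Retrieve passages for query, then ask the model with the best passage attached."""
    passages = search(query)
    if not passages:
        return complete(query)  # nothing retrieved: fall back to the bare question
    prompt = (
        f"The user asked: {query}. "
        f"The most similar text from the document is: {passages[0]}"
    )
    return complete(prompt)


# Stubs standing in for docsearch.similarity_search and the OpenAI call.
def fake_search(q: str) -> list[str]:
    return ["Sri Lanka reached a staff-level agreement with the IMF."]


def fake_complete(prompt: str) -> str:
    return f"(model reply to a prompt of {len(prompt)} characters)"


print(rag_answer("SriLanka sought financial assistance from whom?", fake_search, fake_complete))
```

In the notebook, `search` could wrap the vector store, for example `lambda q: [d.page_content for d in docsearch.similarity_search(q)]`, and `complete` could wrap the `openai.ChatCompletion.create` call shown above.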
