# Retrieval Augmented Generation with OpenAI's GPT as Reasoning Engine and Pinecone as remote vectorstore
This project explores retrieval augmented generation (RAG) as a way to inject context into LLMs, using remote vectorstores hosted in Pinecone, a cloud vector database provider.

## Libraries are installed

In [None]:
!pip install openai --quiet
!pip install langchain --quiet
!pip install pypdf --quiet
!pip install pinecone-client --quiet
!pip install langchain-pinecone --quiet
!pip install tiktoken --quiet


## Google Drive is mounted to access the source files

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## Keys are stored as environment variables

In [None]:
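import os
# Replace the placeholder values below with your own API keys
os.environ['OPENAI_API_KEY'] = 'openai-key'
os.environ['PINECONE_API_KEY'] = 'pinecone-key'
# Pinecone index that will store the chunk embeddings
os.environ['PINECONE_INDEX_NAME'] = 'demo-clase'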
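Hardcoding keys in a notebook makes them easy to leak when the notebook is shared. A minimal alternative, assuming you are happy to type the keys once per session, is to prompt for them without echoing; `ensure_env_var` is a hypothetical helper name. Recent Colab versions also offer a Secrets panel whose values can be read with `google.colab.userdata.get(...)`.

```python
import os
from getpass import getpass


def ensure_env_var(name: str) -> str:
    """Return an environment variable, prompting for it (without echo) only if it is unset."""
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value, so the key never appears in the cell output
        value = getpass(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Example usage (prompts only for keys that are not already set):
# for key_name in ("OPENAI_API_KEY", "PINECONE_API_KEY"):
#     ensure_env_var(key_name)
```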

## Libraries are imported

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.callbacks import get_openai_callback
from langchain.chains.question_answering import load_qa_chain
from langchain_pinecone import Pinecone
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pypdf import PdfReader

## GPT-3.5 model is instantiated

In [None]:
chat = ChatOpenAI(model='gpt-3.5-turbo')

## The PDF file containing the context (the Tesla Q3 2023 earnings call transcript) is read and parsed to extract its text

In [None]:
pdf = "/content/gdrive/MyDrive/GenAIEne2024/Tesla Q3 2023 Earnings Call.pdf"
pdf_reader = PdfReader(pdf)
text = ""
for page in pdf_reader.pages:
  text += page.extract_text()
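The loop above concatenates page texts with no separator, so the last word of one page can run into the first word of the next. A minimal sketch that joins pages with a newline instead, assuming only that each page object exposes `extract_text()` as pypdf's pages do; `pages_to_text` is a hypothetical helper name:

```python
from typing import Iterable, Protocol


class SupportsExtractText(Protocol):
    def extract_text(self) -> str: ...


def pages_to_text(pages: Iterable[SupportsExtractText], sep: str = "\n") -> str:
    # Join page texts with a separator so words at page boundaries do not fuse together;
    # `or ""` guards against a page that yields no text
    return sep.join(page.extract_text() or "" for page in pages)
```

Usage would be `text = pages_to_text(PdfReader(pdf).pages)`.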

## Text is displayed

In [None]:
text

"Elon Musk  \n[Audio gap] ramp -up of new factories, and we believe there's still meaningful room for \nimprovement there. Regarding Autopilot and AI, our vehicle has now driven over 0.5 billion \nmiles with FSD beta, full self -driving beta, and that number is growing rapidl y. We recently \ncompleted a 10,000 GPU cluster of H100s. We think probably bringing it into operation faster \nthan anyone's ever brought that much compute per unit time into production since training is the \nfundamental limiting factor on progress with full s elf-driving and vehicle autonomy.  \nWe're also seeing significant promise with FSD version 12. This is the end -to-end AI where it's a \nphoton count in controls out or really you can think of it as there's just large midstream coming \nin and a tiny bit stream going out, impressing reality into a very small set of outputs, which is \nactually kind of how humans work. The vast majority of human data input is optics from our \neyes. And so, we are like the
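The output shows typical PDF extraction artifacts: stray spaces inside hyphenated words ("ramp -up", "end -to-end"), trailing spaces before line breaks, and words split mid-token ("rapidl y"). A heuristic cleanup pass could handle the first two before chunking. `clean_extracted_text` is a hypothetical helper, and its rules are assumptions that may also alter legitimate text such as "5 -3"; split words like "rapidl y" cannot be repaired reliably with simple rules.

```python
import re


def clean_extracted_text(text: str) -> str:
    # Heuristic: rejoin hyphenated words split by a stray space, e.g. "ramp -up" -> "ramp-up"
    text = re.sub(r"(\w) -(\w)", r"\1-\2", text)
    # Drop spaces and tabs left at the end of lines, e.g. "Musk  \n" -> "Musk\n"
    text = re.sub(r"[ \t]+\n", "\n", text)
    return text
```

It would be applied once, right after extraction: `text = clean_extracted_text(text)`.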

## Using the recursive character splitter to divide the text into chunks that will later be vectorized

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50
)
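`chunk_size` caps each chunk's length in characters and `chunk_overlap` sets how many characters consecutive chunks share, so context cut at a boundary is partly repeated in the next chunk. `RecursiveCharacterTextSplitter` additionally tries to break on separators (paragraphs, then lines, then spaces) before falling back to single characters. The sketch below is a simpler technique, a plain sliding window, shown only to illustrate the size/overlap arithmetic; `sliding_window_chunks` is a hypothetical name, not part of LangChain.

```python
def sliding_window_chunks(text: str, chunk_size: int = 400, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the settings used here, each window starts 350 characters after the previous one and repeats its last 50 characters.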

In [None]:
chunks = text_splitter.split_text(text=text)

## As a result, 133 chunks are created

In [None]:
len(chunks)

133

## Using OpenAI embeddings to vectorize these chunks and then store them in Pinecone

In [None]:
embeddings = OpenAIEmbeddings()
# Embed every chunk and upsert the vectors into the Pinecone index named above
vectorstore = Pinecone.from_texts(chunks, embedding=embeddings, index_name=os.getenv('PINECONE_INDEX_NAME'))
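`Pinecone.from_texts` embeds every chunk and stores the resulting vectors in the remote index; `similarity_search` later embeds the query and returns the nearest chunks. The in-memory sketch below illustrates that flow without any external service. It is a toy under stated assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for `OpenAIEmbeddings`, and `InMemoryVectorStore` is a hypothetical class, not Pinecone's or LangChain's implementation.

```python
import math
import re
import zlib
from collections import Counter


def toy_embed(text: str, dim: int = 256) -> list[float]:
    # Hypothetical stand-in for OpenAIEmbeddings: hashed bag-of-words, L2-normalized
    vec = [0.0] * dim
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Toy vector store: keeps (text, vector) pairs and ranks them by cosine similarity."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.records: list[tuple[str, list[float]]] = []

    def add_texts(self, texts):
        # "Indexing": embed each text once and keep the vector next to it
        for text in texts:
            self.records.append((text, self.embed(text)))

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = self.embed(query)
        # Vectors are unit-length, so the dot product equals cosine similarity
        scored = [(sum(a * b for a, b in zip(q, v)), text) for text, v in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insertion order
        return [text for _, text in scored[:k]]
```

Usage mirrors the notebook: `store = InMemoryVectorStore(); store.add_texts(chunks); store.similarity_search(query, k=1)`. Real embeddings capture meaning rather than word overlap, which is why the OpenAI model can match a query to a chunk that uses different wording.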

## Testing with a query outside the model's built-in knowledge

In [None]:
query = 'What is being said about building a factory in Mexico?'

## The most similar chunk is retrieved and a map-reduce QA chain is built

In [None]:
docs = vectorstore.similarity_search(query=query, k=1)
chain = load_qa_chain(llm=chat, chain_type='map_reduce', verbose=True)
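The `map_reduce` chain type runs the question against each retrieved document separately (map), then combines the per-document extracts in a final call (reduce); the verbose log of the next cell shows both prompts. The sketch below is a hypothetical, framework-free illustration, assuming `llm` is any function that maps a prompt string to a completion string. The prompt strings are shortened paraphrases of those in the log, not LangChain's exact templates, and `map_reduce_qa` is a hypothetical name.

```python
from typing import Callable, Sequence

# Shortened paraphrases of the map and combine prompts seen in the verbose log
MAP_PROMPT = (
    "Use the following portion of a long document to see if any of the text is "
    "relevant to answer the question. Return any relevant text verbatim.\n"
    "{context}\nQuestion: {question}"
)
REDUCE_PROMPT = (
    "Given the following extracted parts of a long document and a question, "
    "create a final answer.\n{summaries}\nQuestion: {question}"
)


def map_reduce_qa(docs: Sequence[str], question: str, llm: Callable[[str], str]) -> str:
    # Map step: one LLM call per retrieved document, extracting the relevant passages
    extracts = [llm(MAP_PROMPT.format(context=doc, question=question)) for doc in docs]
    # Reduce step: a single LLM call that merges all extracts into the final answer
    combined = "\n\n".join(extracts)
    return llm(REDUCE_PROMPT.format(summaries=combined, question=question))
```

With `k=1`, as here, map-reduce makes two LLM calls where the `stuff` chain type would make one; its advantage appears when many retrieved documents would not fit into a single prompt.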

## Using the LLM to generate the response while tracking token usage

In [None]:
with get_openai_callback() as cb:
  response = chain.run(input_documents = docs, question=query)
  print(cb)
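`get_openai_callback` accumulates prompt tokens, completion tokens, request count and an estimated cost for every OpenAI call made inside the `with` block, which here covers both the map and the combine steps. The sketch below mirrors that bookkeeping in plain Python; `UsageTracker` is a hypothetical class, not LangChain's callback handler, and the per-1K-token prices are placeholders to be replaced with current pricing for the model in use.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts and an estimated cost across several LLM calls."""
    prompt_price_per_1k: float      # hypothetical USD price per 1K prompt tokens
    completion_price_per_1k: float  # hypothetical USD price per 1K completion tokens
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Called once per LLM response with the usage it reports
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.successful_requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def total_cost(self) -> float:
        # Prompt and completion tokens are usually priced differently
        return (self.prompt_tokens / 1000 * self.prompt_price_per_1k
                + self.completion_tokens / 1000 * self.completion_price_per_1k)
```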



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: Use the following portion of a long document to see if any of the text is relevant to answer the question.
Return any relevant text verbatim.
______________________
to talk about?
Elon Musk
No, we're definitely making the factory in Mexico. We feel very good about that. We put a lot of
effort into looking at different locations, and we feel very good about that location, and we are
going to build a factory there. And it's going to be great.
The question is really just one of timing. And there's going to be a broken record on the interest
Human: What is being said about building a factory in Mexico?

> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
System: Given the following extracted parts of a long document and a question, create a final answer.
If you don't know the a

## Displaying response

In [None]:
print(response)

The speaker states that they are definitely building a factory in Mexico and that they feel very good about the chosen location. They also express confidence in the decision and believe that the factory will be great.
