# Retrieval Augmented Generation
## Using LangChain and a local LLM on your machine with your data

In [1]:
# Import necessary packages

from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import GPT4All
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.callbacks import StdOutCallbackHandler
from langchain.vectorstores.faiss import FAISS

In [2]:
#1. INDEXING

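# Loading: PyPDFLoader.load() returns one Document per PDF page
loader = PyPDFLoader("WIRED_article.pdf")
documents = loader.load()

# Chunking: split the pages into chunks of up to 1024 characters that overlap by 64 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024,
                                               chunk_overlap=64)

texts = text_splitter.split_documents(documents)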

# Embedding
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Storing in vector database
vectorstore = FAISS.from_documents(texts, embeddings)

# Define the retriever for the stored indices
retriever = vectorstore.as_retriever()
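The chunking step above is controlled by `chunk_size` and `chunk_overlap`. As a rough mental model only, the sketch below slides a fixed-width character window across the text, with each window overlapping the previous one. It is a simplification: `RecursiveCharacterTextSplitter` first tries to break on separators such as paragraph breaks and newlines, so its real chunk boundaries will differ. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1024, chunk_overlap: int = 64) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Hypothetical helper. Consecutive windows share chunk_overlap characters,
    so text near a boundary appears in both neighbouring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window starts after the last one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij" * 3  # 30 characters
    for chunk in sliding_window_chunks(sample, chunk_size=12, chunk_overlap=4):
        print(repr(chunk))
```

With `chunk_size=12` and `chunk_overlap=4`, each window starts 8 characters after the previous one, and the last 4 characters of one chunk reappear at the start of the next.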


In [3]:
# Local LLM to use: path to a Llama 2 7B chat model, 8-bit quantized (Q8_0), in GGUF format
llama_7b_path = r'/model_path/llama-2-7b-chat.Q8_0.gguf'

# Create the LLM instance
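llm = GPT4All(model=llama_7b_path,
              max_tokens=1000,   # upper bound on the number of generated tokens
              verbose=True,
              repeat_last_n=0)   # how many recent tokens the repeat penalty looks back over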

In [4]:
#2. RETRIEVING

# Build a question-answering chain that fetches relevant chunks through the retriever
# and returns those source chunks alongside the answer
qa_chain_with_sources = RetrievalQA.from_chain_type(llm=llm,
                                                    retriever=retriever,
                                                    callbacks=[StdOutCallbackHandler()],
                                                    return_source_documents=True)
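When the chain runs, the retriever embeds the question with the same MiniLM model and asks FAISS for the stored chunks whose vectors lie closest to it. The sketch below shows that lookup as a brute-force nearest-neighbour search in plain Python. It assumes Euclidean (L2) distance, which to our understanding is the LangChain FAISS wrapper's default, and a default of `k=4` results; the toy 3-dimensional vectors stand in for real embeddings, and `l2_distance` and `top_k_by_l2` are hypothetical names.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_by_l2(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k stored vectors closest to query_vec, nearest first.

    This scans every vector. FAISS's flat index performs the same exact
    comparison with optimised, vectorised code.
    """
    ranked = sorted(range(len(doc_vecs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy 3-d "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
    chunks = ["beaver dams in Utah", "Earth Engine classifier", "sagebrush plateau", "database joins"]
    vectors = [[0.9, 0.1, 0.0], [0.7, 0.6, 0.1], [0.2, 0.1, 0.9], [0.0, 0.0, 0.1]]
    query = [0.8, 0.4, 0.0]
    for i in top_k_by_l2(query, vectors, k=2):
        print(chunks[i])
```

Only the top-ranked chunks reach the LLM, so the choice of `k` trades context coverage against prompt length.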

In [5]:
#3. AUGMENTING

# Run the chain: it retrieves the relevant chunks, adds them to the prompt as context,
# and sends that augmented prompt to the LLM to generate the answer
response = qa_chain_with_sources({"query": "what is EEAGER and why is it used?"})



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


In [6]:
response['result']

' EEAGER stands for Earth Engine Automated Geospatial Elements Recognition. It is a machine learning algorithm developed by Google that can identify beaver dams in satellite imagery with a high degree of accuracy. EEAGER is used to help scientists and conservationists monitor beaver populations, track their movements, and understand their role in ecosystems. By using EEAGER to analyze satellite imagery, researchers can identify beaver dams and ponds more quickly and accurately than by manually reviewing images, which can save time and resources.'
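The answer above comes from a single prompt that the chain assembled. `RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which concatenates all retrieved chunks into the prompt ahead of the question. A minimal sketch of that augmentation step follows; the template wording, `PROMPT_TEMPLATE` and the `build_stuffed_prompt` helper are illustrative assumptions, not LangChain's exact built-in prompt.

```python
# Hypothetical template, loosely in the spirit of a "stuff" QA prompt.
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuffed_prompt(question: str, chunks: list[str], separator: str = "\n\n") -> str:
    """Join the retrieved chunks into one context block and fill in the template."""
    context = separator.join(chunk.strip() for chunk in chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    # Short excerpts taken from the retrieved source documents shown below.
    retrieved = [
        "The team also has to figure out EEAGER's long-term fate",
        "tries to figure out whether a given chunk of geospatial data includes a particular object",
    ]
    print(build_stuffed_prompt("what is EEAGER and why is it used?", retrieved))
```

Because everything is pasted into one prompt, the retrieved text plus the answer must fit in the model's context window; that is one reason the chunks are kept small.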

In [7]:
response['source_documents']

[Document(page_content='model also relies on context clues from the surrounding landscape, and a sagebrush plateau in Wyoming\nlooks very different from a deciduous forest in Massachusetts.) The team also has to figure out EEAGER’s\nlong-term fate: Will it remain a tool hosted by Google? Spin off into a stand-alone product? Become a\nservice operated by a university or nonprofit?\n“That’s the challenge for the future—how do we make this more universally accessible and usable?” Corwin\nsays. The beaver revolution may not be televised, but it will definitely be documented by satellite.', metadata={'source': 'WIRED_article.pdf', 'page': 6}),
 Document(page_content='tries to figure out whether a given chunk of geospatial data includes a particular object—whether a stretch\nof mountain stream contains a beaver dam, say. Fairfax and some obliging beaverologists from Utah State\nUniversity submitted thousands of coordinates for confirmed dams, ponds, and canals, which the Googlers\nmatched up
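Each entry in `response['source_documents']` carries `page_content` and a `metadata` dict holding the `source` file and `page`, as the output above shows. A small helper can turn that list into citations for the answer. The `Doc` dataclass here is a hypothetical stand-in so the sketch runs without LangChain installed; because `cited_pages` only reads `.metadata`, it should also accept the real `Document` objects, e.g. `cited_pages(response['source_documents'])`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document, with only the attributes used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def cited_pages(docs) -> list[tuple[str, int]]:
    """Unique (source, page) pairs in retrieval order, skipping docs missing either key."""
    seen = []
    for doc in docs:
        key = (doc.metadata.get("source"), doc.metadata.get("page"))
        if None not in key and key not in seen:
            seen.append(key)
    return seen


if __name__ == "__main__":
    docs = [
        Doc("...", {"source": "WIRED_article.pdf", "page": 6}),
        Doc("...", {"source": "WIRED_article.pdf", "page": 6}),
    ]
    for source, page in cited_pages(docs):
        print(f"{source}, page {page}")
```

The page values are printed exactly as the loader stored them in `metadata`, so check how your loader counts pages before showing them to readers.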

## Cross-check without the external source

In [8]:
# Query the same local model directly, without any context retrieved from the vector store
vanilla_model = GPT4All(model=llama_7b_path,
                        max_tokens=1000,
                        verbose=True,
                        repeat_last_n=0)

output_check_1 = vanilla_model.generate(["what is EEAGER and why is it used?"])
output_double_check = vanilla_model.generate(["what is EEAGER and why is it used? The context is a Wired article about beavers"])
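Without retrieved context, the answers below are not grounded in the article, and they contradict what the retrieved chunks say about EEAGER. One crude way to flag this, sketched here as a hypothetical heuristic (not a LangChain feature, and not a reliable hallucination detector), is to measure what fraction of an answer's content words also occur in the retrieved context. The names `STOPWORDS`, `content_words` and `context_overlap` are illustrative assumptions.

```python
import re

# Tiny hand-picked stopword list; a real check would use a proper list or embeddings.
STOPWORDS = {"a", "an", "the", "is", "it", "of", "and", "to", "in", "for",
             "that", "by", "used", "what", "why"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens, minus the stopwords above."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def context_overlap(answer: str, context: str) -> float:
    """Fraction (0.0 to 1.0) of the answer's content words that also appear in the context."""
    words = content_words(answer)
    if not words:
        return 0.0
    return len(words & content_words(context)) / len(words)


if __name__ == "__main__":
    context = "whether a stretch of mountain stream contains a beaver dam in geospatial data"
    print(context_overlap("EEAGER finds beaver dam in geospatial data", context))
    print(context_overlap("EEAGER evaluates join predicates eagerly in database queries", context))
```

A low score is only a prompt to look closer: a faithful paraphrase can score low, and an answer that reuses the context's words can still be wrong.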


In [9]:
print(output_check_1.generations[0][0].text)



EEAGER is a term used in the context of database systems and refers to the "Eager Evaluation of Join Predicates" technique. It is used to improve the performance of database queries by reducing the number of joins required in the query.
In a typical database query, the join predicates are evaluated lazily, meaning that the join operation is performed only when the entire join result is needed. However, this can lead to inefficiencies, especially when dealing with large datasets.
EEAGER addresses this issue by evaluating the join predicates eagerly, meaning that the join operation is performed as soon as possible, even if the entire join result is not needed immediately. This can significantly reduce the number of joins required in the query, resulting in improved performance.
EEAGER is particularly useful in scenarios where the join predicates are complex or involve subqueries, as it can help avoid the overhead of re-evaluating the join predicates multiple times. It is also useful in

In [10]:
print(output_double_check.generations[0][0].text)

 and their impact on the environment.
EEAGER is an acronym for "Environmental Engineered Artificial Grass Ecosystems". It is a term used in the article to describe a new technology that uses artificial grass to create a sustainable ecosystem that can help to mitigate the effects of climate change.
The article explains that beavers are known for their ability to create dams and wetlands that help to regulate water flow and provide habitat for wildlife. However, beavers are currently facing extinction due to habitat loss and other environmental factors. To address this issue, researchers have developed EEAGER, a technology that uses artificial grass to mimic the ecosystem services provided by beavers.
EEAGER works by creating a series of interconnected artificial grass modules that can be deployed in a variety of environments, including wetlands, rivers, and coastal areas. These modules are designed to mimic the natural processes that occur in beaver-created ecosystems, such as water fil