## Retrieval Augmented Generation with LanceDB  

**Objective:**
Use Llama 2.0, LangChain, and LanceDB to create a Retrieval Augmented Generation (RAG) system.

This will allow us to ask questions about our own documents (which were not included in the training data) without fine-tuning the Large Language Model (LLM).

Splitting the text into chunks helps the LLM give accurate answers and reduces the risk of hallucination.


## What is a Retrieval Augmented Generation (RAG) system?

Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information. Implementing RAG in an LLM-based question answering system has two main benefits:
1. It ensures that the model has access to the most current, reliable facts, and that users have access to the model’s sources, ensuring that its claims can be checked for accuracy and ultimately trusted.
2. By grounding an LLM on a set of external, verifiable facts, the model has fewer opportunities to pull information baked into its parameters. This reduces the chances that an LLM will leak sensitive data, or ‘hallucinate’ incorrect or misleading information.


The orchestration of the retriever and generator will be done using LangChain. A specialized function from LangChain allows us to create the retriever-generator in one line of code.




In [None]:
# Installation

!pip install transformers accelerate einops langchain xformers bitsandbytes lancedb sentence_transformers



In [None]:
# imports

from torch import cuda, bfloat16
import torch
import transformers
from transformers import AutoTokenizer
from time import time
import lancedb
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import LanceDB

In [1]:
# Dataset (in txt format)

!wget https://gist.githubusercontent.com/PrashantDixit0/10fd4ab8a7d0d37de361af2a06ecfbe2/raw/indianEconomy.txt

--2024-01-25 01:04:50--  https://gist.githubusercontent.com/PrashantDixit0/10fd4ab8a7d0d37de361af2a06ecfbe2/raw/indianEconomy.txt
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 64:ff9b::b9c7:6c85, 64:ff9b::b9c7:6d85, 64:ff9b::b9c7:6e85, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|64:ff9b::b9c7:6c85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2620 (2.6K) [text/plain]
Saving to: ‘indianEconomy.txt’


2024-01-25 01:04:51 (11.4 MB/s) - ‘indianEconomy.txt’ saved [2620/2620]



In [None]:
# load data

dataloader = TextLoader("indianEconomy.txt", encoding="utf8")
documents = dataloader.load()

## Text Chunking

We have discussed various types of text chunking for LLM applications, along with related tips and tricks, in a separate blog post.

Refer: https://medium.com/p/a420efc96a13/edit

You can try out different text splitting strategies depending on your data, using the tips and tricks discussed in the blog.

For now, we are going to use recursive text splitting with LangChain.

In [None]:
recursive_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=10000, chunk_overlap=200
)
all_splits = recursive_text_splitter.split_documents(documents)
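
To see what `chunk_size` and `chunk_overlap` control, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not LangChain's implementation: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, while the hypothetical `sliding_chunks` helper below cuts purely by character count. Note that the downloaded file is only 2,620 bytes, so with `chunk_size=10000` it should fit in a single chunk; a smaller value is needed to see any splitting.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Illustrative input only (not the dataset used in this notebook)
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last three characters of the previous one, so a sentence cut at a boundary still appears with some surrounding context in the next chunk.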

## Embeddings Generator

Create embeddings with a Sentence Transformers model, loaded through LangChain's `HuggingFaceEmbeddings` wrapper.

In [None]:
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
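
An embedding model maps each piece of text to a fixed-length vector (768 dimensions for `all-mpnet-base-v2`), so that texts with similar meaning end up close together. The sketch below shows one common way of comparing such vectors, cosine similarity. It uses made-up 3-dimensional vectors rather than real model output, and the `cosine_similarity` helper and variable names are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, NOT real embeddings: the first two point in similar directions.
gdp_growth = [0.9, 0.1, 0.0]
economic_expansion = [0.8, 0.2, 0.1]
cricket_scores = [0.0, 0.1, 0.9]

similar = cosine_similarity(gdp_growth, economic_expansion)
dissimilar = cosine_similarity(gdp_growth, cricket_scores)
print(similar, dissimilar)
```

A value near 1 means the two vectors point in almost the same direction; a value near 0 means they are unrelated.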

## LanceDB for vector storage and searching

Initialize LanceDB with the chunks produced by recursive text splitting. The *embeddings* Sentence Transformer object will be used to extract embeddings from all the text splits.

In [None]:
db = lancedb.connect("/tmp/lancedb")
table = db.create_table(
    "rag_table",
    data=[
        {
            "vector": embeddings.embed_query("Indian Economy"),
            "text": "Current and future details of Indian Economy",
            "id": "1",
        }
    ],
    mode="overwrite",
)

# Index the chunks from the text splitter rather than the unsplit documents
vectordb = LanceDB.from_documents(all_splits, embeddings, connection=table)
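
Conceptually, the vector store answers a query by embedding it and returning the rows whose vectors are nearest to the query vector. The following is a brute-force sketch of that lookup over rows shaped like the table above (`vector`, `text`, `id`), using Euclidean (L2) distance and made-up 2-dimensional vectors. It is not how LanceDB is implemented; the `nearest_rows` helper and the sample rows are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_rows(rows: List[Dict], query_vector: Sequence[float], k: int = 2) -> List[Dict]:
    """Return the k rows whose 'vector' field is closest to query_vector."""
    ranked = sorted(rows, key=lambda row: l2_distance(row["vector"], query_vector))
    return ranked[:k]


# Hypothetical rows with toy vectors (real embeddings have hundreds of dimensions)
rows = [
    {"id": "1", "text": "GDP growth outlook", "vector": [0.9, 0.1]},
    {"id": "2", "text": "Inflation and interest rates", "vector": [0.6, 0.5]},
    {"id": "3", "text": "Monsoon rainfall", "vector": [0.1, 0.9]},
]

query_vector = [0.8, 0.2]  # stands in for the embedded user question
top = nearest_rows(rows, query_vector, k=2)
for row in top:
    print(row["id"], row["text"])
```

Scanning every row like this costs time proportional to the table size, which is why vector databases typically add an index for larger collections.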

## Initialize RAG Chain


## Chat Models

You can swap in any other LLM as the chat model here.

LangChain supports several chat models that can be used to generate answers in a RAG pipeline. See:
https://python.langchain.com/docs/integrations/chat/

In [None]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    temperature=0,
    openai_api_key="YOUR_API_KEY",
    openai_organization="YOUR_ORGANIZATION_ID",
)

In [None]:
# Retriever
retriever = vectordb.as_retriever()

qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, verbose=True
)
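
With `chain_type="stuff"`, all retrieved chunks are placed ("stuffed") into a single prompt alongside the question, and the LLM is called once. The sketch below illustrates the idea only. The prompt wording and the `build_stuff_prompt` helper are assumptions, not LangChain's actual template, and the chunks are placeholders rather than real retrieval results.

```python
from typing import List


def build_stuff_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Join every retrieved chunk into one context block and append the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks standing in for whatever the retriever returns
retrieved = ["Placeholder chunk one.", "Placeholder chunk two."]
prompt = build_stuff_prompt("What is growth of Indian Economy?", retrieved)
print(prompt)
```

Because everything goes into one prompt, this approach only works when the retrieved chunks together fit within the model's context window.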

In [None]:
# Results of RAG

query = "What is growth of Indian Economy?"

result = qa({"query": query})

print(result)