# RAG Application using Gradio and LangChain

In this Jupyter notebook, we will build a RAG (Retrieval-Augmented Generation) application using Gradio and LangChain. RAG is a technique for question answering over your own documents rather than a model in itself: given a query, the application retrieves the most relevant passages from a corpus of documents and passes them to a large language model (LLM), which generates an answer grounded in those passages.
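
To make the retrieve-then-generate idea concrete, here is a minimal sketch in plain Python. It is not the pipeline built later in this notebook: the corpus is invented, retrieval is simple word overlap instead of embeddings, and the helper names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical. It only shows how retrieved passages end up in the prompt that an LLM would receive.

```python
import re

# Toy corpus standing in for uploaded documents (invented for illustration).
CORPUS = [
    "Gradio builds web interfaces for machine learning models.",
    "FAISS performs fast similarity search over dense vectors.",
    "LangChain composes prompts, retrievers, and language models into chains.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, k=1):
    """Return the k passages that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


question = "What does FAISS search over?"
top_passages = retrieve(question, CORPUS)
prompt = build_prompt(question, top_passages)
print(prompt)
```

In a real RAG application the word-overlap ranking is replaced by vector similarity over embeddings, and the prompt is sent to an LLM instead of being printed.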

## What is Gradio?

Gradio is a Python library that allows you to quickly build customizable UIs for your machine learning models. With Gradio, you can create user-friendly interfaces for interacting with your models, built from components such as sliders, dropdowns, and file upload fields.

## What is LangChain?

LangChain is a framework for building applications on top of large language models. It provides document loaders, text splitters, vector store integrations, and prompt and chain abstractions, which we use here to connect document retrieval to an LLM.

## Prerequisites

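To run this code, you don't need to install anything on your own machine: the notebook runs in the cloud on Google Colab and installs its Python dependencies itself. You do need an NVIDIA API key, as described in the NVIDIA NGC API Key section.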

**Note:** As this is a public LLM endpoint, please refrain from uploading any confidential data.

## Usage

1. Upload your documents: You can upload your own set of documents in various formats such as PDF, DOCX, or plain text.

2. Play with the code: You can modify the code to customize the behavior of the RAG pipeline, such as changing the retrieval or generation strategies.

3. Get answers: Use the Gradio interface to enter your queries and get answers from the RAG chatbot. You can experiment with different queries and see how the model responds.

## Let's get started!

Now that you have a basic overview of this RAG application, let's dive into the code and start building our RAG chatbot using Gradio and LangChain.

## Install required packages
For our code to work in Colab, we need to install the required packages and their dependencies.

Frontend: Gradio

Backend: LangChain, the LangChain NVIDIA AI Endpoints integration, LangChain Community, Unstructured (document parsing), and FAISS (vector search)

In [1]:
!pip install --quiet gradio langchain langchain-nvidia-ai-endpoints langchain-community "unstructured[all-docs]" faiss-cpu

## NVIDIA NGC API Key
Before running the code, make sure your NVIDIA NGC API key is set in the environment. The key is required to call the hosted LLM and embedding models used in this notebook. You can obtain one by signing up for an account on the NVIDIA NGC website. The next cell prompts you for the key and stores it in the `NVIDIA_API_KEY` environment variable; a valid key starts with `nvapi-`.

**Note**: Execute the cell below and paste in your API key.

In [2]:
import getpass
import os

# del os.environ['NVIDIA_API_KEY']  ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

NVAPI Key (starts with nvapi-): ··········


## Upload your documents
To upload your files, run the next cell by selecting it and clicking the "Run" button or pressing Shift+Enter. A "Choose Files" button then appears below the cell. Click it and select one or more files from your local machine; the upload starts once you confirm the selection. Please note that larger files may take longer to upload.

In [3]:
from google.colab import files
uploaded = files.upload()

Saving Example_DIGA_TEXT.txt to Example_DIGA_TEXT.txt


## Define LLM and Embedding Model

In [4]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langchain.schema import AIMessage, HumanMessage
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import UnstructuredFileLoader


# Chat model that generates the answers
llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
# Embedding model that turns text chunks and queries into vectors
embedder = NVIDIAEmbeddings(model="nvidia/nv-embed-v1", truncate="END")
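
The embedding model maps each piece of text to a vector, and retrieval then looks for the stored vectors closest to the query vector. The following sketch illustrates that comparison with cosine similarity on invented three-dimensional vectors; real embeddings have far more dimensions, and the vector store used in this notebook may rely on a different distance measure. The names `cosine_similarity`, `query_vec`, and `chunk_vecs` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented vectors standing in for embeddings of a query and two chunks.
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "chunk_a": [0.9, 0.1, 0.8],  # points in nearly the same direction as the query
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Score every chunk against the query and pick the closest one.
scores = {name: cosine_similarity(query_vec, vec) for name, vec in chunk_vecs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```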


## Create Document Vector Store from uploaded files

In [5]:
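# Parse every uploaded file into LangChain documents
raw_documents = UnstructuredFileLoader(list(uploaded.keys())).load()
# Split the text into chunks targeting 2000 characters, with 200 characters of overlap
text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
documents = text_splitter.split_documents(raw_documents)
# Embed each chunk and index the vectors in an in-memory FAISS store
vectorstore = FAISS.from_documents(documents, embedder)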

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
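
The `chunk_size` and `chunk_overlap` parameters control how the text is cut before embedding. As a rough illustration, the sketch below uses a plain sliding window over characters. This is a simplification and not how `CharacterTextSplitter` works internally (it splits on a separator and merges the pieces); the function name `chunk_text` is hypothetical. The point is only to show that consecutive chunks share their boundary text, so a sentence cut at a chunk edge still appears whole in at least one chunk.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk here starts with the last three characters of the previous one, mirroring the role of the 200-character overlap configured above.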


## Define System Prompts and Templates

In [11]:
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableAssign



qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
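# Retrieve the chunks most similar to the user's question
retriever = vectorstore.as_retriever()
# Add the retrieved chunks as "context", fill the prompt, call the LLM, and return plain text
rag_chain = (
    RunnableAssign({'context': itemgetter("input") | retriever})
    | qa_prompt
    | llm
    | StrOutputParser()
)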


def predict(message, history):
    """Answer `message` with the RAG chain, passing along the prior chat turns."""
    # Assumes Gradio supplies history as (user, assistant) pairs
    history_langchain_format = []
    for human, ai in history:
        history_langchain_format.append(HumanMessage(content=human))
        history_langchain_format.append(AIMessage(content=ai))

    result = rag_chain.invoke({"input": message, "chat_history": history_langchain_format})
    return result
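
A retriever returns a list of document objects, so the `{context}` slot of the prompt receives whatever string representation that list has. One optional refinement is to join only the text of each chunk before it reaches the prompt. The sketch below shows such a helper on a stand-in class; `FakeDocument` and `format_docs` are hypothetical names, and the only assumption about real LangChain documents is that they expose a `page_content` attribute. Wiring the helper into the chain (for example, after the retriever) is left as an untested suggestion.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the page_content attribute of a document."""
    page_content: str


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [FakeDocument("First chunk."), FakeDocument("Second chunk.")]
context = format_docs(retrieved)
print(context)
```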




In [12]:
import gradio as gr
with gr.Blocks() as demo:
    gr.ChatInterface(predict)

## Run Application

In [None]:
demo.launch(debug=True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://5e2cc13a0cc52d6a89.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
