Notebook created by [Nikolaos Tsopanidis](https://github.com/tSopermon)

# Retrieval-Augmented Generation (RAG) App

Retrieval-Augmented Generation (RAG) is a powerful technique used in applications like Q&A chatbots. It combines data retrieval with language model generation to provide accurate and context-aware answers. A typical RAG system consists of two main components:

1. **Indexing**: This process prepares the data for efficient retrieval:
    * Load: Use Document Loaders to ingest raw data.
    * Split: Break large documents into smaller chunks using text splitters, improving searchability and ensuring compatibility with the model's context window.
    * Store: Save and index the chunks in a VectorStore using an Embeddings model for later retrieval.

2. **Retrieval and Generation**: This process handles user queries at runtime:
    * Retrieve: Fetch relevant chunks from the VectorStore using a Retriever based on the user query.
    * Generate: Use a ChatModel or LLM to generate an answer by combining the query with the retrieved data in a prompt.

    Together, these two stages let the system search large datasets efficiently and ground its answers in the retrieved context.

    <img src="https://cdn.prod.website-files.com/61082de7b224bb1768edad68/6704484409635e6a0dbf4098_65982e165de858b7c41f4fa3_Img%25201.webp" style="height:600px">
    
    [image source](https://medium.com/@yogeshkd/four-ways-to-embed-images-in-your-jupyter-notebook-powered-blog-2d28f6d1b6e6)
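
The two stages above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions: word-count vectors stand in for a real Embeddings model, an in-memory list stands in for the VectorStore, and the "generate" step only builds the prompt that would be sent to an LLM. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

def split_text(text, chunk_size=60):
    """Split: break text into chunks of at most chunk_size characters, cutting between words."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(text):
    """Store: keep each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in split_text(text)]

def retrieve(index, query, k=1):
    """Retrieve: return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Generate (prompt only): combine the query with the retrieved context."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

document = (
    "The band plays Athens on 12 June at the Olympic Stadium. "
    "Tickets for the Berlin show go on sale in March. "
    "Merchandise will be available at every venue."
)
index = build_index(document)
question = "When do Berlin tickets go on sale?"
top_chunks = retrieve(index, question)
print(build_prompt(question, top_chunks))
```

In the real app, each of these stand-ins is replaced by a LangChain component: the text splitter, the Ollama embeddings model, the Chroma vector store, and the chat model.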

## LangChain Implementation
**LangChain** is an open-source framework commonly used in GenAI applications. Besides building apps with it, we can use LangSmith, a companion platform from the LangChain team, to monitor LLM calls and to debug and evaluate our code.

## Leveraging Ollama Capabilities
**Ollama** is a framework that simplifies deploying and running open LLMs such as the Llama family, letting us run these models locally or in cloud environments for practical applications.

## Using Ollama with LangChain for the app development
Using these two frameworks together lets us build applications that draw on models served by Ollama while relying on the LangChain ecosystem for monitoring and debugging.
In this **Jupyter Notebook**, we describe the development of a RAG application that answers questions about retrieved documents related to concert tours, giving concise and accurate responses to user queries.

First, we will install the required packages for the project, shown in the `requirements` section below:
```bash
    streamlit
    langchain-ollama
    langchain-chroma
    transformers
    langchain-core
    langchain-text-splitters
    langchain
    torch
    serpapi
    google-search-results
```

## 1. Importing necessary libraries for text processing and vector storage
- `re`, `os`: Regular expressions for text manipulation and access to environment variables.
- `traceback`: Prints full error traces when something fails.
- `Document`: Class for handling documents in the LangChain framework.
- `RecursiveCharacterTextSplitter`: Class for splitting text into smaller chunks by trying a list of separators in turn.
- `Chroma`: Class for storing and managing vector embeddings.

In [21]:
import re, os
import traceback
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

## 2. Importing the library to create flexible chat prompts for LLMs. 

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

## 3. Importing libraries to create a retrieval chain
Chains combine multiple components into a single workflow. They encode sequences of calls to LLMs, tools, and other chains.
In this case, we will create a retrieval chain that **retrieves relevant documents** from a vector store and then generates a response based on those documents.
The **retrieval chain** will use a retriever to find relevant documents and then use a language model to generate a response based on those documents.
The **history aware retriever** will be used to keep track of the conversation history and use it to improve the retrieval process.

In [3]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain, create_history_aware_retriever

## 4. Importing Streamlit to create the web app UI for handling user input and SerpAPI for web search and scraping events from Google Events engine.

In [4]:
import streamlit as st
from serpapi import GoogleSearch

## 5. Importing LangChain Ollama
This section uses the `langchain_ollama` package to generate embeddings and chat completions with Ollama models. `OllamaEmbeddings` creates the embeddings that are stored in the vector store. The `ChatOllama` class generates chat completions, and the `AIMessage` class represents the messages produced by the chat model.

In [None]:
from langchain_ollama import OllamaEmbeddings, ChatOllama
from langchain_core.messages import AIMessage

## 6. Setting the SerpApi API key to use the engine and retrieve events
The SerpApi client needs your SerpApi API key to access the Google Events engine. You can get your own API key from https://serpapi.com/manage-api-key. Set `SERPAPI_API_KEY` in the cell below to your own key before running this code; the cell then exports it as an environment variable.

In [None]:
SERPAPI_API_KEY = "" # Replace with your own API key
os.environ["SERPAPI_API_KEY"] = SERPAPI_API_KEY
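
Hard-coding a key in a notebook makes it easy to commit by accident. A minimal sketch of an alternative, assuming the key has already been exported in the shell: read it from the environment and fail early with a clear message. The helper name `load_api_key` is hypothetical.

```python
import os

def load_api_key(name="SERPAPI_API_KEY", env=None):
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the app.")
    return key

# Usage with an explicit mapping, so the example runs without a real key
print(load_api_key(env={"SERPAPI_API_KEY": "demo-key"}))
```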

## 7. Configuring models for the embeddings and LLM

### 7.1. Embeddings
The embeddings model is used to convert text into vector representations. This is important for semantic search and retrieval tasks.
The `OllamaEmbeddings` class is used to create embeddings using the Ollama model. The `model` parameter specifies the name of the model to use. In this case, we are using the `nomic-embed-text` model, which is designed for text embeddings.

In [8]:
embeddings = OllamaEmbeddings(model="nomic-embed-text")
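
Semantic search over embeddings comes down to comparing vectors, and cosine similarity is one common way to do that. The sketch below uses made-up three-dimensional vectors purely for illustration; real `nomic-embed-text` vectors have hundreds of dimensions and would come from `embeddings.embed_query(...)` or `embeddings.embed_documents(...)`. The metric Chroma actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example
query_vec = [0.9, 0.1, 0.0]
concert_vec = [0.8, 0.2, 0.1]
recipe_vec = [0.0, 0.2, 0.9]

print(cosine_similarity(query_vec, concert_vec))  # close to 1: similar direction
print(cosine_similarity(query_vec, recipe_vec))   # close to 0: unrelated
```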

### 7.2. LLM
The LLM (large language model) generates text from the input it receives. The `ChatOllama` class wraps a chat model served by Ollama, and the `model` parameter names the model to use. Here we use `llama3.1:8b`, the 8-billion-parameter variant of Llama 3.1. To our knowledge `ChatOllama` has no documented `grounding` parameter, so we do not pass one; instead we set `temperature=0` to make the output more deterministic, and rely on the prompts defined later to keep the model focused on the input context.
The LLM will also be used to generate concise summaries of the uploaded documents and scraped events. Summaries will then be saved in the Chroma vector store to be used for retrieval.

In [9]:
llm = ChatOllama(model="llama3.1:8b", temperature=0)

**Important**: `nomic-embed-text` and `llama3.1:8b` need to be downloaded first by installing [Ollama](https://ollama.com/download) in your system and running the following commands:
```bash
ollama pull nomic-embed-text
ollama pull llama3.1:8b
```

## 8. Functions for Document Processing and Vector Base Creation

### 8.1. Function to check if the text is concert-related
This function checks whether the text contains any concert-related keywords. If the text is not related to concerts, the app will not load the document, ensuring that only relevant information is processed.
This is important for maintaining the focus of the application and ensuring that the user receives accurate and relevant information.
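
The check in the cell below uses plain substring matching (`keyword in text_lower`), which also accepts words that merely contain a keyword: "husband" contains "band" and "shower" contains "show". A possible stricter variant, sketched here with a hypothetical `is_concert_related_strict` helper and a shortened keyword list, matches whole words and phrases only.

```python
import re

KEYWORDS = ["concert", "band", "show", "tour", "live music"]  # shortened list for the example

def is_concert_related_strict(text, keywords):
    """Return True if any keyword appears as a whole word or phrase, ignoring case."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(is_concert_related_strict("My husband took a shower", KEYWORDS))
print(is_concert_related_strict("The band announced a world tour", KEYWORDS))
```

The trade-off is that whole-word matching no longer catches plurals such as "concerts" unless they are added to the list, so which variant fits better depends on the documents you expect.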

In [None]:
def is_concert_related(text, CONCERT_RELATED_KEYWORDS):
    text_lower = text.lower() # Convert to lowercase for case-insensitive matching
    return any(keyword in text_lower for keyword in CONCERT_RELATED_KEYWORDS)

CONCERT_RELATED_KEYWORDS = ["concert", "music", "band", "performance", "stage", 
                            "ticket", "venue", "artist", "festival", "tour", "gig", 
                            "show", "orchestra", "symphony", "recital", "live music", 
                            "audience", "encore", "setlist", "soundcheck"]

### 8.2. Function to create a Chroma vector store
This function takes a text input and an embeddings model as arguments. It processes the text to remove unnecessary whitespace and non-alphanumeric characters, splits the text into smaller chunks, and creates a Chroma vector store using the provided embeddings model.
We use `RecursiveCharacterTextSplitter`, which tries to split on paragraph breaks, then line breaks, then spaces, before falling back to individual characters, so chunks end at natural boundaries where possible. Because the cleaning step collapses all whitespace into single spaces, the splits here effectively fall between words.

The `nomic-embed-text` model converts each chunk into a vector representation, which is then stored in a vector database. With these vector representations, the app can efficiently search and retrieve relevant information based on user queries.
This is particularly useful for applications that require fast and accurate information retrieval, such as chatbots, search engines, and recommendation systems.
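
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-size chunker. It is not the `RecursiveCharacterTextSplitter` algorithm, which prefers to cut at separators; it only illustrates how consecutive chunks share a few characters of context. The function name is hypothetical.

```python
def chunk_with_overlap(text, chunk_size=200, chunk_overlap=20):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each printed chunk starts with the last three characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.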


In [None]:
def get_vector_store(text, embeddings):
    processed_text = re.sub(r'\s+', ' ', text)
    processed_text = re.sub(r'[^a-zA-Z0-9\s]', '', processed_text)
    documents = [Document(page_content=processed_text)] # list of Document objects
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=200,
                                                   chunk_overlap=20,
                                                   length_function=len)
    chunks = text_splitter.split_documents(documents)
    chunk_texts = [chunk.page_content for chunk in chunks]
    vector_store = Chroma.from_texts(texts=chunk_texts,
                                     embedding=embeddings,
                                     persist_directory="chroma_db",     # directory to persist the database
                                     collection_name="my_collection")   # name of the collection
    
    return vector_store
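
It is worth seeing what the two `re.sub` calls in `get_vector_store` do to a concrete string. The sample text is invented; the regular expressions are the ones used above. Note that punctuation inside times and prices is stripped as well, which may or may not be acceptable for your data.

```python
import re

def clean_text(text):
    """Apply the same two cleaning steps as get_vector_store."""
    collapsed = re.sub(r'\s+', ' ', text)            # collapse whitespace, including newlines
    return re.sub(r'[^a-zA-Z0-9\s]', '', collapsed)  # keep only letters, digits and whitespace

sample = "Doors open at 7:30 PM,\n\ntickets from $45.50!"
print(clean_text(sample))  # Doors open at 730 PM tickets from 4550
```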

## 9. Function to generate summary
This function uses an LLM to generate a summary of the concert document based on the provided instructions.
The instructions guide the LLM toward a concise and informative summary that captures the main points and key events of the concert document.
For this purpose, we use the `ChatPromptTemplate` class to create a prompt template that includes the instructions and the text to be summarized.
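
The `summary_prompt | llm` expression used in the function below composes two steps into one chain. The following toy classes mimic that idea in plain Python. They are a hypothetical illustration of the pipe pattern, not LangChain's actual `Runnable` implementation, and `fake_llm` only reports the prompt length instead of calling a model.

```python
class Step:
    """A callable that can be chained with `|`, similar in spirit to LangChain's pipe syntax."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (self | other) runs self first, then feeds its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

TEMPLATE = (
    "Summarize the following concert document based ONLY on these instructions: {instructions}.\n"
    "Document: {text}"
)

fill_prompt = Step(lambda inputs: TEMPLATE.format(**inputs))
fake_llm = Step(lambda prompt: f"[model received {len(prompt)} characters]")  # stand-in for a real LLM

chain = fill_prompt | fake_llm
result = chain.invoke({"instructions": "Be brief", "text": "The tour starts in May."})
print(result)
```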

In [14]:
def generate_summary(text, instructions, llm):
    summary_prompt = ChatPromptTemplate.from_template(
        """
        Summarize the following concert document based ONLY on these instructions: {instructions}.
        Document: {text}
        """
    )
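    try:
        chain = summary_prompt | llm  # pipe the filled prompt into the model
        response = chain.invoke({"instructions": instructions, "text": text})
    except Exception as e:
        # Log and re-raise so the return below never sees an undefined `response`
        print(f"Error generating summary: {e}")
        raise

    return response.content.strip()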

SUMMARY_INSTRUCTIONS = """
    You are a concert summarizer. Your task is to summarize the concert document based on the given instructions.
    You should focus on the main points, key events, and any important details that are relevant to the concert.
    The summary should be concise and informative, providing a clear overview of the concert document.
    You should not include any personal opinions or subjective interpretations.
    Your summary should be in a clear and easy-to-understand format, using simple language and avoiding jargon.
    If the document appears to be in JSON format, you should extract the relevant information and summarize it accordingly
    and not mention the JSON format in the summary.
    """

## 10. Functions to create RAG (Retrieval-Augmented Generation) chain

### 10.1. Function to create a retrieval chain
This function creates a retrieval chain using the provided vector store and language model.
The retrieval chain is responsible for retrieving relevant documents from the vector store based on the user's query.

- `ChatPromptTemplate` is used here to create a prompt template for the language model to generate a precise response based on the user's input and the retrieved documents.

- `create_history_aware_retriever` is used to create a retriever that takes into account the conversation history, allowing the model to generate more contextually relevant responses.

In [15]:
def get_retriever_chain(vector_store, llm):
    retriever = vector_store.as_retriever(search_kwargs={"k": 5})   # retrieve 5 most similar documents
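    # When chat history exists, the LLM's reply to this prompt is used as the search query
    prompt = ChatPromptTemplate.from_messages([MessagesPlaceholder(variable_name="chat_history"),
                                               ("user","{input}"),
                                               ("user","Given the above conversation, generate a search query to look up information relevant to the conversation")])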
    history_retriever_chain = create_history_aware_retriever(llm, retriever, prompt)

    return history_retriever_chain
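
As documented by LangChain, `create_history_aware_retriever` passes the input straight to the retriever when there is no chat history, and otherwise asks the LLM to produce a search query first. The sketch below reproduces only that branching with hypothetical stand-in functions (`fake_rephrase`, `fake_retriever`); no model or vector store is involved.

```python
def fake_rephrase(chat_history, user_input):
    """Stand-in for the LLM call: fold the latest history entry into the query."""
    return f"{chat_history[-1]} {user_input}"

def fake_retriever(query):
    """Stand-in for the vector store retriever: report the query it received."""
    return [f"documents matching: {query}"]

def history_aware_retrieve(user_input, chat_history):
    if not chat_history:
        query = user_input  # no history: use the question as-is
    else:
        query = fake_rephrase(chat_history, user_input)  # history: build a standalone query
    return fake_retriever(query)

print(history_aware_retrieve("Where is it?", []))
print(history_aware_retrieve("Where is it?", ["The Athens concert is on 12 June."]))
```

The second call shows why history matters: "Where is it?" is only searchable once the earlier turn supplies what "it" refers to.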

### 10.2. Function to get the conversational RAG chain
This function creates a conversational RAG chain using the provided history retriever chain and language model.
- `ChatPromptTemplate` is used to create a prompt template for the language model, instructing the system to answer questions based on the context provided. Then a message placeholder is added to include the chat history, and finally the user input is included.
- The `create_stuff_documents_chain` function is used to create a document chain that combines the retrieved documents and the language model.
- The `create_retrieval_chain` function is then used to create a retrieval chain that combines the history retriever chain and the document chain.


This way, the conversational RAG chain can retrieve relevant documents from the vector store and generate answers based on the context of the conversation.

In [16]:
def get_conversational_rag(history_retriever_chain, llm):
    answer_prompt = ChatPromptTemplate.from_messages([("system","Answer the question based on the context below: \n\n{context}"), 
                                                      MessagesPlaceholder(variable_name="chat_history"), 
                                                      ("user","{input}")])
    document_chain = create_stuff_documents_chain(llm, answer_prompt)
    conversational_retrieval_chain = create_retrieval_chain(history_retriever_chain, document_chain)

    return conversational_retrieval_chain
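
"Stuffing" means concatenating all retrieved documents into the `{context}` slot of the prompt. A minimal plain-Python sketch of that step follows. The helper name and the separator are assumptions for illustration, and the real `create_stuff_documents_chain` also handles message formatting and the LLM call.

```python
SYSTEM_TEMPLATE = "Answer the question based on the context below: \n\n{context}"

def stuff_documents(docs, separator="\n\n"):
    """Join the text of every retrieved document into one context string."""
    return separator.join(docs)

retrieved = [  # invented example chunks
    "The tour opens in Athens on 12 June.",
    "Berlin tickets go on sale in March.",
]
system_message = SYSTEM_TEMPLATE.format(context=stuff_documents(retrieved))
print(system_message)
```

Because every retrieved chunk ends up in a single prompt, the number of chunks (`k`) and the chunk size together determine how much of the model's context window is consumed.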

### 10.3. Function to return the response from the chatbot

This function takes the user input and the language model as arguments, and returns the response from the chatbot.
It calls the `get_retriever_chain` and `get_conversational_rag` functions to create a retrieval chain and a conversational RAG chain.
Then, it invokes the chain with the chat history and user input to get the response.

In [17]:
def get_response(user_input, llm):
    formatted_chat_history = [message.content if hasattr(message, 'content') else str(message)
                              for message in st.session_state.chat_history] # list of chat history messages
    
    history_retriever_chain = get_retriever_chain(st.session_state.vector_store, llm)
    conversation_rag_chain = get_conversational_rag(history_retriever_chain, llm)
    response = conversation_rag_chain.invoke({
          "chat_history":formatted_chat_history,
          "input":user_input
      })
    return response["answer"]

## 11. SerpAPI function to extract events for an artist

This function uses the SerpAPI to search for events related to a specific artist.
It takes the artist's name and your SerpAPI key as input and returns a list of events for that artist according to the parameters set in the function.
Image- and map-related fields are removed from the search results, which are then formatted into a string to feed into the LLM. Events are limited to 3 because the LLM does not process larger amounts of text reliably.

In [19]:
def get_events_for_artist(artist_name, api_key):
    params = {
        "api_key": api_key,          # plain string; {api_key} would build a set
        "engine": "google_events",
        "q": artist_name,            # search query: the artist's name
        "hl": "en",
        "gl": "us",
        "htichips": "concerts"
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    events = results.get("events_results", [])
    # drop image- and map-related fields the LLM does not need
    for event in events:
        event.pop("thumbnail", None)
        event.pop("image", None)
        event.pop("event_location_map", None)

    events_str = "\n\n\nEVENT\n".join(
        "\n".join(f"{key}: {value}" for key, value in event.items()) for event in events[:3]
        ) # limit to 3 events and organize string formatting
    return events_str
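
To show what the string handed to the LLM looks like, here is the same formatting logic applied to two invented events. The field names and values are made up for the example and are not real SerpApi output, and `format_events` is a hypothetical, non-mutating variant of the loop above.

```python
def format_events(events, limit=3, drop=("thumbnail", "image", "event_location_map")):
    """Drop bulky fields and render up to `limit` events as key: value lines."""
    cleaned = [{k: v for k, v in event.items() if k not in drop} for event in events[:limit]]
    return "\n\n\nEVENT\n".join(
        "\n".join(f"{key}: {value}" for key, value in event.items()) for event in cleaned
    )

sample_events = [  # hypothetical data
    {"title": "Example Artist Live", "date": "Jun 12", "thumbnail": "https://example.com/a.jpg"},
    {"title": "Example Artist Encore", "date": "Jun 14", "image": "https://example.com/b.jpg"},
]
events_text = format_events(sample_events)
print(events_text)
```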

### Initialization of chat history and vector store.

In [20]:
chat_history=[]
vector_store=[]

## 12. Streamlit App Code
Below is the code for the Streamlit app. It includes the UI elements and the main logic for handling user input and displaying responses.

### 12.1. Sidebar for Uploading Concert Documents and Searching for Concerts based on Artist Name

#### 12.1.1. Uploading a concert document
On the left side of the app, we create a sidebar for uploading concert documents. The sidebar includes a text area for pasting the concert document and a button to upload it. If the user clicks the button, we check if the document is related to concerts using the `is_concert_related` function. If it is, we generate a summary and store it in the vector store.

In [None]:
style = "<style>h2 {text-align: center;}</style>" # centering the header
st.markdown(style, unsafe_allow_html=True)
st.header("LangChain Concert chatbot")
st.write("<p style='text-align: center;'>This app is a concert chatbot that can answer questions about concerts.</p>", unsafe_allow_html=True)

st.sidebar.subheader("Upload Concert Document")
doc_text = st.sidebar.text_area("Paste your concert document here", height=200, key="doc_input")
if st.sidebar.button("Upload Document", key="doc_submit_button"):
    if not doc_text:
        st.warning("Please enter some text.") # if no text is entered, show warning
    elif not is_concert_related(doc_text, CONCERT_RELATED_KEYWORDS): # document relation check
        st.warning("Sorry, I cannot ingest documents with other themes.") # if document is not concert-related, won't ingest and show warning
    else:
        with st.spinner("Processing document..."):
            try:
                summary = generate_summary(doc_text, SUMMARY_INSTRUCTIONS, llm) # generate summary
                if "chat_history" not in st.session_state: # `st.session_state` is used to persist data across reruns
                    st.session_state.chat_history = [] # list to store chat history.
                # Chroma.from_texts adds to the persisted collection, so every upload is indexed,
                # not only the first one of the session
                st.session_state.vector_store = get_vector_store(summary, embeddings)
                st.session_state.last_text = summary # store the last text, which is the summary
                st.success("Your document has been successfully added to the database.")
                st.write(summary)
            except Exception as e: # handle any errors that occur during processing
                st.error(f"Error: {e}")
                print(traceback.format_exc())

#### 12.1.2. Searching for an artist's events
The sidebar also includes a text input for searching for concerts by artist name. If the user clicks the button, we use the `get_events_for_artist` function to fetch concert events from the Google Events engine through SerpApi, summarize them, and display the summary.

In [None]:
st.sidebar.subheader("Search Artist Events")
artist_name = st.sidebar.text_input("Enter artist name (e.g., Lady Gaga)", key="artist_input")
if st.sidebar.button("Search Events", key="event_submit_button"):
    if not artist_name:
        st.warning("Please enter an artist name.")
    else:
        with st.spinner(f"Searching concerts for {artist_name}..."):
            try:
                events = get_events_for_artist(artist_name, SERPAPI_API_KEY)
                summary = generate_summary(events, SUMMARY_INSTRUCTIONS, llm)
                if "chat_history" not in st.session_state:
                    st.session_state.chat_history = []
                # index every search result, not only the first one of the session
                st.session_state.vector_store = get_vector_store(summary, embeddings)
                st.session_state.last_text = summary
                st.success(f"The upcoming events of {artist_name} have been successfully added to the database.")
                st.write(summary)
            except Exception as e:
                st.error(f"Error: {e}")
                print(traceback.format_exc())

### 12.2. Chatbot Interface
The chatbot interface allows users to ask questions about the concert document and get answers based on the ingested data.
It uses the LangChain library to create the conversational retrieval-augmented generation system for accurate and relevant responses.
AIMessage is used to format the AI's responses, and the chat history is maintained in the session state.
The user input is taken from a text input box, and the response is generated using the `get_response` function.

In [None]:
user_input = st.text_input("Ask a question:")
if st.button("Submit", key="question_submit_button"):
    if not user_input:
        st.warning("Please enter a question.")
    else:
        with st.spinner("Generating response..."):
            try:
                if "vector_store" not in st.session_state: # nothing has been ingested yet
                    st.warning("Please upload a document or search for an artist first.")
                else:
                    response = get_response(user_input, llm)
                    st.session_state.chat_history.append(AIMessage(content=response))
                for message in st.session_state.get("chat_history", []):
                    if isinstance(message, AIMessage):
                        with st.chat_message("AI"):
                            st.write(message.content)
            except Exception as e:
                st.error(f"Error: {e}")
                print(traceback.format_exc())

## Conclusion
We built a concert chatbot using LangChain, Ollama, and Streamlit.
It lets users ask about concerts and upcoming events for their favorite artists.
The chatbot answers questions based on the ingested concert documents and on the events retrieved through SerpApi.

### The implemented code was inspired by the Medium articles, LangChain documentation and the GitHub repository shown in the references below.

## **References**
* https://medium.com/@aminajavaid30/building-a-rag-system-the-data-ingestion-pipeline-d04235fd17ea
* https://medium.com/@laddhaakshatrai/how-to-perform-data-ingestion-with-langchain-day-12-100-f11288d7ae99
* https://www.hostinger.com/tutorials/what-is-ollama#Key_features_of_Ollama
* https://medium.com/@danushidk507/rag-with-llama-using-ollama-a-deep-dive-into-retrieval-augmented-generation-c58b9a1cfcd3
* https://medium.com/@jiangan0808/retrieval-augmented-generation-rag-with-open-source-hugging-face-llms-using-langchain-bd618371be9d
* https://python.langchain.com/docs/tutorials/rag/
* https://python.langchain.com/v0.2/docs/tutorials/local_rag/
* https://medium.com/@mrcoffeeai/conversational-chatbot-trained-on-own-data-streamlit-and-langchain-a45ea5a9dc0f
* https://github.com/y-pred/Langchain/blob/main/Langchain%202.0/RAG_Conversational_Chatbot.ipynb