# Introduction

In scientific research, accessing relevant information quickly and accurately can significantly improve the experimentation and reporting process. Researchers often need to retrieve documents, datasets, and past experimental results to make informed decisions and ensure their work is built upon verified data. This project presents a Retrieval-Augmented Generation (RAG) chatbot specifically designed to streamline the retrieval of documents within a research environment, enabling users to seamlessly access, update, and utilize information throughout their scientific workflows.

## Objectives of the RAG Chatbot Project

- **Efficient Document Access:** Enable researchers to access key documents quickly, whether for background information, experimental procedures, or data analysis.
- **Enhanced Research Documentation:** Support the consistent logging of experiments to promote rigor and reproducibility.
- **Accelerated Reporting:** Automate portions of the report generation process, in line with scientific publishing standards.
- **Improved Experiment Quality:** Provide real-time anomaly detection and feedback so that researchers gain insight into their data as it is generated.

### Install required packages

In [None]:
# Necessary dependencies
!pip install -q torch transformers accelerate bitsandbytes langchain sentence-transformers faiss-gpu openpyxl pacmap datasets langchain-community ragatouille streamlit

### Build the chatbot app file

In [None]:
%%writefile app_chatbot.py
# Streamlit App
import streamlit as st
from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM, BitsAndBytesConfig
import torch
from typing import List, Tuple
import pandas as pd
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from ragatouille import RAGPretrainedModel

# Configure pandas
pd.set_option("display.max_colwidth", None)

# Caching the RAG model
@st.cache_resource
def load_rag_model():
    return RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Caching the embedding model
@st.cache_resource
def load_embedding_model():
    return HuggingFaceEmbeddings(
        model_name="thenlper/gte-small",
        multi_process=True,
        model_kwargs={"device": "cuda"},
        encode_kwargs={"normalize_embeddings": True},
    )

# Function to load FAISS database
@st.cache_resource
def load_faiss_database(_embedding_model):
    return FAISS.load_local(
        "/content/drive/MyDrive/3DpresentationF/codes",
        _embedding_model,
        allow_dangerous_deserialization=True
    )

# Load all models at the beginning
embedding_model = load_embedding_model()
KNOWLEDGE_VECTOR_DATABASE = load_faiss_database(embedding_model)
RERANKER = load_rag_model()

# Caching the LLM
@st.cache_resource
def load_llm():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "HuggingFaceH4/zephyr-7b-alpha",
        quantization_config=bnb_config
    )
    tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
    return pipeline(
        model=model,
        tokenizer=tokenizer,
        task="text-generation",
        do_sample=True,
        temperature=0.2,
        repetition_penalty=1.1,
        return_full_text=False,
        max_new_tokens=500
    )

READER_LLM = load_llm()

# Function to answer queries with RAG
def answer_with_rag(
    question: str,
    llm: pipeline,
    knowledge_index,
    reranker=None,
    num_retrieved_docs: int = 30,
    num_docs_final: int = 5,
) -> Tuple[str, List[str]]:
    relevant_docs = knowledge_index.similarity_search(query=question, k=num_retrieved_docs)
    relevant_docs = [doc.page_content for doc in relevant_docs]

    if reranker:
        relevant_docs = reranker.rerank(question, relevant_docs, k=num_docs_final)
        relevant_docs = [doc["content"] for doc in relevant_docs]

    relevant_docs = relevant_docs[:num_docs_final]

    context = "\nExtracted documents:\n" + "".join([f"Document {i}:::\n{doc}" for i, doc in enumerate(relevant_docs)])
    final_prompt = f"Using the information contained in the context,\ngive a detailed answer to the question.\nRespond only to the question asked.\nIf the answer cannot be deduced from the context, do not give an answer.\nContext:\n{context}\n\n---\n\nQuestion: {question}"

    answer = llm(final_prompt)[0]["generated_text"]
    return answer, relevant_docs

# User Profiles Configuration
USER_PROFILES = {
    "student": {
        "additional_context": """
    Assume the student has a basic understanding of the subject but is unfamiliar with more advanced concepts.
    Start by introducing the topic with simple terms and gradually move towards the more complex ideas.
    Use analogies and examples to explain difficult concepts, and break down any technical terms or jargon.
    At the end of your explanation, provide a real-world example or application to solidify the student’s understanding.
    If relevant, include a step-by-step guide on how to approach solving a related problem.
    """,
        "response_style": "clear, detailed, and engaging with simple language",
        "max_tokens": 700
    },
    "researcher": {
        "additional_context": "Provide concise, precise, and technical responses, focusing on the relevant research findings.",
        "response_style": "concise and technical",
        "max_tokens": 300
    },
    "domain_expert": {
        "additional_context": "Give a highly technical explanation assuming the user has deep expertise in the subject matter.",
        "response_style": "highly technical",
        "max_tokens": 1000
    }
}

# Function to get prompt template based on user profile
def get_prompt_template(user_profile: str, context: str, question: str) -> str:
    if user_profile not in USER_PROFILES:
        raise ValueError(f"User profile '{user_profile}' not found. Available profiles: {list(USER_PROFILES.keys())}")

    profile_data = USER_PROFILES[user_profile]

    base_prompt = [
        {
            "role": "system",
            "content": f"""Using the information contained in the context,
give a {profile_data['response_style']} answer to the question.
Respond only to the question asked. Provide the names of the authors, do not give document numbers.
If the answer cannot be deduced from the context, do not give an answer.
{profile_data['additional_context']}"""
        },
        {
            "role": "user",
            "content": f"""Context:
{context}

---

Question: {question}"""
        }
    ]

    return base_prompt

# Streamlit interface
st.title("RAG-based Scientific Chatbot")

# Sidebar section that shows the question history
if 'qa_history' not in st.session_state:
    st.session_state.qa_history = []
# Initialize sidebar visibility state
if 'sidebar_open' not in st.session_state:
    st.session_state.sidebar_open = True  # Sidebar is open by default

# Function to toggle sidebar visibility
def toggle_sidebar():
    st.session_state.sidebar_open = not st.session_state.sidebar_open

with st.sidebar:
    st.header("Question history")
    if st.session_state.qa_history:
        for i, (q, a) in enumerate(st.session_state.qa_history):
            if st.button(f"Question {i+1}: {q[:30]}..."):  # Truncate the displayed question
                st.session_state['selected_question'] = q
                st.session_state['selected_answer'] = a

# Input: user profile selection
user_profile = st.selectbox("Select your profile:", options=list(USER_PROFILES.keys()))

# Input: user question
user_query = st.chat_input("Enter your query:")

# Show the selected question and answer
if 'selected_question' in st.session_state:
    with st.chat_message("user"):
        st.write(st.session_state['selected_question'])
    with st.chat_message("assistant"):
        st.write(st.session_state['selected_answer'])

if user_query:
    with st.chat_message("user"):
        st.write(user_query)

    with st.spinner('Retrieving relevant documents...'):
        relevant_docs = KNOWLEDGE_VECTOR_DATABASE.similarity_search(query=user_query, k=5)

    context = "\nExtracted documents:\n" + "".join(
        [f"**Title:** {doc.metadata['title']}\n**Authors:** {doc.metadata['authors']}\n**Submitter:** {doc.metadata['submitter']}\n**Categories:** {doc.metadata['categories']}\n**Journal Reference:** {doc.metadata['journal reference']}\n\n**Content:**\n{doc.page_content}\n\n" for doc in relevant_docs])

    prompt = get_prompt_template(user_profile, context, user_query)

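    with st.spinner('Generating an answer...'):
        # Render the profile-specific messages with the model's chat template so that
        # the selected profile and the retrieved metadata actually reach the LLM.
        formatted_prompt = READER_LLM.tokenizer.apply_chat_template(
            prompt, tokenize=False, add_generation_prompt=True
        )
        answer = READER_LLM(
            formatted_prompt,
            max_new_tokens=USER_PROFILES[user_profile]["max_tokens"],
        )[0]["generated_text"]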

    with st.chat_message("assistant"):
        st.write(answer)

    # Save to history
    st.session_state.qa_history.append((user_query, answer))

    # Show the retrieved documents
    with st.expander("Retrieved Documents"):
        for i, doc in enumerate(relevant_docs):
            st.markdown(f"#### Document {i + 1}")
            st.markdown(f"**Title:** {doc.metadata['title']}")
            st.markdown(f"**Authors:** {doc.metadata['authors']}")
            st.markdown(f"**Submitter:** {doc.metadata['submitter']}")
            st.markdown(f"**Categories:** {doc.metadata['categories']}")
            st.markdown(f"**Journal Reference:** {doc.metadata['journal reference']}")
            st.markdown(f"**Content:**\n{doc.page_content}")

Writing app_chatbot.py
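
The `answer_with_rag` function in the app follows a two-stage pattern: a wide, cheap similarity search first, then a finer reranking of the shortlist. The sketch below illustrates that pattern in plain Python. The function `retrieve_then_rerank` and the two toy scorers are hypothetical stand-ins introduced for illustration only; in the app, FAISS and ColBERT fill those roles.

```python
from typing import Callable, List


def retrieve_then_rerank(
    question: str,
    corpus: List[str],
    coarse_score: Callable[[str, str], float],
    fine_score: Callable[[str, str], float],
    num_retrieved_docs: int = 30,
    num_docs_final: int = 5,
) -> List[str]:
    """Keep the top candidates by a cheap score, then reorder them by a finer one."""
    # Stage 1: cheap similarity over the whole corpus (FAISS plays this role in the app).
    candidates = sorted(corpus, key=lambda d: coarse_score(question, d), reverse=True)
    candidates = candidates[:num_retrieved_docs]
    # Stage 2: finer scoring of the shortlist only (ColBERT plays this role in the app).
    reranked = sorted(candidates, key=lambda d: fine_score(question, d), reverse=True)
    return reranked[:num_docs_final]


# Toy scorers, purely illustrative: shared-word count, then that count divided by length.
def word_overlap(q: str, d: str) -> float:
    return float(len(set(q.lower().split()) & set(d.lower().split())))


def overlap_density(q: str, d: str) -> float:
    words = d.lower().split()
    return word_overlap(q, d) / max(len(words), 1)


corpus = [
    "a long survey that mentions graphene conductivity among many other unrelated topics",
    "graphene conductivity measured at low temperature",
    "protein folding benchmarks",
]
top = retrieve_then_rerank(
    "graphene conductivity",
    corpus,
    word_overlap,
    overlap_density,
    num_retrieved_docs=2,
    num_docs_final=1,
)
print(top)
```

Both graphene documents tie in the first stage; the second stage prefers the shorter, more focused one, which is the kind of reordering a reranker is meant to provide.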


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
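
Because the app loads its FAISS index from the mounted drive, it can help to confirm that the index files are present before launching Streamlit. The following is a minimal sketch, assuming the index was saved with LangChain's `FAISS.save_local` under its default index name, which writes `index.faiss` and `index.pkl`. The helper `missing_index_files` is a hypothetical name, not part of the original notebook.

```python
from pathlib import Path
from typing import List

# File names written by LangChain's FAISS.save_local with its default index_name ("index").
EXPECTED_INDEX_FILES = ("index.faiss", "index.pkl")


def missing_index_files(index_dir: str) -> List[str]:
    """Return the expected index files that are absent from index_dir."""
    folder = Path(index_dir)
    return [name for name in EXPECTED_INDEX_FILES if not (folder / name).is_file()]


# Path taken from app_chatbot.py; without the mounted drive, both files are reported missing.
missing = missing_index_files("/content/drive/MyDrive/3DpresentationF/codes")
if missing:
    print(f"FAISS index incomplete, missing: {missing}")
```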


### Testing the app using Streamlit and Ngrok

In [None]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.40.0-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Collecting watchdog<6,>=2.1.5 (from streamlit)
  Downloading watchdog-5.0.3-py3-none-manylinux2014_x86_64.whl.metadata (41 kB)
Downloading streamlit-1.40.0-py2.py3-none-any.whl (8.6 MB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Downloading watchdog-5.0.3-py3-none-manylinux2014_x86_64.whl (79 kB)


In [None]:
!pip install pyngrok

Collecting pyngrok
  Downloading pyngrok-7.2.1-py3-none-any.whl.metadata (8.3 kB)
Downloading pyngrok-7.2.1-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.1


Download the ngrok binary

In [None]:
# Remove existing ngrok if any
#!rm -f ngrok

# Then run the download and unzip commands
!wget -q -O ngrok.zip https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip -o ngrok.zip

# Add your ngrok auth token (replace the placeholder; never publish a real token)
!ngrok config add-authtoken YOUR_NGROK_AUTHTOKEN



Archive:  ngrok.zip
  inflating: ngrok                   
Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
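
An alternative to typing the token into a notebook cell is to read it from an environment variable. The sketch below shows one way to do that. The helper `read_ngrok_token` and the variable name `NGROK_AUTHTOKEN` are assumptions made for this example; `ngrok.set_auth_token` is the pyngrok call that would receive the result.

```python
import os
from typing import Mapping


def read_ngrok_token(env: Mapping[str, str] = os.environ, name: str = "NGROK_AUTHTOKEN") -> str:
    """Fetch the ngrok token from the environment and fail loudly if it is unset."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before starting the tunnel.")
    return token


# Illustrative call with a fake mapping. In the notebook, call read_ngrok_token()
# with no arguments and pass the result to pyngrok: ngrok.set_auth_token(token).
demo_token = read_ngrok_token({"NGROK_AUTHTOKEN": "example-token"})
```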


Set up the app environment

In [None]:
from pyngrok import ngrok
ngrok.kill()  # Stop any ngrok process left over from a previous run

In [None]:
import subprocess
from pyngrok import ngrok

# Run Streamlit app
def run_streamlit():
    process = subprocess.Popen(['streamlit', 'run', 'app_chatbot.py'])
    return process

# Establish ngrok tunnel
def start_ngrok():
    public_url = ngrok.connect(8501)
    print(f"Streamlit app is live at: {public_url}")
    return public_url

# Start Streamlit and ngrok
process = run_streamlit()
url = start_ngrok()
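
`subprocess.Popen` returns immediately, so the tunnel may open before Streamlit is listening on port 8501. One way to guard against that is to poll the port first. This is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and the timeout values are arbitrary choices.

```python
import socket
import time


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not ready yet: wait briefly and try again.
            time.sleep(0.5)
    return False
```

In the cell above, `start_ngrok()` could then be called only when `wait_for_port(8501)` returns `True`.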


Explore Ngrok tunnels

In [None]:
# List the tunnels opened through pyngrok in this session
ngrok.get_tunnels()

# Conclusion
The RAG-based document retrieval chatbot developed in this project is a practical tool for scientific research workflows. By combining retrieval-augmented generation with real-time interaction, it addresses the need for quick access to research documents and experiment tracking. AI-driven document retrieval lets researchers focus on the substance of their work and spend less time on administrative tasks and document searching.