# Building a Confluence Q&A App on Neopilot
Advances in AI and NLP, such as OpenAI's GPT models and LangChain, are changing how chatbots are built. In this post, we explore how to use Neopilot to simplify building a Q&A app for an internal knowledge base, from conceptualization to deployment.

GitHub link for [code](Add link)

## Solution Overview

Answering questions over a large body of text is difficult because OpenAI's models accept only a limited context. There are several ways around this problem.

1. Split the text into snippets and query the LLM sequentially, with a prompt built from the current snippet that refines the answer obtained from the previous snippets. This iteratively covers all the text, but it is slow and costly.
2. Use LLMs with longer context windows. The field is advancing constantly; Anthropic, for example, released a Claude model with a 100k-token context window. This is only a partial solution when querying a sizeable knowledge base.
3. Build a prompt from the text snippets nearest to the question, found with the help of embeddings, and query the LLM with that prompt. The idea is to keep an embedding vector for each text snippet in a vector store. When a question is asked, we compute the embedding of the question and retrieve the snippets whose embeddings are nearest to the question vector using a similarity search. This is the approach used in this post.
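
To make the similarity search in the third approach concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made up for illustration; real embeddings come from an embedding model and have many more dimensions, and a vector store would normally do this ranking for us.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(question_vec, snippet_vecs, k=2):
    """Return the indices of the k snippets most similar to the question."""
    ranked = sorted(
        range(len(snippet_vecs)),
        key=lambda i: cosine_similarity(question_vec, snippet_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" (assumed values, not produced by a real model)
snippet_vecs = [
    [1.0, 0.0, 0.0],  # snippet 0
    [0.9, 0.1, 0.0],  # snippet 1
    [0.0, 1.0, 0.0],  # snippet 2
    [0.0, 0.0, 1.0],  # snippet 3
]
question_vec = [1.0, 0.05, 0.0]

print(top_k(question_vec, snippet_vecs, k=2))
```

The snippets at the returned indices are the ones that would be placed in the prompt.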

## Architecture



* Step 1:
[Knowledge Base] → [Text Snippets] → [Snippet embedding]

* Step 2:
[User asks a question] → [Compute embedding] → [Find relevant snippets using similarity search]

* Step 3:
[Prompt engineering using relevant snippets and the user's question] → [Query LLM with prompt] → [Get the answer]
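
The three steps can be sketched end to end with toy components. In this sketch, `embed` is a hypothetical stand-in that counts words instead of calling an embedding model, and the final LLM call is left out, so it only illustrates how data flows through the architecture.

```python
from collections import Counter
import math

def embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1: knowledge base -> text snippets -> snippet embeddings
snippets = [
    "Pages can have child pages to organize content",
    "Labels categorize and identify content",
    "The office is closed on public holidays",
]
index = [(s, embed(s)) for s in snippets]

# Step 2: question -> embedding -> most similar snippets
def retrieve(question, k=2):
    q = embed(question)
    ranked = sorted(index, key=lambda item: similarity(q, item[1]), reverse=True)
    return [s for s, _ in ranked[:k]]

# Step 3: combine the snippets and the question into a prompt for the LLM
def build_prompt(question, k=2):
    context = "\n\n".join(retrieve(question, k))
    return f"{context}\n\nQuestion: {question}\nHelpful Answer:"

print(build_prompt("How do I organize content in pages"))
```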



### Text embeddings:

We can use open-source models like SBERT, Universal Sentence Encoder, or Instructor-XL, or OpenAI APIs like `text-embedding-ada-002`. The [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) from Hugging Face compares different models on various tasks. In this work, we use OpenAI's `text-embedding-ada-002` (ada v2).

### LLMs:

We can use open-source models like FastChat, Falcon, or Flan-T5, or APIs from OpenAI such as the [GPT-3.5 models](https://platform.openai.com/docs/models/gpt-3-5). In this work, we use OpenAI's `gpt-3.5-turbo`.

* **Step 1: Creating an Embedding Store from the knowledge base:**

    In our case, we use Confluence pages as the knowledge base. LangChain provides a variety of document loaders for different knowledge bases, such as `ConfluenceLoader`, `PyPDFLoader`, and `NotionDirectoryLoader`.
    We use LangChain's `ConfluenceLoader` with `CharacterTextSplitter` and `TokenTextSplitter` to split the documents into text snippets. Finally, we create embeddings with OpenAI's `text-embedding-ada-002` and store them in Chroma.

    LangChain provides integrations with many vector stores. We use Chroma (`chromadb`) because it is easy to set up. We can schedule weekly jobs to extract new Confluence pages and update the vector store. Find the code and a description in [Step 1](#step1-creating-an-embedding-store-from-the-knowledge-base).

* **Step 2: Computing questions embeddings and finding relevant snippets**

    We use `RetrievalQA` from LangChain with Chroma to retrieve the top K relevant text snippets, based on their similarity to the question's embedding, in [Step 2](#step-2-computing-questions-embeddings-and-finding-relevant-snippets).


* **Step 3: Prompt engineering and querying LLM**

    We use the default prompt of `RetrievalQA` from LangChain. [Step 3](#step-3-prompt-engineering-and-querying-llm) also shows how to add a custom prompt.


* **Step 4: Streamlit App and Creating a Service with Neopilot**

    The final step is to package everything into a Streamlit application and expose the endpoint, as shown in [Step 4](#step-4-streamlit-service-and-creating-a-service-with-neopilot).
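
The weekly refresh job mentioned under Step 1 needs a way to decide which pages to re-embed. A minimal sketch, assuming each page record carries an `id` and a `last_modified` timestamp (both field names are hypothetical; a real job would read this metadata from Confluence):

```python
from datetime import datetime, timedelta

def pages_to_refresh(pages, last_run):
    """Return the ids of pages modified after the previous indexing run."""
    return [p["id"] for p in pages if p["last_modified"] > last_run]

# Hypothetical page metadata, fixed dates so the example is reproducible
now = datetime(2023, 6, 12)
pages = [
    {"id": "101", "last_modified": now - timedelta(days=2)},
    {"id": "102", "last_modified": now - timedelta(days=30)},
    {"id": "103", "last_modified": now - timedelta(hours=5)},
]
last_run = now - timedelta(days=7)

print(pages_to_refresh(pages, last_run))
```

Only the returned pages would need to be split, embedded, and written to the vector store again.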

### Initialize OpenAI Keys


```python
import os
os.environ["OPENAI_API_KEY"] = "sk-**"
```

```python
# Constants
EMB_OPENAI_ADA = "text-embedding-ada-002"
EMB_SBERT = None  # Chroma takes care

LLM_OPENAI_GPT35 = "gpt-3.5-turbo"
```

```python
## Installations
!pip install langchain==0.0.189
!pip install chromadb==0.3.25
!pip install openai==0.27.6
!pip install pytesseract==0.3.10
!pip install beautifulsoup4==4.12.2
!pip install atlassian-python-api==3.38.0
!pip install tiktoken==0.4.0
!pip install lxml==4.9.2
```

```python
import os
from langchain.document_loaders import ConfluenceLoader
from langchain.text_splitter import CharacterTextSplitter, TokenTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
```

### ConfluenceQA Initialize

**Initialize the embedding model to be used for embedding the text snippets.**

* OpenAI provides several embedding models, such as `ada-v2`, `ada-v1`, `davinci-v1`, `curie-v1`, and `babbage-v1`. The default model is `ada-v2` (`text-embedding-ada-002`), which is the best-performing and most cost-effective. You can learn more about embeddings in the [OpenAI Documentation](https://platform.openai.com/docs/guides/embeddings).

```python
# Uses the default embedding model, text-embedding-ada-002
embedding = OpenAIEmbeddings()
```

**Initialize the LLM to be used for the final call that queries with the prompt**

* Available OpenAI LLM APIs are:
    * GPT-4 models - the most powerful, in limited beta
    * GPT-3.5 models - have a context length of 4096 tokens and are more powerful than GPT-3 models
    * GPT-3 models - have a context length of 2049 tokens and are available for fine-tuning
* We use the ChatGPT model (`gpt-3.5-turbo`), since it is the cheapest among the GPT-3.5 models. It is worth trying different models, since some excel at specific tasks. You can find more about the LLM APIs in the [OpenAI Documentation](https://platform.openai.com/docs/models).

```python
llm = ChatOpenAI(model_name=LLM_OPENAI_GPT35, temperature=0.)
```

### Step1: Creating an Embedding Store from the knowledge base:

#### Extract the documents with ConfluenceLoader 

`ConfluenceLoader` extracts the documents given a `url`, `username`, and `api_key`. [ConfluenceLoader](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/confluence.html?highlight=confluence%20loader) currently supports `username`/`api_key` and OAuth2 login authentication.


```python
config = {"persist_directory": "./chroma_db/",
          "confluence_url": "https://templates.atlassian.net/wiki/",
          "username": None,
          "api_key": None,
          "space_key": "RD"
          }
```

```python
persist_directory = config.get("persist_directory", None)
confluence_url = config.get("confluence_url", None)
username = config.get("username", None)
api_key = config.get("api_key", None)
space_key = config.get("space_key", None)

## 1. Extract the documents
loader = ConfluenceLoader(
    url=confluence_url,
    username=username,
    api_key=api_key
)
documents = loader.load(
    space_key=space_key,
    limit=100
)
```
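
For a private Confluence space, the credentials in the config should not be hardcoded. One option is to read them from environment variables, as in this sketch. The variable names (`CONFLUENCE_USERNAME` and so on) are assumptions made for illustration, not names that LangChain or Confluence define.

```python
import os

def config_from_env(env=os.environ):
    """Build the loader config from environment variables (hypothetical names)."""
    return {
        "persist_directory": env.get("CHROMA_PERSIST_DIR", "./chroma_db/"),
        "confluence_url": env.get("CONFLUENCE_URL", "https://templates.atlassian.net/wiki/"),
        # Missing or empty credentials become None, which suits public spaces
        "username": env.get("CONFLUENCE_USERNAME") or None,
        "api_key": env.get("CONFLUENCE_API_KEY") or None,
        "space_key": env.get("CONFLUENCE_SPACE_KEY", "RD"),
    }

# A plain dict stands in for os.environ to keep the example reproducible
example_config = config_from_env({"CONFLUENCE_USERNAME": "", "CONFLUENCE_SPACE_KEY": "DOCS"})
print(example_config["username"], example_config["space_key"])
```

The resulting dictionary has the same keys as `config` above, so it can be used in its place.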


#### Split documents and create text snippets


```python
# First pass: split on paragraph breaks into small character-based chunks
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
# Second pass: cap every chunk at 1000 tokens; cl100k_base is the encoding for text-embedding-ada-002
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=10, encoding_name="cl100k_base")
texts = text_splitter.split_documents(texts)
```
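
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter over a list of tokens. It is an illustration of the idea only, not LangChain's implementation, and `split_with_overlap` is a name made up for this sketch.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = list(range(10))  # integers stand in for token ids
print(split_with_overlap(tokens, chunk_size=4, chunk_overlap=1))
```

Each chunk starts with the last token of the previous one, so a sentence cut at a chunk boundary still appears with some of its context in the next chunk.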


#### Generate Embeddings and add to chroma store

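```python
if persist_directory and os.path.exists(persist_directory):
    # Reuse the embeddings already stored on disk
    vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
else:
    # Embed the snippets and build a new store
    vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)
    if persist_directory:
        # Write the store to disk so that the next run can load it
        vectordb.persist()
```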
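
The load-or-build pattern above avoids paying for embeddings on every run. The sketch below shows the same control flow with a JSON file standing in for the vector store; `load_or_build` and `build_store` are hypothetical helpers, and the `force_reload` flag is an assumed extra that forces a rebuild.

```python
import json
import os
import tempfile

def load_or_build(path, build, force_reload=False):
    """Return (store, source): reuse the file at path if present, else build and save it."""
    if path and os.path.exists(path) and not force_reload:
        with open(path) as f:
            return json.load(f), "loaded"
    store = build()
    if path:
        with open(path, "w") as f:
            json.dump(store, f)
    return store, "built"

def build_store():
    # Stand-in for the expensive step (loading pages and calling the embeddings API)
    return {"snippet-0": [0.1, 0.2], "snippet-1": [0.3, 0.4]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.json")
    _, first = load_or_build(path, build_store)
    _, second = load_or_build(path, build_store)
    _, third = load_or_build(path, build_store, force_reload=True)
    print(first, second, third)
```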
    

### Step 2: Computing questions embeddings and finding relevant snippets
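#### Retrieval QA Chain
`RetrievalQA` embeds the question, retrieves the top K most similar snippets from the vector store, and passes them to the LLM together with the question.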
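
Because the retrieved snippets are placed in a single prompt, it is worth checking that they fit in the model's context window. The sketch below is a worst-case estimate using values from this post (4 retrieved snippets, a 1000-token cap per snippet, a 4096-token window); the template and question sizes are rough assumptions.

```python
def worst_case_prompt_tokens(k, chunk_size, template_tokens, question_tokens):
    """Upper bound on prompt size if every retrieved snippet hits the token cap."""
    return k * chunk_size + template_tokens + question_tokens

CONTEXT_WINDOW = 4096            # gpt-3.5-turbo context length
k, chunk_size = 4, 1000          # retriever and splitter settings used in this post
template_tokens, question_tokens = 50, 30  # assumed, for illustration

used = worst_case_prompt_tokens(k, chunk_size, template_tokens, question_tokens)
remaining = CONTEXT_WINDOW - used
print(used, remaining)
```

In the worst case very little room is left for the answer. Many snippets may be shorter than the cap in practice, but lowering `k` or `chunk_size` gives a safer margin.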


### Step 3: Prompt engineering and querying LLM
* We use LangChain's default prompt here:
    ```python
    prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

    {context}

    Question: {question}
    Helpful Answer:"""
    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    ```


* To pass a custom prompt with `context` and `question`, inject it into the `qa` chain once the chain has been created:
    ```python
    custom_prompt_template = """You are a Confluence chatbot answering questions. Use the following pieces of context to answer the question at the end. If you don't know the answer, say that you don't know, don't try to make up an answer.

    {context}

    Question: {question}
    Helpful Answer:"""
    CUSTOMPROMPT = PromptTemplate(
        template=custom_prompt_template, input_variables=["context", "question"]
    )
    ## Inject custom prompt 
    qa.combine_documents_chain.llm_chain.prompt = CUSTOMPROMPT
    ```
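
To see what the LLM receives, the sketch below fills the default template by hand. It assumes the snippets are joined with blank lines before being substituted for `{context}`, which is roughly what the "stuff" chain type does; `stuff_prompt` is a hypothetical helper, not a LangChain function.

```python
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""

def stuff_prompt(snippets, question):
    """Join the snippets into one context string and fill the template."""
    context = "\n\n".join(snippets)
    return template.format(context=context, question=question)

prompt = stuff_prompt(
    ["Pages can have child pages.", "Labels help categorize content."],
    "How to organize content in a space?",
)
print(prompt)
```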



```python
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
```

```python
question = "How to organize content in a space?"

answer = qa.run(question)
print(answer)
```

To organize content in a space, you can create pages or blogs for different types of content. Pages can have child pages, which allows you to organize content into categories and subcategories. You can also use labels to categorize and identify content, and create a table of contents for your space using the Content Report Table Macro. Additionally, you can customize the sidebar to make it easier to navigate through your space and add a search box to find content within your space.


### Let's pack everything into a class

```python
class ConfluenceQA:
    def __init__(self, config: dict = None):
        # Use a fresh dict per instance instead of a shared mutable default
        self.config = config or {}
        self.embedding = None
        self.vectordb = None
        self.llm = None
        self.qa = None
        self.retriever = None

    def init_embeddings(self) -> None:
        # OpenAI ada embeddings API
        self.embedding = OpenAIEmbeddings()

    def init_models(self) -> None:
        # OpenAI GPT 3.5 API
        self.llm = ChatOpenAI(model_name=LLM_OPENAI_GPT35, temperature=0.)

    def vector_db_confluence_docs(self, force_reload: bool = False) -> None:
        """
        Creates a vector db for the embeddings and persists it, or loads the vector db from the persist directory
        """
        persist_directory = self.config.get("persist_directory", None)
        confluence_url = self.config.get("confluence_url", None)
        username = self.config.get("username", None)
        api_key = self.config.get("api_key", None)
        space_key = self.config.get("space_key", None)
        if persist_directory and os.path.exists(persist_directory) and not force_reload:
            ## Load from the persist db
            self.vectordb = Chroma(persist_directory=persist_directory, embedding_function=self.embedding)
        else:
            ## 1. Extract the documents
            loader = ConfluenceLoader(
                url=confluence_url,
                username=username,
                api_key=api_key
            )
            documents = loader.load(
                space_key=space_key,
                limit=100)
            ## 2. Split the texts
            text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
            texts = text_splitter.split_documents(documents)
            # cl100k_base is the encoding for text-embedding-ada-002
            text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=10, encoding_name="cl100k_base")
            texts = text_splitter.split_documents(texts)

            ## 3. Create embeddings and add to chroma store
            if self.embedding is None:
                raise ValueError("Call init_embeddings() before creating the vector db")
            self.vectordb = Chroma.from_documents(documents=texts, embedding=self.embedding, persist_directory=persist_directory)
            if persist_directory:
                # Write the store to disk so that the next run can load it
                self.vectordb.persist()

    def retreival_qa_chain(self):
        """
        Creates the retrieval QA chain, using the vector db as retriever and the LLM to complete the prompt
        """
        ##TODO: Use custom prompt
        self.retriever = self.vectordb.as_retriever(search_kwargs={"k": 4})
        self.qa = RetrievalQA.from_chain_type(llm=self.llm, chain_type="stuff", retriever=self.retriever)

    def answer_confluence(self, question: str) -> str:
        """
        Answer the question
        """
        answer = self.qa.run(question)
        return answer
```


```python
config = {"persist_directory": "./chroma_db/",
          "confluence_url": "https://templates.atlassian.net/wiki/",
          "username": None,
          "api_key": None,
          "space_key": "RD"}
confluenceQA = ConfluenceQA(config=config)
confluenceQA.init_embeddings()
confluenceQA.init_models()
```

```python
## Create Vector DB
confluenceQA.vector_db_confluence_docs()
```

```python
### Retrieval QA chain
confluenceQA.retreival_qa_chain()
```

```python
### Run the chain
question = "How to organize content in a space?"
confluenceQA.answer_confluence(question)
```

### Step 4: Streamlit service and Creating a service with Neopilot


We can wrap this in a Streamlit app and create a service to deploy locally or on a cluster.
* Neopilot helps productionize the service quickly.

```python
import streamlit as st
from dotenv import load_dotenv

from confluence_qa import ConfluenceQA

# Load OPENAI_API_KEY and other settings from a local .env file
load_dotenv()
st.set_page_config(
    page_title='Q&A Bot for Confluence Page',
    page_icon='⚡',
    layout='wide',
    initial_sidebar_state='auto',
)

# Initialize the config only once, so that Streamlit reruns do not reset it
if "config" not in st.session_state:
    st.session_state["config"] = {}
confluence_qa = None  # Define confluence_qa initially as None

@st.cache_resource
def load_confluence(config):
    # st.write("loading the confluence page")
    confluence_qa = ConfluenceQA(config=config)
    confluence_qa.init_embeddings()
    confluence_qa.init_models()
    confluence_qa.vector_db_confluence_docs()
    confluence_qa.retreival_qa_chain()
    return confluence_qa

with st.sidebar.form(key ='Form1'):
    st.markdown('## Add your configs')
    confluence_url = st.text_input("paste the confluence URL", "https://templates.atlassian.net/wiki/")
    username = st.text_input(label="confluence username",
                             help="leave blank if confluence page is public",
                             type="password")
    space_key = st.text_input(label="confluence space",
                             help="Space of Confluence",
                             value="RD")
    api_key = st.text_input(label="confluence api key",
                            help="leave blank if confluence page is public",
                            type="password")
    submitted1 = st.form_submit_button(label='Submit')

    if submitted1 and confluence_url and space_key:
        st.session_state["config"] = {
            "persist_directory": None,
            "confluence_url": confluence_url,
            "username": username if username != "" else None,
            "api_key": api_key if api_key != "" else None,
            "space_key": space_key,
        }
        with st.spinner(text="Ingesting Confluence..."):
            confluence_qa = load_confluence(st.session_state["config"])
            st.session_state["confluence_qa"] = confluence_qa
        st.write("Confluence Space Ingested")
        

st.title("Confluence Q&A Demo")

question = st.text_input('Ask a question', "How do I make a space public?")

if st.button('Get Answer', key='button2'):
    with st.spinner(text="Asking LLM..."):
        confluence_qa = st.session_state.get("confluence_qa")
        if confluence_qa is not None:
            result = confluence_qa.answer_confluence(question)
            st.write(result)
        else:
            st.write("Please load Confluence page first.")
```