### In this post, we will explore the power of Langchain agents and their application in question answering over a text document store. Specifically, we will dive into creating a Jupyter Notebook that demonstrates the integration of Langchain agents with Confluence pages, enabling seamless information retrieval. We'll also discuss the challenges encountered during the planning phase and how to overcome them.

### The [langchain framework](https://python.langchain.com/docs/get_started/introduction.html) makes it easy to use Large Language Models (LLMs) as [agents](https://python.langchain.com/docs/modules/agents/) capable of making decisions. Furthermore, these agents can be equipped with a variety of [tools](https://python.langchain.com/docs/modules/agents/tools/) which implement different functionalities. An LLM agent can be given access to any combination of such tools. The decision to use a particular tool for a particular task is made by the LLM itself, based on its understanding of the task and of each tool's description.




### We start by installing and importing the relevant libraries and setting up the OpenAI and SerpAPI keys.

In [None]:
!pip install langchain atlassian-python-api

In [1]:
import os
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.tools import BaseTool
from langchain.document_loaders import ConfluenceLoader
from langchain.chains.question_answering import load_qa_chain
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.agents import load_tools
from langchain import SerpAPIWrapper  # OpenAI is already imported above


# Never commit real keys: paste your own here, or export them before starting Jupyter.
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
os.environ["SERPAPI_API_KEY"] = "<your-serpapi-api-key>"
llm = OpenAI(model_name="text-davinci-003", temperature = 0.5)
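
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. One alternative is to export the keys in the shell that launches Jupyter and only check for them here. The sketch below assumes that setup; `require_env` and `missing_keys` are hypothetical helpers, not part of Langchain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


def missing_keys(names):
    """List the variables from `names` that are unset or empty."""
    return [n for n in names if not os.environ.get(n, "").strip()]


# Report what is missing up front instead of failing halfway through the notebook.
print("Missing keys:", missing_keys(["OPENAI_API_KEY", "SERPAPI_API_KEY"]))
```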

### Connecting to Confluence
Now that you have the OpenAI LLM ready, let's move on to connecting it to documents and fetching data. To achieve this, we'll use the Confluence document loader available in Langchain. Confluence is a widely used collaboration and documentation tool, designed to help teams share information and create and organize content together. This tutorial uses Confluence as a wiki/knowledge base because it is particularly popular with enterprise teams for knowledge sharing and communication.

Using Langchain's ConfluenceLoader class, we can load the documents from Confluence. There are some dummy documents created at https://confluence-llm.atlassian.net/wiki . You need to create an account in Confluence in order to access this page.


Accessing Confluence Data Loaders in Langchain: Langchain provides built-in data loaders to fetch data from various sources, including Confluence. Initialize a Confluence data loader instance by providing the URL of your Confluence site, your Confluence username, and an API key. The space or pages you want to extract data from are specified when calling `load`.

All info related to loading data from confluence can be found at: https://python.langchain.com/docs/integrations/document_loaders/confluence

In [6]:
# Connecting with Confluence API
# api_key can be created and managed at https://id.atlassian.com/manage-profile/security/api-tokens

loader = ConfluenceLoader(
    url="https://confluence-llm.atlassian.net/wiki",
    username="<your-atlassian-account-email>",
    api_key="<your-atlassian-api-token>"  # keep real tokens out of shared notebooks
)

# space_key identifies the Confluence space you wish to load documents from.
# It can be found in the page URL: https://yoursite.atlassian.net/wiki/spaces/<space_key>/pages/<page_id>
documents = loader.load(space_key="~5e91dc845f62040b813a19da", limit=50)
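
If you only have a page URL at hand, the space key can be read out of it. Below is a minimal sketch using only the standard library; `extract_space_key` is a hypothetical helper, and the example URL (site name, `ENG`, page id) is made up for illustration.

```python
from urllib.parse import unquote, urlparse


def extract_space_key(page_url: str) -> str:
    """Pull the space key out of a URL shaped like .../wiki/spaces/<space_key>/..."""
    parts = [p for p in urlparse(page_url).path.split("/") if p]
    if "spaces" not in parts:
        raise ValueError(f"No /spaces/ segment in {page_url!r}")
    idx = parts.index("spaces")
    if idx + 1 >= len(parts):
        raise ValueError(f"No space key after /spaces/ in {page_url!r}")
    # Personal spaces start with "~", which may appear percent-encoded as %7E.
    return unquote(parts[idx + 1])


example = "https://yoursite.atlassian.net/wiki/spaces/ENG/pages/12345/Some+Title"
print(extract_space_key(example))  # ENG
```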

In [None]:
# Simple Question answering over Confluence data
query = "What is the REACT model?"
qa_chain = load_qa_chain(llm, chain_type="stuff")
qa_chain.run(input_documents=documents, question=query)

In [None]:
query = "Which paper works in medical domain?"
qa_chain.run(input_documents=documents, question=query)
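
The "stuff" chain type used above places every loaded document into a single prompt, so it only works while the combined text fits in the model's context window. The sketch below illustrates the idea with plain strings standing in for documents. The prompt wording and the `max_chars` budget are assumptions for illustration, not Langchain's actual template or limit.

```python
def build_stuffed_prompt(doc_texts, question, max_chars=12000):
    """Concatenate all documents into one prompt, as a 'stuff' strategy does."""
    context = "\n\n".join(doc_texts)
    if len(context) > max_chars:
        raise ValueError(
            f"Context is {len(context)} characters, over the {max_chars} budget; "
            "pass fewer documents or retrieve only the relevant chunks."
        )
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


docs = ["Page one text.", "Page two text."]
print(build_stuffed_prompt(docs, "What is on page two?"))
```

This size limit is one reason the later cells switch to a retriever, which passes only the most relevant chunks to the model.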

### Below I create a custom search tool which is a combination of two tools: one searches the Confluence data store to answer the user query, and the other is the search engine results page (SERP) API tool, which I want to use whenever a question cannot be answered with Confluence data.

### The Langchain Tool class takes three parameters: the name of the tool, the function that implements it, and a description of the tool, which the agent uses to decide when and how to call it.
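
The description matters because it is the text the LLM sees when choosing between tools. The sketch below uses a stand-in dataclass (not Langchain's `Tool`) to show roughly how tool names and descriptions end up as `name: description` lines in an agent prompt. The exact formatting inside Langchain may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Stand-in for a tool: a name, a callable, and a description for the LLM."""
    name: str
    func: Callable[[str], str]
    description: str


def render_tools(tools: List[SimpleTool]) -> str:
    """Format tools as 'name: description' lines, one per tool."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def render_tool_names(tools: List[SimpleTool]) -> str:
    """Comma-separated tool names, as listed in the 'Action:' instruction."""
    return ", ".join(t.name for t in tools)


tools = [
    SimpleTool("Wiki", lambda q: "wiki answer", "Look up internal pages"),
    SimpleTool("Search", lambda q: "web answer", "Look up the public web"),
]
print(render_tools(tools))
print("Action must be one of:", render_tool_names(tools))
```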

In [None]:
# creating a custom search tool with basic descriptions

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# defining search tool using confluence document store
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
confluence_ml_papers = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

# defining search tool using SerpAPIWrapper
search = SerpAPIWrapper()

search_tool_default = [
    Tool(
        name = "Confluence ML Papers wiki",
        func = confluence_ml_papers.run,
        description = "Use it to lookup information from Confluence pages"
    ),
    Tool(
        name = "Search",
        func=search.run,
        description="Use this to lookup information from google search engine",
    )]
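
Before embedding, the documents are cut into chunks of roughly 1000 characters with no overlap. The following is a simplified sketch of that kind of splitting, assuming a blank-line separator. It is an illustration of the idea, not `CharacterTextSplitter`'s exact algorithm, and `split_text` is a hypothetical helper.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut mid-sentence.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            # Adding this piece would overflow: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 600 + "\n\n" + "b" * 600 + "\n\n" + "c" * 100
print([len(c) for c in split_text(sample, chunk_size=1000)])  # [600, 702]
```

Smaller chunks make retrieval more precise but give the model less context per hit, so `chunk_size` is worth tuning for your own pages.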



### Now we create an agent which will use the two tools defined above. We create a zero-shot ReAct agent, as its verbose output also shows the reasoning behind each tool selection.

In [None]:
# create an agent which will use the custom tool

# serp_tools = load_tools(["serpapi"], confluence_tool )

# Finally, let's initialize an agent with the tools, the language model, and the type of agent we want to use.
agent = initialize_agent(search_tool_default , llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
# agent.agent.llm_chain.verbose=True
# Now let's test it out!
agent.run("What is the REACT paper in AI about? Who are its authors?")

### In the above result, we see that the agent takes the Google search action, i.e. it uses the SerpAPI tool, every time. We would like it to look for information in the Confluence data store first, before going to Google.
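
The verbose trace consists of Thought / Action / Action Input lines produced by the LLM, which the agent then turns into a tool call. The sketch below shows how such text can be parsed. The regular expression is a simplified assumption, not Langchain's own output parser, and the sample text is made up.

```python
import re

# Matches an "Action: <tool>" line followed by an "Action Input: <input>" line.
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)


def parse_step(llm_output: str):
    """Return ('final', answer) or ('action', tool_name, tool_input) for one LLM turn."""
    if "Final Answer:" in llm_output:
        return ("final", llm_output.split("Final Answer:")[-1].strip())
    match = ACTION_RE.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {llm_output!r}")
    return ("action", match.group(1).strip(), match.group(2).strip().strip('"'))


sample = (
    "Thought: I should check the wiki first.\n"
    "Action: Confluence ML Papers wiki\n"
    "Action Input: ReAct paper authors"
)
print(parse_step(sample))
```

Because the tool is selected by matching the text after `Action:` against the tool names, the names used in prompts and in the `Tool` definitions need to agree exactly.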

In [None]:
# creating a custom search tool with information about order of usage in the descriptions

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# defining search tool using confluence document store
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
confluence_ml_papers = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

# defining search tool using SerpAPIWrapper
search = SerpAPIWrapper()

search_tool_specify_order_in_dscrptn = [
    Tool(
        name = "Confluence ML Papers wiki",
        func = confluence_ml_papers.run,
        description = "Use it to lookup information from Confluence pages. Always used as first tool"
    ),
    Tool(
        name = "Search",
        func=search.run,
        description="Use this to lookup information from google search engine. "
                    "Use it only after you have tried using the Confluence ML Papers wiki tool.",
    )]



In [None]:
# create an agent which will use the custom tool

# serp_tools = load_tools(["serpapi"], confluence_tool )

# Finally, let's initialize an agent with the tools, the language model, and the type of agent we want to use.
agent = initialize_agent(search_tool_specify_order_in_dscrptn , llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
# agent.agent.llm_chain.verbose=True
# Now let's test it out!
agent.run("What is the REACT paper in AI about? Who are its authors?")

In [None]:
agent.run("Which paper works in the medical domain? Which university are its authors affiliated to?")

In [None]:
agent.run("What is this new Camel paper in LLMs about. Give detailed summary of the paper")

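### To control the order of tool use more directly, we change the final prompt which is sent to the model. The current prompt can be inspected using 'agent.agent.llm_chain.prompt.template', and a new template can be assigned to the same attribute.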

In [None]:
agent.agent.llm_chain.prompt.template = '''Answer the following questions as best you can. \n
You have access to the following tools:\n\n
Confluence ML Papers wiki: Always used as first tool\n
Search: Always used after Confluence ML Papers wiki tool. Useful only when you do not find the answer in Confluence ML Papers wiki.\n\n
Action input should include as much context as possible.\n\n
Use terms from the observation of first tool as input to the search tool.

Question: the input question you must answer\nThought: you should always think about what to do\n
Action: the action to take, should be one of [Confluence ML Papers wiki, Search]. Always look first in Confluence ML Papers wiki\n
Action Input: the input to the action\n
Observation: the result of the action\n
... (this Thought/Action/Action Input/Observation can repeat N times)\n
Thought: I now know the final answer\n
Final Answer: the final answer to the original input question\n\n
Begin!\n\n
Question: {input}\n
Thought:{agent_scratchpad}'''
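
When a prompt template is edited by hand, it is easy to drop a placeholder or misspell a tool name. The sketch below is a small sanity check you could run before assigning a template. `check_template` is a hypothetical helper, and it assumes each tool is introduced on a line of the form `name: description`.

```python
import string


def check_template(template, tool_names, required=("input", "agent_scratchpad")):
    """Return a list of problems found in a hand-written agent prompt template."""
    problems = []
    # Collect the {placeholder} names that str.format would try to fill.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    for var in required:
        if var not in fields:
            problems.append(f"missing placeholder {{{var}}}")
    for tool in tool_names:
        if f"{tool}:" not in template:
            problems.append(f"no 'name: description' line for tool: {tool}")
    return problems


template = (
    "Tools:\nWiki: look here first\nSearch: fallback\n\n"
    "Question: {input}\nThought:{agent_scratchpad}"
)
print(check_template(template, ["Wiki", "Search"]))  # []
```

In the notebook, the same check could be applied with `tool_names=[t.name for t in search_tool_specify_order_in_dscrptn]`.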

In [None]:
agent.run("What is the REACT paper in AI about? Who are its authors?")

In [None]:
agent.run("Which paper works in the medical domain? Which university are its authors affiliated to?")

In [None]:
agent.run("What is this new Camel paper in LLMs about. Give detailed summary of the paper")
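
Prompt instructions make the desired order likely, but they do not guarantee it. If the order must be strict, one option is to enforce it in ordinary code instead of leaving it to the agent. The sketch below is an alternative to the approach in this notebook, not what the notebook does. `answer_with_fallback` and the "I don't know" heuristic are assumptions, and the two lambdas are stand-ins for the real tools.

```python
def looks_unhelpful(text: str) -> bool:
    """Heuristic (an assumption): treat empty or 'I don't know' style answers as misses."""
    lowered = text.strip().lower()
    return not lowered or "i don't know" in lowered or "i do not know" in lowered


def answer_with_fallback(question, primary, fallback, is_unhelpful=looks_unhelpful):
    """Ask `primary` first and call `fallback` only if the answer looks unhelpful."""
    answer = primary(question)
    if is_unhelpful(answer):
        return fallback(question), "fallback"
    return answer, "primary"


# Stand-ins for the two tools; in the notebook these could be
# confluence_ml_papers.run and search.run.
fake_wiki = lambda q: "I don't know."
fake_search = lambda q: "Answer from the web."
print(answer_with_fallback("What is CAMEL?", fake_wiki, fake_search))
```

The trade-off is that the agent can no longer combine both sources within one chain of reasoning, for example using terms from the wiki answer to refine the web search.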