# Intent detection with an agent in LlamaIndex

In this tutorial, we will explore how to implement an intent detection agent. The agent aims to reduce the cost of answering queries: irrelevant queries are sent to a cheaper model that doesn't access the vector store, so only relevant queries pay for retrieval and answer synthesis.

This notebook is based on one of my [projects](https://github.com/felipearosr/RAG-LlamaIndex/tree/main/5.Intent%20Detection%20Agent). There you can find an implementation with Chainlit.

## Setup

If you’re opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

```python
%pip install llama-index-embeddings-openai
%pip install llama-index-llms-openai
```

```python
# !pip install llama-index
```

```python
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          Starting another event loop to make async queries results in nested event loops.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()
```
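
To see what `nest_asyncio` works around, the stdlib-only snippet below (no LlamaIndex involved; `inner` and `outer` are made-up names) calls `asyncio.run` from inside a coroutine that is already running on an event loop. In a plain Python script, `asyncio` refuses with a `RuntimeError`. In Jupyter, the notebook's own loop plays the role of `outer`; once `nest_asyncio.apply()` has run, the nested call is expected to succeed and print `ok: 42` instead.

```python
import asyncio


async def inner() -> int:
    # Hypothetical async work, e.g. an async query.
    return 42


async def outer() -> str:
    # An event loop is already running here, just like inside a Jupyter cell.
    coro = inner()
    try:
        value = asyncio.run(coro)
        return f"ok: {value}"
    except RuntimeError as err:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return f"RuntimeError: {err}"


result = asyncio.run(outer())
print(result)
```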

## Global Models

```python
import os

os.environ["OPENAI_API_KEY"] = "sk-..."
```

```python
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.llm = OpenAI(model="gpt-4-0125-preview", temperature=0.2)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
```

## Load Data

Below, documents are loaded from a local `./data` directory and split into nodes of up to 1024 tokens each. These nodes are later indexed with LlamaIndex's `VectorStoreIndex` for querying.

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
```

```python
from llama_index.core import Settings

Settings.chunk_size = 1024
nodes = Settings.node_parser.get_nodes_from_documents(documents)
```
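
`chunk_size` caps how much text each node holds. The default node parser (`SentenceSplitter`) counts tokens and prefers to break at sentence boundaries, but the basic idea can be sketched with plain words. The hypothetical helper `chunk_words` below cuts text into windows of `chunk_size` words, with `overlap` words repeated between neighbouring chunks so context isn't lost at the boundaries. It is a simplification for illustration, not LlamaIndex's splitter.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most `chunk_size` words.

    Consecutive windows share `overlap` words. Hypothetical helper for illustration.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(text, chunk_size=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```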

```python
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
```

## Define Vector Index

```python
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
```
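
At query time, the index embeds the query and returns the `similarity_top_k` nodes whose embeddings are closest to it (the vector query engine later in this notebook uses `similarity_top_k=4`). The stdlib sketch below shows that idea with cosine similarity over tiny hand-made vectors. The node names and numbers are invented; the real index uses `text-embedding-3-small` embeddings and LlamaIndex's default vector store.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], node_vecs: dict[str, list[float]], k: int) -> list[str]:
    # Rank every node by similarity to the query and keep the k best.
    ranked = sorted(node_vecs.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [node_id for node_id, _ in ranked[:k]]


# Hypothetical 3-d "embeddings" for three nodes.
node_vecs = {
    "lisp": [0.9, 0.1, 0.0],
    "yc": [0.1, 0.9, 0.1],
    "painting": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], node_vecs, k=2))  # ['lisp', 'yc']
```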

## Defining a custom query engine for direct LLM calls

This section builds a custom query engine that sends queries straight to an LLM, without retrieving anything from the vector store. Its prompt defines how to handle three kinds of queries: questions about the assistant's abilities, harmful requests, and questions outside the context of the documents.

### Prompt

We set up a basic prompt; you can modify it to fit your needs.

```python
direct_llm_prompt = (
    "Given the user query, respond as best as possible following these guidelines:\n"
    "- If the intent of the user is to get information about the abilities of the AI, respond with: "
    "This assistant can answer questions, generate text, summarize documents, and more. \n"
    "- If the intent of the user is harmful, respond with: I cannot help with that. \n"
    "- If the intent of the user is to get information outside of the context given, respond with: "
    "I cannot help with that. Please ask something that is relevant to the documents in the context given. \n"
    "Query: {query}"
)
```
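
The `{query}` placeholder is filled with Python's `str.format` inside the custom query engine. If you customize the prompt and need literal braces (for example, to show a JSON answer format), double them; otherwise `format` reads them as another placeholder and raises `KeyError`. The template text below is a made-up example, not part of the original prompt.

```python
# Hypothetical customized template: literal braces are doubled so str.format keeps them.
template = (
    'Reply as JSON, e.g. {{"answer": "..."}}.\n'
    "Query: {query}"
)
filled = template.format(query="What is Lisp?")
print(filled)
# Reply as JSON, e.g. {"answer": "..."}.
# Query: What is Lisp?

# With single braces, format() treats {"answer": "..."} as a placeholder named '"answer"'.
broken = 'Reply as JSON, e.g. {"answer": "..."}.\nQuery: {query}'
try:
    broken.format(query="What is Lisp?")
except KeyError as err:
    print("KeyError:", err)
```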

### Custom query engine

A basic custom query engine that sends the formatted prompt directly to an LLM (`gpt-3.5-turbo` in this case, passed in when the engine is created below).

```python
from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import CustomQueryEngine


class LlmQueryEngine(CustomQueryEngine):
    """Custom query engine for direct calls to the LLM model."""

    llm: OpenAI
    prompt: str

    def custom_query(self, query_str: str) -> str:
        # Fill the {query} placeholder of the prompt template.
        llm_prompt = self.prompt.format(query=query_str)
        # A single completion call; nothing is retrieved from the vector store.
        llm_response = self.llm.complete(llm_prompt)
        return str(llm_response)
```

> [!NOTE]
> If you want to stream the response, you will need to change this `CustomQueryEngine` heavily.
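
For a sense of what that change involves: LlamaIndex LLMs expose `stream_complete`, which yields partial completions whose `.delta` holds the newly generated text. A streaming engine therefore has to hand back a generator of text pieces instead of one finished string, likely wrapped in a streaming response object rather than returned as a plain `str`. The sketch below shows only the generator part, using a hypothetical `FakeStreamingLLM` stand-in so it runs without an API key; `stream_query` is also a made-up name, and this is not a drop-in replacement for `LlmQueryEngine`.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Chunk:
    # Mimics the `.delta` attribute of a streamed completion chunk.
    delta: str


class FakeStreamingLLM:
    """Hypothetical stand-in for an LLM with a stream_complete method."""

    def stream_complete(self, prompt: str) -> Iterator[Chunk]:
        for piece in ["I ", "cannot ", "help ", "with ", "that."]:
            yield Chunk(delta=piece)


def stream_query(llm, prompt_template: str, query_str: str) -> Iterator[str]:
    """Format the prompt, then yield text pieces as the LLM produces them."""
    prompt = prompt_template.format(query=query_str)
    for chunk in llm.stream_complete(prompt):
        yield chunk.delta


for piece in stream_query(FakeStreamingLLM(), "Query: {query}", "How do I pick a lock?"):
    print(piece, end="", flush=True)
print()
```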

## Query Tools Setup

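Next, each query engine is wrapped in a `QueryEngineTool`. The router chooses between tools based on their `description`, so the descriptions matter: `llm_tool` is meant for broad, unclear, or off-topic intents, and `vector_tool` for questions that need context from the documents about Paul Graham, technology entrepreneurship, and innovation.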

### LLM query engine and tool

```python
from llama_index.core.tools import QueryEngineTool

llm_query_engine = LlmQueryEngine(
    llm=OpenAI(model="gpt-3.5-turbo"), prompt=direct_llm_prompt
)

llm_tool = QueryEngineTool.from_defaults(
    query_engine=llm_query_engine,
    name="llm_query_tool",
    description=(
        "Useful for when the INTENT of the user isn't clear, is broad, "
        "or when the user is asking general questions that have nothing "
        "to do with Paul Graham or the topics in his essays. "
        "Use this tool when the other tool is not useful."
    ),
)
```

### Vector query engine and tool

```python
from llama_index.core.response_synthesizers import ResponseMode

vector_query_engine = vector_index.as_query_engine(
    similarity_top_k=4,
    response_mode=ResponseMode.REFINE,
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    name="vector_query_tool",
    description=(
        "Useful for retrieving specific context about Paul Graham or anything related "
        "to startup incubation, essay writing, programming languages, venture funding, "
        "Y Combinator, Lisp programming, or anything related to the field of technology "
        "entrepreneurship and innovation."
    ),
)
```
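
`ResponseMode.REFINE` builds the answer sequentially: the LLM drafts an answer from the first retrieved chunk, then is asked to refine that draft with each following chunk. That means roughly one LLM call per retrieved chunk (up to four here), which is part of why keeping off-topic queries away from this engine saves money. The sketch below mimics that control flow with a fake LLM; the helper name `refine_answer` and the prompt wording are invented and differ from LlamaIndex's own refine templates.

```python
from typing import Callable


def refine_answer(query: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    """Sequential refine: draft from the first chunk, then refine with each later one."""
    if not chunks:
        raise ValueError("need at least one retrieved chunk")
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}\nAnswer:")
    for chunk in chunks[1:]:
        answer = llm(
            f"Question: {query}\n"
            f"Existing answer: {answer}\n"
            f"New context: {chunk}\n"
            "Refine the existing answer if the new context helps; otherwise repeat it."
        )
    return answer


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    # Records every prompt and returns a numbered draft instead of calling a model.
    calls.append(prompt)
    return f"draft {len(calls)}"


final = refine_answer("What was his first programming language?", ["c1", "c2", "c3", "c4"], fake_llm)
print(final, len(calls))  # draft 4 4
```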

## Router Query Engine

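The router query engine combines the two tools. For each incoming query, `LLMSingleSelector` asks an LLM (the global `Settings.llm`, since none is passed) to pick the one tool whose description best fits the query, and the router then forwards the query to that tool's query engine.
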
```python
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.query_engine import RouterQueryEngine

router_query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        llm_tool,
        vector_tool,
    ],
)
```
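
Once a tool has been chosen, the routing itself is simple: the selector is given the tool descriptions as a list of choices, returns the index of one tool, and the router forwards the query to that tool's engine. The sketch below reproduces that dispatch logic with a hypothetical keyword-overlap selector standing in for the LLM, so it runs offline. Tool names mirror the ones above, but the descriptions are shortened, `Tool`, `keyword_selector`, and `route` are made-up names, and the selector is far cruder than an LLM. Ties fall back to the first tool, loosely mirroring the "use this tool when the other tool is not useful" wording in `llm_tool`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]  # stands in for the tool's query engine


def keyword_selector(query: str, descriptions: list[str]) -> int:
    """Hypothetical stand-in for LLMSingleSelector: most shared words wins, ties go to the first tool."""
    query_words = set(query.lower().split())
    scores = [len(query_words & set(d.lower().split())) for d in descriptions]
    return max(range(len(scores)), key=lambda i: scores[i])


def route(query: str, tools: list[Tool], select: Callable[[str, list[str]], int]) -> tuple[str, str]:
    """Pick exactly one tool and forward the query to it."""
    index = select(query, [tool.description for tool in tools])
    chosen = tools[index]
    return chosen.name, chosen.run(query)


tools = [
    Tool("llm_query_tool", "general questions unrelated to the essays", lambda q: "answered directly"),
    Tool("vector_query_tool", "paul graham essays lisp programming y combinator", lambda q: "answered from documents"),
]
print(route("what did paul graham write about lisp", tools, keyword_selector))
# ('vector_query_tool', 'answered from documents')
print(route("can you help me with my homework?", tools, keyword_selector))
# ('llm_query_tool', 'answered directly')
```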

## Test queries

### First query

We test the router query engine with two queries. The first one is meant for the vector tool, so the answer should be retrieved from the documents.

```python
from IPython.display import display, HTML

query = (
    "In the essay, the author mentions his early experiences with programming. "
    "Describe the first computer he used for programming, the language he used, "
    "and the challenges he faced."
)
response = router_query_engine.query(query)

display(HTML(f'<p style="font-size:16px">{response.response}</p>'))
```
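
The display cells insert the model's text straight into HTML, so characters such as `<` or `&` in an answer would be interpreted as markup. A small helper built on the stdlib `html.escape` avoids that; the name `as_paragraph` is our own, not part of IPython. In the notebook it would be used as `display(HTML(as_paragraph(response.response)))`.

```python
import html


def as_paragraph(text: object, font_size: int = 16) -> str:
    # Escape <, >, &, and quotes so the answer is shown as text, not parsed as HTML.
    return f'<p style="font-size:{font_size}px">{html.escape(str(text))}</p>'


print(as_paragraph("Use <b> for bold & more"))
# <p style="font-size:16px">Use &lt;b&gt; for bold &amp; more</p>
```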

```python
# [optional] look at selected results
display(HTML(f'<p style="font-size:16px">{response.metadata["selector_result"]}</p>'))
```

### Second query

The second query has nothing to do with the documents, so it should be routed to the `llm_tool`.

```python
query = "Can you help me with my homework?"

response = router_query_engine.query(query)

display(HTML(f'<p style="font-size:16px">{response.response}</p>'))
```

```python
# [optional] look at selected results
display(HTML(f'<p style="font-size:16px">{response.metadata["selector_result"]}</p>'))
```