# LlamaIndex Integration 🦙

LMQL can be used with the [LlamaIndex](https://github.com/jerryjliu/llama_index) Python library. This notebook demonstrates how to query a LlamaIndex data structure as part of an LMQL query.

This lets you use [LlamaIndex's index data structures](https://gpt-index.readthedocs.io/en/latest/guides/primer/index_guide.html) to enrich the reasoning of an LMQL query with retrieved information, e.g. from a text document that you provide.

### Importing Libraries

First, we need to import the required libraries. Make sure `llama_index` is installed, e.g. via `pip install llama_index`. Then run the following cells to import the required `lmql` and `llama_index` components.

In [3]:
# setup lmql path (not shown in documentation, metadata has nbsphinx: hidden)
import sys 
sys.path.append("../../../src/")
# load and set OPENAI_API_KEY
import os 
os.environ["OPENAI_API_KEY"] = open("../../../api.env").read().split("\n")[1].split(": ")[1].strip()

%load_ext autoreload
%autoreload 2

In [4]:
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader, ServiceContext
import lmql

### Load Documents and Build Index

In this example, we want to query the full text of the LMQL research paper for useful information during question answering. For this, we first load documents using LlamaIndex's `SimpleDirectoryReader`, and build a `GPTSimpleVectorIndex` (an index that uses an in-memory embedding store).

In [14]:
# loads ./lmql.txt, the full text of the LMQL paper
documents = SimpleDirectoryReader('.').load_data() 
service_context = ServiceContext.from_defaults(chunk_size_limit=512)
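
To give an intuition for what a chunk size limit does, here is a minimal sketch of fixed-size chunking. It is not LlamaIndex's actual splitter: the helper `split_into_chunks` is hypothetical, and it counts whitespace-separated words, whereas I assume LlamaIndex measures `chunk_size_limit` in tokenizer tokens and may overlap neighbouring chunks.

```python
from typing import List


def split_into_chunks(text: str, chunk_size_limit: int) -> List[str]:
    """Split text into chunks of at most `chunk_size_limit` words.

    Hypothetical helper for illustration only; words stand in for tokens.
    """
    if chunk_size_limit <= 0:
        raise ValueError("chunk_size_limit must be positive")
    words = text.split()
    # Step through the word list in strides of chunk_size_limit.
    return [
        " ".join(words[i:i + chunk_size_limit])
        for i in range(0, len(words), chunk_size_limit)
    ]


chunks = split_into_chunks("a b c d e f g", chunk_size_limit=3)
print(chunks)
```

Smaller chunks make each retrieved passage more focused, at the cost of splitting context across more pieces.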

Next, we construct a retrieval index over the full text of the research paper.

In [6]:
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
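
Conceptually, an in-memory vector index stores one embedding per chunk and returns the chunks most similar to the query. The following toy sketch illustrates that idea under strong simplifying assumptions: `ToyVectorIndex`, `embed`, and `cosine` are hypothetical names, and bag-of-words counts stand in for the learned embeddings a real index would obtain from an embedding model.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Keeps (chunk, embedding) pairs in memory and ranks them per query."""

    def __init__(self, chunks: List[str]):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def top_k(self, question: str, k: int) -> List[str]:
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


toy_index = ToyVectorIndex([
    "LMQL supports scripted prompting",
    "Constraints restrict decoding",
    "Bananas are yellow",
])
top = toy_index.top_k("what is scripted prompting", k=2)
print(top)
```

The `k` argument here plays the same role as `similarity_top_k` in the query further down: it bounds how many chunks are handed to the language model.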

### Question Answering by Querying with LlamaIndex

Now that we have an `index` to query, we can use it during LMQL query execution. Since LMQL is fully integrated with the surrounding Python program context, we can simply call `index.query(...)` inside the query:

In [17]:
similarity_top_k = 2

@lmql.query
async def index_query(question: str):
        '''
sample(temperature=0.2, max_len=2048, openai_chunksize=2048)
    """You are a QA bot that helps users answer questions."""
    response = index.query(question, response_mode="no_text", similarity_top_k=similarity_top_k)
    information = "\n\n".join([s.node.get_text() for s in response.source_nodes])
    "Question: {question}\n"
    "\nRelevant Information: {information}\n"
    "Your response based on relevant information:[RESPONSE]"
from
    "openai/gpt-3.5-turbo"
        '''

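Here, we first query the `index` with the given `question`, retrieving the `similarity_top_k` most relevant document chunks (`response_mode="no_text"` returns only the source nodes, without a synthesized answer). We then insert the retrieved text into the prompt and let ChatGPT (`openai/gpt-3.5-turbo`, as specified in the `from` clause) produce a short answer to `question` in the `RESPONSE` variable.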
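
To make the prompt assembly concrete, the sketch below rebuilds roughly the same text in plain Python, outside of LMQL. The function `build_prompt` is hypothetical, and it assumes the prompt strings of the query are simply concatenated in order; how LMQL actually formats the request for a chat model may differ.

```python
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Approximate the prompt text that precedes [RESPONSE] in the query.

    Hypothetical helper; assumes plain concatenation of the prompt strings.
    """
    # Retrieved chunks are separated by blank lines, as in the query body.
    information = "\n\n".join(chunks)
    return (
        "You are a QA bot that helps users answer questions."
        f"Question: {question}\n"
        f"\nRelevant Information: {information}\n"
        "Your response based on relevant information:"
    )


prompt = build_prompt(
    "What is scripted prompting in LMQL?",
    ["first retrieved chunk", "second retrieved chunk"],
)
print(prompt)
```

Seeing the prompt as a single string makes it easier to judge how much of the context window the retrieved chunks consume for a given `similarity_top_k` and chunk size.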

In [18]:
result = await index_query("What is scripted prompting in LMQL?", output_writer=lmql.stream(variable="RESPONSE"))

Scripted prompting in LMQL refers to the ability to specify complex interactions, control flow, and constraints using lightweight scripting and declarative SQL-like elements in the Language Model Query Language (LMQL). This allows users to prompt language models with precise constraints and efficient decoding without requiring knowledge of the LM's internals. LMQL can be used to express a wide variety of existing prompting methods using simple, concise, and vendor-agnostic code. The underlying runtime is compatible with existing LMs and can be supported easily, requiring only a simple change in the decoder logic.