# LongLLMLingua


LongLLMLingua is a research project/paper that presents a new method for prompt compression in the long-context setting.

- Paper: https://arxiv.org/abs/2310.06839
- Repo: https://github.com/microsoft/LLMLingua

In this guide, we show how to add prompt compression to your RAG pipeline. We implement LongLLMLingua as a node postprocessor, which compresses the retrieved context after the retrieval step and before it is fed into the LLM.
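Conceptually, a node postprocessor is a function that takes the retrieved nodes (plus the query) and returns a new list of nodes, which the synthesizer then sees instead of the originals. The plain-Python sketch below illustrates only that contract. `Node`, `truncate_postprocessor`, and `run_pipeline` are hypothetical stand-ins, not LlamaIndex classes, and the "compression" is naive word truncation rather than LongLLMLingua.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a retrieved node: text plus a retrieval score."""
    text: str
    score: float


# A postprocessor maps (nodes, query) -> new nodes.
Postprocessor = Callable[[List[Node], str], List[Node]]


def truncate_postprocessor(max_words: int) -> Postprocessor:
    """Toy compressor: keeps only the first `max_words` words of each node."""

    def postprocess(nodes: List[Node], query: str) -> List[Node]:
        return [Node(" ".join(n.text.split()[:max_words]), n.score) for n in nodes]

    return postprocess


def run_pipeline(
    query: str,
    retrieve: Callable[[str], List[Node]],
    postprocessors: List[Postprocessor],
    synthesize: Callable[[str, List[Node]], str],
) -> str:
    """Retrieve, apply each postprocessor in order, then synthesize."""
    nodes = retrieve(query)
    for postprocess in postprocessors:
        nodes = postprocess(nodes, query)
    return synthesize(query, nodes)


if __name__ == "__main__":
    docs = [Node("first retrieved chunk with several words in it", 0.9)]
    answer = run_pipeline(
        "Where did the author go for art school?",
        retrieve=lambda q: docs,
        postprocessors=[truncate_postprocessor(max_words=5)],
        # Hypothetical synthesizer: just joins the (compressed) context.
        synthesize=lambda q, nodes: " | ".join(n.text for n in nodes),
    )
    print(answer)  # prints "first retrieved chunk with several"
```

Because compression happens between retrieval and synthesis, neither the retriever nor the synthesizer needs to know it is there.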

**NOTE**: We don't implement the [subsequence recovery method](https://github.com/microsoft/LLMLingua/blob/main/DOCUMENT.md#post-precessing), since it post-processes the LLM's response, which happens after the node postprocessing step.
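For intuition, subsequence recovery maps compressed fragments that appear in the LLM's response back to the corresponding original text. The sketch below is a simplified, word-level illustration written for this guide, not LLMLingua's implementation: the hypothetical helper `recover_span` finds the shortest span of the original text that contains the fragment's words in order.

```python
from typing import Optional, Tuple


def recover_span(original: str, fragment: str) -> Optional[str]:
    """Return the shortest span of `original` containing the words of
    `fragment` in order (as a subsequence), or None if no such span exists."""
    orig_words = original.split()
    frag_words = fragment.split()
    if not frag_words:
        return None
    best: Optional[Tuple[int, int]] = None  # (start, end), end inclusive
    for start, word in enumerate(orig_words):
        if word != frag_words[0]:
            continue
        # Greedily match the remaining fragment words left to right.
        matched, end = 1, start
        for k in range(start + 1, len(orig_words)):
            if matched == len(frag_words):
                break
            if orig_words[k] == frag_words[matched]:
                matched += 1
                end = k
        if matched == len(frag_words):
            if best is None or end - start < best[1] - best[0]:
                best = (start, end)
    if best is None:
        return None
    return " ".join(orig_words[best[0] : best[1] + 1])


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    print(recover_span(text, "quick fox over dog"))
    # prints "quick brown fox jumps over the lazy dog"
```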

**NOTE**: Running this requires substantial RAM and GPU memory, since LLMLingua runs a local language model to score the context. We got it working on Colab Pro with a V100 instance.

In [None]:
!pip install llmlingua llama-index

In [None]:
import openai

In [None]:
openai.api_key = "<insert_openai_key>"

## Setup (Data + Index)

We load Paul Graham's essay, index it, and define a retriever.

In [None]:
!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O paul_graham_essay.txt

In [None]:
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
    StorageContext,
)

In [None]:
# load documents
documents = SimpleDirectoryReader(
    input_files=["paul_graham_essay.txt"]
).load_data()

In [None]:
index = VectorStoreIndex.from_documents(documents)

In [None]:
retriever = index.as_retriever(similarity_top_k=2)
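With `similarity_top_k=2`, the retriever returns the two chunks whose embeddings are most similar to the query embedding. The sketch below shows that selection in plain Python, assuming cosine similarity; the toy vectors are made up, whereas the real index gets its vectors from an embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: List[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) for the k most similar chunks, best first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    query = [1.0, 0.0]
    chunks = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
    for idx, score in top_k(query, chunks, k=2):
        print(idx, round(score, 3))  # prints "1 0.995" then "2 0.707"
```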

In [None]:
# query_str = "What did the author do growing up?"
# query_str = "What did the author do during his time in YC?"
query_str = "Where did the author go for art school?"

In [None]:
results = retriever.retrieve(query_str)
print(results)

In [None]:
results

## Setup LongLLMLingua as a Postprocessor

We set up `LongLLMLinguaPostprocessor`, which uses the `llmlingua` package to run prompt compression.

We set a target size of 300 tokens for the compressed context and supply an instruction string.

Special thanks to Huiqiang J. for help with the parameters.
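To build intuition for what a token budget like `target_token=300` means, here is a toy token-level pruner: it keeps the highest-scoring tokens up to the budget and preserves their original order. This is a hedged sketch, not LongLLMLingua's algorithm. `compress_to_budget` is a hypothetical helper, and the per-token scores are supplied by hand, whereas LongLLMLingua derives token importance from a small language model.

```python
from typing import List


def compress_to_budget(
    tokens: List[str], scores: List[float], target_token: int
) -> List[str]:
    """Keep the `target_token` highest-scoring tokens in their original order.

    `scores` is a hypothetical, hand-supplied per-token importance.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if target_token <= 0:
        return []
    if target_token >= len(tokens):
        return list(tokens)
    # Rank positions by score (highest first); ties go to the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    # Re-sort the survivors by position so the text keeps its original order.
    keep = sorted(ranked[:target_token])
    return [tokens[i] for i in keep]


if __name__ == "__main__":
    tokens = "The author went to art school at RISD".split()
    scores = [0.1, 0.6, 0.5, 0.2, 0.8, 0.7, 0.1, 0.9]
    print(" ".join(compress_to_budget(tokens, scores, target_token=4)))
    # prints "author art school RISD"
```

The output reads like the compressed context printed later in this notebook: fragmentary, but with the informative words kept.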

In [None]:
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import CompactAndRefine
from llama_index.postprocessor import LongLLMLinguaPostprocessor

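node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,  # target size of the compressed context, in tokens
    rank_method="longllmlingua",  # question-aware, document-level ranking
    additional_compress_kwargs={
        "condition_compare": True,  # compare scores with/without the question when pruning tokens
        "condition_in_question": "after",  # place the question after the context when scoring
        "context_budget": "+100",  # allow ~100 extra tokens at the document-level stage
        "reorder_context": "sort",  # reorder kept documents by relevance
    },
)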

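Two of the `additional_compress_kwargs` act on whole documents rather than tokens. As we read the LLMLingua documentation, `context_budget="+100"` lets the coarse, document-level stage keep documents up to roughly `target_token + 100` tokens, and `reorder_context="sort"` orders the kept documents by relevance. The sketch below mimics just those two steps. `select_and_reorder` is a hypothetical helper, the relevance scores are hand-supplied (LongLLMLingua computes question-aware scores with a small language model), and word counts stand in for token counts.

```python
from typing import List


def select_and_reorder(
    docs: List[str],
    relevance: List[float],
    target_token: int,
    extra_budget: int = 100,
) -> List[str]:
    """Take docs in descending relevance until target_token + extra_budget
    words are used; return them most-relevant first (like reorder_context="sort")."""
    if len(docs) != len(relevance):
        raise ValueError("docs and relevance must have the same length")
    budget = target_token + extra_budget
    order = sorted(range(len(docs)), key=lambda i: relevance[i], reverse=True)
    kept: List[str] = []
    used = 0
    for i in order:
        cost = len(docs[i].split())  # word count as a rough token proxy
        if used + cost > budget:
            break  # stop at the first document that would exceed the budget
        kept.append(docs[i])
        used += cost
    return kept


if __name__ == "__main__":
    docs = ["a b c", "d e", "f g h i"]
    print(select_and_reorder(docs, [0.2, 0.9, 0.5], target_token=5, extra_budget=1))
    # prints "['d e', 'f g h i']"
```

Token-level pruning (as in the budget sketch earlier) would then run on the documents that survive this stage.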

## Try It Out

Below, we compose a retriever, a compressor, and a query engine into a RAG pipeline in two ways:

1. Step by step.
2. Out of the box, with our `RetrieverQueryEngine`.

### Step-by-Step

In [None]:
retrieved_nodes = retriever.retrieve(query_str)
synthesizer = CompactAndRefine()

In [None]:
from llama_index.schema import QueryBundle

# Run the steps RetrieverQueryEngine performs internally, one at a time:
# 1. postprocess (compress) the retrieved nodes, 2. synthesize a response
new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=query_str)
)

In [None]:
print("\n\n".join([n.get_content() for n in new_retrieved_nodes]))

' why artists have a signature style, but' buyers pay a lot work.

I a lot the color class I tookISD, but I teaching myself I do for in 199 I dropped out I hung around Providence for a bit and then my friendmet did me big favor. A rent-control apartment a her mother in New York becoming vacant. Did I want it It wasn't much more my current place, and New York supposed to be where the artists were yes, I wanted7Id for the BFA at RISD, meant effect that had to go again. This was as as it sounds because I was only2 and art full people of different ages. RIS me as a transfer sophomore and said do the. The foundation means the classes that everyone has to fundamental subjects like color, and design.
anwhile was applying schools applied to two: RISD the US, and the Accadem di Belli Arti Florence, which, because was oldest art, I imagined be good RIS accepted me, and never heard back from the Accademia, off Providence I went.
 liked on square canv 5 on a side One194 as was stretch one of these
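To quantify how much the postprocessor removed, you can compare the size of the context before and after compression. The helpers below are a hypothetical sketch that uses whitespace-separated word counts as a rough stand-in for tokens; for exact numbers you would count with your LLM's tokenizer.

```python
from typing import List


def word_count(texts: List[str]) -> int:
    """Total whitespace-separated words across all texts (rough token proxy)."""
    return sum(len(t.split()) for t in texts)


def compression_ratio(before: List[str], after: List[str]) -> float:
    """How many times smaller the compressed context is (before / after)."""
    after_words = word_count(after)
    if after_words == 0:
        return float("inf")
    return word_count(before) / after_words


if __name__ == "__main__":
    original = ["one two three four five six seven eight"]
    compressed = ["two five eight"]
    print(f"{compression_ratio(original, compressed):.2f}x")  # prints "2.67x"
```

With the variables from this notebook, you could call it as `compression_ratio([n.get_content() for n in retrieved_nodes], [n.get_content() for n in new_retrieved_nodes])`.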

In [None]:
response = synthesizer.synthesize(query_str, new_retrieved_nodes)

In [None]:
print(str(response))

The author went to RISD for art school.


### Out of the box with `RetrieverQueryEngine`

In [None]:
retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)

In [None]:
response = retriever_query_engine.query(query_str)
print(str(response))