One use of postprocessors is to make development with local LLMs practical, since these models are very slow on long contexts.

This example uses a reranker, which keeps only the most relevant retrieved nodes, and an optimizer, which shortens the text of each remaining node.
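To make the two stages concrete, here is a minimal, dependency-free sketch of the idea. It is not how the LlamaIndex postprocessors are implemented: the function names (`overlap_score`, `rerank`, `shorten`) and the `keep_fraction` parameter are hypothetical, and a toy word-overlap score stands in for the cross-encoder and embedding models used in the real pipeline.

```python
import re
from typing import List


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(re.findall(r"\w+", query.lower()))
    text_words = set(re.findall(r"\w+", text.lower()))
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query: str, nodes: List[str], top_n: int) -> List[str]:
    """Stage 1 (hypothetical stand-in for a reranker): keep the top_n nodes."""
    ranked = sorted(nodes, key=lambda n: overlap_score(query, n), reverse=True)
    return ranked[:top_n]


def shorten(query: str, node: str, keep_fraction: float) -> str:
    """Stage 2 (hypothetical stand-in for an optimizer): keep only the
    best-scoring fraction of sentences, preserving their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", node.strip()) if s]
    n_keep = max(1, int(len(sentences) * keep_fraction))
    best = sorted(
        range(len(sentences)),
        key=lambda i: overlap_score(query, sentences[i]),
        reverse=True,
    )[:n_keep]
    return " ".join(sentences[i] for i in sorted(best))


# Made-up data for illustration only.
query = "Which art school did the author choose?"
nodes = [
    "The author applied to an art school in Florence. The weather was warm. "
    "He liked the school because it was old.",
    "Lisp is a programming language. It has macros.",
    "The author also painted still lifes. Art supplies were cheap. "
    "The school had few teachers. Rent was low.",
]

kept = rerank(query, nodes, top_n=2)
short = [shorten(query, n, keep_fraction=0.5) for n in kept]
for text in short:
    print(text)
```

Both stages shrink the prompt that the LLM has to evaluate: the first drops whole nodes, the second drops sentences inside the nodes that remain.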

In [1]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader

In [2]:
# load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

In [3]:
from llama_index import ServiceContext, set_global_service_context

ctx = ServiceContext.from_defaults(embed_model="local", llm_predictor="local:cpu")
set_global_service_context(ctx)

llama.cpp: loading model from /home/jonch/.cache/llama_index/models/llama-2-13b/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9031.71 MB (+ 1608.00 MB per state)


LLM metadata: context_window=4096 num_output=256 is_chat_model=False


llama_new_context_with_model: kv self size  = 3200.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 


In [4]:
# build index
index = VectorStoreIndex.from_documents(documents=documents, service_context=ctx)

In [5]:
from llama_index.indices.postprocessor import (
    SentenceTransformerRerank,
    SentenceEmbeddingOptimizer,
)

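# Rerank the retrieved nodes with a cross-encoder and keep the 2 best
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=2
)
# Drop the sentences in each node that are least similar to the query;
# percentile_cutoff=0.1 keeps roughly the top 10% of sentences
optimizer = SentenceEmbeddingOptimizer(
    embed_model=ctx.embed_model, percentile_cutoff=0.1
)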

In [6]:
# Retrieve 10 candidate nodes, then apply the postprocessors in list order
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[rerank, optimizer],
    service_context=ctx,
)

In [12]:
response = query_engine.query(
    "Why did the author think his choice of art schools would be good?",
)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  1742.92 ms
llama_print_timings:      sample time =    67.96 ms /   113 runs   (    0.60 ms per token,  1662.82 tokens per second)
llama_print_timings: prompt eval time = 29042.03 ms /   246 tokens (  118.06 ms per token,     8.47 tokens per second)
llama_print_timings:        eval time = 26013.64 ms /   112 runs   (  232.26 ms per token,     4.31 tokens per second)
llama_print_timings:       total time = 55457.03 ms
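
The timings above show why a short prompt matters on this setup. As a rough back-of-the-envelope sketch, the snippet below extrapolates the measured prompt evaluation rate to a prompt that fills the whole 4096-token context window. It assumes the cost stays linear in the number of tokens, which is a simplification and likely optimistic for long prompts; the helper name `prompt_eval_seconds` is made up for this illustration.

```python
# Figures copied from the llama_print_timings output above.
prompt_tokens = 246
ms_per_prompt_token = 118.06

# Context window reported in the model load log (n_ctx).
context_window = 4096


def prompt_eval_seconds(n_tokens: int, ms_per_token: float = ms_per_prompt_token) -> float:
    """Estimate prompt evaluation time, assuming cost is linear in tokens."""
    return n_tokens * ms_per_token / 1000.0


measured = prompt_eval_seconds(prompt_tokens)
full_window = prompt_eval_seconds(context_window)

print(f"{prompt_tokens} tokens: ~{measured:.1f} s")
print(f"{context_window} tokens: ~{full_window:.1f} s")
```

Under that assumption, the 246-token prompt takes about 29 seconds to evaluate, while a prompt that fills the context window would take roughly 8 minutes before the first output token is generated.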


In [13]:
print(response)

This is an open-ended question. Here's my answer.
The author thought Accademia di Belle Arti in Florence would be a good choice because it was the oldest art school. He likely presumed that such a long-established institution would provide a well-rounded education in the arts, with a focus on traditional techniques and methods. Additionally, the fact that it was one of the oldest art schools might have given him an impression of its reputation and prestige, which could have been a factor in his decision to apply.


In [14]:
print(response.get_formatted_sources(length=-1))

> Source (Doc id: 474ecef1-1495-4f17-a597-9b452ef8634b): The students and faculty in the painting department at the Accademia were the nicest people you could imagine, but they had long since arrived at an arrangement whereby the students wouldn't require the faculty to teach anything, and in return the faculty wouldn't require the students to learn anything. I applied to two: RISD in the US, and the Accademia di Belli Arti in Florence, which, because it was the oldest art school, I imagined would be good. Meanwhile I was applying to art scho...

> Source (Doc id: 15161c96-d973-4d34-8bbc-f62e4761d186): Idelle Weber was a painter, one of the early photorealists, and I'd taken her painting class at Harvard. Art galleries didn't want to be online, and still don't, not the fancy ones. She liked to paint on big, square canvases, 4 to 5 feet on a side. (The painting on the cover of this book, ANSI Common Lisp, is one that I painted around this ti...
