One use of node postprocessors is to make development with local LLMs practical. These models are very slow on long contexts, so shrinking the retrieved context speeds up every query.

Here we show an example that combines a reranker, which keeps only the most relevant nodes, with an optimizer, which shortens the text of each remaining node.
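Conceptually, each node postprocessor takes the retrieved nodes and returns a filtered or modified list, and the query engine applies them one after another. The sketch below models that chaining with plain Python functions. `ScoredNode`, `keep_top_n`, `truncate_text` and `run_postprocessors` are hypothetical names for illustration, not LlamaIndex APIs, and the character-based truncation is only a crude stand-in for a real node-length optimizer.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class ScoredNode:
    """Hypothetical stand-in for a retrieved node: its text plus a relevance score."""
    text: str
    score: float


# A postprocessor maps a list of nodes to a new list of nodes.
Postprocessor = Callable[[List[ScoredNode]], List[ScoredNode]]


def keep_top_n(n: int) -> Postprocessor:
    """Keep the n highest-scoring nodes (a stand-in for a reranker's top_n)."""
    def step(nodes: List[ScoredNode]) -> List[ScoredNode]:
        return sorted(nodes, key=lambda node: node.score, reverse=True)[:n]
    return step


def truncate_text(max_chars: int) -> Postprocessor:
    """Shorten each node's text (a crude stand-in for a node-length optimizer)."""
    def step(nodes: List[ScoredNode]) -> List[ScoredNode]:
        return [replace(node, text=node.text[:max_chars]) for node in nodes]
    return step


def run_postprocessors(nodes: List[ScoredNode], steps: List[Postprocessor]) -> List[ScoredNode]:
    """Apply each postprocessor in order, feeding one step's output into the next."""
    for step in steps:
        nodes = step(nodes)
    return nodes


nodes = [ScoredNode("a" * 50, 0.2), ScoredNode("b" * 50, 0.9), ScoredNode("c" * 50, 0.5)]
result = run_postprocessors(nodes, [keep_top_n(2), truncate_text(10)])
print([(node.text, node.score) for node in result])
```

Putting the filtering step first means the shortening step only has to process the nodes that survive it.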

In [1]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader

In [2]:
# load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

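We configure the service context to use a local embedding model and the local (CPU) LLM predictor, here a Llama 2 13B model loaded through llama.cpp as the log below shows, and set it as the global default.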

In [3]:
from llama_index import ServiceContext, set_global_service_context

ctx = ServiceContext.from_defaults(embed_model="local", llm_predictor="local")
set_global_service_context(ctx)

llama.cpp: loading model from /home/jonch/.cache/llama_index/models/llama-2-13b/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9132.71 MB (+ 1608.00 MB per state)


LLM metadata: context_window=4096 num_output=256 is_chat_model=False


llama_new_context_with_model: kv self size  = 3200.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 


In [4]:
# build index
index = VectorStoreIndex.from_documents(documents=documents)
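`VectorStoreIndex` embeds each chunk of the documents and, at query time, returns the chunks whose embeddings are most similar to the query embedding; the query engine's `similarity_top_k` sets how many come back. A minimal sketch of that retrieval step, assuming an in-memory list of hand-made 3-dimensional vectors scored with cosine similarity. The vectors, chunk ids and the `top_k_by_cosine` helper are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(
    query: Sequence[float], chunks: List[Tuple[str, List[float]]], k: int
) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for four chunks.
chunks = [
    ("art_school", [0.9, 0.1, 0.0]),
    ("lisp", [0.0, 1.0, 0.1]),
    ("painting", [0.7, 0.3, 0.1]),
    ("startups", [0.1, 0.2, 0.9]),
]
query = [1.0, 0.2, 0.0]
print(top_k_by_cosine(query, chunks, k=2))
```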

In [5]:
from llama_index.indices.postprocessor import (
    SentenceTransformerRerank,
    SentenceEmbeddingOptimizer,
)

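# Rescore the retrieved nodes with a small cross-encoder and keep the best 2
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=2
)
# Within each node, keep roughly the top 10% of sentences by embedding similarity to the query
optimizer = SentenceEmbeddingOptimizer(
    embed_model=ctx.embed_model, percentile_cutoff=0.1
)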
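`SentenceEmbeddingOptimizer` shortens each node by splitting it into sentences, embedding them, and keeping only the sentences most similar to the query; `percentile_cutoff=0.1` asks it to keep roughly the top 10% of sentences. The simplified sketch below illustrates the idea under stated assumptions: a naive regex sentence splitter, a toy bag-of-words "embedding" in place of the real embedding model, surviving sentences kept in their original order, and at least one sentence always kept. These choices are assumptions; the exact splitting, ordering and rounding in LlamaIndex may differ, and `optimize_node_text` is a hypothetical helper.

```python
import math
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bow_embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embed model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def optimize_node_text(text: str, query: str, percentile_cutoff: float) -> str:
    """Keep roughly the top `percentile_cutoff` fraction of sentences by similarity to the query."""
    sentences = split_sentences(text)
    keep = max(1, int(len(sentences) * percentile_cutoff))  # always keep at least one
    q = bow_embed(query)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(q, bow_embed(sentences[i])),
        reverse=True,
    )
    kept = sorted(ranked[:keep])  # restore original sentence order
    return " ".join(sentences[i] for i in kept)


# Hypothetical node text and query.
text = ("I applied to two art schools. It rained all week. "
        "I imagined the oldest art school would be good. We ate pasta.")
query = "Why did the author think the art school would be good?"
print(optimize_node_text(text, query, percentile_cutoff=0.5))
```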

In [6]:
# Retrieve 10 candidate nodes; the postprocessors then run in list order (rerank, then optimize)
query_engine = index.as_query_engine(
    similarity_top_k=10, node_postprocessors=[rerank, optimizer]
)
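With `similarity_top_k=10` the retriever returns ten candidate nodes, the reranker scores each (query, node) pair, and `top_n=2` keeps the best two, so only two nodes go on to the optimizer and then into the LLM prompt. The sketch below mimics just the rerank step. `toy_cross_score` is a hypothetical word-overlap scorer standing in for the `cross-encoder/ms-marco-MiniLM-L-2-v2` model, and the four candidate texts are made up (a real run would have ten).

```python
import re
from typing import Callable, List, Tuple


def toy_cross_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query words found in the passage."""
    q_words = set(re.findall(r"[a-z]+", query.lower()))
    p_words = set(re.findall(r"[a-z]+", passage.lower()))
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_n, best first."""
    scored = [(text, score_fn(query, text)) for text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Pretend these came back from the vector retriever.
candidates = [
    "The weather was mild that year.",
    "I thought the oldest art school would be good.",
    "Lisp macros are powerful.",
    "Art school applications were due in spring.",
]
query = "Why did the author think the art school would be good?"
for text, score in rerank(query, candidates, toy_cross_score, top_n=2):
    print(f"{score:.2f}  {text}")
```

A real cross-encoder reads the query and passage together, which is why it is typically more accurate than embedding similarity but too slow to run over the whole corpus; that is why it is applied only to the ten retrieved candidates.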

In [7]:
response = query_engine.query(
    "Why did the author think his choice of art schools would be good?",
)


llama_print_timings:        load time =  1375.62 ms
llama_print_timings:      sample time =    91.30 ms /   127 runs   (    0.72 ms per token,  1391.00 tokens per second)
llama_print_timings: prompt eval time = 47405.29 ms /   256 tokens (  185.18 ms per token,     5.40 tokens per second)
llama_print_timings:        eval time = 30955.60 ms /   126 runs   (  245.68 ms per token,     4.07 tokens per second)
llama_print_timings:       total time = 78855.56 ms
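These timings show why trimming the context matters on CPU: prompt evaluation ran at about 185 ms per token (47405.29 ms for 256 tokens). Assuming the cost stays roughly linear in prompt length (an approximation that is probably optimistic for long prompts), a quick estimate for larger prompts looks like this. Only the 256-token figure is measured; 2048 is a hypothetical size, and 3840 is the largest prompt that fits the 4096-token context window while leaving room for `num_output=256`.

```python
# Measured in the run above: 256 prompt tokens evaluated in 47405.29 ms.
MS_PER_PROMPT_TOKEN = 47405.29 / 256  # about 185.18 ms per token


def estimate_prompt_seconds(num_tokens: int) -> float:
    """Linear estimate of prompt-evaluation time in seconds; likely optimistic for long prompts."""
    return num_tokens * MS_PER_PROMPT_TOKEN / 1000


# 256 is the measured prompt size; 2048 and 3840 are hypothetical larger prompts.
for tokens in (256, 2048, 3840):
    print(f"{tokens:5d} tokens -> ~{estimate_prompt_seconds(tokens):.0f} s")
```

Under this estimate, a prompt that fills the context window would spend over ten minutes in prompt evaluation alone, compared with under a minute for the trimmed prompt measured here.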


In [8]:
print(response)

In my understanding, the author thought that attending Accademia di Belli Arti in Florence would be a good choice because:
1. It is the oldest art school, which suggests a long history of teaching and producing successful artists.
2. The faculty and students had a "nicest people" reputation, which could provide a supportive and nurturing environment for the author's artistic growth.
3. The arrangement between the faculty and students, where no teaching or learning was required, might allow the author to focus solely on creating art without feeling pressured by academic expectations.


In [9]:
print(response.get_formatted_sources(length=-1))

> Source (Doc id: 937751f7-721f-423e-8c24-c645c09370db): The students and faculty in the painting department at the Accademia were the nicest people you could imagine, but they had long since arrived at an arrangement whereby the students wouldn't require the faculty to teach anything, and in return the faculty wouldn't require the students to learn anything. I applied to two: RISD in the US, and the Accademia di Belli Arti in Florence, which, because it was the oldest art school, I imagined would be good. Meanwhile I was applying to art scho...

> Source (Doc id: d192ceb7-7881-4ced-abb5-f9276150988a): Art galleries didn't want to be online, and still don't, not the fancy ones. Idelle Weber was a painter, one of the early photorealists, and I'd taken her painting class at Harvard. (The painting on the cover of this book, ANSI Common Lisp, is one that I painted around this time.) She liked to paint on big, square canvases, 4 to 5 feet on a s...
