One use of node postprocessors is to make development with local LLMs practical. Local models are very slow on long contexts, so shrinking the retrieved context directly reduces query latency.

Here we combine a reranker, which keeps only the most relevant retrieved nodes, with an optimizer, which shortens each remaining node.

In [1]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader

In [2]:
# load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

In [3]:
from llama_index import ServiceContext

# Use a local embedding model and a local LLM running on CPU
ctx = ServiceContext.from_defaults(embed_model="local", llm_predictor="local:cpu")

llama.cpp: loading model from /home/jonch/.cache/llama_index/models/llama-cpp-vicuna13b/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9031.71 MB (+ 1608.00 MB per state)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
llama_new_context

In [4]:
# build index
index = VectorStoreIndex.from_documents(documents=documents, service_context=ctx)

In [8]:
from llama_index.indices.postprocessor import (
    SentenceTransformerRerank,
    SentenceEmbeddingOptimizer,
)

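# Rescore the retrieved nodes with a cross-encoder and keep only the 2 best
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=2
)
# Within each node, keep roughly the top 10% of sentences by embedding similarity to the query
optimizer = SentenceEmbeddingOptimizer(
    embed_model=ctx.embed_model, percentile_cutoff=0.1
)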
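To make the optimizer's role concrete, here is a simplified, self-contained model of sentence pruning. As we understand it, `percentile_cutoff=0.1` keeps roughly the top 10% of a node's sentences, ranked by embedding similarity to the query. This is a sketch, not the library's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, `prune_sentences` is a hypothetical helper, and the "keep at least one sentence" rule and the rejoining in original order are assumptions of this sketch.

```python
import math
import re
from typing import Callable, List

# Hypothetical toy vocabulary for a bag-of-words "embedding" (not a real model).
VOCAB = ["art", "schools", "applied", "painting", "lisp", "galleries"]


def toy_embed(text: str) -> List[float]:
    """Count occurrences of each vocabulary word; a stand-in for a real embedding model."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_sentences(
    query: str,
    sentences: List[str],
    embed: Callable[[str], List[float]],
    percentile_cutoff: float = 0.1,
) -> str:
    """Keep the top `percentile_cutoff` fraction of sentences by similarity to the query."""
    query_vec = embed(query)
    scores = [cosine(query_vec, embed(s)) for s in sentences]
    # Assumption of this sketch: always keep at least one sentence.
    k = max(1, int(len(sentences) * percentile_cutoff))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Rejoin the surviving sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))


query = "Why did the author apply to art schools?"
sentences = [
    "Meanwhile I was applying to art schools.",
    "Art galleries didn't want to be online, and still don't.",
    "The painting on the cover of this book, ANSI Common Lisp, is one that I painted around this time.",
]
print(prune_sentences(query, sentences, toy_embed, percentile_cutoff=0.1))
print(prune_sentences(query, sentences, toy_embed, percentile_cutoff=0.7))
```

A small cutoff shortens each node aggressively, which suits a slow local LLM. The trade-off is that any sentence the embedding model misjudges is lost before the LLM ever sees it.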

In [9]:
# Retrieve 10 candidate nodes; the postprocessors then run in list order (rerank, then optimize)
query_engine = index.as_query_engine(
    similarity_top_k=10, node_postprocessors=[rerank, optimizer], service_context=ctx
)
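The query engine first retrieves `similarity_top_k=10` nodes and then passes them through `node_postprocessors` one after another, in list order. The reranker therefore cuts the candidates to 2 before the optimizer shortens them. The sketch below models only that chaining. Its pieces are hypothetical stand-ins: `Node` is a bare `(text, score)` tuple rather than a LlamaIndex type, `overlap_score` is a toy word-overlap scorer in place of a cross-encoder, `make_truncate` is a crude word-count trim in place of the embedding-based optimizer, and the retrieved texts are made-up examples.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical minimal node: (text, score). Not the llama_index node type.
Node = Tuple[str, float]
Postprocessor = Callable[[str, List[Node]], List[Node]]


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score (number of shared lowercase words); stands in for a cross-encoder."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    t = set(re.findall(r"[a-z]+", text.lower()))
    return float(len(q & t))


def make_rerank(top_n: int) -> Postprocessor:
    """Rescore every node against the query and keep the top_n highest-scoring ones."""
    def rerank(query: str, nodes: List[Node]) -> List[Node]:
        rescored = [(text, overlap_score(query, text)) for text, _ in nodes]
        rescored.sort(key=lambda node: node[1], reverse=True)
        return rescored[:top_n]
    return rerank


def make_truncate(max_words: int) -> Postprocessor:
    """Crude stand-in for a node-shortening step: keep the first max_words words."""
    def truncate(query: str, nodes: List[Node]) -> List[Node]:
        return [(" ".join(text.split()[:max_words]), score) for text, score in nodes]
    return truncate


def run_postprocessors(query: str, nodes: List[Node], steps: List[Postprocessor]) -> List[Node]:
    """Feed each postprocessor the output of the previous one, in list order."""
    for step in steps:
        nodes = step(query, nodes)
    return nodes


retrieved = [
    ("I applied to two art schools: RISD and the Accademia.", 0.0),
    ("Art galleries did not want to be online.", 0.0),
    ("I wrote essays about Lisp.", 0.0),
    ("The author painted still lifes.", 0.0),
]
result = run_postprocessors(
    "Why did the author apply to art schools?",
    retrieved,
    [make_rerank(top_n=2), make_truncate(max_words=5)],
)
for text, score in result:
    print(score, text)
```

Ordering matters here. Putting the shortening step first would discard text before the reranker could judge it, and it would do that work on all retrieved nodes instead of just the survivors.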

In [17]:
response = query_engine.query(
    "Why did the author apply to art schools?",
)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  1319.17 ms
llama_print_timings:      sample time =   151.55 ms /   170 runs   (    0.89 ms per token,  1121.74 tokens per second)
llama_print_timings: prompt eval time = 41731.33 ms /   244 tokens (  171.03 ms per token,     5.85 tokens per second)
llama_print_timings:        eval time = 59635.79 ms /   169 runs   (  352.87 ms per token,     2.83 tokens per second)
llama_print_timings:       total time = 102160.31 ms
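The timings above show why context length matters on this setup. Prompt evaluation ran at about 171 ms per token, so 244 tokens cost roughly 42 s, and the whole query took about 102 s. The arithmetic below reproduces those figures from the logged rates. It then naively extrapolates to a prompt near the model's `n_ctx` of 2048 tokens. The helper `estimated_prompt_seconds` is hypothetical and assumes a constant per-token cost, which is optimistic because per-token cost tends to rise as the context grows.

```python
# Numbers copied from the llama_print_timings output above.
PROMPT_TOKENS = 244
PROMPT_MS_PER_TOKEN = 171.03
GEN_TOKENS = 169
GEN_MS_PER_TOKEN = 352.87


def estimated_prompt_seconds(n_tokens: int, ms_per_token: float = PROMPT_MS_PER_TOKEN) -> float:
    """Naive linear estimate of prompt evaluation time in seconds.

    Assumes a constant per-token cost; in practice the cost per token tends
    to rise as the context grows, so treat this as a lower-end estimate.
    """
    return n_tokens * ms_per_token / 1000


prompt_s = estimated_prompt_seconds(PROMPT_TOKENS)
gen_s = GEN_TOKENS * GEN_MS_PER_TOKEN / 1000
print(f"prompt eval ~{prompt_s:.1f} s, generation ~{gen_s:.1f} s")

# A prompt close to the model's n_ctx of 2048 tokens, at the same per-token rate:
print(f"2000-token prompt ~{estimated_prompt_seconds(2000):.0f} s")
```

Under this naive model, a prompt near the full context window would spend on the order of 340 s in prompt evaluation alone. This is why reranking and pruning the retrieved nodes pays off with a local LLM.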


In [18]:
print(response)

### Assistant: The author applied to art schools because he wanted to improve his painting skills and learn new techniques from experienced faculty members. He had already studied at Harvard University and had learned some basic painting skills, but he wanted more guidance and instruction from professional artists. Therefore, he applied to the Rhode Island School of Design (RISD) in the United States and the Accademia di Belle Arti in Florence, Italy, which were both renowned for their art programs. However, the author soon realized that the faculty members at the Accademia were not particularly interested in teaching, and the students were more focused on their own work than on learning from the professors. Nevertheless, the author was able to learn a lot from Idelle Weber, an experienced painter who taught at Harvard University.



In [19]:
print(response.get_formatted_sources(length=-1))

> Source (Doc id: 4591e898-70e7-4c97-9838-d5e47fa02f67): Meanwhile I was applying to art schools. I applied to two: RISD in the US, and the Accademia di Belli Arti in Florence, which, because it was the oldest art school, I imagined would be good. The students and faculty in the painting department at the Accademia were the nicest people you could imagine, but they had long since arrived at an arrangement whereby the students wouldn't require the faculty to teach anything, and in return the faculty wouldn't require the students to learn anyth...

> Source (Doc id: db91c03c-689e-4f12-bc18-e146887342bc): Idelle Weber was a painter, one of the early photorealists, and I'd taken her painting class at Harvard. (The painting on the cover of this book, ANSI Common Lisp, is one that I painted around this time.) Robert Morris showed it to me when I visited him in Cambridge, where he was now in grad school at Harvard. Art galleries didn't want to be online, and still don't, not the fancy o...


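As we can see, the model hallucinates claims not found in the sources (for example, that the author applied to improve his painting skills and learn from experienced faculty). It is also overly verbose and includes extraneous information, such as the aside about Idelle Weber.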