### Flash ReRank

Models are better at using relevant information that occurs at the very beginning (primacy bias) or end (recency bias) of their input context, and performance degrades significantly when models have to use information located in the middle of the input context.

- Cross Encoders
- Zero shot rerankers
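
One common way to act on the primacy and recency bias described above is to reorder passages after reranking, so the strongest ones sit at the two ends of the prompt and the weakest in the middle. The sketch below is a minimal illustration in plain Python. The function name `reorder_for_long_context` is hypothetical and not part of FlashRank, and it assumes the input is already sorted from most to least relevant.

```python
def reorder_for_long_context(ranked):
    """Place the most relevant items at both ends of the list.

    `ranked` is assumed to be sorted by descending relevance.
    Even positions (0, 2, 4, ...) fill the front; odd positions
    (1, 3, 5, ...) fill the back in reverse, so the least relevant
    items end up in the middle.
    """
    front, back = [], []
    for position, item in enumerate(ranked):
        if position % 2 == 0:
            front.append(item)
        else:
            back.append(item)
    return front + back[::-1]


# Items labelled by rank: 1 is the most relevant, 5 the least.
print(reorder_for_long_context([1, 2, 3, 4, 5]))
```

With five items, the two best land first and last, and the weakest sits in the centre of the context.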

In [2]:
%pip install -qqU flashrank

Note: you may need to restart the kernel to use updated packages.


In [3]:
from flashrank import Ranker

ranker = Ranker()

Downloading ms-marco-TinyBERT-L-2-v2...


ms-marco-TinyBERT-L-2-v2.zip: 100%|██████████| 3.26M/3.26M [00:00<00:00, 13.5MiB/s]


#### Small Ranker (~34MB), slightly slower & best performance (ranking precision)

In [4]:
ranker_small = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir="../.cache/")

Cache directory ../.cache not found. Creating it..
Downloading ms-marco-MiniLM-L-12-v2...


ms-marco-MiniLM-L-12-v2.zip: 100%|██████████| 21.6M/21.6M [00:00<00:00, 27.1MiB/s]


#### Medium (~110MB), slower model with best zero-shot performance (ranking precision) on out-of-domain data

In [5]:
ranker_medium_t5 = Ranker(model_name="rank-T5-flan", cache_dir="../.cache/")

Downloading rank-T5-flan...


rank-T5-flan.zip: 100%|██████████| 73.7M/73.7M [00:01<00:00, 41.2MiB/s]


#### Medium (~150MB), slower model with competitive performance (ranking precision) for 100+ languages (don't use for English)

In [6]:
ranker_medium_int = Ranker(model_name="ms-marco-MultiBERT-L-12", cache_dir="../.cache/")

Downloading ms-marco-MultiBERT-L-12...


ms-marco-MultiBERT-L-12.zip: 100%|██████████| 98.7M/98.7M [00:03<00:00, 30.1MiB/s]


- Metadata is optional.
- IDs can come from the retrieval DB or be simple numeric indices.
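
When the retrieved texts arrive as a plain list, the passage dictionaries can be assembled with a small helper. This is a sketch only: `build_passages` is a hypothetical name, it uses 1-based numeric indices as IDs, and it adds the `meta` key only when metadata is supplied. The `id`/`text`/`meta` key names mirror the passages used in this notebook.

```python
def build_passages(texts, metadata=None):
    """Turn a list of strings into passage dicts with numeric IDs.

    `metadata`, if given, is assumed to be a list of dicts aligned
    with `texts` (same length, same order).
    """
    if metadata is not None and len(metadata) != len(texts):
        raise ValueError("metadata must have one entry per text")

    passages = []
    for index, text in enumerate(texts, start=1):
        passage = {"id": index, "text": text}
        if metadata is not None:
            passage["meta"] = metadata[index - 1]
        passages.append(passage)
    return passages


example = build_passages(
    ["first retrieved chunk", "second retrieved chunk"],
    metadata=[{"additional": "info1"}, {"additional": "info2"}],
)
print(example)
```

If the texts come from a vector store, swapping the numeric index for the store's own document ID makes it easier to map reranked results back to their source.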

In [7]:
query = "How to speedup LLMs?"

In [8]:
passages = [
    {
        "id": 1,
        "text": "Introduce *lookahead decoding*: - a parallel decoding algo to accelerate LLM inference - w/o the need for a draft model or a data store - linearly decreases # decoding steps relative to log(FLOPs) used per decoding step.",
        "meta": {"additional": "info1"},
    },
    {
        "id": 2,
        "text": "LLM inference efficiency will be one of the most crucial topics for both industry and academia, simply because the more efficient you are, the more $$$ you will save. vllm project is a must-read for this direction, and now they have just released the paper",
        "meta": {"additional": "info2"},
    },
    {
        "id": 3,
        "text": "There are many ways to increase LLM inference throughput (tokens/second) and decrease memory footprint, sometimes at the same time. Here are a few methods I’ve found effective when working with Llama 2. These methods are all well-integrated with Hugging Face. This list is far from exhaustive; some of these techniques can be used in combination with each other and there are plenty of others to try. - Bettertransformer (Optimum Library): Simply call `model.to_bettertransformer()` on your Hugging Face model for a modest improvement in tokens per second. - Fp4 Mixed-Precision (Bitsandbytes): Requires minimal configuration and dramatically reduces the model's memory footprint. - AutoGPTQ: Time-consuming but leads to a much smaller model and faster inference. The quantization is a one-time cost that pays off in the long run.",
        "meta": {"additional": "info3"},
    },
    {
        "id": 4,
        "text": "Ever want to make your LLM inference go brrrrr but got stuck at implementing speculative decoding and finding the suitable draft model? No more pain! Thrilled to unveil Medusa, a simple framework that removes the annoying draft model while getting 2x speedup.",
        "meta": {"additional": "info4"},
    },
    {
        "id": 5,
        "text": "vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: State-of-the-art serving throughput Efficient management of attention key and value memory with PagedAttention Continuous batching of incoming requests Optimized CUDA kernels",
        "meta": {"additional": "info5"},
    },
]

In [9]:
from flashrank import RerankRequest

rerank_request = RerankRequest(query=query, passages=passages)
results = ranker_small.rerank(rerank_request)
results

[{'id': 4,
  'text': 'Ever want to make your LLM inference go brrrrr but got stuck at implementing speculative decoding and finding the suitable draft model? No more pain! Thrilled to unveil Medusa, a simple framework that removes the annoying draft model while getting 2x speedup.',
  'meta': {'additional': 'info4'},
  'score': 0.9439003},
 {'id': 3,
  'text': "There are many ways to increase LLM inference throughput (tokens/second) and decrease memory footprint, sometimes at the same time. Here are a few methods I’ve found effective when working with Llama 2. These methods are all well-integrated with Hugging Face. This list is far from exhaustive; some of these techniques can be used in combination with each other and there are plenty of others to try. - Bettertransformer (Optimum Library): Simply call `model.to_bettertransformer()` on your Hugging Face model for a modest improvement in tokens per second. - Fp4 Mixed-Precision (Bitsandbytes): Requires minimal configuration and dramat

#### Sample #2 - What is the US Capital?

In [10]:
query = "What is the capital of the United States?"

In [15]:
passages = [
    {
        "id": 1,
        "text": "Carson City is the capital city of the American state of Nevada. At the  2010 United States Census, Carson City had a population of 55,274.",
        "meta": {},
    },
    {
        "id": 2,
        "text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
        "meta": {},
    },
    {
        "id": 3,
        "text": "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
        "meta": {},
    },
    {
        "id": 4,
        "text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. ",
        "meta": {},
    },
    {
        "id": 5,
        "text": "The city of Washington, D.C. is not only known for its significant role in the United States' government but also as the capital city. It houses the President's residence, the White House, and serves as the hub for all three branches of the federal government.",
        "meta": {},
    },
    {
        "id": 6,
        "text": "The economic capital of the United States is often considered to be New York City, due to its status as the home of the New York Stock Exchange and being a major hub for financial, cultural, and business activities.",
        "meta": {},
    },
    {
        "id": 7,
        "text": "In the heart of the nation, Washington, D.C. stands out as the capital, where the U.S. Capitol Building, a symbol of the country's democracy, is located. This city is central to American politics and history, hosting numerous national landmarks.",
        "meta": {},
    },
    {
        "id": 8,
        "text": "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.",
        "meta": {},
    },
    {
        "id": 9,
        "text": "North Dakota is a state in the United States. 672,591 people lived in North Dakota in the year 2010. The capital and seat of government is Bismarck.",
        "meta": {},
    },
    {
        "id": 10,
        "text": "The debate over the capital punishment in the United States often takes center stage in Washington, D.C., where policymakers and activists gather to discuss its implications on justice and human rights.",
        "meta": {},
    },
    {
        "id": 11,
        "text": "Capital cities, including Washington, D.C., and state capitals like Olympia in Washington State, highlight the diversity of the United States. Each capital, with its unique history and culture, contributes to the rich tapestry that defines the American experience, from government operations to local heritage.",
        "meta": {},
    },
    {
        "id": 12,
        "text": "In a fictional twist, Washington, D.C., has earned the notorious title of U.S. capital of crime due to an alarming surge in various fictional criminal activities. This designation, emerging from a web of exaggerated tales and urban legends, paints the city as the epicenter of an unprecedented crime wave. From sophisticated heists and digital crimes to a rise in mysterious disappearances, the narrative transforms the city into a landscape where safety is an illusion and lawlessness reigns supreme. In this alternate version of reality, Washington, D.C., stands not as a symbol of national pride, but as the heart of crime in the United States, a stark contrast to its historical and cultural significance.",
        "meta": {},
    },
    {
        "id": 13,
        "text": "Amid the bustling streets and historic monuments of the East Coast's pride, the United States Café stands as a beacon of culinary excellence in the Capital District. This establishment, renowned for its fusion of flavors from across the nation, epitomizes the melting pot of cultures that define the American essence. Just a short walk from prominent landmarks and green spaces, the United States Café serves as a culinary capital in its own right, attracting a diverse clientele eager to partake in its unique dining experience. Celebrating the spirit of unity and innovation, it mirrors the vibrant, ever-evolving landscape of the nation, making it a must-visit for anyone exploring the heart of the city.",
        "meta": {},
    },
    {
        "id": 14,
        "text": "In an unparalleled fusion of history and modernity, the Capital Exhibition Center, located in the heart of Washington, D.C., stands as a monumental showcase of the United States' rich heritage and technological prowess. This state-of-the-art facility, situated mere blocks from the iconic National Mall, offers an immersive journey through America's pivotal moments, from the Founding Fathers' revolutionary vision to today's innovations driving the nation forward. With its vast collection of artifacts and cutting-edge interactive displays, the center illuminates the essence of the U.S. capital, celebrating its unique position at the crossroads of past achievements and future aspirations. Visitors to the Capital Exhibition Center are invited to explore a vivid tapestry of American life, encapsulated within the bustling dynamism of Washington, D.C.",
        "meta": {},
    },
    {
        "id": 15,
        "text": "Discover the Capital Gateway Park, a lush oasis at the doorstep of Washington, D.C., where the natural beauty of the United States unfolds in a tapestry of greenery and waterways. This park, while not within the city's immediate boundaries, serves as a serene counterpoint to the bustling capital, offering visitors a unique vantage point from which to reflect on the nation's history and future. With paths that wind past historical markers and art installations celebrating American innovation and spirit, the park stands as a testament to the country's resilience and diversity. The Gateway Park is more than just a greenspace; it is a living museum, a bridge between the urban expanse of the capital and the vast, wild heart of the United States.",
        "meta": {},
    },
    {
        "id": 16,
        "text": "The capital gains tax is a tax on the profit realized from the sale of a non-inventory asset in the United States. This refers to assets like stocks, bonds, and real estate.",
        "meta": {},
    },
    {
        "id": 17,
        "text": "Capital One is a bank holding company headquartered in Virginia. It specializes in credit cards, auto loans, banking, and savings products in the United States.",
        "meta": {},
    },
    {
        "id": 18,
        "text": "Washington, D.C. has long struggled with high rates of violent crime, particularly in certain neighborhoods. Despite being the nation's capital, the city has grappled with issues like gang activity and drug trafficking.",
        "meta": {},
    },
]

In [16]:
rerank_request = RerankRequest(query=query, passages=passages)
results = ranker_medium_t5.rerank(rerank_request)
print("=" * 10)
print("What???? HORRIBLE")
print("Medium T5 fails")
results

What???? HORRIBLE
Medium T5 fails


[{'id': 2,
  'text': 'The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.',
  'meta': {},
  'score': 0.59897494},
 {'id': 14,
  'text': "In an unparalleled fusion of history and modernity, the Capital Exhibition Center, located in the heart of Washington, D.C., stands as a monumental showcase of the United States' rich heritage and technological prowess. This state-of-the-art facility, situated mere blocks from the iconic National Mall, offers an immersive journey through America's pivotal moments, from the Founding Fathers' revolutionary vision to today's innovations driving the nation forward. With its vast collection of artifacts and cutting-edge interactive displays, the center illuminates the essence of the U.S. capital, celebrating its unique position at the crossroads of past achievements and future aspirations. Visitors to the Capital Exhibition Center a

In [17]:
rerank_request = RerankRequest(query=query, passages=passages)
results = ranker_small.rerank(rerank_request)
print("=" * 10)
print("Hummmmm ... The NY passage ranks TOP ... Awful")
print("At least capital punishment is ranked last")
print("Capital punishment ranks #2?")
results

Hummmmm ... The NY passage ranks TOP ... Awful
At least capital punishment is ranked last
Capital punishment ranks #2?


[{'id': 4,
  'text': 'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. ',
  'meta': {},
  'score': 0.99955183},
 {'id': 6,
  'text': 'The economic capital of the United States is often considered to be New York City, due to its status as the home of the New York Stock Exchange and being a major hub for financial, cultural, and business activities.',
  'meta': {},
  'score': 0.99947685},
 {'id': 7,
  'text': "In the heart of the nation, Washington, D.C. stands out as the capital, where the U.S. Capitol Building, a symbol of the country's democracy, is located. This city is central to American politics and history, hosting numerous national landmarks.",
  'meta': {},
  'score': 0.9970722},
 {'id': 11,
  'text': 'Capital cities, including Washington, D.C., and state capitals like Olympia in Washington State, highlight the diversity of the United States. Each capital, with i
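
Because the raw result lists are long, a compact view makes comparisons between rankers easier to read. The sketch below uses two hypothetical helpers, `summarize` and `rank_of`, that work on any list of dicts with `id`, `text`, and `score` keys. The sample reuses the top three IDs and scores from the `ranker_small` output above, with the texts shortened for brevity.

```python
def summarize(results, preview_chars=60):
    """Return one short line per result: rank, id, score, text preview."""
    lines = []
    for rank, item in enumerate(results, start=1):
        preview = item["text"][:preview_chars]
        lines.append(
            f"{rank:>2}  id={item['id']:<3} score={item['score']:.4f}  {preview}"
        )
    return lines


def rank_of(results, passage_id):
    """Return the 1-based rank of `passage_id`, or None if it is absent."""
    for rank, item in enumerate(results, start=1):
        if item["id"] == passage_id:
            return rank
    return None


# Top three entries from the ranker_small run above (texts shortened here).
sample = [
    {"id": 4, "text": "Washington, D.C. (also known as simply Washington or D.C.", "score": 0.99955183},
    {"id": 6, "text": "The economic capital of the United States is often considered", "score": 0.99947685},
    {"id": 7, "text": "In the heart of the nation, Washington, D.C. stands out", "score": 0.9970722},
]

for line in summarize(sample):
    print(line)

print("Rank of the New York passage (id 6):", rank_of(sample, 6))
print("Gap between top two scores:", sample[0]["score"] - sample[1]["score"])
```

The gap between the first and second scores is well under 0.001, which suggests the model barely separates the correct Washington, D.C. passage from the New York one.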

In [14]:
rerank_request = RerankRequest(query=query, passages=passages)
results = ranker_medium_int.rerank(rerank_request)
print("=" * 10)
print("Hummmmm ... Wasn't expecting much and the results are bad.")
results

Hummmmm ... Wasn't expecting much and the results are bad.


[{'id': 16,
  'text': 'The capital gains tax is a tax on the profit realized from the sale of a non-inventory asset in the United States. This refers to assets like stocks, bonds, and real estate.',
  'meta': {},
  'score': 0.970625},
 {'id': 6,
  'text': 'The economic capital of the United States is often considered to be New York City, due to its status as the home of the New York Stock Exchange and being a major hub for financial, cultural, and business activities.',
  'meta': {},
  'score': 0.96838903},
 {'id': 9,
  'text': 'North Dakota is a state in the United States. 672,591 people lived in North Dakota in the year 2010. The capital and seat of government is Bismarck.',
  'meta': {},
  'score': 0.9654177},
 {'id': 1,
  'text': 'Carson City is the capital city of the American state of Nevada. At the  2010 United States Census, Carson City had a population of 55,274.',
  'meta': {},
  'score': 0.963504},
 {'id': 11,
  'text': 'Capital cities, including Washington, D.C., and state 