[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/advanced_techniques/instruction-following-reranking.ipynb)

[![View Article](https://img.shields.io/badge/View%20Article-blue)](https://www.mongodb.com/developer/products/atlas/parent-doc-retrieval/?utm_campaign=devrel&utm_source=cross-post&utm_medium=organic_social&utm_content=https%3A%2F%2Fgithub.com%2Fmongodb-developer%2FGenAI-Showcase&utm_term=apoorva.joshi)

# Using Instruction-following Reranking in your AI Applications

This notebook shows you how to implement instruction-following reranking in your AI applications using Voyage AI's `rerank-2.5` model.

## Step 1: Install required libraries

- **voyageai**: Python library to interact with Voyage AI's APIs

## Step 2: Set up prerequisites

Set the Voyage API key as an environment variable, and initialize the Voyage AI client.

To obtain a Voyage AI API key, follow the steps in the [Voyage AI documentation](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys).

In [None]:
import getpass
import os

import voyageai

In [None]:
os.environ["VOYAGE_API_KEY"] = getpass.getpass("Voyage API Key:")
vo = voyageai.Client()
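
When the notebook is re-run, the key may already be present in the environment. A minimal sketch of a helper that prompts only when the variable is missing; `ensure_api_key` is a hypothetical name, not part of the `voyageai` library:

```python
import getpass
import os


def ensure_api_key(name="VOYAGE_API_KEY", prompt=getpass.getpass):
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(name)
    if not key:
        # Ask the user once and store the value for later cells
        key = prompt(f"{name}: ")
        os.environ[name] = key
    return key
```

Calling `ensure_api_key()` before `voyageai.Client()` keeps the rest of the setup unchanged, since the client above is created without arguments and relies on the environment variable.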

## Scenario 1: Incorporating implicit business logic

In [27]:
query = "How to treat migraines?"

In [28]:
# Documents retrieved from search
documents = [
    "Source: peer_reviewed_journal, A randomized controlled trial of 600 patients found topiramate reduced migraine frequency by 50% compared to placebo, with common side effects including cognitive slowing.",
    "Source: general_website, Migraine prevention includes lifestyle changes and medications. Common preventive drugs include beta-blockers and antidepressants. Talk to your doctor about which option is right for you.",
    "Source: forum, I started taking magnesium and B2 supplements for my migraines and they've totally disappeared! Haven't needed my prescription in months.",
    "Source: peer_reviewed_journal, Systematic review of 42 studies shows CGRP monoclonal antibodies reduce monthly migraine days by 4-6 days on average, with better tolerability profiles than traditional preventives.",
    "Source: general_website, 10 natural remedies for migraines that actually work! From essential oils to ice packs, these home treatments can help you avoid medication side effects.",
    "Source: healthcare_provider, Preventive migraine medications work by reducing the frequency and severity of attacks. Options include daily pills, monthly injections, or quarterly infusions depending on your needs.",
    "Source: peer_reviewed_journal, Meta-analysis of acupuncture trials demonstrates modest benefit for migraine prevention, with effect size comparable to some pharmacological interventions and minimal adverse events.",
]

### With instruction-following

In [None]:
instructions = "Prioritize peer-reviewed journals, followed by advice from healthcare providers, then general websites and finally forums."

In [30]:
# Use the rerank method to rerank retrieved results
reranking = vo.rerank(
    query=f"{instructions}\nQuery: {query}",
    documents=documents,
    model="rerank-2.5",
    top_k=3,
)

# Print the reranking results
for r in reranking.results:
    print(f"Document: {r.document}")
    print(f"Score: {r.relevance_score}")
    print()

Document: Source: peer_reviewed_journal, A randomized controlled trial of 600 patients found topiramate reduced migraine frequency by 50% compared to placebo, with common side effects including cognitive slowing.
Score: 0.8203125

Document: Source: peer_reviewed_journal, Systematic review of 42 studies shows CGRP monoclonal antibodies reduce monthly migraine days by 4-6 days on average, with better tolerability profiles than traditional preventives.
Score: 0.8203125

Document: Source: peer_reviewed_journal, Meta-analysis of acupuncture trials demonstrates modest benefit for migraine prevention, with effect size comparable to some pharmacological interventions and minimal adverse events.
Score: 0.77734375
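
The instructions reach the reranker only as text prepended to the query string. A minimal sketch of a helper that centralizes that formatting, using the same `"{instructions}\nQuery: {query}"` layout as the cell above; `build_query` is a hypothetical name, not a `voyageai` function:

```python
from typing import Optional


def build_query(query: str, instructions: Optional[str] = None) -> str:
    """Prepend reranking instructions to the query, if any are given."""
    if not instructions:
        # No instructions: send the plain query
        return query
    return f"{instructions}\nQuery: {query}"


print(build_query("How to treat migraines?", "Prioritize peer-reviewed journals."))
```

The result can be passed as the `query` argument of `vo.rerank`, so the calls with and without instructions differ only in whether `instructions` is supplied.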



### Without instruction-following

In [31]:
# Use the rerank method to rerank retrieved results
reranking = vo.rerank(query=query, documents=documents, model="rerank-2.5", top_k=3)

# Print the reranking results
for r in reranking.results:
    print(f"Document: {r.document}")
    print(f"Score: {r.relevance_score}")
    print()

Document: Source: forum, I started taking magnesium and B2 supplements for my migraines and they've totally disappeared! Haven't needed my prescription in months.
Score: 0.75390625

Document: Source: general_website, 10 natural remedies for migraines that actually work! From essential oils to ice packs, these home treatments can help you avoid medication side effects.
Score: 0.75

Document: Source: healthcare_provider, Preventive migraine medications work by reducing the frequency and severity of attacks. Options include daily pills, monthly injections, or quarterly infusions depending on your needs.
Score: 0.75
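
One way to check whether a ranking respects the source priority is to read the `Source:` label off each returned document. A minimal sketch, assuming every document keeps the `Source: <label>, <text>` prefix used above; `source_label` and `follows_priority` are hypothetical helper names, and the document strings are shortened for illustration:

```python
# Priority order taken from the instructions: journals first, forums last
PRIORITY = ["peer_reviewed_journal", "healthcare_provider", "general_website", "forum"]


def source_label(document: str) -> str:
    """Extract the label from a 'Source: <label>, <text>' string."""
    prefix, _, _ = document.partition(",")
    _, _, label = prefix.partition(":")
    return label.strip()


def follows_priority(ranked_documents) -> bool:
    """True if no lower-priority source is ranked above a higher-priority one."""
    ranks = [PRIORITY.index(source_label(d)) for d in ranked_documents]
    return ranks == sorted(ranks)


# Shortened versions of the top-3 results without instructions
without_instructions = [
    "Source: forum, I started taking magnesium and B2 supplements",
    "Source: general_website, 10 natural remedies for migraines",
    "Source: healthcare_provider, Preventive migraine medications",
]
print(follows_priority(without_instructions))
```

With the real results, the same check would be `follows_priority([r.document for r in reranking.results])`.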



## Scenario 2: Handling different types of queries

In [32]:
query = "My query is not using the index I created"

In [33]:
# Documents retrieved from search
documents = [
    # Beginner/Conceptual content about indexes
    "What are MongoDB indexes? Indexes are special data structures that store a small portion of your data in an easy-to-traverse form. Like a book's index helps you find topics without reading every page, MongoDB indexes help find documents without scanning the entire collection.",
    "How indexes improve query performance: Without an index, MongoDB must scan every document in a collection (a collection scan) to find matches. With an index, MongoDB can limit the number of documents it must inspect. For a collection with millions of documents, this means the difference between a query taking seconds versus milliseconds.",
    "Understanding index types: MongoDB supports several index types. Single field indexes are the simplest, indexing one field. Compound indexes index multiple fields together and are useful when queries filter on multiple fields. Text indexes enable text search capabilities, while geospatial indexes support location-based queries.",
    "The tradeoffs of indexing: While indexes speed up read operations, they slow down writes because MongoDB must update indexes whenever documents change. Indexes also consume disk space and memory. It's important to create indexes strategically for your most common queries rather than indexing every field.",
    # Implementation content about indexes
    "Creating a single field index: ```javascript\ndb.users.createIndex({ email: 1 })\n// 1 for ascending, -1 for descending\ndb.products.createIndex({ price: -1 })``` The sort order matters for sorting queries but not for equality matches.",
    "Building compound indexes: ```javascript\ndb.orders.createIndex({ user_id: 1, created_at: -1 })\n// This supports queries on user_id alone, or user_id + created_at\n// But NOT queries on created_at alone``` Field order matters - most selective field should come first.",
    "Creating unique indexes: ```javascript\ndb.users.createIndex({ email: 1 }, { unique: true })\n// Prevents duplicate email addresses\ndb.sessions.createIndex({ token: 1 }, { unique: true, sparse: true })``` Sparse indexes only include documents with the indexed field.",
    "Adding indexes with options: ```javascript\ndb.logs.createIndex(\n  { created_at: 1 },\n  { expireAfterSeconds: 86400 } // TTL index, auto-deletes after 24 hours\n)\ndb.large_collection.createIndex({ status: 1 }, { background: true })``` Background builds don't block database operations.",
    "Creating text indexes for search: ```javascript\ndb.articles.createIndex({ title: 'text', content: 'text' })\n// Query with:\ndb.articles.find({ $text: { $search: 'mongodb performance' } }).sort({ score: { $meta: 'textScore' } })``` Only one text index per collection allowed.",
    # Troubleshooting content about indexes
    "Query not using an index? Use explain() to diagnose: ```javascript\ndb.users.find({ email: 'user@example.com' }).explain('executionStats')``` Look for 'IXSCAN' (good) vs 'COLLSCAN' (bad). If you see COLLSCAN, create an index on the queried field.",
    "Index build failing or taking too long? Quick fixes: Use {background: true} option to avoid blocking writes. For large collections, build indexes during low-traffic periods. On replica sets, use rolling index builds: build on secondaries first, then step down primary and build there.",
    "Too many indexes slowing down writes? Solution: Audit your indexes with db.collection.getIndexes() and remove unused ones. Use MongoDB's index usage stats: ```javascript\ndb.collection.aggregate([{ $indexStats: {} }])``` Drop indexes with zero or minimal 'ops' count.",
    "Index taking too much memory? Check index size with db.collection.stats() and look at 'indexSizes'. Solutions: 1) Drop unused indexes 2) Use partial indexes to index only relevant documents: ```javascript\ndb.orders.createIndex({ status: 1 }, { partialFilterExpression: { status: { $ne: 'archived' } } })```",
    "Compound index not being used? Verify your query matches the index prefix. An index on {a: 1, b: 1, c: 1} supports queries on 'a', 'a+b', and 'a+b+c', but NOT 'b' or 'c' alone. Reorder index fields or create additional indexes for different query patterns.",
]

### With instruction-following

In [None]:
# Say the query has been classified as "troubleshooting"
query_type = "troubleshooting"
# Generate instructions based on query type
if query_type == "concepts":
    instructions = "Prioritize explanatory content."
elif query_type == "implementation":
    instructions = "Prioritize code examples."
elif query_type == "troubleshooting":
    instructions = "Prioritize quickfixes and step-by-step debugging instructions."
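
The `if`/`elif` chain leaves `instructions` undefined when the query type is not one of the three listed. A minimal sketch of an alternative that uses a dictionary lookup with a fallback; `INSTRUCTIONS_BY_QUERY_TYPE` and `instructions_for` are hypothetical names, and the instruction strings are copied from the cell above:

```python
INSTRUCTIONS_BY_QUERY_TYPE = {
    "concepts": "Prioritize explanatory content.",
    "implementation": "Prioritize code examples.",
    "troubleshooting": "Prioritize quickfixes and step-by-step debugging instructions.",
}


def instructions_for(query_type: str) -> str:
    """Return the instructions for a query type, or an empty string if unknown."""
    return INSTRUCTIONS_BY_QUERY_TYPE.get(query_type, "")


print(instructions_for("troubleshooting"))
```

When the lookup returns an empty string, the query can be sent to the reranker unchanged instead of being prefixed.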

In [35]:
# Use the rerank method to rerank retrieved results
reranking = vo.rerank(
    query=f"{instructions}\nQuery: {query}",
    documents=documents,
    model="rerank-2.5",
    top_k=3,
)

# Print the reranking results
for r in reranking.results:
    print(f"Document: {r.document}")
    print(f"Score: {r.relevance_score}")
    print()

Document: Query not using an index? Use explain() to diagnose: ```javascript
db.users.find({ email: 'user@example.com' }).explain('executionStats')``` Look for 'IXSCAN' (good) vs 'COLLSCAN' (bad). If you see COLLSCAN, create an index on the queried field.
Score: 0.75

Document: Compound index not being used? Verify your query matches the index prefix. An index on {a: 1, b: 1, c: 1} supports queries on 'a', 'a+b', and 'a+b+c', but NOT 'b' or 'c' alone. Reorder index fields or create additional indexes for different query patterns.
Score: 0.67578125

Document: Index build failing or taking too long? Quick fixes: Use {background: true} option to avoid blocking writes. For large collections, build indexes during low-traffic periods. On replica sets, use rolling index builds: build on secondaries first, then step down primary and build there.
Score: 0.546875



### Without instruction-following

In [36]:
# Use the rerank method to rerank retrieved results
reranking = vo.rerank(query=query, documents=documents, model="rerank-2.5", top_k=3)

# Print the reranking results
for r in reranking.results:
    print(f"Document: {r.document}")
    print(f"Score: {r.relevance_score}")
    print()

Document: Query not using an index? Use explain() to diagnose: ```javascript
db.users.find({ email: 'user@example.com' }).explain('executionStats')``` Look for 'IXSCAN' (good) vs 'COLLSCAN' (bad). If you see COLLSCAN, create an index on the queried field.
Score: 0.80078125

Document: Compound index not being used? Verify your query matches the index prefix. An index on {a: 1, b: 1, c: 1} supports queries on 'a', 'a+b', and 'a+b+c', but NOT 'b' or 'c' alone. Reorder index fields or create additional indexes for different query patterns.
Score: 0.76953125

Document: Too many indexes slowing down writes? Solution: Audit your indexes with db.collection.getIndexes() and remove unused ones. Use MongoDB's index usage stats: ```javascript
db.collection.aggregate([{ $indexStats: {} }])``` Drop indexes with zero or minimal 'ops' count.
Score: 0.6171875
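
In this scenario the two top-3 lists share their first two documents and differ only in the third. A minimal sketch for surfacing such differences programmatically; `compare_rankings` is a hypothetical helper, and the document strings are shortened for illustration:

```python
def compare_rankings(with_instructions, without_instructions):
    """Return the documents unique to each ranking, preserving rank order."""
    only_with = [d for d in with_instructions if d not in without_instructions]
    only_without = [d for d in without_instructions if d not in with_instructions]
    return only_with, only_without


# Shortened versions of the two top-3 lists from this scenario
top_with = [
    "Query not using an index?",
    "Compound index not being used?",
    "Index build failing or taking too long?",
]
top_without = [
    "Query not using an index?",
    "Compound index not being used?",
    "Too many indexes slowing down writes?",
]

only_with, only_without = compare_rankings(top_with, top_without)
print("Only with instructions:", only_with)
print("Only without instructions:", only_without)
```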



## Scenario 3: Managing long-term memories and state

In [37]:
query = "Vacation ideas in March for me and my husband"

In [38]:
# Documents retrieved from semantic search, formatted as dated memory entries
documents = [
    # Recent luxury bookings (2025)
    "Date: 2025-09-15, Booked Ritz-Carlton Maui, oceanfront suite, $850/night. Anniversary trip with husband David.",
    "Date: 2025-09-20, Complained about noise from pool area at Ritz-Carlton Maui. Requested quiet rooms for future stays.",
    "Date: 2025-07-10, Booked Four Seasons Bora Bora, overwater bungalow, $1200/night. Solo work retreat.",
    # Critical constraint
    "Date: 2025-06-01, Customer has severe shellfish allergy. Avoid destinations where seafood is primary cuisine.",
    # Recent preferences
    "Date: 2025-09-16, Prefers hotels with spa facilities. Mentioned spa treatments are essential for relaxation.",
    "Date: 2025-08-30, Achieved Marriott Bonvoy Gold status. Prefers Marriott properties for loyalty benefits.",
    "Date: 2025-08-05, Inquired about beach destinations in Mexico and Caribbean for March 2025. Budget: up to $1000/night.",
    # General preference
    "Date: 2024-01-15, Prefers warm beach destinations year-round. Enjoys water activities and relaxation.",
    # Older budget bookings (2022-2023)
    "Date: 2023-11-05, Booked Holiday Inn Orlando, $120/night. Family trip with two children.",
    "Date: 2023-03-10, Booked Hampton Inn Chicago, $95/night. Budget business trip.",
    "Date: 2022-12-01, Booked Motel 6 Las Vegas, $65/night. Road trip with friends.",
    "Date: 2022-06-15, Mentioned preferring budget accommodations to save money for activities.",
]

### With instruction-following

In [39]:
instructions = "Prioritize recent booking patterns (last 12 months), safety concerns, and dietary restrictions."

In [40]:
reranking = vo.rerank(
    query=f"{instructions}\nQuery: {query}",
    documents=documents,
    model="rerank-2.5",
    top_k=3,
)

for r in reranking.results:
    print(f"Document: {r.document}")
    print(f"Relevance Score: {r.relevance_score}")
    print()

Document: Date: 2025-08-05, Inquired about beach destinations in Mexico and Caribbean for March 2025. Budget: up to $1000/night.
Relevance Score: 0.5078125

Document: Date: 2025-06-01, Customer has severe shellfish allergy. Avoid destinations where seafood is primary cuisine.
Relevance Score: 0.404296875

Document: Date: 2025-09-15, Booked Ritz-Carlton Maui, oceanfront suite, $850/night. Anniversary trip with husband David.
Relevance Score: 0.404296875



### Without instruction-following

In [41]:
reranking = vo.rerank(
    query=query,
    documents=documents,
    model="rerank-2.5",
    top_k=3,
)

for r in reranking.results:
    print(f"Document: {r.document}")
    print(f"Relevance Score: {r.relevance_score}")
    print()

Document: Date: 2025-08-05, Inquired about beach destinations in Mexico and Caribbean for March 2025. Budget: up to $1000/night.
Relevance Score: 0.6796875

Document: Date: 2024-01-15, Prefers warm beach destinations year-round. Enjoys water activities and relaxation.
Relevance Score: 0.4765625

Document: Date: 2025-09-15, Booked Ritz-Carlton Maui, oceanfront suite, $850/night. Anniversary trip with husband David.
Relevance Score: 0.47265625
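
The instructions in this scenario refer to the last 12 months, and each memory starts with a `Date: YYYY-MM-DD` field, so the returned memories can be checked against that window. A minimal sketch, assuming a hypothetical reference date of 2025-10-01 (the notebook does not state when it was run) and approximating 12 months as 365 days; `memory_date` and `is_recent` are illustrative helper names:

```python
from datetime import date


def memory_date(document: str) -> date:
    """Parse the leading 'Date: YYYY-MM-DD' field of a memory string."""
    prefix, _, _ = document.partition(",")
    _, _, value = prefix.partition(":")
    return date.fromisoformat(value.strip())


def is_recent(document: str, today: date, days: int = 365) -> bool:
    """True if the memory is dated within `days` days before `today`."""
    return (today - memory_date(document)).days <= days


today = date(2025, 10, 1)  # assumed reference date, not from the notebook

# Shortened versions of the top-3 results without instructions
top_without = [
    "Date: 2025-08-05, Inquired about beach destinations",
    "Date: 2024-01-15, Prefers warm beach destinations year-round",
    "Date: 2025-09-15, Booked Ritz-Carlton Maui",
]
for doc in top_without:
    print(memory_date(doc), is_recent(doc, today))
```

With the real results, the same check would run over `[r.document for r in reranking.results]`.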

