# Why You Don't Need Re-Ranking: Superlinked Demo
## Comparing Traditional vs Superlinked Approach

This notebook accompanies the article ["Why You Don’t Need Re-Ranking: Understanding the Superlinked Vector Layer"](https://docs.google.com/document/d/13wjmFAeRcP1Fhj_Tog9qHBavev5rKnAFYbBWRmlOwQo/edit?tab=t.0).
It demonstrates:
 - Traditional re-ranking approach with hybrid scoring
 - Superlinked's unified vector layer approach
 - A side-by-side comparison of the two approaches on a product search use case

## Setup & Installation

In [None]:
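# Install required packages
# The "transformers" extra pulls in the dependencies needed by the reranker model used below
!pip install "rerankers[transformers]"
!pip install superlinked sentence-transformers pandas scikit-learn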




In [None]:
# Imports
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from rerankers import Reranker
from superlinked import framework as sl


## Dataset Overview
We'll use a simple electronics products dataset with:

*   Text descriptions
*   Numerical fields (price, rating)
*   Categorical field (product category)

In [None]:
# Sample product data
products = [
    {
        "id": "p1",
        "title": "Premium Wireless Headphones",
        "description": "High-end wireless headphones with active noise cancellation (ANC), 30hr battery. Original price $350, now discounted to $199.",
        "price": 199,
        "rating": 4.8,
        "category": "electronics"
    },
    {
        "id": "p2",
        "title": "Budget Noise-Canceling Earbuds",
        "description": "Affordable wireless earbuds with basic noise cancellation. 20hr battery. Ideal for casual use.",
        "price": 89,
        "rating": 4.2,
        "category": "electronics"
    },
    {
        "id": "p3",
        "title": "Studio-Grade ANC Headphones",
        "description": "Professional noise-canceling headphones with Hi-Res audio. Priced at $210.",
        "price": 210,
        "rating": 4.7,
        "category": "electronics"
    }
]


## Traditional Approach with Re-Ranking  
### Implementation Steps:

1. Text embedding with Sentence Transformers  
2. Neural re-ranking with mxbai model  
3. Manual hybrid scoring with metadata  
4. Post-filtering on price
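
Step 4 has a consequence worth seeing concretely: when the price filter runs after ranking, a top-k cut made earlier can leave fewer results than requested. The sketch below uses made-up relevance scores (they are not outputs of the reranker) and two hypothetical helpers, `post_filter` and `pre_filter`, to contrast the two orderings.

```python
# Hypothetical candidates: the scores are invented for illustration,
# they are not produced by the reranker used in this notebook.
candidates = [
    {"id": "p3", "price": 210, "score": 0.95},
    {"id": "p1", "price": 199, "score": 0.90},
    {"id": "p2", "price": 89, "score": 0.70},
]


def post_filter(items, k, max_price):
    """Rank first, keep the top k, then drop items over the price limit."""
    top_k = sorted(items, key=lambda c: c["score"], reverse=True)[:k]
    return [c["id"] for c in top_k if c["price"] <= max_price]


def pre_filter(items, k, max_price):
    """Drop items over the price limit first, then rank and keep the top k."""
    eligible = [c for c in items if c["price"] <= max_price]
    ranked = sorted(eligible, key=lambda c: c["score"], reverse=True)
    return [c["id"] for c in ranked[:k]]


post_ids = post_filter(candidates, k=2, max_price=200)
pre_ids = pre_filter(candidates, k=2, max_price=200)
print(post_ids)  # ['p1']
print(pre_ids)   # ['p1', 'p2']
```

The notebook sidesteps this shortfall by reranking every product before filtering, which is only practical for a small catalog.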


In [None]:
# User query and embedding with SentenceTransformer
query_text = "Find affordable wireless headphones with noise cancellation under $200 and high ratings"

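# Embed the product descriptions (these vectors would feed a first-stage retrieval step;
# the reranker below scores the raw text directly and does not use them)
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
embeddings = model.encode([p["description"] for p in products])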
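
In a typical two-stage pipeline, description vectors like `embeddings` drive a first-stage retrieval by cosine similarity against the embedded query, and only the retrieved candidates go to the reranker. A minimal sketch of that step follows. It uses small made-up vectors in place of real model output so that it runs without downloading a model; the vectors and the `cosine` helper are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional stand-ins for model.encode(...) output.
doc_vectors = {
    "p1": [0.9, 0.1, 0.2],
    "p2": [0.7, 0.6, 0.1],
    "p3": [0.2, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.1]

# Rank document ids by similarity to the query, best first.
ranked = sorted(
    doc_vectors,
    key=lambda doc_id: cosine(query_vector, doc_vectors[doc_id]),
    reverse=True,
)
print(ranked)  # ['p1', 'p2', 'p3']
```

With real data, `query_vector` would come from `model.encode([query_text])[0]`, and the already imported `cosine_similarity` from scikit-learn computes the same quantity for whole matrices in one call.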


In [None]:
# Initialize reranker
ranker = Reranker("mixedbread-ai/mxbai-rerank-large-v1")
reranked = ranker.rank(
    query=query_text,
    docs=[p["description"] for p in products],
    doc_ids=[p["id"] for p in products]
)


Loading TransformerRanker model mixedbread-ai/mxbai-rerank-large-v1 (this message can be suppressed by setting verbose=0)
No device set
Using device cpu
No dtype set
Using dtype torch.float32
Loaded model mixedbread-ai/mxbai-rerank-large-v1
Using device cpu.
Using dtype torch.float32.


In [None]:
# Reranker-based hybrid ranking with hard price filtering
def process_traditional_results(reranked_output, max_price=200):
    results = []
    for doc in reranked_output.top_k(len(products)):
        product = next(p for p in products if p["id"] == doc.doc_id)

        # Hard filter on price
        if product["price"] > max_price:
            continue

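        # Combine reranker score with rating.
        # doc.score is a raw model score that can exceed 1 (see the output below),
        # so it outweighs the 0-1 rating term by more than the 0.6/0.4 split suggests.
        combined_score = (doc.score * 0.6) + (product["rating"] / 5 * 0.4)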

        results.append({
            "title": product["title"],
            "price": product["price"],
            "rating": product["rating"],
            "reranker_score": doc.score,
            "final_score": combined_score
        })

    return pd.DataFrame(results).sort_values("final_score", ascending=False)

# Display traditional results
traditional_df = process_traditional_results(reranked)
print("\nTraditional Approach Results:")
display(traditional_df)



Traditional Approach Results:


|   | title | price | rating | reranker_score | final_score |
|---|-------|-------|--------|----------------|-------------|
| 0 | Premium Wireless Headphones | 199 | 4.8 | 2.262444 | 1.741466 |
| 1 | Budget Noise-Canceling Earbuds | 89 | 4.2 | 1.319153 | 1.127492 |
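
The `final_score` values follow directly from the blend in `process_traditional_results`. The sketch below reproduces both of them. As a hypothetical variation that is not part of the original notebook, it also squashes the reranker score through a sigmoid first, so that both terms sit on a 0 to 1 scale before they are weighted.

```python
import math


def blend(reranker_score, rating, w_text=0.6, w_rating=0.4):
    """Same formula as the notebook: weighted sum of raw score and rating / 5."""
    return reranker_score * w_text + (rating / 5) * w_rating


def blend_normalized(reranker_score, rating, w_text=0.6, w_rating=0.4):
    """Hypothetical variant: map the raw score into (0, 1) with a sigmoid first."""
    squashed = 1 / (1 + math.exp(-reranker_score))
    return squashed * w_text + (rating / 5) * w_rating


# (title, reranker_score, rating) taken from the results shown above.
rows = [
    ("Premium Wireless Headphones", 2.262444, 4.8),
    ("Budget Noise-Canceling Earbuds", 1.319153, 4.2),
]

for title, score, rating in rows:
    raw = round(blend(score, rating), 6)
    normalized = round(blend_normalized(score, rating), 4)
    print(title, raw, normalized)
```

In the raw blend, the reranker term contributes about 1.36 of the 1.74 total for the first row, while the rating term can never contribute more than 0.4.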


---

## Superlinked Unified Approach

Now let's implement the same product search using Superlinked's unified vector layer.

Unlike the traditional approach:
- We **don’t manually embed** product descriptions.
- We **don’t re-rank** results post-search.
- All relevant features (text, price, rating, category) are encoded into **a unified index**.

Superlinked handles embedding automatically at ingestion (`source.put(...)`), using the similarity spaces defined over the schema fields.


In [None]:
# Define product schema for Superlinked
@sl.schema
class Product:
    id: sl.IdField
    title: sl.String
    description: sl.String
    price: sl.Integer
    rating: sl.Float
    category: sl.String

product = Product()

# Define similarity spaces
text_space = sl.TextSimilaritySpace(
    text=product.description,
    model="sentence-transformers/all-mpnet-base-v2"
)

price_space = sl.NumberSpace(
    number=product.price,
    mode=sl.Mode.MINIMUM,
    min_value=0,
    max_value=500
)

rating_space = sl.NumberSpace(
    number=product.rating,
    mode=sl.Mode.MAXIMUM,
    min_value=0,
    max_value=5
)

category_space = sl.CategoricalSimilaritySpace(
    category_input=product.category,
    categories=["electronics", "fashion", "home", "sports", "books"],
    negative_filter=-1.0,
    uncategorized_as_category=False
)
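
Conceptually, a number space with `Mode.MINIMUM` favors values near `min_value`, and one with `Mode.MAXIMUM` favors values near `max_value`. The sketch below only illustrates that preference on a 0 to 1 scale. It is not Superlinked's actual encoding, and `preference` is a hypothetical helper introduced for this example.

```python
def preference(value, min_value, max_value, prefer="max"):
    """Map a number to [0, 1], where 1 is the most preferred end of the range.

    Illustrative only: this is not how Superlinked encodes numbers internally.
    """
    clipped = min(max(value, min_value), max_value)
    fraction = (clipped - min_value) / (max_value - min_value)
    return fraction if prefer == "max" else 1 - fraction


# Price: lower is better, range 0-500 as in price_space.
for price in (89, 199, 210):
    print(price, round(preference(price, 0, 500, prefer="min"), 3))

# Rating: higher is better, range 0-5 as in rating_space.
for rating in (4.2, 4.7, 4.8):
    print(rating, round(preference(rating, 0, 5, prefer="max"), 3))
```

Values outside the declared range are clipped here, which is one reason to choose `min_value` and `max_value` so that they cover the data.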


In [None]:
# Create index with filterable fields
product_index = sl.Index(
    [text_space, price_space, rating_space, category_space],
    fields=[product.category, product.price]
)

# Load source and run executor
source = sl.InMemorySource(product)
executor = sl.InMemoryExecutor(sources=[source], indices=[product_index])
app = executor.run()
source.put(products)


In [None]:
# Unified multimodal query with query-time weights per space and hard filters
query = (
    sl.Query(product_index, weights={
        text_space: 0.5,
        price_space: 0.3,
        rating_space: 0.2
    })
    .find(product)
    .similar(text_space.text, query_text)
    .filter(product.category == "electronics")
    .filter(product.price <= 200)
    .select_all()
)

# Execute query
result = app.query(query)

# Display results as DataFrame
sl.PandasConverter.to_pandas(result)


|   | title | description | price | rating | category | id | similarity_score |
|---|-------|-------------|-------|--------|----------|----|------------------|
| 0 | Premium Wireless Headphones | High-end wireless headphones with active noise... | 199 | 4.8 | electronics | p1 | 0.731401 |
| 1 | Budget Noise-Canceling Earbuds | Affordable wireless earbuds with basic noise c... | 89 | 4.2 | electronics | p2 | 0.731056 |


## Key Advantages Demonstrated:

1. **Single unified index** combining text, numbers and categories  
2. **Pre-search filtering** eliminates irrelevant results early  
3. **Dynamic weighting** adjusts importance without re-embedding  
4. **Simpler code structure** with native multimodal support
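
Point 3 rests on a simple property of concatenated vectors. If each space contributes its own unit-length segment, scaling the query's segments by per-space weights turns a single dot product into a weighted sum of per-space similarities, so the stored item vectors never change. The sketch below illustrates the idea with made-up two-dimensional segments. It is a conceptual model rather than Superlinked's internal implementation, and every name in it is hypothetical.

```python
import math


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Made-up per-space segments for one stored item and one query.
item = {"text": unit([0.8, 0.6]), "price": unit([0.3, 0.95]), "rating": unit([0.9, 0.4])}
query = {"text": unit([1.0, 0.5]), "price": unit([0.0, 1.0]), "rating": unit([1.0, 0.0])}

# The item is stored once, as a plain concatenation of its segments.
spaces = ["text", "price", "rating"]
item_vector = [x for s in spaces for x in item[s]]


def score(weights):
    """Weight only the query side, then take one dot product with the stored vector."""
    query_vector = [weights[s] * x for s in spaces for x in query[s]]
    return dot(query_vector, item_vector)


text_heavy = score({"text": 0.5, "price": 0.3, "rating": 0.2})
price_heavy = score({"text": 0.2, "price": 0.6, "rating": 0.2})
print(round(text_heavy, 4), round(price_heavy, 4))
```

Changing the weights changes the score while `item_vector` stays untouched, which is why reweighting requires no re-embedding in this model.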
