The goal of this second lesson is to get familiar with [Superlinked](https://superlinked.com/) — a powerful framework for building multi-attribute vector search systems — which lets us handle complex, nuanced queries that go far beyond simple keyword matching or text-only embeddings.

> We'll use Superlinked to power the property search tool in our voice agent, enabling natural language queries like "I'm looking for a spacious apartment in downtown with at least 3 bedrooms under 500k"!

In this lesson, we'll teach you the fundamentals of Superlinked, starting with why traditional vector search falls short, and progressively building a complete property search system that combines semantic understanding with structured filters.

![Superlinked Logo](./img/sl_diagram.png)

## Why Superlinked?

Traditional vector search handles text well but struggles with real-world queries that mix **semantic meaning**, **numerical constraints**, and **categorical filters**—such as real estate searches involving preferences like "spacious", numerical limits like "under €300,000", and categorical factors like location.

Common workarounds fail because:

* **Stringifying all data** loses numerical relationships (an embedding can't tell that €449k is close to €450k).

* **Metadata filters** create hard cutoffs that don’t reflect smooth user preferences.

* **Multiple searches** and **RAG fusion** miss attribute interactions.

* **Re-ranking** only works if relevant items appear in the initial retrieval, which often isn't the case.
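To make the hard-cutoff problem concrete, here is a minimal sketch in plain Python. It does not use Superlinked; the listings, the `price_closeness` helper, and its `scale` parameter are all made up for illustration. It shows a metadata filter dropping a listing that is only €1,000 over budget, while a smooth score keeps it near the top.

```python
# Toy listings with made-up prices (not taken from the lesson's dataset).
listings = [
    {"id": "a", "price": 449_000},
    {"id": "b", "price": 451_000},
    {"id": "c", "price": 900_000},
]
budget = 450_000


def price_closeness(price: int, target: int, scale: int = 100_000) -> float:
    """Return a score in (0, 1]: 1.0 at the target, decaying smoothly with distance."""
    return 1.0 / (1.0 + abs(price - target) / scale)


# Hard metadata filter: anything above the budget disappears entirely.
hard_matches = [item["id"] for item in listings if item["price"] <= budget]

# Smooth preference: every listing stays in play, ranked by closeness to the budget.
soft_ranking = sorted(
    listings,
    key=lambda item: price_closeness(item["price"], budget),
    reverse=True,
)
soft_ids = [item["id"] for item in soft_ranking]

print(hard_matches)  # ['a']
print(soft_ids)      # ['a', 'b', 'c']
```

Listing `b` is almost identical to `a` in price, yet the hard filter treats it the same as the far more expensive `c`.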

![Superlinked Diagram](./img/why-superlinked-filterreranking.avif)

Superlinked solves this by encoding all features—text, numbers, categories, and time—into a single unified vector space using specialized encoders (`TextSimilaritySpace`, `NumberSpace`, `CategorySpace`, `RecencySpace`).

At query time, each attribute can be weighted dynamically, allowing systems (like voice agents) to reflect what the user cares about most and retrieve better multi-attribute matches.
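The idea of query-time weighting can be sketched conceptually as follows. This is an illustration of the general technique (concatenating per-attribute vectors and scaling each segment by a weight), not Superlinked's internal implementation; the vectors and the `normalize`, `combine`, `dot`, and `score` helpers are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def combine(parts, weights):
    """Concatenate per-attribute unit vectors, scaling each segment by its weight."""
    combined = []
    for name in sorted(parts):
        combined.extend(weights[name] * x for x in normalize(parts[name]))
    return combined


def dot(a, b):
    """Plain dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


# Made-up 2-D vectors standing in for each attribute's embedding.
item = {"description": [0.6, 0.8], "size": [1.0, 0.0], "price": [0.0, 1.0]}
query = {"description": [0.8, 0.6], "size": [1.0, 0.0], "price": [1.0, 0.0]}

# The item is stored once with neutral weights; only the query side is re-weighted.
item_vector = combine(item, {"description": 1.0, "size": 1.0, "price": 1.0})


def score(weights):
    """Similarity between the weighted query vector and the stored item vector."""
    return dot(combine(query, weights), item_vector)


text_only = score({"description": 1.0, "size": 0.0, "price": 0.0})
size_heavy = score({"description": 0.5, "size": 1.0, "price": 0.0})
print(round(text_only, 2), round(size_heavy, 2))
```

Because each segment is a unit vector, the combined score is a weighted sum of per-attribute similarities, so changing the weights changes the ranking without re-embedding any item.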

## Superlinked 101

---

**⚠️  IMPORTANT!  ⚠️**

Before moving forward, make sure you have:

1. Set up your `.env` file with your API keys
2. Installed the project dependencies
3. Installed the project's Python library with `uv pip install -U -e .`

---

In [None]:
from dotenv import load_dotenv

load_dotenv()

### Concept 1: The Schema

The first step in building any Superlinked application is defining your **Schema** — a structured representation of your data that tells Superlinked what fields exist and what types they are.

For our real estate use case, we'll define a `Property` schema with:

- `id`: Unique identifier (required by Superlinked)
- `description`: Free-text description of the property
- `baths`: Number of bathrooms (integer)
- `rooms`: Number of rooms (integer)
- `sqft`: Size in square meters, despite the field name (integer)
- `location`: Neighborhood name (string)
- `price`: Price in euros (integer)

Let's define it.

In [None]:
from superlinked import framework as sl

class Property(sl.Schema):
    """Schema for real estate properties."""
    id: sl.IdField
    description: sl.String
    baths: sl.Integer
    rooms: sl.Integer
    sqft: sl.Integer
    location: sl.String
    price: sl.Integer


# Create an instance of the schema
property_schema = Property()

### Concept 2: Vector Spaces

Now comes the magic! We'll create **Spaces** that define how different attributes should be embedded into vectors.

For our property search, we'll create three spaces:

1. **TextSimilaritySpace** for `description`: This uses a sentence transformer model to create semantic embeddings. When someone searches for "modern minimalist design", it will match properties described with similar concepts even if the exact words differ.

2. **NumberSpace** for `sqft` (MAXIMUM mode): Larger is better. A search for "spacious" will favor larger apartments.

3. **NumberSpace** for `price` (MINIMUM mode): Lower is better. A search mentioning "affordable" will favor cheaper options.


The `mode` parameter is crucial:
- `Mode.MAXIMUM`: Higher values are preferred (great for size, ratings, popularity)
- `Mode.MINIMUM`: Lower values are preferred (great for price, distance)
- `Mode.SIMILAR`: Values closer to a target are preferred
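As a rough intuition for the three modes, the sketch below scores a number after min-max normalization. It is a simplified stand-in written in plain Python, not the encoding `NumberSpace` actually uses; `number_score` and its string `mode` values are hypothetical.

```python
def number_score(value, min_value, max_value, mode, target=None):
    """Toy preference score in [0, 1] for a number under a given mode."""

    def scaled(x):
        # Clamp into [min_value, max_value], then map linearly onto [0, 1].
        clamped = min(max(x, min_value), max_value)
        return (clamped - min_value) / (max_value - min_value)

    if mode == "maximum":  # higher values are preferred
        return scaled(value)
    if mode == "minimum":  # lower values are preferred
        return 1.0 - scaled(value)
    if mode == "similar":  # values closer to the target are preferred
        if target is None:
            raise ValueError("mode='similar' needs a target value")
        return 1.0 - abs(scaled(value) - scaled(target))
    raise ValueError(f"unknown mode: {mode}")


# Size between 20 and 500: the largest apartment scores highest.
print(number_score(500, 20, 500, "maximum"))  # 1.0
# Price between 100,000 and 10,000,000: the cheapest property scores highest.
print(number_score(100_000, 100_000, 10_000_000, "minimum"))  # 1.0
# A value equal to the target is a perfect match.
print(number_score(260, 20, 500, "similar", target=260))  # 1.0
```

The `min_value` and `max_value` bounds matter in this picture: values outside the range are clamped, so they all look the same to the scorer.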

In [None]:
# Embedding model for text similarity
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# TextSimilaritySpace for semantic understanding of descriptions
description_space = sl.TextSimilaritySpace(
    text=property_schema.description,
    model=EMBEDDING_MODEL
)

# NumberSpace for size - MAXIMUM mode means larger is better
size_space = sl.NumberSpace(
    number=property_schema.sqft,
    min_value=20,    # Smallest reasonable apartment
    max_value=500,   # Largest reasonable apartment
    mode=sl.Mode.MAXIMUM
)

# NumberSpace for price - MINIMUM mode means lower is better
price_space = sl.NumberSpace(
    number=property_schema.price,
    min_value=100000,    # Minimum price
    max_value=10000000,   # Maximum price
    mode=sl.Mode.MINIMUM
)

### Concept 3: Building an Index

The **Index** is where everything comes together. It combines all your spaces into a single searchable structure.

We'll also include additional fields that we want to be able to filter on (like `location`, `rooms` or `baths`) but that don't need their own vector space — they'll be used as hard filters.

This is a key design decision:
- **Spaces**: For attributes that should influence *similarity scoring* (fuzzy matching)
- **Fields**: For attributes used in *exact filtering* (hard constraints)

In [None]:
# Create the index combining all spaces
property_index = sl.Index(
    spaces=[description_space, size_space, price_space],
    fields=[
        property_schema.rooms,
        property_schema.baths,
        property_schema.sqft,
        property_schema.price,
        property_schema.location,
    ],
)

### Concept 4: Defining Queries

Now we define **Queries** — templates for how we'll search our index. The power of Superlinked queries lies in their parameterization:

- **Weights**: How much each space contributes to the final score (adjustable at query time!)
- **Filters**: Hard constraints on field values
- **Similar**: The search query itself

This is incredibly powerful for a voice agent! When a user says *"I want something affordable"*, we can increase the `price_weight`. When they say *"I need lots of space"*, we increase the `size_weight`.
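A hand-rolled version of that mapping might look like the sketch below. The keyword table, the default weights, and the `weights_from_utterance` helper are hypothetical; only the parameter names (`description_weight`, `size_weight`, `price_weight`) come from the query defined later in this lesson.

```python
import re

# Assumed defaults: rank by description only unless the user hints otherwise.
DEFAULT_WEIGHTS = {"description_weight": 1.0, "size_weight": 0.0, "price_weight": 0.0}

# Hypothetical keyword table mapping a word to (weight parameter, boost value).
KEYWORD_BOOSTS = {
    "affordable": ("price_weight", 1.0),
    "cheap": ("price_weight", 1.0),
    "spacious": ("size_weight", 1.0),
    "space": ("size_weight", 1.0),
}


def weights_from_utterance(utterance: str) -> dict:
    """Derive query-time weights from keywords found in the user's utterance."""
    weights = dict(DEFAULT_WEIGHTS)
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for keyword, (param, boost) in KEYWORD_BOOSTS.items():
        if keyword in words:
            # Use max so two synonyms in one sentence do not double the boost.
            weights[param] = max(weights[param], boost)
    return weights


print(weights_from_utterance("I want something affordable"))
print(weights_from_utterance("I need lots of space"))
```

A keyword table like this is brittle: it misses paraphrases such as "on a tight budget" and says nothing about filters like location or room count.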

Hand-tuning these parameters for every utterance doesn't scale in a dynamic system like our voice agent, though. The cool part? Superlinked provides natural queries.

**Natural queries** automatically interpret what the user says and translate it into the appropriate parameters for the system.

In [None]:
# Remember to set the .env with OpenAI keys and run `uv pip install -U -e .` to install the Python project!!
from realtime_phone_agents.config import settings

openai_config = sl.OpenAIClientConfig(
    api_key=settings.openai.api_key, model=settings.openai.model
)

Now, let's look at our `search_query`, which contains all the logic we need to find our desired apartment.

In [None]:
# Define the semantic search query with parameterized weights and filters
search_query = (
    sl.Query(
        property_index,
        weights={
            description_space: sl.Param("description_weight"),
            size_space: sl.Param("size_weight"),
            price_space: sl.Param("price_weight"),
        },
    )
    # Explicitly name the schema we are searching
    .find(property_schema)
    # Define natural query as a way to decompose the user's query
    .with_natural_query(sl.Param("natural_query"), openai_config)
    .similar(
        description_space,
        sl.Param(
            "description_query",
            description="The user's natural language query for property search.",
        ),
    )
    # Filters - these are hard constraints
    .filter(
        property_schema.location 
        == sl.Param(
            "location",
            description="Used to filter apartments by neighborhood"
        ))
    .filter(
        property_schema.rooms 
        >= sl.Param(
            "min_rooms",
            description="Used to find apartments with a room count equal to or greater than the specified number"
        ))
    .filter(
        property_schema.baths 
        >= sl.Param(
            "min_baths",
            description="Used to find apartments with a bath count equal to or greater than the specified number"
        ))
    .filter(
        property_schema.sqft 
        >= sl.Param(
            "sqft_bigger_than",
            description="Used to find apartments with a size in square meters equal to or greater than the specified number"
        ))
    .filter(
        property_schema.price 
        <= sl.Param(
            "price_smaller_than",
            description="Used to find apartments with a price less than or equal to the specified number"
        ))
    .limit(sl.Param("limit"))
    .select_all()
)

### Concept 5: Data Sources (InMemory version)

For this notebook, we'll keep things simple by loading all the data into memory. In the actual implementation, however, the code relies on Qdrant.

In [None]:
# We define the source type. In this case, `InMemorySource`
source = sl.InMemorySource(
    property_schema,
    parser=sl.DataFrameParser(schema=property_schema)
)

executor = sl.InMemoryExecutor(sources=[source], indices=[property_index])
app = executor.run()

Now, let’s take a look at the apartment data. It comes from [a Kaggle dataset](https://www.kaggle.com/datasets/kanchana1990/madrid-idealista-property-listings) containing real properties from Madrid.

In [18]:
import pandas as pd

df = pd.read_csv("../data/properties.csv")

In [19]:
df.head()

| Unnamed: 0 | id | price | baths | rooms | sqft | description | location |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 103903400 | 2290000 | 3 | 3 | 184 | Luxury property with designer renovation in Ju... | Chueca-Justicia |
| 1 | 102662115 | 245000 | 1 | 1 | 70 | LA CASA Agency Imperial offers this property o... | Imperial |
| 2 | 98711049 | 330000 | 2 | 2 | 80 | RENTED - FOR SALE WITH TENANT / NO SALES COMMI... | Malasaña-Universidad |
| 3 | 103863614 | 770000 | 2 | 4 | 123 | FABULOUS AND SPACIOUS EXTERIOR APARTMENT IN CO... | Malasaña-Universidad |
| 4 | 103369764 | 870000 | 3 | 3 | 314 | GILMAR Consulting Real Estate Conde Orgaz excl... | Hortaleza |


You can see the dataframe contains all the necessary columns. Now, let's insert this data into our `InMemorySource`.

In [None]:
source.put([df])
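The "all the necessary columns" check can also be made explicit rather than done by eye. The sketch below compares a list of column names against the schema's field names in plain Python; `check_columns` is a hypothetical helper, and in the notebook you would pass `df.columns` instead of the hard-coded list.

```python
# Field names from the `Property` schema defined earlier in this lesson.
SCHEMA_FIELDS = {"id", "description", "baths", "rooms", "sqft", "location", "price"}


def check_columns(columns):
    """Return (missing, extra) column names relative to the schema fields."""
    present = set(columns)
    return sorted(SCHEMA_FIELDS - present), sorted(present - SCHEMA_FIELDS)


# Column names as shown in the `df.head()` output above.
columns = ["Unnamed: 0", "id", "price", "baths", "rooms", "sqft", "description", "location"]
missing, extra = check_columns(columns)

print(missing)  # []
print(extra)    # ['Unnamed: 0']
```

Nothing is missing here. The only extra column is the leftover index column from the CSV, which is not part of the schema.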

### Concept 6: Running our queries

Finally, some examples to check how our system works!

In [1]:
from pprint import pprint

In [None]:
results = app.query(
    search_query,
    natural_query="Do you have apartments in Barrio de Salamanca for at most 900000 euros?",
    limit=1,
)

In [None]:
pprint(results.entries[0].fields)

In [None]:
results = app.query(
    search_query,
    natural_query="Do you have apartments in Hortaleza for at most 500000 euros? I'm not paying more than that!",
    limit=1,
)

In [None]:
pprint(results.entries[0].fields)

In [None]:
results = app.query(
    search_query,
    natural_query="I want an apartment with 4 rooms and 4 bathrooms in Chamartín please",
    limit=1,
)

In [None]:
pprint(results.entries[0].fields)