# Demo: Universal Query for Hybrid Retrieval

In this hands-on demo, we'll build a research paper discovery system using real arXiv data and Qdrant's Universal Query API. You'll see how to:

- Fetch real research papers from arXiv
- Generate **dense**, **sparse**, and **ColBERT** embeddings
- Execute hybrid retrieval with intelligent filtering
- Combine multiple search strategies in a single query

## Setup & Dependencies

Install required packages for working with Qdrant, embeddings, and arXiv.

In [1]:
# Install dependencies
!pip install -q "qdrant-client~=1.15.1" "fastembed~=0.7.3" arxiv


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
from qdrant_client import QdrantClient, models

# Initialize Qdrant client
client = QdrantClient(
    url="https://your-cluster-url.cloud.qdrant.io",
    api_key="your-api-key",
)

collection_name = "research-papers"

## Step 1: Create the Collection

Configure our collection with three vector types for multi-stage retrieval:
- **Dense vectors** (384-dim) for semantic understanding
- **Sparse vectors** for exact keyword matching
- **ColBERT multivectors** (128-dim) for fine-grained reranking
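
To make the `MAX_SIM` comparator more concrete, here is a minimal pure-Python sketch of late-interaction scoring as it is usually described for ColBERT: each query token vector is matched to its most similar document token vector, and those maxima are summed. The helper names (`cosine`, `max_sim`) and the tiny 2-dimensional vectors are illustrative assumptions, not Qdrant internals.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_sim(query_tokens, doc_tokens):
    """For each query token, take its best match in the document, then sum."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


# Hypothetical token embeddings (real ColBERT vectors are 128-dimensional)
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]]

score = max_sim(query_tokens, doc_tokens)
print(f"MaxSim score: {score:.4f}")
```

Because the score is a sum over query tokens, it is not bounded by 1, which is consistent with the relevance scores well above 1 in the results at the end of this demo.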

In [3]:
# Clean state
if client.collection_exists(collection_name=collection_name):
    client.delete_collection(collection_name=collection_name)

# Create collection with three vector types
client.create_collection(
    collection_name=collection_name,
    vectors_config={
        "dense": models.VectorParams(size=384, distance=models.Distance.COSINE),
        "colbert": models.VectorParams(
            size=128,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(
            index=models.SparseIndexParams(on_disk=False)
        )
    },
)

print(f"✓ Collection '{collection_name}' created")

✓ Collection 'research-papers' created


### Create Payload Indexes

Create indexes for the fields we'll filter by. Qdrant applies filters at the HNSW search level, not as post-processing.
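
As a rough illustration of why filtering during search matters, the toy sketch below compares post-filtering with filtering before ranking. It uses a brute-force sort over a hypothetical five-item corpus rather than HNSW, so it only demonstrates the effect on result counts, not how Qdrant implements filtering.

```python
# Toy corpus: (id, similarity to the query, research_area). All values are made up.
corpus = [
    (1, 0.95, "nlp"),
    (2, 0.90, "nlp"),
    (3, 0.85, "computer_vision"),
    (4, 0.80, "nlp"),
    (5, 0.75, "computer_vision"),
]


def post_filter_search(corpus, area, limit):
    """Take the top `limit` hits first, then drop the ones that fail the filter."""
    top = sorted(corpus, key=lambda p: p[1], reverse=True)[:limit]
    return [p[0] for p in top if p[2] == area]


def filtered_search(corpus, area, limit):
    """Consider only matching items, then take the top `limit` of those."""
    candidates = [p for p in corpus if p[2] == area]
    top = sorted(candidates, key=lambda p: p[1], reverse=True)[:limit]
    return [p[0] for p in top]


post = post_filter_search(corpus, "computer_vision", limit=2)
filt = filtered_search(corpus, "computer_vision", limit=2)
print(post)  # [] because both of the top-2 hits are "nlp"
print(filt)  # [3, 5]
```

Post-filtering can return fewer results than requested even when enough matching items exist, which is the problem that filter-aware search avoids.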

In [4]:
# Index fields for efficient filtering
client.create_payload_index(
    collection_name=collection_name,
    field_name="research_area",
    field_schema="keyword",
)
client.create_payload_index(
    collection_name=collection_name,
    field_name="open_access",
    field_schema="bool",
)
client.create_payload_index(
    collection_name=collection_name,
    field_name="published_date",
    field_schema="datetime",
)
client.create_payload_index(
    collection_name=collection_name,
    field_name="impact_score",
    field_schema="float",
)
client.create_payload_index(
    collection_name=collection_name,
    field_name="citation_count",
    field_schema="integer",
)

print("✓ Payload indexes created")

✓ Payload indexes created


## Step 2: Initialize Embedding Models

Load FastEmbed models for generating all three embedding types.

In [5]:
from fastembed import (
    TextEmbedding, 
    SparseTextEmbedding, 
    LateInteractionTextEmbedding
)

DENSE_MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"  # 384-dim
SPARSE_MODEL_ID = "prithivida/Splade_PP_en_v1"  # SPLADE sparse
COLBERT_MODEL_ID = "colbert-ir/colbertv2.0"  # 128-dim multivector

print("Loading embedding models...")
dense_model = TextEmbedding(DENSE_MODEL_ID)
sparse_model = SparseTextEmbedding(SPARSE_MODEL_ID)
colbert_model = LateInteractionTextEmbedding(COLBERT_MODEL_ID)

print("✓ All embedding models loaded")

Loading embedding models...
✓ All embedding models loaded


## Step 3: Fetch Papers from arXiv

Let's search arXiv for papers about transformers and multimodal learning.

In [6]:
import arxiv

# Initialize the arXiv client and define the search (results are fetched lazily on iteration)
arxiv_client = arxiv.Client()

search = arxiv.Search(
    query="transformer AND multimodal",
    max_results=50,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

print("Fetching papers from arXiv...")

Fetching papers from arXiv...


## Step 4: Process and Ingest Papers

For each paper, we'll:
1. Extract metadata (title, authors, abstract, date)
2. Generate dense, sparse, and ColBERT embeddings
3. Upload to Qdrant

In [7]:
points = []

for i, paper in enumerate(arxiv_client.results(search)):
    if i % 10 == 0:
        print(f"Processing paper {i + 1}...")

    # Extract paper information
    abstract = paper.summary

    # Generate embeddings for the abstract
    dense_vector = next(dense_model.embed(abstract))
    sparse_vector = next(sparse_model.embed(abstract)).as_object()
    colbert_vector = next(colbert_model.embed(abstract))

    # Determine research area (simplified categorization)
    research_area = "machine_learning"
    if any(term in abstract.lower() for term in ["vision", "image", "visual"]):
        research_area = "computer_vision"
    elif any(term in abstract.lower() for term in ["language", "nlp", "text"]):
        research_area = "nlp"

    # Create point
    point = models.PointStruct(
        id=i,
        payload={
            "title": paper.title,
            "authors": [author.name for author in paper.authors],
            "abstract": abstract,
            "published_date": paper.published.isoformat(),
            "research_area": research_area,
            "citation_count": 10,  # Placeholder (would need external API)
            "impact_score": 0.7,  # Placeholder (would need calculation)
            "open_access": True,  # arXiv is open access
            "arxiv_id": paper.entry_id,
        },
        vector={
            "dense": dense_vector,
            "sparse": sparse_vector,
            "colbert": colbert_vector,
        },
    )
    points.append(point)

print(f"\n✓ Processed {len(points)} papers")

Processing paper 1...
Processing paper 11...
Processing paper 21...
Processing paper 31...
Processing paper 41...

✓ Processed 50 papers


Because we use FastEmbed, we could also build the points with local inference: instead of passing precomputed vectors, we pass `models.Document` objects and let the client embed the text. Here's how that would look:

```python
point = models.PointStruct(
    id=i,
    payload={
        ...  # same as above
    },
    vector={
        "dense": models.Document(
            model=DENSE_MODEL_ID,
            text=abstract,
        ),
        "sparse": models.Document(
            model=SPARSE_MODEL_ID,
            text=abstract,
        ),
        "colbert": models.Document(
            model=COLBERT_MODEL_ID,
            text=abstract,
        ),
    },
)
```

The Qdrant client would then call FastEmbed to generate the embeddings during upload. This syntax is also compatible with [Cloud Inference](https://qdrant.tech/documentation/cloud/inference/) if you prefer to offload embedding generation to Qdrant Cloud.

In [8]:
# Upload to Qdrant
client.upload_points(
    collection_name=collection_name,
    points=points,
)

print(f"✓ Uploaded {len(points)} research papers to Qdrant")

✓ Uploaded 50 research papers to Qdrant


## Step 5: Execute the Universal Query

Now we'll search for papers using hybrid retrieval that combines:
1. **Parallel dense and sparse search**
2. **Reciprocal Rank Fusion (RRF)**
3. **ColBERT reranking**
4. **Global filtering** (applied at every stage)
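
The fusion step can be sketched in a few lines of plain Python. This is a minimal version of Reciprocal Rank Fusion as commonly defined, where each document scores `1 / (k + rank)` in every ranking that contains it. The function name, the example ID lists, and `k=60` (the constant from the original RRF paper) are assumptions for illustration; the constant Qdrant uses internally may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; ranks start at 1 within each list."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the dense and sparse prefetches
dense_ranking = [3, 1, 2]
sparse_ranking = [1, 4, 3]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that appear near the top of both lists (here IDs 1 and 3) outrank documents that appear in only one, regardless of the raw dense or sparse scores, which are not comparable with each other.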

In [9]:
# Define our research query
research_query = "transformer architectures for multimodal learning"

# Encode query with all three models
research_query_dense = next(dense_model.query_embed(research_query))
research_query_sparse = next(sparse_model.query_embed(research_query)).as_object()
research_query_colbert = next(colbert_model.query_embed(research_query))

### Define Global Filter

This filter will automatically propagate to all prefetch stages.
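
All conditions in the filter sit under `must`, so a point has to satisfy every one of them. As a rough mental model, the pure-Python sketch below applies the same five checks to a payload dictionary. The function name `passes_global_filter`, the fixed `now` value, and the sample payloads are hypothetical and exist only for illustration; Qdrant evaluates the real filter server-side.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_AREAS = {"machine_learning", "computer_vision", "nlp"}


def passes_global_filter(payload, now):
    """Return True only if every condition holds (AND semantics, like `must`)."""
    cutoff = now - timedelta(days=365 * 6)
    published = datetime.fromisoformat(payload["published_date"])
    return (
        payload["research_area"] in ALLOWED_AREAS
        and payload["open_access"] is True
        and published >= cutoff
        and payload["impact_score"] >= 0.6
        and payload["citation_count"] >= 5
    )


# Fixed reference date so the example is reproducible (an assumption)
now = datetime(2025, 11, 1, tzinfo=timezone.utc)

recent = {
    "research_area": "computer_vision",
    "open_access": True,
    "published_date": "2025-10-12T00:00:00+00:00",
    "impact_score": 0.7,
    "citation_count": 10,
}
old = dict(recent, published_date="2015-01-01T00:00:00+00:00")

print(passes_global_filter(recent, now))  # True
print(passes_global_filter(old, now))     # False: older than six years
```

Note that the placeholder `impact_score` and `citation_count` values set during ingestion pass these thresholds for every paper, so in this demo only the date and research-area conditions can actually exclude anything.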

In [10]:
from datetime import datetime, timedelta

# Define quality constraints
global_filter = models.Filter(
    must=[
        # Research domain filtering
        models.FieldCondition(
            key="research_area",
            match=models.MatchAny(
                any=[
                    "machine_learning",
                    "computer_vision",
                    "nlp",
                ]
            ),
        ),
        # Open access only
        models.FieldCondition(key="open_access", match=models.MatchValue(value=True)),
        # Recent research (last 6 years)
        models.FieldCondition(
            key="published_date",
            range=models.DatetimeRange(
                gte=(datetime.now() - timedelta(days=365 * 6)).isoformat()
            ),
        ),
        # High-impact papers
        models.FieldCondition(key="impact_score", range=models.Range(gte=0.6)),
        # Well-cited work
        models.FieldCondition(key="citation_count", range=models.Range(gte=5)),
    ]
)

### Build and Execute Multi-Stage Query

In [11]:
# Stage 1: Parallel prefetch (dense + sparse)
hybrid_query = [
    models.Prefetch(query=research_query_dense, using="dense", limit=100),
    models.Prefetch(query=research_query_sparse, using="sparse", limit=100),
]

# Stage 2: Fusion with RRF
fusion_query = models.Prefetch(
    prefetch=hybrid_query,
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=100,
)

# Stage 3: Execute with ColBERT reranking
response = client.query_points(
    collection_name=collection_name,
    prefetch=fusion_query,
    query=research_query_colbert,
    using="colbert",
    query_filter=global_filter,  # Automatically propagates to all stages
    limit=10,
    with_payload=True,
)

print("✓ Query executed successfully")
print(f"Found {len(response.points)} results")

✓ Query executed successfully
Found 10 results


## Results

Let's examine the top papers discovered by our hybrid retrieval system.

In [12]:
print("=" * 100)
print("TOP RESEARCH PAPERS")
print("=" * 100)

for i, hit in enumerate(response.points or [], 1):
    paper = hit.payload
    print(f"\n{i}. {paper['title']}")
    print(
        f"   Authors: {', '.join(paper['authors'][:3])}{'...' if len(paper['authors']) > 3 else ''}"
    )
    print(f"   Published: {paper['published_date'][:10]}")
    print(f"   Research Area: {paper['research_area']}")
    print(f"   Relevance Score: {hit.score:.4f}")
    print(f"   arXiv: {paper['arxiv_id']}")
    print(f"   Abstract: {paper['abstract'][:200]}...")

TOP RESEARCH PAPERS

1. Collaborative Text-to-Image Generation via Multi-Agent Reinforcement Learning and Semantic Fusion
   Authors: Jiabao Shi, Minfeng Qi, Lefeng Zhang...
   Published: 2025-10-12
   Research Area: computer_vision
   Relevance Score: 25.1145
   arXiv: http://arxiv.org/abs/2510.10633v1
   Abstract: Multimodal text-to-image generation remains constrained by the difficulty of
maintaining semantic alignment and professional-level detail across diverse
visual domains. We propose a multi-agent reinfo...

2. Beyond Appearance: Transformer-based Person Identification from Conversational Dynamics
   Authors: Masoumeh Chapariniya, Teodora Vukovic, Sarah Ebling...
   Published: 2025-10-06
   Research Area: machine_learning
   Relevance Score: 24.5158
   arXiv: http://arxiv.org/abs/2510.04753v1
   Abstract: This paper investigates the performance of transformer-based architectures
for person identification in natural, face-to-face conversation scenario. We
implement and evaluate