Vector Search with Qdrant:

Vector Search:
Vector search replaces exact keyword matching with semantic similarity, enabling retrieval based on meaning.
It excels in searching through diverse data types like text, images, audio, video, and code, even when phrasing or formats differ.
It does this by converting data into numbers (vector embeddings). The distance between two vectors in a high-dimensional space encodes meaning: items that are related by similarity and context lie closer to each other. This lets a search system recognize patterns and relationships between concepts and retrieve the most relevant content, even when the phrasing differs, terminology varies, or no explicit keywords exist.
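
To make the "closer means more related" idea concrete, here is a minimal sketch of a brute-force nearest-neighbor lookup. The 3-dimensional vectors and the helper names (`toy_embeddings`, `euclidean_distance`, `nearest_words`) are invented for illustration; real embedding models produce hundreds of dimensions, and a vector search engine uses an index rather than comparing against every vector.

```python
import math

# Hypothetical hand-picked "embeddings", for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "dog": [0.8, 0.3, 0.1],
    "car": [0.1, 0.2, 0.9],
}


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_words(query_word, embeddings, top_k=2):
    """Return the top_k words whose vectors are closest to query_word's vector."""
    query_vector = embeddings[query_word]
    scored = [
        (word, euclidean_distance(query_vector, vector))
        for word, vector in embeddings.items()
        if word != query_word
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return [word for word, _ in scored[:top_k]]


print(nearest_words("cat", toy_embeddings))  # prints ['kitten', 'dog']
```

With these toy values, "kitten" lands closest to "cat" and "car" lands farthest away, which is the behavior a trained embedding model is meant to produce at scale.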

Qdrant:
Qdrant is a high-performance, open-source vector search engine built in Rust for scalable, production-grade applications.
Beyond basic similarity search, it supports features such as payload (metadata) filtering and hybrid search that combines dense and sparse vectors.


TLDR: Vector search is well suited to semantic search: it captures the context and meaning behind words (and other unstructured data) by relating language to maths (vectors and geometry), unlike syntactic search, which compares strings. Qdrant is a vector search engine.


Running Qdrant in a Docker container (in a GitHub Codespace):
Run in the terminal:
<!-- 
docker pull qdrant/qdrant

docker run -p 6333:6333 -p 6334:6334 \
   -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
   qdrant/qdrant -->

In [1]:
# qdrant-client is the Python client for Qdrant; the fastembed extra adds optimized embedding (data vectorization) designed for Qdrant.

!python -m pip install -q "qdrant-client[fastembed]>=1.14.2"

Step 1:
Import required libraries and connect to Qdrant.

In [3]:
from qdrant_client import QdrantClient, models

In [4]:
# Connect to the local Qdrant instance (port 6333, as published by the docker run command)
client = QdrantClient("http://localhost:6333")

In [5]:
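import requests

# Download the question-answer documents (JSON) used in this notebook
docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
docs_response.raise_for_status()  # fail early if the download did not succeed
documents_raw = docs_response.json()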

Step 2:
Study the dataset

In [12]:
#documents_raw

The data appears to be already cleaned and chunked (a list of dictionaries of question-answer pairs) and contains only English text. Next, we decide which fields to use for vector search and which to store as metadata.
Metadata is useful for filtering conditions.

Since we are building a Q&A RAG system, it makes sense to store the answers (text) as embeddings and run vector search with the question as the query.
Fields like course and section can be stored as metadata and used as filters.
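
The split between "text to embed" and "metadata" can be sketched as follows. This is only an illustration: the sample records are made up, and the key names (`course`, `documents`, `section`, `question`, `text`) are an assumption about the layout of documents_raw that should be checked against the downloaded file. The helper `split_text_and_metadata` is hypothetical as well.

```python
# Hypothetical sample shaped like the ASSUMED layout of documents_raw:
# a list of courses, each holding a list of question-answer documents.
sample_documents_raw = [
    {
        "course": "example-course",
        "documents": [
            {
                "section": "General",
                "question": "When does the course start?",
                "text": "The start date is announced on the course page.",
            },
            {
                "section": "Setup",
                "question": "Which Python version do I need?",
                "text": "Any recent Python 3 release should work.",
            },
        ],
    }
]


def split_text_and_metadata(courses):
    """Return (text_to_embed, metadata) pairs, one per answer."""
    records = []
    for course in courses:
        for doc in course["documents"]:
            text_to_embed = doc["text"]  # the answer is what gets vectorized
            metadata = {
                "course": course["course"],  # usable as a filter
                "section": doc["section"],   # usable as a filter
                "question": doc["question"],
            }
            records.append((text_to_embed, metadata))
    return records


records = split_text_and_metadata(sample_documents_raw)
for text, metadata in records:
    print(metadata["course"], "|", metadata["section"], "|", text)
```

Keeping the question in the metadata is a design choice made here so that a retrieved answer can be shown together with the question it belongs to; it is not required for the search itself.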

Step 3:
Choose a suitable embedding model from fastembed for our text data

In [13]:
from fastembed import TextEmbedding
TextEmbedding.list_supported_models()

[{'model': 'BAAI/bge-base-en',
  'sources': {'hf': 'Qdrant/fast-bge-base-en',
   'url': 'https://storage.googleapis.com/qdrant-fastembed/fast-bge-base-en.tar.gz',
   '_deprecated_tar_struct': True},
  'model_file': 'model_optimized.onnx',
  'description': 'Text embeddings, Unimodal (text), English, 512 input tokens truncation, Prefixes for queries/documents: necessary, 2023 year.',
  'license': 'mit',
  'size_in_GB': 0.42,
  'additional_files': [],
  'dim': 768,
  'tasks': {}},
 {'model': 'BAAI/bge-base-en-v1.5',
  'sources': {'hf': 'qdrant/bge-base-en-v1.5-onnx-q',
   'url': 'https://storage.googleapis.com/qdrant-fastembed/fast-bge-base-en-v1.5.tar.gz',
   '_deprecated_tar_struct': True},
  'model_file': 'model_optimized.onnx',
  'description': 'Text embeddings, Unimodal (text), English, 512 input tokens truncation, Prefixes for queries/documents: not so necessary, 2023 year.',
  'license': 'mit',
  'size_in_GB': 0.21,
  'additional_files': [],
  'dim': 768,
  'tasks': {}},
 {'model':

In [14]:
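import json

# Target vector size; used to narrow down the list of supported models
embedding_dimensionality = 512

# Print every supported model whose output dimension matches the target
for model in TextEmbedding.list_supported_models():
    if model['dim'] == embedding_dimensionality:
        print(json.dumps(model, indent=2))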

{
  "model": "BAAI/bge-small-zh-v1.5",
  "sources": {
    "hf": "Qdrant/bge-small-zh-v1.5",
    "url": "https://storage.googleapis.com/qdrant-fastembed/fast-bge-small-zh-v1.5.tar.gz",
    "_deprecated_tar_struct": true
  },
  "model_file": "model_optimized.onnx",
  "description": "Text embeddings, Unimodal (text), Chinese, 512 input tokens truncation, Prefixes for queries/documents: not so necessary, 2023 year.",
  "license": "mit",
  "size_in_GB": 0.09,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "Qdrant/clip-ViT-B-32-text",
  "sources": {
    "hf": "Qdrant/clip-ViT-B-32-text",
    "url": null,
    "_deprecated_tar_struct": false
  },
  "model_file": "model.onnx",
  "description": "Text embeddings, Multimodal (text&image), English, 77 input tokens truncation, Prefixes for queries/documents: not necessary, 2021 year",
  "license": "mit",
  "size_in_GB": 0.25,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "jinaai/jina-embeddings-v2-small-e

In [17]:
model_handle = "jinaai/jina-embeddings-v2-small-en"

# Like most dense embedding models, this one is meant to be compared using cosine similarity (based on the angle between two vectors)
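
As a small illustration of the cosine similarity mentioned in the comment above, the sketch below computes it by hand for a few made-up vectors. The function name `cosine_similarity` and the example vectors are chosen for this illustration only; the point is that the score depends on the direction of the vectors and not on their length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # v scaled by 2: same angle, different length
orthogonal = [3.0, 0.0, -1.0]      # dot product with v is 0
opposite = [-1.0, -2.0, -3.0]      # v flipped

print(cosine_similarity(v, same_direction))  # approximately 1.0
print(cosine_similarity(v, orthogonal))      # approximately 0.0
print(cosine_similarity(v, opposite))        # approximately -1.0
```

Scores near 1 indicate vectors pointing the same way (semantically close), scores near 0 indicate unrelated directions, and scores near -1 indicate opposite directions.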