# Vector Search API Demo

This notebook demonstrates how to use the Vector Search API client to interact with our thread-safe vector content management system. We'll walk through creating libraries, documents, and chunks, and then performing both text and vector searches.

## Setup

First, let's import the necessary libraries and initialize the client. Make sure the API is running (either locally or in a Docker container).

In [None]:
import os
import sys
import time
import uuid

import matplotlib.pyplot as plt
import pandas as pd

# Add the project root to the path
sys.path.append(os.path.abspath('../'))

# Import the client library
from api.client import VectorSearchClient

# Initialize the client
client = VectorSearchClient(base_url="http://localhost:8000")

## Check API Health

Let's first check if the API is running and healthy.

In [None]:
try:
    health = client.health_check()
    print(f"API Health: {health}")
except Exception as e:
    print(f"Error: {e}")
    print("Make sure the API is running. You can start it with 'docker-compose up -d'")

## Get API Statistics

Let's check the current statistics of the API to see what's already in the system.

In [None]:
stats = client.get_stats()
print("API Statistics:")
print(f"Embedding Service: {stats['embedding_service']}")
print(f"Similarity Service: {stats['similarity_service']}")
print(f"Content Service: {stats['content_service']}")
print(f"Indexer Type: {stats['indexer_type']}")

## Create a Library

Let's create a new library to store our documents and chunks.

In [None]:
# Generate a unique ID for the library
library_id = f"lib-{uuid.uuid4().hex[:8]}"

# Create the library
library = client.create_library(
    id=library_id,
    name="Demo Library",
    description="Library for vector search demo"
)

print(f"Created library: {library}")

## Create a Document

Now, let's create a document in our library.

In [None]:
# Generate a unique ID for the document
document_id = f"doc-{uuid.uuid4().hex[:8]}"

# Create the document
document = client.create_document(
    id=document_id,
    library_id=library_id,
    title="Vector Search Overview",
    content="This document provides an overview of vector search techniques and applications.",
    metadata={"category": "technical", "author": "Windsurf Team"}
)

print(f"Created document: {document}")

## Create Chunks

Let's create several chunks with different content to demonstrate the search capabilities.

In [None]:
# Sample chunks for our document
chunk_texts = [
    "Vector search is a technique for finding similar items in a dataset based on their vector representations.",
    "Embeddings are numerical representations of text, images, or other data that capture semantic meaning.",
    "Cosine similarity is a common metric used to measure the similarity between two vectors in a vector space.",
    "Thread safety ensures that concurrent operations don't cause data corruption or race conditions.",
    "Suffix arrays enable efficient substring searches with O(P log T + k) time complexity.",
    "Tries are tree data structures optimized for prefix matching and autocomplete functionality.",
    "Inverted indices map terms to documents, enabling fast word-based searches and boolean queries.",
    "TF-IDF (Term Frequency-Inverse Document Frequency) is a numerical statistic used in information retrieval.",
    "Dimensionality reduction techniques like random projection can compress high-dimensional vectors.",
    "Content management systems organize and store digital content with metadata and search capabilities."
]

# Create chunks
chunks = []
for i, text in enumerate(chunk_texts):
    chunk_id = f"chunk-{uuid.uuid4().hex[:8]}"
    chunk = client.create_chunk(
        id=chunk_id,
        document_id=document_id,
        text=text,
        position=i,
        metadata={"index": i, "length": len(text)}
    )
    chunks.append(chunk)
    print(f"Created chunk {i+1}: {chunk['message']}")

print(f"\nCreated {len(chunks)} chunks successfully")

## Vector Search

Now let's perform a vector search to find chunks similar to a query.

In [None]:
# Vector search query
query = "How do vector embeddings work for semantic search?"

# Perform vector search
vector_results = client.vector_search(
    query_text=query,
    top_k=5
)

print(f"Vector search results for query: '{query}'\n")
for i, result in enumerate(vector_results):
    print(f"Result {i+1}:")
    print(f"  Score: {result['score']:.4f}")
    print(f"  Text: {result['chunk']['text']}")
    print()
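
To make the scores above more concrete, here is a minimal sketch of ranking by cosine similarity, the metric mentioned in one of the sample chunks. Whether the API scores results exactly this way is an assumption; the vectors are made-up three-dimensional values, and `cosine_similarity` and `rank_top_k` are hypothetical names used only for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_top_k(query_vec, candidates, k):
    """Rank (label, vector) pairs by similarity to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings": invented numbers, not output from any embedding model.
toy_candidates = [
    ("embeddings", [0.9, 0.1, 0.0]),
    ("thread safety", [0.0, 0.2, 0.9]),
    ("cosine similarity", [0.7, 0.6, 0.1]),
]

toy_ranked = rank_top_k([1.0, 0.2, 0.0], toy_candidates, k=2)
for score, label in toy_ranked:
    print(f"{score:.4f}  {label}")
```

A real embedding has hundreds of dimensions, but the ranking step is the same: score every candidate against the query vector and keep the `k` highest.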

## Text Search

Let's also run the same text query against each of the three indexers: suffix array, trie, and inverted index.

In [None]:
# Text search queries
text_queries = [
    {"query": "vector", "indexer": "suffix"},
    {"query": "vector", "indexer": "trie"},
    {"query": "vector", "indexer": "inverted"}
]

# Perform text searches with different indexers
for query_info in text_queries:
    query = query_info["query"]
    indexer = query_info["indexer"]
    
    text_results = client.text_search(
        query=query,
        indexer_type=indexer
    )
    
    print(f"Text search results for query: '{query}' using {indexer} indexer\n")
    for i, result in enumerate(text_results):
        print(f"Result {i+1}:")
        print(f"  Text: {result['chunk']['text']}")
    print("\n" + "-"*50 + "\n")
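
The indexers can return different results for the same query because they match at different granularities. The sketch below contrasts a word-based inverted index with a plain substring scan on three toy sentences. It is an illustration under stated assumptions, not the service's implementation: the tokenizer, `build_inverted_index`, and `search_all_terms` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(texts):
    """Map each term to the set of positions of the texts that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(texts):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return ids of texts containing every term of the query (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


toy_texts = [
    "Vector search finds similar items.",
    "Embeddings capture semantic meaning.",
    "Cosine similarity compares two vectors.",
]

toy_index = build_inverted_index(toy_texts)

# Word-based lookup: only exact term matches count.
word_hits = sorted(search_all_terms(toy_index, "vector"))

# Substring scan: matches anywhere inside a word, as a substring index would.
substring_hits = [i for i, text in enumerate(toy_texts) if "vector" in text.lower()]

print("word-based:", word_hits)
print("substring: ", substring_hits)
```

With this tokenizer, "vectors" is a different term from "vector", so the word-based lookup finds fewer texts than the substring scan.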

## Performance Comparison

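Let's compare the performance of vector search vs. text search. Each query is timed once from the client side, so the numbers include HTTP round-trip overhead and should be read as rough indications rather than rigorous benchmarks.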

In [None]:
# Queries for performance testing
performance_queries = [
    "vector similarity search",
    "embedding techniques for text",
    "efficient data structures",
    "thread safety in concurrent systems",
    "content management and organization"
]

# Measure performance
results = {
    "query": [],
    "vector_time": [],
    "vector_results": [],
    "suffix_time": [],
    "suffix_results": [],
    "trie_time": [],
    "trie_results": [],
    "inverted_time": [],
    "inverted_results": []
}

for query in performance_queries:
    results["query"].append(query)

    # Vector search (perf_counter is a monotonic, high-resolution clock suited to timing)
    start_time = time.perf_counter()
    vector_results = client.vector_search(query_text=query, top_k=5)
    results["vector_time"].append(time.perf_counter() - start_time)
    results["vector_results"].append(len(vector_results))

    # Text search with each indexer: suffix array, trie, inverted index
    for indexer in ("suffix", "trie", "inverted"):
        start_time = time.perf_counter()
        text_results = client.text_search(query=query, indexer_type=indexer)
        results[f"{indexer}_time"].append(time.perf_counter() - start_time)
        results[f"{indexer}_results"].append(len(text_results))

# Create a DataFrame
df = pd.DataFrame(results)
df
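
A single measurement per query is sensitive to noise such as cold caches or a busy machine. One common remedy is to repeat each call and report the median. The helper below is a minimal sketch of that idea; `time_call` is a hypothetical name, and the summed range only stands in for a real search call.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() `repeats` times; return (median duration in seconds, last result)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), result


# Stand-in workload; any zero-argument callable works.
median_seconds, workload_value = time_call(lambda: sum(range(1000)), repeats=5)
print(f"median: {median_seconds:.6f} s, result: {workload_value}")
```

In the loop above, the vector search could be timed with `time_call(lambda: client.vector_search(query_text=query, top_k=5))`.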

## Visualize Performance Results

In [None]:
# Plot search times
plt.figure(figsize=(12, 6))

# Create a bar chart for search times
x = range(len(df["query"]))
width = 0.2

plt.bar(x, df["vector_time"], width=width, label="Vector Search")
plt.bar([i + width for i in x], df["suffix_time"], width=width, label="Suffix Array")
plt.bar([i + 2*width for i in x], df["trie_time"], width=width, label="Trie")
plt.bar([i + 3*width for i in x], df["inverted_time"], width=width, label="Inverted Index")

plt.xlabel("Query")
plt.ylabel("Time (seconds)")
plt.title("Search Time Comparison")
plt.xticks([i + 1.5*width for i in x], df["query"], rotation=45, ha="right")
plt.legend()
plt.tight_layout()
plt.show()

In [None]:
# Plot result counts
plt.figure(figsize=(12, 6))

# Create a bar chart for result counts
plt.bar(x, df["vector_results"], width=width, label="Vector Search")
plt.bar([i + width for i in x], df["suffix_results"], width=width, label="Suffix Array")
plt.bar([i + 2*width for i in x], df["trie_results"], width=width, label="Trie")
plt.bar([i + 3*width for i in x], df["inverted_results"], width=width, label="Inverted Index")

plt.xlabel("Query")
plt.ylabel("Number of Results")
plt.title("Search Result Count Comparison")
plt.xticks([i + 1.5*width for i in x], df["query"], rotation=45, ha="right")
plt.legend()
plt.tight_layout()
plt.show()

## Cleanup

Finally, let's clean up by deleting the resources we created.

In [None]:
# Delete chunks
for chunk in chunks:
    chunk_id = chunk["chunk"]["id"]
    result = client.delete_chunk(chunk_id)
    print(f"Deleted chunk {chunk_id}: {result['message']}")

# Delete document
result = client.delete_document(document_id)
print(f"Deleted document {document_id}: {result['message']}")

# Delete library
result = client.delete_library(library_id)
print(f"Deleted library {library_id}: {result['message']}")
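
The cleanup cell only runs if every earlier cell succeeded. When the same steps are moved into a script, one way to guarantee cleanup is `contextlib.ExitStack`, which runs registered callbacks in reverse order of registration, so chunks are removed before their document and the document before its library. The sketch below uses a hypothetical `fake_delete` function and invented IDs in place of the real client calls.

```python
from contextlib import ExitStack

cleanup_log = []


def fake_delete(kind, resource_id):
    """Stand-in for a delete call; records what would have been deleted."""
    cleanup_log.append(f"{kind}:{resource_id}")


with ExitStack() as stack:
    # Register each cleanup right after the matching resource is created.
    stack.callback(fake_delete, "library", "lib-demo")
    stack.callback(fake_delete, "document", "doc-demo")
    stack.callback(fake_delete, "chunk", "chunk-demo")
    # Searches would run here; callbacks fire even if this block raises.

print(cleanup_log)
```

With the real client, the registrations would look like `stack.callback(client.delete_library, library_id)`.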

## Conclusion

In this notebook, we've demonstrated how to use the Vector Search API client to interact with our thread-safe vector content management system. We've shown how to:

1. Create libraries, documents, and chunks
2. Perform vector searches for semantic similarity
3. Perform text searches with different indexers (suffix array, trie, inverted index)
4. Compare the performance of different search methods

The same content can be queried through text search (suffix array, trie, or inverted index) and through vector search, so you can choose the method that best fits each query.