# Vector Index Services Test Notebook

This notebook demonstrates how to use the vector index services in LiveRAG: the AWS utilities, the embedding utilities, and the Pinecone and OpenSearch query services.

## Setup
First, let's add the project directory to the Python path and import the necessary modules.

In [1]:
import os
import sys

# Add the src directory to the Python path
sys.path.append(os.path.abspath('../src'))

# Import the services
from services.aws_utils import AWSUtils
from services.embedding_utils import EmbeddingUtils
from services.pinecone_index import PineconeService
from services.opensearch_index import OpenSearchService



## 1. Individual Services
Let's demonstrate using each service individually.

### 1.1 AWS Utilities
The AWSUtils class provides functions for interacting with AWS services, particularly SSM Parameter Store.

In [13]:
# Create an AWS utilities instance
aws_utils = AWSUtils()

# Print the configured AWS region
print(f"AWS Region: {aws_utils.aws_region_name}")

# Note: These calls require valid AWS credentials and existing SSM parameters
value = aws_utils.get_ssm_value("/opensearch/endpoint")
secret = aws_utils.get_ssm_secret("/pinecone/ro_token")
print(f"SSM Value: {value}")
# Never print a secret in full; show only its prefix
print(f"SSM Secret: {secret[:5]}... (redacted)")

AWS Region: us-east-1
SSM Value: search-index01-zc4xlabgpncm3uqmerfedq5nx4.us-east-1.es.amazonaws.com
SSM Secret: pcsk_... (redacted)
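
The source of `AWSUtils` is not shown in this notebook, so the following is only a minimal sketch of how a Parameter Store lookup could look with boto3's `get_parameter` call. The helper name `get_ssm_parameter` and the idea of passing the client in as an argument are assumptions made for illustration.

```python
from typing import Any


def get_ssm_parameter(client: Any, name: str, decrypt: bool = False) -> str:
    """Return the value of a single SSM parameter.

    `client` is expected to behave like boto3.client("ssm"). Passing it in
    (a choice made for this sketch) keeps the function testable without
    AWS credentials.
    """
    # WithDecryption=True is needed to read SecureString parameters in clear text.
    response = client.get_parameter(Name=name, WithDecryption=decrypt)
    return response["Parameter"]["Value"]
```

With real credentials, this could be called as `get_ssm_parameter(boto3.client("ssm", region_name="us-east-1"), "/opensearch/endpoint")`, and with `decrypt=True` for secrets such as `/pinecone/ro_token`.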


### 1.2 Embedding Utilities
The EmbeddingUtils class handles text embedding generation using transformer models.

In [15]:
# Create an embedding utilities instance
embedding_utils = EmbeddingUtils()

# Check for available hardware acceleration
print(f"MPS available: {embedding_utils.has_mps()}")
print(f"CUDA available: {embedding_utils.has_cuda()}")

# Load the embedding model and embed a single query
# (the first call may be slow because the model has to be loaded)
query_embedding = embedding_utils.embed_query("What is a vector database?")
print(f"Embedding dimension: {len(query_embedding)}")

MPS available: True
CUDA available: False
Embedding dimension: 768
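
The index name used later in this notebook (`...-e5-base-v2`) and the 768-dimensional output suggest an E5-style model, although the internals of `EmbeddingUtils` are not shown here. Under that assumption, the sketch below illustrates the usual E5 recipe: prefix the text with `query: `, average the token vectors that are not padding, then L2-normalize. All function names are hypothetical, and plain Python lists stand in for the model's tensor output.

```python
import math
from typing import List, Sequence


def format_query(text: str) -> str:
    """E5-style models expect a 'query: ' prefix on search queries."""
    return f"query: {text}"


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is non-zero."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask excludes every token")
    # zip(*kept) iterates over dimensions, so each column is averaged separately.
    return [sum(column) / len(kept) for column in zip(*kept)]


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]
```

Normalizing to unit length means that a dot product between two embeddings equals their cosine similarity, which is convenient for vector search.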


### 1.3 Pinecone Service
The PineconeService class provides functionality for querying Pinecone vector databases.

In [17]:
# Create a Pinecone service instance
pinecone_service = PineconeService()

print(f"Pinecone Index: {pinecone_service.index_name}")
print(f"Pinecone Namespace: {pinecone_service.namespace}")

# Query Pinecone (requires valid credentials)
results = pinecone_service.query_pinecone("What is a vector database?", top_k=3)
pinecone_service.show_pinecone_results(results)

Pinecone Index: fineweb10bt-512-0w-e5-base-v2
Pinecone Namespace: default
chunk: doc-<urn:uuid:67e8ec8b-08a9-4628-981e-1d1c6a992f76>::chunk-0 score: 0.8347314
Welcome to Vector Database!
Vector database is a digital-only collection of vector backbone information compiled by Addgene from third party sources.
This vector is NOT available from Addgene and the database is no longer actively maintained.
- Plasmid Type
- Cloning Method
- pCF83 map is shown here. Promoter reporter construct: S.cerevisaie CYC1 TATA cloned upstream of lacZ. Based on pIRT2U and pIRT2 respectively. ura4+, LEU2 respectively.

chunk: doc-<urn:uuid:212c8842-b072-45f6-87be-b3289186caf5>::chunk-0 score: 0.826883
Welcome to Vector Database!
Vector database is a digital collection of vector backbones assembled from publications and commercially available sources. This is a free resource for the scientific community that is compiled by Addgene.
This page is informational only - this vector is NOT available from Addgene -
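
The `score` values in the output above are similarity scores between the query embedding and each chunk embedding. As a rough illustration of what a `top_k` query computes, here is a brute-force sketch that assumes cosine similarity as the metric (the notebook does not confirm which metric the index uses). Pinecone itself relies on approximate nearest-neighbor search rather than an exhaustive scan, and the names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          vectors: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs with the highest similarity."""
    scored = [(chunk_id, cosine_similarity(query, vec))
              for chunk_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```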

### 1.4 OpenSearch Service
The OpenSearchService class provides functionality for querying OpenSearch vector databases.

In [5]:
# Create an OpenSearch service instance
opensearch_service = OpenSearchService()

print(f"OpenSearch Index: {opensearch_service.index_name}")

# Note: The following commands would query OpenSearch with valid credentials
# results = opensearch_service.query_opensearch("What is a vector database?", top_k=3)
# opensearch_service.show_opensearch_results(results)

OpenSearch Index: fineweb10bt-512-0w-e5-base-v2
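
For orientation, a vector query against OpenSearch is typically expressed with the k-NN plugin's `knn` query clause. The sketch below builds such a request body. The field name `embedding` and the helper name are assumptions, since the index mapping and the implementation of `query_opensearch` are not shown in this notebook.

```python
from typing import Any, Dict, List


def build_knn_query(vector: List[float], top_k: int = 3,
                    field: str = "embedding") -> Dict[str, Any]:
    """Build an OpenSearch k-NN search body for a single query vector.

    `field` must match the knn_vector field in the index mapping;
    "embedding" is only a placeholder here.
    """
    return {
        "size": top_k,  # number of hits to return
        "query": {"knn": {field: {"vector": vector, "k": top_k}}},
    }
```

With the opensearch-py client, a body like this would be sent through `client.search(index=..., body=...)`.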


## 2. Example Queries
The following examples demonstrate how to run queries using the individual services.

In [6]:
# Define some test queries
test_queries = [
    "What is a vector database?",
    "How do embedding models work?",
    "What are the advantages of vector search?"
]

# Note: The following code blocks would execute with valid credentials
# They are commented out to avoid unnecessary API calls

In [7]:
# Example 1: Pinecone single query
# print("=== Pinecone Single Query Example ===")
# results = pinecone_service.query_pinecone(test_queries[0], top_k=3)
# pinecone_service.show_pinecone_results(results)

In [8]:
# Example 2: Pinecone batch query
# print("=== Pinecone Batch Query Example ===")
# batch_results = pinecone_service.batch_query_pinecone(test_queries, top_k=2)
# for i, results in enumerate(batch_results):
#     print(f"\nResults for query: {test_queries[i]}")
#     pinecone_service.show_pinecone_results(results)
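
How `batch_query_pinecone` runs its queries internally is not shown here. One common way to batch independent queries is to fan them out over a thread pool while keeping the results in input order, sketched below with a hypothetical `batch_query` helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def batch_query(query_fn: Callable[[str], R],
                queries: Sequence[str],
                max_workers: int = 4) -> List[R]:
    """Run query_fn over all queries concurrently.

    Executor.map yields results in the order of the inputs, so result i
    always belongs to queries[i], regardless of which call finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(query_fn, queries))
```

For example, `batch_query(lambda q: pinecone_service.query_pinecone(q, top_k=2), test_queries)` would return one result set per query. Threads suit this workload because each query spends most of its time waiting on the network.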

In [9]:
# Example 3: OpenSearch single query
# print("=== OpenSearch Single Query Example ===")
# results = opensearch_service.query_opensearch(test_queries[0], top_k=3)
# opensearch_service.show_opensearch_results(results)

In [10]:
# Example 4: OpenSearch batch query
# print("=== OpenSearch Batch Query Example ===")
# batch_results = opensearch_service.batch_query_opensearch(test_queries, top_k=2)
# for i, response in enumerate(batch_results['responses']):
#     print(f"\nResults for query: {test_queries[i]}")
#     opensearch_service.show_opensearch_results(response)
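
The `batch_results['responses']` loop above matches the response shape of OpenSearch's multi-search (`_msearch`) API, which returns one entry per query under `responses`. Whether `batch_query_opensearch` actually uses that API is an assumption. If it does, the request consists of alternating header and body lines, as in this sketch with hypothetical helper names:

```python
import json
from typing import Any, Dict, List


def build_msearch_body(index_name: str,
                       query_bodies: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Interleave a header line and a query body for each search."""
    lines: List[Dict[str, Any]] = []
    for body in query_bodies:
        lines.append({"index": index_name})  # header: which index to search
        lines.append(body)                   # body: the search request itself
    return lines


def to_ndjson(lines: List[Dict[str, Any]]) -> str:
    """Serialize to newline-delimited JSON, the wire format of _msearch."""
    return "".join(json.dumps(line) + "\n" for line in lines)
```

Sending all queries in one request avoids one network round trip per query, at the cost of having to match each response back to its query by position.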

## 3. Embedding Generation Example
This section demonstrates how to generate embeddings for single and batched text queries.

In [11]:
# Note: The following commands would load models and generate embeddings
# They are commented out to avoid loading large models unnecessarily

# Generate a single embedding
# single_embedding = embedding_utils.embed_query("What is a vector database?")
# print(f"Single embedding dimension: {len(single_embedding)}")

# Generate batch embeddings
# batch_embeddings = embedding_utils.batch_embed_queries(test_queries)
# print(f"Batch embeddings count: {len(batch_embeddings)}")
# print(f"Each embedding dimension: {len(batch_embeddings[0])}")
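
When the list of queries is large, it can help to encode it in fixed-size chunks so that memory use stays bounded. The sketch below shows that pattern with a hypothetical `embed_in_batches` helper. The `encode_fn` argument stands in for whatever model call produces the embeddings, and it is not claimed that `batch_embed_queries` works this way.

```python
from typing import Callable, List, Sequence

Embedding = List[float]


def embed_in_batches(texts: Sequence[str],
                     encode_fn: Callable[[Sequence[str]], List[Embedding]],
                     batch_size: int = 32) -> List[Embedding]:
    """Encode texts chunk by chunk, preserving the input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    embeddings: List[Embedding] = []
    for start in range(0, len(texts), batch_size):
        # Each slice holds at most batch_size texts.
        embeddings.extend(encode_fn(texts[start:start + batch_size]))
    return embeddings
```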