# Test Pinecone VectorDB Query

This notebook queries the Pinecone vector database and retrieves the top 5 most similar results for a given query.


In [1]:
# Import necessary libraries
import sys
import os
from pathlib import Path
from pinecone import Pinecone

# Add src/backend to the import path (assumes the notebook is run from the project root)
project_root = Path().resolve()
sys.path.append(str(project_root / "src" / "backend"))

from src.backend.query_processing import QueryProcessor
from src.backend.context_retriever import ContextRetriever




## Configuration

Set up Pinecone connection and initialize query processing components.


In [2]:
# Pinecone configuration
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY", "your-pinecone-api-key-here")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME", "test")

# Initialize Pinecone
pc = Pinecone(api_key=PINECONE_API_KEY)

# Connect to the index
index = pc.Index(PINECONE_INDEX_NAME)
print(f"Connected to index: {PINECONE_INDEX_NAME}")

# Initialize query processing components
query_processor = QueryProcessor()
context_retriever = ContextRetriever()
print("Initialized QueryProcessor and ContextRetriever")


Connected to index: test
Initialized QueryProcessor and ContextRetriever
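
The fallback value for `PINECONE_API_KEY` in the configuration cell is only a placeholder, so a missing key would likely surface only later, when the client first contacts Pinecone. A minimal sketch of failing fast instead, assuming a hypothetical helper `require_env` (it is not part of this notebook or of the Pinecone client):

```python
import os


def require_env(name: str, placeholder: str = "your-pinecone-api-key-here") -> str:
    """Return the value of environment variable `name`, or raise if unusable.

    Hypothetical helper for illustration. `placeholder` is the dummy default
    used in the configuration cell; it is treated the same as a missing value.
    """
    value = os.getenv(name, "").strip()
    if not value or value == placeholder:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, the configuration cell could read `PINECONE_API_KEY = require_env("PINECONE_API_KEY")` before constructing `Pinecone(api_key=...)`.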


## Query Function

Define a function that processes the query, converts it to an embedding, and retrieves the top-k matches (5 by default) from the Pinecone index.


In [3]:
def query_pinecone(query: str, top_k: int = 5):
    """
    Query Pinecone vector database and retrieve top K results.
    
    Args:
        query: The query string to search for
        top_k: Number of top results to retrieve (default: 5)
        
    Returns:
        Query results from Pinecone
    """
    # Step 1: Process the query
    print(f"Original query: {query}")
    processed_query = query_processor.process(query)
    print(f"Processed query: {processed_query}")
    
    # Step 2: Convert query to embeddings
    print("\nConverting query to embeddings...")
    query_embedding = context_retriever.convert_to_embeddings(processed_query)
    print(f"Embedding dimension: {len(query_embedding)}")
    
    # Step 3: Query Pinecone
    print(f"\nQuerying Pinecone for top {top_k} results...")
    results = index.query(
        vector=query_embedding.tolist(),
        top_k=top_k,
        include_metadata=True
    )
    
    return results
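
Step 3 calls `query_embedding.tolist()`, which assumes that `convert_to_embeddings` returns a NumPy-style array; its return type is not shown in this notebook. A defensive sketch that also accepts plain lists and single-row 2-D results, using a hypothetical helper named `to_vector_list`:

```python
from typing import Any, List


def to_vector_list(embedding: Any) -> List[float]:
    """Convert an embedding to a flat list of floats.

    Hypothetical helper: accepts anything with a `.tolist()` method (such as
    a NumPy array) or a plain sequence, and unwraps a single-row 2-D result.
    """
    values = embedding.tolist() if hasattr(embedding, "tolist") else list(embedding)
    # Some encoders return shape (1, d) for a single input; unwrap that row.
    if len(values) == 1 and isinstance(values[0], (list, tuple)):
        values = list(values[0])
    return [float(v) for v in values]
```

Inside `query_pinecone`, the call would then become `vector=to_vector_list(query_embedding)`.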


In [5]:
query = "explain Coupling Reaction of Diblock Copolymers"
print(query_pinecone(query))

Original query: explain Coupling Reaction of Diblock Copolymers
Processed query: explain coupling reaction of diblock copolymers

Converting query to embeddings...
Embedding dimension: 1024

Querying Pinecone for top 5 results...
{'matches': [{'id': '4df4fc15-cec3-4c27-92eb-bdf38007346c',
              'metadata': {'chunk_index': 16.0,
                           'pdf_name': 'takamuku2009',
                           'processed_text': 'function of the reaction '
                                             'temperature. the existence ratio '
                                             'of pees to pes-f at 100 °c [fig. '
                                             '8(1)] was calcu- lated to be '
                                             'around 20%, much lower than '
                                             'expected. further, some '
                                             'partially insoluble prod- ucts '
                                             'were observed after is

## Execute Query and Print Results

Run a test query and display the top 5 results.
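
Printing the raw response produces a long nested structure that is hard to scan. One possible way to display each match on a single line is sketched below. It assumes the response has already been converted to a plain `dict` with a `matches` list, and that each match carries a `score` field alongside the `id` and `metadata` fields visible in the output above (the score is a standard Pinecone match field, but it is cut off in the truncated output). The function name `summarize_matches` is hypothetical.

```python
from typing import Any, Dict, List


def summarize_matches(results: Dict[str, Any], max_chars: int = 80) -> List[str]:
    """Build one summary line per match: rank, source PDF, chunk, score, snippet."""
    lines = []
    for rank, match in enumerate(results.get("matches", []), start=1):
        metadata = match.get("metadata") or {}
        # Collapse runs of whitespace so the snippet fits on one line.
        text = " ".join(str(metadata.get("processed_text", "")).split())
        if len(text) > max_chars:
            text = text[:max_chars].rstrip() + "..."
        score = match.get("score")
        score_str = f"{score:.4f}" if isinstance(score, (int, float)) else "n/a"
        lines.append(
            f"{rank}. {metadata.get('pdf_name', '?')} "
            f"(chunk {metadata.get('chunk_index', '?')}, score {score_str}): {text}"
        )
    return lines
```

For example, `print("\n".join(summarize_matches(results_dict)))`, where `results_dict` is the dictionary form of the value returned by `query_pinecone`. How to obtain that dictionary depends on the installed Pinecone client version, so check its response object before relying on this.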
