feat: NPU semantic search with pre-computed embeddings #68

@mrveiss

Description

Summary

Two TODO items indicate that NPU semantic search does not yet fully use pre-computed embeddings or NPU acceleration for the actual search operations.

Current State

TODO #1 - src/npu_semantic_search.py:378:

# TODO: Enhance to use our pre-computed embedding

TODO #2 - src/agents/npu_code_search_agent.py:742:

# TODO: Implement proper semantic search with embeddings and NPU acceleration

Requirements

1. Pre-computed Embedding Utilization

Current Problem: Query embeddings may be computed on the fly instead of reusing cached, pre-computed vectors.

Solution:

  • Check the embedding cache before computing a query embedding
  • Store newly computed embeddings so repeat queries are served from cache

2. NPU-Accelerated Semantic Search

Current Problem: Search operations may not be utilizing NPU acceleration.

Solution:

  • Route embedding generation through NPU when available
  • Use OpenVINO-optimized models for vector operations
  • Batch processing for multiple queries
  • GPU/NPU fallback chain: NPU → GPU → CPU
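
The fallback chain above can be sketched as a simple device-selection helper. `select_device` and its `available_devices` argument are illustrative names, not existing code; in OpenVINO the list would typically come from `Core().available_devices`.

```python
def select_device(available_devices: list) -> str:
    """Walk the NPU -> GPU -> CPU fallback chain and return the
    first device that is actually present."""
    for device in ("NPU", "GPU", "CPU"):
        if device in available_devices:
            return device
    return "CPU"  # CPU is the unconditional last resort
```

The same ordering would drive both embedding generation and vector operations, so a single helper keeps the chain consistent across code paths.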

Implementation Plan

Phase 1: Embedding Pre-computation

class NPUSemanticSearch:
    def __init__(self):
        self.embedding_cache = get_embedding_cache()
        self.npu_client = get_npu_client()
    
    async def search(self, query: str):
        # Check cache first
        cached = await self.embedding_cache.get(query)
        if cached:
            return await self._search_with_embedding(cached)
        
        # Compute via NPU
        embedding = await self.npu_client.compute_embedding(query)
        await self.embedding_cache.put(query, embedding)
        return await self._search_with_embedding(embedding)
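
The `get`/`put` cache interface assumed above can be sketched as a size-capped LRU store. `EmbeddingCache` here is a hypothetical stand-in for whatever `get_embedding_cache()` returns; it is synchronous for clarity (the issue's cache is async) and its 1000-entry default matches the memory target stated later in this issue.

```python
from collections import OrderedDict
from typing import List, Optional


class EmbeddingCache:
    """Minimal LRU sketch for query embeddings, capped at 1000 entries."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()

    def get(self, query: str) -> Optional[List[float]]:
        embedding = self._store.get(query)
        if embedding is not None:
            self._store.move_to_end(query)  # mark as most recently used
        return embedding

    def put(self, query: str, embedding: List[float]) -> None:
        self._store[query] = embedding
        self._store.move_to_end(query)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

An LRU policy keeps hot queries resident while bounding memory, which is what the "< 1ms cache hit" target depends on.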

Phase 2: NPU Integration

async def compute_embedding_npu(self, text: str) -> List[float]:
    """Use NPU for embedding computation"""
    if self.npu_available:
        return await self.npu_worker.process({
            "task": "embedding",
            "model": "sentence-transformers",
            "text": text,
            "use_openvino": True
        })
    else:
        # Fallback to CPU
        return self.cpu_model.encode(text)

Phase 3: Batch Operations

async def batch_search(self, queries: List[str]) -> List[SearchResult]:
    """Batch process multiple queries via NPU"""
    # Check cache for all queries
    cached_results = {}
    uncached_queries = []
    
    for query in queries:
        cached = await self.embedding_cache.get(query)
        if cached:
            cached_results[query] = cached
        else:
            uncached_queries.append(query)
    
    # Batch compute uncached embeddings via NPU
    if uncached_queries:
        embeddings = await self.npu_client.batch_compute_embeddings(uncached_queries)
        for query, embedding in zip(uncached_queries, embeddings):
            await self.embedding_cache.put(query, embedding)
            cached_results[query] = embedding
    
    # Perform searches
    return await self._batch_search_with_embeddings(cached_results)
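
`_search_with_embedding` and `_batch_search_with_embeddings` are left abstract above; the ranking step they imply can be sketched as a plain cosine-similarity top-k over stored vectors. `top_k_by_cosine` and the `corpus` dict are illustrative only; in this codebase the actual nearest-neighbor query would be delegated to ChromaDB.

```python
import math
from typing import Dict, List, Tuple


def top_k_by_cosine(query_embedding: List[float],
                    corpus: Dict[str, List[float]],
                    k: int = 3) -> List[Tuple[str, float]]:
    """Rank corpus entries by cosine similarity to the query embedding."""
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (math.sqrt(sum(x * x for x in a))
                * math.sqrt(sum(y * y for y in b)))
        return dot / norm if norm else 0.0

    scored = [(doc_id, cosine(query_embedding, emb))
              for doc_id, emb in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```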

Acceptance Criteria

  • Pre-computed embeddings stored and retrieved from cache
  • NPU acceleration used for embedding computation when available
  • Fallback to CPU when NPU unavailable
  • Batch processing for multiple queries
  • Performance metrics showing improvement over CPU-only
  • Integration with existing ChromaDB knowledge base
  • Unit tests for NPU semantic search
  • Benchmark showing 40-60% improvement with NPU

Performance Targets

  • Cache hit: < 1ms for embedding lookup
  • NPU embedding: < 50ms per query (vs 200ms CPU)
  • Batch processing: 10 queries in < 100ms
  • Memory efficiency: Cache limited to 1000 embeddings
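
Verifying the targets above needs a small timing harness; `measure_latency_ms` below is a hypothetical sketch of one, averaging wall-clock time over many calls so that sub-millisecond operations (like a cache hit) are measurable. The 384-dimension embedding is an assumption typical of small sentence-transformers models.

```python
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], iterations: int = 1000) -> float:
    """Average wall-clock latency of fn in milliseconds over many calls."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) * 1000.0 / iterations


# Example: time a dict-backed cache hit against the < 1 ms target
cache = {"query": [0.1] * 384}  # 384 dims assumed, not specified in the issue
hit_latency = measure_latency_ms(lambda: cache.get("query"))
```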

Related Files

  • src/npu_semantic_search.py
  • src/agents/npu_code_search_agent.py

Testing

  • Unit tests for embedding caching
  • Unit tests for NPU computation
  • Integration test with ChromaDB
  • Performance benchmark
  • Fallback behavior test

Estimated Effort

9-10+ hours:

  • Embedding cache integration: 2 hours
  • NPU acceleration: 3-4 hours
  • Batch processing: 2 hours
  • Testing and benchmarking: 2+ hours

Priority

Medium - Performance enhancement for AI-driven code search
