Summary
Two TODO items indicate NPU semantic search is not fully utilizing pre-computed embeddings or NPU acceleration for actual search operations.
Current State
TODO #1 - src/npu_semantic_search.py:378:

```python
# TODO: Enhance to use our pre-computed embedding
```

TODO #2 - src/agents/npu_code_search_agent.py:742:

```python
# TODO: Implement proper semantic search with embeddings and NPU acceleration
```

Requirements
1. Pre-computed Embedding Utilization
Current Problem: Embeddings may be computed on-the-fly instead of using cached/pre-computed vectors.
Solution:
- Leverage `EmbeddingCache` from Issue #65 (perf: Performance optimization opportunities, 21 identified improvements) P0 optimizations
- Pre-compute embeddings for common queries
- Store embeddings in ChromaDB with metadata
- Use NPU for embedding computation when available
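The store-and-query flow above can be sketched without a running vector store. The following toy `EmbeddingIndex` (an illustrative name, not from the codebase) stands in for a ChromaDB collection, holding pre-computed embeddings alongside metadata and answering queries by cosine similarity:

```python
import numpy as np

class EmbeddingIndex:
    """Toy stand-in for a ChromaDB collection: stores pre-computed
    embeddings with metadata and answers nearest-neighbour queries."""

    def __init__(self):
        self.vectors = []   # list of np.ndarray embeddings
        self.metadata = []  # parallel list of metadata dicts

    def add(self, embedding, meta):
        self.vectors.append(np.asarray(embedding, dtype=np.float32))
        self.metadata.append(meta)

    def query(self, embedding, top_k=3):
        q = np.asarray(embedding, dtype=np.float32)
        mat = np.stack(self.vectors)
        # Cosine similarity between the query and every stored vector
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)
        order = np.argsort(-sims)[:top_k]
        return [(self.metadata[i], float(sims[i])) for i in order]

index = EmbeddingIndex()
index.add([1.0, 0.0], {"path": "a.py"})
index.add([0.0, 1.0], {"path": "b.py"})
results = index.query([0.9, 0.1], top_k=1)
```

With ChromaDB the same shape applies: the pre-computed vector is passed in at insert time rather than recomputed by the store.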
2. NPU-Accelerated Semantic Search
Current Problem: Search operations may not be utilizing NPU acceleration.
Solution:
- Route embedding generation through NPU when available
- Use OpenVINO-optimized models for vector operations
- Batch processing for multiple queries
- GPU/NPU fallback chain: NPU → GPU → CPU
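The fallback chain in the last bullet can be expressed as a simple ordered probe. This is a minimal sketch; how NPU/GPU availability is actually detected (e.g. via src/ai_hardware_accelerator.py) is assumed, not shown:

```python
def pick_device(npu_available: bool, gpu_available: bool) -> str:
    """Walk the NPU -> GPU -> CPU fallback chain and return the first
    backend that is available. CPU is always a valid last resort."""
    for device, available in (("npu", npu_available),
                              ("gpu", gpu_available),
                              ("cpu", True)):
        if available:
            return device
```

The same ordering can then drive both embedding computation and vector operations, so every code path shares one fallback policy.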
Implementation Plan
Phase 1: Embedding Pre-computation
```python
class NPUSemanticSearch:
    def __init__(self):
        self.embedding_cache = get_embedding_cache()
        self.npu_client = get_npu_client()

    async def search(self, query: str):
        # Check cache first
        cached = await self.embedding_cache.get(query)
        if cached is not None:
            return await self._search_with_embedding(cached)
        # Compute via NPU and cache for next time
        embedding = await self.npu_client.compute_embedding(query)
        await self.embedding_cache.put(query, embedding)
        return await self._search_with_embedding(embedding)
```

Phase 2: NPU Integration
```python
async def compute_embedding_npu(self, text: str) -> List[float]:
    """Use the NPU for embedding computation, falling back to CPU."""
    if self.npu_available:
        return await self.npu_worker.process({
            "task": "embedding",
            "model": "sentence-transformers",
            "text": text,
            "use_openvino": True,
        })
    # Fallback to CPU
    return self.cpu_model.encode(text)
```

Phase 3: Batch Operations
```python
async def batch_search(self, queries: List[str]) -> List[SearchResult]:
    """Batch process multiple queries via the NPU."""
    # Check the cache for every query first
    cached_results = {}
    uncached_queries = []
    for query in queries:
        cached = await self.embedding_cache.get(query)
        if cached is not None:
            cached_results[query] = cached
        else:
            uncached_queries.append(query)

    # Batch-compute the uncached embeddings via the NPU
    if uncached_queries:
        embeddings = await self.npu_client.batch_compute_embeddings(uncached_queries)
        for query, embedding in zip(uncached_queries, embeddings):
            await self.embedding_cache.put(query, embedding)
            cached_results[query] = embedding

    # Perform the searches with the collected embeddings
    return await self._batch_search_with_embeddings(cached_results)
```

Acceptance Criteria
- Pre-computed embeddings stored and retrieved from cache
- NPU acceleration used for embedding computation when available
- Fallback to CPU when NPU unavailable
- Batch processing for multiple queries
- Performance metrics showing improvement over CPU-only
- Integration with existing ChromaDB knowledge base
- Unit tests for NPU semantic search
- Benchmark showing 40-60% improvement with NPU
Performance Targets
- Cache hit: < 1ms for embedding lookup
- NPU embedding: < 50ms per query (vs 200ms CPU)
- Batch processing: 10 queries in < 100ms
- Memory efficiency: Cache limited to 1000 embeddings
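The 1000-embedding cap implies an eviction policy. A minimal LRU sketch using `OrderedDict` follows; the real `EmbeddingCache` in src/knowledge_base.py may differ in interface and policy:

```python
from collections import OrderedDict

class BoundedEmbeddingCache:
    """LRU cache capped at max_size entries; once full, the least
    recently used embedding is evicted to bound memory."""

    def __init__(self, max_size: int = 1000):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, query: str):
        if query not in self._data:
            return None
        self._data.move_to_end(query)  # mark as recently used
        return self._data[query]

    def put(self, query: str, embedding):
        self._data[query] = embedding
        self._data.move_to_end(query)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

cache = BoundedEmbeddingCache(max_size=2)
cache.put("a", [0.1])
cache.put("b", [0.2])
cache.get("a")          # touch "a" so "b" becomes least recently used
cache.put("c", [0.3])   # evicts "b"
```

An LRU policy suits this workload because common queries (the ones worth pre-computing) stay hot while one-off queries age out.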
Related Files
- src/npu_semantic_search.py - Main semantic search module
- src/agents/npu_code_search_agent.py - Agent using semantic search
- src/knowledge_base.py - `EmbeddingCache` (from Issue #65, perf: Performance optimization opportunities)
- src/npu_integration.py - NPU worker client
- src/ai_hardware_accelerator.py - Hardware detection
Testing
- Unit tests for embedding caching
- Unit tests for NPU computation
- Integration test with ChromaDB
- Performance benchmark
- Fallback behavior test
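The fallback-behavior test can stub out the NPU path entirely. A sketch with a hypothetical `FakeSearch` double (the real test would exercise `compute_embedding_npu` with `npu_available` toggled):

```python
import asyncio

class FakeSearch:
    """Minimal double for the fallback path: prefers the NPU worker,
    drops to a CPU encoder when npu_available is False."""

    def __init__(self, npu_available: bool):
        self.npu_available = npu_available

    async def compute_embedding(self, text: str):
        if self.npu_available:
            return ("npu", [0.0])  # (backend used, embedding)
        return ("cpu", [0.0])

async def main():
    backend_with_npu, _ = await FakeSearch(True).compute_embedding("q")
    backend_without, _ = await FakeSearch(False).compute_embedding("q")
    return backend_with_npu, backend_without

result = asyncio.run(main())
```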
Estimated Effort
9-10+ hours:
- Embedding cache integration: 2 hours
- NPU acceleration: 3-4 hours
- Batch processing: 2 hours
- Testing and benchmarking: 2+ hours
Priority
Medium - Performance enhancement for AI-driven code search