Summary
Two TODO items indicate NPU semantic search is not fully utilizing pre-computed embeddings or NPU acceleration for actual search operations.
Current State
TODO #1 - src/npu_semantic_search.py:378:

```python
# TODO: Enhance to use our pre-computed embedding
```

TODO #2 - src/agents/npu_code_search_agent.py:742:

```python
# TODO: Implement proper semantic search with embeddings and NPU acceleration
```

Requirements
1. Pre-computed Embedding Utilization
Current Problem: Embeddings may be computed on-the-fly instead of using cached/pre-computed vectors.
Solution:
- Leverage `EmbeddingCache` from Issue #65 (perf: Performance optimization opportunities, 21 identified improvements) P0 optimizations
- Pre-compute embeddings for common queries
- Store embeddings in ChromaDB with metadata
- Use NPU for embedding computation when available
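The store-and-query flow above can be sketched without a running vector store. The following toy `EmbeddingIndex` (an illustrative name, not from the codebase) stands in for a ChromaDB collection, holding pre-computed embeddings alongside metadata and answering queries by cosine similarity:

```python
import numpy as np

class EmbeddingIndex:
    """Toy stand-in for a ChromaDB collection: stores pre-computed
    embeddings with metadata and answers nearest-neighbour queries."""

    def __init__(self):
        self.vectors = []   # list of np.ndarray embeddings
        self.metadata = []  # parallel list of metadata dicts

    def add(self, embedding, meta):
        self.vectors.append(np.asarray(embedding, dtype=np.float32))
        self.metadata.append(meta)

    def query(self, embedding, top_k=3):
        q = np.asarray(embedding, dtype=np.float32)
        mat = np.stack(self.vectors)
        # Cosine similarity between the query and every stored vector
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)
        order = np.argsort(-sims)[:top_k]
        return [(self.metadata[i], float(sims[i])) for i in order]

index = EmbeddingIndex()
index.add([1.0, 0.0], {"path": "a.py"})
index.add([0.0, 1.0], {"path": "b.py"})
results = index.query([0.9, 0.1], top_k=1)
```

With ChromaDB the same shape applies: the pre-computed vector is passed in at insert time rather than recomputed by the store.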
2. NPU-Accelerated Semantic Search
Current Problem: Search operations may not be utilizing NPU acceleration.
Solution:
- Route embedding generation through NPU when available
- Use OpenVINO-optimized models for vector operations
- Batch processing for multiple queries
- GPU/NPU fallback chain: NPU → GPU → CPU
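The fallback chain in the last bullet can be expressed as a simple ordered probe. This is a minimal sketch; how NPU/GPU availability is actually detected (e.g. via src/ai_hardware_accelerator.py) is assumed, not shown:

```python
def pick_device(npu_available: bool, gpu_available: bool) -> str:
    """Walk the NPU -> GPU -> CPU fallback chain and return the first
    backend that is available. CPU is always a valid last resort."""
    for device, available in (("npu", npu_available),
                              ("gpu", gpu_available),
                              ("cpu", True)):
        if available:
            return device
```

The same ordering can then drive both embedding computation and vector operations, so every code path shares one fallback policy.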
Implementation Plan
Phase 1: Embedding Pre-computation
```python
class NPUSemanticSearch:
    def __init__(self):
        self.embedding_cache = get_embedding_cache()
        self.npu_client = get_npu_client()

    async def search(self, query: str):
        # Check cache first
        cached = await self.embedding_cache.get(query)
        if cached is not None:
            return await self._search_with_embedding(cached)
        # Compute via NPU and cache for next time
        embedding = await self.npu_client.compute_embedding(query)
        await self.embedding_cache.put(query, embedding)
        return await self._search_with_embedding(embedding)
```

Phase 2: NPU Integration
```python
async def compute_embedding_npu(self, text: str) -> List[float]:
    """Use the NPU for embedding computation, falling back to CPU."""
    if self.npu_available:
        return await self.npu_worker.process({
            "task": "embedding",
            "model": "sentence-transformers",
            "text": text,
            "use_openvino": True,
        })
    # Fallback to CPU
    return self.cpu_model.encode(text)
```

Phase 3: Batch Operations
```python
async def batch_search(self, queries: List[str]) -> List[SearchResult]:
    """Batch process multiple queries via the NPU."""
    # Check the cache for every query first
    cached_results = {}
    uncached_queries = []
    for query in queries:
        cached = await self.embedding_cache.get(query)
        if cached is not None:
            cached_results[query] = cached
        else:
            uncached_queries.append(query)

    # Batch-compute the uncached embeddings via the NPU
    if uncached_queries:
        embeddings = await self.npu_client.batch_compute_embeddings(uncached_queries)
        for query, embedding in zip(uncached_queries, embeddings):
            await self.embedding_cache.put(query, embedding)
            cached_results[query] = embedding

    # Perform the searches with the collected embeddings
    return await self._batch_search_with_embeddings(cached_results)
```

Acceptance Criteria
- Pre-computed embeddings stored and retrieved from cache
- NPU acceleration used for embedding computation when available
- Fallback to CPU when NPU unavailable
- Batch processing for multiple queries
- Performance metrics showing improvement over CPU-only
- Integration with existing ChromaDB knowledge base
- Unit tests for NPU semantic search
- Benchmark showing 40-60% improvement with NPU
Performance Targets
- Cache hit: < 1ms for embedding lookup
- NPU embedding: < 50ms per query (vs 200ms CPU)
- Batch processing: 10 queries in < 100ms
- Memory efficiency: Cache limited to 1000 embeddings
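The 1000-embedding cap implies an eviction policy. A minimal LRU sketch using `OrderedDict` follows; the real `EmbeddingCache` in src/knowledge_base.py may differ in interface and policy:

```python
from collections import OrderedDict

class BoundedEmbeddingCache:
    """LRU cache capped at max_size entries; once full, the least
    recently used embedding is evicted to bound memory."""

    def __init__(self, max_size: int = 1000):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, query: str):
        if query not in self._data:
            return None
        self._data.move_to_end(query)  # mark as recently used
        return self._data[query]

    def put(self, query: str, embedding):
        self._data[query] = embedding
        self._data.move_to_end(query)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

cache = BoundedEmbeddingCache(max_size=2)
cache.put("a", [0.1])
cache.put("b", [0.2])
cache.get("a")          # touch "a" so "b" becomes least recently used
cache.put("c", [0.3])   # evicts "b"
```

An LRU policy suits this workload because common queries (the ones worth pre-computing) stay hot while one-off queries age out.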
Related Files
- src/npu_semantic_search.py - Main semantic search module
- src/agents/npu_code_search_agent.py - Agent using semantic search
- src/knowledge_base.py - `EmbeddingCache` (from Issue #65, perf: Performance optimization opportunities)
- src/npu_integration.py - NPU worker client
- src/ai_hardware_accelerator.py - Hardware detection
Testing
- Unit tests for embedding caching
- Unit tests for NPU computation
- Integration test with ChromaDB
- Performance benchmark
- Fallback behavior test
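The fallback-behavior test can stub out the NPU path entirely. A sketch with a hypothetical `FakeSearch` double (the real test would exercise `compute_embedding_npu` with `npu_available` toggled):

```python
import asyncio

class FakeSearch:
    """Minimal double for the fallback path: prefers the NPU worker,
    drops to a CPU encoder when npu_available is False."""

    def __init__(self, npu_available: bool):
        self.npu_available = npu_available

    async def compute_embedding(self, text: str):
        if self.npu_available:
            return ("npu", [0.0])  # (backend used, embedding)
        return ("cpu", [0.0])

async def main():
    backend_with_npu, _ = await FakeSearch(True).compute_embedding("q")
    backend_without, _ = await FakeSearch(False).compute_embedding("q")
    return backend_with_npu, backend_without

result = asyncio.run(main())
```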
Estimated Effort
9-10+ hours:
- Embedding cache integration: 2 hours
- NPU acceleration: 3-4 hours
- Batch processing: 2 hours
- Testing and benchmarking: 2+ hours
Priority
Medium - Performance enhancement for AI-driven code search