# 🚀 POC: OPTIMIZED Vector Search for AWS g4dn.2xlarge

## 🎯 Instance: g4dn.2xlarge
- **GPU**: NVIDIA T4 (16GB VRAM, Tensor Cores)
- **vCPUs**: 8
- **RAM**: 32GB
- **Cost**: ~$0.75/h (on-demand) | ~$0.25/h (spot)

## 🚀 T4 optimizations:

### 1. **Optimized embedding model:**
- ✅ `paraphrase-multilingual-MiniLM-L12-v2` (384 dims, ~4x faster)
- ✅ **FP16 (half precision)**: ~2x speedup on the T4's Tensor Cores
- ✅ **Batch size 256** (uses the 16GB of VRAM; the CPU path uses 32)

### 2. **FAISS IndexHNSWFlat:**
- ✅ No training step (unlike IVF)
- ✅ **~99.5% recall** (vs. 95-98% for IVF, depending on parameters)
- ✅ **10-50x faster** than exact (Flat) search
- ⚡ Optional: FAISS GPU (5-10x faster search), but only for index types FAISS implements on the GPU, such as Flat and IVF; HNSW indices stay on the CPU
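The recall figures above are easiest to read as recall@k: the fraction of the exact (Flat) top-k neighbors that the approximate index also returns. A minimal stdlib sketch, assuming you already have both result lists per query (for example the `indices` arrays from `IndexFlatL2.search` and `IndexHNSWFlat.search` on the same queries); `recall_at_k` is a hypothetical helper, not part of FAISS:

```python
from typing import Sequence


def recall_at_k(approx: Sequence[Sequence[int]],
                exact: Sequence[Sequence[int]],
                k: int) -> float:
    """Fraction of the exact top-k neighbors also found by the approximate index.

    `approx` and `exact` hold one list of row ids per query.
    """
    if len(approx) != len(exact) or not exact:
        raise ValueError("need one approximate and one exact result list per query")
    hits = 0
    for approx_ids, exact_ids in zip(approx, exact):
        # Order does not matter for recall, only membership in the top k
        hits += len(set(approx_ids[:k]) & set(exact_ids[:k]))
    return hits / (k * len(exact))


# Query 1 finds all 3 true neighbors, query 2 finds 2 of 3: recall = 5/6
print(round(recall_at_k([[1, 2, 3], [4, 5, 9]], [[1, 2, 3], [4, 5, 6]], k=3), 3))  # 0.833
```

Measuring this on a few hundred sample queries is a practical way to check the "~99.5%" figure on the real DNE data.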

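### 3. **Expected performance:**
| Metric | CPU | T4 GPU |
|---------|-----|--------|
| Index build | 30-40min | **5-10min** |
| Search latency | 80-100ms | **30-50ms** |
| Throughput | ~100 q/s | **~1000 q/s** |

Throughput here assumes batched or concurrent queries: a single client issuing one query at a time gets roughly 1000 / latency (ms), i.e. about 10-12 q/s on CPU and 20-33 q/s on the T4 at these latencies.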
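Latency and throughput are linked by Little's law: sustained throughput = queries in flight / latency. The sketch below (plain Python, hypothetical helper names, and assuming latency stays constant under load, which real servers only approximate) shows how much concurrency or batching a throughput target implies:

```python
import math


def sequential_qps(latency_ms: float) -> float:
    """Throughput of a single client issuing one query at a time."""
    return 1000.0 / latency_ms


def required_concurrency(target_qps: float, latency_ms: float) -> int:
    """Queries that must be in flight to sustain target_qps (Little's law: L = lambda * W)."""
    return math.ceil(target_qps * latency_ms / 1000.0)


print(sequential_qps(40))              # 25.0
print(required_concurrency(1000, 40))  # 40
```

So ~1000 q/s at 40 ms needs about 40 queries in flight, via worker concurrency or by batching queries into a single `index.search` call.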

---

## 📦 Setup and GPU Detection

In [None]:
import pandas as pd
import numpy as np
import json
import re
import faiss
import pickle
import torch
from pathlib import Path
from typing import Dict, List, Optional
from sentence_transformers import SentenceTransformer
from unidecode import unidecode
from tqdm import tqdm
import time

print("="*60)
print("üîç DETEC√á√ÉO DE HARDWARE")
print("="*60)

# Detectar GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

if device == 'cuda':
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
    print(f"\nüéÆ GPU detectada: {gpu_name}")
    print(f"üíæ VRAM total: {gpu_memory:.1f}GB")
    print(f"‚úÖ CUDA version: {torch.version.cuda}")
    
    # Verificar se √© T4
    if 'T4' in gpu_name:
        print(f"üöÄ GPU T4 detectada - Tensor Cores dispon√≠veis!")
        print(f"‚ö° Mixed Precision (FP16) ser√° ativado automaticamente")
    
    # FAISS GPU dispon√≠vel?
    try:
        res = faiss.StandardGpuResources()
        faiss_gpu_available = True
        print(f"‚úÖ FAISS GPU dispon√≠vel (busca 5-10x mais r√°pida)")
    except:
        faiss_gpu_available = False
        print(f"‚ö†Ô∏è  FAISS CPU only (para GPU: pip install faiss-gpu)")
else:
    print(f"\n‚ö†Ô∏è  Executando em CPU")
    print(f"üí° Recomendado: usar inst√¢ncia g4dn.2xlarge com GPU T4")
    faiss_gpu_available = False

print(f"\nüîß Device para embeddings: {device}")
print("="*60)

## 1. EmbeddingService - Optimized for the T4
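`normalize_text` below strips accents with `unidecode`, lowercases, expands common Brazilian address abbreviations, and removes punctuation. To see that pipeline in isolation, here is a stdlib-only sketch. It swaps `unidecode` for `unicodedata` NFKD decomposition (an assumption: this only removes combining accents, whereas `unidecode` also transliterates other characters) and uses a shortened abbreviation table:

```python
import re
import unicodedata

# Shortened version of the abbreviation table used in normalize_text
ABBREVIATIONS = {
    r"\br\.?\s": "rua ",
    r"\bav\.?\s": "avenida ",
    r"\bjd\.?\s": "jardim ",
    r"\bvl\.?\s": "vila ",
}


def normalize_address(text: str) -> str:
    if not text or not isinstance(text, str):
        return ""
    # Decompose accented letters ("ç" -> "c" + combining cedilla) and drop the marks
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    for pattern, replacement in ABBREVIATIONS.items():
        text = re.sub(pattern, replacement, text)
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation -> space
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


print(normalize_address("Av. Paulista, 1000"))  # avenida paulista 1000
print(normalize_address("R.Augusta"))           # r augusta
```

The second call shows a limitation shared with the notebook's patterns: an abbreviation glued to the next word ("R.Augusta") is not expanded, because each pattern requires whitespace after the abbreviation.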

In [None]:
class EmbeddingServiceGPU:
    """Servi√ßo otimizado para GPU NVIDIA T4 (g4dn.2xlarge)"""
    
    def __init__(
        self, 
        model_name: str = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        use_fp16: bool = True
    ):
        """
        Modelos testados na T4 (velocidade vs qualidade):
        
        R√ÅPIDO (recomendado para produ√ß√£o):
        - paraphrase-multilingual-MiniLM-L12-v2 (384 dims) ‚úÖ MELHOR CUSTO-BENEF√çCIO
        - all-MiniLM-L6-v2 (384 dims, ingl√™s mas OK em PT)
        
        PRECISO (se precisar mais qualidade):
        - paraphrase-multilingual-mpnet-base-v2 (768 dims, 2x mais lento)
        - neuralmind/bert-base-portuguese-cased (768 dims, 3x mais lento)
        
        Args:
            use_fp16: Mixed precision (FP16) - 2x speedup na T4 (Tensor Cores)
        """
        print(f"‚ö° Carregando modelo: {model_name}")
        self.model = SentenceTransformer(model_name, device=device)
        
        # Mixed precision para T4 (Tensor Cores)
        if device == 'cuda' and use_fp16:
            self.model.half()  # Converte para FP16
            print(f"üöÄ Mixed Precision (FP16) ATIVADO - 2x speedup")
        
        self.embedding_dim = self.model.get_sentence_embedding_dimension()
        print(f"‚úÖ Embedding dimension: {self.embedding_dim}")
        print(f"üîß Device: {self.model.device}")
        
        # Batch size otimizado para T4 (16GB VRAM)
        if device == 'cuda':
            # T4 aguenta batch 256 com MiniLM (384 dims)
            # Se usar modelo maior (768 dims), reduza para 128
            self.optimal_batch_size = 256 if self.embedding_dim <= 384 else 128
        else:
            self.optimal_batch_size = 32
        
        print(f"üì¶ Batch size otimizado: {self.optimal_batch_size}")
    
    @staticmethod
    def normalize_text(text: str) -> str:
        """Normaliza√ß√£o de endere√ßos brasileiros"""
        if not text or not isinstance(text, str):
            return ""
        
        text = unidecode(text)
        text = text.lower()
        
        # Expand abbreviations (with or without a trailing period)
        replacements = {
            r'\br\.?\s': 'rua ',
            r'\bav\.?\s': 'avenida ',
            r'\btrav\.?\s': 'travessa ',
            r'\balam\.?\s': 'alameda ',
            r'\bpca\.?\s': 'praca ',
            r'\bjd\.?\s': 'jardim ',
            r'\bvl\.?\s': 'vila ',
            r'\bcj\.?\s': 'conjunto ',
            r'\bqd\.?\s': 'quadra ',
            r'\blt\.?\s': 'lote ',
        }
        
        for pattern, replacement in replacements.items():
            text = re.sub(pattern, replacement, text)
        
        # Remove punctuation and collapse repeated whitespace
        text = re.sub(r'[^\w\s]', ' ', text)
        text = re.sub(r'\s+', ' ', text).strip()
        
        return text
    
    def embed_text(self, text: str) -> np.ndarray:
        """Embed a single text (L2-normalized, like embed_batch)"""
        normalized = self.normalize_text(text)
        if not normalized:
            return np.zeros(self.embedding_dim, dtype=np.float32)
        embedding = self.model.encode(
            normalized,
            convert_to_numpy=True,
            show_progress_bar=False,
            normalize_embeddings=True  # must match the index vectors built by embed_batch
        )
        # FP16 models return float16 arrays; FAISS expects float32
        return embedding.astype(np.float32)
    
    def embed_address_fields(self, address: Dict[str, str]) -> Dict[str, np.ndarray]:
        """Embed several fields of one address"""
        embeddings = {}
        for field in ['logradouro', 'bairro', 'cidade']:
            text = address.get(field, '')
            embeddings[field] = self.embed_text(text)
        return embeddings
    
    def embed_batch(self, texts: list, batch_size: int = None) -> np.ndarray:
        """
        Batch embedding tuned for the T4:
        - CPU: batch=32
        - T4 GPU: batch=256 (16GB VRAM + FP16)
        """
        if batch_size is None:
            batch_size = self.optimal_batch_size
        
        # Normalize all texts; empty ones become " " so every row still gets a vector
        normalized_texts = [self.normalize_text(t) for t in texts]
        normalized_texts = [t if t else " " for t in normalized_texts]
        
        # Encode with the tuned settings
        embeddings = self.model.encode(
            normalized_texts,
            convert_to_numpy=True,
            show_progress_bar=True,
            batch_size=batch_size,
            device=device,
            normalize_embeddings=True  # L2 normalization: squared L2 distance then tracks cosine similarity
        )
        
        return embeddings.astype(np.float32)

## 2. IndexBuilder - HNSW with GPU support
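Before choosing `M`, it helps to estimate the RAM an `IndexHNSWFlat` needs. A rough back-of-the-envelope sketch, assuming float32 vectors (4 bytes per dimension) and 32-bit neighbor ids with 2·M links per vector on the HNSW base layer. It ignores the upper layers (reached by roughly 1/M of the vectors) and small per-vector bookkeeping, so treat the result as a lower bound; `hnsw_flat_bytes` is a hypothetical helper:

```python
def hnsw_flat_bytes(n_vectors: int, dim: int, M: int) -> int:
    """Rough lower bound for IndexHNSWFlat memory: float32 data + base-layer links."""
    vector_bytes = 4 * dim    # float32 storage of the raw vector
    link_bytes = 4 * 2 * M    # 2*M int32 neighbor ids on the base layer
    return n_vectors * (vector_bytes + link_bytes)


GIB = 1024 ** 3
per_field = hnsw_flat_bytes(1_500_000, 384, 32)
print(f"{per_field / GIB:.2f} GiB per field")        # 2.50 GiB per field
print(f"{3 * per_field / GIB:.2f} GiB for 3 fields")  # 7.51 GiB for 3 fields
```

About 7.5 GiB for the three fields fits comfortably in the g4dn.2xlarge's 32GB of RAM. Going from M=32 to M=64 adds 256 bytes of links per vector, roughly +0.36 GiB per field at 1.5M vectors.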

In [None]:
class IndexBuilderGPU:
    """Construtor de √≠ndices FAISS otimizado para T4"""
    
    def __init__(self, embedding_service: EmbeddingServiceGPU, use_gpu_index: bool = False):
        """
        Args:
            use_gpu_index: Transferir √≠ndices para GPU (busca 5-10x mais r√°pida)
                          Requer: pip install faiss-gpu
                          Aten√ß√£o: consome VRAM (pode conflitar com embeddings)
        """
        self.embedding_service = embedding_service
        self.indices = {}
        self.dataframe = None
        self.use_gpu_index = use_gpu_index and faiss_gpu_available
        
        if self.use_gpu_index:
            self.gpu_resource = faiss.StandardGpuResources()
            # Reservar 8GB para √≠ndices (deixa 8GB para embeddings)
            self.gpu_resource.setTempMemory(8 * 1024 * 1024 * 1024)
            print(f"üéÆ FAISS GPU ativado (8GB reservados para √≠ndices)")
    
    def build_indices(
        self,
        df: pd.DataFrame,
        fields: list = None,
        use_hnsw: bool = True,
        M: int = 32,
        efSearch: int = 32
    ) -> dict:
        """
        Constr√≥i √≠ndices FAISS HNSW otimizados
        
        Args:
            use_hnsw: True=HNSW (r√°pido+preciso), False=Flat (lento)
            M: Conex√µes no grafo HNSW (16-64)
               - M=16: r√°pido, ~98% recall
               - M=32: balanceado, ~99.5% recall ‚úÖ
               - M=64: preciso, ~99.9% recall, +mem√≥ria
            efSearch: Vizinhos na busca (16-64)
               - 16: muito r√°pido
               - 32: balanceado ‚úÖ
               - 64: mais preciso
        """
        if fields is None:
            fields = ['logradouro', 'bairro', 'cidade']
        
        self.dataframe = df.copy()
        n_records = len(df)
        
        print(f"\n{'='*60}")
        print(f"üî® CONSTRUINDO √çNDICES FAISS")
        print(f"{'='*60}")
        print(f"üìä Total de registros: {n_records:,}")
        print(f"‚öôÔ∏è  Modo: {'HNSW (r√°pido+preciso)' if use_hnsw else 'Flat (lento)'}")
        print(f"üéØ Par√¢metros: M={M}, efSearch={efSearch}")
        print(f"")
        
        total_start = time.time()
        
        for field in fields:
            print(f"\nüìç Campo: {field}")
            print(f"{'-'*40}")
            field_start = time.time()
            
            # Embedding do campo
            texts = df[field].fillna('').astype(str).tolist()
            print(f"   ‚ö° Gerando embeddings...")
            embeddings = self.embedding_service.embed_batch(texts)
            dimension = embeddings.shape[1]
            
            # Criar √≠ndice
            if use_hnsw:
                print(f"   üß† Criando √≠ndice HNSW (dim={dimension})...")
                index_cpu = faiss.IndexHNSWFlat(dimension, M)
                index_cpu.hnsw.efSearch = efSearch
                
                print(f"   üì• Adicionando {n_records:,} vetores...")
                index_cpu.add(embeddings)
                
                # GPU (opcional)
                if self.use_gpu_index:
                    print(f"   üéÆ Transferindo para GPU T4...")
                    try:
                        index = faiss.index_cpu_to_gpu(self.gpu_resource, 0, index_cpu)
                        print(f"   ‚úÖ √çndice na GPU")
                    except Exception as e:
                        print(f"   ‚ö†Ô∏è  Falha GPU: {e}")
                        print(f"   ‚ÑπÔ∏è  Usando √≠ndice CPU")
                        index = index_cpu
                else:
                    index = index_cpu
            else:
                print(f"   üß† Criando √≠ndice Flat (dim={dimension})...")
                index = faiss.IndexFlatL2(dimension)
                index.add(embeddings)
                
                if self.use_gpu_index:
                    index = faiss.index_cpu_to_gpu(self.gpu_resource, 0, index)
            
            self.indices[field] = index
            
            elapsed = time.time() - field_start
            print(f"   ‚úÖ Conclu√≠do em {elapsed:.1f}s")
        
        total_elapsed = time.time() - total_start
        print(f"\n{'='*60}")
        print(f"üéâ TODOS OS √çNDICES CONSTRU√çDOS")
        print(f"‚è±Ô∏è  Tempo total: {total_elapsed/60:.1f}min")
        print(f"{'='*60}\n")
        
        return self.indices
    
    def save_indices(self, output_dir: str):
        """Salva √≠ndices para reutiliza√ß√£o"""
        output_path = Path(output_dir)
        output_path.mkdir(parents=True, exist_ok=True)
        
        print(f"\nüíæ Salvando √≠ndices em: {output_path}")
        
        for field, index in self.indices.items():
            # Se √≠ndice est√° na GPU, transferir para CPU antes de salvar
            if self.use_gpu_index:
                index_cpu = faiss.index_gpu_to_cpu(index)
            else:
                index_cpu = index
            
            index_file = output_path / f"{field}_index.faiss"
            faiss.write_index(index_cpu, str(index_file))
            print(f"   ‚úÖ {field}_index.faiss")
        
        # Salvar dataframe
        df_file = output_path / "addresses.parquet"
        self.dataframe.to_parquet(df_file, index=False)
        print(f"   ‚úÖ addresses.parquet")
        
        # Metadata
        metadata = {
            'fields': list(self.indices.keys()),
            'n_records': len(self.dataframe),
            'embedding_dim': self.embedding_service.embedding_dim
        }
        metadata_file = output_path / "metadata.pkl"
        with open(metadata_file, 'wb') as f:
            pickle.dump(metadata, f)
        print(f"   ‚úÖ metadata.pkl")
        
        print(f"\nüéâ √çndices salvos! Use load_indices() para carregar.")
    
    def load_indices(self, input_dir: str):
        """Carrega √≠ndices salvos (MUITO mais r√°pido)"""
        input_path = Path(input_dir)
        
        print(f"\nüìÇ Carregando √≠ndices de: {input_path}")
        start_time = time.time()
        
        # Metadata
        metadata_file = input_path / "metadata.pkl"
        with open(metadata_file, 'rb') as f:
            metadata = pickle.load(f)
        
        # DataFrame
        df_file = input_path / "addresses.parquet"
        self.dataframe = pd.read_parquet(df_file)
        
        # √çndices
        for field in metadata['fields']:
            index_file = input_path / f"{field}_index.faiss"
            index_cpu = faiss.read_index(str(index_file))
            
            # Transferir para GPU se ativado
            if self.use_gpu_index:
                try:
                    index = faiss.index_cpu_to_gpu(self.gpu_resource, 0, index_cpu)
                    print(f"   ‚úÖ {field} (GPU)")
                except:
                    index = index_cpu
                    print(f"   ‚úÖ {field} (CPU)")
            else:
                index = index_cpu
                print(f"   ‚úÖ {field}")
            
            self.indices[field] = index
        
        elapsed = time.time() - start_time
        print(f"\n‚ö° Carregado em {elapsed:.1f}s ({len(self.dataframe):,} registros)\n")
        
        return self.indices, self.dataframe

## 3. SearchEngine (same as the previous version)
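`SearchEngine` turns FAISS distances into scores with `1 / (1 + d)`. `IndexFlatL2` and `IndexHNSWFlat` return *squared* L2 distances, and for unit-length vectors (as produced by `embed_batch`, provided the query vectors are normalized the same way) d = 2 − 2·cos θ, so the score ranges from 0.2 (opposite vectors) to 1.0 (identical). The stdlib sketch below checks that identity on two small vectors; the helper names are illustrative:

```python
import math


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # inputs are already unit vectors


a = unit([1.0, 2.0, 2.0])
b = unit([2.0, 1.0, 2.0])
d = squared_l2(a, b)
print(round(d, 6), round(2 - 2 * cosine(a, b), 6))  # 0.222222 0.222222
print(round(1 / (1 + d), 4))                         # 0.8182
```

Inverting the formula, the default `confidence_threshold` of 0.8 corresponds to d = 0.25, i.e. a cosine similarity of 0.875 for a single field.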

In [None]:
class SearchEngine:
    """Motor de busca vetorial com pesos din√¢micos"""
    
    def __init__(
        self,
        embedding_service: EmbeddingServiceGPU,
        indices: Dict[str, faiss.Index],
        dataframe: pd.DataFrame
    ):
        self.embedding_service = embedding_service
        self.indices = indices
        self.dataframe = dataframe
        
        self.base_weights = {
            'with_cep': {
                'cep': 0.30,
                'logradouro': 0.40,
                'bairro': 0.20,
                'cidade': 0.10
            },
            'without_cep': {
                'logradouro': 0.55,
                'bairro': 0.25,
                'cidade': 0.20
            }
        }
        
        self.use_uf_filter = True
        self.confidence_threshold = 0.8
    
    def _get_dynamic_weights(self, query: Dict[str, str]) -> Dict[str, float]:
        has_cep = bool(query.get('cep'))
        weights = self.base_weights['with_cep' if has_cep else 'without_cep'].copy()
        available_fields = [f for f in ['logradouro', 'bairro', 'cidade'] if query.get(f)]
        filtered_weights = {k: v for k, v in weights.items() if k in available_fields or k == 'cep'}
        total_weight = sum(filtered_weights.values())
        if total_weight > 0:
            normalized_weights = {k: v / total_weight for k, v in filtered_weights.items()}
        else:
            normalized_weights = filtered_weights
        return normalized_weights
    
    def _calculate_field_similarity(
        self,
        field: str,
        query_embedding: np.ndarray,
        top_k: int = 100
    ) -> tuple:
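        index = self.indices[field]
        query_embedding = query_embedding.reshape(1, -1).astype(np.float32)
        # FAISS L2 indices return *squared* L2 distances (0 to 4 for unit vectors)
        distances, indices = index.search(query_embedding, top_k)
        similarities = 1.0 / (1.0 + distances[0])
        # FAISS pads missing results with -1; drop them so iloc[-1] never selects the last row
        valid = indices[0] >= 0
        return similarities[valid], indices[0][valid]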
    
    def _calculate_cep_match(self, query_cep: str, db_cep) -> float:
        # DNE values may be NaN/None or non-strings (e.g. numeric CEPs)
        if not query_cep or pd.isna(db_cep):
            return 0.0
        
        query_clean = str(query_cep).replace('-', '').replace('.', '')
        db_clean = str(db_cep).replace('-', '').replace('.', '')
        
        if query_clean == db_clean:
            return 1.0
        
        # Same 5-digit prefix: partial match
        if len(query_clean) >= 5 and len(db_clean) >= 5:
            if query_clean[:5] == db_clean[:5]:
                return 0.5
        
        return 0.0
    
    def search(
        self,
        query: Dict[str, str],
        top_k: int = 5,
        search_k: int = 100
    ) -> str:
        weights = self._get_dynamic_weights(query)
        query_embeddings = self.embedding_service.embed_address_fields(query)
        
        candidate_scores = {}
        field_scores_map = {}
        
        for field in ['logradouro', 'bairro', 'cidade']:
            if not query.get(field):
                continue
            
            query_emb = query_embeddings[field]
            similarities, indices = self._calculate_field_similarity(field, query_emb, search_k)
            weight = weights.get(field, 0.0)
            
            for idx, sim in zip(indices, similarities):
                if self.use_uf_filter and query.get('uf'):
                    db_uf = self.dataframe.iloc[idx]['uf']
                    if db_uf != query['uf']:
                        continue
                
                if idx not in candidate_scores:
                    candidate_scores[idx] = 0.0
                    field_scores_map[idx] = {}
                
                candidate_scores[idx] += weight * sim
                field_scores_map[idx][field] = float(sim)
        
        if query.get('cep'):
            cep_weight = weights.get('cep', 0.0)
            for idx in candidate_scores.keys():
                db_cep = self.dataframe.iloc[idx]['cep']
                cep_score = self._calculate_cep_match(query.get('cep'), db_cep)
                candidate_scores[idx] += cep_weight * cep_score
                field_scores_map[idx]['cep'] = cep_score
        
        sorted_candidates = sorted(
            candidate_scores.items(),
            key=lambda x: x[1],
            reverse=True
        )[:top_k]
        
        results = []
        for idx, score in sorted_candidates:
            row = self.dataframe.iloc[idx]
            
            if score >= self.confidence_threshold:
                confidence = "high"
            elif score >= 0.6:
                confidence = "medium"
            else:
                confidence = "low"
            
            result = {
                "address": {
                    "logradouro": row['logradouro'],
                    "bairro": row['bairro'],
                    "cidade": row['cidade'],
                    "uf": row['uf'],
                    "cep": row['cep']
                },
                "score": float(score),
                "confidence": confidence,
                "field_scores": field_scores_map.get(idx, {})
            }
            results.append(result)
        
        response = {
            "results": results,
            "query": query,
            "total_found": len(results),
            "weights_used": weights
        }
        
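        # default=str serializes numpy/pandas scalars (e.g. numeric CEPs) that json cannot handle
        return json.dumps(response, ensure_ascii=False, indent=2, default=str)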

## 4. Load the Real DNE
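The CEP comparison in `SearchEngine` strips `-` and `.` and then compares strings, which assumes the `cep` column holds 8-digit strings. If the parquet stored CEPs as integers, leading zeros are lost (01000-000 becomes 1000000) and exact matches silently fail. A possible cleaning step before indexing, sketched with the stdlib only; `clean_cep` is a hypothetical helper, and with pandas it could be applied via `df_dne['cep'].map(clean_cep)`:

```python
import math
import re


def clean_cep(value) -> str:
    """Return an 8-digit CEP string, or "" if the value is missing or malformed."""
    if value is None:
        return ""
    if isinstance(value, float):
        if math.isnan(value):
            return ""
        value = int(value)  # e.g. 1000000.0 read from a numeric column
    digits = re.sub(r"\D", "", str(value))  # keep digits only
    if not digits or len(digits) > 8:
        return ""
    return digits.zfill(8)  # restore leading zeros lost in numeric storage


print(clean_cep("01000-000"))         # 01000000
print(clean_cep(1000000))             # 01000000
print(repr(clean_cep(float("nan"))))  # ''
```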

In [None]:
# Load the DNE
dne_path = Path('../data/dne.parquet')
print(f"📂 Loading DNE from: {dne_path}")
df_dne_raw = pd.read_parquet(dne_path)

# Map the real DNE columns to the names used by the indexer
column_mapping = {
    'logradouro_completo': 'logradouro',
    'bairro_completo': 'bairro',
    'cidade_completo': 'cidade'
}
df_dne = df_dne_raw.rename(columns=column_mapping)

print(f"\n✅ Dataset loaded: {len(df_dne):,} records")
print("\n📊 Distribution by UF:")
print(df_dne['uf'].value_counts().head(10))
print(f"\n📋 Columns: {list(df_dne.columns)}")

## 5. Build Indices (RUN ONCE) - ~5-10min on the T4
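The build embeds every row of three text columns, yet fields such as `cidade` and `bairro` repeat heavily across the DNE. One possible optimization (not used in this notebook) is to embed each distinct string once and expand the vectors back to row order. A stdlib sketch of the bookkeeping with a hypothetical `unique_with_inverse` helper; with numpy, `np.unique(texts, return_inverse=True)` returns an equivalent (sorted) pair, and `unique_embeddings[inverse]` expands the rows:

```python
from typing import Dict, List, Tuple


def unique_with_inverse(texts: List[str]) -> Tuple[List[str], List[int]]:
    """Distinct texts in first-seen order, plus each row's position in that list."""
    position: Dict[str, int] = {}
    unique: List[str] = []
    inverse: List[int] = []
    for text in texts:
        if text not in position:
            position[text] = len(unique)
            unique.append(text)
        inverse.append(position[text])
    return unique, inverse


cities = ["Sao Paulo", "Campinas", "Sao Paulo", "Sao Paulo"]
unique, inverse = unique_with_inverse(cities)
print(unique)   # ['Sao Paulo', 'Campinas']
print(inverse)  # [0, 1, 0, 0]
# Embed only `unique`, then rebuild one vector per row:
# row_vectors = [unique_vectors[i] for i in inverse]
```

The index would still hold one vector per row, so search results keep mapping to DataFrame rows; only the embedding work shrinks.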

In [None]:
# Initialize the embedding service with FP16 (T4 optimization)
embedding_service = EmbeddingServiceGPU(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    use_fp16=True  # ~2x speedup on the T4
)

In [None]:
# Build HNSW indices
# use_gpu_index=False: index stays on the CPU (saves VRAM for embeddings)
# use_gpu_index=True: tries the GPU (faster search, uses VRAM); HNSW falls back to the CPU
index_builder = IndexBuilderGPU(embedding_service, use_gpu_index=False)

# Balanced parameters for ~1.5M records:
# M=32: ~99.5% recall
# efSearch=32: balance between speed and accuracy
indices = index_builder.build_indices(
    df_dne,
    use_hnsw=True,
    M=32,
    efSearch=32
)

In [None]:
# SAVE the indices so they don't have to be rebuilt
index_builder.save_indices('../data/indices_gpu_t4')

## 6. Load Indices (FAST - always use this) - ~5s
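`load_indices` expects the files written by `save_indices`: `metadata.pkl`, `addresses.parquet`, and one `<field>_index.faiss` per field. A quick stdlib pre-flight check gives a clearer message than a mid-load `FileNotFoundError`, for example when the directory was only partially copied. `missing_artifacts` is a hypothetical helper, and its default fields assume the notebook's three:

```python
from pathlib import Path
from typing import Iterable, List


def missing_artifacts(index_dir: str,
                      fields: Iterable[str] = ("logradouro", "bairro", "cidade")) -> List[str]:
    """Names of expected index files that are absent from index_dir."""
    root = Path(index_dir)
    expected = ["metadata.pkl", "addresses.parquet"]
    expected += [f"{field}_index.faiss" for field in fields]
    return [name for name in expected if not (root / name).is_file()]


missing = missing_artifacts("../data/indices_gpu_t4")
if missing:
    print(f"⚠️  Missing files: {missing} (rebuild them with section 5)")
```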

In [None]:
# Load the model
embedding_service = EmbeddingServiceGPU(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    use_fp16=True
)

# Load the saved indices
index_builder = IndexBuilderGPU(embedding_service, use_gpu_index=False)
indices, df_dne = index_builder.load_indices('../data/indices_gpu_t4')

# Initialize the search engine
search_engine = SearchEngine(embedding_service, indices, df_dne)
print(f"🚀 System ready! ({len(df_dne):,} addresses indexed)")

## 7. Search Test
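`search` returns a JSON string rather than a dict, so callers need `json.loads` before using the results. A small sketch that extracts the best match only when its confidence label is high enough; the response below is a hand-written, hypothetical example shaped like the `response` dict built in `search`, not real output, and `best_match` is a hypothetical helper:

```python
import json
from typing import Optional


def best_match(response_json: str, min_confidence: str = "medium") -> Optional[dict]:
    """Top result if its confidence label is at least `min_confidence`, else None."""
    rank = {"low": 0, "medium": 1, "high": 2}
    results = json.loads(response_json)["results"]
    if not results:
        return None
    top = results[0]  # search() already sorts by score, descending
    return top if rank[top["confidence"]] >= rank[min_confidence] else None


example = json.dumps({
    "results": [{
        "address": {"logradouro": "Rua das Flores", "bairro": "Centro",
                    "cidade": "Sao Paulo", "uf": "SP", "cep": "01000-000"},
        "score": 0.91, "confidence": "high", "field_scores": {},
    }],
    "total_found": 1,
})
match = best_match(example)
print(match["address"]["logradouro"] if match else "no confident match")  # Rua das Flores
```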

In [None]:
# Example search
query = {
    'logradouro': 'Rua das Flores',
    'bairro': 'Centro',
    'cidade': 'São Paulo',
    'uf': 'SP',
    'cep': '01000-000'
}

print("⏱️  Testing search performance...\n")

# Measure time (the first call may include GPU warm-up)
start = time.perf_counter()
result = search_engine.search(query, top_k=5)
elapsed = time.perf_counter() - start

print(f"⚡ Search finished in: {elapsed*1000:.1f}ms")
print("\nResults:")
print(result)

## 8. Performance Benchmark
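Two details affect benchmark numbers: the first calls pay one-off costs (CUDA context and kernel warm-up), and the clock choice matters, since `time.perf_counter()` is the monotonic high-resolution clock meant for timing while `time.time()` follows the system clock and can jump. A generic stdlib sketch that discards warm-up runs and reports percentiles; `benchmark` is a hypothetical helper, and `fn` stands for any zero-argument callable, such as `lambda: search_engine.search(query, top_k=5)`:

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], n: int = 50, warmup: int = 3) -> Dict[str, float]:
    """Latency stats in milliseconds, excluding `warmup` initial calls."""
    for _ in range(warmup):
        fn()
    times_ms = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000)
    # 99 cut points (P1..P99); "inclusive" interpolates within the observed range
    cuts = statistics.quantiles(times_ms, n=100, method="inclusive")
    mean = statistics.fmean(times_ms)
    return {
        "mean": mean,
        "p50": statistics.median(times_ms),
        "p95": cuts[94],
        "p99": cuts[98],
        "sequential_qps": 1000 / mean,
    }


stats = benchmark(lambda: sum(range(10_000)), n=50)
print({name: round(value, 3) for name, value in stats.items()})
```

With only 50 samples, P99 sits very close to the maximum, so it is a rough estimate.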

In [None]:
# Run multiple searches
n_searches = 50
times = []

print(f"🔥 Running {n_searches} searches for the benchmark...\n")

for i in tqdm(range(n_searches), desc="Searches"):
    sample = df_dne.sample(1).iloc[0]
    query = {
        'logradouro': sample['logradouro'],
        'bairro': sample['bairro'],
        'cidade': sample['cidade'],
        'uf': sample['uf']
    }
    
    start = time.perf_counter()
    result = search_engine.search(query, top_k=5)
    elapsed = time.perf_counter() - start
    times.append(elapsed)

times_ms = [t * 1000 for t in times]

print(f"\n{'='*60}")
print("📊 PERFORMANCE STATISTICS")
print(f"{'='*60}")
print(f"Mean:     {np.mean(times_ms):.1f}ms")
print(f"Median:   {np.median(times_ms):.1f}ms")
print(f"Min:      {np.min(times_ms):.1f}ms")
print(f"Max:      {np.max(times_ms):.1f}ms")
print(f"P95:      {np.percentile(times_ms, 95):.1f}ms")
print(f"P99:      {np.percentile(times_ms, 99):.1f}ms")
# Single client, one query at a time: throughput = 1000 / mean latency (ms)
print(f"\n⚡ Sequential throughput: ~{1000/np.mean(times_ms):.0f} queries/second")
print(f"{'='*60}")

## 📝 Optimization Notes for g4dn.2xlarge

### Expected performance on the T4:
| Operation | Time |
|----------|-------|
| Initial build (1.5M records) | 5-10min |
| Index loading | <10s |
| Search (p50) | 30-50ms |
| Search (p95) | 60-80ms |
| Throughput (batched or concurrent queries) | ~500-1000 q/s |

### Fine-tuning:

**1. Embedding model:**
```python
# FAST (current)
"paraphrase-multilingual-MiniLM-L12-v2"  # 384 dims, batch 256

# VERY FAST
"all-MiniLM-L6-v2"  # 384 dims, batch 256, English-only but acceptable

# ACCURATE (slower)
"paraphrase-multilingual-mpnet-base-v2"  # 768 dims, batch 128
```

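**2. HNSW parameters:**
```python
# Pick one of these configurations
# Faster (~98% recall)
index_builder.build_indices(df_dne, M=16, efSearch=16)

# Balanced (~99.5% recall) ✅
index_builder.build_indices(df_dne, M=32, efSearch=32)

# More accurate (~99.9% recall)
index_builder.build_indices(df_dne, M=64, efSearch=64)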
```
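`M` is fixed when the index is built, but `efSearch` is a search-time setting, so it can be tuned on a loaded index (`index.hnsw.efSearch = value`) without rebuilding. A minimal selection sketch, assuming you have measured recall@k for a few `efSearch` values (for example against a Flat index on a sample of queries) and that recall does not decrease as `efSearch` grows; `pick_ef_search` and the recall numbers are hypothetical, for illustration only:

```python
from typing import Dict, Optional


def pick_ef_search(recall_by_ef: Dict[int, float], target: float) -> Optional[int]:
    """Smallest efSearch whose measured recall meets the target, or None."""
    for ef in sorted(recall_by_ef):
        if recall_by_ef[ef] >= target:
            return ef
    return None


# Hypothetical measurements, not results from this notebook
measured = {16: 0.975, 32: 0.993, 64: 0.998, 128: 0.999}
print(pick_ef_search(measured, 0.99))    # 32
print(pick_ef_search(measured, 0.9995))  # None
```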

**3. FAISS GPU (optional):**
```bash
# Install faiss-gpu
pip uninstall faiss-cpu
pip install faiss-gpu
```
```python
# Put indices on the GPU (5-10x faster search for Flat indices; HNSW stays on the CPU)
index_builder = IndexBuilderGPU(embedding_service, use_gpu_index=True)
```
⚠️ **Warning**: GPU indices consume VRAM (they may compete with the embedding model)
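FAISS's GPU module does not implement HNSW, so the GPU option matters mainly for Flat (or IVF) indices, whose raw vectors must fit in VRAM next to the embedding model. A rough sizing sketch, assuming float32 storage with no compression (`flat_index_gib` is a hypothetical helper):

```python
def flat_index_gib(n_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """VRAM for the raw vectors of a flat index (float32 by default), in GiB."""
    return n_vectors * dim * bytes_per_value / 1024 ** 3


per_field = flat_index_gib(1_500_000, 384)
print(f"{per_field:.2f} GiB per field, {3 * per_field:.2f} GiB for 3 fields")
# 2.15 GiB per field, 6.44 GiB for 3 fields
```

Added to the 8GB `setTempMemory` reservation in `IndexBuilderGPU`, that could leave little of the T4's 16GB for the FP16 embedding model.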

### AWS costs:
- **g4dn.2xlarge On-Demand**: ~$0.75/hour
- **g4dn.2xlarge Spot**: ~$0.25/hour (~67% discount; spot prices vary)
- **One-time build**: ~$0.02-0.04 (5-10min on spot)
- **Alternative**: g4dn.xlarge (~$0.53/h) if you don't need 32GB of RAM
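The build-cost line is simple rate × time arithmetic; a tiny helper makes it easy to redo for other prices or build times (the rate below is the approximate spot price listed above, and real spot prices vary by region and over time):

```python
def run_cost_usd(hourly_rate: float, minutes: float) -> float:
    """Cost in USD of running an instance for `minutes` at `hourly_rate` USD/hour."""
    return hourly_rate * minutes / 60


SPOT_RATE = 0.25  # approximate g4dn.2xlarge spot price from the list above
low, high = run_cost_usd(SPOT_RATE, 5), run_cost_usd(SPOT_RATE, 10)
print(f"spot build: ${low:.3f} to ${high:.3f}")  # spot build: $0.021 to $0.042
```

At the ~$0.75/h on-demand rate, the same 5-10 min build costs roughly $0.06-0.13.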

### Production:
1. **Build the indices once** (5-10min)
2. **Store them in S3** or on a persistent volume
3. **Load them at startup** (<10s)
4. **API with FastAPI** + autoscaling
5. **Monitoring**: CloudWatch + latency logs
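Steps 3 and 4 boil down to building the search engine once per process and reusing it for every request instead of reloading indices per call. A framework-agnostic sketch with `functools.lru_cache`; `load_engine` is a hypothetical factory that, in this notebook, would wrap the section 6 code (`EmbeddingServiceGPU`, `IndexBuilderGPU.load_indices`, `SearchEngine`), and here it returns a placeholder so the pattern runs standalone. In FastAPI the same function could be called from the app's startup (lifespan) handler:

```python
from functools import lru_cache

LOAD_CALLS = []  # only here to show how often the loader runs


@lru_cache(maxsize=1)
def load_engine(index_dir: str = "../data/indices_gpu_t4"):
    """Hypothetical factory: the expensive setup runs once per process."""
    LOAD_CALLS.append(index_dir)
    # Real version (section 6): create EmbeddingServiceGPU, call
    # IndexBuilderGPU(...).load_indices(index_dir), return a SearchEngine
    return {"index_dir": index_dir}  # placeholder object for this sketch


def handle_request(query: dict):
    engine = load_engine()  # cached after the first call
    return engine, query


handle_request({"logradouro": "Rua das Flores"})
handle_request({"logradouro": "Av. Paulista"})
print(len(LOAD_CALLS))  # 1
```

With several worker processes (e.g. multiple Uvicorn workers), each process loads its own copy, so RAM use grows with the number of workers.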