## Indexing
Indexing is the process of organizing and structuring data to enable efficient retrieval in search and retrieval-augmented generation (RAG) systems. The choice of indexing technique affects query speed, relevance, and scalability.

Why different indexing techniques are needed:
- Data types vary (structured, unstructured, vector-based).
- Search requirements differ (full-text search, semantic retrieval, hybrid search).
- Scalability & performance are critical for large datasets.
- Storage efficiency matters when handling vast knowledge bases.

### Indexing Techniques

#### Inverted Index - Full-text search (BM25, Elasticsearch)
An inverted index maps each word (token) to the documents that contain it.

How it Works:
- Tokenize documents into words.
- Store each unique word as a key in a dictionary.
- Associate each word with a list of document IDs where it appears.

Optimizations:
- BM25 ranking improves relevance.
- Stop-word removal reduces noise.
- Stemming/Lemmatization merges variations of the same word.
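
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Elasticsearch stores its index: the sample documents and the helper names (`tokenize`, `build_inverted_index`, `search_all_terms`) are made up for the example, and the tokenizer is a plain lowercase regex split with no stemming.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}

def search_all_terms(index, query):
    """Return IDs of documents that contain every query token (boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)

# Hypothetical sample corpus.
docs = {
    1: "Indexing speeds up search",
    2: "Vector search uses embeddings",
    3: "Indexing embeddings for retrieval",
}
index = build_inverted_index(docs)
hits = search_all_terms(index, "indexing embeddings")
```

Here `index["search"]` holds the posting list for the token `search`, and `hits` contains only the documents that include both query terms.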

#### B-Tree Index - Range queries (Database queries, time-based search)
A B-Tree index is a self-balancing tree that helps databases quickly locate values.

How it Works:
- Each node contains multiple keys and pointers to child nodes.
- Lookups traverse the tree from root to leaf in O(log N) time.
- Efficient for sorted and range-based queries.
  
Optimizations:
- Use multi-column indexing for composite queries.
- Ensure indexing on frequently searched columns.
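
As a rough illustration, SQLite (whose indexes are B-trees) can show both a single-column index for range queries and a multi-column index for a composite filter. The table, column, and index names below are invented for the example, and the data is synthetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO events (ts, status) VALUES (?, ?)",
    [(t, "ok" if t % 2 == 0 else "error") for t in range(1, 101)],
)

# Single-column index: supports sorted access and range scans on ts.
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
# Multi-column index: supports an equality filter on status plus a range on ts.
conn.execute("CREATE INDEX idx_events_status_ts ON events (status, ts)")

range_ts = [
    row[0]
    for row in conn.execute(
        "SELECT ts FROM events WHERE ts BETWEEN ? AND ? ORDER BY ts", (10, 15)
    )
]
error_ts = [
    row[0]
    for row in conn.execute(
        "SELECT ts FROM events WHERE status = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        ("error", 10, 15),
    )
]
```

Whether the planner actually picks a given index depends on the SQLite version and table statistics; `EXPLAIN QUERY PLAN` can be used to check.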

#### Bitmap Index - Categorical data (Low-cardinality fields)
Bitmap indexes store binary representations for categorical fields.

How it Works:
- Each unique value gets a bitmap (binary vector).
- Querying is fast via bitwise operations (AND, OR, XOR).

Optimizations:
- Use for low-cardinality columns (e.g., gender, status).
- Avoid for high-cardinality data (inefficient storage).
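
A small sketch of the idea, using Python integers as bit vectors (bit `i` is set when row `i` has the value). The column data and helper names are hypothetical; real bitmap indexes typically add compression on top.

```python
from collections import defaultdict

def build_bitmaps(values):
    """Return {value: bitmap}, where bit i is set if row i holds that value."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(values):
        bitmaps[value] |= 1 << row_id
    return dict(bitmaps)

def matching_rows(bitmap):
    """Decode a bitmap back into the list of row IDs whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two low-cardinality columns over the same five rows.
status = ["active", "inactive", "active", "active", "inactive"]
plan = ["free", "free", "pro", "free", "pro"]

status_bm = build_bitmaps(status)
plan_bm = build_bitmaps(plan)

# WHERE status = 'active' AND plan = 'free'
active_and_free = matching_rows(status_bm["active"] & plan_bm["free"])
# WHERE status = 'inactive' OR plan = 'pro'
inactive_or_pro = matching_rows(status_bm["inactive"] | plan_bm["pro"])
```

Each distinct value costs one bitmap of length N rows, which is why the approach stops paying off as cardinality grows.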

#### Hash Index - Exact match queries (NoSQL, key-value store)
A hash index uses a hash table for fast lookups.

How it Works:
- A hash function maps each key to a bucket in a hash table.
- Lookup is O(1) on average (constant time); many collisions can degrade it.

Optimizations:
- Avoid for range queries (not ordered).
- Use consistent hashing in distributed systems.
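
To make the mechanics concrete, here is a toy hash index with separate chaining. The class and its methods are illustrative only; production key-value stores use far more elaborate layouts (resizing, on-disk pages, and so on).

```python
class HashIndex:
    """Toy hash index: key -> row ID, with collisions chained inside buckets."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function decides which bucket a key lives in.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, row_id):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, row_id)  # overwrite an existing entry
                return
        bucket.append((key, row_id))

    def get(self, key):
        # Only one bucket is scanned, so lookups are O(1) on average.
        for existing_key, row_id in self._bucket(key):
            if existing_key == key:
                return row_id
        return None

index = HashIndex()
index.put("user:42", 7)
index.put("user:99", 3)
index.put("user:42", 8)  # update
```

Because keys are scattered by their hash, neighbouring keys do not sit next to each other, which is why a range query would have to scan every bucket.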


#### Vector Index (FAISS, HNSW) - Semantic search (RAG, vector search, Image retrieval)
A vector index stores high-dimensional embeddings for similarity search.

How it Works:
- Convert text into vector embeddings (e.g., OpenAI embeddings, SBERT).
- Store embeddings in a vector index or database (e.g., FAISS, Pinecone).
- Use approximate nearest neighbor (ANN) search for retrieval.

Optimizations:
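- Use HNSW (Hierarchical Navigable Small World), a graph-based index, for fast approximate retrieval.
- Use IVF (e.g., IVFFlat) to restrict each search to a few clusters, and PQ (Product Quantization) to compress vectors for storage efficiency.
- Combine with BM25 (hybrid search) when queries mix exact keywords with semantic intent.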
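
For intuition, the sketch below does exact (brute-force) cosine-similarity search in pure Python. This is the baseline that ANN indexes such as HNSW or IVF approximate at scale, not an ANN implementation itself. The three-dimensional vectors are made-up toy values standing in for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Toy "embeddings"; a real system would get these from an embedding model.
vectors = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
results = top_k(query, vectors, k=2)
```

Brute force costs one similarity computation per stored vector for every query, which is the cost an ANN index is meant to avoid.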


#### Hybrid Index - Optimized search in LLM-based applications
A hybrid index combines keyword-based retrieval (BM25) with semantic search (FAISS).

How it Works:
- Use BM25 for keyword search.
- Use FAISS for vector search.
- Combine results using a ranking strategy.

Optimizations:
- Hybrid re-ranking models (e.g., Cohere Rerank, BERT) improve results.
- Adjust the weighting of keyword vs. semantic scores.
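
One possible ranking strategy for the combine step is reciprocal rank fusion (RRF), which merges ranked lists using only rank positions. It is shown here as an example choice, not as the method this document prescribes. The document IDs are placeholders, and `k=60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum of 1 / (k + rank) over lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

keyword_ranking = ["d1", "d2", "d3"]  # e.g., from BM25
vector_ranking = ["d3", "d1", "d4"]   # e.g., from a vector index
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

Documents that rank well in both lists rise to the top, and no score normalization is needed because raw BM25 and similarity scores are never compared directly.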

### Performance optimization strategies

#### Preprocessing for Efficient Indexing
- Tokenization, stop-word removal → for better full-text search.
- Dimensionality reduction (e.g., PCA) or quantization of embeddings → for vector search.
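
A minimal sketch of the text side of preprocessing, assuming a tiny hand-picked stop-word list (real systems use larger, language-specific lists and usually add stemming or lemmatization):

```python
import re

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]

tokens = preprocess("The index is a map of tokens to documents.")
```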

#### Storage Optimization
- Shard large indexes across multiple nodes for scalability.
- Compress vector embeddings using Product Quantization (PQ).
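
Sharding can be sketched as routing each document to a node by a stable hash of its ID. The function names and the shard count are assumptions for illustration; a real deployment would also need replication and a query fan-out layer.

```python
import hashlib

def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document ID."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def build_shards(doc_ids, num_shards):
    """Distribute document IDs across num_shards lists."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards

doc_ids = [f"doc-{i}" for i in range(10)]
shards = build_shards(doc_ids, num_shards=3)
```

Simple modulo routing like this remaps most documents when the shard count changes, which is the problem consistent hashing is meant to reduce.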

#### Faster Query Execution
- Parallelized ANN search (e.g., FAISS GPU acceleration).
- Using partitioning techniques (e.g., FAISS IVF).
- Caching frequently accessed embeddings.
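
Caching can be as simple as memoizing the embedding call. In the sketch below, `embed` is a hypothetical stand-in for an expensive model call (it just derives numbers from characters), and the cache size is an arbitrary choice.

```python
from functools import lru_cache

calls = {"count": 0}

def embed(text):
    """Hypothetical stand-in for an expensive embedding model call."""
    calls["count"] += 1
    return tuple(float(ord(ch) % 7) for ch in text[:4])

@lru_cache(maxsize=1024)
def cached_embed(text):
    """Return the embedding for text, computing it at most once while cached."""
    return embed(text)

first = cached_embed("what is indexing")
second = cached_embed("what is indexing")  # served from the cache
```

Returning an immutable tuple keeps cached values safe from accidental mutation by callers.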

#### Adaptive Index Selection
- Use multi-index strategies (e.g., BM25 + FAISS).
- Dynamically switch between keyword & semantic search based on query type.
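
Dynamic switching could be driven by simple query heuristics, as in the sketch below. The rules (quoted phrases, tokens containing digits, very short queries) are illustrative assumptions, not an established standard; a production router might use a trained classifier instead.

```python
import re

def choose_retriever(query):
    """Route a query to 'keyword' or 'semantic' search using rough heuristics."""
    words = query.split()
    has_quoted_phrase = '"' in query
    # IDs, error codes, and version numbers usually need exact matching.
    has_code_like_token = any(re.search(r"\d", word) for word in words)
    if has_quoted_phrase or has_code_like_token or len(words) <= 2:
        return "keyword"
    return "semantic"

routes = [
    choose_retriever("error E1234"),
    choose_retriever("how do I speed up slow similarity search"),
    choose_retriever('"connection reset" meaning in logs'),
]
```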


## Retrieval
Retrieval is the process of finding relevant information from a knowledge base in a Retrieval-Augmented Generation (RAG) system. The quality of retrieval directly impacts the accuracy and relevance of LLM responses.

### Retrieval Strategies

#### Lexical Retrieval - Keyword based (FAQs, Documents)
Matches exact words in queries using an inverted index.

How it Works:
- Uses an inverted index to find documents containing the query terms.
- Ranks the matching documents with a scoring function such as TF-IDF or BM25.

Techniques:
- TF-IDF (Term Frequency-Inverse Document Frequency): Weighs terms based on their importance.
- BM25: Improved TF-IDF ranking with term saturation.
- Keyword Expansion: Adds synonyms to enhance matching.
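
To show what term saturation means in practice, here is a compact BM25 scorer. It uses one common IDF variant, `log(1 + (N - n + 0.5) / (n + 0.5))`, and typical default parameters `k1=1.5`, `b=0.75`; the tokenized documents are invented for the example.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query tokens."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(tokens) for tokens in docs_tokens) / n_docs
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count documents, not occurrences

    scores = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length-normalized denominator; repeated terms add less and less.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs_tokens = [
    ["vector", "search", "with", "embeddings"],
    ["keyword", "search", "with", "bm25"],
    ["cooking", "pasta", "at", "home"],
]
scores = bm25_scores(["keyword", "search"], docs_tokens)
```

The second document matches both query terms and scores highest, while the third matches none and scores zero. Since `tf * (k1 + 1) / (tf + ...)` approaches `k1 + 1` as `tf` grows, repeating a term cannot inflate the score without bound.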

> Optimizations: Stop-word removal, stemming, query expansion

✅ Pros:  
✔️ Fast, efficient for structured text.  
✔️ Easy to implement with tools like Elasticsearch, Whoosh.  

❌ Cons:  
❌ Cannot handle synonyms or paraphrasing unless the query is expanded.  
❌ Does not understand context or meaning.  

#### Semantic search - Embedding based (Chatbots, Q&A systems)
Uses vector search to find conceptually similar results based on embeddings.

How it Works:
- Converts queries and documents into vector embeddings using models like SBERT, OpenAI embeddings.
- Finds the nearest neighbors in a vector index or database (FAISS, Pinecone).

Techniques:
- Approximate Nearest Neighbor (ANN) Search: Finds the top-K most similar results, trading a small amount of recall for speed.
- Dense Vector Retrieval: Uses deep learning models to generate embeddings.
- HNSW (Hierarchical Navigable Small World): Fast graph-based retrieval.

> Optimizations: Reduce vector dimensions (PCA), use HNSW index

✅ Pros:  
✔️ Captures semantic meaning, synonyms, and context.  
✔️ Works well for longer, unstructured queries.  

❌ Cons:  
❌ More compute-intensive than lexical search.  
❌ May retrieve irrelevant results without fine-tuning.  

#### Hybrid Retrieval - Lexical + Semantic (High-accuracy search with LLM-based applications)
Combines keyword search with semantic similarity ranking.

How it Works:
- BM25 retrieves relevant documents based on keywords.
- Vector search finds semantically similar documents.
- Re-ranking models (e.g., Cohere Rerank, BERT) improve result ranking.

Techniques:
- Weighted Fusion: Assigns scores to both lexical and semantic results.
- Re-ranking with Transformer Models: Uses cross-encoders or LLMs to reorder retrieved documents.
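
Weighted fusion might look like the sketch below: each score set is min-max normalized to [0, 1] so BM25 and similarity scores become comparable, then blended with a weight `alpha`. The scores, document IDs, and the choice of min-max normalization are assumptions for illustration.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in scores.items()}

def weighted_fusion(lexical, semantic, alpha=0.5):
    """Blend scores: alpha * lexical + (1 - alpha) * semantic, best first."""
    lex = min_max_normalize(lexical)
    sem = min_max_normalize(semantic)
    combined = {
        doc_id: alpha * lex.get(doc_id, 0.0) + (1 - alpha) * sem.get(doc_id, 0.0)
        for doc_id in set(lex) | set(sem)
    }
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))

lexical = {"d1": 12.0, "d2": 7.0, "d3": 2.0}     # e.g., BM25 scores
semantic = {"d2": 0.91, "d3": 0.88, "d4": 0.50}  # e.g., cosine similarities
ranked = weighted_fusion(lexical, semantic, alpha=0.4)
```

A document missing from one result list simply contributes zero for that side, and `alpha` is the knob that needs tuning against a labeled query set.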

> Optimizations: Use a weighted scoring model for merging results

✅ Pros:  
✔️ Best of both worlds (keyword + meaning).  
✔️ More accurate than pure keyword-based retrieval.  

❌ Cons:  
❌ More complex to implement and optimize.  
❌ Requires fine-tuning retrieval weights.  

> Performance scaling: Sharding + Distributed retrieval (Elasticsearch, Pinecone)