# 079: RAG Fundamentals

# Introduction

In this notebook, we explore the fundamentals of Retrieval-Augmented Generation (RAG), an architecture that retrieves relevant documents and passes them as context to a generative model. RAG is particularly useful for tasks that require precise information retrieval followed by a generated response, such as technical documentation search, failure analysis, and test parameter recommendations.

## Workflow Overview
The RAG architecture involves several key components:
- **Document Chunking**: Breaking down documents into manageable pieces.
- **Embedding Generation**: Creating vector representations of chunks.
- **Semantic Search**: Using vector embeddings to find relevant information.
- **Generative Response**: Using a language model to generate responses based on retrieved information.

```mermaid
graph TD;
    A[Input Query] -->|Embedding| B(Vector Database);
    B -->|Retrieve| C[Relevant Documents];
    C -->|Context| D[Generative Model];
    D --> E[Output]
```
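
As an illustration of the Document Chunking step, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions made for this example; real systems often split on sentence or section boundaries instead of raw character counts.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "Vdd must stay within the specified range during all functional tests."
for i, chunk in enumerate(chunk_text(sample, chunk_size=30, overlap=10)):
    print(i, repr(chunk))
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.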


# Mathematical Foundation

The core of RAG involves vector embeddings and semantic similarity. Here, we define the mathematical foundation for these concepts.

## Vector Embeddings
A vector embedding \( \mathbf{v} \in \mathbb{R}^d \) represents a document or query as a point in a \( d \)-dimensional space. A useful embedding places texts with similar meaning close together in that space.

## Semantic Similarity
The similarity between two vectors \( \mathbf{v}_1 \) and \( \mathbf{v}_2 \) can be computed using cosine similarity:

\[
\text{similarity}(\mathbf{v}_1, \mathbf{v}_2) = \frac{\mathbf{v}_1 \cdot \mathbf{v}_2}{\|\mathbf{v}_1\| \|\mathbf{v}_2\|}
\]

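Cosine similarity is the cosine of the angle between the two vectors. It ranges from \(-1\) to \(1\): a value of \(1\) means the vectors point in the same direction, \(0\) means they are orthogonal, and \(-1\) means they point in opposite directions. Because it depends only on direction, it is insensitive to vector magnitude.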
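
The formula can be sketched directly in NumPy. The function name `cosine_similarity` is chosen for this example, and returning `0.0` for a zero-length vector is a convention assumed here, since the formula is undefined in that case.

```python
import numpy as np


def cosine_similarity(v1, v2):
    """Cosine of the angle between v1 and v2."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denom == 0.0:
        # Assumed convention: treat a zero vector as dissimilar to everything
        return 0.0
    return float(np.dot(v1, v2) / denom)


print(cosine_similarity([1, 0], [2, 0]))   # same direction, different length
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal
print(cosine_similarity([1, 0], [-1, 0]))  # opposite direction
```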

In [None]:
# From Scratch Implementation: NumPy Embeddings
import numpy as np

# Example documents
documents = [
    "Technical documentation on Vdd specifications.",
    "Failure analysis report for recent tests.",
    "Design verification processes and protocols."
]

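# Toy embedding: a 26-dimensional vector of letter counts (a-z).
# It captures spelling, not meaning, and only illustrates the mechanics.
def simple_embedding(doc):
    vector = np.zeros(26)
    for char in doc.lower():
        # Restrict to ASCII letters; str.isalpha() also accepts characters
        # such as 'é', which would index outside the 26 slots.
        if "a" <= char <= "z":
            vector[ord(char) - ord("a")] += 1
    return vector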

# Create embeddings for all documents
document_embeddings = np.array([simple_embedding(doc) for doc in documents])
print("Document Embeddings:", document_embeddings)

# Production Implementation: Semantic Search with FAISS

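For production systems, libraries such as FAISS provide efficient similarity search over dense vectors, including approximate indexes that scale to large collections.

The cell below indexes the document embeddings with `IndexFlatL2`, which performs an exact (brute-force) search and ranks results by squared Euclidean distance rather than by the cosine similarity defined earlier. The two rankings agree only when the vectors are normalized to unit length.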

In [None]:
# FAISS Implementation
import faiss

# Convert document embeddings to float32
document_embeddings = document_embeddings.astype('float32')

# Create FAISS index
index = faiss.IndexFlatL2(document_embeddings.shape[1])
index.add(document_embeddings)

# Query embedding
query = "What are the Vdd specifications?"
query_embedding = simple_embedding(query).astype('float32').reshape(1, -1)

# Search: D holds squared L2 distances, I holds the indices of the nearest documents
k = 1
D, I = index.search(query_embedding, k)
print("Closest document index:", I[0][0])
print("Closest document:", documents[I[0][0]])
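
The FAISS index above ranks by L2 distance, whereas the Mathematical Foundation section defines cosine similarity. As a hedged, NumPy-only sketch of cosine-based top-k retrieval, the hypothetical helper `top_k_cosine` below normalizes every vector to unit length and ranks by inner product. The toy vectors are made up for illustration.

```python
import numpy as np


def top_k_cosine(doc_vectors, query_vector, k=1):
    """Return indices and cosine scores of the k documents most similar to the query."""
    docs = np.asarray(doc_vectors, dtype=np.float32)
    query = np.asarray(query_vector, dtype=np.float32)

    # Normalize to unit length; the small floor avoids division by zero
    docs_unit = docs / np.maximum(np.linalg.norm(docs, axis=1, keepdims=True), 1e-12)
    query_unit = query / max(np.linalg.norm(query), 1e-12)

    # For unit vectors, the inner product equals the cosine similarity
    scores = docs_unit @ query_unit
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]


toy_docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_query = np.array([2.0, 0.1])
indices, scores = top_k_cosine(toy_docs, toy_query, k=2)
print("Top documents:", indices)
print("Cosine scores:", scores)
```

For unit-length vectors, \( \|\mathbf{a} - \mathbf{b}\|^2 = 2 - 2\,\mathbf{a} \cdot \mathbf{b} \), so ranking by smallest L2 distance and ranking by largest cosine similarity give the same order. In FAISS, a common way to get the same effect is to normalize the vectors with `faiss.normalize_L2` and use an inner-product index such as `IndexFlatIP`.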

# Post-Silicon Validation Examples

## Technical Documentation Search
The RAG system can be used to search for technical specifications from datasheets, enabling engineers to quickly find relevant information based on test parameters.

## Failure Analysis Report Retrieval
The system can retrieve failure analysis reports relevant to an observed issue, helping engineers diagnose problems during semiconductor testing.

## Test Parameter Recommendation Engine
Based on historical test data and outcomes, recommend optimal test parameters for new silicon wafers.

## Design Verification Query Assistant
Assist engineers in verifying design specifications against test results by retrieving pertinent documents and reports.

# General AI/ML Examples

Beyond post-silicon applications, RAG architectures can be utilized in various domains:
- **Customer Support**: Retrieve relevant FAQs and generate responses.
- **Legal Document Analysis**: Extract case law or statutes relevant to a legal query.
- **Academic Research**: Find related research papers and generate summaries.
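
All of these use cases share the same final step: the retrieved passages are placed into the prompt given to the generative model. A minimal sketch of that step follows. The function `build_prompt`, its `max_chunks` parameter, and the prompt wording are assumptions for illustration, and the call to an actual language model is deliberately left out.

```python
def build_prompt(query, retrieved_chunks, max_chunks=3):
    """Assemble a prompt that grounds the model's answer in retrieved text."""
    selected = retrieved_chunks[:max_chunks]
    # Number each chunk so the answer can refer back to its source
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


faq_chunks = [
    "Password resets are available from the account settings page.",
    "Support hours are 9:00 to 17:00 on weekdays.",
]
print(build_prompt("How do I reset my password?", faq_chunks))
```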

# Evaluation & Diagnostics

To evaluate the performance of a RAG system, we consider metrics such as:
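- **Precision and Recall**: Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved.
- **Response Coherence**: Evaluate whether generated responses are fluent, consistent, and supported by the retrieved context.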
- **Latency**: Assess the time taken to retrieve and generate responses.

Visualization tools such as confusion matrices and response timing charts can provide insights into system performance.
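
Latency can be measured with the standard library alone. The sketch below times repeated calls to any function; the helper name `measure_latency` and the stand-in workload are assumptions for this example, and in practice the retrieval and generation steps would be timed separately.

```python
import statistics
import time


def measure_latency(fn, *args, repeats=5, **kwargs):
    """Call fn repeatedly and summarize wall-clock time per call in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


# Stand-in workload; replace with a retrieval or generation call
stats = measure_latency(sorted, list(range(10000, 0, -1)), repeats=5)
print(stats)
```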

In [None]:
# Example Evaluation: Precision and Recall
from sklearn.metrics import precision_score, recall_score

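# Binary relevance labels per document (1 = relevant, 0 = not relevant).
# These illustrative predictions match the ground truth exactly, so both scores are 1.0.
ground_truth = [1, 0, 1]
predictions = [1, 0, 1]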

# Calculate precision and recall
precision = precision_score(ground_truth, predictions)
recall = recall_score(ground_truth, predictions)

print("Precision:", precision)
print("Recall:", recall)
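
For ranked retrieval, precision and recall are often computed over the top k results only. The following is a minimal sketch under that assumption; the function name `precision_recall_at_k` and the document IDs are made up for illustration.

```python
def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Precision@k and recall@k for one query, given a ranked list of retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    # Assumed convention: recall is 0.0 when there are no relevant documents
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = [3, 1, 7, 2]  # ranked retriever output (illustrative)
relevant = [1, 2, 5]      # documents judged relevant (illustrative)
for k in (1, 3, 4):
    p, r = precision_recall_at_k(retrieved, relevant, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```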

# Real-World Projects

Here are several project ideas that leverage RAG architecture:

1. **Automated Datasheet Analyzer**: Develop a system to parse and retrieve specific parameters from extensive datasheets.

2. **Failure Analysis Assistant**: Create a tool that uses RAG to assist engineers in finding relevant failure reports and solutions.

3. **Test Optimization Tool**: Implement a RAG-based engine that recommends test parameters to optimize yield.

4. **Design Verification Helper**: Build a query assistant that verifies design parameters against test results.

5. **Legal Document Retrieval System**: Use RAG to enhance legal research by retrieving and summarizing relevant case laws.

6. **Academic Paper Finder**: Develop a system that locates and summarizes academic research papers based on user queries.

# Best Practices & Takeaways

- **Efficient Chunking**: Ensure documents are chunked in a way that preserves context without overwhelming the embedding model.
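- **Embedding Quality**: Retrieval accuracy is bounded by the embedding model. The letter-count vectors used above only illustrate the mechanics; a trained text-embedding model is needed to capture meaning.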
- **Performance Monitoring**: Continuously evaluate system performance using appropriate metrics.
- **Scalability**: Design the RAG system to handle increasing volumes of data and queries efficiently.

By following these best practices, you can build robust RAG systems that effectively combine retrieval and generation capabilities.