# Reverse Engineering Assistant
A reverse engineering assistant built with Retrieval-Augmented Generation (RAG) and a large language model (LLM).

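Retrieval-Augmented Generation (RAG) is a natural language processing (NLP) technique that combines a retrieval component with a generative model: relevant passages are first retrieved from an external corpus, and the model then generates its output conditioned on them. Grounding generation in retrieved text tends to make outputs more accurate and specific than relying only on the knowledge stored in the model's parameters. The approach was introduced in the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" by Lewis et al. at Facebook AI Research (FAIR).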

For further reading and a deeper understanding of RAG, refer to the original paper by Facebook AI Research: [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401). 

## Overview of the RAG Architecture

The RAG model consists of two main components:
1. **Retriever**: This component retrieves relevant documents from a large corpus based on the input query.
2. **Generator**: This component generates responses conditioned on the retrieved documents.

## Mathematical Formulation

### Retriever

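The retriever selects the top $k$ documents from a corpus $\mathcal{D}$ according to a relevance score $s(q, d)$ between the input query $q$ and each document $d \in \mathcal{D}$. In the original paper, $s(q, d)$ is the inner product of dense vector embeddings of the query and the document, produced by separate query and document encoders (Dense Passage Retrieval, DPR).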
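As a minimal sketch, assume every document and the query have already been encoded as dense vectors and that the relevance score $s(q, d)$ is their inner product. Top-$k$ retrieval then reduces to scoring every document and keeping the $k$ highest scores. The function name `retrieve_top_k` and the toy vectors are illustrative only, not part of any library.

```python
import numpy as np

def retrieve_top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int):
    """Return (indices, scores) of the k documents with the highest inner-product score.

    query_vec:  shape (dim,)        - embedding of the query q
    doc_matrix: shape (n_docs, dim) - one row per document embedding d
    """
    scores = doc_matrix @ query_vec       # s(q, d) for every document in the corpus
    k = min(k, len(scores))
    top = np.argsort(-scores)[:k]         # indices sorted by descending score
    return top, scores[top]

# Toy corpus: 4 documents in a 3-dimensional embedding space.
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.2, 0.0])

idx, top_scores = retrieve_top_k(query, docs, k=2)
print(idx, top_scores)  # documents 0 and 2, with scores 1.0 and about 0.84
```

If the embeddings are L2-normalized, the inner product equals cosine similarity, which is a common choice in vector stores. Real systems replace the full scan with an approximate nearest-neighbor index once the corpus is large.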

### Generator

The generator produces a response $r$ based on the input query $q$ and the retrieved documents $\{d_1, d_2, \ldots, d_k\}$.

### Combining Retriever and Generator

The final probability of generating a response $r$ given the query $q$ is obtained by marginalizing over the top $k$ retrieved documents:

$$
P(r \mid q) = \sum_{i=1}^{k} P(d_i \mid q) P(r \mid q, d_i)
$$

Here, $P(d_i \mid q)$ is the normalized relevance score of document $d_i$ given the query $q$, and $P(r \mid q, d_i)$ is the probability of generating response $r$ given the query $q$ and document $d_i$.
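The marginalization can be checked numerically. The sketch below assumes $P(d_i \mid q)$ is obtained with a softmax over the top-$k$ relevance scores, the normalization used in the original paper, and takes the per-document generation probabilities $P(r \mid q, d_i)$ as given numbers; in a real system they would come from the generator. The function names are illustrative.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def marginal_response_prob(scores: np.ndarray, gen_probs: np.ndarray) -> float:
    """P(r|q) = sum_i P(d_i|q) * P(r|q, d_i) over the top-k retrieved documents.

    scores:    relevance scores s(q, d_i) of the k retrieved documents
    gen_probs: P(r | q, d_i) for the same documents, in the same order
    """
    doc_probs = softmax(scores)  # P(d_i | q); sums to 1 over the k documents
    return float(np.dot(doc_probs, gen_probs))

# Two documents with equal scores each get weight 0.5.
print(marginal_response_prob(np.array([2.0, 2.0]), np.array([0.9, 0.1])))  # approximately 0.5
```

Because the weights sum to one, $P(r \mid q)$ is a weighted average of the per-document probabilities: a response supported by highly ranked documents receives most of the probability mass.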

## Implementation Details

### Training

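In the original paper, the retriever and generator are trained jointly rather than in separate stages:
1. **Retriever**: The retriever is initialized from a pre-trained Dense Passage Retrieval (DPR) model. During fine-tuning, the document encoder (and therefore the document index) is kept fixed, and only the query encoder is updated.
2. **Generator**: The generator (BART in the original paper) is fine-tuned together with the query encoder.

Both components are optimized by minimizing the negative marginal log-likelihood $-\log P(r \mid q)$ of the ground-truth responses, with no direct supervision on which documents should be retrieved.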
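The training objective can be made concrete. In the original paper, the loss for one (query, ground-truth response) pair is the negative log of the marginal probability $P(r \mid q)$. The sketch below computes it in log space with log-sum-exp to avoid underflow on long responses, assuming softmax-normalized retrieval scores; `rag_nll` and the example numbers are illustrative, not taken from any library.

```python
import numpy as np

def rag_nll(scores: np.ndarray, gen_log_probs: np.ndarray) -> float:
    """Negative marginal log-likelihood -log P(r|q).

    scores:        relevance scores s(q, d_i) for the top-k documents
    gen_log_probs: log P(r | q, d_i) of the ground-truth response r under each document
    """
    # log P(d_i | q) via log-softmax over the retrieved scores
    log_doc_probs = scores - np.logaddexp.reduce(scores)
    # log sum_i P(d_i|q) * P(r|q, d_i), computed stably with log-sum-exp
    log_marginal = np.logaddexp.reduce(log_doc_probs + gen_log_probs)
    return float(-log_marginal)

# The first document ranks higher and also makes the ground-truth response likely.
print(rag_nll(np.array([2.0, 0.0]), np.log(np.array([0.9, 0.1]))))
```

Because the loss depends on both the relevance scores and the generator's log-probabilities, its gradient reaches the query encoder as well as the generator: documents under which the ground-truth response is more likely than average have their scores pushed up.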

### Inference

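During inference, the RAG model retrieves the top $k$ documents for a given query and generates a response conditioned on them; candidate responses are scored by marginalizing over the retrieved documents as described above. Many practical RAG pipelines, including the LlamaIndex query engine used in this notebook, simplify this step: the retrieved chunks are inserted into the LLM prompt and a single response is generated, with no explicit marginalization.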

## Conclusion

RAG combines the strengths of retrieval-based and generation-based models. By conditioning generation on retrieved documents, it can draw on external knowledge from a large corpus; the original paper shows that this improves performance on knowledge-intensive tasks such as open-domain question answering.

This makes RAG a good fit for tasks that need both access to external knowledge and the ability to generate coherent, contextually appropriate responses, such as answering questions about a reverse engineering reference text.

### Create the Conda Environment

```
!conda create -n rea python=3.11 -y
```

### Close & Reload VSCode
For the new Conda environment to appear as a notebook kernel, close and reopen VSCode, then open the kernel picker in the top-right corner of the notebook and select `rea`:
1. In the VSCode pop-up command window, select `Select Another Kernel...`.
2. In the next command window, select `Python Environments...`.
3. In the next command window, select `rea (Python 3.11.9)` (the patch version may differ on your machine).
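After switching kernels, a quick sanity check can confirm the notebook is actually running inside the new environment. The sketch below assumes the usual Conda layout, where an environment's `sys.prefix` ends in `envs/<name>`; `kernel_env_name` is a hypothetical helper, not part of any library.

```python
import os
import sys

def kernel_env_name() -> str:
    """Return the last path component of sys.prefix.

    For a Conda environment this is usually the environment name
    (sys.prefix typically looks like .../envs/rea).
    """
    return os.path.basename(os.path.normpath(sys.prefix))

print(f"Python {sys.version.split()[0]} at {sys.executable}")
env = kernel_env_name()
if env != "rea":
    # Assumption: the environment was created with the name `rea`, as above.
    print(f"Warning: running in '{env}', not 'rea'; re-check the kernel picker.")
```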

### Install Packages

```
%pip install ipywidgets
%pip install llama-index
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-groq
%pip install groq
%pip install gradio
```
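To confirm that the installs succeeded in the active kernel, the installed versions can be read with the standard library's `importlib.metadata`. The helper name and the package list below are illustrative; `None` means the distribution was not found.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the install cell above.
REQUIRED = [
    "ipywidgets",
    "llama-index",
    "llama-index-embeddings-huggingface",
    "llama-index-llms-groq",
    "groq",
    "gradio",
]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:40} {ver or 'NOT INSTALLED'}")
```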

### Import Libraries

```python
import os
from llama_index.core import (
    Settings,
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.groq import Groq
import gradio as gr
```

### ACTION ITEM
Visit https://console.groq.com/keys to create an API key, then replace `<GROQ_API_KEY>` below with the newly generated key, or set the `GROQ_API_KEY` environment variable before starting the kernel. Never commit a real key to version control.

```python
# Keep an existing GROQ_API_KEY environment variable; otherwise fall back to the placeholder.
os.environ.setdefault("GROQ_API_KEY", "<GROQ_API_KEY>")
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
```
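A missing key otherwise only surfaces later, when the first LLM call fails. A small guard can fail fast instead. This is a sketch; `require_api_key` is a hypothetical helper, and it only checks that the variable is set to something other than the placeholder, not that Groq accepts the key.

```python
import os

PLACEHOLDER = "<GROQ_API_KEY>"

def require_api_key(var_name: str = "GROQ_API_KEY", env=None) -> str:
    """Return the key stored in var_name, or raise if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{var_name} is not set. Create a key at https://console.groq.com/keys "
            "and set it before running the rest of the notebook."
        )
    return key
```

In the notebook, `GROQ_API_KEY = require_api_key()` would then replace the plain `os.getenv` lookup.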

### Data Ingestion

```python
# Load the reference PDF into LlamaIndex Document objects.
reader = SimpleDirectoryReader(input_files=["files/reversing-for-everyone.pdf"])
documents = reader.load_data()
```

### Chunking

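```python
# Split documents into chunks of up to 1024 tokens, with 200 tokens shared between consecutive chunks.
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
nodes = text_splitter.get_nodes_from_documents(documents)
```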
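`SentenceSplitter` measures `chunk_size` and `chunk_overlap` in tokens and tries to keep sentences intact. The simplified sketch below ignores sentence boundaries and only illustrates the sliding-window idea: each chunk repeats the last `chunk_overlap` tokens of the previous one, so text near a boundary appears in both chunks. Whitespace-separated words stand in for tokens, and `chunk_with_overlap` is a hypothetical helper, not LlamaIndex's implementation.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens,
    each sharing chunk_overlap tokens with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # the last window reached the end
            break
    return chunks

words = "the quick brown fox jumps over the lazy dog".split()
for chunk in chunk_with_overlap(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
# ['the', 'quick', 'brown', 'fox']
# ['fox', 'jumps', 'over', 'the']
# ['the', 'lazy', 'dog']
```

A larger overlap lowers the chance that an answer is split across chunks, at the cost of more chunks to embed and store.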

### Embedding Model

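```python
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
```

According to its model card, `all-MiniLM-L6-v2` maps text to 384-dimensional vectors and truncates input longer than 256 word pieces, so with `chunk_size=1024` only the beginning of each chunk contributes to its embedding. A smaller chunk size, or an embedding model with a longer input limit, avoids this.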

### Define LLM Model

```python
llm = Groq(model="llama-3.1-8b-instant", api_key=GROQ_API_KEY)
```

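### Configure Global Settings

`ServiceContext` is deprecated in recent LlamaIndex releases in favor of the global `Settings` object. Indexes and query engines created below pick up the embedding model and LLM configured here.

```python
Settings.embed_model = embed_model
Settings.llm = llm
```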

### Create Vector Store Index

```python
# Build the index from the pre-chunked nodes; the embedding model comes from Settings.
print("VectorStoreIndex initialization")
vector_index = VectorStoreIndex(nodes, show_progress=True)
```

#### Persist/Save Index

```python
vector_index.storage_context.persist(persist_dir="./storage_mini")
```

#### Define Storage Context

```python
storage_context = StorageContext.from_defaults(persist_dir="./storage_mini")
```

#### Load Index

```python
# The embedding model and LLM come from the global Settings configured above.
index = load_index_from_storage(storage_context)
```

### Define Query Engine

```python
query_engine = index.as_query_engine()
```

#### Feed in user query

```python
def query_function(query):
    """
    Processes a query using the query engine and returns the response.

    Args:
        query (str): The query string to be processed by the query engine.

    Returns:
        str: The response generated by the query engine based on the input query.

    Example (illustrative; the actual text depends on the indexed PDF and the LLM):
        >>> query_function("What is Reverse Engineering?")
        'Reverse engineering is the process of deconstructing an object to understand its design, architecture, and functionality.'
    """
    response = query_engine.query(query)
    # query() returns a Response object; convert it to plain text for the Gradio Textbox.
    return str(response)


iface = gr.Interface(
    fn=query_function,
    inputs=gr.Textbox(label="Query"),
    outputs=gr.Textbox(label="Response")
)

iface.launch()
```

When the cell runs, Gradio serves the interface locally at http://127.0.0.1:7860. To create a public link, set `share=True` in `launch()`.


