# Reverse Engineering Assistant
A reverse engineering assistant built on Retrieval-Augmented Generation (RAG) and the Llama 3.1 8B Instant large language model (LLM), accessed through Groq as `llama-3.1-8b-instant`. It answers reverse engineering questions by retrieving relevant passages from a reference document and passing them to the LLM as context.

## Origin of the RAG Architecture

Retrieval-Augmented Generation (RAG) is a natural language processing (NLP) technique that combines a retrieval step with a generative model: relevant documents are fetched first, and the model conditions its output on them, which tends to make responses more accurate and better grounded in the source material. The approach was introduced in the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" from Facebook AI Research (FAIR).

For details, see the original paper: [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401).

## Overview of the RAG Architecture

A RAG pipeline such as the one built in this notebook consists of three main components:
1. **Indexer**: This component creates an index of the corpus to facilitate efficient retrieval of relevant documents.
2. **Retriever**: This component retrieves relevant documents from the indexed corpus based on the input query.
3. **Generator**: This component generates responses conditioned on the retrieved documents.

## Mathematical Formulation

### Indexer

The indexer preprocesses the corpus $\mathcal{D}$, typically by splitting it into chunks and embedding each chunk as a vector, to build an index over the documents. The retriever searches this index to find documents relevant to a query efficiently.

### Retriever

The retriever selects the top $k$ documents from the indexed corpus $\mathcal{D}$ based on their relevance to the input query $q$. The relevance of a document $d_i$ to a query $q$ is denoted as $s(q, d_i)$.
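The retrieval step can be sketched in plain Python. This is a minimal illustration, assuming that $s(q, d_i)$ is the cosine similarity between embedding vectors; the toy two-dimensional vectors and the helper names `cosine_similarity` and `top_k` are hypothetical and not part of the notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the k highest-scoring (document_index, score) pairs."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumed values, for illustration only).
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.2]
results = top_k(query, docs, k=2)
```

Real embedding models produce vectors with hundreds of dimensions, and a vector index avoids scoring every document one by one, but the ranking logic is the same.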

### Generator

The generator produces a response $r$ based on the input query $q$ and the retrieved documents $\{d_1, d_2, \ldots, d_k\}$. The probability of generating a response $r$ given the query $q$ and a document $d_i$ is denoted as $P(r \mid q, d_i)$.

### Combining Indexer, Retriever, and Generator

The final probability of generating a response $r$ given the query $q$ is obtained by marginalizing over the top $k$ retrieved documents:

$$
P(r \mid q) = \sum_{i=1}^{k} P(d_i \mid q) P(r \mid q, d_i)
$$

Here, $P(d_i \mid q)$ is the normalized relevance score of document $d_i$ given the query $q$, and $P(r \mid q, d_i)$ is the probability of generating response $r$ given the query $q$ and document $d_i$.
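The marginalization can be worked through numerically. The sketch below assumes the relevance scores are normalized with a softmax, which is one common choice; the example scores, the per-document generation probabilities, and the function names are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw relevance scores s(q, d_i) into probabilities P(d_i | q)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_probability(scores, gen_probs):
    """Compute P(r | q) = sum_i P(d_i | q) * P(r | q, d_i)."""
    if len(scores) != len(gen_probs):
        raise ValueError("need one generation probability per document")
    doc_probs = softmax(scores)
    return sum(p_d * p_r for p_d, p_r in zip(doc_probs, gen_probs))


# Assumed values for k = 3 retrieved documents.
scores = [2.0, 1.0, 0.0]      # s(q, d_i)
gen_probs = [0.9, 0.5, 0.1]   # P(r | q, d_i)
p_response = marginal_probability(scores, gen_probs)
```

Because the highest-scoring document receives most of the probability mass, `p_response` ends up closest to that document's generation probability.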

## Implementation Details

### Training

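This notebook performs no training: it uses a pretrained embedding model and a hosted LLM as they are. In the original paper, the components are not trained in three separate stages but are handled as follows:
1. **Indexer**: The document index is built once with a pretrained document encoder and kept fixed during fine-tuning.
2. **Retriever**: The query encoder is fine-tuned so that documents that help produce the ground-truth response receive a higher relevance score $s(q, d_i)$.
3. **Generator**: The generator is fine-tuned jointly with the query encoder to maximize the probability of the ground-truth responses.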
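
As a sketch in this document's notation, the fine-tuning objective described in the original paper is the negative marginal log-likelihood of the ground-truth responses. The training pairs $(q_j, r_j)$ and the symbol $\mathcal{L}$ are introduced here for illustration only:

$$
\mathcal{L} = -\sum_{j} \log P(r_j \mid q_j)
$$

Here $P(r_j \mid q_j)$ is the marginal probability defined in the Mathematical Formulation section, so gradients reach both the relevance scores and the generator.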

### Inference

During inference, the RAG model follows these steps:
1. **Indexing**: The corpus is indexed once, ahead of time, so that later queries can be answered with efficient retrieval.
2. **Retrieval**: The top $k$ documents are retrieved for a given query based on their relevance scores.
3. **Generation**: A response is generated conditioned on the input query and the retrieved documents. The final response is obtained by marginalizing over the retrieved documents as described above.

## Conclusion

RAG combines indexing, retrieval, and generation so that responses draw on external knowledge instead of relying only on what the model learned during training. Because generation is conditioned on retrieved documents, the approach suits tasks that need information from a large corpus together with coherent, contextually appropriate answers.

### Install Conda Environment
1. To select a Conda environment in Visual Studio Code, press the play button on the next cell. In the prompt that opens, select `Python Environments...`.
2. In the next prompt, select `+ Create Python Environment`.
3. In the next prompt, select `Conda`, which is described as "Creates a `.conda` Conda environment in the current workspace".
4. In the next prompt, select `* Python 3.11`.

In [1]:
!conda create -n rea python=3.11 -y

Channels:
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/rea

  added / updated specs:
    - python=3.11


The following NEW packages will be INSTALLED:

  bzip2              pkgs/main/osx-arm64::bzip2-1.0.8-h80987f9_6 
  ca-certificates    pkgs/main/osx-arm64::ca-certificates-2024.7.2-hca03da5_0 
  libffi             pkgs/main/osx-arm64::libffi-3.4.4-hca03da5_1 
  ncurses            pkgs/main/osx-arm64::ncurses-6.4-h313beb8_0 
  openssl            pkgs/main/osx-arm64::openssl-3.0.14-h80987f9_0 
  pip                pkgs/main/osx-arm64::pip-24.0-py311hca03da5_0 
  python             pkgs/main/osx-arm64::python-3.11.9-hb885b13_0 
  readline           pkgs/main/osx-arm64::readline-8.2-h1a28f6b_0 
  setuptools         pkgs/main/osx-arm64::setuptools-69.5.1-py311hca03da5_0 
  sqlite             pkgs/main/osx-arm64::sqlite-3.45.3-h80987f9_0 
  tk                 pkgs

### !!! ACTION ITEM !!!
For the Conda environment to become available, close and reopen VS Code, then select `rea` in the kernel picker at the top right of the VS Code window:
1. In the VSCode pop-up command window select `Select Another Kernel...`.
2. In the next command window select `Python Environments...`.
3. In the next command window select `rea (Python 3.11.9)`.

### Install Packages

In [1]:
%pip install ipywidgets
%pip install llama-index
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-groq
%pip install groq
%pip install gradio

Collecting ipywidgets
  Using cached ipywidgets-8.1.3-py3-none-any.whl.metadata (2.4 kB)
Collecting widgetsnbextension~=4.0.11 (from ipywidgets)
  Using cached widgetsnbextension-4.0.11-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab-widgets~=3.0.11 (from ipywidgets)
  Using cached jupyterlab_widgets-3.0.11-py3-none-any.whl.metadata (4.1 kB)
Using cached ipywidgets-8.1.3-py3-none-any.whl (139 kB)
Using cached jupyterlab_widgets-3.0.11-py3-none-any.whl (214 kB)
Using cached widgetsnbextension-4.0.11-py3-none-any.whl (2.3 MB)
Installing collected packages: widgetsnbextension, jupyterlab-widgets, ipywidgets
Successfully installed ipywidgets-8.1.3 jupyterlab-widgets-3.0.11 widgetsnbextension-4.0.11
Note: you may need to restart the kernel to use updated packages.
Collecting llama-index
  Using cached llama_index-0.10.59-py3-none-any.whl.metadata (11 kB)
Collecting llama-index-agent-openai<0.3.0,>=0.1.4 (from llama-index)
  Using cached llama_index_agent_openai-0.2.9-py3-none-any.w

### Import Libraries

In [1]:
import os
from llama_index.core import (
    Settings,
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.groq import Groq
import gradio as gr

### !!! ACTION ITEM !!!
Visit https://console.groq.com/keys, create an API key, and replace `<GROQ_API_KEY>` below with the newly generated key.

In [23]:
os.environ["GROQ_API_KEY"] = "<GROQ_API_KEY>"
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
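
As an alternative to pasting the key into the notebook, the key can be read from the environment and requested interactively only when it is missing. This is a minimal sketch using the standard library; the helper name `ensure_api_key` and its `prompt_fn` parameter are hypothetical and not part of the notebook.

```python
import os
from getpass import getpass


def ensure_api_key(name="GROQ_API_KEY", prompt_fn=getpass):
    """Return the key stored in the environment, prompting only if it is absent."""
    key = os.environ.get(name)
    if not key:
        # getpass hides the typed value, so the key does not end up in cell output.
        key = prompt_fn(f"Enter {name}: ")
        os.environ[name] = key
    return key
```

Calling `GROQ_API_KEY = ensure_api_key()` would then replace the two assignment lines above, and the key never appears in the saved notebook.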

### Disable Tokenizer Parallelism Globally

In [24]:
os.environ["TOKENIZERS_PARALLELISM"] = "false"

### Data Ingestion

In [8]:
reader = SimpleDirectoryReader(input_files=["files/reversing-for-everyone.pdf"])
documents = reader.load_data()

### Chunking

In [9]:
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
nodes = text_splitter.get_nodes_from_documents(documents)
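
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch over a list of tokens. It is not how `SentenceSplitter` is implemented, since that class also tries to respect sentence boundaries; the function name `chunk_with_overlap` and the toy input are assumptions for illustration.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
        start += step
    return chunks


# Ten toy tokens, windows of four tokens, two tokens shared between neighbors.
toy_chunks = chunk_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=2)
```

With the notebook's settings, each chunk holds up to 1024 tokens and repeats up to 200 tokens from its predecessor, so a sentence near a chunk boundary is still retrievable with its surrounding context.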

### Embedding Model

In [10]:
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

### Define LLM Model

In [11]:
llm = Groq(model="llama-3.1-8b-instant", api_key=GROQ_API_KEY)

### Configure Global Settings

In [12]:
Settings.embed_model = embed_model
Settings.llm = llm

### Create Vector Store Index

In [14]:
print("VectorStoreIndex initialization")
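# Pass the splitter through `transformations` so the chunking settings defined
# above are applied. The earlier `node_parser=nodes` passed a list of
# already-split nodes, not a parser, and had no effect on chunking.
vector_index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    show_progress=True
)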

VectorStoreIndex initialization


Parsing nodes:   0%|          | 0/430 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/430 [00:00<?, ?it/s]

#### Persist/Save Index

In [15]:
vector_index.storage_context.persist(persist_dir="./storage_mini")

#### Define Storage Context

In [16]:
storage_context = StorageContext.from_defaults(persist_dir="./storage_mini")

#### Load Index

In [18]:
index = load_index_from_storage(storage_context)
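
Persisting the index pays off when later runs load it instead of embedding the corpus again. The sketch below shows that decision with the standard library only; `load_or_build`, `build_fn`, and `load_fn` are hypothetical names, and the two callables stand in for the notebook's build and load cells.

```python
import os


def load_or_build(persist_dir, build_fn, load_fn):
    """Load from persist_dir if it already holds files, otherwise build."""
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        # A non-empty directory is taken as a previously persisted index.
        return load_fn(persist_dir)
    # Nothing saved yet: build_fn is expected to create and persist the index.
    return build_fn(persist_dir)
```

In this notebook, `load_fn` could wrap `load_index_from_storage(StorageContext.from_defaults(persist_dir=...))`, and `build_fn` could wrap the `VectorStoreIndex` creation followed by `storage_context.persist(...)`.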

### Define Query Engine

In [19]:
query_engine = index.as_query_engine()

#### Feed in user query

In [25]:
def query_function(query):
    """
    Processes a query using the query engine and returns the response.

    Args:
        query (str): The query string to be processed by the query engine.

    Returns:
        str: The response generated by the query engine based on the input query.

    Example:
        >>> query_function("What is Reverse Engineering?")
        'Reverse engineering is the process of deconstructing an object to understand its design, architecture, and functionality.'
    """
    response = query_engine.query(query)
    # Convert the Response object to plain text, matching the documented str return type.
    return str(response)


iface = gr.Interface(
    fn=query_function,
    inputs=gr.Textbox(label="Query"),
    outputs=gr.Textbox(label="Response")
)

iface.launch()

Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.


