<p style="color:#153462; 
          font-weight: bold; 
          font-size: 30px; 
          font-family: Gill Sans, sans-serif;
          text-align: center;">
          Retrieval-Augmented Generation (RAG)</p>

<p style="text-align: justify; text-justify: inter-word; font-size:17px;">
    RAG has two main components:
    <ul style="text-align: justify; text-justify: inter-word; font-size:17px;">
        <li><b>Retriever</b>: Identifies and retrieves relevant documents.</li>
        <li><b>Generator</b>: Takes the retrieved documents and the input query and generates a coherent, contextually relevant response.</li>
    </ul>
</p>
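
<p style="text-align: justify; text-justify: inter-word; font-size:17px;">
    The two components can be pictured as two functions composed in sequence. The sketch below is only an
    illustration: <code>rag_answer</code>, <code>keyword_retriever</code>, <code>template_generator</code> and
    <code>CORPUS</code> are hypothetical names, the retriever is a naive word-overlap filter, and the generator
    is a string template standing in for a real language model.
</p>

```python
from typing import Callable, List

# Assumed interfaces: a retriever maps a query to documents,
# a generator maps (query, documents) to a response.
Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]

# Hypothetical two-document corpus used only for this example.
CORPUS = [
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
]


def keyword_retriever(query: str) -> List[str]:
    """Return every document that shares at least one word with the query."""
    query_words = set(query.lower().split())
    return [doc for doc in CORPUS if query_words & set(doc.lower().split())]


def template_generator(query: str, docs: List[str]) -> str:
    """Stand-in for an LLM call: it only formats the query and its context."""
    return f"Q: {query} | context: {' '.join(docs)}"


def rag_answer(query: str, retriever: Retriever, generator: Generator) -> str:
    """Run the retriever first, then hand its output to the generator."""
    docs = retriever(query)
    return generator(query, docs)


answer = rag_answer("capital of France", keyword_retriever, template_generator)
print(answer)
```

<p style="text-align: justify; text-justify: inter-word; font-size:17px;">
    Keeping the two components behind separate interfaces means either one can be swapped (for example, a
    vector-based retriever or a different language model) without touching the other.
</p>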

### <span style="color:#C738BD; font-weight: bold;">Definition</span>

<p style="text-align: justify; text-justify: inter-word; font-size:17px;">
    A framework that combines the strengths of retrieval-based systems and generation-based models to produce
    more accurate and contextually relevant responses.
</p>

<img src="images/rag-architecture.png" alt="rag-architecture" style="width: 600px;"/>

<p style="text-align: justify; 
          text-justify: inter-word;
          font-size:17px;">
    Let's break down the RAG (Retrieval-Augmented Generation) architecture step by step. <br>
    <ol style="text-align: justify; 
          text-justify: inter-word;
          font-size:17px;">
        <li>
            Indexing
            <ul>
                <li>
                    <b>Documents:</b> The process begins with a collection of documents (text, code, etc.)
                    that you want to make searchable and usable.
                </li>
                <li>
                    <b>Parsed Files:</b> These documents are parsed and split into smaller chunks 
                    so that they can be indexed and retrieved efficiently.
                </li>
                <li>
                    <b>Embedding Model:</b> Each chunk of data is passed through an embedding model. This model
                    transforms the text into numerical vectors (embeddings). These vectors represent the semantic
                    meaning of the text.
                </li>
                <li>
                    <b>Vectorization:</b> The embeddings are then stored in a vector store. This is a specialized
                    database designed to efficiently store and search high-dimensional vectors.
                </li>
            </ul>
        </li>
        <li>
            Query
            <ul>
                <li><b>User Query:</b> A user submits a query (a question or prompt) to the system.</li>
            </ul>
        </li>
        <li>
            Query Processing
            <ul>
                <li> 
                    <b>Embedding Model:</b> The user query is also passed through the same embedding model as
                the documents, generating a query vector.
                </li>
                <li>
                    <b>Vectorization:</b> This query vector is then used to search the vector store.
                </li>
            </ul>
        </li>
        <li>
            Retrieval
            <ul>
                <li>
                    <b>Search:</b> The system uses similarity search algorithms to find the chunks in the vector store
                    whose embeddings are most similar to the query vector. These chunks are the ones most relevant to
                    the user's query.
                </li>
            </ul>
        </li>
        <li>
            Augmentation
            <ul>
                <li>                
                    <b>Prompt:</b> The retrieved chunks of data are used to augment the user's original query.
                    This means that the original query is combined with the relevant information from the retrieved
                    chunks to create a more informative and contextually relevant prompt.
                </li>
                <li>
                     <b>Related Docs:</b> The system can also provide the user with the actual documents from which
                    the relevant chunks were extracted. This can help the user to explore the context further.
                </li>
            </ul>
        </li>
        <li>
            Generation
            <ul>
                <li>
                    <b>Gen. LLM:</b> The augmented prompt is fed into a generative large language model (LLM). This model 
                    has been trained on a massive amount of text data and can generate fluent, human-like text.
                </li>
                <li>
                    <b>Response:</b> The LLM generates a response based on the augmented prompt. This response is
                    the final output of the RAG system.
                </li>
            </ul>
        </li>
    </ol>
</p>
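
<p style="text-align: justify; text-justify: inter-word; font-size:17px;">
    The steps above can be sketched end to end in a few lines of plain Python. This is a toy illustration under
    loud assumptions, not a production design: <code>embed</code> is a bag-of-words counter standing in for a
    real embedding model, <code>VectorStore</code> is an in-memory list searched by brute-force cosine similarity,
    and every name, file name and sample text is hypothetical. The generation step is left out; the final
    prompt is what would be sent to an LLM of your choice.
</p>

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=8):
    """Indexing: split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory store of (vector, chunk, source) entries."""

    def __init__(self):
        self.entries = []

    def add(self, chunk, source):
        self.entries.append((embed(chunk), chunk, source))

    def search(self, query, top_k=2):
        """Query processing + retrieval: embed the query, rank chunks by similarity."""
        query_vec = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vec, entry[0]),
            reverse=True,
        )
        return [(chunk, source) for _, chunk, source in ranked[:top_k]]


def build_prompt(query, retrieved):
    """Augmentation: combine the retrieved chunks with the original query."""
    context = "\n".join(f"- {chunk}" for chunk, _ in retrieved)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical documents used only for this example.
documents = {
    "rag.txt": "RAG combines a retriever with a generator. "
               "The retriever finds relevant chunks in a vector store.",
    "cooking.txt": "Boil the pasta for ten minutes. Drain it and add the tomato sauce.",
}

# 1. Indexing: chunk every document, embed each chunk, store the vectors.
store = VectorStore()
for source, text in documents.items():
    for chunk in chunk_text(text):
        store.add(chunk, source)

# 2-4. Query, query processing, retrieval.
query = "What does the retriever find in the vector store?"
retrieved = store.search(query, top_k=2)

# 5. Augmentation: build the prompt and collect the related source documents.
prompt = build_prompt(query, retrieved)
related_docs = sorted({source for _, source in retrieved})

print(prompt)
print("Related docs:", related_docs)
```

<p style="text-align: justify; text-justify: inter-word; font-size:17px;">
    In a real system the word-count vectors would be replaced by dense embeddings, and the brute-force scan by
    an approximate nearest-neighbour index, but the flow of data between the steps stays the same.
</p>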