# Agentic AI with Retrieval-Augmented Generation (RAG)

Imagine a user who has a question and submits a prompt to a sophisticated Retrieval-Augmented Generation (RAG) system. The journey of this prompt begins with a small, specialized language model known as the router LLM. The router analyzes the prompt to decide whether answering it requires additional information from a vector database. It’s like a traffic director at a busy intersection, deciding which route the prompt should take. If the router decides that the prompt can be answered without further information, it sends the prompt directly to another LLM for response generation. If it determines that retrieval is necessary, it directs the prompt to the vector database.

Once the prompt reaches the vector database, the system retrieves the documents most relevant to the user's question. This is akin to a librarian searching through a vast collection of books to find the most pertinent ones. Next, an evaluator LLM assesses whether the retrieved documents are sufficient to provide a complete answer. If the evaluator finds that more information is needed, it can request additional retrievals, so that the final response rests on enough evidence.

Once the evaluator is satisfied with the information collected, an augmented prompt is constructed, weaving together the original question and the retrieved data. This enriched prompt is then handed over to a larger LLM, which generates a draft response, much like a writer crafting a detailed answer based on thorough research. Finally, to ensure credibility and accuracy, a citation generation LLM steps in to add references to the response, providing the user with sources for further reading. 
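To make the "augmented prompt" step concrete, here is a minimal sketch of how the question and the retrieved passages might be combined. The function name `build_augmented_prompt`, the instruction wording, and the numbered-passage layout are all assumptions chosen for illustration, not a fixed format.

```python
def build_augmented_prompt(question: str, documents: list[str]) -> str:
    """Combine the user's question with retrieved passages into one prompt."""
    # Number each passage so later stages (e.g. citation) can refer to it.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical retrieved passages, used only to show the resulting layout.
prompt = build_augmented_prompt(
    "What does the router LLM do?",
    [
        "The router decides whether retrieval is needed.",
        "The evaluator checks retrieved documents.",
    ],
)
print(prompt)
```

Numbering the passages is a small design choice that pays off later: the generator and the citation step can both point back to `[1]`, `[2]`, and so on.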

This entire process showcases the beauty of an agentic workflow, where each step is handled by a specialized model, working together harmoniously to deliver a high-quality answer to the user’s query. It’s a seamless collaboration of intelligent systems, each playing its part to enhance the overall experience and effectiveness of the RAG system.

![image.png](attachment:image.png)
*Screenshot from DeepLearning.AI's course on RAG*

## Router LLM

The router LLM is usually a small language model that analyzes the user's prompt to determine whether it requires additional information from a vector database. It acts as a decision-maker, directing the prompt either to the vector database for retrieval or directly to another LLM for response generation if no further information is needed. It can also be used to classify the type of query, such as whether it is a factual question, a request for opinion, or a task-oriented prompt. This classification helps in routing the prompt to the most appropriate LLM or retrieval mechanism.
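The routing decision can be sketched as a small function that asks a model for a one-word label. In the sketch below the model is passed in as a plain callable, and `stub_router_llm` is a keyword-based stand-in so the example runs without any API; the template wording, the `RETRIEVE`/`DIRECT` labels, and the fallback rule are assumptions, not part of any particular library.

```python
from typing import Callable

ROUTER_TEMPLATE = (
    "Decide whether answering the prompt needs documents from the knowledge base.\n"
    "Reply with exactly one word: RETRIEVE or DIRECT.\n\n"
    "Prompt: {prompt}"
)


def route(prompt: str, router_llm: Callable[[str], str]) -> str:
    """Return 'retrieve' or 'direct' based on the router model's reply."""
    reply = router_llm(ROUTER_TEMPLATE.format(prompt=prompt)).strip().upper()
    # Assumption: when the reply is not a clean DIRECT, fall back to retrieval,
    # since an unnecessary lookup is usually cheaper than a missing one.
    return "direct" if reply == "DIRECT" else "retrieve"


def stub_router_llm(text: str) -> str:
    """Stand-in for a real small model: a crude keyword check on the user's prompt."""
    user_prompt = text.rsplit("Prompt: ", 1)[-1].lower()
    needs_docs = "policy" in user_prompt or "report" in user_prompt
    return "RETRIEVE" if needs_docs else "DIRECT"


print(route("What is 2 + 2?", stub_router_llm))
print(route("Summarize our refund policy", stub_router_llm))
```

In a real system, `router_llm` would wrap a call to a small hosted or local model; the same pattern extends to query classification by asking for a label from a larger set.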

## Evaluator LLM
The evaluator LLM is a specialized language model that assesses the relevance and sufficiency of the documents retrieved from the vector database in response to the user's prompt. Its primary role is to determine whether the gathered information adequately addresses the user's query or if further retrievals are necessary. The evaluator analyzes the content of the retrieved documents, checking for completeness, accuracy, and pertinence to the original question. If it finds that the information is lacking or insufficient, it can prompt additional searches in the vector database to gather more data. This ensures that the final response generated by the larger LLM is well-informed and comprehensive, ultimately enhancing the quality of the answer provided to the user.
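The evaluate-and-retry behaviour described above can be pictured as a bounded loop. This is only a sketch: `search` and `is_sufficient` are hypothetical callables standing in for the vector database and the evaluator LLM, and doubling `k` on each round is just one possible retry strategy (rewriting the query is another).

```python
from typing import Callable


def retrieve_until_sufficient(
    question: str,
    search: Callable[[str, int], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    initial_k: int = 2,
    max_rounds: int = 3,
) -> list[str]:
    """Retrieve documents, widening the search until the evaluator accepts them."""
    k = initial_k
    docs: list[str] = []
    for _ in range(max_rounds):
        docs = search(question, k)
        if is_sufficient(question, docs):
            break
        k *= 2  # Assumed strategy: ask for more documents on the next round.
    return docs  # May still be insufficient if max_rounds was exhausted.


# Toy stand-ins so the sketch runs without a vector database or a model.
CORPUS = [
    "Shipping takes 3 days.",
    "Support is available on weekdays.",
    "Orders can be tracked online.",
    "Refunds are issued within 14 days.",
]


def toy_search(question: str, k: int) -> list[str]:
    return CORPUS[:k]  # A real search would rank by similarity to the question.


def toy_evaluator(question: str, docs: list[str]) -> bool:
    return any("refund" in doc.lower() for doc in docs)


docs = retrieve_until_sufficient("How do refunds work?", toy_search, toy_evaluator)
print(len(docs))
```

The `max_rounds` cap matters in practice: without it, an evaluator that is never satisfied would keep the system retrieving indefinitely.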

## Citation Generation LLM
The citation generation LLM is a specialized language model that focuses on creating accurate and relevant citations for the information in the response generated by the larger LLM. Its main function is to make the response credible as well as informative by attributing its claims to appropriate sources. The citation generation LLM analyzes the content of the response, identifies the statements that require sourcing, and matches them to the retrieved documents. It then formats these sources into proper citations, adhering to a specific citation style (e.g., APA, MLA, Chicago) as needed. This lets users verify the information and explore the original sources for further reading, which makes the overall RAG system more trustworthy.
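As a rough illustration of the attribution step, the sketch below tags each sentence of a draft with the source it overlaps most. Note the substitution: a simple word-overlap heuristic stands in for the citation LLM, and the source IDs, the `min_overlap` threshold, and the bracketed marker format are all assumptions made for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def add_citations(response: str, sources: dict[str, str], min_overlap: int = 3) -> str:
    """Append a [source_id] marker to each sentence that overlaps a source enough."""
    cited = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = tokenize(sentence)
        best_id, best_score = None, 0
        for source_id, text in sources.items():
            score = len(words & tokenize(text))
            if score > best_score:
                best_id, best_score = source_id, score
        # Leave sentences without enough supporting overlap uncited.
        if best_id is not None and best_score >= min_overlap:
            sentence = f"{sentence} [{best_id}]"
        cited.append(sentence)
    return " ".join(cited)


# Hypothetical retrieved documents, keyed by an ID the user could look up.
sources = {
    "doc1": "Refunds are issued within 14 days of purchase.",
    "doc2": "Support is available on weekdays.",
}
print(add_citations("Refunds are issued within 14 days. Have a nice day!", sources))
# Prints: Refunds are issued within 14 days. [doc1] Have a nice day!
```

A model-based version would replace the overlap score with a judgment about whether a source actually supports the sentence, but the input and output shapes can stay the same.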

## Orchestrator LLM
The orchestrator LLM is a central language model that coordinates the entire workflow of the RAG system. It manages the interactions between the router, evaluator, and citation generation LLMs and the larger LLM responsible for generating the final response. The orchestrator passes prompts and information from one stage to the next and decides which model to engage at each point, based on the user's query and the information retrieved so far. In doing so, it keeps the RAG system efficient and ensures that the user receives a coherent and well-supported answer.
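The control flow that the orchestrator manages can be sketched in plain Python. This is a simplification under stated assumptions: a fixed `answer` method stands in for the orchestrator LLM's decisions, every stage is injected as a hypothetical callable, and the lambdas at the bottom are toy stand-ins rather than real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Wires the stages together; each field is a stand-in for one component."""
    needs_retrieval: Callable[[str], bool]          # router
    search: Callable[[str, int], list[str]]         # vector database lookup
    is_sufficient: Callable[[str, list[str]], bool]  # evaluator
    generate: Callable[[str], str]                  # larger LLM
    cite: Callable[[str, list[str]], str]           # citation generation
    max_rounds: int = 2

    def answer(self, question: str) -> str:
        # Route: skip retrieval entirely when the router says it is not needed.
        if not self.needs_retrieval(question):
            return self.generate(question)
        # Retrieve and evaluate, retrying up to max_rounds times.
        docs: list[str] = []
        for attempt in range(self.max_rounds):
            docs = self.search(question, attempt)
            if self.is_sufficient(question, docs):
                break
        # Augment, generate a draft, then attach citations.
        context = "\n".join(docs)
        draft = self.generate(f"Context:\n{context}\n\nQuestion: {question}")
        return self.cite(draft, docs)


CORPUS = ["Shipping takes 3 days.", "Refunds are issued within 14 days."]

pipeline = Pipeline(
    needs_retrieval=lambda q: "refund" in q.lower(),
    search=lambda q, attempt: CORPUS[: attempt + 1],
    is_sufficient=lambda q, docs: any("14 days" in d for d in docs),
    generate=lambda p: "draft based on context" if p.startswith("Context:") else "direct answer",
    cite=lambda draft, docs: f"{draft} [sources: {len(docs)}]",
)

print(pipeline.answer("Hello"))
print(pipeline.answer("How do refunds work?"))
```

Keeping each stage behind a callable reflects the modular idea in the text: any one component can be swapped for a different model without touching the others.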

By now we've entered the fairly complex territory of agentic AI, where multiple specialized LLMs work together to handle different aspects of the information retrieval and response generation process. This modular approach allows for greater flexibility and scalability, and it lets you fine-tune each component for its specific task, which can lead to more accurate and reliable outcomes in RAG systems.

This approach also calls for more imagination and creativity as well as a deeper understanding of the capabilities and limitations of each LLM involved. It requires careful design and orchestration to ensure that the different components work together seamlessly, and that the overall system is robust and efficient.

Building such a system in full goes beyond the scope of this simple notebook, but I hope this gives you a good overview of how agentic AI can be applied in the context of RAG systems.