In [None]:
%%capture
!pip install langchain openai
!pip install -q -U faiss-cpu tiktoken

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Open AI API Key:")



# Retrieval Augmented Generation (RAG)

[Meta AI introduced the RAG method](https://ai.meta.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/), emphasizing its potential for knowledge-intensive tasks.

## 🔍 **1. What is RAG?**

- 🤖 RAG, or Retrieval Augmented Generation, boosts large language models (LLMs) by tapping into external knowledge sources.

- 🚀 Meta AI pioneered RAG to tackle knowledge-heavy tasks efficiently.

- 💡 Combines info retrieval with text generation, enabling LLMs to access fresh, reliable info.

- 🎯 Ideal for tasks needing accurate, current data.

## 🤔 **2. Why Was RAG Developed?**

- 📝 LLMs excel in mimicking human text but face limitations.

- 💸 High training/fine-tuning costs.

- 📚 Knowledge is static, outdated post-training.

- 🌌 "Hallucinating" issue: confidently giving wrong info.

- 🌐 RAG overcomes these by merging LLM prowess with real-time data access.

## 🛠️ **3. How Does RAG Work?**

- 🔎 **Retrieval Component:** On receiving a query (like a question), RAG fetches relevant documents/passages from external sources (like Wikipedia).

- ✍️ **Generation Component:** Blends these retrieved docs with the query to create an enriched context. This is then processed by a text generator (e.g., GPT-3) to generate the final answer.

<img src="https://docs.aws.amazon.com/images/sagemaker/latest/dg/images/jumpstart/jumpstart-fm-rag.jpg">

## 🌟 **4. Key Features of RAG:**

- 🔄 **Dynamic Knowledge Access:** RAG stays current, accessing the latest info, unlike static-knowledge LLMs.

- 💸 **Cost Efficiency:** Integrates fresh info without the high cost of retraining the whole LLM.

- ✔️ **Accuracy and Reliability:** Sources reliable info, reducing wrong answers or "hallucinations."

## 🛠 **5. Practical Implementations:**

- ❓ Answering evolving topic questions.

- 🔬 Useful in domains needing real-time accuracy (e.g., medical, legal).

- 🤖 Boosts chatbots/virtual assistants with factual, updated replies.

## 📌 **In Summary:**

- RAG: Marrying vast LLM knowledge with the latest real-world info.

- Ensures models are not just knowledgeable, but also up-to-date and accurate.

# 🛠️ **RAG Implementation in LangChain: Understanding the Tools**

1. 🧠 **LLM**: The brain of the system, generating human-like text.

2. 🌐 **Vector Store**: The heart of retrieval - stores text embeddings for quick, efficient access.

3. 🔍 **Vector Store Retriever**: The system's "search engine," finding relevant documents via vector similarities.

4. 🔄 **Embedder**: Transforms text into vectors, making it readable for the system.

5. 💬 **Prompt**: Captures the initial user query or statement, kicking off the process.

6. 📚 **Document Loader**: Manages the import and preparation of documents for processing.

7. 🧩 **Document Chunker**: Breaks down large documents into smaller segments for better efficiency.

8. 👤 **User Input**: The starting point, where the user's query activates the RAG workflow.


# 🌐 **The RAG System and Its Subsystems**

1. 🗂️ **Index Subsystem**:
   - **Components**: Embedder, Vector Store, Document Loader, Document Chunker.
   - **Function**: Processes and organizes data into an accessible format.
   - **Role**: Creates a searchable database of vectorized information.

2. 🔎 **Retrieval Subsystem**:
   - **Components**: User Input, Prompt, Vector Store Retriever.
   - **Function**: Matches user queries with relevant data.
   - **Role**: Fetches the most pertinent information from the index based on user input.

3. 🤖 **Augment Subsystem**:
   - **Components**: LLM, User Input, Retrieved Data.
   - **Function**: Integrates user queries with retrieved data.
   - **Role**: Generates accurate and context-rich responses, blending human-like text generation with factually correct information.

Together, these subsystems form a seamless flow, transforming user queries into comprehensive and reliable responses.

# Load documents
There are SO MANY document loaders in LangChain.

I won't go through every single one in this notebook. But you can check out [the documentation](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/document_loaders) to see just how many are available to you.

## 📂 **Understanding Document Loaders in LangChain**

- 📚 LangChain document loaders load data from various sources into Document objects.

- 📄 A Document is text with metadata.

- 🌐 Loaders fetch data from text files, web pages, video transcripts, etc.

- 🔄 Main role: Retrieve data for further processing.

- 🛠️ Method: Call `load` to fetch the data and return it as a list of Document objects.

- 🧠 Some loaders support lazy loading (data loads into memory only when needed).

## 🔧 **How to Use Document Loaders**

1. 📥 Import the loader class from `langchain.document_loaders`.

2. 🏗️ Create an instance of your chosen class, passing the source it should read (for example a file path, a directory path, or a URL).

3. 🚀 Call `load()` to load the source's contents as Document objects.
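
To make the Document idea concrete, here is a minimal, library-free sketch. `SimpleDocument` and `InMemoryLoader` are hypothetical stand-ins invented for illustration, not LangChain classes; they only mirror the `page_content`/`metadata` shape and the difference between eager loading and lazy loading described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class InMemoryLoader:
    """Hypothetical loader that reads from a dict of {source: text}."""

    def __init__(self, sources: Dict[str, str]) -> None:
        self.sources = sources

    def lazy_load(self) -> Iterator[SimpleDocument]:
        # Yield one document at a time; nothing is built until it is requested.
        for source, text in self.sources.items():
            yield SimpleDocument(page_content=text, metadata={"source": source})

    def load(self) -> List[SimpleDocument]:
        # Eager variant: materialize every document into a list.
        return list(self.lazy_load())


loader = InMemoryLoader({"a.txt": "YOLO-NAS is an object detection model."})
docs = loader.load()
print(docs[0].metadata["source"], len(docs[0].page_content))
```

The real loaders do the fetching and parsing for you; the point here is only that each result pairs text with metadata such as its source.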


In [None]:
from langchain.document_loaders import WebBaseLoader

# Note: .load() returns a list of Document objects, so despite the "_loader"
# suffix these variables hold the loaded documents, not loader instances.
yolo_nas_loader = WebBaseLoader("https://deci.ai/blog/yolo-nas-object-detection-foundation-model/").load()

decicoder_loader = WebBaseLoader("https://deci.ai/blog/decicoder-efficient-and-accurate-code-generation-llmx").load()

yolo_newsletter_loader = WebBaseLoader("https://deeplearningdaily.substack.com/p/unleashing-the-power-of-yolo-nas").load()

# Chunk documents

🔢 **Exploring Text Splitters in LangChain**

- 📖 Text splitters divide long texts into smaller, meaningful parts.

- 🧩 Aim: Make large texts easier to handle for analysis or processing.

### How Text Splitters Work:

1. ✂️ Split text into small, meaningful chunks (like sentences).

2. 📏 Combine these chunks into a larger one until a certain size is reached.

3. 📌 Once the size is reached, start a new chunk with some overlap for context.
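
The three steps above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `chunk_sentences` and its `overlap_sentences` parameter are hypothetical names, the sentence splitter is a naive regex, and overlap is counted in sentences rather than characters.

```python
import re
from typing import List


def chunk_sentences(text: str, chunk_size: int, overlap_sentences: int = 1) -> List[str]:
    """Greedily pack sentences into chunks of roughly chunk_size characters.

    Each new chunk starts with the last `overlap_sentences` sentences of the
    previous chunk. A single long sentence, or the carried-over overlap, can
    push a chunk past chunk_size; this sketch does not split any further.
    """
    # Step 1: split into small units (a naive sentence splitter).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > chunk_size:
            # Step 3: close the chunk and carry its tail over as overlap.
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
        # Step 2: keep growing the current chunk.
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = chunk_sentences("One. Two. Three. Four.", chunk_size=10)
print(chunks)
```

Notice that the last sentence of each chunk reappears at the start of the next one; that shared sentence is the overlap.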

### Customization Axes:

1. 🛠️ How the text is split.

2. 📐 How chunk size is measured.

## Getting Started with Text Splitters

- 🚀 Default choice: `RecursiveCharacterTextSplitter`.

- 📋 Works by: Splitting text based on a list of characters.

- 🔄 If a chunk is still too large, it falls back to the next separator in the list.

- 📌 Default split characters: `["\n\n", "\n", " ", ""]`.
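
The separator fallback can be illustrated with a short recursive function. This is a rough sketch of the idea only: `recursive_split` is a hypothetical helper, and unlike the real splitter it does not merge small pieces back together or add overlap.

```python
from typing import List


def recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split text so that every piece is at most chunk_size characters.

    Tries the first separator; any piece that is still too long is split
    again with the remaining separators. An empty-string separator (or
    running out of separators) means "cut at fixed character positions".
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, remaining = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(separator):
        pieces.extend(recursive_split(part, remaining, chunk_size))
    return pieces


pieces = recursive_split("aaa bbb\n\nccc ddd eee", ["\n\n", "\n", " ", ""], chunk_size=7)
print(pieces)
```

The first paragraph fits after splitting on the blank line, so it stays whole; the second is too long and gets split again on spaces.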

### Additional Controls:

- 📏 `length_function`: Defines how chunk length is calculated (default: character count; a token counter is a common alternative).

- 🔍 `chunk_size`: Sets the maximum chunk size.

- 🔀 `chunk_overlap`: Determines overlap between chunks for continuity.

- 📊 `add_start_index`: Option to include each chunk's start position in the original document in metadata.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
)

yolo_nas_chunks = text_splitter.transform_documents(yolo_nas_loader)

decicoder_chunks = text_splitter.transform_documents(decicoder_loader)

yolo_newsletter_chunks = text_splitter.transform_documents(yolo_newsletter_loader)

# Index System

- 🎯 **Purpose:** Efficiently organize data for easy retrieval.

### Steps in the Index System:
1. 📚 **Load Documents (Document Loader):**
   - Import and read large amounts of data.
2. 🧩 **Chunk Documents (Document Chunker):**
   - Break down documents into smaller parts for better handling.
3. 🌐 **Embed Documents (Embedder):**
   - Convert text chunks into vector formats for searchability.
4. 💾 **Store Embeddings (Vector Store):**
   - Keep embeddings and their textual counterparts for retrieval.
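
Before using the real components, it can help to see steps 3 and 4 in miniature. The sketch below is an assumption-heavy toy: `embed` is a bag-of-words counter standing in for a neural embedding model, and `build_index` is a hypothetical helper that keeps each vector next to its text in a plain list rather than in a vector store.

```python
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, int]


def embed(text: str) -> Vector:
    """Toy embedder: a sparse bag-of-words count vector (not a neural embedding)."""
    return dict(Counter(text.lower().split()))


def build_index(chunks: List[str]) -> List[Tuple[Vector, str]]:
    """Embed every chunk and keep each vector next to its original text."""
    return [(embed(chunk), chunk) for chunk in chunks]


index = build_index([
    "YOLO-NAS is an object detection model",
    "DeciCoder is a code generation model",
])
print(len(index), index[0][0]["model"])
```

Storing the text alongside the vector matters because the retriever has to hand back readable passages, not just numbers.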

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain.vectorstores import FAISS
from langchain.storage import LocalFileStore

store = LocalFileStore("./cache/")

# create an embedder
core_embeddings_model = OpenAIEmbeddings()

embedder = CacheBackedEmbeddings.from_bytes_store(
    core_embeddings_model,
    store,
    namespace = core_embeddings_model.model
)

# store embeddings in vector store
vectorstore = FAISS.from_documents(yolo_nas_chunks, embedder)

vectorstore.add_documents(decicoder_chunks)

vectorstore.add_documents(yolo_newsletter_chunks)

# instantiate a retriever
retriever = vectorstore.as_retriever()

# 🔍 **Understanding the Retrieval System in Information Access**

- 🎯 **Purpose:** Fetch relevant information based on user queries.

### Steps in the Retrieval System:

1. 💬 **Obtain User Query (User Input):**
   - Capture the user's question or statement.

2. 🔄 **Embed User Query (Embedder):**
   - Convert the user's query into a vector format, aligning with indexed documents.

3. 🔍 **Vector Search (Vector Store Retriever):**
   - Search for document embeddings in the Vector Store that closely match the user query.

4. 📄 **Return Relevant Documents:**
   - Provide the top matching documents, ensuring pertinence to the query.
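
The retrieval steps can be sketched with a toy similarity search. Everything here is illustrative: `embed`, `cosine`, and `retrieve` are hypothetical helpers, the bag-of-words embedder is a stand-in for a real embedding model, and chunks are re-embedded on every call instead of being read from a vector store.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedder: word counts (a stand-in for a real embedding model)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Embed the query, score every chunk, and return the k best matches."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


corpus = [
    "decicoder is a code generation model",
    "yolo-nas is an object detection model",
    "the weather is sunny",
]
top = retrieve("what is decicoder", corpus, k=1)
print(top[0][1])
```

The query and the documents must go through the same embedder; otherwise their vectors would not be comparable.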

In [None]:
from langchain.llms.openai import OpenAIChat
from langchain.chains import RetrievalQA
from langchain.callbacks import StdOutCallbackHandler

In [None]:
llm = OpenAIChat()
handler = StdOutCallbackHandler()



In [None]:
# this is the entire retrieval system
qa_with_sources_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    callbacks=[handler],
    return_source_documents=True,
    verbose=True
)

# Augment System

- 🎯 **Purpose:** Improve the LLM input prompt with relevant context for better responses.

### Steps in the Augment System:
1. 📝 **Create Initial Prompt (Prompt):**
   - Begin with the user's original query or statement.
2. ✨ **Augment Prompt with Retrieved Context:**
   - Combine the initial prompt with context from the Vector Store for a richer input.
3. 📤 **Send Augmented Prompt to LLM:**
   - Forward the enhanced prompt to the LLM for processing.
4. 📥 **Receive LLM's Response:**
   - Get the comprehensive response generated by the LLM.
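
Step 2 is essentially string formatting. The sketch below shows one way it could look; `PROMPT_TEMPLATE` and `build_augmented_prompt` are hypothetical, and the template wording is an illustrative assumption, not the prompt that `RetrievalQA` actually sends.

```python
from typing import List

# Illustrative template only; the wording is an assumption for this sketch.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_augmented_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Combine the user's question with the retrieved passages into one prompt."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_augmented_prompt(
    "What is DeciCoder?",
    ["DeciCoder is a code generation model.", "It was built with AutoNAC."],
)
print(prompt)
```

The resulting string is what would be sent to the LLM in step 3, so the model answers from the supplied context rather than from memory alone.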

In [None]:
# This is the entire augment system!
response = qa_with_sources_chain({"query":"What does Neural Architecture Search have to do with how Deci creates its models?"})



> Entering new RetrievalQA chain...

> Finished chain.


Look at the entire response:

In [None]:
response

{'query': 'What does Neural Architecture Search have to do with how Deci creates its models?',
 'result': 'Deci uses Neural Architecture Search (NAS) technology, specifically AutoNAC, to create efficient and effective neural network architectures for its models. AutoNAC intelligently searches a large space of possible architectures and zeroes in on the most promising ones. This technology allows Deci to automate the development of superior neural networks and optimize the accuracy and speed of its models.',
 'source_documents': [Document(page_content='Neural Architecture Search is define the architecture search space. For YOLO-NAS, our researchers took inspiration from the basic blocks of YOLOv6 and YOLOv8. With the architecture and training regime in place, our researchers harnessed the power of AutoNAC. It intelligently searched a vast space of ~10^14 possible architectures, ultimately zeroing in on three final networks that promised outstanding results. The result is a family of arc

If you want just the response:

In [None]:
print(response['result'])

Deci utilizes Neural Architecture Search (NAS) technology, specifically their proprietary AutoNAC technology, to automatically generate and optimize the architecture of their models. Neural Architecture Search helps Deci in efficiently constructing deep learning models for various tasks and hardware.


And you can get the source like so:

In [None]:
print(response['source_documents'])

[Document(page_content='Neural Architecture Search is define the architecture search space. For YOLO-NAS, our researchers took inspiration from the basic blocks of YOLOv6 and YOLOv8. With the architecture and training regime in place, our researchers harnessed the power of AutoNAC. It intelligently searched a vast space of ~10^14 possible architectures, ultimately zeroing in on three final networks that promised outstanding results. The result is a family of architectures with a novel quantization-friendly basic', metadata={'source': 'https://deeplearningdaily.substack.com/p/unleashing-the-power-of-yolo-nas', 'title': 'Unleashing the Power of YOLO-NAS: A New Era in Object Detection and Computer Vision', 'description': 'The Future of Computer Vision is Here', 'language': 'en'}), Document(page_content='Deci’s suite of Large Language Models and text-to-Image models, with DeciCoder leading the charge, is spearheading the movement to address this gap.DeciCoder’s efficiency is evident when c

In [None]:
response = qa_with_sources_chain({"query":"What is DeciCoder"})



> Entering new RetrievalQA chain...

> Finished chain.


In [None]:
print(response['result'])

DeciCoder is a 1B-parameter open-source Large Language Model (LLM) for code generation. It has a 2048-context window, permissively licensed, delivers a 3.5x increase in throughput, improved accuracy on the HumanEval benchmark, and smaller memory usage compared to widely-used code generation LLMs such as SantaCoder.


In [None]:
response = qa_with_sources_chain({"query":"Write a blog about Deci and how it used NAS to generate YOLO-NAS and DeciCoder"})



> Entering new RetrievalQA chain...

> Finished chain.


In [None]:
print(response['result'])

Deci, a company focused on pushing the boundaries of accuracy and efficiency, has introduced a new architecture called YOLO-NAS. YOLO-NAS is a benchmark for object detection that has the potential to drive innovation and unlock new possibilities across various industries and research domains.

Deci has showcased its robust capabilities with the DeciCoder model, which consistently outperforms models like SantaCoder. By leveraging AutoNAC, Deci was able to generate an architecture that is both efficient and powerful.

Deci's use of NAS (Neural Architecture Search) played a pivotal role in the development of YOLO-NAS. NAS is a technique that automates the design process of neural networks, allowing for the discovery of optimized architectures. By deploying NAS, Deci was able to achieve state-of-the-art performance on object detection with YOLO-NAS.

The integration of NAS in the development of YOLO-NAS and DeciCoder showcases Deci's commitment to pushing the boundaries of AI innovation. W