# Simple RAG Application with Llama-Index and Replicate

In this notebook, we'll create a simple RAG (Retrieval-Augmented Generation) application using `llama-index` for document retrieval and `replicate` for access to a hosted, pre-trained language model.

## An Overview of Retrieval Augmented Generation (RAG)

## Introduction
  
Natural language processing (NLP) and artificial intelligence (AI) continue to evolve, with innovations aimed at making content generation more efficient, personalized, and accurate. Among these advancements, Retrieval Augmented Generation (RAG) has emerged as a key technique that combines the strengths of information retrieval and text generation. This overview covers RAG's fundamentals, applications, benefits, and open challenges.

## What is Retrieval Augmented Generation (RAG)?  
  
Retrieval Augmented Generation is an NLP approach that combines two components:  
  
### 1. Information Retrieval (IR)  
* This component involves searching and retrieving relevant information from a vast database or knowledge repository based on a query or prompt.  
* The goal is to fetch the most pertinent and up-to-date information related to the input.  
  
### 2. Text Generation
* Once relevant information is retrieved, the text generation component crafts coherent, contextual, and often personalized text based on the fetched data.  
* This can range from short responses to lengthy documents, adapting to the specified requirements.  
  
## How Does RAG Work?  
  
The operational workflow of RAG can be outlined in the following steps:  
  
* **Input/Prompt:** A user provides a query or topic of interest.  
* **Retrieval:** The system searches through its indexed database to retrieve a set of relevant documents or passages.  
* **Ranking (Optional):** Retrieved documents may undergo ranking to prioritize the most relevant ones, ensuring the generation phase works with the best possible inputs.  
* **Generation:** Using the retrieved (and possibly ranked) information, the text generation model crafts a response.  
	+ This can involve summarizing the content, expanding on it, or transforming it to fit a specific format (e.g., turning a piece of text into a question).  
* **Post-processing (Optional):** The generated text might undergo editing for grammar, coherence, or to better align with the original query's intent.  
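
The steps above can be sketched in a few lines of plain Python. This is a toy illustration, not the pipeline built later in this notebook: retrieval uses simple word overlap instead of embeddings, `generate` is a placeholder standing in for a real language model, and every function name here is hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, corpus):
    """Retrieval: keep passages that share at least one word with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(passage)), passage) for passage in corpus]
    return [(score, passage) for score, passage in scored if score > 0]


def rank(scored_passages, top_k=2):
    """Ranking: order candidates by overlap score and keep the best top_k."""
    ordered = sorted(scored_passages, key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in ordered[:top_k]]


def generate(query, passages):
    """Generation: placeholder for a call to a real language model."""
    context = " ".join(passages)
    return f"  [answer to '{query}' grounded in: {context}]  "


def postprocess(text):
    """Post-processing: here, just trim surrounding whitespace."""
    return text.strip()


toy_corpus = [
    "Python is a programming language.",
    "The Pacific Ocean is the largest ocean.",
    "Machine learning automates model building.",
]
toy_query = "What is the Python programming language?"

top_passages = rank(retrieve(toy_query, toy_corpus), top_k=1)
toy_answer = postprocess(generate(toy_query, top_passages))
print(toy_answer)
```

Word overlap is a deliberately crude relevance signal (common words such as "is" and "the" also count as matches); embedding-based retrieval, used later in this notebook, is the more typical choice.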
  
## Applications of Retrieval Augmented Generation
  
RAG's versatility and efficiency make it a valuable tool across various sectors and use cases:  
  
* **Customer Service Chatbots:** Enhance response accuracy and personalization by retrieving customer history and preferences.  
* **Content Creation Tools:** Assist writers with research and drafts, potentially automating routine content types (e.g., news summaries, product descriptions).  
* **Educational Platforms:** Provide students with detailed, up-to-date study materials by retrieving and generating content around specific topics or questions.  
* **Knowledge Base Construction:** Automate the updating and expansion of internal knowledge bases for companies, ensuring information is current and accessible.  
  
## Benefits of Using RAG  
  
* **Efficiency:** Streamlines content creation by automating research and initial drafting.  
* **Accuracy:** Grounds responses in retrieved information, which can be more relevant and current than what the model learned during training.  
* **Personalization:** Can tailor outputs to individual preferences or context.  
* **Scalability:** Handles a high volume of queries without a proportional increase in manual labor.  
  
## Challenges and Future Directions  
  
While RAG holds tremendous potential, its development and implementation are not without challenges:  
  
* **Data Quality and Availability:** The efficacy of RAG heavily depends on the quality and comprehensiveness of its database.  
* **Contextual Understanding:** Models can still struggle to interpret nuanced queries and to generate contextually appropriate responses.  
* **Ethical Considerations:** Addressing issues related to information source credibility, privacy, and potential biases in generated content.  
  
## Conclusion
  
Retrieval Augmented Generation is among the most widely adopted recent NLP techniques. By combining the strengths of information retrieval and text generation, RAG can improve efficiency, accuracy, and personalization. As the technology evolves, addressing the challenges above will be key to realizing its potential across industries.  
  

## Setup

First, we need to install the necessary packages. We'll be using `llama-index` for document indexing and retrieval, and `replicate` for interacting with a language model. Make sure to install these packages if you haven't already.

We'll also be using:
- `llama-index-readers-string-iterable` for loading strings as 'documents'
- `llama-index-embeddings-huggingface` for our embedding model.

In [None]:
!pip install llama-index replicate llama-index-readers-string-iterable llama-index-embeddings-huggingface

## Step 1: Prepare the Documents

Let's assume we have a small set of documents, represented as a list of strings. These documents could come from any source, such as text files, APIs, or databases. For simplicity, we'll create a few sample documents here.

In [6]:
doc_strings = [
    "Python is a powerful programming language known for its simplicity and versatility.",
    "Machine learning is a method of data analysis that automates analytical model building.",
    "The Pacific Ocean is the largest and deepest of the world's ocean basins.",
    "Climate change refers to long-term shifts in temperatures and weather patterns.",
]

from llama_index.readers.string_iterable import StringIterableReader

# Initialize StringIterableReader
reader = StringIterableReader()

# Load data from an iterable of strings
documents = reader.load_data(
    texts=doc_strings
)
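
As one example of another source, the sketch below reads every `.txt` file in a directory into a list of strings. The helper name `load_text_files`, the file names, and the temporary directory are all assumptions made for illustration; they are not part of the original notebook.

```python
import tempfile
from pathlib import Path


def load_text_files(directory, pattern="*.txt"):
    """Return the stripped contents of every matching file, sorted by filename."""
    paths = sorted(Path(directory).glob(pattern))
    return [path.read_text(encoding="utf-8").strip() for path in paths]


# Demonstrate with a throwaway directory holding two small files
with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "a_python.txt").write_text(
        "Python is a programming language.\n", encoding="utf-8"
    )
    Path(tmp_dir, "b_ocean.txt").write_text(
        "The Pacific Ocean is the largest ocean.\n", encoding="utf-8"
    )
    file_strings = load_text_files(tmp_dir)

print(file_strings)
```

A list built this way could be passed as `texts=` to `reader.load_data` in the same way as `doc_strings`.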


## Step 2: Index the Documents with Llama-Index

We'll use `llama-index` to create an index from our documents. This will allow us to efficiently retrieve relevant documents based on a query.

In [10]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

In [12]:
from llama_index.core import VectorStoreIndex

# Create an index from the documents
index = VectorStoreIndex.from_documents(documents)
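
To build intuition for what a vector index does at query time, here is a minimal sketch of similarity search over embeddings. It is not how `VectorStoreIndex` is implemented; the three-dimensional vectors are made up for illustration (real embedding models produce much longer vectors), and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_indices(query_vector, doc_vectors, k=2):
    """Return the positions of the k document vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), position)
        for position, vector in enumerate(doc_vectors)
    ]
    scored.sort(reverse=True)
    return [position for _, position in scored[:k]]


# Made-up embeddings: the first axis loosely stands for "computing",
# the last axis for "earth science"
toy_doc_vectors = [
    [0.9, 0.1, 0.0],  # a programming document
    [0.7, 0.3, 0.1],  # a machine learning document
    [0.0, 0.1, 0.9],  # an ocean document
    [0.1, 0.2, 0.8],  # a climate document
]
toy_query_vector = [1.0, 0.2, 0.0]  # a query about programming

nearest = top_k_indices(toy_query_vector, toy_doc_vectors, k=2)
print(nearest)
```

The two documents whose vectors point in roughly the same direction as the query are returned first, which is the same idea the index applies to real embeddings.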

## Step 3: Query the Index

Now, we can use the index to retrieve relevant documents for a given query. Let's try it with a question about programming.

In [24]:
from llama_index.core.llms import MockLLM

query = "Tell me about programming languages."

# MockLLM is a placeholder so that the query engine does not call a real LLM;
# only the retrieval results are used in this step
llm = MockLLM()

# Run the query and collect the text of each retrieved source node
query_engine = index.as_query_engine(llm=llm)
query_response = query_engine.query(query)
retrieved_documents = [node.text for node in query_response.source_nodes]
retrieved_documents

['Python is a powerful programming language known for its simplicity and versatility.',
 'Machine learning is a method of data analysis that automates analytical model building.']

## Step 4: Use Granite hosted by Replicate to Generate a Response

With the retrieved documents, we can now generate a response using a language model via `replicate`. Make sure to set up your API key for Replicate.

In [36]:
import replicate
from google.colab import userdata

# Read the Replicate API token from Colab's secret storage
# (assumes a secret named REPLICATE_API_TOKEN has already been saved there)
api_token = userdata.get("REPLICATE_API_TOKEN")
client = replicate.Client(api_token=api_token)

# Concatenate retrieved documents as context
context = " ".join(retrieved_documents)

# Generate a response using a language model
response = client.run(
    "ibm-granite/granite-3.0-8b-instruct",
    input={
        "prompt": f"{context} {query} base your response on the information provided",
        "temperature": 0.7,
    },
)

# Join the returned pieces of text into a single string
print("".join(response))

Python is indeed a popular programming language, especially in the field of machine learning due to its simplicity and extensive libraries such as TensorFlow and Scikit-learn. These libraries simplify the process of creating and training machine learning models.

Programming languages are tools that allow us to communicate instructions to a computer. They are used to create software, applications, and websites. Some other popular programming languages include Java, C++, JavaScript, and Ruby. Each language has its own strengths and is suited to different types of projects.

For instance, Java is known for its robustness and is often used for large-scale enterprise applications. C++ is a powerful language often used for system software, game development, and applications that require high performance. JavaScript is primarily used for web development, both on the client-side and server-side. Ruby is known for its simplicity and is often used for web development and automation.

In the con
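
The prompt above simply concatenates the context, the query, and the instruction into one line. A more structured prompt that separates these parts may make it easier for the model to tell them apart. The sketch below shows one possible layout; the helper name `build_prompt` and the exact wording of the template are assumptions, not something prescribed by Replicate or Granite.

```python
def build_prompt(query, passages):
    """Assemble a prompt with clearly separated context, question, and instruction."""
    context_lines = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Context:\n"
        f"{context_lines}\n\n"
        f"Question: {query}\n\n"
        "Answer using only the context above."
    )


example_passages = [
    "Python is a powerful programming language known for its simplicity and versatility.",
    "Machine learning is a method of data analysis that automates analytical model building.",
]
example_prompt = build_prompt("Tell me about programming languages.", example_passages)
print(example_prompt)
```

The resulting string could be passed as the `"prompt"` value in the `input` dictionary used earlier. Whether this layout actually improves grounding for a given model would need to be checked empirically.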

## Conclusion

In this notebook, we demonstrated how to build a simple RAG application using `llama-index` for document retrieval and `replicate` for generating responses via a language model. This approach allows us to combine the strengths of information retrieval and generation, providing informative and contextually relevant answers.

In future notebooks, we'll cover the topics presented here in much greater detail.