Hybrid RAG Explanation:
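Retrieval-augmented generation (RAG) refers to a system in which a retrieval mechanism supplies a generative model (such as an LLM) with additional context, so that the generated responses are informed by the retrieved data. The "hybrid" qualifier more commonly describes retrieval that combines several methods, such as keyword and vector search; here it is used loosely for a retrieval step paired with an LLM.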
How This Script Fits:
Retrieval: The script includes a simulated retrieval step that returns the same predefined documents regardless of the user's input. This static, hardcoded step serves as a placeholder for a dynamic retrieval system.
Augmentation: The retrieved documents are combined with the user query to provide a richer context. This combined context is then fed into the LLM for response generation.
Generation: The script sends the augmented input to Amazon's Titan Text Lite model (amazon.titan-text-lite-v1) through the Bedrock runtime service, which generates the response.
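
The three steps above can be sketched offline as a small pipeline. This is a minimal sketch rather than the script's own code: `build_prompt`, `answer`, `static_retrieve`, and `echo_generator` are hypothetical names introduced here, and the generator is passed in as a plain callable so the flow can be exercised without calling Bedrock.

```python
from typing import Callable, List


def build_prompt(query: str, docs: List[str]) -> str:
    """Augmentation: merge the query and the retrieved documents into one prompt."""
    context = "\n".join(docs)
    return f"User query: {query}\n\nContext:\n{context}"


def answer(query: str,
           retrieve: Callable[[str], List[str]],
           generate: Callable[[str], str]) -> str:
    """Run retrieval, augmentation, and generation in order."""
    docs = retrieve(query)               # 1. retrieval
    prompt = build_prompt(query, docs)   # 2. augmentation
    return generate(prompt)              # 3. generation


def static_retrieve(query: str) -> List[str]:
    """Hypothetical stand-in retriever that ignores the query."""
    return ["New York is famous for its pizza."]


def echo_generator(prompt: str) -> str:
    """Hypothetical stand-in for the LLM call; reports the prompt size only."""
    return f"[stub response for a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    print(answer("Best pizza in New York", static_retrieve, echo_generator))
```

Keeping the retriever and generator as parameters means either one can be swapped (for example, for a real Bedrock call) without touching the prompt-building step.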
Why It’s a Hybrid RAG Demo:
Integration of LLM: The script integrates an LLM (Titan in this case) to generate responses, leveraging the context provided by the retrieved documents.
Demonstration of Concepts: The script demonstrates the basic concepts of RAG by showing how retrieval and generation can be combined to create more informed responses. However, it lacks a fully dynamic and scalable retrieval system and doesn't include more advanced features like fine-tuning or learning from the combined data.
Limitations:
Static Retrieval: The retrieval function ignores the query and is not connected to a real knowledge base, so every question receives the same pizza context whether or not it is relevant. It's a simplified version meant for demonstration purposes.
No Learning Component: The script does not include any training or learning process that would optimize the integration between retrieval and generation, which is often a part of more sophisticated RAG systems.
Conclusion:
This script can be seen as a hybrid RAG demo with LLM integration, where the key concepts of retrieval and generation are illustrated. It serves as a basic example of how RAG could work but is not a full-fledged implementation. If you were to replace the static retrieval with a dynamic system and potentially add some learning or fine-tuning components, it could become a more complete RAG implementation.
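
As one illustration of what a less static retrieval step might look like, the sketch below ranks documents by how many words they share with the query. It is a deliberately naive keyword-overlap scorer, not a production retriever and not part of the original script: the `CORPUS` list, the `tokenize` helper, and the `top_k` parameter are assumptions made for this example.

```python
import re
from typing import List, Set

# Hypothetical in-memory corpus; a real system would query a search index or database.
CORPUS = [
    "New York is famous for its pizza. Some popular spots include Joe's Pizza, Lombardi's, and Di Fara Pizza.",
    "New York pizza is known for its thin crust and unique flavor, often attributed to the city's water.",
    "Central Park is a large public park in Manhattan.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve_documents(query: str, corpus: List[str] = CORPUS, top_k: int = 2) -> List[str]:
    """Return up to top_k documents that share at least one word with the query."""
    query_terms = tokenize(query)
    scored = []
    for doc in corpus:
        overlap = len(query_terms & tokenize(doc))
        if overlap > 0:
            scored.append((overlap, doc))
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

With this version, a query such as "ice cream shop" shares no words with the corpus and yields an empty list, which the caller could use to skip the context entirely. Common words like "is" are not filtered out, so a real system would likely want stop-word handling or embedding-based search instead.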

In [1]:
import boto3
import json

# Initialize the Bedrock client
client = boto3.client('bedrock-runtime', region_name='us-east-1')

def retrieve_documents(query):
    """
    Simulates document retrieval from a knowledge base.
    The query is currently ignored and the same documents are always returned.
    This can be replaced with actual document retrieval logic (e.g., from an S3 bucket or database).
    """
    # Placeholder retrieval function
    # In a real implementation, this would query a database, search engine, or an API to get relevant documents
    retrieved_docs = [
        "New York is famous for its pizza. Some popular spots include Joe's Pizza, Lombardi's, and Di Fara Pizza.",
        "New York pizza is known for its thin crust and unique flavor, often attributed to the city's water."
    ]
    return retrieved_docs

def invoke_model(prompt_text):
    """
    Invokes the foundation model using the Bedrock runtime service.
    Expects prompt_text to already contain the user query combined with any retrieved context.
    """
    try:
        response = client.invoke_model(
            modelId='amazon.titan-text-lite-v1',
            contentType='application/json',
            accept='application/json',
            body=json.dumps({
                'inputText': prompt_text
            })
        )
        
        result = json.loads(response['body'].read().decode('utf-8'))
        return result
    except Exception as e:
        print(f"Error invoking model: {e}")
        return None

def chatbot():
    """
    A simple chatbot loop that interacts with the user using a foundation model and RAG.
    """
    print("Chatbot is now running. Type 'exit' to end the chat.")
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'exit':
            print("Chatbot session ended.")
            break
        
        # Simulate document retrieval based on the user input
        retrieved_docs = retrieve_documents(user_input)
        context = "\n".join(retrieved_docs)  # Combine retrieved documents into a single context string
        
        # Combine user input with the context
        combined_input = f"User query: {user_input}\n\nContext:\n{context}"
        
        # Invoke the foundation model with the combined input
        result = invoke_model(combined_input)
        
        if result:
            if 'results' in result and result['results']:
                generated_text = result['results'][0]['outputText']
                print(f"Chatbot: {generated_text}")
            else:
                print("Chatbot: Sorry, I couldn't generate a response.")
        else:
            print("Chatbot: Sorry, there was an issue processing your request.")

# Start the chatbot
chatbot()


Chatbot is now running. Type 'exit' to end the chat.
You: Best Pizza in New York
Chatbot: 
Joe's Pizza is a famous New York pizza restaurant that has been around since 1960. It is known for its thin crust and unique flavor.
Lombardi's is another famous New York pizza restaurant that has been around since 1905. It is known for its deep dish pizza and unique flavor.
Di Fara Pizza is a famous New York pizza restaurant that has been around since 1964. It is known for its thin crust and unique flavor.
Joe's Pizza, Lombardi's, and Di Fara Pizza are all famous New York pizza restaurants known for their thin
You: How is New York as vacation
Chatbot: 
Sorry, this model is unable to provide opinions on cities. However, it can provide information on cities. New York City is a popular vacation destination known for its iconic landmarks, world-class museums, and diverse culture. 
You: give a ice cream shop recommendation
Chatbot: 
The model cannot find sufficient information to answer the question.
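
The first response in the session above stops mid-sentence ("known for their thin"), which suggests the output may have hit a length limit. The following is a hedged sketch of how the request body and the response check might be adjusted. It assumes the Titan Text request schema accepts a `textGenerationConfig` object and that each result carries a `completionReason` field; the helper names and the values 512 and 0.5 are illustrative choices, not settings taken from the script.

```python
import json


def build_titan_body(prompt_text: str, max_tokens: int = 512, temperature: float = 0.5) -> str:
    """Build a JSON request body with explicit generation settings (assumed schema)."""
    return json.dumps({
        "inputText": prompt_text,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,   # upper bound on generated tokens
            "temperature": temperature,    # illustrative value
        },
    })


def was_truncated(result: dict) -> bool:
    """Return True if the first result reports that it stopped at the length limit."""
    results = result.get("results") or []
    return bool(results) and results[0].get("completionReason") == "LENGTH"
```

The string returned by `build_titan_body` could be passed as the `body` argument of `client.invoke_model`, and `was_truncated` could be called on the parsed response to warn the user when an answer is incomplete.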