# Open Source Real-Time RAG Demo with LangChain, Milvus, Ollama, Quix Streams and Apache Kafka

![Streaming RAG Demo](Streaming_RAG_Demo_LangChain.png)

**Everything runs in Docker, orchestrated with Docker Compose.**


This notebook demonstrates how to build a Retrieval Augmented Generation (RAG) system that can:
1. Answer questions using a vector database ([Milvus](https://github.com/milvus-io/milvus)).
2. Ingest streaming data that carries up-to-date context using [Quix Streams](https://github.com/quixio/quix-streams).
3. Update its knowledge base in real time.

We'll use:
- **LangChain**: For orchestrating the RAG pipeline.
- **Milvus**: As our vector database.
- **Ollama**: For running the LLM locally (`mistral` model).
- **Quix Streams**: For creating the streaming data applications.
- **Apache Kafka**: As the streaming data broker.
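Because each service runs in its own container, a quick reachability check can save debugging time before running the notebook. The sketch below only opens TCP connections with the standard library. The Kafka and Milvus ports come from this notebook's configuration; port 11434 for Ollama is an assumption based on Ollama's default and may differ in your Compose file.

```python
import socket

# Kafka and Milvus ports match the notebook's config; 11434 (Ollama's default
# API port) is an assumption about how your Compose file exposes Ollama.
SERVICES = {
    "kafka": ("localhost", 9092),
    "milvus": ("localhost", 19530),
    "ollama": ("localhost", 11434),
}


def check_services(services=SERVICES, timeout=1.0):
    """Return {name: bool} depending on whether a TCP connection succeeds."""
    status = {}
    for name, (host, port) in services.items():
        try:
            # create_connection raises OSError on refusal, timeout or DNS errors
            with socket.create_connection((host, port), timeout=timeout):
                status[name] = True
        except OSError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:>7}: {'reachable' if ok else 'NOT reachable'}")
```

A successful TCP connection only shows that something is listening on the port, not that the service is fully initialized.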

## Setup and imports

First, let's import all necessary libraries:

```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_milvus import Milvus
from langchain_ollama.llms import OllamaLLM
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
import json
import sys
import time
```
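These integration packages change between releases (for example, older Quix Streams releases expected `app.run(sdf)` rather than `app.run()`), so it can help to record which versions are installed. A minimal sketch using only `importlib.metadata`; the PyPI distribution names listed are assumptions about how the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the libraries used in this notebook.
PACKAGES = [
    "langchain-core",
    "langchain-huggingface",
    "langchain-milvus",
    "langchain-ollama",
    "quixstreams",
    "confluent-kafka",
]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```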

## Initialize RAG components

Now we'll set up our RAG system with:
1. Embeddings model for converting text to vectors.
2. LLM for generating responses.
3. Vector store for storing and retrieving documents.
4. RAG prompt template.
5. The complete RAG chain.
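In the chain (item 5), the retriever's output is passed straight into `{context}`, so the prompt receives the Python representation of a list of `Document` objects, metadata included. A common LangChain pattern is to pipe the retriever into a small function that joins the documents' text. The sketch below is duck-typed on `page_content`; `FakeDocument` is a hypothetical stand-in for `langchain_core.documents.Document` so it runs without LangChain installed.

```python
from dataclasses import dataclass, field


def format_docs(docs):
    """Join retrieved documents' text into one context string.

    Works with any object exposing `page_content`, such as LangChain's Document.
    """
    return "\n\n".join(doc.page_content for doc in docs)


# Hypothetical stand-in for langchain_core.documents.Document, for illustration only.
@dataclass
class FakeDocument:
    page_content: str
    metadata: dict = field(default_factory=dict)


docs = [
    FakeDocument("Quantum computing promises to transform cryptography and drug discovery"),
    FakeDocument("Climate change poses significant challenges to global ecosystems and human societies"),
]
print(format_docs(docs))
```

Wired into the notebook, the context entry would become `vector_store.as_retriever() | format_docs`; LangChain coerces a plain function into a runnable when it is piped after another runnable.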

```python
def setup_rag_components():
    # Initialize RAG components
    embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
    llm = OllamaLLM(model="mistral")

    vector_store = Milvus.from_texts(
        texts=["Initial empty document"],
        embedding=embeddings,
        connection_args={"host": "localhost", "port": "19530"},
        collection_name="streaming_rag_demo",
        drop_old=True
    )

    # Create RAG prompt
    template = """
        Answer the question based only on the following context: {context}
        Question: {question}
        Answer:
    """

    prompt = ChatPromptTemplate.from_template(template)

    rag_chain = (
        {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return vector_store, rag_chain

# Initialize our components
vector_store, rag_chain = setup_rag_components()
```



## Test initial RAG system

Let's test our RAG system before adding any real data. Since the vector store holds only the placeholder text `Initial empty document`, the model should respond that it has no relevant information.

```python
print("Initial Query (before integrating streaming data):")
question = "What do you know about artificial intelligence developments?"

print(f"Question: {question}")
print(f"Answer: {rag_chain.invoke(question)}\n")
```

```text
Initial Query (before integrating streaming data):
Question: What do you know about artificial intelligence developments?
Answer:  Based on the provided context, I don't have any specific information about the current developments in artificial intelligence. The context provided is an empty document without any relevant content regarding AI or its advancements.
```
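Why does the chain still receive a context at all? A similarity retriever returns the nearest stored documents however weak the match is; LangChain's `as_retriever()` defaults to plain similarity search with no score threshold, so here it hands back the placeholder. The toy sketch below illustrates that ranking step. The bag-of-words `embed()` is a hypothetical stand-in for the BGE model and cosine similarity is a stand-in for the vector search; this is not how Milvus computes scores internally.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (hypothetical stand-in for BGE)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, texts, k=4):
    """Rank stored texts by similarity and return the k best, however low they score."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(t)), t) for t in texts), reverse=True)
    return scored[:k]


store = ["Initial empty document"]
print(top_k("What do you know about artificial intelligence developments?", store))
# [(0.0, 'Initial empty document')]
```

Even with a score of zero, the placeholder is the best (and only) match, so it ends up in the prompt and the LLM has to recognize that it is irrelevant.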



## Kafka topic cleanup

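To start from a clean state, we'll delete the `messages` topic before sending sample messages. The producer in the next step recreates it automatically, because its `Application` is created with `auto_create_topics=True`.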

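```python
from confluent_kafka import KafkaException
from confluent_kafka.admin import AdminClient

config = {
    "bootstrap.servers": "localhost:9092",
}

admin_client = AdminClient(config)

# delete_topics() is asynchronous and returns {topic: Future}; wait on each
# future so the deletion finishes before the producer recreates the topic.
futures = admin_client.delete_topics(topics=["messages"])
for topic, future in futures.items():
    try:
        future.result()
        print(f"Deleted topic: {topic}")
    except KafkaException as e:
        # On a first run the topic may not exist yet, which is fine here.
        print(f"Could not delete topic {topic}: {e}")
```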

## Kafka producer setup

Now let's create a producer that will send some sample messages to Kafka. These messages will contain information that our RAG system can learn from.
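The producer sends each message's `chat_id` as the Kafka key and the whole dict as a JSON-encoded value, both as bytes. As far as we know, topics created with Quix Streams' `app.topic()` default to a JSON value deserializer, which is why the consumer step can read `row["text"]` directly. A stdlib-only sketch of that round trip:

```python
import json


def serialize_message(message):
    """Encode a message like the producer cell does: key and value as UTF-8 bytes."""
    key = message["chat_id"].encode()
    value = json.dumps(message).encode()
    return key, value


def deserialize_value(value):
    """Decode a JSON-encoded value back into a dict."""
    return json.loads(value.decode())


msg = {"chat_id": "id1", "text": "Quantum computing promises to transform cryptography and drug discovery"}
key, value = serialize_message(msg)
print(key, deserialize_value(value) == msg)  # b'id1' True
```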

```python
from quixstreams import Application

def get_sample_messages():
    return [
        {"chat_id": "id1", "text": "The latest developments in artificial intelligence have revolutionized how we approach problem solving"},
        {"chat_id": "id2", "text": "Climate change poses significant challenges to global ecosystems and human societies"},
        {"chat_id": "id3", "text": "Quantum computing promises to transform cryptography and drug discovery"},
        {"chat_id": "id4", "text": "Sustainable energy solutions are crucial for addressing environmental concerns"}
    ]

app = Application(
    broker_address="localhost:9092",
    auto_create_topics=True
)

# Get producer with automatic resource cleanup
with app.get_producer() as producer:
    messages = get_sample_messages()
    print("\nSending messages to Kafka...")

    for message in messages:
        print(f'Sending: "{message["text"]}"')
        producer.produce(
            topic="messages",
            key=message["chat_id"].encode(),
            value=json.dumps(message).encode(),
        )

    print("\nAll messages sent!")
```

```text
[2025-02-25 16:30:46,375] [INFO] [quixstreams] : Topics required for this application: 
[2025-02-25 16:30:46,376] [INFO] [quixstreams] : Validating Kafka topics exist and are configured correctly...
[2025-02-25 16:30:46,382] [INFO] [quixstreams] : Kafka topics validation complete

Sending messages to Kafka...
Sending: "The latest developments in artificial intelligence have revolutionized how we approach problem solving"
Sending: "Climate change poses significant challenges to global ecosystems and human societies"
Sending: "Quantum computing promises to transform cryptography and drug discovery"
Sending: "Sustainable energy solutions are crucial for addressing environmental concerns"

All messages sent!
```


## Process streaming data

Now we'll consume the messages from Kafka using Quix Streams and add them to our vector store. This simulates how our RAG system can learn from real-time data.
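The application log in this step reports `processing_guarantee="at-least-once"`: after a crash or restart, messages processed since the last commit can be delivered again, and `process_value` would then embed them a second time, producing duplicate vectors. One possible mitigation, sketched here, is to skip keys that were already indexed. The `seen` set is in-memory, so it resets on restart; Quix Streams' stateful processing could be a persistent alternative, but that is not shown here. The list-backed store is a hypothetical stand-in for Milvus.

```python
def make_deduplicating_processor(add_texts, seen=None):
    """Wrap a 'store these texts' callable so each chat_id is indexed only once.

    `add_texts` is any callable taking a list of strings (e.g. vector_store.add_texts).
    `seen` is an in-memory set, so deduplication resets when the process restarts.
    """
    seen = set() if seen is None else seen

    def process_value(row):
        chat_id = row["chat_id"]
        if chat_id in seen:
            print(f"Skipping duplicate message {chat_id}")
            return row
        add_texts([row["text"]])
        seen.add(chat_id)
        return row

    return process_value


# Hypothetical list-backed store standing in for the Milvus vector store.
indexed = []
process = make_deduplicating_processor(indexed.extend)
for row in [{"chat_id": "id1", "text": "AI news"}, {"chat_id": "id1", "text": "AI news"}]:
    process(row)
print(indexed)  # ['AI news']
```

In the notebook this would be used as `sdf = sdf.apply(make_deduplicating_processor(vector_store.add_texts))`.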

```python
from quixstreams import Application

def process_value(row):
    text = row["text"]
    print(f"\nReceived message: {text}")
    # Embed the text and add it to the vector store
    vector_store.add_texts([text])

    return row

app = Application(
    broker_address="localhost:9092",
    consumer_group="rag-consumer",
    auto_offset_reset="earliest"
)

input_topic = app.topic(name="messages")

# Create a StreamingDataFrame that processes every new message in the topic
sdf = app.dataframe(topic=input_topic)

sdf = sdf.apply(process_value)

# NOTE: Streaming applications run in a continuous loop.
# Interrupt the kernel manually once the sample messages have been
# processed so that subsequent notebook cells can run.
app.run()
```

```text
[2025-02-25 16:30:49,153] [INFO] [quixstreams] : Starting the Application with the config: broker_address="{'bootstrap.servers': 'localhost:9092'}" consumer_group="rag-consumer" auto_offset_reset="earliest" commit_interval=5.0s commit_every=0 processing_guarantee="at-least-once"
[2025-02-25 16:30:49,154] [INFO] [quixstreams] : Topics required for this application: "messages"
[2025-02-25 16:30:49,162] [INFO] [quixstreams] : Validating Kafka topics exist and are configured correctly...
[2025-02-25 16:30:49,189] [INFO] [quixstreams] : Kafka topics validation complete
[2025-02-25 16:30:49,191] [INFO] [quixstreams] : Initializing state directory at "/Users/tun/Dev/Git/stephen37/talks/quix_milvus/state/rag-consumer"
[2025-02-25 16:30:49,195] [INFO] [quixstreams] : Waiting for incoming messages

Received message: The latest developments in artificial intelligence have revolutionized how we approach problem solving

Received message: Climate change poses significant challenges to global ecosystems and human societies

Received message: Quantum computing promises to transform cryptography and drug discovery

Received message: Sustainable energy solutions are crucial for addressing environmental concerns

[2025-02-25 16:30:54,672] [INFO] [quixstreams] : Stop processing of StreamingDataFrame
```
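Instead of interrupting the kernel by hand, the processing function could stop the application once the expected number of messages has arrived. The sketch wraps any processing function with a counter and calls `on_done()` exactly once. In the notebook, `on_done` could be `app.stop`, assuming your Quix Streams version provides `Application.stop()` and that calling it from inside a processing callback behaves as expected; verify this against your installed version. The lambda callbacks below are hypothetical stand-ins.

```python
def stop_after(n, process, on_done):
    """Wrap `process` so that `on_done()` is called once, after n messages."""
    count = 0

    def wrapper(row):
        nonlocal count
        result = process(row)
        count += 1
        if count == n:
            on_done()
        return result

    return wrapper


# Illustration with a hypothetical stop callback instead of app.stop.
stopped = []
handler = stop_after(2, lambda row: row, lambda: stopped.append(True))
for row in [{"text": "a"}, {"text": "b"}, {"text": "c"}]:
    handler(row)
print(stopped)  # [True]
```

Notebook usage might look like `sdf = sdf.apply(stop_after(len(get_sample_messages()), process_value, app.stop))`. Under at-least-once delivery this counts deliveries, not unique messages.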


## Test updated RAG system

Now let's test our RAG system again. This time it should have knowledge from the streamed messages.

```python
# Query about AI
print("Query about AI developments:")
question = "What do you know about artificial intelligence developments?"
print(f"Question: {question}")
print(f"Answer: {rag_chain.invoke(question)}\n")

# Query about climate change
print("Query about climate change:")
question = "What information do you have about climate change?"
print(f"Question: {question}")
print(f"Answer: {rag_chain.invoke(question)}\n")
```

```text
Query about AI developments:
Question: What do you know about artificial intelligence developments?
Answer:  The latest developments in artificial intelligence have revolutionized how we approach problem solving.

Query about climate change:
Question: What information do you have about climate change?
Answer:  The provided context indicates that climate change poses significant challenges to global ecosystems and human societies.
```
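The two queries in this cell repeat the same invoke-and-print pattern. LangChain runnables also expose `.batch()`, which runs a list of inputs in one call (concurrently by default). A small helper, sketched here with a hypothetical `FakeChain` so it runs without Ollama or Milvus:

```python
def ask_all(chain, questions):
    """Run several questions through a chain in one call and print each pair.

    Relies on the chain exposing `.batch(list_of_inputs)`, as LangChain runnables do.
    """
    answers = chain.batch(questions)
    for question, answer in zip(questions, answers):
        print(f"Question: {question}")
        print(f"Answer: {answer}\n")
    return dict(zip(questions, answers))


class FakeChain:
    """Hypothetical stand-in exposing the same .batch() method as a runnable."""

    def batch(self, inputs):
        return [f"answer to: {q}" for q in inputs]


ask_all(FakeChain(), [
    "What do you know about artificial intelligence developments?",
    "What information do you have about climate change?",
])
```

In the notebook this would be `ask_all(rag_chain, [...])`. With a single local Ollama instance, concurrent requests may not finish faster than sequential ones.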

