# Vector Databases For Workflow State Management

Workflows are event-driven by design, but production-grade applications increasingly need them to read from and write to a global state as well, where they can store and fetch data relevant to their run.

[llama-index-workflows](https://github.com/run-llama/workflows-py) offers customizable, asynchronous and lockable [state management](https://docs.llamaindex.ai/en/latest/module_guides/workflow/#adding-typed-state), but the state does not persist across sessions. This is exactly where databases, and especially vector databases, come in.

With a database, you can take a snapshot of the workflow state at the end of a run, store it, and retrieve it in later runs to inform the behavior of the workflow itself.

This is key for resource management, and it is also a first step toward a proof-of-concept of self-learning workflows.
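Before bringing in a real database, the snapshot idea can be sketched with the standard library alone. The `RunState` and `SnapshotStore` names below are hypothetical stand-ins invented for illustration (they are not part of llama-index-workflows), and the in-memory dictionary only simulates what a database would persist across sessions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class RunState:
    # Hypothetical, simplified stand-in for a workflow state
    question: str = ""
    summary: str = ""
    focal_points: List[str] = field(default_factory=list)


class SnapshotStore:
    """Toy in-memory store; a real setup would write to a database instead."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, str] = {}

    def save(self, run_id: str, state: RunState) -> None:
        # Serialize the state to JSON so it can outlive the run that produced it
        self._snapshots[run_id] = json.dumps(asdict(state))

    def load(self, run_id: str) -> RunState:
        # Rebuild the state object from the stored JSON snapshot
        return RunState(**json.loads(self._snapshots[run_id]))


store = SnapshotStore()
store.save(
    "run-1",
    RunState(question="What is MCP?", summary="A protocol.", focal_points=["transports"]),
)
restored = store.load("run-1")
```

A later run can call `load` to start from what an earlier run learned, which is the behavior the rest of this notebook builds with a vector database.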

In this example, we will see how we can combine [Qdrant](https://qdrant.tech) vector database services with [OpenAI](https://openai.com) LLM capabilities (leveraging remote MCP support for searching DeepWiki and structured generation for parsing the outputs).

### 1. Install needed dependencies

In [None]:
%pip install -q llama-index-workflows qdrant-client sentence-transformers openai

### 2. Define a Raw Vector Database Client

To fulfil our state-management requirements, we need to write a raw vector database client that is able to create a database collection, upload data points (not only text, but also metadata - since our state can easily be transformed into a dictionary) and retrieve those data points with similarity search.

We will use dense vector search to keep things plain and simple, and we will employ [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as an embedding model, but you can easily change these settings to add layers of complexity (hybrid search, a better embedding model...) to the vector database client.

In [None]:
import json

from qdrant_client import AsyncQdrantClient, models
from sentence_transformers import SentenceTransformer
from typing import List, Dict, Any


class QdrantVectorDatabase:
    def __init__(
        self,
        client: AsyncQdrantClient,
        model: SentenceTransformer,
        collection_name: str,
    ):
        self._has_vectors = False
        self.client = client
        self.model = model
        self.collection_name = collection_name

    async def create_collection(self):
        await self.client.create_collection(
            collection_name=self.collection_name,
            vectors_config=models.VectorParams(
                size=self.model.get_sentence_embedding_dimension() or 384,
                distance=models.Distance.COSINE,
            ),
        )

    async def upload(
        self, texts: List[str], metadatas: List[Dict[str, Any]], ids: List[str]
    ) -> None:
        # Embed all texts in one batch, then upsert the points in a single request
        embeddings = self.model.encode(texts).tolist()
        points = [
            models.PointStruct(id=point_id, vector=embedding, payload=metadata)
            for point_id, embedding, metadata in zip(ids, embeddings, metadatas)
        ]
        await self.client.upsert(collection_name=self.collection_name, points=points)
        self._has_vectors = True

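    async def search(self, query: str, limit: int, threshold: float = 0.75) -> str:
        # Nothing has been uploaded through this client yet: skip the query
        if not self._has_vectors:
            return ""
        embedding = self.model.encode(query).tolist()
        # `search` is deprecated in recent qdrant-client releases; use `query_points`
        response = await self.client.query_points(
            collection_name=self.collection_name, query=embedding, limit=limit
        )
        payloads = [hit.payload for hit in response.points if hit.score > threshold]
        # Return an empty string when no hit clears the threshold, so that callers
        # can handle "no usable context" with a simple `if not results`
        if not payloads:
            return ""
        return json.dumps(payloads, indent=4)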
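To make the similarity filtering in `search` more concrete, here is a minimal, dependency-free sketch of cosine similarity with a score threshold. The `cosine_similarity` and `filter_hits` helpers and the two-dimensional toy vectors are assumptions made for illustration; Qdrant computes the scores server-side, and real embeddings from all-MiniLM-L6-v2 have 384 dimensions.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_hits(
    query: Sequence[float],
    points: List[Tuple[List[float], Dict[str, Any]]],
    limit: int,
    threshold: float = 0.75,
) -> List[Dict[str, Any]]:
    # Score every stored vector, keep the top `limit`, then drop weak matches
    scored = [(cosine_similarity(query, vector), payload) for vector, payload in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [payload for score, payload in scored[:limit] if score > threshold]


points = [
    ([1.0, 0.0], {"question": "close match"}),
    ([0.0, 1.0], {"question": "unrelated"}),
]
hits = filter_hits([0.9, 0.1], points, limit=5)
```

Only the first payload survives here, because the second vector is almost orthogonal to the query and scores well below the 0.75 threshold.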

Let's now create the collection and verify that the collection creation was successful.

In [None]:
qdrant_client = AsyncQdrantClient(":memory:")
model = SentenceTransformer("all-MiniLM-L6-v2")
collection_name = "workflow_collection"
vdb = QdrantVectorDatabase(qdrant_client, model, collection_name)
await vdb.create_collection()

In [None]:
await qdrant_client.collection_exists("workflow_collection")

True

### 3. Design the LLM Client

Since we also need customized functions for the LLM client (like remote MCP DeepWiki search and structured output generation), we will design our own LLM client on top of `AsyncOpenAI`.

We will also add a method with which our LLM client can evaluate the relevance of the retrieved context for our workflow runs.

In [None]:
from dataclasses import dataclass
from openai import AsyncOpenAI
from typing import Optional
from pydantic import BaseModel, Field


class ContextRelevance(BaseModel):
    relevance: int = Field(
        description="Relevance of the context based on the user message, expressed as a number between 1 and 100",
        ge=1,
        le=100,
    )
    reasons: str = Field(description="Reasons for the evaluation")


class DeepWikiOutput(BaseModel):
    summary: str = Field(description="Summary of the research output")
    focal_points: List[str] = Field(description="Focal points of the research output")
    references: List[str] = Field(
        description="References contained in the research output", default_factory=list
    )
    similar_topics: List[str] = Field(
        description="Topics similar to the one of the research output"
    )


@dataclass
class OpenAILlm:
    llm: AsyncOpenAI

    async def deep_wiki(
        self, message: str, context: Optional[str] = None
    ) -> DeepWikiOutput:
        # Wrap the message with the retrieved context only when there is one
        if context:
            prompt = (
                "<context>\n\t"
                + context
                + "\n</context>\n<user_message>\n\t"
                + message
                + "\n</user_message>\n<instructions>\n\tReply to the user message based on the contextual information\n</instructions>"
            )
        else:
            prompt = message
        response = await self.llm.responses.create(
            model="gpt-4.1",
            tools=[
                {
                    "type": "mcp",
                    "server_label": "deepwiki",
                    "server_url": "https://mcp.deepwiki.com/mcp",
                    "require_approval": "never",
                },
            ],
            input=prompt,
        )
        # Turn the free-text answer into a structured DeepWikiOutput
        return await self.format_deep_wiki_output(response.output_text)

    async def classify_context_relevance(
        self, message: str, context: str
    ) -> ContextRelevance:
        response = await self.llm.responses.parse(
            model="gpt-4.1",
            input="<context>\n\t"
            + context
            + "\n</context>\n<user_message>\n\t"
            + message
            + "\n</user_message>\n<instructions>\n\tEvaluate the relevance of the context in relation to the user message from 1 to 100, and give reasons for this evaluation\n</instructions>",
            text_format=ContextRelevance,
        )
        return response.output_parsed

    async def format_deep_wiki_output(self, output: str) -> DeepWikiOutput:
        response = await self.llm.responses.parse(
            model="gpt-4.1",
            input="<research_output>\n\t"
            + output
            + "\n</research_output>\n<instructions>\n\tFormat the output so that you highlight a summary, the focal points, the references contained in it and further topics to explore similar to the one in the output\n</instructions>",
            text_format=DeepWikiOutput,
        )
        return response.output_parsed
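The `ge=1` and `le=100` constraints on `ContextRelevance` mean that a score outside that range is rejected when the structured output is parsed. As a rough, standard-library illustration of that idea (the `RelevanceVerdict` dataclass and `parse_verdict` helper are hypothetical and are not how Pydantic or the OpenAI SDK implement validation):

```python
import json
from dataclasses import dataclass


@dataclass
class RelevanceVerdict:
    # Hypothetical mirror of ContextRelevance, with a manual range check
    relevance: int
    reasons: str

    def __post_init__(self) -> None:
        if not isinstance(self.relevance, int) or not 1 <= self.relevance <= 100:
            raise ValueError(f"relevance must be an integer in [1, 100], got {self.relevance!r}")


def parse_verdict(raw: str) -> RelevanceVerdict:
    # Parse the model's JSON reply and validate it against the schema
    data = json.loads(raw)
    return RelevanceVerdict(relevance=data["relevance"], reasons=data["reasons"])


verdict = parse_verdict('{"relevance": 80, "reasons": "Same question, different wording"}')
```

Passing a reply such as `{"relevance": 150, "reasons": "..."}` to `parse_verdict` raises a `ValueError` instead of letting an out-of-range score reach the workflow.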

### 4. Create Resources

[Resources](https://docs.llamaindex.ai/en/latest/module_guides/workflow/#resources) are the way to perform dependency injection in workflows: you define functions that return external services, and the workflow makes those services available to each step that declares them.

In [None]:
from getpass import getpass

llm = AsyncOpenAI(api_key=getpass("Enter your OpenAI API key: "))

Enter your OpenAI API key: ··········


In [None]:
openai_llm = OpenAILlm(llm)


def get_llm(**kwargs):
    return openai_llm


def get_vdb(**kwargs):
    return vdb
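To build some intuition for how a factory function attached to a type annotation can be turned into an injected argument, here is a toy sketch that relies only on `typing.Annotated`. `FakeResource`, `get_greeting`, `greet` and `call_with_resources` are hypothetical names invented for this illustration; this is not how llama-index-workflows implements `Resource` internally.

```python
from typing import Annotated, Any, Callable, get_type_hints


class FakeResource:
    """Hypothetical marker that carries a factory function (not the library's Resource)."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self.factory = factory


def get_greeting() -> str:
    return "hello"


def greet(name: str, greeting: Annotated[str, FakeResource(get_greeting)]) -> str:
    return f"{greeting}, {name}"


def call_with_resources(fn: Callable[..., Any], **kwargs: Any) -> Any:
    # Inspect the annotations and fill in every parameter marked with FakeResource
    hints = get_type_hints(fn, include_extras=True)
    for param, hint in hints.items():
        if param == "return" or param in kwargs:
            continue
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, FakeResource):
                kwargs[param] = meta.factory()
    return fn(**kwargs)


result = call_with_resources(greet, name="workflow")
```

The caller only supplies `name`; the `greeting` argument is resolved from the annotation, which mirrors how the steps defined later declare `llm` and `vdb` without constructing them.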

### 5. Define the Workflow

We now need to define the workflow: not only the steps themselves, but also the events that will drive the execution.

In [None]:
from workflows.events import StartEvent, Event, StopEvent


class ResearchQuestionEvent(StartEvent):
    question: str


class ResearchEvent(StopEvent, DeepWikiOutput):
    pass


class RetrieveContextEvent(Event):
    context: str


class ContextRelevanceEvent(Event, ContextRelevance):
    context: Optional[str]
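The sense in which events "drive the execution" can be illustrated with a toy dispatcher: each handler accepts one event type and returns the next event, and the loop stops when no handler accepts what was returned. The `Question`, `Answer`, `HANDLERS` and `run` names are hypothetical, and this synchronous loop is only a sketch, not the scheduler used by llama-index-workflows.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Question:
    text: str


@dataclass
class Answer:
    text: str


def handle_question(ev: Question) -> Answer:
    # A "step": consumes one event type and emits another
    return Answer(text=ev.text.upper())


# Map each event type to the handler that accepts it
HANDLERS: Dict[type, Callable[[Any], Any]] = {Question: handle_question}


def run(event: Any) -> Any:
    # Keep dispatching until an event arrives that no handler accepts
    while type(event) in HANDLERS:
        event = HANDLERS[type(event)](event)
    return event


final_event = run(Question(text="hi"))
```

In the workflow below, the type annotation of each step's `ev` parameter plays the role of the `HANDLERS` mapping.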

In [None]:
from workflows import Workflow, step, Context
from workflows.resource import Resource
from typing import Annotated, Union
import uuid


class WorkflowState(BaseModel):
    question: str = Field(description="Question", default_factory=str)
    summary: str = Field(
        description="Summary of the research output", default_factory=str
    )
    focal_points: List[str] = Field(
        description="Focal points of the research output", default_factory=list
    )
    references: List[str] = Field(
        description="References contained in the research output", default_factory=list
    )
    similar_topics: List[str] = Field(
        description="Topics similar to the one of the research output",
        default_factory=list,
    )


class DeepWikiWorkflow(Workflow):
    @step
    async def research_question(
        self,
        ev: ResearchQuestionEvent,
        ctx: Context[WorkflowState],
        vdb: Annotated[QdrantVectorDatabase, Resource(get_vdb)],
    ) -> Union[ContextRelevanceEvent, RetrieveContextEvent]:
        ctx.write_event_to_stream(ev)
        async with ctx.store.edit_state() as state:
            state.question = ev.question

        results = await vdb.search(ev.question, 5)
        if not results:
            ctx.write_event_to_stream(
                ContextRelevanceEvent(
                    relevance=1, reasons="No context found", context=None
                )
            )
            return ContextRelevanceEvent(
                relevance=1, reasons="No context found", context=None
            )
        ctx.write_event_to_stream(RetrieveContextEvent(context=results))
        return RetrieveContextEvent(context=results)

    @step
    async def evaluate_context(
        self,
        ev: RetrieveContextEvent,
        ctx: Context[WorkflowState],
        llm: Annotated[OpenAILlm, Resource(get_llm)],
    ) -> ContextRelevanceEvent:
        state = await ctx.store.get_state()
        relevance = await llm.classify_context_relevance(state.question, ev.context)
        if relevance.relevance >= 75:
            ctx.write_event_to_stream(
                ContextRelevanceEvent(
                    relevance=relevance.relevance,
                    reasons=relevance.reasons,
                    context=ev.context,
                )
            )
            return ContextRelevanceEvent(
                relevance=relevance.relevance,
                reasons=relevance.reasons,
                context=ev.context,
            )
        ctx.write_event_to_stream(
            ContextRelevanceEvent(
                relevance=relevance.relevance, reasons=relevance.reasons, context=None
            )
        )
        return ContextRelevanceEvent(
            relevance=relevance.relevance, reasons=relevance.reasons, context=None
        )

    @step
    async def research(
        self,
        ev: ContextRelevanceEvent,
        ctx: Context[WorkflowState],
        llm: Annotated[OpenAILlm, Resource(get_llm)],
        vdb: Annotated[QdrantVectorDatabase, Resource(get_vdb)],
    ) -> ResearchEvent:
        static_state = await ctx.store.get_state()
        result = await llm.deep_wiki(static_state.question, ev.context)
        async with ctx.store.edit_state() as state:
            state.summary = result.summary
            state.focal_points = result.focal_points
            state.references = result.references
            state.similar_topics = result.similar_topics

        await vdb.upload([state.question], [state.model_dump()], [str(uuid.uuid4())])
        return ResearchEvent(
            summary=result.summary,
            focal_points=result.focal_points,
            references=result.references,
            similar_topics=result.similar_topics,
        )

As you can see, the workflow has three key steps:

1. The first step, triggered by a question from the user, retrieves previous workflow states stored in the vector database. The context is retrieved through similarity search on the research question, which is the same way [semantic caches](https://qdrant.tech/blog/hitchhikers-guide/#semantic-caching) work: questions are stored as embeddings, with the answers and metadata as payload, and a new question is matched against similar questions asked before. If no context is found (the collection may be empty, or no stored question may be similar enough), we go directly to step (3); otherwise we perform context evaluation in step (2).
2. The LLM evaluates the relevance of the retrieved context using structured output: it is constrained to produce a score from 1 to 100 and to give reasons for that score. The context is passed on to the research step only if the score is at least 75.
3. The research step is performed: the user message and the context (if any) are passed to the LLM, which uses a remote MCP connection to DeepWiki to research the user's question. When the response is ready, we "take a snapshot of the state" and upload it, with the associated question, to the vector database.
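The routing between these three steps can be condensed into two small decisions, sketched below with plain functions. `next_step` and `context_for_research` are hypothetical helpers written for this illustration; the threshold of 75 is the one used in `evaluate_context`.

```python
from typing import Optional


def next_step(retrieved_context: str) -> str:
    # Step 1: an empty retrieval result skips evaluation and goes straight to research
    return "evaluate_context" if retrieved_context else "research"


def context_for_research(
    context: str, relevance: int, min_relevance: int = 75
) -> Optional[str]:
    # Step 2: only sufficiently relevant context is forwarded to the research step
    return context if relevance >= min_relevance else None


first_run = next_step("")
second_run = next_step('[{"question": "..."}]')
kept = context_for_research('[{"question": "..."}]', relevance=90)
dropped = context_for_research('[{"question": "..."}]', relevance=40)
```

On a first run with an empty collection the workflow goes straight to research, while a later run with a similar question passes through evaluation and keeps the context only if it scores high enough.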

### 6. Run the Workflow

We will now perform two runs of the same workflow, with very similar questions, to see how our state-management system works with the vector database!

In [None]:
wf = DeepWikiWorkflow(timeout=600)
handler = wf.run(
    start_event=ResearchQuestionEvent(
        question="What transport protocols are supported in the 2025-03-26 version of the Model Context Protocol (MCP) spec?"
    )
)

async for event in handler.stream_events():
    if isinstance(event, ResearchQuestionEvent):
        print("Starting working on the question", event.question)
    elif isinstance(event, ContextRelevanceEvent):
        print(f"Context relevance: {event.relevance}")
    elif isinstance(event, RetrieveContextEvent):
        print(f"Context: {event.context}")

result = await handler

Starting working on the question What transport protocols are supported in the 2025-03-26 version of the Model Context Protocol (MCP) spec?
Context relevance: 1


In [None]:
print("Summary:", result.summary)
print("Focal Points:\n- ", "\n- ".join(result.focal_points))
print("References:\n- ", "\n- ".join(result.references))
print("Similar Topics:\n- ", "\n- ".join(result.similar_topics))

Summary: The 2025-03-26 version of the Model Context Protocol (MCP) specification outlines four supported transport protocols—HTTP/HTTPS, WebSocket, gRPC, and MQTT—for various messaging and interoperability needs. Raw TCP has been deprecated due to security and interoperability considerations.
Focal Points:
-  HTTP/HTTPS is the primary MCP messaging transport.
- WebSocket allows for real-time, bidirectional communication.
- gRPC provides an efficient binary protocol with strong typing.
- MQTT, now stable, targets lightweight, IoT, and edge scenarios.
- Raw TCP transport is deprecated and will be removed in future specifications.
References:
-  Section 7, ‘Transport Protocols,’ of the MCP 2025-03-26 spec
Similar Topics:
-  Comparison of transport protocols in machine learning frameworks
- Protocol interoperability and security in distributed systems
- IoT messaging protocols (MQTT, AMQP, CoAP)
- Transition strategies for protocol deprecation in API standards
- Streaming and real-time me

In [None]:
handler = wf.run(
    start_event=ResearchQuestionEvent(
        question="Which transport protocols does the Model Context Protocol specification version 2025-03-26 support?"
    )
)

async for event in handler.stream_events():
    if isinstance(event, ResearchQuestionEvent):
        print("Starting working on the question", event.question)
    elif isinstance(event, ContextRelevanceEvent):
        print(f"Context relevance: {event.relevance}")
    elif isinstance(event, RetrieveContextEvent):
        print(f"Context: {event.context}")

result = await handler


Starting working on the question Which transport protocols does the Model Context Protocol specification version 2025-03-26 support?
Context: [
    {
        "question": "What transport protocols are supported in the 2025-03-26 version of the Model Context Protocol (MCP) spec?",
        "summary": "The 2025-03-26 version of the Model Context Protocol (MCP) specification outlines four supported transport protocols\u2014HTTP/HTTPS, WebSocket, gRPC, and MQTT\u2014for various messaging and interoperability needs. Raw TCP has been deprecated due to security and interoperability considerations.",
        "focal_points": [
            "HTTP/HTTPS is the primary MCP messaging transport.",
            "WebSocket allows for real-time, bidirectional communication.",
            "gRPC provides an efficient binary protocol with strong typing.",
            "MQTT, now stable, targets lightweight, IoT, and edge scenarios.",
            "Raw TCP transport is deprecated and will be removed in future sp

As you can see, we have successfully retrieved the stored state from the vector database as context, rated it as highly relevant and used it to augment the generation of our research report!

In [None]:
print("Summary:", result.summary)
print("Focal Points:\n- ", "\n- ".join(result.focal_points))
print("References:\n- ", "\n- ".join(result.references))
print("Similar Topics:\n- ", "\n- ".join(result.similar_topics))

Summary: The Model Context Protocol (MCP) specification version 2025-03-26 enumerates its supported transport protocols, emphasizing modern, secure, and efficient messaging. It supports HTTP/HTTPS as the main transport, with WebSocket, gRPC, and a now-stable MQTT option for lightweight and IoT usage. Raw TCP transport is deprecated and set for future removal for security and interoperability reasons.
Focal Points:
-  Supported transport protocols: HTTP/HTTPS, WebSocket, gRPC, MQTT
- MQTT has reached stable status and is aimed at IoT and edge scenarios
- Raw TCP transport is deprecated due to security and interoperability concerns
- Anticipated removal of raw TCP transport in upcoming MCP versions
References:
-  Section 7, ‘Transport Protocols,’ of the MCP 2025-03-26 spec
Similar Topics:
-  Protocol deprecation and migration strategies
- Transport protocol security best practices
- Comparative analysis of transport protocols for IoT
- HTTP/2 and HTTP/3 in modern protocol suites
- Intero