Continuous Vector Ingestion

This template shows you how to continuously ingest documents into a vector store using Apache Kafka. For simplicity, this use case is illustrated by streaming data from small CSV files that represent updates to a book catalog. The descriptive text from the catalog entries is embedded and then ingested into a vector store for semantic search. In a production scenario, you might use Change Data Capture (CDC) to ensure that the vector store stays in sync with the book catalog database. For more information on the production use cases that this template supports, see the accompanying blog article.
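As a rough illustration of the first step, the sketch below turns CSV rows into keyed, JSON-encoded messages of the kind a Kafka producer would send. It uses only the Python standard library. The column names (`id`, `title`, `description`), the sample rows, and the `csv_to_messages` helper are assumptions made for this example and are not taken from the template's dataset or code.

```python
import csv
import io
import json

# Hypothetical catalog rows; the column names and values are placeholders.
SAMPLE_CSV = """id,title,description
1,Example Book A,A placeholder description of the first book.
2,Example Book B,A placeholder description of the second book.
"""


def csv_to_messages(csv_text):
    """Turn each CSV row into a (key, value) pair of bytes.

    The key is the row id, so that later updates to the same catalog
    entry share a key; the value is the whole row serialized as JSON.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    messages = []
    for row in reader:
        key = row["id"].encode("utf-8")
        value = json.dumps(row).encode("utf-8")
        messages.append((key, value))
    return messages


messages = csv_to_messages(SAMPLE_CSV)
for key, value in messages:
    print(key, value)
```

In the template itself, producing to Kafka is handled by Quix Streams. This sketch stops at building the key and value so that it runs without a broker.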

This template uses the following open source libraries:

Quix Streams, to produce data to and consume data from Apache Kafka.
Qdrant Client, to create a database that stores embeddings and supports basic similarity search.
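To make the flow between these components more concrete, here is a minimal standard-library sketch of the two processing stages: one that attaches an embedding to each incoming message, and one that upserts the result into a store keyed by id. Everything in it is a stand-in chosen for illustration. `toy_embed` is a word-hashing counter rather than a real embedding model, a plain dictionary plays the role of the vector store, and the message fields are hypothetical. It does not use the Quix Streams or Qdrant Client APIs.

```python
import hashlib

VECTOR_SIZE = 8  # hypothetical dimensionality, chosen only for this sketch


def toy_embed(text):
    """Stand-in for an embedding model: count words per hashed bucket."""
    vector = [0.0] * VECTOR_SIZE
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % VECTOR_SIZE] += 1.0
    return vector


def embed_stage(message):
    """Return a copy of the message with an embedding of its description."""
    enriched = dict(message)
    enriched["embedding"] = toy_embed(message["description"])
    return enriched


def ingest_stage(store, message):
    """Upsert by id: a later message with the same id replaces the earlier one."""
    store[message["id"]] = {
        "vector": message["embedding"],
        "payload": {"title": message["title"]},
    }


vector_store = {}
incoming = [
    {"id": "1", "title": "Example Book A", "description": "a placeholder description"},
    {"id": "1", "title": "Example Book A (revised)", "description": "an updated placeholder description"},
]
for msg in incoming:
    ingest_stage(vector_store, embed_stage(msg))

print(vector_store["1"]["payload"]["title"])
```

Keying the store by id is what lets a stream of updates keep the store in sync with the source: the second message overwrites the first instead of creating a duplicate entry.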

The following screenshot illustrates the architecture of the resulting pipeline in Quix Cloud:

You can also try out a minimal version of this pipeline in a standalone Jupyter notebook.

To run it in Google Colab, click .

Trying it out

To try out the pipeline, first clone the vector ingestion template. For more information on how to clone a project template, see the article "How to create a project from a template in Quix".

Before you clone the pipeline, you’ll also need to sign up for a free trial account with Qdrant Cloud (you can sign up with your GitHub or Google account). When you clone the project template in Quix, you’ll be asked for your Qdrant Cloud credentials.

When running the project, you'll ingest content in two passes:

In the first pass, you'll add some initial entries to a "book-catalog" vector store via Kafka, then search the vector store (we've used the example query "book like star wars") to check that the data was ingested correctly.
In the second pass, you'll go through the whole process again (albeit faster) with new data, and see how the matches change for the same search query.

Run the first ingestion test

Press play on the first job (with the name that starts with “PT1…”)—hover your mouse over the “stopped” button to press play.

This will ingest the first part of the same “sci-fi books” sample dataset that we used in the notebook.
On the “Streamlit Dashboard service”, click the blue “launch” icon to open the search UI.
Search for “book like star wars” — the top result should be “Dune”.

We can assume it matched because the words in the description are semantically similar to the query: "planet" is semantically close to "star" and "struggles" is semantically close to "wars".

Run the second ingestion test

Press play on the second job (with the name that starts with “PT2…”)

This will ingest the second part of the dataset with more relevant matches.
In the Streamlit-based search UI, search for “book like star wars” again. The top result should now be “Old Man’s War”, and the second result should be “Dune”.

We can assume that Dune has been knocked off the top spot because the new addition has a more semantically relevant description: the term "war" is almost a direct hit, and "interstellar" is probably semantically closer to the search term "star" than "planet" is.
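The change in ranking between the two passes can be illustrated with a toy similarity search. The sketch below ranks stored vectors by cosine similarity to a query vector, then adds one more entry and ranks again. The three-dimensional vectors and entry names are invented for illustration and are not real embeddings of the books above. The `search` helper is likewise hypothetical and far simpler than a real vector database query.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def search(store, query_vector, limit=2):
    """Return the ids of the `limit` entries most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), entry_id)
        for entry_id, vector in store.items()
    ]
    scored.sort(reverse=True)
    return [entry_id for _, entry_id in scored[:limit]]


# First pass: two made-up entries.
store = {
    "first-pass-a": [0.8, 0.3, 0.5],
    "first-pass-b": [0.1, 0.2, 0.9],
}
query = [1.0, 1.0, 0.0]
first_results = search(store, query)

# Second pass: a new entry that points in nearly the same direction as the query.
store["second-pass-c"] = [0.9, 0.9, 0.1]
second_results = search(store, query)

print(first_results)
print(second_results)
```

With these toy values, the first-pass winner drops to second place once the closer entry is ingested, which mirrors the behaviour described for the two ingestion tests.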
