
Accessing StateStore #93

Closed
ahmadsholehin opened this issue Jan 27, 2019 · 2 comments

Comments

@ahmadsholehin

I’m new to Kafka and I’m trying to understand how this article https://www.confluent.io/blog/building-a-microservices-ecosystem-with-kafka-streams-and-ksql/ relates to kafka-streams.
The article talks about being able to interactively query the state store (one of the ways is to use KSQL).
I’ve been reading tons of resources about Kafka and I still can’t put the pieces together on how to use kafka-streams to implement the concepts that I’ve learnt.
For a start, how do you actually use kafka-streams to query the state store? Any help in any direction is greatly appreciated.

@Protoss78

Hi,

I'm very familiar with Kafka Streams from a Java perspective, but I haven't used this JavaScript library yet. However, I can explain the functionality behind interactive queries.
When creating a Kafka Streams topology, you usually do the following:

  1. Consume data from one or more Kafka topics
  2. Transform the data using the Kafka Streams topology (map, filter, join, ...)
  3. Produce the transformed data into one or more Kafka topics
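
As a rough sketch of those three steps (plain JavaScript, no real Kafka client — the topic names and records here are made up, and arrays stand in for topics):

```javascript
// Illustrative only: arrays stand in for Kafka topics, and the
// "topology" is ordinary filter/map over consumed records.

// 1. Consume: records as they might arrive from an input topic
const inputTopic = [
  { key: "order-1", value: 120 },
  { key: "order-2", value: 45 },
  { key: "order-3", value: 300 },
];

// 2. Transform: filter and map, as a streaming topology would
const outputRecords = inputTopic
  .filter((rec) => rec.value >= 100)                       // drop small orders
  .map((rec) => ({ key: rec.key, value: rec.value * 2 })); // double the amount

// 3. Produce: in a real app these would be written to an output topic
console.log(outputRecords);
```

In a real topology the transforms run continuously as records arrive, rather than over a finished array.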

You can, however, choose a different target for your streaming topology. Instead of producing data to another Kafka topic, you can write it into a so-called "State Store". A state store is essentially a RocksDB key-value database that resides in the local file system of your application (usually in the OS temp folder). It can be accessed via a Kafka Streams API and lets you fetch the latest message value for a given key. You could wrap this functionality behind a REST interface and make the content of a Kafka topic available to a web frontend. This is essentially what is meant by interactive queries.
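
A minimal sketch of that "latest value per key" behaviour, with a plain `Map` standing in for the RocksDB store (the keys and values are invented for illustration):

```javascript
// A changelog such as a topology might write into a state store:
// later records for the same key overwrite earlier ones.
const changelog = [
  { key: "user-1", value: "alice@old.example" },
  { key: "user-2", value: "bob@example.com" },
  { key: "user-1", value: "alice@new.example" },
];

// "Materialize" the changelog into a key-value store (a Map stands
// in for the RocksDB database Kafka Streams keeps on local disk).
const stateStore = new Map();
for (const rec of changelog) {
  stateStore.set(rec.key, rec.value);
}

// An interactive query: fetch the latest value for a key — roughly
// what a REST handler in front of the store would return.
console.log(stateStore.get("user-1")); // → "alice@new.example"
```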

The description above is a little simplified, though. What I've left out is that there are two different kinds of state stores: regular state stores and global state stores. A global state store always holds all the data from a topic; if you start 3 instances of your application, the global state store is created 3 times. With a regular state store and 3 instances of your application, Kafka assigns a third of the topic's partitions to each instance, so the whole topic content is spread over 3 state stores.
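
The difference can be modelled in a few lines. The hash below is invented for illustration (Kafka actually uses murmur2 on the record key), but the spreading behaviour is the point:

```javascript
// Toy model of a 3-partition topic materialized by 3 app instances.
const NUM_PARTITIONS = 3;
const partitionOf = (key) =>
  [...key].reduce((h, c) => h + c.charCodeAt(0), 0) % NUM_PARTITIONS;

const records = [
  { key: "a", value: 1 },
  { key: "b", value: 2 },
  { key: "c", value: 3 },
  { key: "d", value: 4 },
];

// Regular state stores: each instance only materializes the
// partitions assigned to it, so the data is spread out.
const regularStores = [new Map(), new Map(), new Map()];
for (const rec of records) {
  regularStores[partitionOf(rec.key)].set(rec.key, rec.value);
}

// Global state stores: every instance materializes the whole topic.
const globalStores = [new Map(), new Map(), new Map()];
for (const store of globalStores) {
  for (const rec of records) store.set(rec.key, rec.value);
}
```

With regular stores the four keys exist exactly once across the three instances; with global stores each instance holds all four.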

If you use such a regular state store and want to provide a REST service to query its content, you now face the problem that the data is spread over 3 different server instances. You therefore have to find out which instance of your application holds which partition of the Kafka topic. There is an API for that as well (at least in the Java version). That is the second big and important piece of the term "interactive queries".
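
Sketched out, that lookup amounts to: hash the key to a partition, look up which instance owns that partition, and forward the query there. Everything below is hypothetical (the host names and the hash are made up; in the Java API this metadata comes from the KafkaStreams instance itself):

```javascript
// Same toy hash as before; Kafka really uses murmur2 on the key.
const NUM_PARTITIONS = 3;
const partitionOf = (key) =>
  [...key].reduce((h, c) => h + c.charCodeAt(0), 0) % NUM_PARTITIONS;

// Hypothetical assignment of partitions to instance REST endpoints.
const partitionToHost = {
  0: "http://app-0:8080",
  1: "http://app-1:8080",
  2: "http://app-2:8080",
};

// A REST layer would forward the query to this host instead of
// answering from its own (partial) local store.
function hostFor(key) {
  return partitionToHost[partitionOf(key)];
}

console.log(hostFor("c")); // "c" hashes to partition 0 in this toy model
```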

You will find a great description of the whole concept here: https://docs.confluent.io/current/streams/developer-guide/interactive-queries.html

Hope this helps.

@ahmadsholehin (Author)

Hi @Protoss78,

Your explanation is remarkably useful. You’re spot on in your second paragraph detailing what Kafka (at least from the Java perspective) provides in the form of interactive queries. This dispels the magic around Kafka, makes the layers of abstraction visible, and shows that at its core Kafka provides publishing and subscribing to topics.

I’m starting to understand that this kafka-streams library sits on top of Kafka’s pub/sub core and augments it with a Most.js-style syntax for stream processing of topic data. I should also note that this stream processing (map, filter, etc.) occurs locally at each application instance as data comes in (e.g. a reduce operation does not combine results across multiple application instances).

The library does not come bundled with a state store or a global state store that mimics what the Kafka Streams API in the Java world offers. That doesn’t stop you from building a state store for each application instance yourself, though. For example, I’ve been trying to build a Redis-backed state store that does a subset of what the Java API does.
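
A hand-rolled store of that kind might expose an interface like the following (entirely hypothetical; in-memory here, where a Redis-backed version would implement the same methods with GET/SET):

```javascript
// Hypothetical minimal state-store interface for one app instance.
class InMemoryStateStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value); // latest value per key wins
  }
  get(key) {
    return this.data.get(key);
  }
  all() {
    return [...this.data.entries()];
  }
}

// Fed from a stream of messages as they arrive:
const store = new InMemoryStateStore();
for (const msg of [{ key: "k1", value: 1 }, { key: "k1", value: 2 }]) {
  store.put(msg.key, msg.value);
}
```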

I also quickly realised that some non-trivial operations, especially aggregations (e.g. calculating the mean of a stream of values in a distributed manner across multiple application instances), require much more planning and thought. This library simply does not give you the required options out of the box, though do correct me if I’m wrong.
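
One standard way to plan a distributed mean (not something the library provides — this is a hand-rolled sketch) is to have each instance emit a partial (sum, count) pair for its own partitions and combine the partials downstream:

```javascript
// Each instance reduces only the values from its own partitions...
const partial = (values) => ({
  sum: values.reduce((a, b) => a + b, 0),
  count: values.length,
});

// ...and the partials are combined in one place (e.g. via a
// single-partition topic) to get the true global mean.
const combine = (partials) => {
  const sum = partials.reduce((a, p) => a + p.sum, 0);
  const count = partials.reduce((a, p) => a + p.count, 0);
  return sum / count;
};

const instanceA = partial([10, 20]); // { sum: 30, count: 2 }
const instanceB = partial([30]);     // { sum: 30, count: 1 }
console.log(combine([instanceA, instanceB])); // → 20
```

Averaging the per-instance means instead (20 and 30 here) would give the wrong answer, which is why the (sum, count) pair has to travel together.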

Thank you once again for your time in your splendid explanation.
