
Json middleware #104

Merged: 8 commits merged into gojek:master on Oct 22, 2019

Conversation

mjayprateek (Contributor)

This PR achieves the following:

  1. Introduces JSON middleware (a usage sketch follows below)
  2. Externalizes Kafka Serde and String encoding configs
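
Given the parse-json arities shown further down in this diff (handler-fn, topic-entity, and a key-fn flag passed to cheshire), a hedged usage sketch could look like the following; the namespace alias, handler body, and topic entity are illustrative assumptions, not taken from this PR.

(ns my-app.core
  (:require [ziggurat.middleware.json :as json-mw]))   ;; namespace path is an assumption

(defn message-handler
  [message]
  ;; `message` arrives as a Clojure map parsed from the JSON string
  (println "received" message)
  :success)

(def wrapped-handler
  ;; wrap the handler so JSON payloads are parsed before it runs;
  ;; :using-string-serde is the topic entity, true keywordizes the JSON keys
  (json-mw/parse-json message-handler :using-string-serde true))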

@mjayprateek force-pushed the json_middleware branch 3 times, most recently from 0d3e4ab to 3c484ee on October 16, 2019 11:54
:enable-idempotence false
:value-serializer "org.apache.kafka.common.serialization.StringSerializer"
:key-serializer "org.apache.kafka.common.serialization.StringSerializer"}}
:using-string-serde {:application-id "test"
Contributor

Why are we duplicating config here? Why can't we redef it in the tests? That way the tests would also be independent of the config file.

Contributor Author

I thought about it, but adding/deleting existing configs in the code looks messy. Creating a new config lets us simply use its keyword everywhere in the tests.

That way the config definition is separated from its usage. I agree, though, that more than two would make for a bad config file.

Additionally, since these are reference configs, having multiple entries is useful for a new user, who can instantly see how Ziggurat supports multiple streams.

Contributor

How is it messy? There is just one redef you need to do for it.

Also, if we agree that more than two will not look good, what happens when another topic-entity is needed in the tests? Do we use redef there? If we do, won't the tests become inconsistent?

We can always document the usage well for users instead of duplicating values in our code.
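
For context, a minimal sketch of the redef approach being discussed, assuming a config accessor along the lines of ziggurat.config/ziggurat-config; the test name, config path, and values are illustrative, not taken from this PR.

(ns ziggurat.middleware.json-test
  (:require [clojure.test :refer [deftest is]]
            [ziggurat.config :as config]))   ;; accessor name is an assumption

(deftest json-parsing-with-string-serde
  ;; serve a modified config to the code under test instead of adding a
  ;; near-duplicate entry to the test config file
  (let [test-config {:stream-router {:default {:value-serializer
                                               "org.apache.kafka.common.serialization.StringSerializer"}}}]
    (with-redefs [config/ziggurat-config (constantly test-config)]
      (is (= "org.apache.kafka.common.serialization.StringSerializer"
             (get-in (config/ziggurat-config)
                     [:stream-router :default :value-serializer]))))))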

Contributor Author (@mjayprateek, Oct 17, 2019)

Okay, but we had two configs (:without-producer) even before this change, so at least this change is consistent with the previous code in the number of configs. I would argue that I haven't added a new config here, just changed the old one.


setup:
docker-compose down
lein deps
docker-compose up -d
sleep 10
docker exec -it ziggurat_kafka /opt/bitnami/kafka/bin/kafka-topics.sh --create --topic $(topic) --partitions 3 --replication-factor 1 --zookeeper ziggurat_zookeeper
docker exec -it ziggurat_kafka /opt/bitnami/kafka/bin/kafka-topics.sh --create --topic $(another_test_topic) --partitions 3 --replication-factor 1 --zookeeper ziggurat_zookeeper
Contributor

If we redef the config, creating this other topic will not be required.

:enable-idempotence false
:value-serializer "org.apache.kafka.common.serialization.StringSerializer"
:key-serializer "org.apache.kafka.common.serialization.StringSerializer"}}
:using-string-serde {:application-id "test"
Contributor

Same here.

@@ -31,6 +31,10 @@
:oldest-processed-message-in-s 604800
:changelog-topic-replication-factor 3})

(def KEY_DESERIALIZER_ENCODING "key.deserializer.encoding")
(def VALUE_DESERIALIZER_ENCODING "value.deserializer.encoding")
(def DESERIALIZER_ENCODING "deserializer.encoding")
Contributor

Instead of def, define these with let in the function where they are required.
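
Roughly what that suggestion amounts to, as a sketch (the function name and destructured keys are illustrative, following the constants in the diff above):

(defn- add-encoding-configs
  [^java.util.Properties properties {:keys [key-deserializer-encoding
                                            value-deserializer-encoding
                                            deserializer-encoding]}]
  ;; the property names live in a let instead of top-level defs
  (let [key-encoding-prop   "key.deserializer.encoding"
        value-encoding-prop "value.deserializer.encoding"
        encoding-prop       "deserializer.encoding"]
    (when (some? key-deserializer-encoding)
      (.put properties key-encoding-prop key-deserializer-encoding))
    (when (some? value-deserializer-encoding)
      (.put properties value-encoding-prop value-deserializer-encoding))
    (when (some? deserializer-encoding)
      (.put properties encoding-prop deserializer-encoding))
    properties))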

Contributor Author

This is done.

(if (some? value-deserializer-encoding)
(.put properties VALUE_DESERIALIZER_ENCODING value-deserializer-encoding))
(if (some? deserializer-encoding)
(.put properties DESERIALIZER_ENCODING deserializer-encoding)))
Contributor

What are the default values for these?

Contributor

Is this config only valid when the serializer is a string serializer?

Contributor Author

The default value is "UTF8", which is hard-coded in both StringSerializer and StringDeserializer, so we don't need to provide our own. Please see [1], for example.

Yes, only StringSerializer uses this configuration.

[1] https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/serialization/StringDeserializer.java
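
To make the fallback chain concrete: StringSerializer/StringDeserializer look for the key- or value-specific encoding property first, then the generic serializer.encoding/deserializer.encoding property, and finally default to UTF-8, so these properties only need to be set for a non-default charset. A small illustrative snippet (the UTF-16 value is just an example):

(import 'java.util.Properties)

(def consumer-props
  ;; only needed when a charset other than the hard-coded UTF-8 default is wanted
  (doto (Properties.)
    (.put "key.deserializer.encoding"   "UTF-16")
    (.put "value.deserializer.encoding" "UTF-16")))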

(defn- validate-auto-offset-reset-config
[auto-offset-reset-config]
(if-not (contains? #{"latest" "earliest" nil} auto-offset-reset-config)
(throw (ex-info "Stream offset can only be latest or earliest" {:offset auto-offset-reset-config}))))

(defn- get-serde [default-serde]
Contributor

You can use the default config obtained via the deep-merge instead of defining a new function here.
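
A sketch of that suggestion: carry the fallback serde class in the default map that is deep-merged with user config, so no separate get-serde helper is needed (the key names, the StringSerde class, and the deep-merge helper below are assumptions for illustration).

(defn- deep-merge
  "Recursively merge maps; right-hand values win (illustrative helper)."
  [a b]
  (if (and (map? a) (map? b))
    (merge-with deep-merge a b)
    b))

(def default-config-for-stream
  {:changelog-topic-replication-factor 3
   :default-key-serde   "org.apache.kafka.common.serialization.Serdes$StringSerde"
   :default-value-serde "org.apache.kafka.common.serialization.Serdes$StringSerde"})

(def user-stream-config
  {:application-id "test"})   ;; no serde keys supplied by the user

(def effective-config
  ;; the merged result picks up the default serde classes automatically
  (deep-merge default-config-for-stream user-stream-config))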

Contributor Author

Done.

"This namespace defines middleware methods for parsing JSON strings.
Please see [Ziggurat Middleware](https://github.com/gojek/ziggurat#middleware-in-ziggurat) for more details.
"
(:require [cheshire.core :refer :all]
Contributor

Instead of :refer :all, refer only the functions required: https://github.com/bbatsov/clojure-style-guide#prefer-require-over-use
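
For instance, since parse-string is the only cheshire function visible in this diff, the require could be narrowed to:

(:require [cheshire.core :refer [parse-string]])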

Contributor Author

Done.


"
([handler-fn topic-entity-name]
(parse-json handler-fn topic-entity-name true))
Contributor

This function expects (name topic-entity), i.e. a string, rather than the topic-entity itself. By default, users always pass the topic-entity. We should start expecting the topic-entity and convert it inside, or accept either a string or a keyword and handle it here.
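
A small sketch of the second option, accepting either a keyword or a string (the helper name is illustrative):

(defn- topic-entity->string
  "Accepts either a keyword like :using-string-serde or a string like \"using-string-serde\"."
  [topic-entity]
  (if (keyword? topic-entity)
    (name topic-entity)
    (str topic-entity)))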

Contributor Author

Actually, this is the topic-entity; I've just added the -name suffix. I'll remove it to prevent any confusion.

Contributor Author

Done.

(try
(parse-string message key-fn)
(catch Exception e
(let [additional-tags {:topic_name topic-entity}
Contributor

The topic-entity being passed here, is that a string or a keyword? We want to send a string in the metric, right? That's what I was referring to in the previous comment.

Contributor Author

Okay. Done.

@@ -51,20 +67,26 @@
buffered-records-per-partition
commit-interval-ms
upgrade-from
changelog-topic-replication-factor]}]
changelog-topic-replication-factor
default-key-serde
Contributor

Can we rename this to something that makes more sense? If we wish to keep it the same as Kafka, it should be default-key-serde-class. I'd recommend something else like key-serde-class.

Contributor Author

Hmm. Actually, default-key-serde is consistent with the default.key.serde config defined here: https://docs.confluent.io/current/streams/developer-guide/config-streams.html.
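
For reference, these map onto the Kafka Streams properties exposed via StreamsConfig; a sketch assuming default-key-serde (and a matching default-value-serde) hold fully qualified serde class names:

(import 'java.util.Properties
        'org.apache.kafka.streams.StreamsConfig)

(defn- set-default-serdes
  [^Properties properties default-key-serde default-value-serde]
  ;; DEFAULT_KEY_SERDE_CLASS_CONFIG is "default.key.serde",
  ;; DEFAULT_VALUE_SERDE_CLASS_CONFIG is "default.value.serde"
  (when (some? default-key-serde)
    (.put properties StreamsConfig/DEFAULT_KEY_SERDE_CLASS_CONFIG default-key-serde))
  (when (some? default-value-serde)
    (.put properties StreamsConfig/DEFAULT_VALUE_SERDE_CLASS_CONFIG default-value-serde))
  properties)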

@mjayprateek merged commit 6448a3f into gojek:master on Oct 22, 2019