Sink should handle records without _id field #8

dbtucker · 2016-12-13T23:21:00Z

The "put" method simply looks for the "_id" field in the Sink record, but that field might not exist. Most Key/Value connectors support generating their own unique ID as part of writing the record into the datastore. In the simplest case, a unique _id can be generated by concatenating the Kafka Topic, Partition, and Offset from the SinkRecord. Some connectors (e.g., the DynamoDB connector) allow the user to configure the set of fields that will be assembled into the ID, defaulting to the value only if the user does not specify anything else.
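To make the simplest case concrete, here is a minimal sketch of the fallback. It is not the connector's code: the function name `ensure_id` and the underscore separator are assumptions chosen for illustration, and the topic, partition, and offset are passed in as plain values rather than read from a real SinkRecord.

```python
from typing import Any, Dict


def ensure_id(doc: Dict[str, Any], topic: str, partition: int, offset: int) -> Dict[str, Any]:
    """Return a copy of doc that is guaranteed to carry an "_id" field.

    Hypothetical helper: if the record already has a non-empty "_id" it is
    kept as-is; otherwise one is built from the Kafka coordinates.
    """
    with_id = dict(doc)  # copy so the caller's record is not mutated
    if not with_id.get("_id"):
        # Assumed format: <topic>_<partition>_<offset>
        with_id["_id"] = f"{topic}_{partition}_{offset}"
    return with_id


print(ensure_id({"name": "widget"}, "orders", 0, 42))
print(ensure_id({"_id": "abc", "name": "widget"}, "orders", 0, 42))
```

Because the ID depends only on where the message sits in Kafka, writing the same message twice produces the same _id rather than a second document with a fresh one.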

drsm79 · 2017-01-27T10:57:30Z

Cloudant (and Couch) will generate an _id for you, a GUID, if one isn't provided. Is that sufficient, or does it need to be deterministic from the record's data?

dbtucker · 2017-01-27T16:27:28Z

A completely unique ID is sufficient. The advantage of generating one is that the ID could give you some ordering details from the Kafka framework if that was important downstream ... which is why some other connectors take the approach of concatenating topic-name, partition, and offset (which for a given Kafka Cluster will be a unique ID for all messages that flow through the system).
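As a rough illustration of the ordering point, the sketch below recovers the Kafka coordinates from such an ID and sorts on them. The `<topic>_<partition>_<offset>` layout, the `KafkaCoordinates` type, and `parse_id` are all assumptions made for this example. Note that Kafka only orders messages within a single partition, so comparing offsets across partitions says nothing about arrival order.

```python
from typing import NamedTuple


class KafkaCoordinates(NamedTuple):
    topic: str
    partition: int
    offset: int


def parse_id(doc_id: str) -> KafkaCoordinates:
    """Split an assumed <topic>_<partition>_<offset> ID back into its parts.

    rsplit is used because topic names may themselves contain underscores.
    """
    topic, partition, offset = doc_id.rsplit("_", 2)
    return KafkaCoordinates(topic, int(partition), int(offset))


ids = ["orders_0_10", "orders_0_9", "orders_1_2"]

# Sorting on the parsed coordinates compares offsets numerically.
by_coordinates = sorted(ids, key=parse_id)

# Sorting the raw strings compares character by character instead.
by_string = sorted(ids)

print(by_coordinates)
print(by_string)
```

The two results differ for offsets 9 and 10, so a downstream consumer that wants ordering should parse the ID (or the connector should zero-pad the offset) rather than rely on string order.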

danielLinke · 2017-05-02T18:34:55Z

The _id record handling can be set in the configuration file connect-cloudant-sink.properties via the optional attribute guid.schema. The attribute can have the following values:

guid.schema=kafka (default)
generate a new unique _id with the following schema:
<topic_name>_<partition>_<offset>_<sourceCloudantObjectId>

guid.schema=cloudant
use the existing object ID from the Cloudant source database

guid.schema=others
Cloudant generates a new _id
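The three guid.schema modes above can be sketched as a single decision function. This is an illustrative sketch only, not the connector's implementation: `resolve_id` is a hypothetical name, and what happens when the source document has no _id is not stated in this thread, so the handling of that case here is an assumption.

```python
from typing import Any, Dict, Optional


def resolve_id(doc: Dict[str, Any], topic: str, partition: int, offset: int,
               guid_schema: str = "kafka") -> Optional[str]:
    """Pick the _id to write for a record, or None to let Cloudant assign one."""
    source_id = doc.get("_id")

    if guid_schema == "kafka":
        # <topic_name>_<partition>_<offset>_<sourceCloudantObjectId>
        parts = [topic, str(partition), str(offset)]
        if source_id is not None:
            parts.append(str(source_id))
        # Assumption: with no source _id, the trailing component is omitted.
        return "_".join(parts)

    if guid_schema == "cloudant":
        # Reuse the ID from the Cloudant source database.
        # Assumption: a missing source _id falls through to None.
        return source_id

    # Any other value: send the document without an _id so that
    # Cloudant generates a new one.
    return None


doc = {"_id": "abc123", "name": "widget"}
print(resolve_id(doc, "orders", 0, 42))
print(resolve_id(doc, "orders", 0, 42, guid_schema="cloudant"))
print(resolve_id(doc, "orders", 0, 42, guid_schema="others"))
```

Returning None for the last mode keeps ID generation on the database side, which matches the earlier point in this thread that Cloudant assigns a GUID when no _id is provided.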

HolgerKache mentioned this issue Feb 14, 2017

Leverage Cloudant document _id with offset manager #4

Closed

danielLinke mentioned this issue Apr 26, 2017

Daniel #18

Closed

HolgerKache closed this as completed in d680d8b Apr 28, 2017

This was referenced May 12, 2017

Add Parameter “replicate” for setting guid_schema and kafka_schema #23

Closed

Add Parameter “replicate” for setting guid_schema and kafka_schema #23 #24

Closed

Update: Add Parameter “replicate” for setting guid_schema and kafka_schema #23 #25

Merged
