Sink should handle records without _id field #8

Closed
dbtucker opened this issue Dec 13, 2016 · 3 comments

@dbtucker

The "put" method simply looks for the "_id" field in the sink record, but that field might not exist. Most key/value connectors support generating their own unique ID as part of writing the record into the datastore. In the simplest case, a unique _id can be generated by concatenating the Kafka topic, partition, and offset from the SinkRecord. Some connectors (e.g. the DynamoDB connector) allow the user to configure the set of fields that will be assembled into the record's ID, falling back to a default value only if the user does not specify anything else.
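A minimal sketch of that simplest case, assuming the document is available as a plain Map and that "_" is an acceptable separator. The class and method names (KafkaIdSketch, kafkaId, ensureId) are hypothetical and are not taken from the connector's code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the connector's implementation.
class KafkaIdSketch {

    // Builds an ID from the record's Kafka coordinates.
    // The "_" separator is an assumption.
    static String kafkaId(String topic, int partition, long offset) {
        return topic + "_" + partition + "_" + offset;
    }

    // Returns a copy of the document that always carries an "_id".
    // An existing "_id" is kept; a missing one is derived from topic, partition, and offset.
    static Map<String, Object> ensureId(Map<String, Object> doc,
                                        String topic, int partition, long offset) {
        Map<String, Object> copy = new HashMap<>(doc);
        copy.putIfAbsent("_id", kafkaId(topic, partition, offset));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "example");
        // prints "orders_0_42"
        System.out.println(ensureId(doc, "orders", 0, 42L).get("_id"));
    }
}
```

In a real sink task the three values would come from the SinkRecord itself; they are passed as plain arguments here so the sketch runs without Kafka on the classpath.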

@drsm79
Member

drsm79 commented Jan 27, 2017

Cloudant (and Couch) will generate an _id for you, a GUID, if one isn't provided. Is that sufficient, or does it need to be deterministic from the record's data?

@dbtucker
Author

A completely unique ID is sufficient. The advantage of generating one in the connector is that the ID could give you some ordering details from the Kafka framework if that was important downstream ... which is why some other connectors take the approach of concatenating topic name, partition, and offset (which, for a given Kafka cluster, form a unique ID for every message that flows through the system).
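To illustrate how such an ID could carry ordering details downstream, here is a rough sketch that recovers topic, partition, and offset from an ID of the assumed form `<topic>_<partition>_<offset>`. The "_" separator and the names KafkaIdOrderSketch, parse, and precedes are assumptions made for this example:

```java
// Hypothetical helper for illustration only; assumes IDs of the form <topic>_<partition>_<offset>.
class KafkaIdOrderSketch {

    // Splits the ID from the right, because Kafka topic names may themselves contain "_".
    // Returns { topic, partition, offset } as strings.
    static String[] parse(String id) {
        int offsetSep = id.lastIndexOf('_');
        int partitionSep = id.lastIndexOf('_', offsetSep - 1);
        if (partitionSep < 0) {
            throw new IllegalArgumentException("Not a topic_partition_offset ID: " + id);
        }
        return new String[] {
            id.substring(0, partitionSep),
            id.substring(partitionSep + 1, offsetSep),
            id.substring(offsetSep + 1)
        };
    }

    // True when both IDs come from the same topic and partition
    // and the first record has the lower offset.
    static boolean precedes(String a, String b) {
        String[] pa = parse(a);
        String[] pb = parse(b);
        return pa[0].equals(pb[0])
            && pa[1].equals(pb[1])
            && Long.parseLong(pa[2]) < Long.parseLong(pb[2]);
    }

    public static void main(String[] args) {
        System.out.println(precedes("my_topic_3_17", "my_topic_3_18"));
    }
}
```

Two caveats apply. Kafka only guarantees ordering within a single partition, so the comparison is restricted to IDs that share topic and partition. Comparing the raw ID strings would also not sort by offset (for example, "t_0_10" sorts before "t_0_9"), which is why the offset is parsed as a number.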

@danielLinke

danielLinke commented May 2, 2017

The _id handling can be set in the configuration file connect-cloudant-sink.properties via the optional attribute guid.schema. The attribute can have the following values:

guid.schema=kafka (default)
generates a new unique _id with the following schema:
<topic_name>_<partition>_<offset>_<sourceCloudantObjectId>

guid.schema=cloudant
uses the existing object ID from the Cloudant source database

guid.schema=others
Cloudant generates a new _id
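The three modes could be sketched roughly as follows. This is an illustration of the behaviour described above, not the connector's actual code: the class and method names are hypothetical, the document is modelled as a plain Map, and two details are assumptions (any value other than kafka or cloudant is treated like "others", and the source ID suffix is left out when the incoming document has no _id).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the guid.schema modes; not the connector's implementation.
class GuidSchemaSketch {

    static Map<String, Object> applyGuidSchema(String guidSchema, Map<String, Object> doc,
                                               String topic, int partition, long offset) {
        Map<String, Object> out = new HashMap<>(doc);
        Object sourceId = doc.get("_id");

        if (guidSchema == null || guidSchema.equals("kafka")) {
            // Default: <topic_name>_<partition>_<offset>_<sourceCloudantObjectId>.
            // Assumption: the suffix is omitted when there is no source _id.
            String suffix = (sourceId == null) ? "" : "_" + sourceId;
            out.put("_id", topic + "_" + partition + "_" + offset + suffix);
        } else if (guidSchema.equals("cloudant")) {
            // Keep the _id that came from the Cloudant source database unchanged.
        } else {
            // Drop any _id so that Cloudant generates a new one on write.
            out.remove("_id");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", "abc123");
        // prints "orders_0_42_abc123"
        System.out.println(applyGuidSchema("kafka", doc, "orders", 0, 42L).get("_id"));
    }
}
```

Removing the _id in the last branch relies on the behaviour mentioned earlier in the thread: Cloudant assigns a GUID when a document is written without one.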
