Sink should handle records without _id field #8
Comments
Cloudant (and Couch) will generate an _id for you, a GUID, if one isn't provided. Is that sufficient, or does it need to be deterministic from the record's data?
A completely unique ID is sufficient. The advantage of generating one is that the ID could carry some ordering details from the Kafka framework, if that is important downstream. That is why some other connectors concatenate topic name, partition, and offset, which for a given Kafka cluster is a unique ID for every message that flows through the system.
The _id handling can be set in the configuration file connect-cloudant-sink.properties via the optional attribute guid.schema. The attribute can have the following values: guid.schema=kafka (default), guid.schema=cloudant, guid.schema=others.
The "put" method simply looks for the "_id" field in the sink record, but that field might not exist. Most key/value connectors support generating their own unique ID as part of writing the record into the datastore. In the simplest case, a unique _id can be generated by concatenating the Kafka topic, partition, and offset from the SinkRecord. Some connectors (e.g. the DynamoDB connector) allow the user to configure the set of fields that will be assembled into the identifier, defaulting to the value only if the user does not specify anything else.
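For illustration, the topic/partition/offset approach could look roughly like the sketch below. It is not the connector's actual code: the class `RecordIds`, its method names, and the `_` separator are hypothetical, and a plain `Map` stands in for the document so the example runs without Kafka Connect on the classpath. In a real sink task the three inputs would come from `SinkRecord.topic()`, `SinkRecord.kafkaPartition()`, and `SinkRecord.kafkaOffset()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (assumption, not part of the connector):
// fills in a missing _id from the record's Kafka coordinates.
final class RecordIds {
    private RecordIds() {}

    // Concatenates topic, partition, and offset. Within one Kafka
    // cluster this triple identifies exactly one message.
    // The "_" separator is an arbitrary choice for this sketch.
    static String fromKafkaCoordinates(String topic, int partition, long offset) {
        return topic + "_" + partition + "_" + offset;
    }

    // Returns a copy of the document that is guaranteed to carry an _id.
    // An existing non-null _id is kept; otherwise one is generated.
    static Map<String, Object> withId(Map<String, Object> doc,
                                      String topic, int partition, long offset) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.putIfAbsent("_id", fromKafkaCoordinates(topic, partition, offset));
        return copy;
    }
}
```

A "put" implementation could call `RecordIds.withId(...)` on each document before writing it, so records that already have an _id are left alone and the rest get a deterministic one. Because the ID is derived from the message position, redelivery of the same message produces the same _id rather than a second document with a fresh GUID.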