This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Allow for user controlled unique id in BigQuery Write #32

Closed
benjumanji opened this issue Jun 5, 2015 · 3 comments

Comments

@benjumanji

If we do a streaming write, the write connector will generate unique ids and tag rows transparently to avoid duplication. If the user knows something about their data like "the transaction-id column is unique across the table", it would be nice to be able to use that instead of the generated ids.

@q42jaap

q42jaap commented Jun 11, 2015

+1

1 similar comment
@dovy

dovy commented Sep 2, 2015

+1

@dhalperi dhalperi self-assigned this Sep 23, 2015
@dhalperi dhalperi changed the title from "Allow user for user controlled unique id in BigQuery Write" to "Allow for user controlled unique id in BigQuery Write" Oct 16, 2015
@dhalperi
Contributor

My understanding of this feature request is that users typically expect it to help them de-dupe across jobs or over long periods of time. However, these row IDs are very short-lived (1 minute), so it does not make sense to let users control them.

If you have some other ID you can de-dupe with, you can use GroupByKey or RemoveDuplicates within your pipeline to do so.
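
For anyone landing here later, the idea behind de-duping on your own ID can be sketched in plain Python. This is not the Dataflow SDK API; `dedupe_by_key`, the `transaction_id` field, and the sample rows are hypothetical names used only to illustrate the logic a GroupByKey-then-take-one step would apply before the BigQuery write.

```python
def dedupe_by_key(rows, key_field):
    """Keep the first row seen for each distinct value of key_field.

    Conceptually the same as grouping rows by key and emitting one
    element per group. Hypothetical helper, not part of any SDK.
    """
    first_seen = {}
    for row in rows:
        # setdefault stores the row only if this key has not been seen yet
        first_seen.setdefault(row[key_field], row)
    # dicts preserve insertion order (Python 3.7+), so output order
    # follows the first occurrence of each key
    return list(first_seen.values())


# Hypothetical sample data: the third row is a duplicate of the first
rows = [
    {"transaction_id": "t1", "amount": 10},
    {"transaction_id": "t2", "amount": 25},
    {"transaction_id": "t1", "amount": 10},
]

unique_rows = dedupe_by_key(rows, "transaction_id")
```

In a real streaming pipeline the grouping would only see elements that fall into the same window, so this kind of de-duplication is bounded by whatever windowing the pipeline uses.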
