Stream Enrich: ensure a one-to-one relationship between sink and record processor #3745

Closed
asgergb opened this issue Apr 26, 2018 · 5 comments

asgergb commented Apr 26, 2018

After upgrading to R102 from R96 following the Upgrade Guide, I see duplicated events in the enriched stream in Kinesis.

The image below shows the number of events in the raw web stream (prior to enrichment), the good enriched stream, and the bad enriched stream. As you can see, after deploying R102 at 9:40 there are many more events in the good enriched stream than in the raw stream. I redeployed R102 at 10:10 and then rolled back to R96 at 10:20. Looking at the good enriched stream, roughly half of the events are duplicated; that is, two identical copies of the event are on the stream.

[Image: snowplow_r102_duplicate_enriched_events]

Due to #3744, I am unable to test R103, and I have not been able to find the cause. Do you have any idea what's happening?

BenFradet (Contributor) commented Apr 26, 2018

Hey @asgergb, could you post your issue on our Discourse and provide as much information as possible (number of shards in each stream, number of Stream Enrich instances, etc.) so we can try to reproduce it?

asgergb (Author) commented Apr 26, 2018

Sure, I'll do that as soon as possible. We have a holiday coming up but I'll try to get around to it.

asgergb (Author) commented Apr 26, 2018

FYI, I have tried the official artifacts for R103 (collector 0.13.0 and enricher 0.16.0) as part of #3744, and the result is the same: roughly half of all events are duplicated.

asgergb (Author) commented Apr 30, 2018

@BenFradet Here you go: https://discourse.snowplowanalytics.com/t/stream-enrich-duplicated-enriched-events-in-r103-3745/1986

BenFradet added this to the R105 milestone Apr 30, 2018
BenFradet added the bug label Apr 30, 2018
alexanderdean changed the title from "Stream Enrich: Duplicated enriched events in R102" to "Stream Enrich: duplicated enriched events in R102" Apr 30, 2018
BenFradet changed the title from "Stream Enrich: duplicated enriched events in R102" to "Stream Enrich: ensure a one-to-one relationship between source and sink" Apr 30, 2018
BenFradet changed the title from "Stream Enrich: ensure a one-to-one relationship between source and sink" to "Stream Enrich: ensure a one-to-one relationship between sink and event processor" Apr 30, 2018
BenFradet (Contributor) commented Apr 30, 2018

The problem was that the same sink instance was shared by all of the KCL's IRecordProcessors. Since the KCL creates one record processor per shard, that single sink was flushed as many times as there were shards.
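
To illustrate the diagnosis above, here is a minimal Python sketch of how sharing one buffering sink across per-shard record processors can produce duplicates, and how giving each processor its own sink avoids it. This is not the Stream Enrich or KCL code: `Sink`, `RecordProcessor`, `run`, and the two-step flush are hypothetical names, and the overlap between flushes is an assumed mechanism chosen to make the effect deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Sink:
    """Hypothetical buffering sink; `stream` stands in for the enriched stream."""
    stream: list
    buffer: list = field(default_factory=list)

    def store(self, events):
        self.buffer.extend(events)

    def begin_flush(self):
        # Step 1 of a flush: copy everything that is currently buffered.
        return list(self.buffer)

    def finish_flush(self, pending):
        # Step 2 of a flush: write the copy out, then drop those events.
        self.stream.extend(pending)
        self.buffer = [e for e in self.buffer if e not in pending]


@dataclass
class RecordProcessor:
    """Hypothetical stand-in for a per-shard record processor."""
    sink: Sink

    def process(self, records):
        self.sink.store([f"enriched:{r}" for r in records])


def run(shards, share_sink):
    """Process one batch per shard, then let all processors' flushes overlap."""
    stream = []
    shared = Sink(stream)
    processors = [
        RecordProcessor(shared if share_sink else Sink(stream)) for _ in shards
    ]
    for processor, records in zip(processors, shards):
        processor.process(records)
    # Assumed interleaving: every processor snapshots before any of them finishes.
    pending = [p.sink.begin_flush() for p in processors]
    for processor, batch in zip(processors, pending):
        processor.sink.finish_flush(batch)
    return stream


shards = [["a", "b"], ["c", "d"]]
shared_out = run(shards, share_sink=True)      # one sink for all processors
dedicated_out = run(shards, share_sink=False)  # one sink per processor
print(len(shared_out), len(dedicated_out))     # prints: 8 4
```

With the shared sink, each of the two processors flushes the full buffer, so all four events are written twice. With one sink per processor, each flush only sees that processor's own events, which is the one-to-one relationship named in the issue title.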

BenFradet changed the title from "Stream Enrich: ensure a one-to-one relationship between sink and event processor" to "Stream Enrich: ensure a one-to-one relationship between sink and record processor" Apr 30, 2018
BenFradet added 2 commits that referenced this issue Apr 30, 2018
BenFradet added 2 commits that referenced this issue May 1, 2018