fix: deduplication logic in agent #3033
Conversation
@keyvankhademi The CI is failing :-?
Nice, please bump the version as well.
@keyvankhademi this wasn't a bug - we made a deliberate decision at the time not to censor any information sent to us by publishers and instead to handle this in lazer after the updates are persisted to the db. But yes, this will give a big reduction in the number of sent updates, at least with the publisher the dedup was originally added for.
@bplatak No, there was a real bug. See the example below:
Summary
This PR fixes the deduplication logic in the agent.
Rationale
There is a bug in the current deduplication logic: `dedup_by_key` only removes consecutive duplicates, so it misses a duplicate when an update for another feed sits between the two duplicated updates for a feed.

We also improve the logic to include at most one update per feed. This is done by keeping the last distinct update per feed, preferring the lower timestamp among equal values, because the aggregator effectively ignores the previous updates by immediately overriding them with new values.
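To make the failure mode concrete, here is a minimal sketch of the consecutive-only behavior. The `FeedUpdate` type and the values are hypothetical stand-ins, not the agent's actual types:

```rust
// Minimal sketch of the bug: `Vec::dedup_by_key` only compares
// *consecutive* elements, so an interleaved update for another feed
// hides the duplicate. `FeedUpdate` is a hypothetical stand-in.
#[derive(Debug, Clone, PartialEq)]
struct FeedUpdate {
    feed_id: u32,
    price: i64,
    timestamp: u64,
}

fn main() {
    let mut updates = vec![
        FeedUpdate { feed_id: 1, price: 100, timestamp: 10 },
        FeedUpdate { feed_id: 2, price: 200, timestamp: 11 },
        // Same feed and value as the first update, but not adjacent:
        FeedUpdate { feed_id: 1, price: 100, timestamp: 12 },
    ];
    updates.dedup_by_key(|u| (u.feed_id, u.price));
    // Nothing is removed: the non-adjacent feed-1 duplicate survives.
    assert_eq!(updates.len(), 3);
}
```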
How has this been tested?
Wrote a test that would fail with the current deduplication logic.
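A sketch of such a test, assuming a hypothetical `dedup_updates` helper with the fixed semantics described in the Rationale (at most one update per feed, last distinct value, lower timestamp preferred); the agent's real types and API will differ:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct FeedUpdate {
    feed_id: u32,
    price: i64,
    timestamp: u64,
}

/// Hypothetical stand-in for the fixed logic: keep at most one update
/// per feed, taking the last distinct value and, for an unchanged
/// value, the lower timestamp. (Output ordering is ignored here.)
fn dedup_updates(updates: Vec<FeedUpdate>) -> Vec<FeedUpdate> {
    let mut latest: HashMap<u32, FeedUpdate> = HashMap::new();
    for u in updates {
        let same_value = latest
            .get(&u.feed_id)
            .map_or(false, |prev| prev.price == u.price);
        // Only replace on a value change, so an unchanged value keeps
        // its original (lower) timestamp.
        if !same_value {
            latest.insert(u.feed_id, u);
        }
    }
    latest.into_values().collect()
}

#[test]
fn drops_non_adjacent_duplicate_and_keeps_lower_timestamp() {
    let updates = vec![
        FeedUpdate { feed_id: 1, price: 100, timestamp: 10 },
        FeedUpdate { feed_id: 2, price: 200, timestamp: 11 },
        // Duplicate of the first update; the old `dedup_by_key`-based
        // logic kept it because it is not adjacent to the original.
        FeedUpdate { feed_id: 1, price: 100, timestamp: 12 },
    ];
    let deduped = dedup_updates(updates);
    assert_eq!(deduped.len(), 2);
    // The surviving feed-1 update carries the lower timestamp.
    assert!(deduped.iter().any(|u| u.feed_id == 1 && u.timestamp == 10));
}
```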