Conversation

keyvankhademi (Contributor)

Summary

This PR fixes the deduplication logic in the agent.

Rationale

There is a bug in the current deduplication logic: `dedup_by_key` only removes consecutive duplicates. It doesn't work if there is an update for another feed in between two duplicate updates for the same feed.

We also improve the logic to include at most one update per feed. This is done by keeping the last distinct update per feed, while prioritizing lower timestamps. The aggregator effectively ignores earlier updates anyway, since it immediately overrides them with newer values. A sketch of the approach follows.
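
As a rough illustration of the intended behavior (the `Update` struct and `dedup_updates` function here are hypothetical simplifications, not the agent's actual types):

```rust
use std::collections::HashMap;

// Hypothetical simplified update; the agent's real struct differs.
#[derive(Debug, Clone, PartialEq)]
struct Update {
    feed_id: u64,
    timestamp: u64,
    price: i64,
}

/// Keep at most one update per feed: the last distinct value, attributed
/// to the lowest timestamp at which that value was seen.
fn dedup_updates(updates: Vec<Update>) -> Vec<Update> {
    let mut latest: HashMap<u64, Update> = HashMap::new();
    for update in updates {
        match latest.get(&update.feed_id) {
            // Same value as the one already held: keep the existing
            // entry, which carries the lower timestamp.
            Some(prev) if prev.price == update.price => {}
            // A new distinct value supersedes the previous one, since
            // the aggregator overrides older values immediately anyway.
            _ => {
                latest.insert(update.feed_id, update);
            }
        }
    }
    let mut result: Vec<Update> = latest.into_values().collect();
    result.sort_by_key(|u| u.timestamp);
    result
}
```

Unlike `Vec::dedup_by_key`, which only compares adjacent elements, the map lookup catches duplicates regardless of what arrived in between.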

How has this been tested?

  • Current tests cover my changes
  • Added new tests
  • Manually tested the code

Wrote a test that would fail with the current deduplication logic.
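
For illustration, a test along these lines (written against the hypothetical `dedup_updates` sketch above) fails if dedup relies on `dedup_by_key` alone, because the interleaved feed-2 update breaks the consecutive run:

```rust
#[test]
fn dedups_non_consecutive_duplicates() {
    let updates = vec![
        Update { feed_id: 1, timestamp: 1, price: 10 },
        Update { feed_id: 2, timestamp: 2, price: 20 },
        // Duplicate value for feed 1, but not adjacent to the first entry.
        Update { feed_id: 1, timestamp: 3, price: 10 },
    ];
    let deduped = dedup_updates(updates);
    // Exactly one update per feed, and feed 1 keeps the lower timestamp.
    assert_eq!(deduped.len(), 2);
    assert!(deduped.iter().any(|u| u.feed_id == 1 && u.timestamp == 1));
}
```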

vercel bot commented Sep 8, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| api-reference | Ready | Preview | Comment | Sep 9, 2025 8:47pm |
| component-library | Ready | Preview | Comment | Sep 9, 2025 8:47pm |
| developer-hub | Ready | Preview | Comment | Sep 9, 2025 8:47pm |
| entropy-explorer | Ready | Preview | Comment | Sep 9, 2025 8:47pm |
| insights | Ready | Preview | Comment | Sep 9, 2025 8:47pm |
| proposals | Ready | Preview | Comment | Sep 9, 2025 8:47pm |
| staking | Ready | Preview | Comment | Sep 9, 2025 8:47pm |

danimhr (Contributor) commented Sep 9, 2025

@keyvankhademi The CI is failing :-?

ali-behjati (Collaborator) left a comment


Nice, please bump the version as well.

bplatak (Contributor) commented Sep 9, 2025

@keyvankhademi this wasn't a bug; we made a deliberate decision at the time not to censor any information sent to us by publishers, and instead to handle this in Lazer after the updates are persisted to the DB. But yes, this will give a big reduction in the number of sent updates, at least with the publisher the dedup was originally added for.

keyvankhademi (Contributor, Author)

> @keyvankhademi this wasn't a bug; we made a deliberate decision at the time not to censor any information sent to us by publishers, and instead to handle this in Lazer after the updates are persisted to the DB. But yes, this will give a big reduction in the number of sent updates, at least with the publisher the dedup was originally added for.

@bplatak No, there was a real bug. See the example below:
feed_id, ts, price
1, 1, 10
2, 2, 20
1, 3, 10
`dedup_by_key` will not remove the last entry here because it only dedups consecutive elements.
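
To make that concrete, here is a minimal sketch of the failing behavior, assuming the dedup key is the (feed_id, price) pair (the actual key used in the agent may differ):

```rust
fn main() {
    // (feed_id, ts, price) tuples from the example above.
    let mut updates = vec![(1u64, 1u64, 10i64), (2, 2, 20), (1, 3, 10)];

    // dedup_by_key only compares adjacent elements, so the non-adjacent
    // duplicate for feed 1 survives.
    updates.dedup_by_key(|&mut (feed_id, _, price)| (feed_id, price));

    assert_eq!(updates.len(), 3); // nothing was removed
}
```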
