New-offset is off between 0.2.0 and 0.3.0, resulting in reprocessing last record (or many records) on worker restart #48

Closed
2 tasks done
jkgenser opened this issue Nov 30, 2020 · 0 comments

Comments


jkgenser commented Nov 30, 2020

Checklist

  • I have included information about relevant versions
  • I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce

  • Pretty much any time you restart a worker, it replays the last message it received. So if a worker has processed messages [0,1,2,3,4] and is then restarted, it will re-process item 4. This will mess up any analytics that are based on stateful counts. With the trivial case of incrementing a counter in a table, this can be reproduced consistently by simply stopping and starting a worker: the count for the last id continues to increment even though no new messages were sent to the underlying topic.

  • When using the group_by functionality to re-partition a stream, I am finding that it replays ALL of the messages, resulting in many more duplicates than simply +1 to counts.
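
To make the first symptom concrete, here is a minimal pure-Python sketch of how a committed offset that is off by one would produce exactly this "+1 per restart" pattern. It is not Faust code and does not claim to be the actual cause in Faust. It only assumes the usual Kafka convention that the committed offset is the offset of the next message to read, i.e. last processed + 1. The names `run_worker`, `simulate` and `commit_next` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def run_worker(log, committed, counts, commit_next):
    """Process `log` starting at `committed`; return the offset committed on shutdown.

    commit_next=True  commits last processed offset + 1 (the Kafka convention).
    commit_next=False commits the last processed offset itself (assumed bug).
    """
    last = None
    for offset in range(committed, len(log)):
        counts[log[offset]] += 1  # stateful count, like a counter in a table
        last = offset
    if last is None:
        # Nothing was processed in this run, so the committed offset is unchanged.
        return committed
    return last + 1 if commit_next else last


def simulate(restarts, commit_next):
    """Run one worker over five messages, then restart it `restarts` times."""
    log = ["a"] * 5  # messages at offsets 0..4, all with the same id
    counts = defaultdict(int)
    committed = 0
    for _ in range(1 + restarts):
        committed = run_worker(log, committed, counts, commit_next)
    return counts["a"]


print(simulate(restarts=3, commit_next=True))   # 5: no replay on restart
print(simulate(restarts=3, commit_next=False))  # 8: offset 4 replayed on each restart
```

With no new messages on the topic, the count stays at 5 when the next offset is committed, and grows by one per restart when the last processed offset is committed instead. The sketch does not model the group_by case, where all messages are replayed.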

Expected behavior

  • Do not replay the most recent message.

Actual behavior

  • Replays messages on restart.

Full traceback

Paste the full traceback (if there is any)

Versions

  • Python version: 3.7
  • Faust version: 0.3.0
  • Operating system: ubuntu 18.04
  • Kafka version: latest
  • RocksDB version (if applicable)