Faust silent failure on startup introduced with PR #403 #453

Closed
austinnichols101 opened this issue Oct 20, 2019 · 3 comments

Comments

@austinnichols101

I have observed Faust intermittently failing silently on startup with the 1.8.x series, and I have traced the problem back to PR #403, commit c0daac1.

Note: special thanks to @patkivikram for helping me track this down...

Faust failing silently on startup

[2019-10-19 18:22:32,085] [46808] [INFO] [^---Recovery]: Resuming flow... 
[2019-10-19 18:22:32,085] [46808] [INFO] [^---Recovery]: Seek stream partitions to committed offsets. 
[2019-10-19 18:22:35,510] [46808] [INFO] [^--Fetcher]: Starting... 
[2019-10-19 18:22:35,510] [46808] [INFO] [^---Recovery]: Worker ready 
[2019-10-19 18:22:35,511] [46808] [INFO] [^Worker]: Ready 
[1]    46808 killed     faust --app streampunk.app worker --loglevel info

The _add_gap function in faust/transport/consumer.py is being called with a VERY large delta between offset_from and offset_to. In the screenshot below, a list is populated with every offset from 1 to 2,288,752,002, which causes Python to run out of memory and triggers the failure. Note that in the screenshot I added an exception handler to see if I could trap the error. I could not: Python fails silently.
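
To illustrate why building one list entry per offset is the problem, here is a minimal sketch. It is not the actual Faust code: `gap_as_list` and `gap_as_range` are hypothetical helpers, and the half-open `[offset_from, offset_to)` convention is an assumption. A list grows with the number of offsets in the gap, while a `range` stores only its endpoints and still supports `len()` and membership tests.

```python
import sys


def gap_as_list(offset_from: int, offset_to: int) -> list:
    """Hypothetical stand-in for the reported behaviour: one list entry per offset."""
    return list(range(offset_from, offset_to))


def gap_as_range(offset_from: int, offset_to: int) -> range:
    """Alternative representation: keep only the endpoints, materialise nothing."""
    return range(offset_from, offset_to)


# Small gap: the list's size grows with the gap, the range's size does not.
small_list = gap_as_list(1, 100_001)
small_range = gap_as_range(1, 100_001)
print(sys.getsizeof(small_list), sys.getsizeof(small_range))

# A gap of the size reported above is only safe in the range-based form.
huge = gap_as_range(1, 2_288_752_002)
print(len(huge))               # number of offsets covered by the gap
print(2_000_000_000 in huge)   # membership test without building a list
```

Calling `gap_as_list` with the reported arguments is deliberately left out, since that is the call that exhausts memory.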

PyCharm debugger output
image

Resource Monitor: Memory Usage
image

PyCharm debugger: Values passed to _add_gap
image

Versions

  • Python version 3.7.4
  • Faust version 1.8.1
  • Kafka version 2.3
@ask
Contributor

ask commented Oct 28, 2019

This is fixed in master and will be part of the 1.9 release.

@ask ask closed this as completed Oct 30, 2019
@austinnichols101
Author

austinnichols101 commented Oct 31, 2019

I'm still seeing this problem in 1.9.0.

The gap_for_tp list grows beyond available memory.

image

offset_from: 2,951,296,758
offset_to: 3,136,952,810

Target length of gap_for_tp (offset_to - offset_from): 185,656,052 (OOM)
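
As a rough check on the numbers above, the following sketch estimates how much memory a plain Python list of that many integer offsets would need. It assumes CPython, counts one pointer slot plus one int object per offset, and ignores list over-allocation, so it should be read as a lower bound rather than a measurement of Faust itself.

```python
import struct
import sys

offset_from = 2_951_296_758
offset_to = 3_136_952_810
gap_length = offset_to - offset_from     # number of offsets in the gap

pointer_bytes = struct.calcsize("P")     # one list slot holds one pointer
int_bytes = sys.getsizeof(offset_from)   # one int object of this magnitude
estimated_bytes = gap_length * (pointer_bytes + int_bytes)

print(f"{gap_length:,} offsets, roughly {estimated_bytes / 2**30:.1f} GiB")
```

On a typical 64-bit build this comes out at several gigabytes for a single partition's gap, which is consistent with the process being killed.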

@austinnichols101
Author

State before calling _add_gap:

image
