
reset internal state on fork to prevent deadlocks in worker threads #139

Merged · 1 commit · May 8, 2021

Conversation

terencehonles (Contributor)

I'm opening this PR to start a discussion about behavior while forking. I believe it may be related to #89 and #31. It looks like there has been recent activity on #31 (comment), so this may need some adjustment.

This change prevents subprocesses from deadlocking after calling os.fork() by resetting the handler's internal state on fork (threads don't survive a fork, but self.queue[s].put may be holding a lock at the moment the fork happens). It also seems like boto3 sessions could end up causing issues, but I'm not sure whether the thread-safety comment is actually a problem for this library.
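
For illustration only, here is a minimal sketch of the general pattern being discussed: reinitialize the queue/thread state wherever the logging module reinitializes handler locks after a fork. The class and attribute names (`ForkSafeQueueHandler`, `queues`, `threads`) are placeholders, not watchtower's actual code:

```python
import logging

class ForkSafeQueueHandler(logging.Handler):
    def _init_state(self):
        # Fresh queues and no worker threads; a forked child must restart them.
        self.queues = {}
        self.threads = {}

    def createLock(self):
        # Called by logging.Handler.__init__(), and by the logging module in
        # the child after a fork on Python < 3.9.
        super().createLock()
        self._init_state()

    def _at_fork_reinit(self):
        # Called by the logging module in the child after a fork on Python 3.9+.
        super()._at_fork_reinit()
        self._init_state()
```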

This is still not "perfect", because the documentation for os._exit suggests it should be called when exiting a forked child, and os._exit bypasses the logging module's default shutdown routine, which includes flushing the logs. If watchtower's documentation advises flushing before exiting a forked child, then I believe this change, plus making sure flush is called, will ensure the logs are delivered.
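
As a hedged sketch (not from this PR), "flush, then os._exit()" in a forked child might look like this; `do_child_work` is a stand-in, not part of watchtower:

```python
import logging
import os

def do_child_work() -> None:
    # Placeholder for the forked child's actual work.
    logging.getLogger(__name__).info("hello from the forked child")

pid = os.fork()
if pid == 0:  # child process
    try:
        do_child_work()
    finally:
        # os._exit() skips atexit handlers and logging.shutdown(), so flush
        # each handler explicitly before exiting.
        for handler in logging.getLogger().handlers:
            handler.flush()
        os._exit(0)
else:
    os.waitpid(pid, 0)  # parent waits for the child
```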

Even without documentation suggesting to flush the logs, this change will at least make sure new threads are created to handle the reset queues. Otherwise, forked copies of this library believe the worker threads are still alive even though they are not. An alternative implementation could check whether the thread for each queue is still alive, but it would still need to decide what to do with the queued logs, and in the case of forking those logs are duplicates.

self.create_log_stream = create_log_stream
self.log_group_retention_days = log_group_retention_days
self._init_state()
terencehonles (Contributor Author):

This will already be called via the parent class's __init__, which creates the lock. Since it's fine to initialize the state twice, I made it more obvious that it happens by calling it directly. I can add a comment on super().__init__ instead if that seems better (I went with this approach in case the stdlib changed, though that seems unlikely).
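
To illustrate the call order being described (a hedged sketch with placeholder names, not the PR's actual diff): logging.Handler.__init__() calls createLock(), so _init_state() already runs once via the superclass before the explicit call.

```python
import logging

class ExampleHandler(logging.Handler):
    def __init__(self):
        super().__init__()   # Handler.__init__() -> createLock() -> _init_state()
        self._init_state()   # explicit second call; harmless, but easier to see

    def createLock(self):
        super().createLock()
        self._init_state()

    def _init_state(self):
        self.queues = {}
        self.threads = {}
```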

kislyuk (Owner) commented Jan 30, 2021

I agree with this change in principle for reinitializing state in _at_fork_reinit(). I am not convinced that it's safe to reinitialize state in createLock(). Can you please remove those lines?

While reinitializing state in _at_fork_reinit() is unambiguously good as a defensive move, it does not ensure that watchtower handlers can be used safely after forking. So in general (and on Python before 3.9), I would like to add a section to the documentation advising users to initialize watchtower handlers after forking.

terencehonles (Contributor Author) commented Apr 30, 2021

Sorry, I got pulled away from this, but it turns out we're actually hitting issues with this during uWSGI shutdown, so my attention has turned back to it.

> I agree with this change in principle for reinitializing state in _at_fork_reinit(). I am not convinced that it's safe to reinitialize state in createLock(). Can you please remove those lines?

That would mean this change only supports Python 3.9 and above, as mentioned; before 3.9 the logging module reinitializes handler locks after a fork by calling createLock() rather than _at_fork_reinit(), so on those versions it would still deadlock. If there is something else you'd like to see in createLock(), we should probably explore that.

> While reinitializing state in _at_fork_reinit() is unambiguously good as a defensive move, it does not ensure that watchtower handlers can be used safely after forking. So in general (and on Python before 3.9), I would like to add a section to the documentation advising users to initialize watchtower handlers after forking.

I'm not sure I follow the issue exactly. Yes, I agree there may be data loss. However, after forking there will not be a thread listening to the queue, and it will need to be "restarted". This code resets the internal state so that the child can restart its queues and threads.

Alternatively, we could store a weakref, or just check whether the thread is alive and try to restart it if it isn't (see the sketch below). The reason I didn't go with that approach is that it might send too much data. The only scenario where I really see data loss is if the parent forks and then execs itself into something else; otherwise the thread in the parent process will eventually send the messages that are expected to be sent, and the child should be able to continue as expected. The only other scenario I can see messing things up is if someone calls createLock() or _at_fork_reinit() outside of the expected life cycle of these objects.
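
A hedged sketch of that alternative (checking thread liveness and restarting), with placeholder names; this is not what the PR implements:

```python
import threading

class WorkerRegistry:
    """Lazily restart a worker thread if the one inherited across a fork is dead."""

    def __init__(self):
        self.threads = {}

    def ensure_worker(self, name, target):
        thread = self.threads.get(name)
        if thread is None or not thread.is_alive():
            # In a forked child the inherited Thread object exists but is no
            # longer running, so start a replacement. Any records left in an
            # inherited queue would also be sent by the parent, i.e. duplicates.
            thread = threading.Thread(target=target, daemon=True)
            self.threads[name] = thread
            thread.start()
        return thread
```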

kislyuk (Owner) commented May 8, 2021

OK, thanks. I'll have to think more about this and what kinds of warnings to add in the docs, but on the face of it this seems correct.

The safest thing to do is always to share nothing between threads/processes. That's the gist of what I was trying to get at, and what I'll try to advise in the docs.
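
As a hedged illustration of that advice (constructor arguments are omitted; check the watchtower documentation for your version), the handler can be created in the child after the fork rather than inherited from the parent:

```python
import logging
import os

import watchtower

pid = os.fork()
if pid == 0:  # child: create its own handler instead of inheriting one
    logger = logging.getLogger("worker")
    logger.setLevel(logging.INFO)
    logger.addHandler(watchtower.CloudWatchLogHandler())
    logger.info("logging from the forked child")
    logging.shutdown()  # flush before exiting
    os._exit(0)
else:
    os.waitpid(pid, 0)
```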

terencehonles deleted the handle-forking branch on May 10, 2021.
Flauschbaellchen commented

@kislyuk Are you able to release a new version so this bugfix/PR gets published? It would also help me with #141 - thanks!

Flauschbaellchen commented

@kislyuk May I ask again whether a more recent release would be possible, so that #141 can finally be reviewed and closed?
