fix spurious exit when epoll_wait is interrupted by a signal #125

Merged
merged 1 commit into from
Jul 31, 2023

alindima
Contributor

I discovered a hang in the pyroscope agent that is triggered when the Timer thread is interrupted by a signal.
Instead of retrying the epoll_wait call, the Timer thread simply exits, and no data is sent to the server.

This PR checks the epoll_wait error and retries the call if it gets EINTR.
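
For illustration only, the retry-on-EINTR pattern can be sketched with the Rust standard library alone. This is not the code from the merged commit: `retry_on_eintr` and `fake_epoll_wait` are hypothetical names, and the fake call merely simulates an interrupted `epoll_wait` so the sketch needs no external crates.

```rust
use std::io;

/// Calls `op` repeatedly until it returns anything other than an
/// "interrupted" error. `io::ErrorKind::Interrupted` is how the Rust
/// standard library represents EINTR.
fn retry_on_eintr<T>(mut op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    loop {
        match op() {
            // Interrupted by a signal: the call did no work, so try again.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Success or any other error is handed back to the caller.
            other => return other,
        }
    }
}

/// Hypothetical stand-in for epoll_wait: fails with an "interrupted"
/// error while `remaining_interruptions` is non-zero, then reports one
/// ready event.
fn fake_epoll_wait(remaining_interruptions: &mut u32) -> io::Result<usize> {
    if *remaining_interruptions > 0 {
        *remaining_interruptions -= 1;
        return Err(io::Error::from(io::ErrorKind::Interrupted));
    }
    Ok(1)
}
```

Wrapping the wait as `retry_on_eintr(|| fake_epoll_wait(&mut n))` keeps the loop alive across signals, while any error other than EINTR still reaches the caller and can end the thread.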

@alindima
Contributor Author

@omarabid @korniltsev can I get a review?

@korniltsev
Collaborator

I may look into it this week. How do I reproduce? Just SIGSTOP SIGCONT?

Collaborator

@korniltsev left a comment

Nice catch. Thanks for the investigation.

@korniltsev merged commit 3c23037 into grafana:main on Jul 31, 2023
4 of 5 checks passed
@korniltsev
Collaborator

Not related to this PR or the issue, but I noticed that timer_fd and epoll_fd are not closed when an error occurs in the epoll thread, so they appear to leak on error.
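
One possible way to avoid that kind of leak, sketched under assumptions rather than taken from the project, is to tie each descriptor to a guard whose `Drop` implementation closes it, so that every early return releases it. `CloseOnDrop` and `run_epoll_thread` are hypothetical names, and the guard only counts "closes" instead of calling `close(2)`, to keep the example free of external crates. In real code, the standard library's `std::os::fd::OwnedFd` closes its descriptor on drop.

```rust
use std::cell::Cell;

/// Hypothetical guard standing in for an owned file descriptor.
/// It records a "close" when it goes out of scope; a real guard
/// would close the descriptor at that point.
struct CloseOnDrop<'a> {
    closed: &'a Cell<u32>,
}

impl Drop for CloseOnDrop<'_> {
    fn drop(&mut self) {
        self.closed.set(self.closed.get() + 1);
    }
}

/// Hypothetical thread body: acquires two descriptors, then either
/// fails early or finishes normally. Both guards are dropped on
/// either path, so neither descriptor is leaked.
fn run_epoll_thread(closed: &Cell<u32>, fail: bool) -> Result<(), String> {
    let _timer_fd = CloseOnDrop { closed };
    let _epoll_fd = CloseOnDrop { closed };
    if fail {
        // Early return on error: the guards still run their Drop code.
        return Err("epoll_wait failed".to_string());
    }
    Ok(())
}
```

With this shape, the error path and the success path release the same resources without any explicit cleanup call at each return site.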
