Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 58141 - ThreadPool leak in fqdn lookup #61777

Merged
merged 6 commits into from
Apr 6, 2022
Merged

Conversation

waynew
Copy link
Contributor

@waynew waynew commented Mar 10, 2022

What does this PR do?

What it says on the tin.

What issues does this PR fix or reference?

Fixes #58141

Previous Behavior

If an error occurred within the inner fqdn lookup function then the ThreadPool
would not cleanup, causing a thread leak every time the fqdn lookup happened.
Which could have been a lot, since grains uses this lookup. Especially since
the grain lookup may have crashed, so every call would require a re-compute of
the fqdn.

New Behavior

Changed to ThreadPoolExecutor which has the same API except that it can be used
in a with block to automagically ensure tha the ThreadPool is cleaned up when
it's done.

Merge requirements satisfied?

[NOTICE] Bug fixes or features added to Salt require tests.

- [ ] Docs

Commits signed with GPG?

Yes

waynew added 3 commits March 10, 2022 12:22
I wasn't positive what other errors could be raised here, but we did
have a user reporting thread leaks, and the existing code *could* have
thread leaks if unhandled exceptions are raised on the ThreadPool.

This test *should* work - there might be some weird circumstances where
other threads are magically closed between here? But that seems less
likely.
While working on making the ThreadPool fixes I discovered a slight lack
of tests and opportunities for refactoring. These tests pass with the
existing setup, and will help verify that my refactoring changes do not
harm anything.
Before if a unaccounted for error happened during fqdn lookup then the
ThreadPool would exit without cleaning up. If, for instance you have a
malformed PTR record then something bad would happen with the lookup and
the fqdn function would exit without cleaning up the ThreadPool.

This fixed saltstack#59705 which had this exact problem.

I refactored this to use ThreadPoolExecutor which will actually cleanup
the ThreadPool on with-block exit. I also put a broad exception handler
in the fqdn internal function. This will prevent all but the most
egregious errors (i.e. ones that don't inherit from Exception - e.g.
SystemExit) from crashing the fqdn lookup process.

Also took a couple of opportunities for refactor, such as the fact that
`sorted` takes any iterable and returns a list, so I could remove that
double-call.
@waynew waynew requested a review from a team as a code owner March 10, 2022 18:23
@waynew waynew requested review from Ch3LL and removed request for a team March 10, 2022 18:23
@Ch3LL Ch3LL added the Phosphorus v3005.0 Release code name and version label Mar 10, 2022
@waynew
Copy link
Contributor Author

waynew commented Apr 5, 2022

this is blocked by #61919

@waynew
Copy link
Contributor Author

waynew commented Apr 5, 2022

Other failures appear to just be the test machines failing 🤷 Will update when blocker is merged

@waynew waynew self-assigned this Apr 5, 2022
@waynew
Copy link
Contributor Author

waynew commented Apr 6, 2022

Just merged in the test failures, hopefully that sorts all the test failures

@github-actions
Copy link

github-actions bot commented Apr 6, 2022

Hi! I'm your friendly PR bot!

You might be wondering what I'm doing commenting here on your PR.

Yes, as a matter of fact, I am...

I'm just here to help us improve the documentation. I can't respond to
questions or anything, but what I can do, I do well!

Okay... so what do you do?

I detect modules that are missing docstrings or "CLI Example" on existing docstrings!
When I was created we had a lot of these. The documentation for these
modules need some love and attention to make Salt better for our users.

So what does that have to do with my PR?

I noticed that in this PR there are some files changed that have some of these
issues. So I'm leaving this comment to let you know your options.

Okay, what are they?

Well, my favorite, is that since you were making changes here I'm hoping that
you would be the most familiar with this module and be able to add some other
examples or fix any of the reported issues.

If I can, then what?

Well, you can either add them to this PR or add them to another PR. Either way is fine!

Well... what if I can't, or don't want to?

That's also fine! We appreciate all contributions to the Salt Project. If you
can't add those other examples, either because you're too busy, or unfamiliar,
or you just aren't interested, we still appreciate the contributions that
you've made already.

Whatever approach you decide to take, just drop a comment here letting us know!

Detected Issues (click me)
Check Known Missing Docstrings...........................................Failed
- hook id: invoke
- duration: 1.7s
- exit code: 1

The function 'fqdns' on 'salt/modules/network.py' does not have a 'CLI Example:' in it's docstring
Found 1 errors


Thanks again!

@Ch3LL Ch3LL merged commit 7f99c75 into saltstack:master Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Phosphorus v3005.0 Release code name and version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Thread leak for salt-minion 3001
3 participants