Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libsubprocess: fix mem-leak on subprocess server teardown #1956

Merged
merged 5 commits into from Jan 25, 2019

Conversation

@chu11
Copy link
Contributor

commented Jan 25, 2019

Fix mem-leak when subprocess server is being torn down but a local subprocess has not yet completed.

chu11 added 4 commits Jan 24, 2019
When the subprocess server is torn down, kill all subprocesses
that are still running.
In the subprocess server, free / destroy message copy via aux
free function instead of manually freeing / destroying the message
copy.
Cleanup subprocesses in the primary server subprocess hash via
a free function.

This refactoring has an important side effect.  When a server is
torn down, the destruction of the subprocesses hash will lead to
all remaining subprocesses memory being destroyed / freed properly.

Fixes #1930
@grondo

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2019

Nice! Want to throw a fix for #1955 in here as well?

Missing ZMQ_CFLAGS causes build to fail when czmq installs
headers to /usr/include/czmq instead of /usr/include.

Fixes #1955
@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2019

sure thing!

@codecov-io

This comment has been minimized.

Copy link

commented Jan 25, 2019

Codecov Report

Merging #1956 into master will increase coverage by 0.04%.
The diff coverage is 81.48%.

@@            Coverage Diff             @@
##           master    #1956      +/-   ##
==========================================
+ Coverage   80.01%   80.06%   +0.04%     
==========================================
  Files         195      195              
  Lines       34931    34948      +17     
==========================================
+ Hits        27951    27980      +29     
+ Misses       6980     6968      -12
Impacted Files Coverage Δ
src/common/libsubprocess/subprocess.c 87.72% <100%> (+0.02%) ⬆️
src/common/libsubprocess/server.c 70.91% <80.76%> (+0.91%) ⬆️
src/modules/kvs/kvs.c 66.35% <0%> (+0.14%) ⬆️
src/cmd/flux-module.c 83.96% <0%> (+0.23%) ⬆️
src/common/libflux/message.c 81.64% <0%> (+0.24%) ⬆️
src/modules/connector-local/local.c 74.81% <0%> (+1.03%) ⬆️
src/common/libflux/response.c 82.71% <0%> (+1.23%) ⬆️
@grondo

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2019

@chu11, [looks at PR title] I think you were doing too much kvs work lately! 😆

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2019

@grondo LOL fixed!

@chu11 chu11 changed the title kvs: fix mem-leak on subprocess server teardown libsubprocess: fix mem-leak on subprocess server teardown Jan 25, 2019
@garlick

This comment has been minimized.

Copy link
Member

commented Jan 25, 2019

LGTM. I'll just retry my reproducer...

Looks like subprocesses are SIGKILLed and not waited for. Just spoke with @grondo and this is prob fine for now but later on we might want something like a SIGTERM and wait() with timeout, followed by SIGKILL. We can open an issue for that later though.

@garlick

This comment has been minimized.

Copy link
Member

commented Jan 25, 2019

Passes with flying colors, even running tests with high concurrency to make the timing as unfavorable as possible.

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2019

later on we might want something like a SIGTERM and wait() with timeout, followed by SIGKILL. We can open an issue for that later though.

That's a good idea, in case the process needs to clean itself up.

@garlick

This comment has been minimized.

Copy link
Member

commented Jan 25, 2019

Thanks @chu11!

@garlick garlick merged commit 5d0a7a5 into flux-framework:master Jan 25, 2019
3 checks passed
3 checks passed
codecov/patch 81.48% of diff hit (target 80.01%)
Details
codecov/project 80.06% (+0.04%) compared to 34ed5f6
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.