-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
service: serialize information about currently executing command #5354
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good, but a few notes!
src/core/service.c
Outdated
first = s->exec_command[id]; | ||
current = control ? s->control_command : s->main_command; | ||
|
||
/* Figure out where we are in the list be walking back to the beginning */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: be → by
src/core/service.c
Outdated
assert(c); | ||
|
||
index = service_exec_command_index(u, id, control); | ||
args = strv_join(c->argv, " "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing OOM check...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(or maybe it's even OK if you ignore the OOM condition, as there's little you could do about it. But if you ignore it, then it shouldn't be used as-is in the fprintf() invocation below. Consider simply wrapping it in strempty() there, so that even if this fails we enter a defined state...
src/core/service.c
Outdated
index = service_exec_command_index(u, id, control); | ||
args = strv_join(c->argv, " "); | ||
|
||
fprintf(f, "%s-command=%s %d %s %s\n", type, service_exec_command_to_string(id), index, c->path, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, so we might want to protect ourselves here from command lines that contain a newline. Those would result in borked serialization. I think it would be best to use something smarter than strv_join() for serializing the command line: build your own loop, and concatenate the arguments manually, after processing each individually through cescape(), escaping everything weird, including newlines and spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed that systemd already throws away new lines and other "unimportant whitespace" while parsing unit files, so argv member never points to string that contains garbage whitespace like new lines.
Btw if above is not the case, shouldn't we fix that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, fprintf() just writes stuff to the serialization as it is, including all whitespaces and weird chars. The idea is to escape what needs escaping beforehand.
Note that the serialization really is just a text file, not more, you could view it with a text editor, and that's totally OK.
During deserialization we process line-by-line, \n is our record separator if you so will, and if you insert additional \n within one record (i.e. line), then this will be read as two records...
src/core/service.c
Outdated
@@ -2441,6 +2490,109 @@ static int service_deserialize_item(Unit *u, const char *key, const char *value, | |||
s->watchdog_override_enable = true; | |||
s->watchdog_override_usec = watchdog_override_usec; | |||
} | |||
} else if (streq(key, "main-command") || streq(key, "control-command")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
STR_IN_SET...
src/core/service.c
Outdated
|
||
state = STATE_EXEC_COMMAND_TYPE; | ||
|
||
FOREACH_WORD(word, l, value, parser_state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't use FOREACH_WORD() in new code anymore. Use a look around extract_first_word() instead...
src/core/service.c
Outdated
|
||
temp = strndup(word, l); | ||
if (!temp) | ||
return log_oom(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, so, we can't really recover from real errors here. I think the best we can do on OOM is log, stop processing this line but proceed with the next, and hence not return with -ENOMEM here.
src/core/service.c
Outdated
index++; | ||
|
||
return index; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that this function cannot really fail, and can't validly return a negative number I think it would be best to use an unsigned return type for this, such as simply "unsigned"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the return value is still int here....
src/core/service.c
Outdated
state = STATE_EXEC_COMMAND_ID; | ||
break; | ||
case STATE_EXEC_COMMAND_ID: | ||
r = sscanf(temp, "%d", &index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use safe_atou() instead!
src/core/service.c
Outdated
temp = NULL; | ||
state = STATE_EXEC_COMMAND_ARGS; | ||
|
||
assert(path_is_absolute(path)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
during deserialization we try to be defensive when it comes to data we read, after all we might do live updates from an older systemd version (or even live downgrades from a newer one!), and hence parse failures should not be fatal, and not even something we log about at a level higher than debug... An assert() is definitely too hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, previous systemd version either doesn't have the code for serialization of exec commands and we never reach this code, or it has that code and it better do it properly. If it is serialized "incorrectly" by previous systemd version then something is very broken.
but I can make it non fatal for sure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, again, in theory downgrades are possible too, and you never know what the future brings. The idea is to write this stuff as defensibly as possible and just ignore bits we can't process and proceed with the next, instead of choking on them
src/core/service.c
Outdated
else if (command) | ||
s->main_command = command; | ||
else | ||
log_unit_notice(u, "Previously executed command vanished from the unit file, execution of the command list won't be resumed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given how complex this is I think this bit should be mostly moved to its own function.
test/TEST-16-ISSUE-518/avocado.conf
Outdated
@@ -0,0 +1,2 @@ | |||
[datadir.paths] | |||
logs_dir = /tmp/job-results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need avocado-framework
?
This test doesn't run tests inside VM/container
(it uses the installed systemd, not the one from the built source tree). Why should we place this test in test/TEST-*-ISSUE-*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/tree/debian/tests/upstream can't run this test of course. So, ubuntu-CI will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CentOS CI doesn't run this test too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need avocado-framework?
We don't necessarily. But I think that framework is actually quite nice for writing high-level "black box" tests. It would be nice to agree (or at least advise to people) on the test framework if they need to write "integration test" for systemd. Anyway, I wrote test mostly for myself, in order to make sure that patch looks as intended. We don't have to include it upstream.
This test doesn't run tests inside VM/container (it uses the installed systemd, not the one from the built source tree). Why should we place this test in test/TEST--ISSUE-?
I don't follow? AFAICT, there are no other "issue specific" tests anywhere else in systemd source tree, except test/. This test exercises new code added to fix issue #518 so I added it to test/ directory.
Yes it will use systemd from the system but I don't think it is a job of the test case to prepare test environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to agree (or at least advise to people) on the test framework if they need to write "integration test" for systemd
Well, test/networkd-test.py
, test/sysv-generator-test.py
use unittest
there are no other "issue specific" tests anywhere else in systemd source tree, except test/
Hm, my bad 09f6f45
Anyway, https://github.com/systemd/systemd/blob/master/test/README.testsuite mentions "the extended testsuite". I think we shouldn't mix that with other frameworks.
Yes it will use systemd from the system but I don't think it is a job of the test case to prepare test environment
Maybe, but we should not place such tests into test/TEST-*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As another option. There are some "beakerlib"-tests https://github.com/systemd/systemd-centos-ci/tree/master/slave/tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really stick to the framework we already have instead of adding new infrastructure just for one test...
@poettering hopefully I addressed all issues that you've pointed out. @evverx note that test now uses only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of more notes! sorry for not noticing them earlier!
src/core/service.c
Outdated
index++; | ||
|
||
return index; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the return value is still int here....
src/core/service.c
Outdated
|
||
args = strv_join(escaped_args, " "); | ||
if (!args) | ||
return -ENOMEM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I think instead of preparing a list and then calling strv_join() and throwing the list away again, it would be better to just prepare the string directly. something like this:
…
size_t length = 0, allocated = 0;
char *args = NULL;
…
STRV_FOREACH(arg, c->argv) {
size_t n;
char *e;
e = cescape(arg);
if (!e)
return -ENOMEM;
n = strlen(e);
if (!GREEDY_REALLOC(args, allocated, length + 1 + n + 1))
return -ENOMEM;
if (length > 0)
args[length++] = ' ';
memcpy(args + length, e, n);
length += n;
}
if (!GREEDY_REALLOC(args, allocated, length + 1)
return -ENOMEM;
args[length++] = 0;
…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and I just noticed: this should be xunescape(…, WHITESPACE) and not plain cescape() so that the inner whitespace is properly escaped, so that we don't get confused by it while parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that code could have been written more clever but I tried to optimize for the reader and tried to make it easy to grok what is going on. In general, only very small subset of units has main_command or control_command running at reload time hence I didn't bother optimizing too much since this code is going to be skipped for majority of units anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And honestly I don't even think that putting string together as we go is necessarily faster. And again I find proposed code much more elegant and easier to read (less manual juggling with the raw memory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the return value is still int here....
Shouldn't we turn this into error. I can't really think of situation when I'd actually want this mismatched return types and possible silent misinterpretation of data to occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, the direct version isn't even that much longer... And faster too. malloc() is very expensive after all...
To me as a read this just raises all kinds of questions: why turn this into an strv first, if we never actually need that... I am pretty sure it's reasier to grok if we don't take extra detours...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the return code of that function is still int, but it can't actually return any error, and we blindly return an "unsigned". I am pretty sure we should just stick to one type here, or make the cast explicit. But I think it's actually nice to just return "unsigned" here, simply to indicate that failure isn't possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both approaches are fine. And like @msekletar said, efficiency isn't an issue here.
src/core/service.c
Outdated
if (!args) | ||
return -ENOMEM; | ||
|
||
fprintf(f, "%s-command=%s %d %s %s\n", type, service_exec_command_to_string(id), index, c->path, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c->path needs escaping too, as it might container weird chars too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
src/core/service.c
Outdated
for (;;) { | ||
_cleanup_free_ char *arg = NULL; | ||
|
||
r = extract_first_word(&value, &arg, NULL, EXTRACT_CUNESCAPE_RELAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any specific reason for using EXTRACT_CUNESPCAE_RELAX here? I fiugre we should use that only if we are not the ones defining the format. But in this case we are...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No specific reason, EXTRACT_CUNESCAPE_RELAX looked appropriate.
src/core/service.c
Outdated
if (!candidate_args) | ||
return -ENOMEM; | ||
|
||
found = streq(args, candidate_args) && streq(command->path, path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, why first format the arguments into a string? please use strv_equal() to compare the arguments directly.
src/core/service.c
Outdated
return -ENOMEM; | ||
|
||
if (streq(args, candidate_args) && streq(command->path, path)) | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot about strv_equal()
. Thanks!
src/core/service.c
Outdated
else if (command) | ||
s->main_command = command; | ||
else | ||
log_unit_notice(u, "Previously executed command vanished from the unit file, execution of the command list won't be resumed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this should actually be upgraded to warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, warning is more appropriate. I figure this will happen rarely anyway.
@msekletar , I see, thanks Anyway,
https://github.com/systemd/systemd/blob/master/test/Makefile
We shouldn't place random tests there. Try to put into CI failed. See #5354 (comment) |
Or keep it as You should be able to run the upstream test with something like
Try it on an exising test first to ensure you have QEMU et al installed, then try it on your new one.
Yeah, if it's going to be a custom one like test/networkd-test.py I can add a new autopkgtest for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the testing framework. Not sure where it should live, but that's a minor issue.
I know this is a bit of a bike-shed issue, but still: why do you want to compare the command line including the command line args? In practice, command line args might include ephemeral stuff that changes after reload. For example some command line options that you changed and are reloading the unit for. This checking of arguments makes everything more complicated, but I don't see the gain. Why not just make the rules simpler and only compare the argv0?
test/TEST-16-ISSUE-518/test.py
Outdated
class ExecutionResumeTest(unittest.TestCase): | ||
def setUp(self): | ||
self.unit = 'test-issue-518.service' | ||
self.unitfile_path = '/run/systemd/system/{0}'.format(self.unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No "0" needed.
test/TEST-16-ISSUE-518/test.py
Outdated
import os | ||
import tempfile | ||
|
||
from pydbus import SystemBus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a new dep. Please add to README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or perhaps use gi.repository import Gio, GLib
directly, i. e. the GDBus API: like Gio.bus_get_sync(Gio.BusType.SYSTEM,) None)
or Gio.DBusProxy.new_sync()
etc. pydbus just wraps that, and at least for these simple cases using a wrapper doesn't really buy much.
test/TEST-16-ISSUE-518/test.py
Outdated
import tempfile | ||
|
||
from pydbus import SystemBus | ||
from enum import Enum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's an external dep in Python 2 that needs adding to the README.The python2-enum34 package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i thought we settled on strictly requiring python3 for everything, no? shouldn't we do the same here, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, right; I wrote that because there is no Python 3 in Centos 7 (where we run the "default" test), but there we just run the complete build without Python support:
checking for a Python interpreter with version >= 3... none
So I guess py 3 only is fine.
src/core/service.c
Outdated
|
||
args = strv_join(escaped_args, " "); | ||
if (!args) | ||
return -ENOMEM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both approaches are fine. And like @msekletar said, efficiency isn't an issue here.
test/TEST-16-ISSUE-518/test.py
Outdated
@@ -0,0 +1,186 @@ | |||
#!/usr/bin/python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All other python scripts are using python3. #4761
I am currently working on systemd/systemd#5354 and PR also introduces first python based test to TEST-* series. This commit is prerequisite for running that test in Cent OS CI. https://www.softwarecollections.org/en/scls/rhscl/rh-python35/
I am ready to resubmit the PR but I have a problem with new test. If we strictly require that all our python code runs under python3 then I will drop the test from PR. I am using pydbus library that requires gobject introspection python binding and this binding is not available on CentOS 7. |
@martinpitt Any suggestions what to do about the test? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to change the build system to only run the test if HAVE_PYTHON
.
test/TEST-16-ISSUE-518/test.py
Outdated
import os | ||
import tempfile | ||
|
||
from pydbus import SystemBus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or perhaps use gi.repository import Gio, GLib
directly, i. e. the GDBus API: like Gio.bus_get_sync(Gio.BusType.SYSTEM,) None)
or Gio.DBusProxy.new_sync()
etc. pydbus just wraps that, and at least for these simple cases using a wrapper doesn't really buy much.
test/TEST-16-ISSUE-518/test.py
Outdated
self.write_unit_file(UnitFileChange.REMOVAL) | ||
self.reload_unit() | ||
|
||
self.assertTrue(not os.path.exists(self.output_file)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, but there's assertFalse
which avoids the not
.
test/TEST-16-ISSUE-518/test.py
Outdated
os.remove(self.unitfile_path) | ||
except OSError: | ||
# ignore error if log file doesn't exist | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it happen that output_file
doesn't exist, but unitfile_path
does? Then you need to catch/ignore them independently.
@msekletar , what is the problem to name the test Or are you going to edit
too? |
Well, I don't understand why we need
. No? https://github.com/systemd/systemd/blob/master/test/networkd-test.py does exactly that. Try |
Sure I can do that. As you said we don't need pydbus for this specific case. But we (internally at Red Hat) started working on systemd DBus API test suite. We were looking for nice and easy to use dbus library for python and we've chosen pydbus. I figured it would be easier to bring this dependency in before we post our DBus API test suite. |
I've force pushed my branch. I've changed the test as suggested by @evverx. Note CentOS CI failure seems unrelated. Looks like jenkins slave was unable to communicate with Duffy API service and obtain test bed. |
test/test-exec-deserialization.py
Outdated
# Lesser General Public License for more details. | ||
# | ||
# You should have received a copy of the GNU Lesser General Public License | ||
# along with python-systemd; If not, see <http://www.gnu.org/licenses/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/python-systemd/systemd/
test/test-exec-deserialization.py
Outdated
# | ||
# You should have received a copy of the GNU Lesser General Public License | ||
# along with python-systemd; If not, see <http://www.gnu.org/licenses/>. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a comment explaining how to run the test? I have to run something like
make && make install && systemctl daemon-reexec && ./test-exec-deserialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw networkd-test
contains
# ATTENTION: This uses the *installed* networkd, not the one from the built
# source tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't notice this comment. Sure I can add similar comment to my test. Even though I think it is kinda obvious that if you don't do make && sudo make install && sudo init u
then of course you will be testing your old systemd version and not the git version.
self.check_output(expected_output) | ||
|
||
def test_added_after(self): | ||
expected_output = 'foo\nbar\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I expect foo\n
here. Why should we resume an execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I altered the test
diff --git a/test/test-exec-deserialization.py b/test/test-exec-deserialization.py
index 8dbd153..719101f 100755
--- a/test/test-exec-deserialization.py
+++ b/test/test-exec-deserialization.py
@@ -67,9 +67,9 @@ class ExecutionResumeTest(unittest.TestCase):
unit_file_content = '''
[Service]
Type=oneshot
- ExecStart=/bin/sleep 2
- ExecStart=/bin/bash -c "echo foo >> {0}"
- ExecStart=/bin/bash -c "echo bar >> {0}"
+ ExecStart=-/bin/sleep 2
+ ExecStart=-/bin/bash -c "echo foo >> {0}"
+ ExecStart=-/bin/bash -c "echo bar >> {0}"
'''.format(self.output_file)
self.unit_files[UnitFileChange.COMMAND_ADDED_AFTER] = unit_file_content
The test passed. Why should we ignore -
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I expect foo\n here. Why should we resume an execution?
Sorry, I was wrong. I expect ''
(not foo\nbar\n
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I expect foo\n here. Why should we resume an execution?
In setup_unit() we start following unit,
[Service]
Type=oneshot
ExecStart=/bin/sleep 2
ExecStart=/bin/bash -c "echo foo >> /tmp/test12345"
then we immediately replace unit file with following content and reload,
[Service]
Type=oneshot
ExecStart=/bin/sleep 2
ExecStart=/bin/bash -c "echo foo >> /tmp/test12345"
ExecStart=/bin/bash -c "echo bar >> /tmp/test12345"
Before reload unit's main-command
was sleep 2
we store that information reload and since command is still there after reload at expected place we then continue executing command list as usual and hence both echo commands get executed.
Well, it seems wrong to resume altered units. Has this approach been discussed somewhere? |
AFAICT, this approach was first proposed here, Except I am not serializing line numbers, instead I am using offsets in respective command list. Since those are not affected by addition of whitespace or configuration to unrelated sections ( |
Well |
I have fixed the comments now. |
Thanks for the link. I'll take a look.
Well, my point is that I don't want to run anything if some unit is altered somehow. Anyway, I'm reading #3202 (comment)
Why should we make an exception for |
This boils down to fundamental design trait of systemd. We have two state engines, one for jobs and one for units. However job state engine doesn't work with its own view of the world instead it uses units dependency graph and unit objects in that graph are affected by daemon-reload. Maybe the other model in which jobs would see their own copy of units they operate on would be better. However, I think we need to do something about this, at least for Honestly I'd want not to be in shoes of a sysadmin performing data recovery just to realize that last backup that was supposed to happen actually didn't. Because systemd didn't execute half of the procedure just because puppet at the same time decided to re-enable some random service that someone else disabled by accident. |
src/core/service.c
Outdated
Service *s = SERVICE(u); | ||
ServiceExecCommand id; | ||
ExecCommand *c; | ||
int index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be unsigned
.
src/core/service.c
Outdated
if (!p) | ||
return -ENOMEM; | ||
|
||
fprintf(f, "%s-command=%s %d %s %s\n", type, service_exec_command_to_string(id), index, p, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and %u
here.
src/core/service.c
Outdated
|
||
/* Let's check whether exec command on given offset matches data that we just deserialized */ | ||
for (command = s->exec_command[id], i = 0; command; command = command->command_next, i++) { | ||
if (i != index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it matters much in practice, but index
could be uninitialized here. Also path
could be NULL.
In fact, using a state machine for this seems overkill, since there's only a single trivial pass possible. Why not do something like this:
_cleanup_free_ char *idstr = NULL, *indexstr = NULL, *path = NULL;
char *arg;
r = extract_first_word(&value, &idstr, NULL, EXTRACT_CUNESCAPE);
if (r <= 0)
return r < 0 ? r : -EUCLEAN;
id = service_exec_command_from_string(idstr);
if (id < 0)
return -EINVAL;
r = extract_first_word(&value, &indexstr, NULL, EXTRACT_CUNESCAPE);
if (r <= 0)
return r < 0 ? r : -EUCLEAN;
r = safe_atou(indexstr, &index);
if (r < 0)
return -EINVAL;
r = extract_first_word(&value, &path, NULL, EXTRACT_CUNESCAPE);
if (r <= 0)
return r < 0 ? r : -EUCLEAN;
if (!path_is_absolute(path))
return -EINVAL;
while ((r = extract_first_word(&value, &arg, NULL, EXTRACT_CUNESCAPE) > 0) {
r = strv_push(&argv, arg);
if (r < 0)
break;
}
if (r < 0)
return r;
...
That's both shorter and imho more bullet-proof.
src/core/service.c
Outdated
else if (command) | ||
s->main_command = command; | ||
else | ||
log_unit_warning(u, "Previously executed command vanished from the unit file, execution of the command list won't be resumed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Previously executed", hm, it's more like "currently executed". Maybe "Current command vanished from the unit file, execution of the command list won't be continued"?
src/core/service.c
Outdated
} else if (STR_IN_SET(key, "main-command", "control-command")) { | ||
r = service_deserialize_exec_command(u, key, value); | ||
if (r < 0) | ||
log_unit_debug(u, "Failed to parse previously executed command: %s", value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log_unit_debug_errno(u, r, "Failed to parse serialized command \"%s\": %m", value);
Makefile.am
Outdated
@@ -5961,6 +5961,7 @@ EXTRA_DIST += \ | |||
units/systemd-networkd.service.m4.in \ | |||
units/systemd-networkd-wait-online.service.in \ | |||
test/networkd-test.py | |||
test/test-exec-deserialization.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing backslash!
src/core/service.c
Outdated
@@ -2140,6 +2141,83 @@ _pure_ static bool service_can_reload(Unit *u) { | |||
return !!s->exec_command[SERVICE_EXEC_RELOAD]; | |||
} | |||
|
|||
static unsigned service_exec_command_index(Unit *u, ServiceExecCommand id, bool control) { | |||
Service *s = SERVICE(u); | |||
unsigned index = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, and sorry for noticing this earlier: there's a public glibc function "index()", I'd thus recommend not overloading the name locally here, but instead pick a different name, for example "idx" or just "i", or something like that. (old gcc even used to warn about this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two minor other issues, you are welcome to ignore, as they are suggestions, not more.
src/core/service.c
Outdated
@@ -2140,6 +2141,83 @@ _pure_ static bool service_can_reload(Unit *u) { | |||
return !!s->exec_command[SERVICE_EXEC_RELOAD]; | |||
} | |||
|
|||
static unsigned service_exec_command_index(Unit *u, ServiceExecCommand id, bool control) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it would be nicer to drop the "control" argument here, and instead simply take an ExecCommand* parameter, which is then either passed as s->control_command
or s->main_command
.
src/core/service.c
Outdated
} else { | ||
type = "main"; | ||
c = s->main_command; | ||
id = SERVICE_EXEC_START; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar as above, instead of taking a "control" bool, I'd just take these three values as function parameters, so that the caller picks them the right way
Stored information will help us to resume execution after the daemon-reload. This commit implements following scheme, * On serialization: - we count rank of the currently executing command - we store command type, its rank and command line arguments * On deserialization: - configuration is parsed and loaded - we deserialize stored data, command type, rank and arguments - we look at the given rank in the list and if command there has same arguments then we restore execution at that point - otherwise we search respective command list and we look for command that has the same arguments - if both methods fail we do not do not resume execution at all To better illustrate how does above scheme works, please consider following cases (<<< denotes position where we resume execution after reload) ; Original unit file [Service] ExecStart=/bin/true <<< ExecStart=/bin/false ; Swapped commands ; Second command is not going to be executed [Service] ExecStart=/bin/false ExecStart=/bin/true <<< ; Commands added before ; Same commands are problematic and execution could be restarted at wrong place [Service] ExecStart=/bin/foo ExecStart=/bin/bar ExecStart=/bin/true <<< ExecStart=/bin/false ; Commands added after ; Same commands are not an issue in this case [Service] ExecStart=/bin/true <<< ExecStart=/bin/false ExecStart=/bin/foo ExecStart=/bin/bar ; New commands interleaved with old commands ; Some new commands will be executed while others won't ExecStart=/bin/foo ExecStart=/bin/true <<< ExecStart=/bin/bar ExecStart=/bin/false As you can see, above scheme has some drawbacks. However, in most cases (we assume that in most common case unit file command list is not changed while some other command is running for the same unit) it should cause that systemd does the right thing, which is restoring execution exactly at the point we were before daemon-reload. Fixes systemd#518
I force pushed my feature branch with new version. Please have a look. Thanks! @keszybz I left state machine in place (I mean pass is not trivial after all since there is loop over arguments). But if you insist I can remove it. Honestly, I like it because I immediately see from the code what is a structure of serialized data. |
i think all issues raised have been addressed in some form. Merging. |
This PR is trying to address long standing issue that execution of commands is not resumed after daemon-reload.
Fixes #518