service: serialize information about currently executing command #5354

msekletar · 2017-02-15T14:56:55Z

This PR is trying to address long standing issue that execution of commands is not resumed after daemon-reload.

Fixes #518

poettering

looks pretty good, but a few notes!

poettering · 2017-02-15T15:28:24Z

src/core/service.c

+        first = s->exec_command[id];
+        current = control ? s->control_command : s->main_command;
+
+        /* Figure out where we are in the list be walking back to the beginning */


nitpick: be → by

poettering · 2017-02-15T15:31:34Z

src/core/service.c

+        assert(c);
+
+        index = service_exec_command_index(u, id, control);
+        args = strv_join(c->argv, " ");


missing OOM check...

(or maybe it's even OK if you ignore the OOM condition, as there's little you could do about it. But if you ignore it, then it shouldn't be used as-is in the fprintf() invocation below. Consider simply wrapping it in strempty() there, so that even if this fails we enter a defined state...

poettering · 2017-02-15T15:35:16Z

src/core/service.c

+        index = service_exec_command_index(u, id, control);
+        args = strv_join(c->argv, " ");
+
+        fprintf(f, "%s-command=%s %d %s %s\n", type, service_exec_command_to_string(id), index, c->path, args);


hmm, so we might want to protect ourselves here from command lines that contain a newline. Those would result in borked serialization. I think it would be best to use something smarter than strv_join() for serializing the command line: build your own loop, and concatenate the arguments manually, after processing each individually through cescape(), escaping everything weird, including newlines and spaces.

I assumed that systemd already throws away new lines and other "unimportant whitespace" while parsing unit files, so argv member never points to string that contains garbage whitespace like new lines.

Btw if above is not the case, shouldn't we fix that instead?

well, fprintf() just writes stuff to the serialization as it is, including all whitespaces and weird chars. The idea is to escape what needs escaping beforehand.

Note that the serialization really is just a text file, not more, you could view it with a text editor, and that's totally OK.

During deserialization we process line-by-line, \n is our record separator if you so will, and if you insert additional \n within one record (i.e. line), then this will be read as two records...

poettering · 2017-02-15T15:35:42Z

src/core/service.c

@@ -2441,6 +2490,109 @@ static int service_deserialize_item(Unit *u, const char *key, const char *value,
                        s->watchdog_override_enable = true;
                        s->watchdog_override_usec = watchdog_override_usec;
                }
+        } else if (streq(key, "main-command") || streq(key, "control-command")) {


STR_IN_SET...

poettering · 2017-02-15T15:36:15Z

src/core/service.c

+
+                state = STATE_EXEC_COMMAND_TYPE;
+
+                FOREACH_WORD(word, l, value, parser_state) {


please don't use FOREACH_WORD() in new code anymore. Use a look around extract_first_word() instead...

poettering · 2017-02-15T15:38:05Z

src/core/service.c

+
+                        temp = strndup(word, l);
+                        if (!temp)
+                                return log_oom();


hmm, so, we can't really recover from real errors here. I think the best we can do on OOM is log, stop processing this line but proceed with the next, and hence not return with -ENOMEM here.

poettering · 2017-02-15T15:39:38Z

src/core/service.c

+                index++;
+
+        return index;
+}


given that this function cannot really fail, and can't validly return a negative number I think it would be best to use an unsigned return type for this, such as simply "unsigned"

the return value is still int here....

poettering · 2017-02-15T15:39:57Z

src/core/service.c

+                                state = STATE_EXEC_COMMAND_ID;
+                                break;
+                        case STATE_EXEC_COMMAND_ID:
+                                r = sscanf(temp, "%d", &index);


please use safe_atou() instead!

poettering · 2017-02-15T15:41:20Z

src/core/service.c

+                                temp = NULL;
+                                state = STATE_EXEC_COMMAND_ARGS;
+
+                                assert(path_is_absolute(path));


during deserialization we try to be defensive when it comes to data we read, after all we might do live updates from an older systemd version (or even live downgrades from a newer one!), and hence parse failures should not be fatal, and not even something we log about at a level higher than debug... An assert() is definitely too hard.

well, previous systemd version either doesn't have the code for serialization of exec commands and we never reach this code, or it has that code and it better do it properly. If it is serialized "incorrectly" by previous systemd version then something is very broken.

but I can make it non fatal for sure :)

well, again, in theory downgrades are possible too, and you never know what the future brings. The idea is to write this stuff as defensibly as possible and just ignore bits we can't process and proceed with the next, instead of choking on them

poettering · 2017-02-15T15:41:56Z

src/core/service.c

+                else if (command)
+                        s->main_command = command;
+                else
+                        log_unit_notice(u, "Previously executed command vanished from the unit file, execution of the command list won't be resumed.");


given how complex this is I think this bit should be mostly moved to its own function.

evverx · 2017-02-15T16:00:27Z

test/TEST-16-ISSUE-518/avocado.conf

@@ -0,0 +1,2 @@
+[datadir.paths]
+logs_dir = /tmp/job-results


Why do we need avocado-framework?

This test doesn't run tests inside VM/container (it uses the installed systemd, not the one from the built source tree). Why should we place this test in test/TEST-*-ISSUE-*?

BTW https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/tree/debian/tests/upstream can't run this test of course. So, ubuntu-CI will fail

CentOS CI doesn't run this test too

Why do we need avocado-framework?

We don't necessarily. But I think that framework is actually quite nice for writing high-level "black box" tests. It would be nice to agree (or at least advise to people) on the test framework if they need to write "integration test" for systemd. Anyway, I wrote test mostly for myself, in order to make sure that patch looks as intended. We don't have to include it upstream.

This test doesn't run tests inside VM/container (it uses the installed systemd, not the one from the built source tree). Why should we place this test in test/TEST--ISSUE-?

I don't follow? AFAICT, there are no other "issue specific" tests anywhere else in systemd source tree, except test/. This test exercises new code added to fix issue #518 so I added it to test/ directory.

Yes it will use systemd from the system but I don't think it is a job of the test case to prepare test environment.

It would be nice to agree (or at least advise to people) on the test framework if they need to write "integration test" for systemd

Well, test/networkd-test.py, test/sysv-generator-test.py use unittest

there are no other "issue specific" tests anywhere else in systemd source tree, except test/

Hm, my bad 09f6f45

Anyway, https://github.com/systemd/systemd/blob/master/test/README.testsuite mentions "the extended testsuite". I think we shouldn't mix that with other frameworks.

Yes it will use systemd from the system but I don't think it is a job of the test case to prepare test environment

Maybe, but we should not place such tests into test/TEST-*.

As another option. There are some "beakerlib"-tests https://github.com/systemd/systemd-centos-ci/tree/master/slave/tests

I'd really stick to the framework we already have instead of adding new infrastructure just for one test...

msekletar · 2017-02-20T16:53:20Z

@poettering hopefully I addressed all issues that you've pointed out. @evverx note that test now uses only unittest.

poettering

a couple of more notes! sorry for not noticing them earlier!

poettering · 2017-02-20T17:08:05Z

src/core/service.c

+                index++;
+
+        return index;
+}


the return value is still int here....

poettering · 2017-02-20T17:16:23Z

src/core/service.c

+
+        args = strv_join(escaped_args, " ");
+        if (!args)
+                return -ENOMEM;


hmm, I think instead of preparing a list and then calling strv_join() and throwing the list away again, it would be better to just prepare the string directly. something like this:

… size_t length = 0, allocated = 0; char *args = NULL; … STRV_FOREACH(arg, c->argv) { size_t n; char *e; e = cescape(arg); if (!e) return -ENOMEM; n = strlen(e); if (!GREEDY_REALLOC(args, allocated, length + 1 + n + 1)) return -ENOMEM; if (length > 0) args[length++] = ' '; memcpy(args + length, e, n); length += n; } if (!GREEDY_REALLOC(args, allocated, length + 1) return -ENOMEM; args[length++] = 0; …

Oh, and I just noticed: this should be xunescape(…, WHITESPACE) and not plain cescape() so that the inner whitespace is properly escaped, so that we don't get confused by it while parsing.

I know that code could have been written more clever but I tried to optimize for the reader and tried to make it easy to grok what is going on. In general, only very small subset of units has main_command or control_command running at reload time hence I didn't bother optimizing too much since this code is going to be skipped for majority of units anyway.

And honestly I don't even think that putting string together as we go is necessarily faster. And again I find proposed code much more elegant and easier to read (less manual juggling with the raw memory).

the return value is still int here....

Shouldn't we turn this into error. I can't really think of situation when I'd actually want this mismatched return types and possible silent misinterpretation of data to occur.

well, the direct version isn't even that much longer... And faster too. malloc() is very expensive after all...

To me as a read this just raises all kinds of questions: why turn this into an strv first, if we never actually need that... I am pretty sure it's reasier to grok if we don't take extra detours...

and the return code of that function is still int, but it can't actually return any error, and we blindly return an "unsigned". I am pretty sure we should just stick to one type here, or make the cast explicit. But I think it's actually nice to just return "unsigned" here, simply to indicate that failure isn't possible.

I think both approaches are fine. And like @msekletar said, efficiency isn't an issue here.

poettering · 2017-02-20T17:17:45Z

src/core/service.c

+        if (!args)
+                return -ENOMEM;
+
+        fprintf(f, "%s-command=%s %d %s %s\n", type, service_exec_command_to_string(id), index, c->path, args);


c->path needs escaping too, as it might container weird chars too

poettering · 2017-02-20T17:19:25Z

src/core/service.c

+        for (;;) {
+                _cleanup_free_ char *arg = NULL;
+
+                r = extract_first_word(&value, &arg, NULL, EXTRACT_CUNESCAPE_RELAX);


is there any specific reason for using EXTRACT_CUNESPCAE_RELAX here? I fiugre we should use that only if we are not the ones defining the format. But in this case we are...

No specific reason, EXTRACT_CUNESCAPE_RELAX looked appropriate.

poettering · 2017-02-20T17:20:47Z

src/core/service.c

+                if (!candidate_args)
+                        return -ENOMEM;
+
+                found = streq(args, candidate_args) && streq(command->path, path);


hmm, why first format the arguments into a string? please use strv_equal() to compare the arguments directly.

poettering · 2017-02-20T17:20:58Z

src/core/service.c

+                                return -ENOMEM;
+
+                        if (streq(args, candidate_args) && streq(command->path, path))
+                                break;


I forgot about strv_equal(). Thanks!

poettering · 2017-02-20T17:21:20Z

src/core/service.c

+        else if (command)
+                s->main_command = command;
+        else
+                log_unit_notice(u, "Previously executed command vanished from the unit file, execution of the command list won't be resumed.");


i think this should actually be upgraded to warning.

Yep, warning is more appropriate. I figure this will happen rarely anyway.

evverx · 2017-02-21T01:20:07Z

@evverx note that test now uses only unittest.

@msekletar , I see, thanks

Anyway, test/TEST-*-* are special directories:
https://github.com/systemd/systemd/blob/master/test/README.testsuite

The extended testsuite only works with uid=0. It contains of several
subdirectories named "test/TEST-??-*", which are run one by one.

https://github.com/systemd/systemd/blob/master/test/Makefile

	$(MAKE) -C .. all
	@for i in TEST-[0-9]*; do \
		[ -d $$i ] || continue ; \
		[ -f $$i/Makefile ] || continue ; \
		make -C $$i all ; \

We shouldn't place random tests there.

Try to put into test/test-reload-exec-serialization or so.
Also, it make sense to run this test for every PR. Maybe, @martinpitt can help

CI failed. See #5354 (comment)

martinpitt · 2017-02-21T07:05:49Z

Try to put into test/test-reload-exec-serialization or so.

Or keep it as test/TEST* and copy the boilerplate from the other tests so that it can actually run as part of the "upstream" tests. It would be the first one written in Python, but as long as it's bilingual (CentOS only has Python 2, Ubuntu minimal installations only Python 3) and you call it through $(which python3 || which python2) mytest.py it should work.

You should be able to run the upstream test with something like

sudo make -C test/TEST-01-BASIC clean setup run

Try it on an exising test first to ensure you have QEMU et al installed, then try it on your new one.

Also, it make sense to run this test for every PR. Maybe, @martinpitt can help

Yeah, if it's going to be a custom one like test/networkd-test.py I can add a new autopkgtest for it.

keszybz

I love the testing framework. Not sure where it should live, but that's a minor issue.

I know this is a bit of a bike-shed issue, but still: why do you want to compare the command line including the command line args? In practice, command line args might include ephemeral stuff that changes after reload. For example some command line options that you changed and are reloading the unit for. This checking of arguments makes everything more complicated, but I don't see the gain. Why not just make the rules simpler and only compare the argv0?

keszybz · 2017-02-21T15:27:28Z

test/TEST-16-ISSUE-518/test.py

+class ExecutionResumeTest(unittest.TestCase):
+    def setUp(self):
+        self.unit = 'test-issue-518.service'
+        self.unitfile_path = '/run/systemd/system/{0}'.format(self.unit)


No "0" needed.

keszybz · 2017-02-21T15:28:27Z

test/TEST-16-ISSUE-518/test.py

+import os
+import tempfile
+
+from pydbus import SystemBus


That's a new dep. Please add to README.

or perhaps use gi.repository import Gio, GLib directly, i. e. the GDBus API: like Gio.bus_get_sync(Gio.BusType.SYSTEM,) None) or Gio.DBusProxy.new_sync() etc. pydbus just wraps that, and at least for these simple cases using a wrapper doesn't really buy much.

keszybz · 2017-02-21T15:29:32Z

test/TEST-16-ISSUE-518/test.py

+import tempfile
+
+from pydbus import SystemBus
+from enum import Enum


Hm, that's an external dep in Python 2 that needs adding to the README.The python2-enum34 package.

hmm, i thought we settled on strictly requiring python3 for everything, no? shouldn't we do the same here, too?

Hm, right; I wrote that because there is no Python 3 in Centos 7 (where we run the "default" test), but there we just run the complete build without Python support:

checking for a Python interpreter with version >= 3... none

So I guess py 3 only is fine.

keszybz · 2017-02-21T15:38:22Z

src/core/service.c

+
+        args = strv_join(escaped_args, " ");
+        if (!args)
+                return -ENOMEM;


I think both approaches are fine. And like @msekletar said, efficiency isn't an issue here.

fsateler · 2017-02-23T13:08:51Z

test/TEST-16-ISSUE-518/test.py

@@ -0,0 +1,186 @@
+#!/usr/bin/python


All other python scripts are using python3. #4761

I am currently working on systemd/systemd#5354 and PR also introduces first python based test to TEST-* series. This commit is prerequisite for running that test in Cent OS CI. https://www.softwarecollections.org/en/scls/rhscl/rh-python35/

msekletar · 2017-03-14T14:18:46Z

I am ready to resubmit the PR but I have a problem with new test. If we strictly require that all our python code runs under python3 then I will drop the test from PR. I am using pydbus library that requires gobject introspection python binding and this binding is not available on CentOS 7.

msekletar · 2017-03-14T14:19:13Z

@martinpitt Any suggestions what to do about the test?

martinpitt

I suggest to change the build system to only run the test if HAVE_PYTHON.

martinpitt · 2017-03-14T14:55:02Z

test/TEST-16-ISSUE-518/test.py

+import os
+import tempfile
+
+from pydbus import SystemBus


or perhaps use gi.repository import Gio, GLib directly, i. e. the GDBus API: like Gio.bus_get_sync(Gio.BusType.SYSTEM,) None) or Gio.DBusProxy.new_sync() etc. pydbus just wraps that, and at least for these simple cases using a wrapper doesn't really buy much.

martinpitt · 2017-03-14T14:55:28Z

test/TEST-16-ISSUE-518/test.py

+        self.write_unit_file(UnitFileChange.REMOVAL)
+        self.reload_unit()
+
+        self.assertTrue(not os.path.exists(self.output_file))


nitpick, but there's assertFalse which avoids the not.

martinpitt · 2017-03-14T14:55:54Z

test/TEST-16-ISSUE-518/test.py

+            os.remove(self.unitfile_path)
+        except OSError:
+            # ignore error if log file doesn't exist
+            pass


Can it happen that output_file doesn't exist, but unitfile_path does? Then you need to catch/ignore them independently.

evverx · 2017-03-14T22:31:42Z

@msekletar , what is the problem to name the test test/test-exec-deserialization.py?

Or are you going to edit

too?

evverx · 2017-03-14T23:03:53Z

I am using pydbus library that requires gobject introspection python binding and this binding is not available on CentOS 7.

Well, I don't understand why we need pydbus at all. We can run

subprocess.check_call(['systemctl', 'daemon-reload'])
subprocess.check_call(['systemctl', 'start', $unit, '--job-mode', 'replace'])

. No?

https://github.com/systemd/systemd/blob/master/test/networkd-test.py does exactly that. Try git grep systemctl test/networkd-test.py. Or am I missing something?

msekletar · 2017-03-16T11:43:22Z

https://github.com/systemd/systemd/blob/master/test/networkd-test.py does exactly that. Try git grep systemctl test/networkd-test.py. Or am I missing something?

Sure I can do that.

As you said we don't need pydbus for this specific case. But we (internally at Red Hat) started working on systemd DBus API test suite. We were looking for nice and easy to use dbus library for python and we've chosen pydbus. I figured it would be easier to bring this dependency in before we post our DBus API test suite.

msekletar · 2017-03-16T16:27:49Z

I've force pushed my branch. I've changed the test as suggested by @evverx.

Note CentOS CI failure seems unrelated. Looks like jenkins slave was unable to communicate with Duffy API service and obtain test bed.

evverx · 2017-03-16T21:22:38Z

test/test-exec-deserialization.py

+#  Lesser General Public License for more details.
+#
+#  You should have received a copy of the GNU Lesser General Public License
+#  along with python-systemd; If not, see <http://www.gnu.org/licenses/>.


s/python-systemd/systemd/

evverx · 2017-03-16T21:35:03Z

test/test-exec-deserialization.py

+#
+#  You should have received a copy of the GNU Lesser General Public License
+#  along with python-systemd; If not, see <http://www.gnu.org/licenses/>.
+


Could you please add a comment explaining how to run the test? I have to run something like

make && make install && systemctl daemon-reexec && ./test-exec-deserialization

btw networkd-test contains

# ATTENTION: This uses the *installed* networkd, not the one from the built # source tree.

I didn't notice this comment. Sure I can add similar comment to my test. Even though I think it is kinda obvious that if you don't do make && sudo make install && sudo init u then of course you will be testing your old systemd version and not the git version.

evverx · 2017-03-16T21:46:33Z

test/test-exec-deserialization.py

+        self.check_output(expected_output)
+
+    def test_added_after(self):
+        expected_output = 'foo\nbar\n'


Well, I expect foo\n here. Why should we resume an execution?

I altered the test

diff --git a/test/test-exec-deserialization.py b/test/test-exec-deserialization.py index 8dbd153..719101f 100755 --- a/test/test-exec-deserialization.py +++ b/test/test-exec-deserialization.py @@ -67,9 +67,9 @@ class ExecutionResumeTest(unittest.TestCase): unit_file_content = ''' [Service] Type=oneshot - ExecStart=/bin/sleep 2 - ExecStart=/bin/bash -c "echo foo >> {0}" - ExecStart=/bin/bash -c "echo bar >> {0}" + ExecStart=-/bin/sleep 2 + ExecStart=-/bin/bash -c "echo foo >> {0}" + ExecStart=-/bin/bash -c "echo bar >> {0}" '''.format(self.output_file) self.unit_files[UnitFileChange.COMMAND_ADDED_AFTER] = unit_file_content

The test passed. Why should we ignore -?

Well, I expect foo\n here. Why should we resume an execution?

Sorry, I was wrong. I expect '' (not foo\nbar\n)

Well, I expect foo\n here. Why should we resume an execution?

In setup_unit() we start following unit,

[Service] Type=oneshot ExecStart=/bin/sleep 2 ExecStart=/bin/bash -c "echo foo >> /tmp/test12345"

then we immediately replace unit file with following content and reload,

[Service] Type=oneshot ExecStart=/bin/sleep 2 ExecStart=/bin/bash -c "echo foo >> /tmp/test12345" ExecStart=/bin/bash -c "echo bar >> /tmp/test12345"

Before reload unit's main-command was sleep 2 we store that information reload and since command is still there after reload at expected place we then continue executing command list as usual and hence both echo commands get executed.

evverx · 2017-03-16T23:22:55Z

Well, it seems wrong to resume altered units. Has this approach been discussed somewhere?

msekletar · 2017-03-17T09:39:39Z

Well, it seems wrong to resume altered units. Has this approach been discussed somewhere?

AFAICT, this approach was first proposed here,

#518 (comment)

Except I am not serializing line numbers, instead I am using offsets in respective command list. Since those are not affected by addition of whitespace or configuration to unrelated sections ([Unit], [Install]).

msekletar · 2017-03-17T09:51:14Z

The test passed. Why should we ignore -?

Well - in front of the command affects how state machine works in case that command fails but has no effect on what is going to be executed next in non-failure case, right?

msekletar · 2017-03-17T09:55:01Z

I have fixed the comments now.

evverx · 2017-03-17T10:12:15Z

AFAICT, this approached was first proposed here

Thanks for the link. I'll take a look.

Well - in front of the command affects how state machine works in case that command fails but has no effect on what is going to be executed next in non-failure case, right?

Well, my point is that I don't want to run anything if some unit is altered somehow.

Anyway, I'm reading #3202 (comment)

However, the thing is that we have tons of options on all types of units that take no immediate effect on a "systemctl daemon-reload", and if we wanted to do this we really should do this properly, and expose a new property on all units that indicates whether there are config changes detected due to a "systemctl daemon-reload" that have not taken effect because the unit was not restarted yet.

Why should we make an exception for Exec*? Should we document that?

msekletar · 2017-03-20T14:17:06Z

Why should we make an exception for Exec*? Should we document that?

This boils down to fundamental design trait of systemd. We have two state engines, one for jobs and one for units. However job state engine doesn't work with its own view of the world instead it uses units dependency graph and unit objects in that graph are affected by daemon-reload. Maybe the other model in which jobs would see their own copy of units they operate on would be better.

However, I think we need to do something about this, at least for Exec* option because currently systemd jobs are completely unreliable and stray reload can cause issues that are very hard to debug.

Honestly I'd want not to be in shoes of a sysadmin performing data recovery just to realize that last backup that was supposed to happen actually didn't. Because systemd didn't execute half of the procedure just because puppet at the same time decided to re-enable some random service that someone else disabled by accident.

keszybz · 2017-03-26T22:58:10Z

src/core/service.c

+        Service *s = SERVICE(u);
+        ServiceExecCommand id;
+        ExecCommand *c;
+        int index;


This should be unsigned.

keszybz · 2017-03-26T22:59:39Z

src/core/service.c

+        if (!p)
+                return -ENOMEM;
+
+        fprintf(f, "%s-command=%s %d %s %s\n", type, service_exec_command_to_string(id), index, p, args);


... and %u here.

keszybz · 2017-03-26T23:23:55Z

src/core/service.c

+
+        /* Let's check whether exec command on given offset matches data that we just deserialized */
+        for (command = s->exec_command[id], i = 0; command; command = command->command_next, i++) {
+                if (i != index)


Not that it matters much in practice, but index could be uninitialized here. Also path could be NULL.

In fact, using a state machine for this seems overkill, since there's only a single trivial pass possible. Why not do something like this:

_cleanup_free_ char *idstr = NULL, *indexstr = NULL, *path = NULL; char *arg; r = extract_first_word(&value, &idstr, NULL, EXTRACT_CUNESCAPE); if (r <= 0) return r < 0 ? r : -EUCLEAN; id = service_exec_command_from_string(idstr); if (id < 0) return -EINVAL; r = extract_first_word(&value, &indexstr, NULL, EXTRACT_CUNESCAPE); if (r <= 0) return r < 0 ? r : -EUCLEAN; r = safe_atou(indexstr, &index); if (r < 0) return -EINVAL; r = extract_first_word(&value, &path, NULL, EXTRACT_CUNESCAPE); if (r <= 0) return r < 0 ? r : -EUCLEAN; if (!path_is_absolute(path)) return -EINVAL; while ((r = extract_first_word(&value, &arg, NULL, EXTRACT_CUNESCAPE) > 0) { r = strv_push(&argv, arg); if (r < 0) break; } if (r < 0) return r; ...

That's both shorter and imho more bullet-proof.

keszybz · 2017-03-26T23:27:25Z

src/core/service.c

+        else if (command)
+                s->main_command = command;
+        else
+                log_unit_warning(u, "Previously executed command vanished from the unit file, execution of the command list won't be resumed.");


"Previously executed", hm, it's more like "currently executed". Maybe "Current command vanished from the unit file, execution of the command list won't be continued"?

keszybz · 2017-03-26T23:28:12Z

src/core/service.c

+        } else if (STR_IN_SET(key, "main-command", "control-command")) {
+                r = service_deserialize_exec_command(u, key, value);
+                if (r < 0)
+                        log_unit_debug(u, "Failed to parse previously executed command: %s", value);


log_unit_debug_errno(u, r, "Failed to parse serialized command \"%s\": %m", value);

keszybz · 2017-03-26T23:30:08Z

Makefile.am

@@ -5961,6 +5961,7 @@ EXTRA_DIST += \
 	units/systemd-networkd.service.m4.in \
 	units/systemd-networkd-wait-online.service.in \
 	test/networkd-test.py
+	test/test-exec-deserialization.py


Missing backslash!

poettering · 2017-03-30T13:02:36Z

src/core/service.c

@@ -2140,6 +2141,83 @@ _pure_ static bool service_can_reload(Unit *u) {
        return !!s->exec_command[SERVICE_EXEC_RELOAD];
 }

+static unsigned service_exec_command_index(Unit *u, ServiceExecCommand id, bool control) {
+        Service *s = SERVICE(u);
+        unsigned index = 0;


nitpick, and sorry for noticing this earlier: there's a public glibc function "index()", I'd thus recommend not overloading the name locally here, but instead pick a different name, for example "idx" or just "i", or something like that. (old gcc even used to warn about this)

poettering

two minor other issues, you are welcome to ignore, as they are suggestions, not more.

poettering · 2017-03-30T13:05:32Z

src/core/service.c

@@ -2140,6 +2141,83 @@ _pure_ static bool service_can_reload(Unit *u) {
        return !!s->exec_command[SERVICE_EXEC_RELOAD];
 }

+static unsigned service_exec_command_index(Unit *u, ServiceExecCommand id, bool control) {


i think it would be nicer to drop the "control" argument here, and instead simply take an ExecCommand* parameter, which is then either passed as s->control_command or s->main_command.

poettering · 2017-03-30T13:06:20Z

src/core/service.c

+        } else {
+                type = "main";
+                c = s->main_command;
+                id = SERVICE_EXEC_START;


similar as above, instead of taking a "control" bool, I'd just take these three values as function parameters, so that the caller picks them the right way

Stored information will help us to resume execution after the daemon-reload. This commit implements following scheme, * On serialization: - we count rank of the currently executing command - we store command type, its rank and command line arguments * On deserialization: - configuration is parsed and loaded - we deserialize stored data, command type, rank and arguments - we look at the given rank in the list and if command there has same arguments then we restore execution at that point - otherwise we search respective command list and we look for command that has the same arguments - if both methods fail we do not do not resume execution at all To better illustrate how does above scheme works, please consider following cases (<<< denotes position where we resume execution after reload) ; Original unit file [Service] ExecStart=/bin/true <<< ExecStart=/bin/false ; Swapped commands ; Second command is not going to be executed [Service] ExecStart=/bin/false ExecStart=/bin/true <<< ; Commands added before ; Same commands are problematic and execution could be restarted at wrong place [Service] ExecStart=/bin/foo ExecStart=/bin/bar ExecStart=/bin/true <<< ExecStart=/bin/false ; Commands added after ; Same commands are not an issue in this case [Service] ExecStart=/bin/true <<< ExecStart=/bin/false ExecStart=/bin/foo ExecStart=/bin/bar ; New commands interleaved with old commands ; Some new commands will be executed while others won't ExecStart=/bin/foo ExecStart=/bin/true <<< ExecStart=/bin/bar ExecStart=/bin/false As you can see, above scheme has some drawbacks. However, in most cases (we assume that in most common case unit file command list is not changed while some other command is running for the same unit) it should cause that systemd does the right thing, which is restoring execution exactly at the point we were before daemon-reload. Fixes systemd#518

msekletar · 2017-04-11T08:36:31Z

I force pushed my feature branch with new version. Please have a look. Thanks!

@keszybz I left state machine in place (I mean pass is not trivial after all since there is loop over arguments). But if you insist I can remove it. Honestly, I like it because I immediately see from the code what is a structure of serialized data.

poettering · 2017-04-24T17:51:25Z

i think all issues raised have been addressed in some form. Merging.

msekletar added the pid1 label Feb 15, 2017

poettering requested changes Feb 15, 2017

View reviewed changes

evverx reviewed Feb 15, 2017

View reviewed changes

keszybz added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Feb 17, 2017

msekletar force-pushed the issue-518 branch from ff644ad to 6499387 Compare February 20, 2017 16:51

msekletar removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Feb 20, 2017

poettering requested changes Feb 20, 2017

View reviewed changes

evverx added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Feb 21, 2017

keszybz reviewed Feb 21, 2017

View reviewed changes

fsateler reviewed Feb 23, 2017

View reviewed changes

test/TEST-16-ISSUE-518/test.py Outdated

@@ -0,0 +1,186 @@

#!/usr/bin/python

Copy link

Member

fsateler Feb 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other python scripts are using python3. #4761

msekletar mentioned this pull request Mar 13, 2017

bootstrap: install python3 from software collections systemd/systemd-centos-ci#13

Closed

martinpitt requested changes Mar 14, 2017

View reviewed changes

msekletar force-pushed the issue-518 branch from 6499387 to 440dfd9 Compare March 16, 2017 15:50

msekletar removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Mar 16, 2017

evverx requested changes Mar 16, 2017

View reviewed changes

msekletar force-pushed the issue-518 branch from 440dfd9 to c61aa5c Compare March 17, 2017 09:54

evverx mentioned this pull request Mar 21, 2017

core: when deserializing a unit, fully restore its cgroup state #5619

Merged

keszybz requested changes Mar 26, 2017

View reviewed changes

poettering reviewed Mar 30, 2017

View reviewed changes

poettering added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Mar 30, 2017

msekletar added 2 commits April 11, 2017 09:22

tests: add new test for issue systemd#518

123d672

msekletar force-pushed the issue-518 branch from c61aa5c to 123d672 Compare April 11, 2017 08:31

msekletar removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Apr 11, 2017

evverx mentioned this pull request Apr 11, 2017

Update bootstrap.sh and testsuite.sh to run test-exec-deserialization.py systemd/systemd-centos-ci#14

Closed

poettering merged commit 8ea9aa9 into systemd:master Apr 24, 2017

a-detiste mentioned this pull request Sep 15, 2017

cron-after-var.service hard-codes wrong path to systemctl systemd-cron/systemd-cron#54

Closed

evverx mentioned this pull request Jul 4, 2019

CentOS CI improvements systemd/systemd-centos-ci#36

Closed

8 tasks


		state = STATE_EXEC_COMMAND_TYPE;

		FOREACH_WORD(word, l, value, parser_state) {

service: serialize information about currently executing command #5354

service: serialize information about currently executing command #5354

Conversation

msekletar commented Feb 15, 2017 • edited Loading

poettering left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evverx Feb 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evverx Feb 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

msekletar commented Feb 20, 2017

poettering left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evverx commented Feb 21, 2017 • edited Loading

martinpitt commented Feb 21, 2017

keszybz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

msekletar commented Mar 14, 2017

msekletar commented Mar 14, 2017

martinpitt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evverx commented Mar 14, 2017

evverx commented Mar 14, 2017

msekletar commented Mar 16, 2017

msekletar commented Mar 16, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

msekletar commented Feb 15, 2017 •

edited

Loading

evverx Feb 15, 2017 •

edited

Loading

evverx Feb 15, 2017 •

edited

Loading

evverx commented Feb 21, 2017 •

edited

Loading

evverx Mar 16, 2017 •

edited

Loading

msekletar commented Mar 17, 2017 •

edited

Loading

msekletar commented Mar 17, 2017 •

edited

Loading