Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

honor-first-shutdown-request #15400

Merged
merged 1 commit into from Jun 24, 2020

Conversation

ejb42
Copy link
Contributor

@ejb42 ejb42 commented Apr 10, 2020

@keszybz
Here is the new pull request I spoke of in the previous one I just closed.
I went ahead with the use of log_info instead of using log_unit_info since using manager_state() instead of manager_get_unit() precludes the use of log_unit_info. If you wish that instead I can change it but I think the use of manager_state is cleaner.
I also added a test in the test directory which incorporates the use of the test service file and script.

@ejb42 ejb42 mentioned this pull request Apr 13, 2020
@ejb42
Copy link
Contributor Author

ejb42 commented Apr 28, 2020

@keszybz Is there anything I need to do for this pull request? Is there something wrong?

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so you are only doing this for shutdowns initiated via the emergency action logic. That is intended, yeah? I think I am fine with this, I am just wondering if you are aware that this does not affect "systemctl poweroff" and so on, as those are regular, human requested shutdowns and not emergency ones...

if ((manager_state(m) == MANAGER_STOPPING) &&
((action == EMERGENCY_ACTION_REBOOT) ||
(action == EMERGENCY_ACTION_POWEROFF))) {
log_info("EmergencyAction: Shutdown is already active Skipping %s request",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, so manager_state() does a lot of things internally, most of which aren't interesting here. I think I would make the test differently:

Unit *u;
u = manager_get_unit(m, SPECIAL_SHUTDOWN_TARGET);
if (u && unit_active_or_pending(u) &&
        IN_SET(action, EMERGENCY_ACTION_REBOOT, EMERGENCY_ACTION_POWEROFF, EMERGENCY_ACTION_EXIT)) {
…

Also, I think this should be log_notice(), not log_info()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ANd yeah, EMERGENCY_ACTION_EXIT should be covered too here, it's just another way to shutdown (one you would only use in a container, but that's a detail)


[Service]
ExecStart=/test-honor-first-shutdown.sh
ExecStop=pkill -9 test-honor-first-shutdown.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExecStop=kill -KILL $MAINPID

should just work, and not pull in pkill

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pkill was the method I was using to cause termination of the loop inside the script and to cause the FailureAction=reboot emergency action during the shutdown. Otherwise the shutdown would just SIGTERM the script and no failure would be seen correct? I was thinking pkill might be a handy tool to have for the test environment. In any case, I will work on a different way so I won't have to pull in pkill.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the line above is equivalent to the pkill. It uses the $MAINPID env var systemd passes to ExecStop= calls anyway, so the pkill is not necessary. The effect of the line I proposed and yours i the same: signal 9 (SIGKILL) is sent to the process invoked via ExecStart=

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it. Will do. Thanks

@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks pid1 labels May 15, 2020
@ejb42
Copy link
Contributor Author

ejb42 commented May 15, 2020

Hmm, so you are only doing this for shutdowns initiated via the emergency action logic. That is intended, yeah? I think I am fine with this, I am just wondering if you are aware that this does not affect "systemctl poweroff" and so on, as those are regular, human requested shutdowns and not emergency ones...

My idea was that a shutdown has already been triggered via human or other way and the fix was to prevent another shutdown, via the emergency path, from interrupting that. I was thinking it would be rather difficult to manually initiate a second shutdown if one is in progress. I suppose someone could have some service that could have an ExecStop that tries to do a systemctl reboot(etc). Would that be something you would want to guard against or should we leave that to user error?

Thank you for your comments, I will work on your change suggestions.

@poettering
Copy link
Member

I think your approach is fine. If people place a "systemctl reboot" in ExecStop= they get to live with the effect...

@ejb42
Copy link
Contributor Author

ejb42 commented May 21, 2020

@poettering, I believe I have addressed all of your comments, please review when you get a chance.

[EMERGENCY_ACTION_POWEROFF_FORCE] = "poweroff-force",
[EMERGENCY_ACTION_POWEROFF_IMMEDIATE] = "poweroff-immediate",
[EMERGENCY_ACTION_EXIT] = "exit",
[EMERGENCY_ACTION_EXIT_FORCE] = "exit-force",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please align the rhs values to the same column. It's much easier to scan by eye then.

IN_SET(action, EMERGENCY_ACTION_REBOOT,
EMERGENCY_ACTION_POWEROFF, EMERGENCY_ACTION_EXIT)) {
log_notice("EmergencyAction: Shutdown is already active Skipping %s request",
emergency_action_table[action]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation, and the sentences run together. Maybe "Shutdown is already active. Skipping emergency action request %s."


[Service]
ExecStart=/test-honor-first-shutdown.sh
ExecStop=/bin/sh -x -c 'kill -SIGKILL $MAINPID'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shell is not nedeed here. Just ExecStop=kill -SIGKILL $MAINPID .

Copy link
Contributor Author

@ejb42 ejb42 Jun 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keszybz I have tried to use this ExecStop without the shell but simply cannot make it work. I found that testsuite-09 uses the same shell to implement a kill in it's ExecStop. When I remove the shell and run the test I get the following error when running the image.

TEST-48-HONORFIRSTSHUTDOWN RUN: testing honor first shutdown
unit_file_build_name_map: normal unit file: /usr/lib/systemd/tests/testdata/units/test-honor-first-shutdown.service
test-honor-first-shutdown.service: Trying to enqueue job test-honor-first-shutdown.service/start/replace
Sent message type=error sender=org.freedesktop.systemd1 destination=n/a path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=s error-name=org.freedesktop.systemd1.BadUnitSetting error-message=Unit test-honor-first-shutdown.service has a bad unit file setting.
Failed to process message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StartUnit cookie=1 reply_cookie=0 signature=ss error-name=n/a error-message=n/a: Unit test-honor-first-shutdown.service has a bad unit file setting.

I changed it to be more like what is in testsuite-09 as follows:
ExecStop=sh -c 'kill -SIGKILL $MAINPID'

If using the shell operation is not acceptable please help me understand how to fix it.

echo "Honor first shutdown test script"
while true; do
sleep 3;
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe sleep infinity instead of the loop?


# setup the testsuite service
cp ../testsuite-48.units/testsuite-48.service $initdir/etc/systemd/system
cp ../testsuite-48.units/testsuite-48.sh $initdir/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please please just do what the other tests do. There is no need to copy files.

elif [[ "$UNIFIED_CGROUP_HIERARCHY" = "default" ]]; then
_nspawn_pre=("${nspawn_pre[@]}" env --unset=UNIFIED_CGROUP_HIERARCHY --unset=SYSTEMD_NSPAWN_UNIFIED_HIERARCHY)
_nspawn_pre=("${_nspawn_pre[@]}" env --unset=UNIFIED_CGROUP_HIERARCHY --unset=SYSTEMD_NSPAWN_UNIFIED_HIERARCHY)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug should be fixed in a separate commit.

Description=Testsuite service

[Service]
ExecStart=/testsuite-48.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No files in root. Just do what the other tests do.

@poettering
Copy link
Member

poettering commented May 26, 2020

please generate "perfect" PRs, i.e. where each individual commit is a logical step that makes sense, and not a historical one. We want perfect bisectability, and that means that commits in a PR shouldn't fix up commits in the same PR. (or in other words, please squash the commits so that only logical steps remain)

@poettering
Copy link
Member

also, plese provide a useful commit msg in the first place

keszybz added a commit to keszybz/systemd that referenced this pull request May 26, 2020
@ejb42 ejb42 force-pushed the honor-first-shutdown-request branch from 0da704e to e89406d Compare June 15, 2020 01:36
@ejb42
Copy link
Contributor Author

ejb42 commented Jun 15, 2020

Additional note with this push: The tests have been failing due to a timeout issue. It may be due to my fork being so far behind the source. I will look into any failures that may occur with the current tests.

Create unit tests per established norm at position 52

check in_set first before getting unit
@ejb42 ejb42 force-pushed the honor-first-shutdown-request branch from e89406d to 9f1fa85 Compare June 23, 2020 00:25
@ejb42
Copy link
Contributor Author

ejb42 commented Jun 23, 2020

@poettering @keszybz The last commit was done to fix a numbering conflict of the test files with the main repo.

@keszybz keszybz removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jun 24, 2020
@keszybz
Copy link
Member

keszybz commented Jun 24, 2020

LGTM. I think the test could be still simplified a bit, but this isn't terribly important. Let's merge.

@keszybz keszybz merged commit a1ba8c5 into systemd:master Jun 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

None yet

3 participants