Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tasks: Check directly if NetworkManager is running #219

Closed
wants to merge 1 commit into from

Conversation

tyll
Copy link
Member

@tyll tyll commented May 6, 2020

The service_facts module broke for Fedora 32. Instead of relying on it,
just check directly if NetworkManager is running to determine if it is
the current provider.

closes #218

@tyll tyll requested review from richm and pcahyna May 6, 2020 20:29
@coveralls
Copy link

coveralls commented May 6, 2020

Pull Request Test Coverage Report for Build 794

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 39.353%

Totals Coverage Status
Change from base Build 793: 0.0%
Covered Lines: 876
Relevant Lines: 2226

💛 - Coveralls

shell: |
for f in /proc/*/exe;
do [[ "$(readlink -f "${f}")" == /usr/sbin/NetworkManager ]] && echo nm && break
done || echo -n initscripts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm - what about this:

command: systemctl is-enabled NetworkManager.service

that will tell you if the service is enabled. If you want to find out if it is active:

command: systemctl is-active NetworkManager.service

man systemctl

       is-active PATTERN...
           Check whether any of the specified units are active (i.e. running).
           Returns an exit code 0 if at least one is active, or non-zero
           otherwise. Unless --quiet is specified, this will also print the
           current unit state to standard output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my initial idea, too, however there is no systemctl on CentOS 6, so it would require more checks...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I tried to avoid too many extra tools to make it work in containers, too. Currently, it only depends on coreutils.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the approach in timesync also uses systemctl or service ... is it causing real problems? How would starting and enabling the service work in a container that does not have systemctl or service?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using systemctl means more complexity:

  1. If systemctl is not installed -> initscripts
  2. If systemctl is installeed
    2.1. If NetworkManager is running -> nm
    2.2 If NetworkManager is not running -> initscripts

The note about avoiding extra dependencies is why I chose that approach instead of using ps or grep etc.

However, it would be great if you describe the problems with the current approach, so we can find a solution for the problem instead of discussing alternative solutions without a clear explanation what problem they solve that the proposed solution does not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you actually worried about NetworkManager running on EL 6? From 1. it seems that you are not and if this is the case, it simplifies things, because if systemctl is not installed, it will fail, just as in case systemctl is-active is invoked on newer systems with NetworkManager not running.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you actually worried about NetworkManager running on EL 6? From 1. it seems that you are not and if this is the case, it simplifies things, because if systemctl is not installed, it will fail, just as in case systemctl is-active is invoked on newer systems with NetworkManager not running.

No, I don't care about NetworkManager running on EL 6. What is the full pattern with systemctl is-active that you propose, maybe this will clarify things. For, the problem with 1. was that it would fail the command module while 2.2 would report a different return code. But it seems that a different return code fails the module as well.

Something like this seems to work, too. Do you like this better? Do you prefer a construct with failing module that is ignored and then the failure is checked?

ansible -i rhel6, -m shell -a "systemctl is-enabled NetworkManager.service &>/dev/null  && echo nm || echo initscripts" all

@pcahyna
Copy link
Member

pcahyna commented May 6, 2020

Sorry, I am against reimplementing pieces of Ansible in shell code. If Ansible is broken for Fedora 32, we have to disable it in the CI.

@tyll
Copy link
Member Author

tyll commented May 6, 2020

Sorry, I am against reimplementing pieces of Ansible in shell code.

The module is now broken for the second time since it was being used in this role. Also the timesync role uses a shell module for a similar approach: https://github.com/linux-system-roles/timesync/blob/master/library/timesync_provider.sh
What is the actual problem here with this patch? The code is a lot simpler than the Ansible code. Once we can drop support for CentOS 6, we can easily adjust it to use systemctl.

If Ansible is broken for Fedora 32, we have to disable it in the CI.

Fedora 32 was released recently, so it is too late to ignore it in the CI. Leaving the role intentionally broken for Fedora users is very bad in my opinion.

@pcahyna
Copy link
Member

pcahyna commented May 7, 2020

Leaving the role intentionally broken for Fedora users is very bad in my opinion.

It is primarily the Ansible module that is broken for Fedora users, the role being broken is just a corollary. I filed https://bugzilla.redhat.com/show_bug.cgi?id=1832625 and a PR, which will help everyone, not just the network role users.

@pcahyna
Copy link
Member

pcahyna commented May 7, 2020

If the goal is to check whether NetworkManager is running, isn't a good option to ask NetworkManager directly using dbus, instead of checking the existence of the process or asking the init/service system?

@pcahyna
Copy link
Member

pcahyna commented May 7, 2020

    dbus-send --system --print-reply \
        --dest=org.freedesktop.DBus \
        /org/freedesktop/DBus \
        org.freedesktop.DBus.GetNameOwner \
        string:"org.freedesktop.NetworkManager" >/dev/null 2>&1

(from /etc/sysconfig/network-scripts/network-functions)

https://github.com/fedora-sysv/initscripts/blob/3ac46fd2fd5222917efb02eaa6abd9eae90086b0/network-scripts/network-functions#L285

@tyll
Copy link
Member Author

tyll commented May 7, 2020

    dbus-send --system --print-reply \
        --dest=org.freedesktop.DBus \
        /org/freedesktop/DBus \
        org.freedesktop.DBus.GetNameOwner \
        string:"org.freedesktop.NetworkManager" >/dev/null 2>&1

(from /etc/sysconfig/network-scripts/network-functions)

https://github.com/fedora-sysv/initscripts/blob/3ac46fd2fd5222917efb02eaa6abd9eae90086b0/network-scripts/network-functions#L285

This adds a dependency on dbust-tools which is not installed by default and actually seems to be missing as a dependency for network-scripts, too.

[root@f32-cloud ~]# yum install dbus-tools
Last metadata expiration check: 0:02:15 ago on Thu 07 May 2020 12:50:30 PM UTC.
Dependencies resolved.
============================================================================================================================================================================================================
 Package                                          Architecture                                 Version                                                   Repository                                    Size
============================================================================================================================================================================================================
Installing:
 dbus-tools                                       x86_64                                       1:1.12.16-4.fc32                                          fedora                                        56 k

Transaction Summary
============================================================================================================================================================================================================
Install  1 Package

Total download size: 56 k
Installed size: 134 k
Is this ok [y/N]: y
Downloading Packages:
dbus-tools-1.12.16-4.fc32.x86_64.rpm                                                                                                                                        2.4 MB/s |  56 kB     00:00    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                       104 kB/s |  56 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                    1/1 
  Installing       : dbus-tools-1:1.12.16-4.fc32.x86_64                                                                                                                                                 1/1 
  Running scriptlet: dbus-tools-1:1.12.16-4.fc32.x86_64                                                                                                                                                 1/1 
  Verifying        : dbus-tools-1:1.12.16-4.fc32.x86_64                                                                                                                                                 1/1 

Installed:
  dbus-tools-1:1.12.16-4.fc32.x86_64                                                                                                                                                                        

Complete!
[root@f32-cloud ~]# (. /etc/sysconfig/network-scripts/network-functions; is_nm_running && echo "yes")
yes

@tyll
Copy link
Member Author

tyll commented May 7, 2020

Filed a Fedora bug for network-scripts: https://bugzilla.redhat.com/show_bug.cgi?id=1832897

@tyll
Copy link
Member Author

tyll commented May 7, 2020

@tyll
Copy link
Member Author

tyll commented May 7, 2020

If the goal is to check whether NetworkManager is running, isn't a good option to ask NetworkManager directly using dbus, instead of checking the existence of the process or asking the init/service system?

IMHO, checking for the process is good enough. What's the problem that you see with this check?

@pcahyna
Copy link
Member

pcahyna commented May 7, 2020

Sorry, I am against reimplementing pieces of Ansible in shell code.

The module is now broken for the second time since it was being used in this role.

It definitely then warrants a separate discussion on how to prevent it in the future, thanks for bringing it up.

Also the timesync role uses a shell module for a similar approach: https://github.com/linux-system-roles/timesync/blob/master/library/timesync_provider.sh

Note that I also advocated using service_facts at that time if you look at the history of that module. But the situation is not analogous because service_facts did not provide the functionality needed (at least not at that time) and the logic is more complex (also it is encapsulated in a module which is where complex shell(/python/whatever programming language) logic properly belongs instead of "one-liners" in the playbook, and it uses documented interfaces like systemctl and service.)

What is the actual problem here with this patch? The code is a lot simpler than the Ansible code.

Simplicity must be in the eye of the beholder, because to me accessing a data structure which already holds the information that we need is much simpler than invoking a complex shell command riddled with implicit implementation assumptions on how things work internally (are you sure that NetworkManager will always be at /usr/sbin/NetworkManager or that it will be the running executable or that shell is actually invoking bash [*] ?). The use of shell is also contrary to Ansible and ours best practices (oasis-roles/meta_standards#21) which advise to use dedicated modules and if not possible, prefer command over shell. Application to our situation: the dedicated module is service_facts, or we can ask a documented command which knows the answer (I suggested dbus-send but something like nmcli --terse --fields state general could be also a viable option, maybe even for the network-scripts package problem).

Concerning more specific problems: how will this work in check mode or how will it avoid reporting changes when the role does not need to make any? What does when: true mean (literally a truism)?

Once we can drop support for CentOS 6, we can easily adjust it to use systemctl.

It is not just CentOS 6, it it RHEL 6 too, right? Which is years ahead (EUS etc.)

If Ansible is broken for Fedora 32, we have to disable it in the CI.

Fedora 32 was released recently, so it is too late to ignore it in the CI.

Quite to the contrary, it is too early to add it, since Ansible itself does not seem to have Fedora 32 in their CI, if I understand correctly ansible/ansible#69222 . (There is a nice expressions in French: "aller plus vite que la musique".)

[*] while this role supports only distributions where those assumptions may be to some extent warranted, there is interest on extending our roles to support other distributions, so this sets a bad example for others.

@tyll tyll mentioned this pull request May 11, 2020
@tyll
Copy link
Member Author

tyll commented May 11, 2020

What is the actual problem here with this patch? The code is a lot simpler than the Ansible code.

Simplicity must be in the eye of the beholder, because to me accessing a data structure which already holds the information that we need is much simpler than invoking a complex shell command riddled with implicit implementation assumptions on how things work internally (are you sure that NetworkManager will always be at /usr/sbin/NetworkManager or that it will be the running executable or that shell is actually invoking bash [*] ?). The use of shell is also contrary to Ansible and ours best practices (oasis-roles/meta_standards#21) which advise to use dedicated modules and if not possible, prefer command over shell. Application to our situation: the dedicated module is service_facts, or we can ask a documented command which knows the answer (I suggested dbus-send but something like nmcli --terse --fields state general could be also a viable option, maybe even for the network-scripts package problem).

The complexity is in the service_facts module that optimizes for a different use-case (getting the information for all services) while the role is only interested in one service. Using busctl --system call org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus GetNameOwner s org.freedesktop.NetworkManager >/dev/null 2>&1 would work for me, too.

Concerning more specific problems: how will this work in check mode or how will it avoid reporting changes when the role does not need to make any? What does when: true mean (literally a truism)?

Good points, I will take a look into this.

If Ansible is broken for Fedora 32, we have to disable it in the CI.

Fedora 32 was released recently, so it is too late to ignore it in the CI.

Quite to the contrary, it is too early to add it, since Ansible itself does not seem to have Fedora 32 in their CI, if I understand correctly ansible/ansible#69222 . (There is a nice expressions in French: "aller plus vite que la musique".)

In my opinion, the beat comes from Fedora not from Ansible. So when Fedora is released, the role should be ready instead of starting to develop against it. This is what Fedora Rawhide or Branched are there for.

[*] while this role supports only distributions where those assumptions may be to some extent warranted, there is interest on extending our roles to support other distributions, so this sets a bad example for others.

I am flexible to adjust and adapt when this happens. I do not see a need to adjust to potential requirements that might not be valid anymore when this will happen.

The service_facts module broke for Fedora 32. Instead of relying on it,
just check directly if NetworkManager is running to determine if it is
the current provider.
@tyll tyll added the incomplete Work in Progress - not ready to be merged label Jun 3, 2020
@tyll tyll closed this Jul 10, 2020
@tyll tyll reopened this Jul 10, 2020
@tyll tyll closed this Jul 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
incomplete Work in Progress - not ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Service facts seem to fail on Fedora 32
4 participants