Make worker depend on network-online.target to avoid networking errors #3673

Closed
Martchus wants to merge 1 commit

Martchus
Contributor

* This hopefully avoids the worker being stuck with the error
  "Address family for hostname not supported"
  (see https://progress.opensuse.org/issues/78390#note-38)
* According to the documentation "its primary purpose is network client
  software that cannot operate without network"; I suppose our worker falls
  into that category
  (from https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget)
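
For illustration, the dependency proposed here could look roughly like the unit fragment below. This is a sketch, not the actual diff of this PR: the drop-in path and the unit name in it are assumptions, while the Wants=/After= pairing follows the systemd NetworkTarget page linked above.

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/openqa-worker@.service.d/network-online.conf
[Unit]
# Pull in network-online.target and order the worker after it.
Wants=network-online.target
After=network-online.target
```

The target only delays startup if a matching "wait" service for the network management stack in use is enabled.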
Member

kraih left a comment

I've looked over the systemd docs, and this seems to be the correct solution.

Member

okurz left a comment

The systemd documentation also explains that code should always be designed to cope with starting up before the network is ready, so this is the wrong approach. Even if you say "Perl is to blame", we still need to change this.

Member

kraih commented Jan 14, 2021

Nobody said Perl was to blame. As far as we've been able to track it down so far, getaddrinfo was responsible, which is something we only have limited control over.

Contributor Author

Martchus commented Jan 14, 2021

I also don't like this solution. It is certainly bad that our process is stuck in an error state even though the error is gone. That's also why I took so much time to investigate this further before creating this PR.

However, I'm not sure what causes this problem. Most likely it is some internal caching in glibc's getaddrinfo() function, in which case Perl is not to blame. I've looked a little into the code of getaddrinfo() and there is indeed some caching involved, so this is at least plausible. It also means we have only limited control over the problem. I've asked on the glibc IRC channel but have received no reply so far. Maybe it is worth writing a simple C program to reproduce this and then filing a glibc ticket, although I'm not sure whether that would be successful. In the meantime it is likely a good idea to simply add the systemd dependency as a workaround.
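
Until a C reproducer exists, the idea can be sketched in Python, whose socket.getaddrinfo wraps the C library function on Linux. The helper below (probe_resolution, a hypothetical name) calls the resolver repeatedly within one long-lived process and records each outcome, which would show whether an early failure persists after the network has come up. It is only a sketch under those assumptions, not a confirmed reproduction of the problem; the resolver and sleep parameters exist purely to make it testable.

```python
import socket
import time


def probe_resolution(host, attempts=5, delay=1.0,
                     resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host` repeatedly in one process and record each outcome."""
    outcomes = []
    for attempt in range(attempts):
        try:
            # Port is None because only name resolution matters here.
            resolver(host, None)
            outcomes.append("ok")
        except socket.gaierror as error:
            # Keep the resolver's message so a persisting error is visible.
            outcomes.append(f"error: {error}")
        if attempt < attempts - 1:
            sleep(delay)
    return outcomes
```

Started before the network is up, a call such as probe_resolution("example.org", attempts=30, delay=2.0) (placeholder host and timings) should switch from errors to "ok" once resolution works. A process that keeps reporting the same error would match the stuck state described above.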

By the way, as far as I understand the documentation, it is mainly talking about servers there. For clients (and this PR is about a client) the quote I've put in the commit message seems more relevant.

Member

kraih commented Jan 14, 2021

That's how I interpret that section as well. The possible approaches mentioned are all server-specific.

Member

kraih commented Jan 14, 2021

I just stumbled over After=nss-lookup.target, which might also fix the specific case with getaddrinfo. And we wouldn't have to argue about the intention of a warning in the systemd docs. 😉

Member

okurz commented Jan 18, 2021

As discussed in the last weekly meeting, I think there are other potential problems that this approach could cause. But if you want to try it out on o3 or osd first and get good results, then we can also accept the change in the openQA repo here. Otherwise I suggest going with #3676 first.

@Martchus
Contributor Author

Let's go with #3676 first and let's not try two things at the same time. So I'm closing this for now.

Martchus closed this Jan 18, 2021
@Martchus
Contributor Author

I'm not sure whether this would work anyway. The documentation mentions that the right "wait" service needs to be enabled as well, but it only covers NetworkManager-wait-online.service and systemd-networkd-wait-online.service. This PR does not take care of that because I don't know what the corresponding service for Wicked is.

3 participants