
OpenStack: set correct hostnames for masters and workers #964

Merged · 1 commit · Jul 25, 2019

Conversation

@Fedosin (Contributor) commented Jul 12, 2019

This commit sets hostnames equal to OpenStack node names.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 12, 2019

In the diff, a Python script parses the metadata and writes /etc/hostname:

    decoded_data = json.loads(data)

    with open('/etc/hostname', 'w') as f:
A Member commented on these lines:

Please add this to https://github.com/coreos/afterburn/ instead.

@cgwalters (Member) commented on the diff, Jul 12, 2019:

See e.g. coreos/afterburn#199, which landed exactly this for Azure. By contributing to afterburn, you're also contributing to Fedora CoreOS, and the functionality runs early in OS startup.

Oh also, doing this from a static pod will run straight into openshift/origin#22826
EDIT: Nevermind on the last bit, that only applies to traffic exiting from non-hostNetwork pods but you wrote a systemd service.

@cgwalters (Member)

In fact afterburn has OpenStack support today except...what's up with this change which calls the provider openstack-metadata? Is it not the same thing as the ignition.platform.id?

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 12, 2019
@lucab commented Jul 13, 2019

@cgwalters indeed. As OpenStack is not a homogeneous platform and the metadata component is optional, the static Ignition platform ID is not useful here (i.e. it can't signal whether the metadata endpoint exists or not).
Passing a --provider openstack-metadata is the correct way to hardcode the fact that you are now expecting/requiring the metadata endpoint.

@cgwalters (Member)

Passing a --provider openstack-metadata is the correct way to hardcode the fact that you are now expecting/requiring the metadata endpoint.

To do that, we'd have to have the MCO ship a drop-in for the systemd unit file that overrides ExecStart=? A little ugly. How about a config file, afterburn/provider, so we'd just echo openstack-metadata > /usr/lib/afterburn/provider in the RHCOS build?

(And for FCOS + OpenStack...people would use Ignition to write to /etc/afterburn/provider)
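For illustration, an FCOS user could write that file with an Ignition config along these lines. This is only a sketch of the proposal: the /etc/afterburn/provider mechanism did not exist at the time, and the spec version shown is illustrative.

```json
{
  "ignition": { "version": "3.0.0" },
  "storage": {
    "files": [
      {
        "path": "/etc/afterburn/provider",
        "mode": 420,
        "contents": { "source": "data:,openstack-metadata" }
      }
    ]
  }
}
```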

@cgwalters (Member)

An implicit thing in this discussion that we should probably make explicit: this PR tries to unconditionally set the hostname from metadata on OpenStack, while you are implicitly arguing not to do that and to require users (of FCOS) to enable it manually.

I am curious about more background behind your latter argument; it seems...easier...if we just try to reach the metadata service always, and require users of OpenStack-without-metadata (the unusual case, surely) to disable it.

@cgwalters (Member)

In this PR though, if metadata isn't available, then we'll have a failed systemd unit which...to my knowledge, nothing will react to or care about - something probably for the MCO to roll up in its status? Not sure.

@tomassedovic (Contributor)

For what it's worth, this fixes the issue we're having with OpenStack.

A worker's hostname and therefore node & machine name is now e.g. tsedovic-qt2cq-worker-6p9xp as opposed to host-172-24-0-14 (the behaviour without this patch).

We need the Nova names and node names to match for the kubernetes cloud provider to work:

https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#node-name-4

@Fedosin (Contributor, Author) commented Jul 16, 2019

@cgwalters Maybe I'm wrong, but as I see it, the metadata service must be enabled to deploy OCP anyway. We pass Ignition configs to the MCO using the metadata service.

So, I wouldn't worry about the absence of the metadata service.

@Fedosin (Contributor, Author) commented Jul 16, 2019

/test e2e-aws-upgrade

@Fedosin (Contributor, Author) commented Jul 16, 2019

/test e2e-aws

@lucab commented Jul 19, 2019

I have three high-level concerns with the PR in its current form:

  1. at least in some cases, the installer seems to already own /etc/hostname. This PR now blurs the definition of "who owns the static /etc/hostname"
  2. if the hostname without this change was host-172-24-0-14, then this is introducing/amplifying a skew within infrastructure inventory, which will have side-effects on other components trying to match infrastructure services to nodes (e.g. infrastructure metrics/logging analyzers)
  3. ultimately, in a cluster of networked cattle nodes, DHCP is the source of truth for hostnames. This guarantees that the whole infrastructure is aligned on resource naming/inventory. A static hostname (either via Ignition or via Afterburn) is the last-resort option when DHCP does not provide a hostname, which does not seem to be the case here.

@tomassedovic (Contributor)

@lucab

Regarding your first point: presumably the change here would apply to the initial master nodes created by the installer as well. If that is the case, we will remove the installer code that sets /etc/hostname because it is no longer necessary.

If not, it would have to stay because it fixes the exact same issue on those nodes.

To your points 2 and 3: getting the hostname from DHCP is simply not how OpenStack is expected to operate. Yes, the host-172-24-0-14 value does seem to come from DHCP, but it is not the correct value.

A cluster of networked OpenStack cattle nodes is expected to read the cloud metadata and use the hostname values specified there. Which is what this pull request does. This is what every single OpenStack cloud image I'm aware of does.

I have just launched two extra Nova servers (one CirrOS and one CentOS) in my deployed OCP cluster and their hostnames ended up being identical to their names in Nova. The hostname is initially set by the value from DHCP and it is then immediately overridden by the value from the metadata.

It seems to me that we are using Afterburn for exactly what it was designed for: updating the system's state by querying the cloud provider. And we do it to get the behaviour that OpenStack users know and expect.

I suppose the ultimate fix would be to run Afterburn (or something that does the same thing) in the OpenStack CoreOS images directly. I expect that will take longer and our cloud provider work will be blocked in the meantime.

@Fedosin (Contributor, Author) commented Jul 22, 2019

/test e2e-aws

@bgilbert (Contributor)

To do that we'd have to have the MCO ship a drop-in for the systemd unit file that overrides ExecStart=? A little ugly.

The unit files are arranged so that the dropin can be:

    [Service]
    Environment=AFTERBURN_OPT_PROVIDER="--provider openstack-metadata"

That's how CLCT handles it, when invoked with -platform openstack-metadata.

I am curious about more background behind your latter argument; it seems...easier...if we just try to reach the metadata service always, and require users of OpenStack-without-metadata (the unusual case, surely) to disable it.

I think the idea is that it's insecure to request metadata from a random link-local IP address unless we explicitly know that it's safe.

cc @ajeddeloh

@Fedosin (Contributor, Author) commented Jul 23, 2019

Folks, let's get this straight.
This patch is needed to activate the cloud provider. Its goal is to set the hostname equal to the node name; as I understood from @derekwaynecarr's words, this is a mandatory requirement in OpenShift 4.2.
The patch works this way: the node name is stored in Nova; we fetch that value through the metadata service and write it to /etc/hostname.
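That flow can be sketched roughly like this. The helper names are hypothetical and the key layout of the metadata JSON is an assumption; this is a reconstruction of the approach, not the actual PR code.

```python
import json

# In the real service, the JSON would be fetched from the OpenStack metadata
# endpoint, e.g. http://169.254.169.254/openstack/latest/meta_data.json.
def hostname_from_metadata(raw):
    """Extract the Nova server name from raw meta_data.json contents."""
    meta = json.loads(raw)
    # "name" is the Nova server name; falling back to "hostname" is an assumption.
    return meta.get("name") or meta.get("hostname")

def write_hostname(raw, path="/etc/hostname"):
    """Persist the metadata-derived name as the static hostname."""
    name = hostname_from_metadata(raw)
    if not name:
        raise ValueError("metadata contains no usable hostname")
    with open(path, "w") as f:
        f.write(name + "\n")
    return name
```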

My original commit contained a service that executed a small script that queried the metadata, read the node name, and wrote to the file. Then @cgwalters told me about the Afterburn utility, which does just that.
So instead of my script, I started calling this utility and everything works fine.

The only issue in this case is that Afterburn's provider name does not match the platform ID: the platform is called "openstack", but Afterburn's provider is called "openstack-metadata", so it is not detected automatically. That's why I explicitly added "--provider openstack-metadata" to fix it.
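The resulting service might look roughly like this oneshot unit. This is a sketch only: the unit name, the ordering directives, and the exact Afterburn flags are assumptions, not the verbatim contents of the PR.

```ini
[Unit]
Description=Set hostname from OpenStack metadata
After=network-online.target
Wants=network-online.target
Before=systemd-hostnamed.service

[Service]
Type=oneshot
# Force the provider, since the Ignition platform ID "openstack" does not
# match Afterburn's "openstack-metadata" provider name.
ExecStart=/usr/bin/afterburn --provider openstack-metadata --hostname /etc/hostname

[Install]
WantedBy=multi-user.target
```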

The metadata service should be available anyway, because the kubelet actively uses it. If administrators are worried about security, they can always switch to HTTPS.

What other improvements can I make to this patch so that we can merge it?

@cgwalters (Member)

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 25, 2019
@cgwalters (Member)

Although, now that I look more closely, this could just go in the templates/common/ directory, right?

@cgwalters (Member)

/hold per above

This commit sets hostnames equal to OpenStack node names.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 25, 2019
@cgwalters (Member)

/lgtm
/hold cancel

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 25, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, Fedosin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
