
OpenStack: set correct hostnames for masters and workers #964

Merged · 1 commit · Jul 25, 2019

Conversation

@Fedosin (Contributor) commented Jul 12, 2019

This commit sets hostnames equal to OpenStack node names.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 12, 2019

In the diff, a Python script parses the metadata and writes /etc/hostname:

    decoded_data = json.loads(data)

    with open('/etc/hostname', 'w') as f:
A Member commented on these lines:

Please add this to https://github.com/coreos/afterburn/ instead.

@cgwalters (Member) commented on the diff, Jul 12, 2019:

See e.g. coreos/afterburn#199, which landed exactly this for Azure. By contributing to afterburn, you're also contributing to Fedora CoreOS, and the functionality runs early in OS startup.

Oh also, doing this from a static pod will run straight into openshift/origin#22826
EDIT: Nevermind on the last bit, that only applies to traffic exiting from non-hostNetwork pods but you wrote a systemd service.

@cgwalters (Member)

In fact afterburn has OpenStack support today except...what's up with this change which calls the provider openstack-metadata? Is it not the same thing as the ignition.platform.id?

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 12, 2019
@lucab commented Jul 13, 2019

@cgwalters indeed. As OpenStack is not a homogeneous platform and the metadata component is optional, the static Ignition platform ID is not useful here (i.e. it can't signal whether the metadata endpoint exists or not).
Passing a --provider openstack-metadata is the correct way to hardcode the fact that you are now expecting/requiring the metadata endpoint.

@cgwalters (Member)

Passing a --provider openstack-metadata is the correct way to hardcode the fact that you are now expecting/requiring the metadata endpoint.

To do that, we'd have to have the MCO ship a drop-in for the systemd unit file that overrides ExecStart=? A little ugly. How about a config file, afterburn/provider, so we'd just echo openstack-metadata > /usr/lib/afterburn/provider in the RHCOS build?

(And for FCOS + OpenStack...people would use Ignition to write to /etc/afterburn/provider)
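For illustration, an FCOS user could write that file with an Ignition config along these lines. This is only a sketch of the proposal: the /etc/afterburn/provider mechanism did not exist at the time, and the spec version shown is illustrative.

```json
{
  "ignition": { "version": "3.0.0" },
  "storage": {
    "files": [
      {
        "path": "/etc/afterburn/provider",
        "mode": 420,
        "contents": { "source": "data:,openstack-metadata" }
      }
    ]
  }
}
```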

@cgwalters (Member)

An implicit thing in this discussion that we should probably make explicit: this PR tries to unconditionally set the hostname from metadata on OpenStack, while you are implicitly arguing not to do that and to require users (of FCOS) to enable it manually.

I am curious about more background behind your latter argument; it seems...easier...if we just try to reach the metadata service always, and require users of OpenStack-without-metadata (the unusual case, surely) to disable it.

@cgwalters (Member)

In this PR though, if metadata isn't available, then we'll have a failed systemd unit which...to my knowledge, nothing will react to or care about - something probably for the MCO to roll up in its status? Not sure.

@tomassedovic (Contributor)

For what it's worth, this fixes the issue we're having with OpenStack.

A worker's hostname and therefore node & machine name is now e.g. tsedovic-qt2cq-worker-6p9xp as opposed to host-172-24-0-14 (the behaviour without this patch).

We need the Nova names and node names to match for the kubernetes cloud provider to work:

https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#node-name-4

@Fedosin (Contributor, Author) commented Jul 16, 2019

@cgwalters Maybe I'm wrong, but as I see it, the metadata service must be enabled to deploy OCP anyway. We pass Ignition configs to the MCO using the metadata service.

So, I wouldn't worry about the absence of the metadata service.

@Fedosin (Contributor, Author) commented Jul 16, 2019

/test e2e-aws-upgrade

@Fedosin (Contributor, Author) commented Jul 16, 2019

/test e2e-aws

@lucab commented Jul 19, 2019

I have three high-level concerns with the PR in its current form:

  1. at least in some cases, the installer seems to already own /etc/hostname. This PR now blurs the definition of "who owns the static /etc/hostname"
  2. if the hostname without this change was host-172-24-0-14, then this is introducing/amplifying a skew within infrastructure inventory, which will have side-effects on other components trying to match infrastructure services to nodes (e.g. infrastructure metrics/logging analyzers)
  3. ultimately, in a cluster of networked cattle nodes, DHCP is the source of truth for hostnames. This guarantees that the whole infrastructure is aligned on resource naming/inventory. A static hostname (either via Ignition or via Afterburn) is the last-resort option when DHCP does not provide a hostname, which does not seem to be the case here.

@tomassedovic (Contributor)

@lucab

Regarding your first point: presumably the change here would apply to the initial master nodes created by the installer as well. If that is the case, we will remove the installer code that sets /etc/hostname because it is no longer necessary.

If not, it would have to stay because it fixes the exact same issue on those nodes.

To your points 2 and 3: getting the hostname from DHCP is simply not how OpenStack is expected to operate. Yes, the host-172-24-0-14 value does seem to come from DHCP, but it is not the correct value.

A cluster of networked OpenStack cattle nodes is expected to read the cloud metadata and use the hostname values specified there. Which is what this pull request does. This is what every single OpenStack cloud image I'm aware of does.

I have just launched two extra Nova servers (one CirrOS and one CentOS) in my deployed OCP cluster and their hostnames ended up being identical to their names in Nova. The hostname is initially set by the value from DHCP and it is then immediately overridden by the value from the metadata.

It seems to me that we are using Afterburn for exactly what it was designed for: updating the system's state by querying the cloud provider. And we do it to get the behaviour that OpenStack users know and expect.

I suppose the ultimate fix would be to run Afterburn (or something that does the same thing) in the OpenStack CoreOS images directly. I expect that will take longer and our cloud provider work will be blocked in the meantime.

@Fedosin (Contributor, Author) commented Jul 22, 2019

/test e2e-aws

@bgilbert (Contributor)

To do that we'd have to have the MCO ship a drop-in for the systemd unit file that overrides ExecStart=? A little ugly.

The unit files are arranged so that the dropin can be:

    [Service]
    Environment=AFTERBURN_OPT_PROVIDER="--provider openstack-metadata"

That's how CLCT handles it, when invoked with -platform openstack-metadata.

I am curious about more background behind your latter argument; it seems...easier...if we just try to reach the metadata service always, and require users of OpenStack-without-metadata (the unusual case, surely) to disable it.

I think the idea is that it's insecure to request metadata from a random link-local IP address unless we explicitly know that it's safe.

cc @ajeddeloh

@Fedosin (Contributor, Author) commented Jul 23, 2019

Folks, let's get this straight.
This patch is needed to activate the cloud provider. Its goal is to set the hostname equal to the node name; as I understood from @derekwaynecarr's words, this is a mandatory requirement in OpenShift 4.2.
The patch works this way: the node name is stored in Nova; we fetch that value through the metadata service and write it to /etc/hostname.
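That flow can be sketched roughly like this. The helper names are hypothetical and the key layout of the metadata JSON is an assumption; this is a reconstruction of the approach, not the actual PR code.

```python
import json

# In the real service, the JSON would be fetched from the OpenStack metadata
# endpoint, e.g. http://169.254.169.254/openstack/latest/meta_data.json.
def hostname_from_metadata(raw):
    """Extract the Nova server name from raw meta_data.json contents."""
    meta = json.loads(raw)
    # "name" is the Nova server name; falling back to "hostname" is an assumption.
    return meta.get("name") or meta.get("hostname")

def write_hostname(raw, path="/etc/hostname"):
    """Persist the metadata-derived name as the static hostname."""
    name = hostname_from_metadata(raw)
    if not name:
        raise ValueError("metadata contains no usable hostname")
    with open(path, "w") as f:
        f.write(name + "\n")
    return name
```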

My original commit contained a service that executed a small script that queried the metadata, read the node name, and wrote to the file. Then @cgwalters told me about the Afterburn utility, which does just that.
So instead of my script, I started calling this utility and everything works fine.

The only issue in this case is that Afterburn's provider name does not match the platform ID: the platform is called "openstack", but Afterburn's provider is called "openstack-metadata", so it is not detected automatically. That's why I explicitly added "--provider openstack-metadata" to fix it.
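The resulting service might look roughly like this oneshot unit. This is a sketch only: the unit name, the ordering directives, and the exact Afterburn flags are assumptions, not the verbatim contents of the PR.

```ini
[Unit]
Description=Set hostname from OpenStack metadata
After=network-online.target
Wants=network-online.target
Before=systemd-hostnamed.service

[Service]
Type=oneshot
# Force the provider, since the Ignition platform ID "openstack" does not
# match Afterburn's "openstack-metadata" provider name.
ExecStart=/usr/bin/afterburn --provider openstack-metadata --hostname /etc/hostname

[Install]
WantedBy=multi-user.target
```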

The metadata service should be available anyway, because the kubelet actively uses it. If administrators are worried about security, they can always switch to HTTPS.

What other improvements can I make to this patch so that we can merge it?

@cgwalters (Member)

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 25, 2019
@cgwalters (Member)

Although, now that I look more closely, this could just go in the templates/common/ directory, right?

@cgwalters (Member)

/hold per above

This commit sets hostnames equal to OpenStack node names.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 25, 2019
@cgwalters (Member)

/lgtm
/hold cancel

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 25, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, Fedosin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
