Skip to content
This repository has been archived by the owner on Sep 6, 2023. It is now read-only.

metal_device: add hostname_prefix attribute #165

Closed
wants to merge 11 commits into from

Conversation

zimbatm
Copy link

@zimbatm zimbatm commented Jul 22, 2021

This allows making sure that every new device will have a unique hostname. There are a number of distributed systems like Kubernetes, Ceph, Redpanda, .. that assume that each host is unique by its hostname.

Using random_string to add a suffix to the hostname is not enough because it doesn't guarantee that the random_string will be re-created if the metal_device is (eg: user_data change, resource was deleted manually, or tainted).

This PR was inspired by the aws_s3_bucket resource.

This allows to make sure that every new device will have a unique device
name, which often maps back to the OS hostname.

This is especially important for Kubernetes where each node needs a new
name.
Force the device to be re-created for the new value to propagate.
@displague
Copy link
Member

displague commented Jul 22, 2021

Thanks for opening this PR, @zimbatm!

I see how this adds value beyond the random prefix in cases where the hostname should not be (or is troublesome if) reused, as in kubernetes nodes or logging data.

There are a few changes I would like to consider:

  • metal/resource_metal_spot_market_request.go follows the same patterns as device creation, it would be helpful if these changes were also made to the spot market request resource
  • making hostname ForceNew is a breaking change. I think we should avoid this by adding similar hostname/ hostname_prefix changing code to the Update function.
  • hostname and hostname_prefix can both be optional. The API does not require a hostname field, one will be generated if not provided.
    • we should not use resource.UniqueID since the API default may be desirable
  • With hostname being a computed field that is also optional, we may have to change how we detect change for this field.
  • The hostname field here is used within the OS, where a 63 character limit is imposed. The helper function provides 26 characters of those 63 characters. I think we can leave it to the API to reject invalid hostnames, but the description may be improved by stating that a 26 character suffix will be appended. This may not be desirable.

@displague
Copy link
Member

displague commented Jul 22, 2021

If we drop the ForceNew behavior, we should also add steps to the test case that Update code by toggling between hostname_prefix, hostname, and back to hostname_prefix.

@zimbatm
Copy link
Author

zimbatm commented Jul 22, 2021

Hey, thanks for the great feedback.

metal/resource_metal_spot_market_request.go follows the same patterns as device creation, it would be helpful if these changes were also made to the spot market request resource

sure can do

making hostname ForceNew is a breaking change. I think we should avoid this by adding similar hostname/ hostname_prefix changing code to the Update function.

The main issue I have is that even if the device hostname gets updated, I don't think that it will propagate to the OS hostname automatically. That's why I think it's easier to replace the device entirely, but I understand the back-compat argument.

hostname and hostname_prefix can both be optional. The API does not require a hostname field, one will be generated if not provided. we should not use resource.UniqueID since the API default may be desirable

Sounds good. I will update the code to update the hostname after the device creation.

With hostname being a computed field that is also optional, we may have to change how we detect change for this field. The hostname field here is used within the OS, where a 63 character limit is imposed. The helper function provides 26 characters of those 63 characters. I think we can leave it to the API to reject invalid hostnames, but the description may be improved by stating that a 26 character suffix will be appended. This may not be desirable.

I'm having the same concern. Another issue is that the hostname is so long that it gets clipped off in the web console:
image

The S3 bucket actually has some code logic to check the bucket name length on the Terraform side. I could add that easily to check max(63) chars.

@displague
Copy link
Member

displague commented Jul 22, 2021

@zimbatm These hostnames are a little ridiculous :-)

The EM API returns a short id (the first part of the device uuid), which is also used in the API generated hostnames. These short ids would have enough uniqueness, I think, but they wouldn't be known until after Device.Create is called. If the Create succeded, but the Update failed, we would rely on a custom diff function to determine if the correct hostname was in use (prefix + short id) and heal the hostname in the API. I like this approach because the suffix is meaningful, as it relates to resources used within the EM. This also provides uniqueness.

Alternatively, since this prefix is just a long timestamp integer, we could translate it to a base64 string to carry the same detail in a more condensed (albeit less human readable) format. I made an example at https://play.golang.org/p/e6qwPnjT_G9:

20210722123456000100000000 -> ELfJcRJwI0Kh4QA

@displague
Copy link
Member

but they wouldn't be known until after Device.Create is called.

The OS hostname would have the desired hostname so long as the Device.Update was called before the metadata service was inspected during the provisioning steps. The Update call would happen less than a second after the Create() call, so we can nearly guarantee that this will succeed, but it is still a race condition.

Perhaps the base64 approach would be safer.

@displague
Copy link
Member

we could translate it to a base64 string to carry the same detail in a more condensed (albeit less human readable) format.

well.. All of our suffixes would start with "ELf" for the next few months before moving on to "ELo". Base64 also includes "/" (and others, potentially padding characters too) that we would have to removed (effectively we need base62 [A-Za-z0-9]).

@displague
Copy link
Member

displague commented Jul 22, 2021

hex.EncodeToString avoids the need for base62 with a longer string that also has a few months of repeated prefixes.

In all but the short id approach, I find it confusing to have uniqueids that are mostly the same save a few bytes which are hard to discern on sight or sound.

@zimbatm
Copy link
Author

zimbatm commented Jul 22, 2021

Would you agree to extend the EM API to take the hostname_prefix as an argument? I'm a bit concerned about potential race conditions on the update.

Otherwise, I propose to add our own random ID generator that outputs base36. domain-related names generally aren't case sensitive so it's best to avoid the potential collision.

@displague
Copy link
Member

displague commented Jul 23, 2021

Would you agree to extend the EM API to take the hostname_prefix as an argument? I'm a bit concerned about potential race conditions on the update.

The EM API supports a hostname prefix for batch device creation requests, including spot market. The suffix is an autoincrementing integer in those cases. I'm not sure about the precise format, but I suspect %.2d. The field is named hostname, which differs from the hostnames field where the name for each batch instance can be given a specific name. I'll bring up the idea of a hostname_prefix field to supplement this field and offer a unique hostname tool for single instance creation. I don't know what to expect with this suggestion, so I think we may want to implement a Terraform behavioral field.

Otherwise, I propose to add our own random ID generator that outputs base36. domain-related names generally aren't case sensitive so it's best to avoid the potential collision.

This sounds good.

@zimbatm
Copy link
Author

zimbatm commented Jul 23, 2021

^ what do you think of this? https://github.com/rs/xid

docs/resources/device.md Outdated Show resolved Hide resolved
metal/id_test.go Outdated Show resolved Hide resolved
id = PrefixedUniqueId(prefix)

if _, ok := ids[id]; ok {
t.Fatalf("Got duplicated id! %s", id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible but improbable for this to fail. I suppose we can keep the sanity check.

metal/resource_metal_device.go Outdated Show resolved Hide resolved
@@ -524,6 +546,8 @@ func resourceMetalDeviceCreate(d *schema.ResourceData, meta interface{}) error {
return retErr
}

// Pick the hostname validated by metal
d.Set("hostname", newDevice.Hostname)
Copy link
Member

@displague displague Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think setting the hostname here is needed. SSH provisioners will not execute until after waitForActiveDevice and resourceMetalDeviceRead (which picks up the hostname) below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first part of this function, the desired hostname is computed and then passed to the Metal API to create the new device.

Then here we sync back the hostname attribute with what we got back from the Metal API.

That way the Metal API can decide to name the hostname otherwise, and the resource hostname attribute is computed based on that.

Type: schema.TypeString,
Description: "The device name",
Optional: true,
ForceNew: true,
Copy link
Member

@displague displague Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reluctant to accept the ForceNews. Why do you suppose this is needed?

@t0mk any thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I believe it would be better for the provider if that field was immutable and replace the resource when it gets changed. In terms of backward compatibility, even if it changes the API, I don't believe that it will have much of an impact on users.

It removes a bit of logic on Update to rename the device on hostname change. In practice, I haven't seen devices being renamed to I suspect that this part of the code is buggy already.

If the device hostname changes, it won't reflect on the OS hostname automatically. I would rather see the device getting replaced and both map 1:1 all of the time.

zimbatm and others added 3 commits July 25, 2021 10:44
Co-authored-by: Marques Johansson <github@displague.com>
Co-authored-by: Marques Johansson <github@displague.com>
Co-authored-by: Marques Johansson <github@displague.com>
@t0mk
Copy link
Contributor

t0mk commented Jul 25, 2021

Hey @zimbatm @displague, I think I understand the valid use case for a random prefix, but I think creating a resource field inside the provider for this is a bad idea:

  • It doesn't have any corresponding API attribute
  • it's unprecedented in any other terraform compute resource (aws_s3_bucket is not comparable to metal_device)
  • it will make the resource hard to import - an arbitrary input attr which is not readable from the API

Are you sure you can't achieve the random prefix outside of the provider internals with Terraform functions? I imagine there must be a way. I know you looked at the random provider (you mentioned random_string), but maybe the random_pet with wisely specified keepers would work for you?

https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/pet#example-usage

  • do yo really need to have the randomness before creation? couldn't you just read the short_id after the device is queued for creation?

Overall I am against this PR. It looks like adding unnecessary complexity for sake of a single use case. But I will submit if there is business reasoning from Equinix.

@zimbatm
Copy link
Author

zimbatm commented Jul 26, 2021

Are you sure you can't achieve the random prefix outside of the provider internals with Terraform functions? I imagine there must be a way. I know you looked at the random provider (you mentioned random_string), but maybe the random_pet with wisely specified keepers would work for you?

It's a limitation of Terraform. It's not possible to bind the lifecycle of two resources together without creating a cycle.

resource "random_id" "hostname_suffix" {
  keepers = {
    // this creates a cycle
    device_id = metal_device.my_host.id
  }
}

resource "metal_device" "my_host" {
  hostname = "my-prefix-${random_id.hostname_suffix.hex}"
  # ...
}

@zimbatm
Copy link
Author

zimbatm commented Jul 26, 2021

Would you accept if I made another PR where the hostname argument was Computed, Optional? I would still generate a xid in that case since the hostname attribute is required in the Metal API.

@t0mk
Copy link
Contributor

t0mk commented Jul 26, 2021

@zimbatm I think yes, if it would (at least somehow) mimic the Javascript UI of the Web Console. In there, c3.medium.x86 device in metro da is automatically called da-c3-medium-x86-%s where the last portion is number, but in our case it could be a radom 8-char hex string.

I still think it's not great, but at least it's similar to the Web Console and to Scaleway server:
https://github.com/scaleway/terraform-provider-scaleway/blob/master/scaleway/resource_instance_server.go#L34
I looked through their issues and they didn't seem to have any problem with random names.

Also, we try to keep the dependencies low, so I'd be against adding xid (I know xid doesn't itself depend on anything but still). Can you argue what it brings over using just "crypto/rand" and "encoding/hex"? Assuming you could call rand.Seed(time.Now().UnixNano()) somwhere in the provider init.

@displague what do you think about this?

@zimbatm
Copy link
Author

zimbatm commented Jul 27, 2021

waiting on @displague before doing any more work

@displague
Copy link
Member

@zimbatm There was some interest from the API team in exposing the short id as a template variable in the hostname. There is already limited support for such things so it would be a matter of formalizing that feature and perhaps expanding on it.

@ocobles
Copy link
Contributor

ocobles commented Jul 1, 2022

NOTE: This project is Deprecated. The Equinix provider has full support for existing Terraform managed Equinix Metal resources once Terraform configuration and state are adapted. The Equinix provider manages resources including Network Edge and Fabric in addition to Metal. Please review the Metal to Equinix provider migration guide.

During the deprecation process we have migrated issues and PRs to the Equinix provider repository. However this PR contains content that I consider is more appropriate to continue in a github discussion instead of an issue/pr before resuming efforts.

equinix/terraform-provider-equinix#213

Please, I encourage you to continue the conversation there. I think it's an interesting topic with good content that can help us define some rules that we want to follow for future features that may be different from the API behavior but useful for users.

@ocobles ocobles closed this Jul 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants