A recent update introduced a Custom Identifier (e.g. AWS instance ID) check (#405). This check ensures that the Custom Identifier of the current instance matches the one previously stored on the mackerel.io server for the HostID. In our setup, this check prevents mackerel-agent from starting up.
The Custom Identifier of the current instance is fetched from the cloud meta-data service (i.e. http://169.254.169.254 in the case of AWS). The meta-data service is expected to return a response within 100ms (mackerel-agent/spec/cloud.go, line 47 in 6798d53):
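To illustrate the behaviour (this is only a sketch, not the agent's actual implementation; the meta-data path and error handling here are assumptions), fetching the instance ID with a 100ms client timeout looks roughly like this:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchInstanceID asks the cloud meta-data service for the instance ID with a
// very short client timeout; when the service is slow or unreachable, the
// effective Custom Identifier is an empty string.
func fetchInstanceID() string {
	client := http.Client{Timeout: 100 * time.Millisecond} // the 100ms budget in question
	resp, err := client.Get("http://169.254.169.254/latest/meta-data/instance-id")
	if err != nil {
		return "" // timed out or unreachable: no Custom Identifier this run
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return ""
	}
	return string(body)
}

func main() {
	fmt.Printf("custom identifier: %q\n", fetchInstanceID())
}
```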
When the meta-data service is unable to respond within 100ms, an empty string is used as the Custom Identifier of the current instance, which does not match the one successfully stored on the server in the previous agent run (mackerel-agent/command/command.go, lines 108 to 113 in 369b932):
	if fsStorage, ok := conf.HostIDStorage.(*config.FileSystemHostIDStorage); ok {
		return nil, fmt.Errorf("custom identifiers mismatch: this host = \"%s\", the host whose id is \"%s\" on mackerel.io = \"%s\" (File \"%s\" may be copied from another host. Try deleting it and restarting agent)", customIdentifier, hostID, result.CustomIdentifier, fsStorage.HostIDFile())
	}
	return nil, fmt.Errorf("custom identifiers mismatch: this host = \"%s\", the host whose id is \"%s\" on mackerel.io = \"%s\" (Host ID file may be copied from another host. Try deleting it and restarting agent)", customIdentifier, hostID, result.CustomIdentifier)
}
(In this case customIdentifier = "" and result.CustomIdentifier = the previously stored instance-id.)
We use OpenStack, which provides an AWS-compatible meta-data service at 169.254.169.254, but in our setup it sometimes takes a bit longer than 100ms to respond. For the reason described above, mackerel-agent often fails to restart.
I have some ideas to deal with this problem:
1. Add an option to extend the HTTP timeout for the meta-data service,
2. Add an option to disable cloud meta-data retrieval, or
3. Skip the check when the Custom Identifier cannot be fetched for some reason.
Which one is preferred? I can provide a patch for any of these if necessary; a rough sketch of option 3 follows below.
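For illustration only, here is a minimal, hypothetical sketch of option 3, reusing the variable names from the snippet quoted above; this is not the agent's actual code:

```go
package main

import "fmt"

// checkCustomIdentifier is a hypothetical sketch of option 3: enforce the
// mismatch check only when a Custom Identifier was actually fetched from the
// meta-data service in the current run.
func checkCustomIdentifier(customIdentifier, hostID, storedIdentifier string) error {
	if customIdentifier == "" {
		// Meta-data fetch failed or timed out: skip the comparison instead of
		// treating the empty string as a mismatch.
		return nil
	}
	if storedIdentifier != "" && storedIdentifier != customIdentifier {
		return fmt.Errorf("custom identifiers mismatch: this host = %q, the host whose id is %q on mackerel.io = %q (Host ID file may be copied from another host. Try deleting it and restarting agent)", customIdentifier, hostID, storedIdentifier)
	}
	return nil
}

func main() {
	// Simulates the failing case from this issue: the meta-data service timed
	// out, so no Custom Identifier was fetched for the current run.
	err := checkCustomIdentifier("", "abc123", "i-0123456789abcdef0")
	fmt.Println("check result:", err) // nil: the agent would start normally
}
```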
hanazuki changed the title from "Unable to start mackerel-agent with slow cloud meta-data service" to "Unable to restart mackerel-agent with slow cloud meta-data service" on Sep 7, 2017.
Thank you. I think we can simply extend the default timeout (adding an option for this would not be a good idea). IMO a 5-second timeout is acceptable. What do you think? @mechairoi @Songmu
I had missed #401 and #398. My description above seems wrong.
Before #401 and #398, VM instances on our OpenStack passed the isEC2 check, since the AWS-compatible meta-data service returns ami-id. After the change they don't, because the kernel does not provide /sys/hypervisor/uuid (we use qemu-kvm instead of Xen), and thus the Custom Identifier is apparently not fetched from the meta-data service at all.
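For context, here is a rough, hypothetical sketch of the kind of hypervisor-based detection described above; the "ec2" prefix convention and the exact handling are assumptions on my part, not a quote of the agent's code:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// looksLikeEC2 is a hypothetical sketch: decide whether the host looks like
// EC2 by reading /sys/hypervisor/uuid, which on Xen-based EC2 instances is
// reported to start with "ec2". A qemu-kvm guest (as on our OpenStack) has no
// such file, so this returns false and the AWS-compatible meta-data service
// is never queried for a Custom Identifier.
func looksLikeEC2() bool {
	data, err := os.ReadFile("/sys/hypervisor/uuid")
	if err != nil {
		return false // file missing, e.g. on qemu-kvm
	}
	return strings.HasPrefix(strings.ToLower(strings.TrimSpace(string(data))), "ec2")
}

func main() {
	fmt.Println("EC2-like:", looksLikeEC2())
}
```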