[Agent] Agent UUID #14439

Closed
elasticmachine opened this issue May 6, 2019 · 8 comments

@elasticmachine
Collaborator

Original comment by @ph:

As Fleet API, I want to make sure the machine talking to me is the same machine that was enrolled.

To do that, we want to generate a fingerprint of the machine and use that fingerprint as a machine-id.
I think we could use these sources to create the ID; we could also add the MAC address to it.

  • OS X uses IOPlatformUUID
  • BSD uses /etc/hostid (smbios.system.uuid fallback)
  • Linux uses /var/lib/dbus/machine-id
  • Windows LINK REDACTED

We should not expose that key directly to the API; we should cryptographically hash it first.
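As a rough sketch of the hashing step, assuming SHA-256 is an acceptable hash (the issue does not name one): the hypothetical `fingerprint` helper below hashes the raw machine ID together with any extra inputs, such as a MAC address, and returns only the hex digest. The input values in `main` are placeholders, not real identifiers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint hashes the given parts (for example a raw machine ID and a
// MAC address) with SHA-256 and returns the hex digest. A zero byte is
// written after each part so that ("ab", "c") and ("a", "bc") hash differently.
func fingerprint(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Placeholder inputs for illustration only.
	rawID := "example-raw-machine-id"
	mac := "00:00:5e:00:53:01"
	fmt.Println(fingerprint(rawID, mac))
}
```

Only the digest would be sent to the API; the raw ID stays on the host. Note that hashing does not make a duplicated raw ID unique: two hosts with the same inputs still produce the same digest.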

@elasticmachine
Collaborator Author

Original comment by @tsg:

This is potentially interesting for the SIEM team as well, where we have an interest in establishing the "identity" of a host. Pinging @elastic/secops.

@elasticmachine
Collaborator Author

Original comment by @tsg:

Also, we've found out that the Linux machine-id is not great, because Terraform + GCP creates all VMs with the same machine-id :).

@elasticmachine
Collaborator Author

Original comment by @cwurm:

The add_host_metadata processor adds a host.id field to every Beat event by default. The value is retrieved in go-sysinfo:

  • Linux: Reads /etc/machine-id, /var/lib/dbus/machine-id, and /var/db/dbus/machine-id. (machineid.go)
  • macOS: IOPlatformUUID via the gethostuuid API call. (machineid_darwin_amd64.go)
  • Windows: Registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography\MachineGuid (machineid_windows.go)

That is the best we could come up with. Out of those, only macOS seems to be guaranteed unique so far. We've seen real-world duplicates of the Linux machine-id as Tudor says, and there are reports of MachineGuid not being unique either (e.g. when a Windows machine is cloned or backup/restored).
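To illustrate the Linux lookup order listed above, here is a minimal sketch that tries each path in turn and returns the first non-empty value. This is not the go-sysinfo code; `firstMachineID`, `candidatePaths`, and `errNoMachineID` are hypothetical names, and the reader is passed in as a function so the logic can be exercised without touching the real files.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// errNoMachineID is returned when none of the candidate paths yields an ID.
var errNoMachineID = errors.New("no machine-id found")

// candidatePaths are the Linux locations named in the comment above.
var candidatePaths = []string{
	"/etc/machine-id",
	"/var/lib/dbus/machine-id",
	"/var/db/dbus/machine-id",
}

// firstMachineID tries each path in order using read and returns the first
// non-empty, whitespace-trimmed value. Unreadable paths are skipped.
func firstMachineID(paths []string, read func(string) ([]byte, error)) (string, error) {
	for _, p := range paths {
		data, err := read(p)
		if err != nil {
			continue // missing or unreadable: try the next candidate
		}
		if id := strings.TrimSpace(string(data)); id != "" {
			return id, nil
		}
	}
	return "", errNoMachineID
}

func main() {
	id, err := firstMachineID(candidatePaths, os.ReadFile)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("machine-id:", id)
}
```

As noted above, a value found this way is not guaranteed to be unique across cloned or templated VMs.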

System.Identity.UniqueID seems tricky (impossible?) to actually retrieve. Would be handy if it's unique even when MachineGuid is not.

We tried using the host.id as the host identifier in the SIEM app but switched to host.name (the machine hostname by default, though it can be overwritten in the configuration for the add_host_metadata processor) because, in practice, that one seemed more likely to be unique. We played around with the idea of using the FQDN instead of the hostname by default (still allowing for an overwrite), but haven't implemented anything (it's a bit tricky because it would probably require reaching out to a DNS server outside the host).

On the whole, I'm not convinced it's possible to 100% guarantee a unique ID (or hostname, or any other identifier) per host. But that's ok in most (all?) cases I think. Some edge cases might be annoying, but the user can always fix it (e.g. changing to a unique hostname, re-generating a duplicate host ID to be unique, or specifying a custom name in the Beats config as the last resort).

I'm curious, I would think that in the Fleet API we have to go beyond the host ID? Since there will often be multiple Beats on a machine, even multiple Beats of the same type (e.g. multiple Filebeat instances for different log files). So we'd have to somehow identify each one of these, too?

@elasticmachine
Collaborator Author

Original comment by @ph:

I'm curious, I would think that in the Fleet API we have to go beyond the host ID? Since there will often be multiple Beats on a machine, even multiple Beats of the same type (e.g. multiple Filebeat instances for different log files). So we'd have to somehow identify each one of these, too?

Yes, this is a good point. Currently in Beats we have two IDs: one that we persist on disk in the uuid.json file, and what we call an ephemeral_id, which is generated at boot time.

Also, we've found out that the Linux machine-id is not great, because Terraform + GCP creates all VMs with the same machine-id :).

Heh, this is a bummer. It means we certainly cannot rely on that information alone; we probably need a combination of machine and network information.

@ph ph added the Agent label Nov 11, 2019
@ph ph changed the title from "[Fleet] Define how we are fingerprinting the agent host to generate the machine-id" to "[Agent] Generate a UUID" Nov 19, 2019
@ph ph changed the title from "[Agent] Generate a UUID" to "[Agent] Agent UUID" Nov 19, 2019
@ph
Contributor

ph commented Nov 19, 2019

Instead of generating a fingerprint of the machine, where the same fingerprint could be generated twice, we should do the following:

  • Use a random UUID, like all Beats do today.
  • Allow Fleet to ask to regenerate that ID on conflict (TBD)
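The two points above could be sketched roughly as follows, using only the Go standard library. This is an assumption-laden illustration, not how Beats or Fleet implement it: `newUUIDv4` builds a random version 4 UUID from `crypto/rand`, and the hypothetical `newUniqueID` retries while a caller-supplied `inUse` check reports a conflict (the actual conflict protocol is still TBD in this issue).

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random RFC 4122 version 4 UUID in its canonical
// 36-character textual form.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// newUniqueID generates IDs until inUse reports no conflict, giving up
// after maxTries attempts. inUse stands in for whatever check the server
// side would perform.
func newUniqueID(inUse func(string) bool, maxTries int) (string, error) {
	for i := 0; i < maxTries; i++ {
		id, err := newUUIDv4()
		if err != nil {
			return "", err
		}
		if !inUse(id) {
			return id, nil
		}
	}
	return "", fmt.Errorf("no unused ID after %d attempts", maxTries)
}

func main() {
	// No conflict detection here: every generated ID is accepted.
	id, err := newUniqueID(func(string) bool { return false }, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("agent ID:", id)
}
```

A random ID avoids the cloned-VM problem because it does not depend on anything copied with the machine image, provided it is generated after the clone rather than baked into it.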

@ph
Contributor

ph commented Dec 19, 2019

@michalpristas can you update the issue with a TODO / checkbox? Off the top of my head, I see we need to persist it to disk.

@michalpristas
Contributor

Yes, this pretty much sums it up.

  • Agent ID generation and propagation
  • ID persistence (using the storage API we got in this week)

@ph
Contributor

ph commented Dec 20, 2019

@michalpristas One more thing: we also need the generation of the UUID to happen at enroll time.
