Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DNS SRV port not used in cluster.join-addresses #766

Closed
wants to merge 0 commits into from

Conversation

devsunb
Copy link

@devsunb devsunb commented May 4, 2024

PR Description

Fixed an issue where DNS SRV record's port is not used in cluster.join-addresses.

Which issue(s) this PR fixes

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added

@rfratto
Copy link
Member

rfratto commented May 14, 2024

Thanks for opening this!

@tpaschalis @wildum can one of you take a look at this one? You know more about this code than I do.

@rfratto rfratto added the backport-to-agent PR should be backported to the agent repo. label May 14, 2024
@clayton-cornell clayton-cornell requested a review from a team May 23, 2024 16:21
@@ -107,6 +107,7 @@ If `--cluster.advertise-interfaces` isn't explicitly set, {{< param "PRODUCT_NAM
Since Windows doesn't use the interface names `eth0` or `en0`, Windows users must explicitly pass at least one valid network interface for `--cluster.advertise-interfaces` or a value for `--cluster.advertise-address`.

The comma-separated list of addresses provided in `--cluster.join-addresses` can either be IP addresses with an optional port, or DNS records to lookup.
If the address is a DNS record, it will be looked up as an A/AAAA record if the port is explicitly provided, otherwise it will be looked up as a SRV record and the port in the SRV record will be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the intention was different here, but we still always perform a LookupSRV.

Also, depending on how addresses are discovered (one example is a Headless kubernetes service, but it can be any number of ways), the port is not guaranteed to be present or set to the same port that nodes use as their listen addresses. So that's why I feel making use of the default port on the local node's listen address is a sensible default here.

@tpaschalis
Copy link
Member

Hey @devsunb, how's it going? First off, apologies for the belated response here, it shouldn't have taken so long for me to get back to you here.

Could you explain more about the intentions of this PR? As I said in the previous comment, I feel this would depend on DNS SRV records that are explicitly defined for the usecase of Alloy clustering and may break usecases where a more generic record is used for discovery.

@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label May 27, 2024
@devsunb devsunb closed this May 28, 2024
@devsunb
Copy link
Author

devsunb commented May 28, 2024

@tpaschalis Thank you for your response. Let me give you some background on this PR.

I am using AWS ECS and AWS Service Discovery for internal DNS between containers.
One of the issues with ECS and Service Discovery is that only one Service Discovery can be configured for one ECS Service, so it was impossible to configure SRV records for multiple ports in one Service, or to configure SRV record and A record together.
Therefore, I was configuring services using only A record. (For example, for running an etcd cluster, I created an init script that looks up the IP list of running containers based on A record and passes it as option).

In this context, I wanted to cluster Alloy using A record, but when I passed just the domain name with the --cluster.join-addresses option, it didn't work, so I checked the documentation but I couldn't find any information about A record support, so I checked the code.

I found that the appendJoinAddr function in the internal/alloycli/cluster_builder.go file behaves as follows for values of the --cluster.join-addresses option

  1. If port is specified -> Use as is as peer
  2. If port is not specified
    1. If IP -> Use <IP>:<Default Port> as peer
    2. If Domain Name -> Use <SRV Record Targets>:<Default Port> as peer

At first, I thought it was because Alloy didn't define behavior for A record in 2-ii, so I wrote code for A record support in that part and it worked fine, so I was going to create a PR, but when I looked a little further into the code, I found that hashicorp/memberlist (which Alloy uses for peer clustering) already has code for A record support.
https://github.com/hashicorp/memberlist/blob/3f82dc10a89f82efe300228752f7077d0d9f87e4/memberlist.go#L417

So I reverted the change and instead specified the port while passing the domain name as --cluster.join-addresses option. Then, Alloy will use the domain name as peer and look up the A record in hashicorp/memberlist, so I expected it to work fine, and it did. I thought this option structure made sense because when using A record, unlike SRV record, the port is not in the record, so it should be specified.

And I decided to change the direction of the PR and make it a PR to document the specifics of the --cluster.join-addresses option's support for A record.

However, there was one question that remained, in 2-ii, since the SRV record contains the port, why is it using the default port instead of using it, so I fixed it and included it in the PR.
Now I understand from your response that the port in the SRV record will not always be correct, so it makes sense to keep using the default port.

To summarize, there are the following cases when using the --cluster.join-addresses option for clustering Alloy.

  1. <IP> -> Used as <IP>:<Default Port>
  2. <IP>:<Port> -> Used as <IP>:<Port>
  3. <Domain Name> -> Used as <SRV Record Targets>:<Default Port>
  4. <Domain Name>:<Port> -> Used as <A Record IPs>:<Port>

Since it seems to be documented by #910 that Alloy does not support case 4, why not include in the documentation that there is a way to use A record for Alloy clustering as well, for people in similar situations to mine?

In any case, I closed this PR because I understood why the main request won't merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-agent PR should be backported to the agent repo. type/docs Docs Squad label across all Grafana Labs repos
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants