Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LOG-4589: vector fails to start on a node with IPv6 disabled #2200

Merged
merged 1 commit into from Oct 12, 2023

Conversation

syedriko
Copy link
Contributor

@syedriko syedriko commented Oct 6, 2023

Description

On an IPv4-only cluster, vector fails to start because its metrics endpoint attempts to create an IPv6 socket. This PR adds a an IPv6 socket creation check and selects the right listen-to address.

/cc @cahartma
/assign @jcantrill

Links

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 6, 2023
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 6, 2023

@syedriko: This pull request references LOG-4589 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.8.0" version, but no target version was set.

In response to this:

Description

On an IPv4-only cluster, vector fails to start because its metrics endpoint attempts to create an IPv6 socket. This PR adds a an IPv6 socket creation check and selects the right listen-to address.

/cc @cahartma
/assign @jcantrill

Links

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from cahartma October 6, 2023 20:48
@syedriko
Copy link
Contributor Author

syedriko commented Oct 6, 2023

/test e2e-ocp-target-minus-one

1 similar comment
@syedriko
Copy link
Contributor Author

syedriko commented Oct 7, 2023

/test e2e-ocp-target-minus-one

@jcantrill
Copy link
Contributor

Please rebase this to 5.7 and we'll pick it forward

@syedriko
Copy link
Contributor Author

syedriko commented Oct 9, 2023

Please rebase this to 5.7 and we'll pick it forward

Here you go: #2201

var PrometheusExporterAddress string

func init() {
if fd, err := unix.Socket(unix.AF_INET6, unix.SOCK_STREAM, unix.IPPROTO_IP); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this run during init of the package bothers me a bit, because it makes it intransparent that this code is running and host-configuration is affecting the results. I think we're also testing for the output generation, which would now fail on non-IPv6 enabled hosts (because the output would have the IPv4 address in it instead of the IPv6 one we expect).

I have two ideas for this:

  • Does vector maybe support leaving out the address completely (just provide :24231)? Then we could avoid this whole code completely.
  • Can we put this code somewhere "more explicit" and pass the result to the generator? That way we could reliably set "IPv6-enabled/disabled in our tests" and an explicit function call is probably also less surprising when debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Nope
$ grep address ~/syslog/vector_ip.toml | grep -v '^#'
address = ":24231"
$ target/debug/vector -c ~/syslog/vector_ip.toml
...
2023-10-09T13:59:02.169084Z  INFO vector::app: Loading configs. paths=["/home/syedriko/syslog/vector_ip.toml"]
2023-10-09T13:59:02.170997Z ERROR vector::cli: Configuration error. error=invalid socket address syntax
in `sinks.prometheus_output`
$ 
  1. ...would now fail on non-IPv6 enabled hosts
    We don't have any, not even in QA, otherwise we would've caught this issue earlier. All the CI tests pass.
    I like this approach exactly because it's unavoidable and unconditional. We can tell an IPv4-only host if we ever have it automated, in the tests and expect the correct config gen output accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@syedriko Yes, this is nothing that "will happen immediately". Having the code in init() that changes the value of the variable before it is used the first might just be surprising if you look on the code and don't know about this yet (or have forgotten it in a few months), so this is a construct I'd try to avoid.

But it's not wrong, so consider my feedback on this optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PrometheusExporterAddress used to be a const, what I'm doing is the best approximation to a const I could come up with.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 9, 2023

@syedriko: This pull request references LOG-4589 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.8.0" version, but no target version was set.

In response to this:

Description

On an IPv4-only cluster, vector fails to start because its metrics endpoint attempts to create an IPv6 socket. This PR adds a an IPv6 socket creation check and selects the right listen-to address.

/cc @cahartma
/assign @jcantrill

Links

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jcantrill
Copy link
Contributor

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 11, 2023
@jcantrill
Copy link
Contributor

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, syedriko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2023

func init() {
if fd, err := unix.Socket(unix.AF_INET6, unix.SOCK_STREAM, unix.IPPROTO_IP); err != nil {
PrometheusExporterAddress = `0.0.0.0`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried using an empty hostname `":12345" rather than an IP literal? The empty hostname is supposed to listen on all local addresses for ipv6 or ipv4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep ^^^ #2200 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh. I have long hated getaddrinfo in C, and now I hate the equivalent in Rust as well. Is a simple "listen to everything" so much to ask?
Have you seen https://issues.redhat.com/browse/LOG-4608 ? I think we have to solve and test both together. QE may have test clusters now, and I guess we can force the ipv settings on a CRC or SNO cluster to test all scenarios.

@jcantrill
Copy link
Contributor

/hold cancel

@jcantrill
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 11, 2023
@jcantrill
Copy link
Contributor

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 11, 2023
@jcantrill
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 11, 2023

func init() {
if fd, err := unix.Socket(unix.AF_INET6, unix.SOCK_STREAM, unix.IPPROTO_IP); err != nil {
PrometheusExporterAddress = `0.0.0.0`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh. I have long hated getaddrinfo in C, and now I hate the equivalent in Rust as well. Is a simple "listen to everything" so much to ask?
Have you seen https://issues.redhat.com/browse/LOG-4608 ? I think we have to solve and test both together. QE may have test clusters now, and I guess we can force the ipv settings on a CRC or SNO cluster to test all scenarios.

@syedriko
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 12, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 12, 2023

@syedriko: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot merged commit 78a1e1a into openshift:master Oct 12, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. release/5.8
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants