Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 8914 x509sni #9289

Merged
merged 3 commits into from Jun 22, 2021
Merged

Issue 8914 x509sni #9289

merged 3 commits into from Jun 22, 2021

Conversation

jjh74
Copy link
Contributor

@jjh74 jjh74 commented May 21, 2021

Required for all PRs:

  • Updated associated README.md. (no need to update README)
  • Wrote appropriate unit tests. (tests already exist)

resolves #8914
resolves #9278
resolves #9384

  1. commit 448eac0 tries to ensure that fileglobs are not tried for remote uris (sources starting with tcp://, udp:// or https://). This is because strings.Index(source, ":\\") != 1 (https://github.com/influxdata/telegraf/blob/master/plugins/inputs/x509_cert/x509_cert.go#L70) matches almost all sources -> fileglobs are tried for remote sources.
  2. commit 4e1e6f3 tries to ensure that c.tlsCfg.ServerName is reset between multiple certs/sources. Without this telegraf uses same SNI for multiple certs. Leading to telegraf requesting wrong certificates and failing validation because SNI/SAN doesn't match. (Longer discussion in: inputs.x509_cert unexpected behavior with SNI #8914)

@helenosheaa helenosheaa added the fix pr to fix corresponding bug label May 24, 2021
Copy link
Contributor

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjh74 thanks for digging into this and submitting this PR. I added a comment and a question in the code, so it would be nice if you can comment on those.

However, taking a quick look at the code I think the issue arises from a conceptual weakness in the code. We reuse c.tlsCfg for multiple locations and are thus stateful in the code. IMO we should restructure the code such that we create one c.tlsCfg for each location in an Init() function making it an array. This way we can setup everything correctly there and do not need to clear fields in the loop over the locations. @jjh74 what is your opinion on this suggestion?

@srebhan srebhan self-assigned this May 27, 2021
@akrantz01
Copy link
Contributor

In addition to what @srebhan said, I think it'd be a good idea to add a unit test to make sure that URLs and file paths/globs are parsed correctly, so this doesn't come up again in the future. Also, please fix the issue raised by the linter.

Copy link
Contributor

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one remaining question regarding uriregexp in the code...

@jjh74
Copy link
Contributor Author

jjh74 commented Jun 1, 2021

@jjh74 thanks for digging into this and submitting this PR. I added a comment and a question in the code, so it would be nice if you can comment on those.

However, taking a quick look at the code I think the issue arises from a conceptual weakness in the code. We reuse c.tlsCfg for multiple locations and are thus stateful in the code. IMO we should restructure the code such that we create one c.tlsCfg for each location in an Init() function making it an array. This way we can setup everything correctly there and do not need to clear fields in the loop over the locations. @jjh74 what is your opinion on this suggestion?

I have no strong opinion one way or another.
Perhaps using c.tlsCfg for each location would make it easier to add other TLS configurations (like tls min version or cipher suites) ? (But writing configuration where you'll add lot's of parameters for each sources seems quite unreadable and it's probably easier to use multiple [[inputs.x509_cert]])

I think tests are missing a test case where you have more than one https/tcp sources and the test would check that correct SNI is used for each one.

@srebhan
Copy link
Contributor

srebhan commented Jun 2, 2021

Currently we are using the user configuration and create one tlsCfg as a kind of template. However, this way we need to take care for resetting location-dependent properties of that template while we gather.
My suggestion is to use the user configuration and create multiple tlsCfg, one per location. We then initialize the location-dependent properties once (e.g. at Init()) and don't need to do anything during gather...

jjh74 added a commit to jjh74/telegraf that referenced this pull request Jun 9, 2021
…e>:\ sources.

This expects that globpath works on windows.
See: influxdata#9289 for more information.
@srebhan
Copy link
Contributor

srebhan commented Jun 10, 2021

@jjh74 seems like the windows test is failing (even for non-globbing tests)... :-( Maybe conversion of path-separators might be involved...

@aslgithub
Copy link

Is there any progress on this ? It seems to be the pro-offered fix for the fact that #6952 has fundamentally broken x509_cert for sources other than file ?
I've logged #9384

jjh74 added a commit to jjh74/telegraf that referenced this pull request Jun 17, 2021
(not on sources starting with <driveletter>:\)
See: influxdata#9289 for more information.
@jjh74
Copy link
Contributor Author

jjh74 commented Jun 17, 2021

@srebhan I just pushed a commit changing sourcesToURLs() so globpaths are tried only for file:// and / sources (ie. expect that globpath is not working on windows). (You can still use sources like file:///c:/WINDOWS/Temp/*.pem to try globpath on windows). Hopefully CI now builds/works on windows.

Copy link
Contributor

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The CI error is unrelated (in plugins/inputs/tcp_listener). Thanks for caring @jjh74.

@srebhan
Copy link
Contributor

srebhan commented Jun 18, 2021

!retry-failed

jjh74 added a commit to jjh74/telegraf that referenced this pull request Jun 21, 2021
…e>:\ sources.

This expects that globpath works on windows.
See: influxdata#9289 for more information.
jjh74 added a commit to jjh74/telegraf that referenced this pull request Jun 21, 2021
(not on sources starting with <driveletter>:\)
See: influxdata#9289 for more information.
@jjh74
Copy link
Contributor Author

jjh74 commented Jun 21, 2021

@srebhan now should be rebased to current master

@srebhan
Copy link
Contributor

srebhan commented Jun 22, 2021

@jjh74 seems like something went wrong with the rebase as you now have 139 files changed. 8-/

(not on sources starting with <driveletter>:\)
See: influxdata#9289 for more information.
this multiple remote sources (sources = [ "tcp://remote1.example.org:443",
"tcp://remote2.example.org:443" ]) reuse first SNI. Telegraf will
request wrong certificate and can fail validation when SAN/SNI doesn't match.
@jjh74
Copy link
Contributor Author

jjh74 commented Jun 22, 2021

@srebhan new attempt with current master. I hope it goes better this time.

Copy link
Contributor

@sspaink sspaink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thank you for fixing these issues and explaining everything so clearly.

@sspaink sspaink merged commit ac9bf5a into influxdata:master Jun 22, 2021
srebhan pushed a commit to HRI-EU/telegraf that referenced this pull request Jun 22, 2021
…e>:\ sources.

This expects that globpath works on windows.
See: influxdata#9289 for more information.
srebhan pushed a commit to HRI-EU/telegraf that referenced this pull request Jun 22, 2021
(not on sources starting with <driveletter>:\)
See: influxdata#9289 for more information.
@Nkmol
Copy link

Nkmol commented Jun 24, 2021

Any ETA for a new release of this? I noticed this is essential for a multi-chain certificate solution. For example, the common Traefik LetsEncrypt setup uses a multichain setup that uses the server name to correctly pick the chain (LetsEncrypt vs Default Cert).

Thanks for these fixes btw :)

@reimda
Copy link
Contributor

reimda commented Jun 24, 2021

Any ETA for a new release of this?

The next scheduled patch release is July 7. We've been considering making an early patch release to fix a few regressions in 1.19.0, including this one, but we haven't settled on a date yet. It might be June 30.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug
Projects
None yet
8 participants