Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

purell module is no longer maintained #103068

Closed
gkarthiks opened this issue Jun 21, 2021 · 15 comments · Fixed by #103801
Closed

purell module is no longer maintained #103068

gkarthiks opened this issue Jun 21, 2021 · 15 comments · Fixed by #103801
Assignees
Labels
area/code-organization Issues or PRs related to kubernetes code organization area/kubeadm kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@gkarthiks
Copy link
Contributor

What happened:

This is a proactive issue opened to explore the path forward with the purell module that we have been using as a dependency for kubeadm in here

if tmpVersionURL, err := purell.NormalizeURLString(versionURL, purell.FlagRemoveDuplicateSlashes); err != nil {

This module is looking for a new maintainer.

What you expected to happen:

Discussions around the path forward, potentially eliminate the module from usage and do this operation in-house, as we use it in only one place for specific functionality.

How to reproduce it (as minimally and precisely as possible):

Ref Issue from purell repo: PuerkitoBio/purell#33

Anything else we need to know?:

Slack Thread: https://kubernetes.slack.com/archives/C09R23FHP/p1624250397087400

@gkarthiks gkarthiks added the kind/bug Categorizes issue or PR as related to a bug. label Jun 21, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 21, 2021
@neolit123
Copy link
Member

neolit123 commented Jun 22, 2021

in kubeadm the usage of purell.NormalizeURLString does a number of things:

  1. applies Unicode normalization on the URL host according to "Unicode Normalization Form C".
  2. applies Unicode transformation to "fold" characters to canonical width on the host.
  3. converts the host from Unicode to ASCII (IDNA encoding, but 1 and 2 and needed due to a non-conformant x/net/idna at the time?)
  4. removes duplicate "//" from the URL path.
  5. escapes the whole URL as per RFC 3986 (looks like PuerkitoBio/urlesc forks golang/net/url).

IMO, we should:

  • replace the usage of purell.NormalizeURLString with this trimmed local version:
func normalizeURLString(s string) (string, error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	if len(u.Path) > 0 {
		u.Path = strings.ReplaceAll(u.Path, "//", "/")
	}
	return u.String(), nil
}

then add a release note to the PR:

kubeadm: external etcd endpoints passed in the ClusterConfiguration that have Unicode characters are no longer IDNA encoded (converted to Punycode). They are now just URL encoded as per Go's implementation of RFC-3986, have duplicate "/" removed from the URL paths and passed like that directly to the kube-apiserver --etcd-servers flag. If you have etcd endpoints that have Unicode characters, it is advisable to encode them in advance with tooling that is fully IDNA compliant. If you don't do that, the Go standard library (used in k8s and etcd) would do it for you when making requests to the endpoints.

we could import golang.org/x/*, and do the punycode conversion, but i do not understand why it should be kubeadm's responsibility to do that. it feels like IDNA is a host reachability problem that should be solved pre-k8s deployment.

i can see this being a breaking change to users having Unicode etcd endpoints in kubeadm, but i'm hoping the same users were already aware to not rely on k8s components to manage this for them.

@neolit123
Copy link
Member

/sig cluster-lifecycle
/area kubeadm
/remove-kind bug
/kind cleanup

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. area/kubeadm kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Jun 22, 2021
@BenTheElder BenTheElder added area/code-generation area/code-organization Issues or PRs related to kubernetes code organization and removed area/code-generation labels Jun 22, 2021
@gkarthiks
Copy link
Contributor Author

@neolit123 cool, that was helpful. I'll take up on this and create a PR.

@gkarthiks
Copy link
Contributor Author

/assign

@randomvariable
Copy link
Member

randomvariable commented Jun 25, 2021

From a Cluster API perspective:

It's highly unlikely from the CAPI side that we'll specify IDNA in the kubeadm config. There is a chance on CAPV (vSphere) that a host picks up the DHCP option for the domain name suffix, and that it's using an IDNA. E.g., in my home environment, i have DHCP option 15 set to home.internal.randomvariable.co.uk, and these end up in the etcd pods as parameters, e.g.
- --initial-cluster=mgmt-control-plane-pb5tc.home.internal.randomvariable.co.uk=https://192.168.192.227:2380

Is IDNA encoding a consideration for input in etcd's command line, or are unicode args fine?

For the public clouds, it's even more unlikely we'll hit it as the choice of domain name suffix is highly constricted to meet node identity requirements in the CPI.

My own take, is that I'm not wed to keeping this around at all, I haven't seen a use case for it.

@randomvariable
Copy link
Member

cc @yastij and @gab-satchi for any further vsphere comments.

@neolit123
Copy link
Member

@SataQiu @pacoxu have you seen DHCP with IDNA domains used for external etcd with kubeadm?

@neolit123
Copy link
Member

Is IDNA encoding a consideration for input in etcd's command line, or are unicode args fine?

unicode args should be fine, also apparently Go's HTTP stack does IDNA encoding:

res, err := http.Get("https://example.台灣")
if err != nil {
	log.Fatal(err)
}
// https://example.台灣 is not a real website
// output: 2009/11/10 23:00:00 Get "https://example.%E5%8F%B0%E7%81%A3": dial tcp: lookup example.xn--kpry57d on 169.254.169.254:53: dial udp 169.254.169.254:53: connect: no route to host

i will update the proposed release note above.

but IMO, if one would like a domain to be encoded when passed to all the tools in the k8s stack, maybe it should be done in advance.

it feels like IDNA is a host reachability problem that should be solved pre-k8s deployment.

or leave it to the Go standard library to do the encoding. i saw comments that it might not be fully compliant yet, but i don't know how up to date these comments are.

@neolit123
Copy link
Member

/triage accepted
i didn't see any PRs for this for 1.22. looks like we can update it in 1.23.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 19, 2021
@gkarthiks
Copy link
Contributor Author

@neolit123 Apologies, I was a little occupied at work. But I have created the PR 👉 #103801

Please let me know if we can add this as part of 1.22.

@neolit123
Copy link
Member

@gkarthiks
no worries. we cannot add it in 1.22 because it's not a critical bug fix and we are in code freeze.
i will comment on the PR.

@neolit123
Copy link
Member

#103801 (comment)

looks like we should tell go-openapi/jsonreference to stop using the library too.

@gkarthiks
Copy link
Contributor Author

@neolit123 done. Created an issue in their repo as well go-openapi/jsonreference#10 and offered to create a PR if they are willing to move forward with in-sourcing the logic.

@neolit123
Copy link
Member

@gkarthiks thanks!

@gkarthiks
Copy link
Contributor Author

@neolit123 just an FYI, the maintainers of the go-openapi have also removed the purell dependency. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/code-organization Issues or PRs related to kubernetes code organization area/kubeadm kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants