Escaping Rules #54
Comments
Coming back to this, the issue seems to be when other
For example, the Go implementation produces a purl like this:
Note that according to URL standards, this is fine, and the Go implementation round-trips it correctly. However, the encoding is technically not correct according to the purl spec, because that defines that
And so, different implementations, like the JavaScript one, parse the above PURL "incorrectly". I'm not sure if there's any other way than to simply
At least for now, this would ensure compatibility with other implementations and make the Go library follow the spec as well. Generally however, I think this is an issue with the PURL spec... but fixing that would likely be a breaking change...
Okay I think I'm going crazy.
fmt.Println(url.PathEscape("a+b c@d"))
// a+b%20c@d
fmt.Println(url.QueryEscape("a+b c@d"))
// a%2Bb+c%40d
// outputs from Go
let pathEscaped = "a+b%20c@d"
let queryEscaped = "a%2Bb+c%40d"
console.log(decodeURIComponent(pathEscaped))
// a+b c@d
console.log(decodeURI(pathEscaped))
// a+b c@d
console.log(decodeURIComponent(queryEscaped))
// a+b+c@d
console.log(decodeURI(queryEscaped))
// a%2Bb+c%40d
(Apparently, Go is RFC3986 compliant (
The Go implementation seems to be more robust:
console.log(encodeURIComponent("a+b c@d"));
// a%2Bb%20c%40d
console.log(encodeURI("a+b c@d"))
// a+b%20c@d
fmt.Println(url.QueryUnescape("a%2Bb%20c%40d"))
// a+b c@d <nil>
fmt.Println(url.PathUnescape("a%2Bb%20c%40d"))
// a+b c@d <nil>
fmt.Println(url.QueryUnescape("a+b%20c@d"))
// a b c@d <nil>
fmt.Println(url.PathUnescape("a+b%20c@d"))
// a+b c@d <nil>
So:
But that doesn't solve the parsing ambiguities. Reading into this a bit more, the primary issue seems to be the " " (space). In a query, it can either be encoded as
However, given the fact that JS doesn't actually decode
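To make the space problem concrete, here is a minimal sketch of one possible approach: query-escape the value, then write every space as %20 rather than +. The helper name escapeSpaceSafe is hypothetical and not part of the library, and this is not necessarily what the eventual fix does. Since QueryEscape already turns a literal + into %2B, any + left in its output can only stand for a space, so the replacement is safe.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// escapeSpaceSafe is a hypothetical helper (not part of the library).
// It query-escapes s and then rewrites "+" (an encoded space) as "%20",
// so decoders that do not treat "+" as a space still see a space.
func escapeSpaceSafe(s string) string {
	return strings.ReplaceAll(url.QueryEscape(s), "+", "%20")
}

func main() {
	escaped := escapeSpaceSafe("a+b c@d")
	fmt.Println(escaped) // a%2Bb%20c%40d

	// Both Go decoders agree on the result.
	q, _ := url.QueryUnescape(escaped)
	p, _ := url.PathUnescape(escaped)
	fmt.Println(q == p, q) // true a+b c@d
}
```

The escaped form is the same string that encodeURIComponent produces in the JS snippet above, so both sides should decode it identically.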
I think with #58 now being merged, this should be all done, so I'll close this issue!
Hi again!
I've started wondering about the correct escaping rules for all the different components of the purl. As far as I can tell, the current implementation doesn't actually match the spec in a couple of edge cases, as it uses PathEscape instead of QueryEscape.
(I'd argue that using PathEscape is the right thing for namespace, name and version, but it creates a couple of difficulties that I've written down in an issue on the spec.)
Here are the escapes that are required:
/ to get the individual segments, escape every segment with Path/QueryEscape. Currently this is implemented by the usage of JoinPath and then EscapedPath, I believe this is correct.
Path/QueryEscape: fix: escape and unescape name #55
Path/QueryEscape
QueryEscape
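As an illustration of the per-segment rule, here is a minimal sketch that splits a namespace on / and path-escapes each segment on its own. The helper escapeNamespace is hypothetical and written only for this example; the library itself goes through JoinPath and EscapedPath as described above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// escapeNamespace is a hypothetical helper (not the library's code).
// It splits the namespace on "/" and path-escapes every segment,
// so the separators survive while each segment is escaped on its own.
func escapeNamespace(namespace string) string {
	segments := strings.Split(namespace, "/")
	for i, segment := range segments {
		segments[i] = url.PathEscape(segment)
	}
	return strings.Join(segments, "/")
}

func main() {
	fmt.Println(escapeNamespace("my org/sub@team")) // my%20org/sub@team
}
```

Note that PathEscape leaves the @ untouched here, which is the same behaviour that causes the version ambiguity discussed in this issue.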
The current implementation calls EscapedPath on namespace, name and version, so technically implements the above points, except for one difference: the name isn't escaped by itself, meaning that if it contains a /, it will not be escaped.
Additionally, it won't "fully" escape versions, which could create ambiguities if the version contains an @, e.g. pkg:deb/debian/mypkg@v1.2.3@alpha-1. However, this question I think needs to be answered by the spec, see my linked comment. As a workaround, I guess we could query-escape it...
The name issue can be solved by PathEscaping (and unescaping) it, I think I'll raise a PR to fix this.
It would be good to know however what to do with the version as well :)