Link parsing only works for basic cases #2099
Comments
Looking at the RFC more closely, it looks like I'm missing a lot. For example, missing But many of the points above still apply. The RFC contains a precise algorithm for parsing, and I would expect a high-quality library to follow it strictly, including the edge cases.
Would you mind listing the cases that really do break and are valid link headers according to the specification?
@odrotbohm I edited the original post of this issue.
Would you mind turning them into test cases and submitting a PR? I'll see what I can do then. If you fancy taking a stab at a better parsing implementation, feel free to submit that, too.
The previous implementation used naive parsing that suffered from many issues, especially when special characters were part of values. It also didn't escape the special characters when serializing a link to a string.

I marked the `Link.valueOf` method as deprecated because it doesn't correctly parse multiple links, nor does it handle multiple values for the `rel` param. One should use `Links.parse` instead. There are no other incompatible API changes.

I had to change one existing test: we no longer ignore a missing final `>` after the URL. Such a link is invalid anyway because it didn't have the `rel` parameter.

Fixes spring-projects#2099
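To illustrate the escaping point above, here is a minimal sketch of serializing a single link parameter. It assumes the `token` and `quoted-string` rules from RFC 7230 that RFC 8288 builds on. The class and method names (`LinkParamQuoting`, `serializeParam`) are hypothetical and are not part of the library's API.

```java
/** Hypothetical helper, not the library's API: serializes one link-param. */
final class LinkParamQuoting {

    // Non-alphanumeric characters allowed in an RFC 7230 token.
    private static final String TOKEN_EXTRA = "!#$%&'*+-.^_`|~";

    /** Returns true if the value can be written without quotes. */
    static boolean isToken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!alnum && TOKEN_EXTRA.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Writes name=value, quoting and backslash-escaping the value when needed. */
    static String serializeParam(String name, String value) {
        if (isToken(value)) {
            return name + "=" + value;
        }
        StringBuilder sb = new StringBuilder(name).append("=\"");
        for (char c : value.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\'); // quoted-pair escape
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

With this sketch, a value containing a comma, a semicolon, a space, or a double quote ends up inside a quoted string, so a spec-following parser can read it back unchanged.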
We've decided to use your library to parse `Link` headers, hoping that you'd correctly implement the intricacies of the specification, because parsing correctly is tricky. However, it's implemented completely naively and works only in the basic cases. It uses regex for parsing, even though regular expressions can't, in general, handle non-regular grammars.

Here are a bunch of examples:
This is the ABNF for the `Link` header:

And here for the content of the `rel` param:

See https://httpwg.org/specs/rfc8288.html
So there can be multiple values in the `rel` param, even URIs and quoted URIs.

Some of the errors can't be fixed without breaking backward compatibility, e.g. the decoding of href, unescaping of params and multiple rel values. Also, I'm not sure why custom parameters aren't supported. I didn't read the whole spec, but I don't think it prohibits custom params (we don't use them, just wondering).
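To make the points above concrete, here is a rough sketch of a character-by-character parser, assuming the `link-value` grammar from RFC 8288 and the `quoted-string` rules from RFC 7230. All names (`LinkHeaderSketch`, `ParsedLink`, `parse`) are hypothetical and are not the library's API. It deliberately skips parts of the RFC algorithm, such as `title*` decoding and resolving the target against `anchor` or a base URI.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of an RFC 8288 style parser; not the library's API. */
final class LinkHeaderSketch {

    /** One parsed link-value: target, relation types, and the other params. */
    static final class ParsedLink {
        final String href;
        final List<String> rels;
        final Map<String, String> params;

        ParsedLink(String href, List<String> rels, Map<String, String> params) {
            this.href = href;
            this.rels = rels;
            this.params = params;
        }
    }

    static List<ParsedLink> parse(String header) {
        List<ParsedLink> links = new ArrayList<>();
        int n = header.length();
        int i = 0;
        while (i < n) {
            // Skip whitespace and the commas that separate link-values.
            while (i < n && (header.charAt(i) == ',' || Character.isWhitespace(header.charAt(i)))) i++;
            if (i >= n) break;
            if (header.charAt(i) != '<') throw new IllegalArgumentException("Expected '<' at index " + i);
            int close = header.indexOf('>', i);
            if (close < 0) throw new IllegalArgumentException("Missing closing '>'");
            String href = header.substring(i + 1, close); // may contain ',' or ';'
            i = close + 1;

            List<String> rels = new ArrayList<>();
            Map<String, String> params = new LinkedHashMap<>();
            boolean relSeen = false;
            while (true) {
                i = skipWs(header, i);
                if (i >= n || header.charAt(i) != ';') break;
                i = skipWs(header, i + 1);
                int start = i;
                while (i < n && "=;,".indexOf(header.charAt(i)) < 0 && !Character.isWhitespace(header.charAt(i))) i++;
                String name = header.substring(start, i).toLowerCase(Locale.ROOT);
                i = skipWs(header, i);
                String value = "";
                if (i < n && header.charAt(i) == '=') {
                    i = skipWs(header, i + 1);
                    if (i < n && header.charAt(i) == '"') {
                        StringBuilder sb = new StringBuilder();
                        i++; // opening quote
                        while (i < n && header.charAt(i) != '"') {
                            if (header.charAt(i) == '\\' && i + 1 < n) i++; // quoted-pair: take next char literally
                            sb.append(header.charAt(i++));
                        }
                        if (i >= n) throw new IllegalArgumentException("Unterminated quoted-string");
                        i++; // closing quote
                        value = sb.toString();
                    } else {
                        start = i;
                        while (i < n && ";,".indexOf(header.charAt(i)) < 0 && !Character.isWhitespace(header.charAt(i))) i++;
                        value = header.substring(start, i);
                    }
                }
                if (name.equals("rel")) {
                    if (!relSeen) { // only the first rel param counts
                        for (String r : value.trim().split("\\s+")) if (!r.isEmpty()) rels.add(r);
                        relSeen = true;
                    }
                } else if (!name.isEmpty()) {
                    params.putIfAbsent(name, value); // custom params are kept, first occurrence wins
                }
            }
            links.add(new ParsedLink(href, rels, params));
        }
        return links;
    }

    private static int skipWs(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }
}
```

For instance, `LinkHeaderSketch.parse("<https://example.com/a?x=1,2>; rel=\"next https://example.com/rels/custom\"; title=\"a, \\\"b\\\"; c\", <https://example.com/b>; rel=prev")` yields two links. The first has two relation types and a title containing a comma, a semicolon, and escaped quotes. A comma-splitting or regex-based approach tends to break on exactly these inputs.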