Forwarded parser in cats-parse #4147
Thanks. This is one of the most complicated headers.
```scala
def fromString(s: String): ParseResult[Obfuscated] =
  new ModelNodeObfuscatedParser(s).parse
  parser.parseAll(s).left.map { e =>
```
There's a `ParseResult.fromParser` that's better for this.
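For illustration, here is a minimal sketch of the pattern such a helper captures, written against cats-parse 0.3 so it stands alone. `ParseFailure`, `fromParser`, `obfuscated`, and `fromString` below are hypothetical stand-ins, not the http4s definitions; the grammar is the `obfnode`/`obfport` rule from RFC 7239.

```scala
import cats.parse.{Parser => P, Parser0, Rfc5234}

object ObfuscatedSketch {
  // Hypothetical stand-in for a parse failure type (not http4s's ParseFailure).
  final case class ParseFailure(sanitized: String, details: String)

  // Runs the parser over the whole input and converts a cats-parse error
  // into a ParseFailure carrying a fixed, user-facing message.
  def fromParser[A](parser: Parser0[A], errorMessage: => String)(
      s: String): Either[ParseFailure, A] =
    parser.parseAll(s).left.map(e => ParseFailure(errorMessage, e.toString))

  // obfnode / obfport = "_" 1*(ALPHA / DIGIT / "." / "_" / "-")
  val obfuscated: P[String] =
    (P.char('_') ~ (Rfc5234.alpha | Rfc5234.digit | P.charIn('.', '_', '-')).rep).string

  // Call sites no longer need their own hand-written error mapping.
  def fromString(s: String): Either[ParseFailure, String] =
    fromParser(obfuscated, "Invalid obfuscated node or port")(s)
}
```

The design point is just that the error conversion lives in one place, so each header's `fromString` reduces to naming its parser and message.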
```scala
// obfport = "_" 1*(ALPHA / DIGIT / "." / "_" / "-")
val nodePort: P[Node.Port] =
  Numbers.digits1
  // is it worth it to consume only up to 5 chars or just let it fail later?
```
A `max` is coming in cats-parse-0.3 that will make this easier. We can let it fail later and clean it up on upgrade, I'd say.
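A sketch of the "let it fail later" option, assuming cats-parse 0.3 names (there, `Numbers.digits` is the non-empty digit parser that 0.2 called `digits1`): consume digits greedily, then reject anything longer than the `1*5DIGIT` that RFC 7239 allows for `port`. `PortSketch` is hypothetical, and a bounded repetition could replace the `flatMap` check once it's available.

```scala
import cats.parse.{Numbers, Parser => P}

object PortSketch {
  // port = 1*5DIGIT (RFC 7239). Numbers.digits consumes every digit;
  // the length check afterwards rejects runs longer than five.
  val port: P[Int] =
    Numbers.digits.flatMap { ds =>
      if (ds.length <= 5) P.pure(ds.toInt)
      else P.failWith[Int](s"port has more than 5 digits: $ds")
    }
}
```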
```scala
def quoted[A](p: P[A]): P[A] =
  Rfc7230.token
    .orElse(Rfc7230.quotedString)
    .flatMap(str =>
```
Do we have to `flatMap` these, or is each `p` sufficient on its own? Is it the quoting that makes this so complicated?
I looked at the previous parser and this was more or less what it was doing. I imagined `quoted-string` was more than putting the quotes at the beginning and the end, but it isn't, so I think I can rewrite this as `oneOf(p.between('"', '"'), p)`.
☝️ This is not true. I'm now getting more failures because the host parser is consuming the text of the following pairs, e.g. `host="http4s.org;for=_kadabra;proto=http;by=_abra"`. Maybe `quoted-string` is more restrictive than `Host`? Will investigate later.
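One plausible reading of the failure, sketched below under cats-parse 0.3: an RFC 3986 reg-name admits sub-delims, which include `;` and `=`, so running the host parser directly on the header text swallows the following pairs. An RFC 7230 `token` excludes those characters, so parsing `token / quoted-string` first delimits the value and `p` is then run on that value alone, which is what the `flatMap` in the original `quoted` does. All names here are hypothetical; `regName` is a simplified stand-in without `pct-encoded`.

```scala
import cats.parse.{Parser => P, Rfc5234}

object QuotedSketch {
  // RFC 7230: token = 1*tchar. ';' and '=' are not tchars.
  val tchar: P[Char] =
    Rfc5234.alpha | Rfc5234.digit | P.charIn("!#$%&'*+-.^_`|~".toList)
  val token: P[String] = tchar.rep.string

  // RFC 7230 quoted-string, returning the contents with quoted-pairs unescaped.
  private def obsText(c: Char): Boolean = c >= 0x80 && c <= 0xff
  val qdtext: P[Char] = P.charWhere { c =>
    c == '\t' || c == ' ' || c == '!' || (c >= '#' && c <= '[') ||
    (c >= ']' && c <= '~') || obsText(c)
  }
  val quotedPair: P[Char] =
    P.char('\\') *> P.charWhere(c => c == '\t' || (c >= ' ' && c <= '~') || obsText(c))
  val quotedString: P[String] =
    P.char('"') *> (qdtext | quotedPair).rep0.map(_.mkString) <* P.char('"')

  // Simplified reg-name (unreserved / sub-delims). Like the real one,
  // it accepts ';' and '='.
  val regName: P[String] =
    (Rfc5234.alpha | Rfc5234.digit | P.charIn("-._~!$&'()*+,;=".toList)).rep.string

  // The oneOf rewrite: p runs directly on the header text.
  def quotedNaive[A](p: P[A]): P[A] =
    P.oneOf(List(p.between(P.char('"'), P.char('"')), p))

  // Token-first: delimit the value as token / quoted-string, then require
  // p to match all of it, failing with a message that names the value.
  def quoted[A](p: P[A]): P[A] =
    (token | quotedString).flatMap { raw =>
      p.parseAll(raw) match {
        case Right(a) => P.pure(a)
        case Left(_)  => P.failWith[A](s"invalid value: $raw")
      }
    }
}
```

With this shape, `quotedNaive(regName)` on `http4s.org;for=_kadabra` consumes the whole string, while `quoted(regName)` stops at `;`. One caveat: once `token` or `quoted-string` has consumed input, a rejection by `p` is not backtracked, so `.backtrack` would be needed if other alternatives must be tried.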
Also, we're about to fold the dotty branch into main and give up on the idea of a cats-parse-based 0.21 release. So don't worry about MiMa for the parsers.
@rossabaker have you had a chance to look at this? I'm stuck on the encoding stuff.
I have not. I'll try to take a deeper look tonight. One thing we might be able to do in the meantime is merge series/0.22 in and fix the compile errors: the target branch is now on cats-parse-0.3. Mostly, `Parser` becomes `Parser0` and `Parser1` becomes `Parser`. There's a scalafix for it, but I don't know whether this is big enough to be worth running, or whether to just fix it up manually.
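For reference, a small sketch of how the renames look on the 0.3 side, with the 0.2 spelling in comments. This assumes cats-parse 0.3.x, where the "may be empty" variants carry a `0` suffix; `MigrationSketch` is just a hypothetical container.

```scala
import cats.data.NonEmptyList
import cats.parse.{Numbers, Parser, Parser0, Rfc5234}

object MigrationSketch {
  // 0.2: Parser1[String] via Numbers.digits1
  val digits: Parser[String] = Numbers.digits
  // 0.2: Parser[String] via Numbers.digits
  val maybeDigits: Parser0[String] = Numbers.digits0
  // 0.2: rep1, returning Parser1[NonEmptyList[Char]]
  val letters: Parser[NonEmptyList[Char]] = Rfc5234.alpha.rep
  // 0.2: rep, returning Parser[List[Char]]
  val maybeLetters: Parser0[List[Char]] = Rfc5234.alpha.rep0
}
```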
@fredshonorio are you still up for picking this up again?
I'm blocked on the encoding stuff I mentioned in the first comment, and could use some help. I can handle the rest once that's done.
Tests are still failing, but I went ahead and dealt with the cats-parse-0.3 upgrade that's already on the target branch. Trying to examine the failures before I sleep...
Some of the complication seems to be flowing from the arbitrary
`ForwardedSpec` was failing whenever the `RegName` begins with a number. That's because we weren't backtracking from IPv4 addresses. That's fixed in 1dc3f87. We store a decoded value, but weren't encoding the rendered value. That's fixed in f2835c4, along with removing a previous hack around that bug. With that,
It looks like at least two of the outstanding failures in `ForwardedRenderingSpec` are related to the encoding of weird `RegName`s, but I need to be up. I'll try to take another pass tomorrow if nobody else has figured it out. We need to get a lot more rigorous in the URI about what's encoded and what's not. That's the main source of trouble here. /cc @satorg, who worked on the original model.
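A minimal sketch of the IPv4 backtracking problem mentioned above (fixed in 1dc3f87), assuming cats-parse 0.3. `HostSketch`, its host model, and the simplified `ipv4`/`regName` parsers are hypothetical; the point is only the `|` semantics: a branch that fails after consuming input stops the alternation unless it is wrapped in `.backtrack`.

```scala
import cats.parse.{Numbers, Parser => P, Rfc5234}

object HostSketch {
  // Hypothetical host model, not http4s's Uri.Host.
  sealed trait Host
  final case class Ipv4Host(value: String) extends Host
  final case class RegNameHost(value: String) extends Host

  // Simplified: no 0-255 range check on the octets.
  val ipv4: P[String] =
    (Numbers.digits ~ P.char('.') ~ Numbers.digits ~ P.char('.') ~
      Numbers.digits ~ P.char('.') ~ Numbers.digits).string

  // Simplified reg-name: unreserved characters only.
  val regName: P[String] =
    (Rfc5234.alpha | Rfc5234.digit | P.charIn('-', '.', '_', '~')).rep.string

  // On "1password.com", ipv4 consumes "1" and then fails on 'p'. Because
  // input was consumed, | does not try regName and the whole parse fails.
  val hostNoBacktrack: P[Host] =
    ipv4.map[Host](Ipv4Host(_)) | regName.map[Host](RegNameHost(_))

  // backtrack rewinds the failed ipv4 attempt so regName gets a chance.
  val host: P[Host] =
    ipv4.backtrack.map[Host](Ipv4Host(_)) | regName.map[Host](RegNameHost(_))
}
```

This also explains why only names starting with a digit failed: on `example.com` the first `Numbers.digits` fails without consuming anything, so `regName` is tried either way.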
Seems like the problem is in the generator for hostnames. Don't we need to punycode any non-ASCII chars in the hostname?
I already took a shot at reforming the hostname generator here. In practice, non-ASCII characters in a hostname would be punycoded. But the URI spec intentionally says it doesn't necessarily follow the rules for hostname resolution. Some of the problem @fredshonorio ran into was that the generator was creating percent-encoded strings, and storing them in a model that stores decoded strings. I started fixing that in my commits, but I think there may be more nuance. As we think about a better URI design, a newtype to distinguish percent-encoded values from raw values could help us avoid bugs like this.
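To make the distinction concrete, here is a sketch of the newtype idea, plus the contrast with punycode. `Decoded`, `Encoded`, `encodeRegName`, and `toAsciiHostname` are hypothetical names, and the encoder only covers reg-name (unreserved and sub-delims stay literal, every other UTF-8 byte becomes `%XX`). `java.net.IDN.toASCII` is the JDK's IDNA conversion, i.e. the form a resolver would typically use, which is not what the URI model stores.

```scala
import java.net.IDN
import java.nio.charset.StandardCharsets

object EncodingSketch {
  // Hypothetical wrappers recording whether a string is raw or percent-encoded.
  final case class Decoded(value: String) extends AnyVal
  final case class Encoded(value: String) extends AnyVal

  // Characters RFC 3986 allows literally in a reg-name: unreserved and sub-delims.
  private def allowedInRegName(c: Char): Boolean =
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') ||
      "-._~!$&'()*+,;=".indexOf(c) >= 0

  // Percent-encodes every UTF-8 byte that is not allowed literally.
  def encodeRegName(d: Decoded): Encoded = {
    val sb = new StringBuilder
    d.value.getBytes(StandardCharsets.UTF_8).foreach { b =>
      val c = (b & 0xff).toChar
      if (allowedInRegName(c)) sb.append(c)
      else sb.append("%%%02X".format(b & 0xff))
    }
    Encoded(sb.toString)
  }

  // The IDNA (punycode) form, for comparison with the percent-encoded one.
  def toAsciiHostname(d: Decoded): String = IDN.toASCII(d.value)
}
```

With the types separate, a model field typed `Decoded` cannot be rendered without going through `encodeRegName`, which is the class of bug fixed in f2835c4.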
A possible implementation using arbitraries adapted from ip4s:
@hamnis' suggestion passes the tests. I don't think it's complete per the spec, but I don't think it's incomplete in a way that is interesting enough to hold up progress.
Nice. I haven't been able to give it all a full look yet; are there any outstanding issues?
Hmm. The tests passed for me locally, but I only ran There's an unmoored doc comment that I'll push in just a moment.
Cleaning up the max port length would be good, but the tests pass, and I think that could be done in a followup. If we merge this, I think most of parboiled2 is gone.
Took a little longer than I'd hoped, but here it is.
This fails a bunch of tests related to parsing the `Host`. I'm unsure where the problem is, but it looks encoding-related. The tests are `~tests/testOnly org.http4s.*.Forwarded*Spec`.
.I'm also unsure about the
quoted
implementation, if the approach is to run the parser with the parsed string or something else. If it is I don't know what error message to use.I wanted feedback on this so I haven't yet deleted the parboiled parsers or added MiMa stuff.