Update URI parser to use SBuf parsing APIs #275
I believe some new APIs should be adjusted. The rest is polishing.
I also suggest changing this PR title to Update URI parser to use modern parsing APIs.
BTW, have you considered completely dropping URN support?
I have, and am undecided on that proposal. I know some places use urn:, but not necessarily via Squid. There are some things we might use it for internally (e.g. Store-ID standard IDs or direct pointers to cached variants), but it may be beneficial to have a mapping of those in cache managers under http(s):// (making it more directly public, hmm).
I have two serious concerns (one old one and one related to passing c-string images of well-known schemes to the UriScheme constructor). IIRC, the rest is polishing.
Removing URN support does not mean we can never add that support back. Removing URN support simply means that we do not have the resources to maintain/improve that barely (if at all) used and very low-quality experimental code right now. Thus, future potential uses are pretty much irrelevant in this decision AFAICT. The only important unknown here that may warrant keeping that code is the existence of a large set of current Squid users that are going to be inconvenienced (and that are actively supporting the Squid Project or are otherwise considered important for the Project).
Co-Authored-By: Alex Rousskov <rousskov@measurement-factory.com>
src/anyp/Uri.cc
Outdated
    if (!alphanum[*nid.end()])
        throw TextException("NID suffix is not alphanumeric", Here());

    debugs(23, 3, "Split URI into proto='urn', nid='" << nid << "', path='" << Raw("tok",tok.remaining().rawContent(),tok.remaining().length()) << "'");
The first Raw() parameter names the thing you are printing. In this case, it is "path".
Also, we should not combine manual decorations with Raw decorations. As you probably know, I recommend removing single quotes because cache.log has enough delimiters to find the end of the path (which might contain single quotes itself). We should be relying on record separation mechanisms to separate records, not manual (and, hence, inconsistent and often insufficient) decorations.
- debugs(23, 3, "Split URI into proto='urn', nid='" << nid << "', path='" << Raw("tok",tok.remaining().rawContent(),tok.remaining().length()) << "'");
+ debugs(23, 3, "Split URI into proto='urn', nid='" << nid << "', " << Raw("path",tok.remaining().rawContent(),tok.remaining().length()));
If you insist on having those quotes, add the corresponding decoration method/logic to Raw via Raw::quote().
done
src/anyp/Uri.cc
Outdated
    if (!alphanum[*nid.end()])
        throw TextException("NID suffix is not alphanumeric", Here());

    debugs(23, 3, "Split URI into proto='urn', nid='" << nid << "', path='" << Raw("tok",tok.remaining().rawContent(),tok.remaining().length()) << "'");
To simplify the Raw() call parameters, consider moving this debugs() lower, after setting various fields, including path. I do not insist on this change.
done
I found one bug and tried to respond to other recent code changes in the hope of shortening the review cycles.
GitHub split my review into several separate change requests again, unfortunately, so please look around for more/isolated contemporary change requests. The *end bug should still be in this review though.
src/tests/testHttpRequest.cc
Outdated
  const MasterXaction::Pointer mx = new MasterXaction(XactionInitiator::initClient);
- HttpRequest *aRequest = HttpRequest::FromUrl(url, mx);
+ auto aRequest = HttpRequest::FromUrlXXX(url, mx);
GitHub is having a hard time grouping review comments for this old PR with many force-pushes so I am adding this comment to reduce the chance of this older change request getting lost in the noise.
src/anyp/Uri.cc
Outdated
        return true;

    } catch (...) {
        debugs(23, 2, "error: " << CurrentException << Raw("rawUrl", rawUrl.rawContent(), rawUrl.length()).minLevel(DBG_DATA));
Please check whether minLevel() is needed here. Long Raw input will not be printed at lower debugging levels by default anyway, and it is probably a good idea to show short URIs for error messages.
Done.
With the requested change to use an SBuf url variable, these lines go from FromUrlXXX back to FromUrl and are no longer touched by this PR. So the change to auto is now out-of-scope style polish.
One must not dereference the end() iterator. Co-Authored-By: Alex Rousskov <rousskov@measurement-factory.com>
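To illustrate the end() bug called out in the review: end() points one past the last element, so `*nid.end()` reads outside the sequence. The sketch below uses std::string as a stand-in for SBuf and a hypothetical helper name (nidEdgesAreAlphanumeric); it is not the code that was merged, only an example of checking the last character without dereferencing end().

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not Squid code: reports whether a URN NID both starts
// and ends with an alphanumeric character. std::string stands in for SBuf.
bool nidEdgesAreAlphanumeric(const std::string &nid)
{
    if (nid.empty())
        return false; // front() and back() are undefined on an empty string

    // Wrong: *nid.end() reads one past the last character.
    // Right: back() (equivalent to *(nid.end() - 1)) reads the last character.
    const auto first = static_cast<unsigned char>(nid.front());
    const auto last = static_cast<unsigned char>(nid.back());
    return std::isalnum(first) && std::isalnum(last);
}

// Small usage example.
void exampleNidUsage()
{
    assert(nidEdgesAreAlphanumeric("isbn"));
    assert(!nidEdgesAreAlphanumeric("isbn-")); // trailing hyphen is rejected
}
```

The empty-input check comes first because an empty sequence has no last character to inspect.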
Thank you for addressing my concerns.
Initial replacement of URI/URL parse method internals with SBuf and Tokenizer based parse. For now this parsing only handles the scheme section of URL. With this we add the missing check for alpha character as first in the scheme name for unknown schemes and prohibit URL without any scheme (previously accepted). Also polishes the documentation, URN and asterisk-form URI parsing. Also, adds validation of URN NID portion characters to ensure valid authority host names are generated for THTTP lookup URLs.
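The scheme rules described in this commit message (first character must be alphabetic, and a URL without any scheme is rejected) can be sketched roughly as follows. This is an illustration only: it uses std::string instead of SBuf and Tokenizer, the function name parseScheme is hypothetical, and the accepted character set follows the RFC 3986 scheme grammar rather than anything confirmed about Squid's parser.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical sketch, not Squid's parser. On success, stores the scheme
// (without the colon) in 'scheme' and returns true. Grammar per RFC 3986:
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool parseScheme(const std::string &url, std::string &scheme)
{
    const auto colon = url.find(':');
    if (colon == std::string::npos || colon == 0)
        return false; // no scheme delimiter, or an empty scheme

    // The first character must be a letter.
    if (!std::isalpha(static_cast<unsigned char>(url[0])))
        return false;

    // Remaining characters: letters, digits, '+', '-', or '.'.
    for (std::string::size_type i = 1; i < colon; ++i) {
        const auto c = static_cast<unsigned char>(url[i]);
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            return false;
    }

    scheme = url.substr(0, colon);
    return true;
}

// Small usage example.
void exampleSchemeUsage()
{
    std::string scheme;
    assert(parseScheme("http://example.com/", scheme) && scheme == "http");
    assert(!parseScheme("//example.com/", scheme)); // scheme-less URL rejected
}
```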
@yadij, 6c880a1 commit broke master build: The patch below fixes the problem: I wonder why Jenkins build tests missed this problem. Probably 'test-builds.sh' does not include '--enable-ecap' configuration, however, I am not sure.
It should have been caught by the default builds on nodes with the library installed. Anyway, would you like to do a PR for your patch?
Fixed in PR473.