Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update URI parser to use SBuf parsing APIs #275

Closed
wants to merge 39 commits into from

Conversation

yadij
Copy link
Contributor

@yadij yadij commented Aug 12, 2018

Initial replacement of URI/URL parse method internals with
SBuf and Tokenizer based parse.

For now this parsing only handles the scheme section of
URL. With this we add the missing check for alpha character
as first in the scheme name for unknown schemes and
prohibit URL without any scheme (previously accepted).

Also polishes the documentation, URN and asterisk-form
URI parsing.

Also, adds validation of URN NID portion characters to
ensure valid authority host names are generated for
THTTP lookup URLs.

@rousskov rousskov self-requested a review August 12, 2018 17:09
@yadij yadij changed the title WIP: update URI parser Update URI parser Apr 1, 2019
@yadij yadij added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box S-waiting-for-more-reviewers needs a reviewer and/or a second opinion and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Apr 1, 2019
@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels Apr 4, 2019
@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels May 2, 2019
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe some new APIs should be adjusted. The rest is polishing.

I also suggest changing this PR title to Update URI parser to use modern parsing APIs.

BTW, have you considered completely dropping URN support?

src/anyp/Uri.cc Outdated Show resolved Hide resolved
src/anyp/Uri.cc Outdated Show resolved Hide resolved
src/anyp/Uri.cc Outdated Show resolved Hide resolved
src/anyp/Uri.cc Show resolved Hide resolved
src/anyp/Uri.cc Outdated Show resolved Hide resolved
src/anyp/UriScheme.cc Outdated Show resolved Hide resolved
src/anyp/UriScheme.cc Show resolved Hide resolved
src/anyp/UriScheme.h Outdated Show resolved Hide resolved
src/HttpRequest.cc Outdated Show resolved Hide resolved
src/HttpRequest.cc Outdated Show resolved Hide resolved
@yadij yadij changed the title Update URI parser Update URI parser to use SBuf parsing APIs Jun 13, 2019
@yadij
Copy link
Contributor Author

yadij commented Jun 13, 2019

BTW, have you considered completely dropping URN support?

I have, and am undecided on that proposal. I know some places use urn:, but not necessarily via Squid. There are some things we might use it for internally (eg Store-ID standard IDs or direct pointers to cached variants) - but it may be beneficial to have a mapping of those in cache managers under http(s):// (making it more directly public, hmm).

@yadij yadij added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-more-reviewers needs a reviewer and/or a second opinion labels Jun 29, 2019
@yadij yadij requested a review from rousskov June 29, 2019 10:46
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two serious concerns (one old one and one related to passing c-strings images of well-known schemes to the UriScheme constructor). IIRC, the rest is polishing.

src/HttpRequest.cc Outdated Show resolved Hide resolved
src/anyp/Uri.cc Outdated Show resolved Hide resolved
src/anyp/Uri.cc Outdated Show resolved Hide resolved
src/anyp/Uri.cc Outdated Show resolved Hide resolved
src/anyp/Uri.cc Show resolved Hide resolved
src/anyp/UriScheme.cc Outdated Show resolved Hide resolved
src/anyp/UriScheme.cc Show resolved Hide resolved
src/base/CharacterSet.h Outdated Show resolved Hide resolved
src/base/CharacterSet.h Outdated Show resolved Hide resolved
src/anyp/Uri.cc Outdated Show resolved Hide resolved
@rousskov rousskov added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Jul 8, 2019
@rousskov
Copy link
Contributor

rousskov commented Jul 8, 2019

I have [considered removing URN support], and am undecided on that proposal. I know some places use urn:, but not necessarily via Squid. There are some things we might use it for internally (eg Store-ID standard IDs or direct pointers to cached variants) - but it may be beneficial to have a mapping of those in cache managers under http(s):// (making it more directly public, hmm).

Removing URN support does not mean we can never add that support back. Removing URN support simply means that we do not have the resources to maintain/improve that barely (if at all) used and very low-quality experimental code right now. Thus, future potential uses are pretty much irrelevant in this decision AFAICT.

The only important unknown here that may warrant keeping that code is the existence of a large set of current Squid users that are going to be inconvenienced (and that are actively supporting the Squid Project or are otherwise considered important for the Project).

@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels Jul 18, 2019
yadij and others added 4 commits September 9, 2019 15:47
Co-Authored-By: Alex Rousskov <rousskov@measurement-factory.com>
Co-Authored-By: Alex Rousskov <rousskov@measurement-factory.com>
src/anyp/Uri.cc Outdated
if (!alphanum[*nid.end()])
throw TextException("NID suffix is not alphanumeric", Here());

debugs(23, 3, "Split URI into proto='urn', nid='" << nid << "', path='" << Raw("tok",tok.remaining().rawContent(),tok.remaining().length()) << "'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first Raw() parameter names the thing you are printing. In this case, it is "path".

Also, we should not combine manual decorations with Raw decorations. As you probably know, I recommend removing single quotes because cache.log has enough delimiters to find the end of the path (which might contain single quotes itself). We should be relying on record separation mechanisms to separate records, not manual (and, hence, inconsistent and often insufficient) decorations.

Suggested change
debugs(23, 3, "Split URI into proto='urn', nid='" << nid << "', path='" << Raw("tok",tok.remaining().rawContent(),tok.remaining().length()) << "'");
debugs(23, 3, "Split URI into proto='urn', nid='" << nid << "', " << Raw("path",tok.remaining().rawContent(),tok.remaining().length()));

If you insist on having those quotes, add the corresponding decoration method/logic to Raw via Raw::quote().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/anyp/Uri.cc Outdated
if (!alphanum[*nid.end()])
throw TextException("NID suffix is not alphanumeric", Here());

debugs(23, 3, "Split URI into proto='urn', nid='" << nid << "', path='" << Raw("tok",tok.remaining().rawContent(),tok.remaining().length()) << "'");
Copy link
Contributor

@rousskov rousskov Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify the Raw() call parameters, consider moving this debugs() lower, after setting various fields, including path. I do not insist on this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/anyp/Uri.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one bug and tried to respond to other recent code changes in hope to shorten the review cycles.

GitHub split my review into several separate change requests again, unfortunately, so please look around for more/isolated contemporary change requests. The *end bug should still be in this review though.

src/anyp/Uri.cc Outdated Show resolved Hide resolved
const MasterXaction::Pointer mx = new MasterXaction(XactionInitiator::initClient);
HttpRequest *aRequest = HttpRequest::FromUrl(url, mx);
auto aRequest = HttpRequest::FromUrlXXX(url, mx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub is having a hard time grouping review comments for this old PR with many force-pushes so I am adding this comment to reduce the chance of this older change request getting lost in the noise.

src/anyp/Uri.cc Outdated
return true;

} catch (...) {
debugs(23, 2, "error: " << CurrentException << Raw("rawUrl", rawUrl.rawContent(), rawUrl.length()).minLevel(DBG_DATA));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check whether minLevel() is needed here. Long Raw input will not be printed at lower debugging levels by default anyway, and it is probably a good idea to show short URIs for error messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@yadij
Copy link
Contributor Author

yadij commented Sep 10, 2019

  • HttpRequest *aRequest = HttpRequest::FromUrl(url, mx);
  • auto aRequest = HttpRequest::FromUrlXXX(url, mx);

GitHub is having a hard time grouping review comments for this old PR with many force-pushes so I am adding this comment to reduce the chance of this older change request getting lost in the noise.

With the requested change to use SBuf url variable these lines go from FromUrlXXX back to FromUrl and are no longer being touched by this PR. So the change to auto is now out of scope style polish.

yadij and others added 3 commits September 10, 2019 15:02
One must not dereference the end() iterator.

Co-Authored-By: Alex Rousskov <rousskov@measurement-factory.com>
@yadij yadij added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Sep 10, 2019
@rousskov rousskov dismissed their stale review September 10, 2019 13:30

Thank you for addressing my concerns.

@rousskov rousskov added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Sep 10, 2019
squid-anubis pushed a commit that referenced this pull request Sep 10, 2019
Initial replacement of URI/URL parse method internals with
SBuf and Tokenizer based parse.

For now this parsing only handles the scheme section of
URL. With this we add the missing check for alpha character
as first in the scheme name for unknown schemes and
prohibit URL without any scheme (previously accepted).

Also polishes the documentation, URN and asterisk-form
URI parsing.

Also, adds validation of URN NID portion characters to
ensure valid authority host names are generated for
THTTP lookup URLs.
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Sep 10, 2019
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels labels Sep 11, 2019
@eduard-bagdasaryan
Copy link
Contributor

@yadij, 6c880a1 commit broke master build:

MessageRep.cc: In member function ‘virtual void Adaptation::Ecap::RequestLineRep::uri(const Area&)’:
./src/adaptation/ecap/MessageRep.cc|207 col 84| error: no matching function for call to ‘AnyP::Uri::parse(HttpRequestMethod&, const char*)’
      const auto ok = theMessage.url.parse(theMessage.method, aUri.toString().c_str());
                                                                                     ^
./src/adaptation/ecap/../../../src/anyp/Uri.h|62 col 10| note: candidate: bool AnyP::Uri::parse(const HttpRequestMethod&, const SBuf&)
      bool parse(const HttpRequestMethod &, const SBuf &url);

The patch below fixes the problem:

diff --git a/src/adaptation/ecap/MessageRep.cc b/src/adaptation/ecap/MessageRep.cc
index 6fc9aa2ed1..946f2081cc 100644
--- a/src/adaptation/ecap/MessageRep.cc
+++ b/src/adaptation/ecap/MessageRep.cc
@@ -204,7 +204,7 @@ Adaptation::Ecap::RequestLineRep::uri(const Area &aUri)
 {
     // TODO: if method is not set, AnyP::Uri::parse will assume it is not connect;
     // Can we change AnyP::Uri::parse API to remove the method parameter?
-    const auto ok = theMessage.url.parse(theMessage.method, aUri.toString().c_str());
+    const auto ok = theMessage.url.parse(theMessage.method, SBuf(aUri.toString()));
     Must(ok);
 }

I wonder why Jenkins build tests missed this problem. Probably 'test-builds.sh' does not include '--enable-ecap' configuration, however, I am not sure.

@yadij
Copy link
Contributor Author

yadij commented Sep 13, 2019

It should have been caught by the default builds on nodes with the library installed.

Anyway, would you like to do a PR for your patch?

@eduard-bagdasaryan
Copy link
Contributor

Fixed in PR473.

squidadm pushed a commit to squidadm/squid that referenced this pull request Oct 16, 2019
Initial replacement of URI/URL parse method internals with
SBuf and Tokenizer based parse.

For now this parsing only handles the scheme section of
URL. With this we add the missing check for alpha character
as first in the scheme name for unknown schemes and
prohibit URL without any scheme (previously accepted).

Also polishes the documentation, URN and asterisk-form
URI parsing.

Also, adds validation of URN NID portion characters to
ensure valid authority host names are generated for
THTTP lookup URLs.
yadij added a commit that referenced this pull request Oct 18, 2019
Initial replacement of URI/URL parse method internals with
SBuf and Tokenizer based parse.

For now this parsing only handles the scheme section of
URL. With this we add the missing check for alpha character
as first in the scheme name for unknown schemes and
prohibit URL without any scheme (previously accepted).

Also polishes the documentation, URN and asterisk-form
URI parsing.

Also, adds validation of URN NID portion characters to
ensure valid authority host names are generated for
THTTP lookup URLs.
@yadij yadij deleted the v5-uri-parser branch August 16, 2020 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-merged https://github.com/measurement-factory/anubis#pull-request-labels
Projects
None yet
4 participants