-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved base tag handling #232
Improved base tag handling #232
Conversation
…ssue with root-relative values on Linux.
} | ||
|
||
[TestMethod] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not delete ANY tests and make your test follow the same naming convention. Even if it's a little more ceremony.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my knowledge, the consolidated test did follow the naming convention. The DisplayName is supposed to just be the name of the individual outcome of that test, but I'm not familiar with the testing library being used here so I could be wrong.
Abot2/Core/HyperLinkParser.cs
Outdated
{ | ||
// ignored | ||
} | ||
baseHref = uriToUse.Scheme + ":" + baseHref; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the crawledPage.Uri.Scheme is "more" correct here as we can't change the scheme of the crawl because the responding server returned a new scheme. It believe it would cause some confusion under that scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would disagree. Since the responding server is the responding server, it should dictate how you're interpreting the base tag - just like your browser would if you were to visit the page. It's also a weird inconsistency, in that if you don't have a base tag, any relative links will be using the URI of the responding server. But if you then happen to have a base tag with a deferred scheme, the scheme is now all of a sudden decided by something else.
Either way, as the change doesn't relate directly to any of the 2 issues that this PR aims to resolve, I'll remove the change.
Would you be willing to refactor your changes to the Test class please? I understand semantically what you are trying to achieve with some of your changes but the diff makes it hard to quickly understand the impact of your changes to the tests. It's more important to have consistency in the code base and the keep the pr be as MINIMAL as possible. Also please do NOT delete/alter any tests that aren't necessary as this removes confidence that the change isn't causing issues with original set of tests. |
@@ -282,7 +282,7 @@ public void GetLinks_RelativeBaseTagPresent_ReturnsRelativeLinksPageUri() | |||
[TestMethod] | |||
public void GetLinks_InvalidBaseTagPresent_ReturnsRelativeLinksPageUri() | |||
{ | |||
_crawledPage.Content.Text = "<base href=\"http:http://http:\"><a href=\"http://aaa.com/\" ></a><a href=\"/aaa/a.html\" /></a>"; | |||
_crawledPage.Content.Text = "<base href=\"////images/\"><a href=\"http://aaa.com/\"></a><a href=\"/aaa/a.html\"></a><a href=\"aaa/a.html\"></a>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, the value of the base tag didn't have to be updated here, if the goal simply is to have a passing test. Leaving the base tag value as it was, does however mean that it's no longer testing what it was previously - which is, "what happens if an URI instance can't be created from it".
PR accepted, appreciate the contribution |
Takes care of issue #230 and #231.
This development aims to improve upon the current handling of base tags within the
HyperLinkParser
. More specifically, this introduces support for relative base tags in addition to fixing a bug relating to root-relative values when running on Linux. For more information, check the linked issues above.The testing of this functionality have also been expanded upon to cover more ground. To that end, all of the base tag tests have been consolidated into one data driven test with multiple outcomes, as the tests are otherwise completely the same. All current test cases are still present, but the input for the invalid base tag case had to be adjusted. The old value of
http:http://http:
technically is a valid relative value according to the defined standard and as such the new logic would succeed in creating aUri
instance from it since relative values are now supported. However, the overall goal of the test case (ensuring that invalid values won't break the parsing) remains intact.