Generic uri parsing and fixing trailing slash issue #2392
This looks very promising. There are a few code style issues. See https://github.com/k9mail/k-9/wiki/CodeStyle
class BitcoinUriParser implements UriParser {
    private static final Pattern BITCOIN_URI_PATTERN =
            Pattern.compile("bitcoin:[1-9a-km-zA-HJ-NP-Z]{27,34}(\\?[a-zA-Z0-9$\\-_.+!*'(),%:@&=]*)?",
                    Pattern.CASE_INSENSITIVE);
Why case-insensitive?
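To illustrate the concern, here is a minimal sketch (class and field names are hypothetical, not part of the PR). The character class leaves out `0`, `O`, `I` and `l`, as Base58 does, but with `Pattern.CASE_INSENSITIVE` a lowercase `l` still matches through the uppercase `L` in `J-N`. If the flag was only meant to accept an upper-case scheme such as `BITCOIN:`, one option is to limit it to the scheme with an inline `(?i:...)` group.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the address part of the PR's regex is reproduced here.
final class Base58CaseDemo {
    static final String ADDRESS = "[1-9a-km-zA-HJ-NP-Z]{27,34}";

    // Case-sensitive everywhere.
    static final Pattern STRICT = Pattern.compile("bitcoin:" + ADDRESS);
    // Same flag as in the PR: the character class becomes case-insensitive too.
    static final Pattern LENIENT = Pattern.compile("bitcoin:" + ADDRESS, Pattern.CASE_INSENSITIVE);
    // Assumption: only the scheme should be case-insensitive.
    static final Pattern SCHEME_ONLY = Pattern.compile("(?i:bitcoin:)" + ADDRESS);

    // Builds "<scheme>1" followed by 26 copies of the given character (27 address characters).
    static String candidate(String scheme, char filler) {
        StringBuilder sb = new StringBuilder(scheme).append('1');
        for (int i = 0; i < 26; i++) {
            sb.append(filler);
        }
        return sb.toString();
    }
}
```

With these patterns, `candidate("bitcoin:", 'l')` is rejected by `STRICT` and `SCHEME_ONLY` but accepted by `LENIENT`.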
    return startPos;
}

String linkifiedUri = String.format("<a href=\"%1$s\">%1$s</a>", matcher.group());
The following avoids having to parse the format string and creating a temporary string.
String bitcoinUri = matcher.group();
outputBuffer.append("<a href=\"")
.append(bitcoinUri)
.append("\">")
.append(bitcoinUri)
.append("</a>");
Come to think of it, we also need to encode at least & in the href attribute.
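A minimal sketch of what that encoding could look like. `HrefEscaper` and its methods are hypothetical names, not existing K-9 code, and the set of escaped characters is an assumption that covers the usual HTML-significant ones.

```java
// Hypothetical helper: escapes the characters that are significant in HTML
// text and double-quoted attribute values.
final class HrefEscaper {
    private HrefEscaper() {
    }

    static String escape(String value) {
        StringBuilder escaped = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    escaped.append("&amp;");
                    break;
                case '<':
                    escaped.append("&lt;");
                    break;
                case '>':
                    escaped.append("&gt;");
                    break;
                case '"':
                    escaped.append("&quot;");
                    break;
                default:
                    escaped.append(c);
            }
        }
        return escaped.toString();
    }

    // Appends the link using the escaped URI for both the attribute and the link text.
    static void appendLink(String uri, StringBuffer outputBuffer) {
        String escapedUri = escape(uri);
        outputBuffer.append("<a href=\"")
                .append(escapedUri)
                .append("\">")
                .append(escapedUri)
                .append("</a>");
    }
}
```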
Should this be the job of the UriParser or the converter?
SUPPORTED_URIS.put("bitcoin:", new BitcoinUriParser());
SUPPORTED_URIS.put("http:", new HttpUriParser());
SUPPORTED_URIS.put("https:", new HttpUriParser());
SUPPORTED_URIS.put("rtsp:", new HttpUriParser());
This needlessly creates three HttpUriParser instances.
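One way to share a single instance, sketched under the assumption that the parser holds no per-call state. The interface and the parser body are reduced to stubs so the snippet stands on its own, and `SupportedUris` is a hypothetical holder class.

```java
import java.util.HashMap;
import java.util.Map;

// Stub of the PR's interface so the sketch compiles on its own.
interface UriParser {
    int linkifyUri(String text, int startPos, StringBuffer outputBuffer);
}

// Placeholder parser; the real parsing logic is omitted.
class HttpUriParser implements UriParser {
    @Override
    public int linkifyUri(String text, int startPos, StringBuffer outputBuffer) {
        return startPos;
    }
}

// Hypothetical holder class for the scheme registry.
final class SupportedUris {
    static final Map<String, UriParser> SUPPORTED_URIS = new HashMap<>();

    static {
        // One instance registered for every scheme it handles.
        UriParser httpParser = new HttpUriParser();
        SUPPORTED_URIS.put("http:", httpParser);
        SUPPORTED_URIS.put("https:", httpParser);
        SUPPORTED_URIS.put("rtsp:", httpParser);
    }

    private SupportedUris() {
    }
}
```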
@@ -431,23 +393,35 @@ protected static String getQuoteColor(final int level) {
     * @param outputBuffer Buffer to append linked text to.
     */
    protected static void linkifyText(final String text, final StringBuffer outputBuffer) {
Please extract this to a separate class to simplify testing.
while (matcher.find(currentPos)) {
    int startPos = matcher.start();

    // Append all text in between
We try to avoid comments because they easily become outdated when the code changes but the comments don't. Avoiding them also encourages writing more readable code.
To make this more readable you could change it to
String textBeforeMatch = text.substring(currentPos, startPos);
outputBuffer.append(textBeforeMatch);
 *
 * @return Position of first character after @ sign.
 */
private int matchUserInfoIfAvailable(String text, int startPos, int authorityEnd) {
This is not a public API. We don't need JavaDoc for internal methods.
 * @param outputBuffer Buffer where linkified variant of uri is written to.
 * @return Index where parsed uri ends (first non-uri letter). Should be startPos or smaller if no valid uri was found.
 */
int linkifyUri(final String text, int startPos, final StringBuffer outputBuffer);
There's no need for final in interfaces.
@@ -19,7 +19,8 @@
@Config(manifest = Config.NONE)
public class HtmlConverterTest {
    // Useful if you want to write stuff to a file for debugging in a browser.
    private static final boolean WRITE_TO_FILE = Boolean.parseBoolean(System.getProperty("k9.htmlConverterTest.writeToFile", "false"));
    private static final boolean WRITE_TO_FILE =
Please revert the unrelated changes in this file.
@@ -207,4 +216,33 @@ public void testLinkifyBitcoinAndHttpUri() {
                "http://example.com/" +
                "</a>", outputBuffer.toString());
    }

    @Test
    public void testHttpUris() {
Please only one test per test method.
The current code will probably also linkify the http URI in a string like |
Sorry for the code style issues, I imported the settings.jar as described. All mentioned points should be fixed.
👍 good job!
16f4187 to cf9c3d0
I squashed all commits and cleaned up the code a bit, hoping the feature was ready to merge. However, while trying to fix this I noticed that fixing the detection of valid http URLs surrounded by text is still something that needs to be done.
Thanks for taking the time. I dropped the IDN detection and now allow only simple domain names. I also changed the parsing to a greedier approach, which now successfully detects your example.
@@ -18,6 +16,8 @@
class HttpUriParser implements UriParser {
    // This string represent character group sub-delim as described in RFC 3986
    private static final String SUB_DELIM = "!$&'()*+,;=";
    private static final Pattern DOMAIN_PATTERN =
            Pattern.compile("\\w([\\w-]*\\w)*(\\.\\w([\\w-]*\\w)*)*(:(\\d{0,5}))?");
Unfortunately \w also includes the underscore, which is not valid in host names.
You can use non-capturing groups by having ?: as the first characters inside the parentheses. That'll make it easier to later get to the content you do want to capture.
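Both points could be combined roughly as follows. This is a sketch rather than a drop-in replacement: the class and helper names are hypothetical, and the inner repetition of each label is written as an optional group instead of the original `(...)*`, which is an additional change not asked for above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a domain pattern without \w and with non-capturing groups.
final class DomainPatternSketch {
    // A label starts and ends with a letter or digit; hyphens are allowed in between.
    private static final String LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";

    // Only the port digits are captured, so they are always group 1.
    static final Pattern DOMAIN_PATTERN =
            Pattern.compile(LABEL + "(?:\\." + LABEL + ")*(?::(\\d{0,5}))?");

    private DomainPatternSketch() {
    }

    static boolean isDomain(String authority) {
        return DOMAIN_PATTERN.matcher(authority).matches();
    }

    // Returns the port digits, or null if the input has no port or is not a valid domain.
    static String portOf(String authority) {
        Matcher matcher = DOMAIN_PATTERN.matcher(authority);
        return matcher.matches() ? matcher.group(1) : null;
    }
}
```

With this pattern `foo_bar.com` is rejected, and the port of `example.com:8080` is available as `group(1)` without counting nested groups.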
if (!tryMatchDomainName(text, currentPos, authorityEnd) &&
        !tryMatchIpv4Address(text, currentPos, authorityEnd, true) &&
        !tryMatchIpv6Address(text, currentPos, authorityEnd)) {
int matchedAuthorityEnd = Math.max(tryMatchDomainName(text, currentPos),
The whole Math.max() business makes this super hard to read. There's also no need to attempt another match if one of the methods was successful. So I suggest extracting all of this to a separate method, then checking the return value after each call to tryMatch*() and returning early if a match was found.
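Roughly what I have in mind, as a sketch. The `tryMatch*()` bodies below are simplified stand-ins (plain regexes) and the `NO_MATCH` return convention is an assumption; only the structure of `matchAuthorityEnd()` is the point.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the matchers are deliberately simplified placeholders.
final class AuthorityMatcherSketch {
    static final int NO_MATCH = -1;

    private static final Pattern DOMAIN = Pattern.compile("[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*");
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(?:\\.\\d{1,3}){3}");
    private static final Pattern IPV6 = Pattern.compile("\\[[0-9a-fA-F:]+\\]");

    // Returns the end of the matched authority, or NO_MATCH.
    // Stops at the first matcher that succeeds.
    static int matchAuthorityEnd(String text, int startPos) {
        int end = tryMatchDomainName(text, startPos);
        if (end != NO_MATCH) {
            return end;
        }

        end = tryMatchIpv4Address(text, startPos);
        if (end != NO_MATCH) {
            return end;
        }

        return tryMatchIpv6Address(text, startPos);
    }

    static int tryMatchDomainName(String text, int startPos) {
        return matchAt(DOMAIN, text, startPos);
    }

    static int tryMatchIpv4Address(String text, int startPos) {
        return matchAt(IPV4, text, startPos);
    }

    static int tryMatchIpv6Address(String text, int startPos) {
        return matchAt(IPV6, text, startPos);
    }

    // Tries to match the pattern exactly at startPos and returns the end index in text.
    private static int matchAt(Pattern pattern, String text, int startPos) {
        Matcher matcher = pattern.matcher(text);
        matcher.region(startPos, text.length());
        return matcher.lookingAt() ? matcher.end() : NO_MATCH;
    }
}
```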
int userInfoEnd = text.indexOf('@', startPos);
if (userInfoEnd != -1 && userInfoEnd < authorityEnd) {
authorityEnd is still a useful upper bound that can be used to avoid useless work.
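For example, `text.indexOf('@', startPos)` scans to the end of the message when there is no `@`, even though only the authority is of interest. A bounded search could look like this sketch (the helper and class names are hypothetical):

```java
// Hypothetical helper: looks for '@' only inside [startPos, authorityEnd).
final class UserInfoSearch {
    private UserInfoSearch() {
    }

    // Returns the index of the first '@' before authorityEnd, or -1 if there is none.
    static int findUserInfoEnd(String text, int startPos, int authorityEnd) {
        int limit = Math.min(authorityEnd, text.length());
        for (int i = startPos; i < limit; i++) {
            if (text.charAt(i) == '@') {
                return i;
            }
        }
        return -1;
    }
}
```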
Thanks for the hint with |
Awesome. Thanks a lot!
I wrote a structure for uri parsing to solve #2265 in a more elegant way. To support a new uri type, a class implementing the UriParser interface has to be created. To be applied during HTML generation, this class needs to be "registered" in HtmlConverter together with all matching uri schemes.
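As a rough illustration of that workflow, the sketch below adds a parser for a made-up `example:` scheme and registers it. The `linkifyUri` signature follows the interface in this PR; `ExampleUriParser`, `UriParserRegistry` and the simplified dispatch loop are hypothetical and do not reproduce the actual HtmlConverter code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

interface UriParser {
    // Returns the index where the parsed uri ends, or startPos if no valid uri was found.
    int linkifyUri(String text, int startPos, StringBuffer outputBuffer);
}

// Hypothetical parser for a made-up scheme; its characters need no HTML escaping.
class ExampleUriParser implements UriParser {
    private static final Pattern EXAMPLE_URI = Pattern.compile("example:[a-z0-9]+");

    @Override
    public int linkifyUri(String text, int startPos, StringBuffer outputBuffer) {
        Matcher matcher = EXAMPLE_URI.matcher(text);
        matcher.region(startPos, text.length());
        if (!matcher.lookingAt()) {
            return startPos;
        }
        String uri = matcher.group();
        outputBuffer.append("<a href=\"").append(uri).append("\">").append(uri).append("</a>");
        return matcher.end();
    }
}

// Hypothetical stand-in for the registration and dispatch done in HtmlConverter.
final class UriParserRegistry {
    static final Map<String, UriParser> SUPPORTED_URIS = new LinkedHashMap<>();

    static {
        SUPPORTED_URIS.put("example:", new ExampleUriParser());
    }

    static String linkify(String text) {
        StringBuffer outputBuffer = new StringBuffer();
        int pos = 0;
        while (pos < text.length()) {
            int end = pos;
            for (Map.Entry<String, UriParser> entry : SUPPORTED_URIS.entrySet()) {
                if (text.startsWith(entry.getKey(), pos)) {
                    end = entry.getValue().linkifyUri(text, pos, outputBuffer);
                    break;
                }
            }
            if (end <= pos) {
                // No parser consumed anything here: copy one character and move on.
                outputBuffer.append(text.charAt(pos));
                end = pos + 1;
            }
            pos = end;
        }
        return outputBuffer.toString();
    }
}
```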
To fix #1223, I rewrote the parsing of http uris, including support for IPv6 addresses. I created a couple of tests, but of course I may have missed something. So do not hesitate to comment if you notice something.