…icant performance gain)
deleted debug println.
I'm a bit confused: from a quick look at PublicSuffixes2, it seems it still builds a big regex string and then a Pattern in order to do the key operations. Is that the case? I would have thought those were the main memory consumers. Where does this change get its memory savings, and what is the magnitude of the savings? Separate comments:
Sorry, the pull request description is probably misleading, and the javadoc comment in PublicSuffixes may be too short.

Yes, it still uses a regular expression, as the old PublicSuffixes did. That was the fastest path to address the problem I found (described in https://webarchive.jira.com/browse/HER-1965). I added a comment to HER-1965 comparing the regular expressions generated by the old and new PublicSuffixes. In short, the old regular expression has 14,197 (?: ) groups, and the new one has 1,386. This results in a ~90% smaller Matcher object and apparently faster matching (not a rigorous benchmark, but I saw a ~4x improvement). Pattern generation must also take less time and memory, but such a one-time saving is not a big deal.

It may be possible to implement an even more efficient PublicSuffixes by using the radix tree directly for matching, but I'm wondering how much effort it would take to beat Java's (supposedly) well-optimized regular expression implementation.

As for using the Google Guava library, we recently found a case against it: https://webarchive.jira.com/browse/HER-2004

The new PublicSuffixes has my name at the bottom of the class-level javadoc comment. Should it be in a different format ("handle"?)
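To illustrate why factoring common prefixes cuts the number of (?: ) groups, here is a minimal sketch. It is not the code in this pull request: the class `PrefixFactoredRegex`, its `Node` type, and the sample suffixes are all hypothetical, and it uses a plain character-level trie rather than a true radix tree (single-child chains simply come out as literal runs in the regex, which has a similar effect on the output). The naive builder wraps every entry in its own group; the trie-based builder emits a group only where entries actually branch or where a shorter entry is a prefix of a longer one.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrefixFactoredRegex {
    /** Trie node (hypothetical helper, not the PR's actual class). */
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean terminal; // true if some input word ends at this node
    }

    /** Escape anything that is not a letter or digit so it matches literally. */
    static String escape(char c) {
        return Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
    }

    /** Naive form: one non-capturing group per word, joined by '|'. */
    static String naiveRegex(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (sb.length() > 0) sb.append('|');
            sb.append("(?:");
            for (char c : w.toCharArray()) sb.append(escape(c));
            sb.append(')');
        }
        return sb.toString();
    }

    /** Insert every word into a character-level trie. */
    static Node buildTrie(List<String> words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.terminal = true;
        }
        return root;
    }

    /** Emit a regex for everything below this node, sharing common prefixes. */
    static String toRegex(Node node) {
        if (node.children.isEmpty()) return "";
        StringBuilder alt = new StringBuilder();
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (alt.length() > 0) alt.append('|');
            alt.append(escape(e.getKey())).append(toRegex(e.getValue()));
        }
        // A single non-terminal child needs no group: it is a literal run.
        if (node.children.size() == 1 && !node.terminal) return alt.toString();
        String group = "(?:" + alt + ")";
        // If a word ends here, the rest of the branch is optional.
        return node.terminal ? group + "?" : group;
    }
}
```

Calling `PrefixFactoredRegex.toRegex(PrefixFactoredRegex.buildTrie(words))` and passing the result to `Pattern.compile` gives a pattern that accepts the same words as the one built from `naiveRegex(words)`, with groups only at branch points. On a list the size of the public suffix list, that is the kind of reduction in group count described above.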
reimplementation of PublicSuffixes with radix tree.