This repository was archived by the owner on Nov 1, 2022. It is now read-only.

Simplify implementation of URLStringUtils.isURLLike #5376

@hawkinsw

Description

Hello!

First, let me say thank you for taking the time and effort to build such a comprehensive function for detecting whether a string is a valid URL. The Java SDK versions are so restrictive that they are unusable. What you provide is really, really important to have.

In Fenix, the URLStringUtils.isURLLike function is used to determine whether the user's input to the Awesome Bar is a valid URL. That classification determines whether to load a page with that input as a URL or to start a query with that input as the search terms. In other words, the application blocks while waiting for that decision to be made.

Unfortunately, due to the robustness of the regular expression underlying the URLStringUtils.isURLLike function, the first classification takes a very, very long time. According to our profiles, it takes almost 500ms (on a modern smartphone) to compile the regular expression. For context, that is 2-3x longer than it takes to instantiate and initialize Gecko on the same hardware. Of course, each comparison using the compiled regular expression is blindingly fast.
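
To make the cost concrete, here is a minimal sketch of the lazy-compilation pattern I assume is at work (the object name and the tiny pattern below are placeholders, not the actual android-components code): whichever call touches the pattern first pays the full compilation price, and every later call only runs the already-compiled matcher.

```kotlin
import java.util.regex.Pattern

// Hypothetical sketch -- not the real URLStringUtils. The pattern below is a
// tiny placeholder; the real expression is far larger, which is what makes
// Pattern.compile() so expensive on first use.
object UrlMatcherSketch {
    private val urlPattern: Pattern by lazy {
        // Compiled the first time isURLLike() is called; ~500ms for the real pattern.
        Pattern.compile("^\\s*\\w+://\\S+\\s*$")
    }

    fun isURLLike(input: String): Boolean =
        urlPattern.matcher(input).matches() // cheap once the pattern exists
}
```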

It would be ideal to maintain the functionality provided by your robust classification, but the Fenix UI cannot be stalled for that long; the delay is very perceptible to the user.

We experimented with precompiling the regular expression on a background thread prior to the first call to URLStringUtils.isURLLike. This worked and users were happy! However, there was concern among developers that using such a trick made the code less maintainable.
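
For reference, the workaround looked roughly like the sketch below (illustrative only; warmUpUrlClassifier and the thread name are made-up, and the real Fenix code differs): start a background thread during startup whose only job is to trigger the first classification, so the expensive compilation has already happened by the time the Awesome Bar needs an answer.

```kotlin
import kotlin.concurrent.thread

// Illustrative sketch of the background-thread warm-up we experimented with.
// The classifier is passed in as a function so the sketch stays self-contained;
// in practice it would wrap the real URLStringUtils.isURLLike call.
fun warmUpUrlClassifier(classify: (String) -> Boolean) {
    thread(name = "UrlClassifierWarmUp", isDaemon = true) {
        // Any throwaway input forces the underlying regular expression to compile.
        classify("https://example.org")
    }
}

// e.g. early in app startup (assuming isURLLike takes the input string):
// warmUpUrlClassifier { input -> URLStringUtils.isURLLike(input) }
```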

So it is that I come to you asking for your help to improve the performance of URLStringUtils.isURLLike.

I certainly don't mean to assume that I can do your job, but I thought through a few options for doing this:

  1. Looking through the history, I noticed that earlier this summer AC migrated away from a more straightforward regular expression for classifying input as valid URLs to one that matches Fennec's implementation. AC could reincorporate that regular expression and expose its capabilities in a new function named, say, URLStringUtils.isURLSimple. AC users could then choose based on their needs -- those wanting an exhaustive classification could use URLStringUtils.isURLLike (aware of the performance penalty for the first classification) and those wanting a fast classification could use URLStringUtils.isURLSimple (aware of the possibility of misclassifying certain inputs). A rough sketch follows this list.

  2. AC could abandon regular expression-based classification and use an algorithmic approach like the one taken in https://github.com/jsdom/whatwg-url.

  3. Fantastical approach: AC could precompile the regular expression into bytecode at build time, generate from it a class specifically for matching URLs, and use that class to implement the classification. Such a technique was attempted here: https://www.gnu.org/software/kawa/gnu.bytecode/compiling-regexps.html. Obviously that would "never work", but wouldn't it be cool if it did? :-)
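
To make option 1 concrete, here is a hypothetical sketch of what URLStringUtils.isURLSimple could look like. The pattern below is purely illustrative (it is not the pre-migration AC expression); the trade-off is the one described above: near-instant compilation in exchange for misclassifying some inputs.

```kotlin
import java.util.regex.Pattern

// Hypothetical sketch of option 1 -- the pattern is illustrative only.
// A small pattern like this compiles in well under a millisecond, so the
// first classification no longer stalls the UI.
object SimpleUrlCheckSketch {
    private val simplePattern: Pattern = Pattern.compile(
        // Either an explicit scheme, or a whitespace-free token containing a dot.
        "^\\s*(\\w+://\\S+|\\S+\\.\\S{2,})\\s*$"
    )

    fun isURLSimple(input: String): Boolean =
        simplePattern.matcher(input).matches()
}
```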

Again, thank you very much for making such a robust implementation of functionality for detecting whether a URL is valid. We do not want to lose that functionality. However, users will not be denied :-) Please let me know how I can help with the optimization -- as you can tell, I have put some thought into different options and would love to be involved. Unfortunately this is a rather time-sensitive request given Fenix's timeline. If you could prioritize this, we would really appreciate it!

Thanks again,
Will
