This repository was archived by the owner on Nov 1, 2022. It is now read-only.

Simplify implementation of URLStringUtils.isURLLike #5376

@hawkinsw

Description

Hello!

First, let me say thank you for taking the time and effort to build such a comprehensive function for detecting whether a string is a valid URL. The Java SDK versions are so restrictive that they are unusable. What you provide is really, really important to have.

In Fenix, the URLStringUtils.isURLLike function is used to determine whether the user's input to the Awesome Bar is a valid URL. That classification determines whether to load a page with that input as a URL or to start a query with that input as the search terms. In other words, the application blocks while waiting for that decision to be made.

Unfortunately, due to the robustness of the regular expression underlying the URLStringUtils.isURLLike function, the first classification takes a very, very long time. According to our profiles, it takes almost 500ms (on a modern smartphone) to compile the regular expression. For context, that is 2-3x longer than it takes to instantiate and initialize Gecko on the same hardware. Of course, each comparison using the compiled regular expression is blindingly fast.
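
To make the cost concrete, here is a minimal sketch of the lazy-compilation pattern I assume is at work (the object name and the tiny pattern below are placeholders, not the actual android-components code): whichever call touches the pattern first pays the full compilation price, and every later call only runs the already-compiled matcher.

```kotlin
import java.util.regex.Pattern

// Hypothetical sketch -- not the real URLStringUtils. The pattern below is a
// tiny placeholder; the real expression is far larger, which is what makes
// Pattern.compile() so expensive on first use.
object UrlMatcherSketch {
    private val urlPattern: Pattern by lazy {
        // Compiled the first time isURLLike() is called; ~500ms for the real pattern.
        Pattern.compile("^\\s*\\w+://\\S+\\s*$")
    }

    fun isURLLike(input: String): Boolean =
        urlPattern.matcher(input).matches() // cheap once the pattern exists
}
```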

It would be ideal to maintain the functionality provided by your robust classification, but the Fenix UI cannot be stalled for that long; the delay is very perceptible to the user.

We experimented with precompiling the regular expression on a background thread prior to the first call to URLStringUtils.isURLLike. This worked and users were happy! However, there was concern among developers that using such a trick made the code less maintainable.
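
For reference, the workaround looked roughly like the sketch below (illustrative only; warmUpUrlClassifier and the thread name are made-up, and the real Fenix code differs): start a background thread during startup whose only job is to trigger the first classification, so the expensive compilation has already happened by the time the Awesome Bar needs an answer.

```kotlin
import kotlin.concurrent.thread

// Illustrative sketch of the background-thread warm-up we experimented with.
// The classifier is passed in as a function so the sketch stays self-contained;
// in practice it would wrap the real URLStringUtils.isURLLike call.
fun warmUpUrlClassifier(classify: (String) -> Boolean) {
    thread(name = "UrlClassifierWarmUp", isDaemon = true) {
        // Any throwaway input forces the underlying regular expression to compile.
        classify("https://example.org")
    }
}

// e.g. early in app startup (assuming isURLLike takes the input string):
// warmUpUrlClassifier { input -> URLStringUtils.isURLLike(input) }
```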

So it is that I come to you asking for your help to improve the performance of URLStringUtils.isURLLike.

I certainly don't mean to assume that I can do your job, but I thought through a few options for doing this:

  1. Looking through the history, I noticed that earlier this summer AC migrated away from a more straightforward regular expression for classifying input as valid URLs to one that matches Fennec's implementation. AC could reincorporate that regular expression and expose its capabilities in a new function named, say, URLStringUtils.isURLSimple. AC users could then choose based on their needs -- those wanting an exhaustive classification could use URLStringUtils.isURLLike (aware of the performance penalty for the first classification) and those wanting a fast classification could use URLStringUtils.isURLSimple (aware of the possibility of misclassifying certain inputs). A rough sketch follows this list.

  2. AC could abandon regular expression-based classification and use an algorithmic approach like the one taken in https://github.com/jsdom/whatwg-url.

  3. Fantastical approach: AC could precompile the regular expression into bytecode at build time, generate from it a class specifically for matching URLs, and use that class to implement the classification. Such a technique was attempted here: https://www.gnu.org/software/kawa/gnu.bytecode/compiling-regexps.html. Obviously that would "never work", but wouldn't it be cool if it did? :-)
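
To make option 1 concrete, here is a hypothetical sketch of what URLStringUtils.isURLSimple could look like. The pattern below is purely illustrative (it is not the pre-migration AC expression); the trade-off is the one described above: near-instant compilation in exchange for misclassifying some inputs.

```kotlin
import java.util.regex.Pattern

// Hypothetical sketch of option 1 -- the pattern is illustrative only.
// A small pattern like this compiles in well under a millisecond, so the
// first classification no longer stalls the UI.
object SimpleUrlCheckSketch {
    private val simplePattern: Pattern = Pattern.compile(
        // Either an explicit scheme, or a whitespace-free token containing a dot.
        "^\\s*(\\w+://\\S+|\\S+\\.\\S{2,})\\s*$"
    )

    fun isURLSimple(input: String): Boolean =
        simplePattern.matcher(input).matches()
}
```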

Again, thank you very much for making such a robust implementation of functionality for detecting whether a URL is valid. We do not want to lose that functionality. However, users will not be denied :-) Please let me know how I can help with the optimization -- as you can tell, I have put some thought into different options and would love to be involved. Unfortunately this is a rather time-sensitive request given Fenix's timeline. If you could prioritize this, we would really appreciate it!

Thanks again,
Will
