Skip to content

Vimium Omnibox implementation, with fuzzy completion #459

Merged
merged 78 commits into from Apr 26, 2012

5 participants

@niklasb
niklasb commented Jan 23, 2012

As seen in #442, this pull requests suggests a sophisticated new UI and infrastructure that allows for efficient, keyboard-controlled navigation of history items, bookmarks, tabs and for user-defined commands like web search, URL input and accessing different search engines.

Parts of the changes included in this request could be applied on their own (for example the changes to lib/utils.js), but this wouldn't be of a significant advantage within the context of the existing Vimium code base. The approach I take is therefore to suggest all the changes as a whole and explain how they are structured.

Motivation

I really like Vimium in its current state, but what I always missed about it was a really efficient way to navigate away from a page. The Omnibox is quite good, but I find it's a pain to use it with Vimium, as pressing Ctrl+L causes the current page to lose focus, which defeats almost all of Vimium's bindings. Returning focus to the page is near impossible without using the mouse. Also, the Omnibox is not as configurable as I like it to be. It doesn't do fuzzy matching (which I love and use a lot inside vim), doesn't help in searching lost tabs and adding search engines is a bit of a pain. One more reason is that the location bar takes away vertical space, which I have better use for on my laptop screen.

Changes to the existing library code

  • background_page.html: added new background API elements to access bookmarks, history and tabs from content scripts
  • lib/utils.js: refactored the existing URL detection to be a bit more useful. Also added some helpers I needed that I think are general enough to be included here
  • lib/keyboardUtils.js: Handle uppercase input (don't know why this wasn't the case before, maybe it's not the right thing to add this here)

Logic for different kinds of completion

The underlying model is a simple one. A completion driver is something that gets input and returns a list of suggestions together with a rendering routing to transform them into HTML and with a relevancy value. There are 5 different drivers already implemented:

  • smart: A driver that takes an input string and tries to make sense of it. It tries to detect whether the input is

    • a user-defined command (wiki Obama) => suggests to evaluate the command
    • an URL => suggests to visit it
    • nothing special at all => suggests a basic web search
  • bookmarks, history, tabs: they fuzzy match the different kinds of items

  • merging completer: a driver that can be used to combine and sort the results of multiple driver instances

The library code includes a comprehensive class to manage fuzzy string matching which implements filtering and ranking, together with some basic optimization like result caching.

I think this approach is very flexible and can be extended for other use cases as well.

User interface

The user interface is a simple HTML box which takes input and presents the results returned by the underlying completion driver. It is designed to be easily readable, with a rather large font and dark background for maximum contrast (although the styling is only a matter of CSS, of course):

Example integration

As a suggestion, three Vimium commands are implemented to demonstrate the usefulness of this addition:

  • Generic complete (bound to o => open in current tab, O => open in new tab). Fuzzy-matches history, bookmarks, tabs and uses the smart completer to suggest sensible actions depending on input
  • Tab complete (bound to T). Fuzzy-matches open tabs in the current window for people who never close tabs and therefore got lots of them open at the same time (like me)

Maintenance

I am aware that this pull request, with its 700 LoC+, is pretty large, compared to the existing combination of completionDialog.js and bookmarks.js, which together have under 300 lines. However, my approach has the advantage of unifying different kinds of dialogs, which makes it easy to drop, add or recombine functionality in the future.

I am also aware that the maintainers of this project keep a focus on efficiency and maintainability. I did my best to improve usability and responsiveness of the UI and to add documentation for the significant pieces of code, but I am convinced that my code in the current state leaves room for improvement. If this gets a chance, I will put some effort into code reduction. I am also willing to support and improve this facility in the future, if that would be of help. Hacking Vimium is really fun :)

@philc
Owner
philc commented Jan 23, 2012

Wow, this is epic. Nice start Niklas. It will take some time for the team to review it, and in doing so we'll evaluate

  1. whether this UX fits with Vimium
  2. the design of the system
  3. the code

I'll begin looking at your branch in detail in the next couple of days. @int3, if you have time to do the same and come up with an initial round of comments, that would be really helpful.

@niklasb
niklasb commented Jan 23, 2012

@philc: Thank you, I am absolutely willing to work together with you guys to make this a good fit for Vimium.

@int3
Collaborator
int3 commented Jan 24, 2012

I'm a little busier now since school's started again, but yeah, I'll find some time to help review this. Thanks @niklasb for the awesome work!

@ilya
Collaborator
ilya commented Jan 24, 2012

Wow. Epic indeed. I will try this out.

@int3
Collaborator
int3 commented Jan 25, 2012

I haven't had time to really look through the code yet, but I noticed that I wasn't getting some history results that I ought to. Skimming over your code, it looks like you limit the number of history entries that can be transmitted from the background page to the content script, and then you calculate the relevancy after that. Hence some highly-relevant results could get discarded. Indeed, when I increase maxResults to a very large number, I obtained the result I was looking for.

I think we could solve this problem by calculating relevancy in the background page first, and sending the X most relevant results to the content scripts.

@niklasb
niklasb commented Jan 25, 2012

@int3: That is indeed a problem I don't know how to solve yet. The relevancy calculation is executed in the content script on a pre-cached set of history items. This is because transferring to and from the background page seems to be quite slow compared to the actual filtering and ranking algorithm.

So what I currently do is to fetch the most visited 1500 pages from the background page once and then operate on the result. I am not sure how to transfer parts of the logic to the background page without losing performance, but I'll try something out later.

@int3
Collaborator
int3 commented Jan 25, 2012

If you are only transferring the top 10 or so results on each keystroke, shouldn't it be reasonably fast?

@niklasb
niklasb commented Jan 25, 2012

@int3: Yeah, that's what I was thinking. I'll try to transfer as much of the logic as possible to the background page. An additional benefit would be lower memory consumption (only one cache instead of one for every page that's used to access the omnibox). There are some caveats however, for example that I can't transfer the rendering function to the content page. I think I can work around this, but it might not be as clean as it is now.

@niklasb
niklasb commented Jan 25, 2012

@int3: I rethought this and see more clearly now what the problem is. It's not a matter of where the results are filtered but how many history items are actually included in the search. I don't make use of the search facility included in the Chrome API, because I use a custom, fuzzy algorithm to filter the results. I limited the number to 1500 because the fuzzy matching takes a quite bit of time, which obviously is O(n) to the number of entries. Raising this number significantly (e.g. to 15000) will inevitably make the dialog less responsive (at least if you don't have a high-end CPU). The only thing I can imagine how the impact of this could be reduced is by

  • a) making the filtering significantly more efficient (which I am not sure how to do, it's currently using fast regex matching and uses caching to only search within plausible sets of previous results)
  • b) initially using a different sort algorithm, so that relevant history items are more likely to be amongst the search set of (for example, we could sort by last access time rather than visit count?)

Moving logic to background page is still sensible (and I am going to do it), but it wouldn't solve this particular problem.

@int3
Collaborator
int3 commented Jan 25, 2012

Given that you're calling Array.sort() in your code, I think your runtime should be O(n lg n), not O(n). Since we only really need the top k results, we might be able to speed things up. One way is to use the quickselect algorithm to find the kth largest element, and then loop over the array once more to collect all the k elements whose relevancy is >= to the kth largest one. The running time for that should be O(n) on average. The question, however, is whether the constant overhead of having this algorithm in JS instead of native C will outweigh the O(lg n) savings.

I think the matching / relevancy algorithm might be amenable to some optimizations as well... let me mull over that for a bit.

@int3
Collaborator
int3 commented Jan 25, 2012

Oh, my bad, I think I misunderstood your comment. You were trying to say that the filtering of the results was the bottleneck, and not the sorting, hence the O(n) runtime. Yeah, I guess that makes sense, considering that the matching algorithm takes O(m) time to run where m is the size of the query. Hmm...

Edit: On closer inspection of match(), I realize the runtime is actually much slower than that.

@niklasb
niklasb commented Jan 25, 2012

@int3: fuzzyMatcher.match() is only ever called on the results that are actually displayed (as the documentation says). It uses a more complicated, backtracking algorithm to prefer marking consecutive letters in the dialog. fuzzyMatcher.filter() is used to filter the results, and it's O(n) with n being the number of matches of the last call or the number of total history items if this is the first call.

@int3
Collaborator
int3 commented Jan 25, 2012

Oh. Well in that case... I can think of two things to try:

  1. Build a trie and match on that
  2. Filter the completions lazily. I.e. whenever the filter function finds a match, call a callback and pass that match to it.
@niklasb
niklasb commented Jan 25, 2012

@int3: The idea about using a prefix-based search tree also came to my mind, but it would take huge amounts of memory. Do you have something specific in mind?

About your second suggestion: There are three steps involved in filtering/ranking the results:

  1. roughly filter the source list for possible matches using regexes (fuzzyMatcher.filter)
  2. assign a relevancy value to each match (fuzzyMatcher.calculateRelevancy)
  3. sort results by relevancy (MergingCompleter.filter)

I fail to see how laziness would buy us anything here, as we need to sort (or quickselect) the complete set of results after the complete list of filtered results are available. The only way I think this could be helpful is if this is used with some kind of insertion sort. I doubt that it will give better performance, though, Array.sort() should be quite fast already (the filtering really is the bottleneck!).

UPDATE: Also, I added a commit that causes the dialog to use the last visited 2000 pages as a source, instead of the 1500 most frequently visited. Do you think this is an improvement?

UPDATE 2: About fuzzy matching and tries: If the user gives us input like "abcde", we want to match the source list against a pattern like *a*b*c*d*e*. So already a simple string like "12345" would have several paths leading to it, like "123", "134", ..., so it would need to be inserted at N = bin(5,1) + bin(5,2) + bin(5,3) + bin(5,4) + bin(5,5) = 31 locations within a primitive search tree of depth 5. I can experiment a bit with this, but I think optimizing this is not a simple task.

UPDATE 3: I played around a bit and managed to optimize quite a lot by deferring some calculations as long as possible, so thanks for your hint. I think this allows us to raise the number of history items searched by default, at least it works on my system rather smoothly with ~10000 items. Can you try the changes?

@int3
Collaborator
int3 commented Jan 26, 2012

Well, I'm getting the particular result I was looking for earlier, so that's nice :) Let me play around with it for a bit more to see how it fares.

Anyway, to clarify my earlier comment on lazy evaluation, I was indeed thinking about sorting incrementally via insertion sort -- it would be slower than the native Array.sort(), but it seems like it'd work around the slowness with filtering. What approach did you take in the end?

@niklasb
@int3
Collaborator
int3 commented Jan 26, 2012

Yes, we might take a while to get a particular desired result, but lazy filtering would allow us to start sending the results sooner. It would not decrease total runtime in the worst case, but it would increase the responsiveness of Vimium, and responsiveness is a bigger determinant of how fast an application feels. Moreover, pre-sorting by last visit time / number of visits would allow us to return more likely hits first. Just think about searching in Spotlight -- you get a few results almost instantly, usually of frequently used programs and files, but more results trickle in afterwards.

@niklasb
niklasb commented Jan 26, 2012

@int3: I get it now. This is what I had in mind too, but it seems to be much more complicated to implement, so I dropped the idea in favor of code simplicity (for the moment). Maybe I'll come back to this, though.

@niklasb
niklasb commented Jan 27, 2012

The logic now lives in the background page. This seems to greatly improve performance and overall responsiveness.

@philc
Owner
philc commented Jan 27, 2012

I still haven't gotten to diving into this deeply, and I want to do a 1.31 release shortly before we pull something this large in.

@niklasb
niklasb commented Jan 27, 2012

@philc: I understand. It's actually still a bit of work in progress, as it turns out (especially regarding optimization). That's why I keep pushing one change after the other :P When you think you can spend some time on checking this out, I will concentrate on bringing it into a more consistent and stable state. Do you think this is okay?

UPDATE I just noticed that the above formulation is a bit suboptimal. Please don't get me wrong, I think this is absolutely usable right now (I use it every day), I just think it can be improved even further.

@philc
Owner
philc commented Jan 27, 2012

Sounds good

@philc
Owner
philc commented Feb 13, 2012

This feature and bookmarks would benefit from a fix to #411 -- our dialogs show under Flash, not on top of it, on windows.

@niklasb
niklasb commented Feb 13, 2012

@philc: Thanks for the hint, I will integrate that fix.

@ilya
Collaborator
ilya commented Apr 6, 2012

What's the status here?

@niklasb
niklasb commented Apr 6, 2012

@ilya: Using it every day, so it's kinda ready :) philc said something about coming back to this if his time allows? Haven't checked since, I guess this needs some rebasing (not a lot).

@philc
Owner
philc commented Apr 6, 2012

It sounds like the feature is usable. I haven't been using it myself, but the next steps are to review the implementation and tidy it up if necessary, and style the UI where necessary.

@philc
Owner
philc commented Apr 6, 2012

@niklasb can you rebase this?

@niklasb
niklasb commented Apr 6, 2012

@philc: Can't finish that today, but I think I can do it by the middle of next week, if that's okay.

niklasb added some commits Jan 21, 2012
@niklasb niklasb add fuzzy mode 6f589fe
@niklasb niklasb make failing test pass again :) fd549f0
@niklasb niklasb remove trimming from isUrl c808c6a
@niklasb niklasb add possibility to use custom search engines and use "wiki ", "cc ", …
…and ";" (goto) as custom commands
253dc6f
@niklasb niklasb remove useless CSS directive f4416cb
@niklasb niklasb make CSS more specific 2a9c88a
@niklasb niklasb fix some issues with asynchronous filtering and refreshing bd6f3be
@niklasb niklasb make F5 refresh the completions b6a91ff
@niklasb niklasb remove empty leading line 9e6bdef
@niklasb niklasb fix syntax error ef7084e
@niklasb niklasb allow fuzzy completion of HTML-decorated content and implement this f…
…or history and bookmarks
79b0340
@niklasb niklasb allow custom functions as commands, add example in form of forced web…
… search
69ea703
@niklasb niklasb document the HTML matching algorithm better 667f97b
@niklasb niklasb make number of completions configurable 85ae8c6
@niklasb niklasb decrease number of history items 6979cd6
@niklasb niklasb underscore is not a word character in our context 8f126b0
@niklasb niklasb improve relevancy calculation 0bcad93
@niklasb niklasb move some logic from background to content page 3520212
@niklasb niklasb fix relevancy heuristic 642694b
@niklasb niklasb add Google lucky search as command "luck " f0409c5
@niklasb niklasb add tab completion support 53f8158
@niklasb niklasb really get more history items 76b26a8
@niklasb niklasb remove debug statement 516101b
@niklasb niklasb improve string ranking fc545ea
@niklasb niklasb prefer marking continguous characters 4691478
@niklasb niklasb use background page to open URLs 2c5b916
@niklasb niklasb fix bug in query normalization 9c3ea93
@niklasb niklasb restructure fuzzy mode flow 9890e17
@niklasb niklasb add special command for tab completion 9bab5a3
@niklasb niklasb fix search URL building aa5b39f
@niklasb niklasb make CSS more robust 6be4f61
@niklasb niklasb fix ranking algorithm 8e20171
@niklasb niklasb sort results in tab completion mode de0a360
@niklasb niklasb fall back to regex matching for long queries 59b096f
@niklasb niklasb raise regex threshold to query length 15 8179e6d
@niklasb niklasb fix bad English 958dd00
@niklasb niklasb improve perfmance by caching history results in the background page.
Also decrease the number of included results slightly.
6bc1d44
@niklasb niklasb minor optimizations and code cleanup 14cb3e4
@niklasb niklasb improve comments and fix some naming style inconsistencies 2eae98f
@niklasb niklasb DRY up code a9ea93a
@niklasb niklasb add bookmark in "all" completion 82b461b
@niklasb niklasb more DRYing f69fea3
@niklasb niklasb fix small bug 4edbe74
@niklasb niklasb update results list asynchronously to take load from the CPU and impr…
…ove the perceived responsiveness
a9d89d0
@niklasb niklasb fix small bug when closing and reopening dialog in combination with a…
…sync updating
6463714
@niklasb niklasb hide completion list if no completions are available 9fd6faa
@niklasb niklasb force update after pressing Return 3f1fa91
@niklasb niklasb code cleanup 951a853
@niklasb niklasb add HTML helpers f41ebfb
@niklasb niklasb create DOM by HTML 885c633
@niklasb niklasb fix regression in 10a77d25c3 99c19d2
@niklasb niklasb move query length threshold from UI to logic
This enables to set threshold to 0 for tabs (so that tabs are shown before typing).
b226ddf
@niklasb niklasb prevent duplicate history items c397fff
@niklasb niklasb make refresh interval configurable
For example, tab completion can happen instantly while history completion shouldn't.
ff8e8aa
@niklasb niklasb make refresh <F5> work as expected a0d0d8e
@niklasb niklasb introduce a utils helper for prototype inheritance d761e42
@niklasb niklasb code cleanup + small bugfixes 386c084
@niklasb niklasb correctly handle selection after refresh ccd7998
@niklasb niklasb fix HTML tag stripping 4cfe44b
@niklasb niklasb make fuzzybox HTML more compact d1364c2
@niklasb niklasb rename "match" method to prevent confusion 66330ac
@niklasb niklasb fix small bug with selected item b23ec4b
@niklasb niklasb fix a bug in the filtering algo that causes caching not to happen 76b981c
@niklasb niklasb sort history by last visit time and raise number of history items to …
…be searched
7c2755b
@niklasb niklasb fix bug when history maxResults is too high e8b4025
@niklasb niklasb add lazy evaluation at several places 392232f
@niklasb niklasb raise number of history items searched f7f40e3
@niklasb niklasb really only update every X milliseconds 0ab64a0
@niklasb niklasb move completion logic to background page
This has the following advantages:
* searching is done in the background, UI responsiveness is improved
* caches are no longer duplicated. This saves RAM and improves performance
3449af0
@niklasb niklasb small tweaks bcc0e02
@niklasb niklasb optimization 7eb205f
@niklasb niklasb small changes f28edd6
@niklasb niklasb add domain completion a la Chrome omnibox 5ee53fc
@niklasb niklasb don't complete tabs in omni mode b56ee27
@niklasb niklasb fix strange whitespace artefacts 09aaab6
@niklasb niklasb fix domain completion bb37bcc
@niklasb
niklasb commented Apr 11, 2012

@philc: I successfully rebased this. Two problems remain that I am aware of:

  1. Issue #411 seems to apply to all Vimium dialogs, including this one, on my machine (Arch Linux x86_64, Chrome 17).
  2. Websites sometimes break the CSS of the box. I could of course incrementally add more and more CSS reset rules to overwrite certain sites' stylesheets, but that's not a final solution. Not sure how to handle this.
@ilya
Collaborator
ilya commented Apr 12, 2012

I'm not that worried about #1.

How often is #2 happening? I'm no CSS expert but can't you hard reset everything and scope it only to our stuff?

@niklasb
niklasb commented Apr 12, 2012

@ilya: It's not happening too often. As far as I understand, I'd have to "hard reset" every single property to really exclude every possiblity of interference. I'm already assigning the .vimimumReset class to the box, maybe I could do that for every single HTML element if necessary. We could also add !important to the .vimiumReset rules to make them a lot more specific.

I just remembered what problem number 3 was: Decoding the key presses seems to be a non-trivial task. I was unable to figure out a generic way to detect which characters was typed in by the user. The result of this is that the Shift key doesn't work properly for some non-character keys like !@#$.. on a US layout. This might be even more of a problem on different locales (though German special characters seem to work). I will look into this again.

EDIT: I should say that the same applies to the existing bookmark search. I already fixed the previous behaviour where you could only type lowercase letters.

@ilya
Collaborator
ilya commented Apr 12, 2012

I feel like the third problem was solved in our codebase somewhere already. I did a lot of work back in the day to make sure we supported international keyboards and such.

@niklasb
niklasb commented Apr 12, 2012

@ilya: Hm, at least it wasn't solved properly. There is lib/keyboardUtils.js:getKeyChar, but it's far from reliable from my testing. The most obvious issue was that it didn't let you type uppercase letters, which I patched (maybe in a suboptimal way). Maybe your work was at a different place? To reproduce what I mean, just press b and try to input something like aA!, it comes out as aa1 here.

@ilya
Collaborator
ilya commented Apr 13, 2012

Works for me on a US keyboard. I guess you're right -- the internationalization is messed up.

@niklasb
niklasb commented Apr 13, 2012

@ilya: Strange.. I have a German keyboard, but the whole system is US-localized. I guess it would be a better idea to use an <input type="text"> to receive the input, to make this a lot more reliable.

@philc
Owner
philc commented Apr 13, 2012

I haven't played with it yet, but if you're not using a text input to receive input, please do. There's no reason handling the platform specific shortcuts and editing via the mouse shouldn't work.

@niklasb
niklasb commented Apr 13, 2012

@philc: I'm at it :) I had just copied some of the existing code from the bookmarks search, guess that wasn't the best idea.

@ilya
Collaborator
ilya commented Apr 13, 2012

@philc you should play with it.

@ilya
Collaborator
ilya commented Apr 13, 2012

Can this omnibox subsume the bookmarks stuff? I'm really digging it.

@int3
Collaborator
int3 commented Apr 13, 2012

I'm pretty sure we have the same internationalization problems in bookmarks search and find mode... there are a bunch of open issues somewhere.

And I agree with @ilya, the omnibox should replace the bookmarks search.

@niklasb
niklasb commented Apr 13, 2012

@ilya: Yes, it currently matches domain names, history, bookmarks and tabs. EDIT: Oh, not sure whether I understood this correctly.

@niklasb
niklasb commented Apr 14, 2012

@ilya, @philc, @int3: Fixed the input problem by using an <input>. Still not sure how to go on about the CSS issue best. What do you think of changing the .vimimumReset rules to !important?

Also, has someone an idea why that Flash issue occurs? Can we somehow work around that?

@niklasb
niklasb commented Apr 25, 2012

@ilya, @philc: What's the status here?

@philc
Owner
philc commented Apr 26, 2012

@niklasb Next action is on my plate. I'll be using and code reviewing this tonight.

@ilya
Collaborator
ilya commented Apr 26, 2012

woot!

@philc philc merged commit a26a2bf into philc:master Apr 26, 2012
@philc
Owner
philc commented Apr 26, 2012

This is pretty amazing and the code looks good. I'm merging it into master so it can get some mileage and we can start improving it and fixing bugs on master instead of a branch. I've filed a couple to get us started (#515 and #516).

In the coming days I'll shuffle a few bits around so the style is more uniform with the rest of the Vimium code base, but on the whole the code is nicely structured and documented.

Excellent, excellent work @niklasb.

@philc
Owner
philc commented Apr 26, 2012

I'm calling this the "vomnibox" so we have a convenient way to refer to it. Alternate names are welcome.

@ilya
Collaborator
ilya commented Apr 26, 2012

This is really exciting. I think one major action item is to rip out the existing bookmarks browser.

@int3
Collaborator
int3 commented Apr 26, 2012

I'm glad we're finally getting a omnibar :) Thanks so much @niklasb!

Re alternate names: Can we just call it 'omnibox'? The 'vom-' syllable reminds me vaguely of an unpleasant digestive phenomenon...

@niklasb
niklasb commented Apr 26, 2012

@philc: Nice!

@int3: Obviously the term "Omnibox" is already taken by Google... But I also don't like vomnibox very much (although I don't really care :P)

@int3
Collaborator
int3 commented Apr 27, 2012

Ah, I thought Chrome's one was called the 'omnibar', but apparently it has more than one name... well yeah it's not a big deal either way :)

@sheerun
sheerun commented Feb 6, 2013

Fuzzy match does not work for me on history items..

@philc
Owner
philc commented Feb 6, 2013

It works for me, but I also saw it break on @harob's Vimium. There were no results and no exceptions from the content script or background script. I suspect there's some kind of bad history item that we're not handling.

@philc
Owner
philc commented Feb 6, 2013

Filed a new bug as #771.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.