History completion #547

The-Compiler · 2015-03-11T22:26:44Z

This is the continuation of #531 - related issue: #32.

Open issues

Sometimes the history completion doesn't update anymore when visiting new pages.
- One possible issue: When the completion is limited to the 1000 newest items, self._history.historyContains(entry.url_string) will still be True even if the entry is in the full history, but not actually in the completion.
Slicing of the last N URLs doesn't seem to work properly
- I don't have http://heise.de/ in my history completion even though it's on line 15085 of 15320 in my history file.
Currently there are duplicated entries if something is in WebHistory._old_urls and in WebHistory._new_history. I'm not sure what's the best way to fix that though.
RuntimeError because category QStandardItem got deleted in history_changed.
- details
- Reproducable on @toofar's histcomplete branch when (after applying a small fix for the segfault) opening/closing a second window, and then opening a new URL.
- Maybe the lambda connected to WebHistory.item_added doesn't get disconnected properly when the window was closed and the completion model was destroyed? I don't really get why, but it's not reproducible anymore with the new code not using a lambda.
segfault when using :open -w
- This only seems to occur in 834832e from Add url history completion for open. #531 by @toofar, but is gone in 0b975db with my changes.
- C++ stacktrace
- python report
A new Completer object is created for each CompletionView currently, and thus for every MainWindow. This means opening a new window takes more time (~1s on my machine) and RAM.
- I think every view still needs its own Completer, but the model should be global instead.
- See the profile output for running :repeat 20 open -w and :quit.
Performance still isn't that good for some users, we probably should limit the amount of old history we load, with a config option.

Other things and nice-to-have

Should there be duplicates for URLs with different fragments? Maybe the fragment should always be stripped when adding the URL to the history? Especially for javascript-URLs with a # and empty fragment...
- We shouldn't strip fragments, as they are used for single-page webpages to save state.
The third column can have a date in a human-readable format, ideally configurable with a new config option completion -> timestamp-format or so.
"which usually the last ones accessed" in qutebrowser/browser/history.py is confusing

This change is

Adds a basic completion model implementation around the global browser history and registers that for the open command. Modifies WebHistory to add an __iter__ method and to use a dict instead of a set to store an entire HistoryEntry for each archived item instead of just the URL. Brief tests showed that the lookup time for set and dict are very similar. They are at least on the same order of magnitude. Testing membership of a list on the other hand, as was the case before a set was used, was four orders of magnitude slower on my machine.

I went to some effort to avoid duplipcating code which which leads to some arguably ugly class method calling.

Each new HistoryEntry is emitted after being added to the global history store. Current members of the HistoryEntry are `url` and `atime`. `title` should be coming soon.

toofar · 2015-03-11T22:59:51Z

Currently there are duplicated entries if something is in WebHistory._old_urls and in WebHistory._new_history`. I'm not sure what's the best way to fix that though.

We could just put all of the entries into the same dict and either have a archived flag or have an additional list of URLs that still need to be written out.

Should there be duplicates for URLs with different fragments? Maybe the fragment should always be stripped when adding the URL to the history? Especially for javascript-URLs with a # and empty fragment...

I think we shouldn't ignore the fragment. Think of single page apps that use the fragment for state and wanting to go back to that particular state from your history.

The-Compiler · 2015-03-12T05:46:14Z

We could just put all of the entries into the same dict and either have a archived flag or have an additional list of URLs that still need to be written out.

Maybe I'm missing something (6:45 AM), but don't we basically have that already? WebHistory has _old_urls (the url -> HistoryEntry dict), and _new_history (a list of HistoryEntry objects).

_old_urls is only used in __iter__ and in historyContains - so it should be easy to add all entries to _old_urls (maybe rename it then), only look at that in historyContains and use _new_history only for saving/self._lineparser.

Two things here. One is to use `WebHistory._new_history` only as a to-save queue, so we now add entries to `_old_urls` when they are first created and can now no longer iterate of `_new_history` in `__iter__()`. Second is to stop blindly tacking new history entries on the end of the history completion model. It does involve iterating over the model to find the existing entry but we only do that if we know the duplicate is there, which is fast to check. This also ads another point of mutation to the history completion model which may prove problematic if it leads to more segfaults.

The-Compiler · 2015-03-12T07:15:15Z

I got the segfault figured out:

In your code, you did model.new_item(cat, entry.url, "", entry.atime), without using int() on the atime.
With self._old_urls, everything was fine because HistoryEntry.atime happened to be a string.
For self._new_history, the atime actually is a float
For history_changed that happened to work fine because there, you used str()
In fill_model however (so in the __init__ of a new model when there's something in _new_history), an int got passed to QStandardItem.
This invoked the QStandardItem(int rows, int columns = 1) constructor, which (apart of not doing what intended) seems to segfault with large ints - I reported that as QTBUG-44951.

In my code, I added the int() (mainly to get rid of the decimal places) so this went away. Above I also commited a change which always converts the atime to float for WebHistoryEntry because that makes more sense.

toofar · 2015-03-12T07:37:29Z

Oh goodie. Is that both the crashes that you experienced handled or just the one that was mysteriously fixed? Because so far I haven't managed to reproduce either. I am going to make a 64bit arch chroot to see if it is because of a difference in our setups. Currently I am on 32bit Debian testing with Qt at version 5.3.2.

The-Compiler · 2015-03-12T07:41:29Z

That's just the segfault which has been mysteriously fixed - I haven't looked at the exception yet. I've just had it happen twice yesterday after using your branch for some time, but I didn't investigate yet.

The need for those were removed in #548.

- HistoryItem.atime now always should be an int/float. - The data for the sort role should also be an int, not a string. A float would also work, but maybe be slower for no real benefit.

Otherwise we would construct a QStandardItem with the QStandardItem(int rows, int columns = 1) constructor, which will most likely not do what we want.

The-Compiler · 2015-03-12T14:40:12Z

I pushed commits to handle everything except that exception - I hope I'll have time to look at that later today.

Quick question about dbd121a: Is there some reason you constructed a new QStandardItem there instead of setting the data of the existing one?

The-Compiler · 2015-03-12T18:35:37Z

I added another issue above - currently the models are created again for every window, which means this will take a lot of time. Instead, the completion models should be stored globally.

The-Compiler · 2015-03-12T20:18:36Z

I also found out how to reproduce the exception - and even though I'm not 100% sure why, it doesn't happen with the new code anymore. So the only thing left is making the models global instead of per-window.

Before, we initialized the completions once for every window spawned, which was a waste of CPU-time and RAM. Now we only initialize them once, when the user uses the completion for the first time.

The-Compiler · 2015-03-13T15:58:32Z

This is now handled as well, completions are now only loaded once, when they're used the first time (to not slow down startup time).

However two people in #qutebrowser testing it had some performance issues:

16:34 <flavi0> well the first populated list seems to contain all of history and things feel slightly sluggish
16:34 <The-Compiler> yeah, it contains the whole history currently - it works nicer than I thought it would, fast enough for me
16:35 <flavi0> maybe you want to restrict that to the last week, or maybe N entries which will fit on the screen and only start filtering through all of history as soon as people start to type, or something like that...
16:36 <The-Compiler> I think the Qt models already do that
16:37 <flavi0> hmm but it seems that the whole list is populated upfront.. don't see why it should scroll sluggishly otherwise.. don't know how that qt model works, anyways..
16:38 <flavi0> anyways it looks great! :)
16:38 <Unison> Looks great, I'm getting some pretty tremendous lag though
16:38 <The-Compiler> yeah, the Qt model itself is populated when opening the completion the first time
16:39 <Unison> Typed 2 letters, took around 12 seconds to unfreeze the screen
16:39 <The-Compiler> Unison: :(
16:39 <Unison> I have around 17k entries in my history
16:39 <flavi0> here when typing it's ok (where can i see the history file?)
16:39 <Unison> ~/.local/share/qutebrowser/history
16:40 <The-Compiler> what does this say for you guys?
16:40 <The-Compiler> cut -d' ' -f2 ~/.local/share/qutebrowser/history | sort | uniq | wc -l
16:40 <flavi0> ok so here it's ~2360 entries
16:40 <Unison> The-Compiler: 5853
16:40 <flavi0> oh hang on that's with duplicated
16:40 <The-Compiler> hmm, 6614 here - what hardware, Unison?
16:40 <flavi0> 1401
16:41 <The-Compiler> Unison: also, if you do 'o', wait until the completion is there, and then type, is it sluggish as well? Or is it opening the completion?
16:41 <Unison> The-Compiler: It's while typing
16:41 <flavi0> for me typing is fine, it's only the scrolling which is sluggish
16:41 <Unison> It's actually not bad when first pressing 'o'
16:42 <flavi0> if i hit scroll wheel, i'd say it pauses about one second until scrolling is finished
16:43 <Unison> The-Compiler: 6 gb ram, i5 1.8 ghz
16:43 <The-Compiler> completely smooth for me :)
16:43 <The-Compiler> (here at work, i5 @ 3.3 GHz, 12 GB RAM)
16:44 <flavi0> well not really pauses, it jumps actually to one intermediate step and then a second time to the end. which overall takes close to a second i'd say
16:44 <flavi0> dual-core amd opteron @2.6

I'll profile it and see if I can optimize anything - but we probably should limit the amount of history loaded with a config option.

This is now renamed to cmd-history-max-items to avoid confusion with the web history.

Instead of calling expandAll() and iterating through all items, we can just force the top-level items to be expanded.

The-Compiler · 2015-03-14T22:03:33Z

That's also done, by default limited to 1000 items. I still want to do some testing and get some feedback on performance, and then I'll merge this!

Thanks a lot again for getting started with this so I was motivated to finish it :)

Also, sorry for the spam here, @toofar ;)

Before, the item_added signal was emitted *after* an item was added, which means the on_history_item_added slot always assumed the item already is in the history.

toofar · 2015-03-16T04:34:16Z

I guess limiting the history for completion is pragmatic, it is definitely a lot smoother. Another option may be to just complete on Tab key presses like dwb does. I was trying to look into why it is so slow but it is tough going. As far as I can tell it isn't in the python code so it is either in Qt (probably the proxy view) or in the interface between Qt and python.

I was testing with ~35,000 entries from my dwb history but with 0 added to them all for the atime. It took me a while to realize that having the sort key the same for all of the items was a pathological case for the sorting algorithm and it was causing some rather large pauses. Just in case anyone else was testing it that way.

One last thing, non-ascii strings in URLs are percent encoded and they aren't currently decoded so you can't search for a URL with an umlaut or whatever. I just added a urllib.parse.unquote_plus around the item.urls when they are passed to new_item in urlmodel.

The-Compiler · 2015-03-16T05:35:38Z

I guess limiting the history for completion is pragmatic, it is definitely a lot smoother.

Yeah, and people who want unlimited history can still set completion -> web-history-max-items to -1.

Another option may be to just complete on Tab key presses like dwb does.

I don't like that, because many people don't even know a completion exists in dwb if it's not something which shows up immediately.

I was trying to look into why it is so slow but it is tough going. As far as I can tell it isn't in the python code so it is either in Qt (probably the proxy view) or in the interface between Qt and python.

It's mainly the filtering it seems. If you run scripts/run_profile.py (with pyprof2calltree and KCacheGrind installed) you'll see what takes the most time.

I was testing with ~35,000 entries from my dwb history but with 0 added to them all for the atime. It took me a while to realize that having the sort key the same for all of the items was a pathological case for the sorting algorithm and it was causing some rather large pauses. Just in case anyone else was testing it that way.

Hmm, does the sorting take much time for you? Can you comment the sorting in completion/models/sortfilter.py:set_pattern and see if that makes a difference? For me it made almost no difference so I didn't optimize it. But in theory we could do without any sorting at all by inserting the elements in the right order and always inserting new elements on top.

One last thing, non-ascii strings in URLs are percent encoded and they aren't currently decoded so you can't search for a URL with an umlaut or whatever. I just added a urllib.parse.unquote_plus around the item.urls when they are passed to new_item in urlmodel.

I'll take a look at that and then merge.

We also use QUrl::toDisplayString for the completion so things like spaces or umlauts are decoded properly.

Before, if an URL was present early in the history and then again later, we didn't move it to the end of the OrderedDict. This means it won't be loaded in the completion.

The-Compiler · 2015-03-16T08:19:54Z

@toofar - A few open questions:

In dbd121a: Is there some reason you constructed a new QStandardItem there instead of setting the data of the existing one?
In the same function (on_history_item_added), why are you checking for if item.url:? I can't see how an item without an URL (or an empty string as URL) would ever be added.
Also, what's the if not name_item: for? Shouldn't this always be true, as we're simply iterating based on rowCount()?

Before we limited the history items we could simply call WebHistory's historyContains before iterating through all items in the history completion. Now however it's possible an item is in the real WebHistory, but not actually in the completion - so we always have to check the whole completion.

drlight-code · 2015-03-16T10:38:32Z

Another option may be to just complete on Tab key presses like dwb does.
I don't like that, because many people don't even know a completion exists in dwb if it's not something which shows up immediately.

I must say I would welcome if there was an option for this and would probably use the tab variant, since most of the time I just enter urls and don't use history I prefer to save those cycles and have the smoother experience. The default can of course be showing history while typing, but I second that proposal!

fosskers · 2015-03-16T14:57:24Z

Unrelated: how do you make the checkboxes in issues like that?

The-Compiler · 2015-03-16T15:03:38Z

@flvi0 I opened #557 for that.
@fosskers: Using - [ ] foo - see Writing on GitHub - Task lists

fosskers · 2015-03-16T15:18:05Z

Danke, I'll be using that.

toofar · 2015-03-17T03:05:10Z

In dbd121a: Is there some reason you constructed a new QStandardItem there instead of setting the data of the existing one?

Nope, no reason. I probably didn't think about it or tried to reduce mutation for some reason.

In the same function (on_history_item_added), why are you checking for if item.url:? I can't see how an item without an URL (or an empty string as URL) would ever be added.

I honestly thought I copied it from you and that you must have had it there for some reason. But I wrote it so I guess I was just being extra careful.

Also, what's the if not name_item: for? Shouldn't this always be true, as we're simply iterating based on rowCount()?

I definitely got a runtime error because of a nonetype there at some point. I am pretty sure I was trying to figure out the structure of the nested table-like models by trial and error. I guess was being defensive because I am not familiar with the codebase and the APIs it uses.

The-Compiler · 2015-03-17T05:13:52Z

Also, what's the if not name_item: for? Shouldn't this always be true, as we're simply iterating based on rowCount()?

I definitely got a runtime error because of a nonetype there at some point. I am pretty sure I was trying to figure out the structure of the nested table-like models by trial and error.

Okay - I guess I'll leave it out for now, and then we'll see if there are any exceptions reported.

I guess was being defensive because I am not familiar with the codebase and the APIs it uses.

No worries! I'm just curious :)

Soooo, time to merge this 👍

toofar and others added 6 commits March 11, 2015 21:50

Add new UrlCompletion model which includes web history and quickmarks.

59948a0

I went to some effort to avoid duplipcating code which which leads to some arguably ugly class method calling.

Web history changed signal now emits the new entry.

834832e

Each new HistoryEntry is emitted after being added to the global history store. Current members of the HistoryEntry are `url` and `atime`. `title` should be coming soon.

Use CompletionFilterModel's sort implementation.

fe4f326

Fix lint

34b24aa

Refactor how completions are organized.

0b975db

The-Compiler mentioned this pull request Mar 11, 2015

Add url history completion for open. #531

Closed

The-Compiler added enhancement component: ui Issues related to the user interface. labels Mar 11, 2015

The-Compiler and others added 3 commits March 12, 2015 08:07

Convert the atime to float in HistoryEntry.

9ee19be

Clean up incomprehensible comment.

7342681

toofar and others added 7 commits March 12, 2015 21:35

history: simplify

61e732f

history: Remove unneeded _old_hit and _old_miss.

299dbfa

The need for those were removed in #548.

Fix lint.

7a28b6c

Make it possible to configure the timestamp format.

8023b14

Improve types of history model values.

97dd867

- HistoryItem.atime now always should be an int/float. - The data for the sort role should also be an int, not a string. A float would also work, but maybe be slower for no real benefit.

Set data of existing item.

dbd121a

Make sure args are not int in new_item().

389feab

Otherwise we would construct a QStandardItem with the QStandardItem(int rows, int columns = 1) constructor, which will most likely not do what we want.

The-Compiler added 2 commits March 13, 2015 16:25

Initialize completions lazily and only once.

74892ac

Before, we initialized the completions once for every window spawned, which was a waste of CPU-time and RAM. Now we only initialize them once, when the user uses the completion for the first time.

Fix lint.

96da7d9

The-Compiler added 6 commits March 13, 2015 19:46

Rename completion -> history-length.

55eabaf

This is now renamed to cmd-history-max-items to avoid confusion with the web history.

Limit the count of history items in the completion.

833830d

Regenerate docs.

a857b9a

Measure time it takes to init URL completion.

70cd8e7

Also measure the time it takes to set the pattern.

cdbb118

completion: Don't unnecessarily expand all items.

9512a52

Instead of calling expandAll() and iterating through all items, we can just force the top-level items to be expanded.

Fix adding of URLs to history completion.

d266665

Before, the item_added signal was emitted *after* an item was added, which means the on_history_item_added slot always assumed the item already is in the history.

The-Compiler added 6 commits March 16, 2015 06:43

Add a test for utils.debug.log_time.

46c3191

Also save the QUrl in a HistoryEntry.

553d8cf

We also use QUrl::toDisplayString for the completion so things like spaces or umlauts are decoded properly.

Cleanup

693ea0c

Compare history items based on QUrl.

5030608

Fix exception in filtermodel if model has no items.

3df5e13

Move new entries to the end when loading history.

806742a

Before, if an URL was present early in the history and then again later, we didn't move it to the end of the OrderedDict. This means it won't be loaded in the completion.

The-Compiler merged commit 59bbca9 into master Mar 17, 2015

The-Compiler deleted the histcomplete branch March 17, 2015 21:18

The-Compiler restored the histcomplete branch March 18, 2015 06:59

The-Compiler deleted the histcomplete branch April 9, 2015 20:41

The-Compiler added the component: completion Issues related to the commandline completion or history. label Jun 16, 2015

qutebrowser-bot added the status: stale label Dec 20, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

History completion #547

History completion #547

The-Compiler commented Mar 11, 2015 •

edited

Loading

toofar commented Mar 11, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

toofar commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 13, 2015

The-Compiler commented Mar 14, 2015

toofar commented Mar 16, 2015

The-Compiler commented Mar 16, 2015

The-Compiler commented Mar 16, 2015

drlight-code commented Mar 16, 2015

fosskers commented Mar 16, 2015

The-Compiler commented Mar 16, 2015

fosskers commented Mar 16, 2015

toofar commented Mar 17, 2015

The-Compiler commented Mar 17, 2015

History completion #547

History completion #547

Conversation

The-Compiler commented Mar 11, 2015 • edited Loading

Open issues

Other things and nice-to-have

toofar commented Mar 11, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

toofar commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 12, 2015

The-Compiler commented Mar 13, 2015

The-Compiler commented Mar 14, 2015

toofar commented Mar 16, 2015

The-Compiler commented Mar 16, 2015

The-Compiler commented Mar 16, 2015

drlight-code commented Mar 16, 2015

fosskers commented Mar 16, 2015

The-Compiler commented Mar 16, 2015

fosskers commented Mar 16, 2015

toofar commented Mar 17, 2015

The-Compiler commented Mar 17, 2015

The-Compiler commented Mar 11, 2015 •

edited

Loading