Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

History completion #547

Merged
merged 35 commits into from
Mar 17, 2015
Merged

History completion #547

merged 35 commits into from
Mar 17, 2015

Conversation

The-Compiler
Copy link
Member

@The-Compiler The-Compiler commented Mar 11, 2015

This is the continuation of #531 - related issue: #32.

/cc @toofar @fosskers

Open issues

  • Sometimes the history completion doesn't update anymore when visiting new pages.
    • One possible issue: When the completion is limited to the 1000 newest items, self._history.historyContains(entry.url_string) will still be True even if the entry is in the full history, but not actually in the completion.
  • Slicing of the last N URLs doesn't seem to work properly
    • I don't have http://heise.de/ in my history completion even though it's on line 15085 of 15320 in my history file.
  • Currently there are duplicated entries if something is in WebHistory._old_urls and in WebHistory._new_history. I'm not sure what's the best way to fix that though.
  • RuntimeError because category QStandardItem got deleted in history_changed.
    • details
    • Reproducable on @toofar's histcomplete branch when (after applying a small fix for the segfault) opening/closing a second window, and then opening a new URL.
    • Maybe the lambda connected to WebHistory.item_added doesn't get disconnected properly when the window was closed and the completion model was destroyed? I don't really get why, but it's not reproducible anymore with the new code not using a lambda.
  • segfault when using :open -w
  • A new Completer object is created for each CompletionView currently, and thus for every MainWindow. This means opening a new window takes more time (~1s on my machine) and RAM.
    • I think every view still needs its own Completer, but the model should be global instead.
    • See the profile output for running :repeat 20 open -w and :quit.
  • Performance still isn't that good for some users, we probably should limit the amount of old history we load, with a config option.

Other things and nice-to-have

  • Should there be duplicates for URLs with different fragments? Maybe the fragment should always be stripped when adding the URL to the history? Especially for javascript-URLs with a # and empty fragment...
    • We shouldn't strip fragments, as they are used for single-page webpages to save state.
  • The third column can have a date in a human-readable format, ideally configurable with a new config option completion -> timestamp-format or so.
  • "which usually the last ones accessed" in qutebrowser/browser/history.py is confusing

This change is Reviewable

toofar and others added 6 commits March 11, 2015 21:50
Adds a basic completion model implementation around the global browser
history and registers that for the open command.

Modifies WebHistory to add an __iter__ method and to use a dict instead of a
set to store an entire HistoryEntry for each archived item instead of just the
URL. Brief tests showed that the lookup time for set and dict are very
similar. They are at least on the same order of magnitude. Testing membership
of a list on the other hand, as was the case before a set was used, was four
orders of magnitude slower on my machine.
I went to some effort to avoid duplipcating code which which leads to some
arguably ugly class method calling.
Each new HistoryEntry is emitted after being added to the global history
store. Current members of the HistoryEntry are `url` and `atime`. `title`
should be coming soon.
@The-Compiler The-Compiler added enhancement component: ui Issues related to the user interface. labels Mar 11, 2015
@toofar
Copy link
Member

toofar commented Mar 11, 2015

Currently there are duplicated entries if something is in WebHistory._old_urls and in WebHistory._new_history`. I'm not sure what's the best way to fix that though.

We could just put all of the entries into the same dict and either have a archived flag or have an additional list of URLs that still need to be written out.

Should there be duplicates for URLs with different fragments? Maybe the fragment should always be stripped when adding the URL to the history? Especially for javascript-URLs with a # and empty fragment...

I think we shouldn't ignore the fragment. Think of single page apps that use the fragment for state and wanting to go back to that particular state from your history.

@The-Compiler
Copy link
Member Author

We could just put all of the entries into the same dict and either have a archived flag or have an additional list of URLs that still need to be written out.

Maybe I'm missing something (6:45 AM), but don't we basically have that already? WebHistory has _old_urls (the url -> HistoryEntry dict), and _new_history (a list of HistoryEntry objects).

_old_urls is only used in __iter__ and in historyContains - so it should be easy to add all entries to _old_urls (maybe rename it then), only look at that in historyContains and use _new_history only for saving/self._lineparser.

The-Compiler and others added 3 commits March 12, 2015 08:07
Two things here. One is to use `WebHistory._new_history` only as a to-save
queue, so we now add entries to `_old_urls` when they are first created and
can now no longer iterate of `_new_history` in `__iter__()`.

Second is to stop blindly tacking new history entries on the end of the
history completion model. It does involve iterating over the model to find the
existing entry but we only do that if we know the duplicate is there, which is
fast to check.

This also ads another point of mutation to the history completion model which
may prove problematic if it leads to more segfaults.
@The-Compiler
Copy link
Member Author

I got the segfault figured out:

  • In your code, you did model.new_item(cat, entry.url, "", entry.atime), without using int() on the atime.
  • With self._old_urls, everything was fine because HistoryEntry.atime happened to be a string.
  • For self._new_history, the atime actually is a float
  • For history_changed that happened to work fine because there, you used str()
  • In fill_model however (so in the __init__ of a new model when there's something in _new_history), an int got passed to QStandardItem.
  • This invoked the QStandardItem(int rows, int columns = 1) constructor, which (apart of not doing what intended) seems to segfault with large ints - I reported that as QTBUG-44951.

In my code, I added the int() (mainly to get rid of the decimal places) so this went away. Above I also commited a change which always converts the atime to float for WebHistoryEntry because that makes more sense.

@toofar
Copy link
Member

toofar commented Mar 12, 2015

Oh goodie. Is that both the crashes that you experienced handled or just the one that was mysteriously fixed? Because so far I haven't managed to reproduce either. I am going to make a 64bit arch chroot to see if it is because of a difference in our setups. Currently I am on 32bit Debian testing with Qt at version 5.3.2.

@The-Compiler
Copy link
Member Author

That's just the segfault which has been mysteriously fixed - I haven't looked at the exception yet. I've just had it happen twice yesterday after using your branch for some time, but I didn't investigate yet.

toofar and others added 7 commits March 12, 2015 21:35
The need for those were removed in #548.
- HistoryItem.atime now always should be an int/float.
- The data for the sort role should also be an int, not a string.
  A float would also work, but maybe be slower for no real benefit.
Otherwise we would construct a QStandardItem with the
QStandardItem(int rows, int columns = 1) constructor, which will most likely
not do what we want.
@The-Compiler
Copy link
Member Author

I pushed commits to handle everything except that exception - I hope I'll have time to look at that later today.

Quick question about dbd121a: Is there some reason you constructed a new QStandardItem there instead of setting the data of the existing one?

@The-Compiler
Copy link
Member Author

I added another issue above - currently the models are created again for every window, which means this will take a lot of time. Instead, the completion models should be stored globally.

@The-Compiler
Copy link
Member Author

I also found out how to reproduce the exception - and even though I'm not 100% sure why, it doesn't happen with the new code anymore. So the only thing left is making the models global instead of per-window.

Before, we initialized the completions once for every window spawned, which was
a waste of CPU-time and RAM.

Now we only initialize them once, when the user uses the completion for the
first time.
@The-Compiler
Copy link
Member Author

This is now handled as well, completions are now only loaded once, when they're used the first time (to not slow down startup time).

However two people in #qutebrowser testing it had some performance issues:

16:34 <flavi0> well the first populated list seems to contain all of history and things feel slightly sluggish
16:34 <The-Compiler> yeah, it contains the whole history currently - it works nicer than I thought it would, fast enough for me
16:35 <flavi0> maybe you want to restrict that to the last week, or maybe N entries which will fit on the screen and only start filtering through all of history as soon as people start to type, or something like that...
16:36 <The-Compiler> I think the Qt models already do that
16:37 <flavi0> hmm but it seems that the whole list is populated upfront.. don't see why it should scroll sluggishly otherwise.. don't know how that qt model works, anyways..
16:38 <flavi0> anyways it looks great! :)
16:38 <Unison> Looks great, I'm getting some pretty tremendous lag though
16:38 <The-Compiler> yeah, the Qt model itself is populated when opening the completion the first time
16:39 <Unison> Typed 2 letters, took around 12 seconds to unfreeze the screen
16:39 <The-Compiler> Unison: :(
16:39 <Unison> I have around 17k entries in my history
16:39 <flavi0> here when typing it's ok (where can i see the history file?)
16:39 <Unison> ~/.local/share/qutebrowser/history
16:40 <The-Compiler> what does this say for you guys?
16:40 <The-Compiler> cut -d' ' -f2 ~/.local/share/qutebrowser/history | sort | uniq | wc -l
16:40 <flavi0> ok so here it's ~2360 entries
16:40 <Unison> The-Compiler: 5853
16:40 <flavi0> oh hang on that's with duplicated
16:40 <The-Compiler> hmm, 6614 here - what hardware, Unison?
16:40 <flavi0> 1401
16:41 <The-Compiler> Unison: also, if you do 'o', wait until the completion is there, and then type, is it sluggish as well? Or is it opening the completion?
16:41 <Unison> The-Compiler: It's while typing
16:41 <flavi0> for me typing is fine, it's only the scrolling which is sluggish
16:41 <Unison> It's actually not bad when first pressing 'o'
16:42 <flavi0> if i hit scroll wheel, i'd say it pauses about one second until scrolling is finished
16:43 <Unison> The-Compiler: 6 gb ram, i5 1.8 ghz
16:43 <The-Compiler> completely smooth for me :)
16:43 <The-Compiler> (here at work, i5 @ 3.3 GHz, 12 GB RAM)
16:44 <flavi0> well not really pauses, it jumps actually to one intermediate step and then a second time to the end. which overall takes close to a second i'd say
16:44 <flavi0> dual-core amd opteron @2.6

I'll profile it and see if I can optimize anything - but we probably should limit the amount of history loaded with a config option.

This is now renamed to cmd-history-max-items to avoid confusion with the web
history.
Instead of calling expandAll() and iterating through all items, we can just
force the top-level items to be expanded.
@The-Compiler
Copy link
Member Author

That's also done, by default limited to 1000 items. I still want to do some testing and get some feedback on performance, and then I'll merge this!

Thanks a lot again for getting started with this so I was motivated to finish it :)

Also, sorry for the spam here, @toofar ;)

Before, the item_added signal was emitted *after* an item was added, which
means the on_history_item_added slot always assumed the item already is in the
history.
@toofar
Copy link
Member

toofar commented Mar 16, 2015

I guess limiting the history for completion is pragmatic, it is definitely a lot smoother. Another option may be to just complete on Tab key presses like dwb does. I was trying to look into why it is so slow but it is tough going. As far as I can tell it isn't in the python code so it is either in Qt (probably the proxy view) or in the interface between Qt and python.

I was testing with ~35,000 entries from my dwb history but with 0 added to them all for the atime. It took me a while to realize that having the sort key the same for all of the items was a pathological case for the sorting algorithm and it was causing some rather large pauses. Just in case anyone else was testing it that way.

One last thing, non-ascii strings in URLs are percent encoded and they aren't currently decoded so you can't search for a URL with an umlaut or whatever. I just added a urllib.parse.unquote_plus around the item.urls when they are passed to new_item in urlmodel.

@The-Compiler
Copy link
Member Author

I guess limiting the history for completion is pragmatic, it is definitely a lot smoother.

Yeah, and people who want unlimited history can still set completion -> web-history-max-items to -1.

Another option may be to just complete on Tab key presses like dwb does.

I don't like that, because many people don't even know a completion exists in dwb if it's not something which shows up immediately.

I was trying to look into why it is so slow but it is tough going. As far as I can tell it isn't in the python code so it is either in Qt (probably the proxy view) or in the interface between Qt and python.

It's mainly the filtering it seems. If you run scripts/run_profile.py (with pyprof2calltree and KCacheGrind installed) you'll see what takes the most time.

I was testing with ~35,000 entries from my dwb history but with 0 added to them all for the atime. It took me a while to realize that having the sort key the same for all of the items was a pathological case for the sorting algorithm and it was causing some rather large pauses. Just in case anyone else was testing it that way.

Hmm, does the sorting take much time for you? Can you comment the sorting in completion/models/sortfilter.py:set_pattern and see if that makes a difference? For me it made almost no difference so I didn't optimize it. But in theory we could do without any sorting at all by inserting the elements in the right order and always inserting new elements on top.

One last thing, non-ascii strings in URLs are percent encoded and they aren't currently decoded so you can't search for a URL with an umlaut or whatever. I just added a urllib.parse.unquote_plus around the item.urls when they are passed to new_item in urlmodel.

I'll take a look at that and then merge.

We also use QUrl::toDisplayString for the completion so things like spaces or
umlauts are decoded properly.
Before, if an URL was present early in the history and then again later, we
didn't move it to the end of the OrderedDict. This means it won't be loaded in
the completion.
@The-Compiler
Copy link
Member Author

@toofar - A few open questions:

  • In dbd121a: Is there some reason you constructed a new QStandardItem there instead of setting the data of the existing one?
  • In the same function (on_history_item_added), why are you checking for if item.url:? I can't see how an item without an URL (or an empty string as URL) would ever be added.
  • Also, what's the if not name_item: for? Shouldn't this always be true, as we're simply iterating based on rowCount()?

Before we limited the history items we could simply call WebHistory's
historyContains before iterating through all items in the history completion.

Now however it's possible an item is in the real WebHistory, but not actually
in the completion - so we always have to check the whole completion.
@drlight-code
Copy link
Contributor

Another option may be to just complete on Tab key presses like dwb does.
I don't like that, because many people don't even know a completion exists in dwb if it's not something which shows up immediately.

I must say I would welcome if there was an option for this and would probably use the tab variant, since most of the time I just enter urls and don't use history I prefer to save those cycles and have the smoother experience. The default can of course be showing history while typing, but I second that proposal!

@fosskers
Copy link

Unrelated: how do you make the checkboxes in issues like that?

@The-Compiler
Copy link
Member Author

@flvi0 I opened #557 for that.
@fosskers: Using - [ ] foo - see Writing on GitHub - Task lists

@fosskers
Copy link

Danke, I'll be using that.

@toofar
Copy link
Member

toofar commented Mar 17, 2015

In dbd121a: Is there some reason you constructed a new QStandardItem there instead of setting the data of the existing one?

Nope, no reason. I probably didn't think about it or tried to reduce mutation for some reason.

In the same function (on_history_item_added), why are you checking for if item.url:? I can't see how an item without an URL (or an empty string as URL) would ever be added.

I honestly thought I copied it from you and that you must have had it there for some reason. But I wrote it so I guess I was just being extra careful.

Also, what's the if not name_item: for? Shouldn't this always be true, as we're simply iterating based on rowCount()?

I definitely got a runtime error because of a nonetype there at some point. I am pretty sure I was trying to figure out the structure of the nested table-like models by trial and error. I guess was being defensive because I am not familiar with the codebase and the APIs it uses.

@The-Compiler
Copy link
Member Author

Also, what's the if not name_item: for? Shouldn't this always be true, as we're simply iterating based on rowCount()?

I definitely got a runtime error because of a nonetype there at some point. I am pretty sure I was trying to figure out the structure of the nested table-like models by trial and error.

Okay - I guess I'll leave it out for now, and then we'll see if there are any exceptions reported.

I guess was being defensive because I am not familiar with the codebase and the APIs it uses.

No worries! I'm just curious :)

Soooo, time to merge this 👍

@The-Compiler The-Compiler merged commit 59bbca9 into master Mar 17, 2015
@The-Compiler The-Compiler deleted the histcomplete branch March 17, 2015 21:18
@The-Compiler The-Compiler restored the histcomplete branch March 18, 2015 06:59
@The-Compiler The-Compiler deleted the histcomplete branch April 9, 2015 20:41
@The-Compiler The-Compiler added the component: completion Issues related to the commandline completion or history. label Jun 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: completion Issues related to the commandline completion or history. component: ui Issues related to the user interface.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants