Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add url history completion for open. #531

Closed
wants to merge 3 commits into from

Conversation

toofar
Copy link
Member

@toofar toofar commented Mar 5, 2015

Add url history completion for open.

I mostly just copied this off of the quickmark completion. It probably
isn't done yet but I had some questions to raise. Also I see you have a
newhistory branch but can't see any conflicts on there yet.

I wanted to have the open command complete on history and on quickmarks
(well, I don't care about completing on quickmarks but I don't want to
break it). When commands are registered multiple completion models can
be passed in but it seems that only one of them is used. Is there any
way to chain completions or should I just create a new
OpenCommandCompletionModel or something which does both?

When the WebHistoryCompletionModel gets a history changed signal it just
pops off the most recent history to add to the completion model. I am
not sure how these signals are handled. Is it possible the history to be
appended to a second time before the signal gets handled? If so I will
either change it to walk backwards through history until it sees one it
has seen before or change the signal to a typed signal (I think these
are a thing) that passes the new history entry to the handlers.

Lastly there was a NoneType has no method casefold() exception being
thrown when using this new completion model and I don't know why that
wasn't showing up when using the quickmarks completion. To reproduce it
just remove the if not data check and try to open a new page by using
the open command and not selecting one of the completion options. I don't
know whether this is a bug I have caused or just exposed.

Lastly it would be nice to be able to complete based on page title too.
Since the QWebHistoryInterface API doesn't pass this information would
you consider adding history items at an appropriate stage of loading
from in qutebrowser? Like these guys did: https://git.reviewboard.kde.org/r/104257/diff/2/#1


This change is Reviewable

The-Compiler added a commit that referenced this pull request Mar 5, 2015
@The-Compiler
Copy link
Member

Awesome, thanks for your work! :)

I mostly just copied this off of the quickmark completion. It probably
isn't done yet but I had some questions to raise. Also I see you have a
newhistory branch but can't see any conflicts on there yet.

The newhistory branch shouldn't be an issue - it's just a
performance-improvement for the QWebHistoryInterface and there's only a few
unittests missing before I'll merge it.

I wanted to have the open command complete on history and on quickmarks
(well, I don't care about completing on quickmarks but I don't want to
break it). When commands are registered multiple completion models can
be passed in but it seems that only one of them is used.

The list of models is mapped to the arguments - take a look at
config/config.py:CompletionManager.set_command for example. That first takes
a section, then an option, then a value.

In the commit linked above I just added an exception if there are more
completions than arguments given.

Is there any way to chain completions or should I just create a new
OpenCommandCompletionModel or something which does both?

Currently you'd have to create a new UrlCompletionModel or so which does
both, and adds two categories via self.new_category - similiar to the
Current/Default and Completions category for SettingValueCompletionModel.

The API for managing completions really should be simpler, ideally taking a
list of QAbstractTableModels
and using that. See #74 for that. Unfortunately I haven't had time for that
yet, and completions aren't that important for me currently.

When the WebHistoryCompletionModel gets a history changed signal it just
pops off the most recent history to add to the completion model. I am
not sure how these signals are handled. Is it possible the history to be
appended to a second time before the signal gets handled? If so I will
either change it to walk backwards through history until it sees one it
has seen before or change the signal to a typed signal (I think these
are a thing) that passes the new history entry to the handlers.

I think the Qt documentation says the slots (methods/functions) connected to
the signals will get called immediately on emit if the thing emitting the
signal is in the main thread, which probably is true here (though QtWebKit
uses some threads behind the scenes, so I'm not 100% sure).

But either way, I think it'd be clearer to add a new argument to the signal
(and rename it) as you suggested, something like:

item_added = pyqtSignal(str)
...
    self.item_added.emit(url_string)

Lastly there was a NoneType has no method casefold() exception being
thrown when using this new completion model and I don't know why that
wasn't showing up when using the quickmarks completion. To reproduce it
just remove the if not data check and try to open a new page by using
the open command and not selecting one of the completion options. I don't
know whether this is a bug I have caused or just exposed.

I remember funny things happening when mutating existing completion models and
didn't want to look at that before #74 - that's why for all existing
completions I throw the whole model away and construct a new one, which of
course is suboptimal.

Honestly I'm surprised adding new items seems to be working so well - I think
you're in for a lot more surprises when changing items ;)

Lastly it would be nice to be able to complete based on page title too.
Since the QWebHistoryInterface API doesn't pass this information would
you consider adding history items at an appropriate stage of loading
from in qutebrowser? Like these guys did: https://git.reviewboard.kde.org/r/104257/diff/2/#1

Oh, nice! I used QWebHistoryInterface because I thought there must be some
good reason it's implemented that way, and there are probably some weird corner
cases I didn't consider - but when there's someone else doing it, it can't be
such a bad idea, so why not ;) Do you feel like working on this?

Though I think QWebHistoryInterface.historyContains should still stay there,
because it's also used by WebKit, e.g. to color visited links.

@toofar
Copy link
Member Author

toofar commented Mar 6, 2015

Thanks for the reply. Pretty much exactly what I was looking for. I will look into fixing this up and changing the history interface next week. I haven't exactly been dogfooding so consider me suitably chastised for just treating the symptom and ignoring the cause of crashes.

@toofar
Copy link
Member Author

toofar commented Mar 9, 2015

Here is what I have so far. I made a new completion model UrlCompletionModel
that includes the data from both the WebHistory and Quickmark ones. In an
effort to not have duplicate code I added fill_model methods to the models
that are called from the Url model in a way which may not fit with the
aesthetics of your codebase. I not particularly attached to it though if that
is an issue.

I also am now sorting the history and the atime of the history entries so
that the most recent ones are near the top. My use case is that I generally
use the history instead of bookmarks for frequently visited sites.

But ... It doesn't work right now because I was getting the history entries to
fill the model with from the WebHistory._history list. This has now been
effectively move to WebHistory._old_urls which is a set of URLs and doesn't
include the atime. I also want to be able to show and match on the title of the
webpages in history too and this method would not retain the title. I am
assuming you switched to a set for faster lookups but was wondering if I
could change it back to a list or dict. If a dict I would probably change the
items to either be a list of HistoryEntrys or change HistoryEntry to have
a list of atimes so that I can maybe implement a recency calculation at some
point.

@The-Compiler
Copy link
Member

Yeah, it's currently a set for performance reasons - for some reason QtWebKit called historyContains a lot when clicking/selecting text on certain pages, which caused a few seconds of lag...

I think the best way would be to add an __iter__ to WebHistory which combines the old history from its self._lineparser and the new history from self._new_history, similiarly to how qutebrowser.misc.lineparser.AppendLineParser combines the data from the underlying file and self.new_data in __iter__.

Though here I also worry a bit about the performance and memory usage. Have you tried how this performs with some thousand history entries? I have around 13000 entries after a bit more than a month, so let's say 150k or 200k entries after a year?

Maybe it should only use the most recent N items (e.g. using itertools.islice), or only stuff visited more than a certain threshold?

Regarding the code style: You maybe already know I'm sometimes a bit more obsessive about this than I should be ;) If you prefer me to fix such issues directly, please tell me! For fill_model, that should probably be a @classmethod or @staticmethod and take an instance parameter.

Adds a basic completion model implementation around the global browser
history and registers that for the open command.

Modifies WebHistory to add an __iter__ method and to use a dict instead of a
set to store an entire HistoryEntry for each archived item instead of just the
URL. Brief tests showed that the lookup time for set and dict are very
similar. They are at least on the same order of magnitude. Testing membership
of a list on the other hand, as was the case before a set was used, was four
orders of magnitude slower on my machine.
I went to some effort to avoid duplipcating code which which leads to some
arguably ugly class method calling.
Each new HistoryEntry is emitted after being added to the global history
store. Current members of the HistoryEntry are `url` and `atime`. `title`
should be coming soon.
@toofar
Copy link
Member Author

toofar commented Mar 11, 2015

I changed the _old_urls to a dict with values of HistoryItem. That way the
atime and title are in memory too. This increases memory usage compared to the
set that just contains the URLs but the lookup times are very similar between
sets and dicts. Both are four orders of magnitude faster on my machine.
I just did some basic tests with a separate script ~150,000 items and basic
contains() tests for 20 items (both positive and negative tests) 10,000 times
for each of ordered dict, dict, set and list.

I added the WebHistory.__iter__() method which iterates over the old URLs
(unordered, in memory) and then the new ones. I don't see a need to leave the
old URLs on disk and access them as needed unless memory usage becomes an issue.
Although with all the archived URLs in a unordered if you want to iterate over
history in order you have to put everything into a list and sort it. You could use
an ordered dict instead but with them duplicates stay in the position of the first
insertion anyway.

I made the completion model fill_model methods into static methods because
there wouldn't be any use of the class parameter if they were class methods.

Anyway, if this is alright I am going to look into calling the addHistoryEntry hook
manually so we can save the title too, I guess that should go in a separate branch.
If there is just a small change or two that is stopping you merging and you would
rather fix that yourself than waiting a day or so for me to get back from you
should feel free to have you way with it.

@fosskers
Copy link

+1 for this.

The-Compiler added a commit that referenced this pull request Mar 11, 2015
This makes it possible to use Qt's QSortFilterProxyModel::lessThan option for
completions where it doesn't make sense to priorize matches starting with the
entered string, e.g. for URLs. In return, we get a *much* better performance
(several seconds when opening the completion).

See #531.
The-Compiler added a commit that referenced this pull request Mar 11, 2015
The-Compiler added a commit that referenced this pull request Mar 11, 2015
@The-Compiler
Copy link
Member

Whoa, this works a lot better than I thought it would, and it feels awesome to use! 👍

I also noticed some other subtle completion bugs while testing this, fixed above. Also I implemented a "dumb" sort in the CompletionFilterModel which works a lot faster (like your sort method does), which means it's not necessary to implement sort separately here. Also, it uses the first column with Role.sort as data role, so the third column can be used for a human-readable representation of the date/time.

I still want to do some other changes (e.g. the thing mentioned above) so I won't merge this yet, but I hope I'll have time for it this evening.

If you want to experiment, rebase to the current master and apply this patch on top of your branch:

commit 6674f74b07175fe17248b7890f7937dd7735f957
Author: Florian Bruhin <git@the-compiler.org>
Date:   Wed Mar 11 07:12:18 2015 +0100

    Use CompletionFilterModel's sort implementation.

diff --git a/qutebrowser/completion/completer.py b/qutebrowser/completion/completer.py
index 2f7b223..e57b3a8 100644
--- a/qutebrowser/completion/completer.py
+++ b/qutebrowser/completion/completer.py
@@ -19,7 +19,7 @@

 """Completer attached to a CompletionView."""

-from PyQt5.QtCore import pyqtSlot, QObject, QTimer
+from PyQt5.QtCore import pyqtSlot, QObject, QTimer, Qt

 from qutebrowser.config import config, configdata
 from qutebrowser.commands import cmdutils, runners
@@ -86,7 +86,8 @@ class Completer(QObject):
         self._models[usertypes.Completion.helptopic] = CFM(
             models.HelpCompletionModel(self), self)
         self._models[usertypes.Completion.url_history_and_quickmarks] = CFM(
-            models.UrlCompletionModel('url', self), self)
+            models.UrlCompletionModel('url', self), self,
+            dumb_sort=Qt.DescendingOrder)

     def _init_setting_completions(self):
         """Initialize setting completion models."""
diff --git a/qutebrowser/completion/models/completion.py b/qutebrowser/completion/models/completion.py
index cdfce48..c562f59 100644
--- a/qutebrowser/completion/models/completion.py
+++ b/qutebrowser/completion/models/completion.py
@@ -234,12 +234,6 @@ class UrlCompletionModel(base.BaseCompletionModel):
                                         WebHistoryCompletionModel.history_changed(
                                             self, e, self._histcat))

-    def sort(self, column, order=Qt.AscendingOrder):
-        # sort on atime, descending
-        # Ignoring the arguments because they are hardcoded in the CFM
-        # anyway.
-        self._histcat.sortChildren(2, Qt.DescendingOrder)
-

 class WebHistoryCompletionModel(base.BaseCompletionModel):

@@ -266,11 +260,14 @@ class WebHistoryCompletionModel(base.BaseCompletionModel):
             cat = model.new_category("History")

         for entry in histstore:
-            model.new_item(cat, entry.url, "", entry.atime)
+            atime = int(entry.atime)
+            model.new_item(cat, entry.url, "", str(atime), sort=atime)

     def history_changed(self, entry, cat):
         if entry.url:
-            self.new_item(cat, entry.url, "", str(entry.atime))
+            atime = int(entry.atime)
+            self.new_item(cat, entry.url, "", str(atime), sort=atime)
+

 class QuickmarkCompletionModel(base.BaseCompletionModel):

@The-Compiler The-Compiler self-assigned this Mar 11, 2015
@The-Compiler The-Compiler added enhancement component: ui Issues related to the user interface. labels Mar 11, 2015
@The-Compiler The-Compiler added this to the v0.2 milestone Mar 11, 2015
@The-Compiler
Copy link
Member

Some notes to myself (or you, just tell me if you start working on it!) what to check/change:

  • Is the quickmark_by_name model still needed?
  • models/completion.py probably should be split into multiple files, and the @staticmethods can be normal functions then.
  • Currently there are duplicated entries if something is in WebHistory._old_urls and in WebHistory._new_history. I'm not sure what's the best way to fix that though.
  • Should there be duplicates for URLs with different fragments? Maybe the fragment should always be stripped when adding the URL to the history? Especially for javascript-URLs with a # and empty fragment...
  • The third column can have a date in a human-readable format, ideally configurable with a new config option completion -> timestamp-format or so.

I'll also some line notes - feel free to fix them or not (then I'll do this evening hopefully). In either case, don't take it personally because I'm a perfectionist ;)

self._old_urls.add(url)
atime, url = line.rstrip().split(maxsplit=1)
# This de-duplicates history entries. We keep later ones in the
# file which usually the last ones accessed. If you want
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"which usually the last ones accessed" doesn't make sense, but I also don't understand if it was "which usually are the last ones accessed" - do you mean "in the dict"?

@The-Compiler
Copy link
Member

I just got an exception:

Traceback (most recent call last):
  File "/home/florian/proj/qutebrowser/git/qutebrowser/completion/models/completion.py", line 235, in <lambda>
    self, e, self._histcat))
  File "/home/florian/proj/qutebrowser/git/qutebrowser/completion/models/completion.py", line 269, in history_changed
    self.new_item(cat, entry.url, "", str(atime), sort=atime)
  File "/home/florian/proj/qutebrowser/git/qutebrowser/completion/models/base.py", line 82, in new_item
    idx = cat.rowCount()
RuntimeError: wrapped C/C++ object of type QStandardItem has been deleted

Log:

20:20:05 DEBUG    destroy    mainwindow:closeEvent:370 Closing window 2
20:20:05 DEBUG    destroy    webview:shutdown:309 Shutting down <qutebrowser.browser.webview.WebView url='http://pando.com/2015/03/01/internet-privacy-fund…' tab_id=5>.
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: 5
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: webview
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: hintmanager
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: 2
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: main-window
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: message-bridge
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: download-manager
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: tabbed-browser
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: statusbar
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: prompt
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: status-command
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: completion
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: completer
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: search-runner
20:20:05 DEBUG    misc       objreg:on_destroyed:118 schedule removal: mode-manager
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: 5
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: webview
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: hintmanager
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: 2
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: main-window
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: message-bridge
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: download-manager
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: tabbed-browser
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: statusbar
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: prompt
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: status-command
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: completion
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: completer
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: search-runner
20:20:05 DEBUG    misc       objreg:_on_destroyed:123 removed: mode-manager
20:20:05 DEBUG    destroy    lineparser:_prepare_save:70 Saving to /home/florian/.local/share/qutebrowser/history
20:20:05 DEBUG    destroy    lineparser:_prepare_save:70 Saving to /home/florian/.local/share/qutebrowser/cookies
20:20:30 DEBUG    ipc        ipc:handle_connection:101 Client connected.
20:20:30 DEBUG    ipc        ipc:on_ready_read:138 Read from socket: b'{"args": [], "cwd": "/home/florian"}\n'
20:20:30 DEBUG    ipc        ipc:on_ready_read:146 Processing: {"args": [], "cwd": "/home/florian"}

20:20:30 DEBUG    init       mainwindow:__init__:88 Initial mainwindow geometry: PyQt5.QtCore.QRect(50, 50, 800, 600)
20:20:30 DEBUG    init       mainwindow:__init__:93 Initializing downloads...
20:20:30 DEBUG    init       networkmanager:__init__:92 Initializing NetworkManager
20:20:30 DEBUG    init       networkmanager:__init__:97 NetworkManager init done
20:20:30 DEBUG    statusbar  bar:_hide_cmd_widget:308 Hiding cmd widget, queue: deque([])
20:20:30 DEBUG    statusbar  bar:_set_prompt_active:245 Setting prompt_active to False
20:20:30 DEBUG    statusbar  bar:_hide_prompt_widget:336 Hiding prompt widget, queue: deque([])
20:20:31 DEBUG    init       mainwindow:__init__:117 Initializing search...
20:20:31 DEBUG    init       mainwindow:__init__:122 Initializing modes...
20:20:31 DEBUG    init       app:_open_startpage:387 Opening startpage
20:20:31 DEBUG    url        urlutils:fuzzy_url:172 URL is a fuzzy address
20:20:31 DEBUG    url        urlutils:fuzzy_url:181 Converting fuzzy term https://www.duckduckgo.com to URL -> https://www.duckduckgo.com
20:20:31 DEBUG    webview    tabbedbrowser:tabopen:346 Creating new tab with URL PyQt5.QtCore.QUrl('https://www.duckduckgo.com')
20:20:31 DEBUG    init       networkmanager:__init__:92 Initializing NetworkManager
20:20:31 DEBUG    init       networkmanager:__init__:97 NetworkManager init done
20:20:31 DEBUG    webview    tabbedbrowser:_get_new_tab_idx:399 new-tab-position right -> opening new tab at -1, next left: 0 / right: 0
20:20:31 DEBUG    modes      tabbedbrowser:on_current_changed:547 Current tab changed, focusing <qutebrowser.browser.webview.WebView url='' tab_id=6>
20:20:31 DEBUG    modes      modeman:maybe_leave:93 Not in mode KeyMode.hint! (leave reason: tab changed)
20:20:31 DEBUG    modes      modeman:maybe_leave:93 Not in mode KeyMode.insert! (leave reason: tab changed)
20:20:31 DEBUG    webview    webview:openurl:323 New title: https://www.duckduckgo.com
20:20:31 DEBUG    webview    tabbedbrowser:on_title_changed:489 Changing title for idx 0 to 'https://www.duckduckgo.com'
20:20:31 DEBUG    signals    signalfilter:_filter_signals:91 emitting: cur_url_text_changed('https://www.duckduckgo.com') (tab 0)
20:20:31 DEBUG    webview    webpage:acceptNavigationRequest:547 acceptNavigationRequest, url https://www.duckduckgo.com/, type NavigationTypeOther, hint target None, open_target ClickTarget.normal
20:20:31 DEBUG    webview    webview:_set_load_status:170 load status for <qutebrowser.browser.webview.WebView url='' tab_id=6>: LoadStatus.loading
20:20:31 DEBUG    signals    signalfilter:_filter_signals:91 emitting: cur_load_status_changed('loading') (tab 0)
20:20:31 DEBUG    signals    signalfilter:_filter_signals:91 emitting: cur_load_started() (tab 0)
20:20:31 DEBUG    modes      modeman:maybe_leave:93 Not in mode KeyMode.insert! (leave reason: load started)
20:20:31 DEBUG    modes      modeman:maybe_leave:93 Not in mode KeyMode.hint! (leave reason: load started)
20:20:31 DEBUG    ipc        ipc:on_error:84 Socket error 1: QLocalSocket: Remote closed
20:20:31 DEBUG    ipc        ipc:on_disconnected:121 Client disconnected.
20:20:31 DEBUG    ipc        ipc:handle_connection:99 No new connection to handle.
20:20:31 DEBUG    webview    webpage:acceptNavigationRequest:547 acceptNavigationRequest, url https://duckduckgo.com/, type NavigationTypeOther, hint target None, open_target ClickTarget.normal
20:20:31 DEBUG    save       savemanager:mark_dirty:72 Marking history as dirty.
20:20:31 ERROR    misc       app:_exception_hook:550 Uncaught exception

@The-Compiler The-Compiler mentioned this pull request Mar 11, 2015
10 tasks
@The-Compiler
Copy link
Member

I improved/changed some things based on this PR and pushed that to the histcomplete branch. I opened a new pull request at #547 for the outstanding changes - to keep things organized let's continue discussing this there. This way, further PRs can also be made against the histcomplete branch.

Unfortunately this still needs some changes before being merged because of the exceptions and segfaults I'm seeing.

@The-Compiler The-Compiler removed this from the v0.2 milestone Mar 11, 2015
@The-Compiler The-Compiler added the component: completion Issues related to the commandline completion or history. label Jun 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: completion Issues related to the commandline completion or history. component: ui Issues related to the user interface.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants