Completion for search terms #6049

Open
The-Compiler opened this issue Jan 20, 2021 · 7 comments
Labels
component: completion Issues related to the commandline completion or history. priority: 3 - wishlist Issues which are not important and/or where it's unclear whether they're feasible.

Comments

@The-Compiler
Member

Splitting this off from #32 as there's quite some discussion in there about this - cc @rcorre, @samyak-jain, @andrewcarlotti.

@samyak-jain replied to my comment:

I won't have the time to dig deeper into this at the moment, but off the top of my head:

  • If we have a search engine URL in the config, how are we going to get autocompletion for that? Probably via OpenSearch (OpenSearch support #717), but I'm not sure if there even is a standardized way?
  • If there are APIs from search engines, are we allowed to use them? Do we need API keys or something?
  • How to do this in a privacy preserving way? We want to avoid sending things to third-parties whenever possible.
  • How to do this in a performant way? Currently there's no way for completions to be updated "on the fly" (i.e. the entire completion needs to be generated before it is shown), and doing blocking HTTP requests (or even delaying showing the completion before we get search engine results) probably isn't acceptable.
  • Probably more I'm not thinking of at the moment.

Right now, I'm treating this as a "wishlist" item - if someone can reasonably convince me that it's actually worth the complexity (or not as complex as I think after all) I might accept a contribution, but it's quite debatable whether it's worth doing it all - and either way, there still are a lot of unknowns.

Thanks for your comments. Here are some of my thoughts.

  • I think that's a good idea. I've looked at a few search engines, and OpenSearch discovery seems to work well for figuring out the autocomplete URL. We could hard-code some of the popular search engines and let users add their own, which we can autodiscover using OpenSearch. This would also simplify adding search engines, because we could get the search URL programmatically rather than asking users for a template URL, which is how things are currently done.
  • It seems to work without any API keys for now. Here is a sample for Google: https://google.com/complete/search?client=firefox&q=test and a sample for DuckDuckGo: https://duckduckgo.com/ac/?q=test&type=list. The format seems fairly consistent across search engines. As for whether we are allowed to use them: so far, there don't seem to be any issues. Of course, they can shut these endpoints down at any time, but I don't see that happening as long as open source browsers like Chromium have these features (since they can always be reverse engineered by someone else). I agree that this part is out of our control and the best we can hope for is that search engines keep them open. This blog post by Google: https://developers.google.com/search/blog/2015/07/update-on-autocomplete-api seems to suggest that they are at least aware that people are doing this. They did say they would close this feature in 2015, but it has survived thus far. Even if Google closes it, I suspect most other search engines wouldn't follow suit.
  • There's no avoiding 3rd party requests of course. I think the best course of action is to make this opt-in and maybe make it clear in the documentation that enabling this option would require us to send requests to 3rd party services.
  • This would certainly be a blocker, because suggestions change with each character the user types. We need the ability to send requests asynchronously and update the UI in a non-blocking way. I would love to get your input on this. Can we run this on QThreads and update the UI? If the implementation is too complicated, are there any other issues/features that could benefit from it, just to make this more "worth it"?

I personally don't mind giving this a shot if you feel this is feasible, though I totally understand if you don't have the bandwidth to review features that are not a priority. The way I see it, it certainly requires some effort, but I think it may be worth doing since it requires making improvements to other components as well.
Interested to know if you think this is worth pursuing or if there are any additional concerns.
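
For illustration, here is a minimal Python sketch of reading a reply from endpoints like the two sample URLs above. It assumes the OpenSearch suggestions shape, i.e. a JSON array whose first element is the query and whose second element is the list of completions. The function name and the sample payload are made up for this example and were not captured from a real server.

```python
import json


def parse_suggestions(payload: str) -> list:
    """Extract completions from an OpenSearch-style suggestions reply.

    Assumed shape: [query, [completion, ...], ...]; extra elements are ignored.
    Anything that does not match that shape yields an empty list.
    """
    data = json.loads(payload)
    if not isinstance(data, list) or len(data) < 2 or not isinstance(data[1], list):
        return []
    return [item for item in data[1] if isinstance(item, str)]


# Illustrative payload only.
sample = '["test", ["test", "testing", "test speed"]]'
print(parse_suggestions(sample))
```

Returning an empty list for unexpected shapes keeps a changed or closed endpoint from breaking the completion, which matters given that these endpoints are outside our control.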

and @andrewcarlotti said:

  • How to do this in a performant way? Currently there's no way for completions to be updated "on the fly" (i.e. the entire completion needs to be generated before it is shown), and doing blocking HTTP requests (or even delaying showing the completion before we get search engine results) probably isn't acceptable.

I think it would be worth resolving this anyway, since this is currently also an issue with browser history completion. If I try opening a page that triggers searching my entire browser history from disk while I'm watching a video, then the video graphics stop updating and the audio stops a few seconds later (playback resumes once the history search is complete).

@The-Compiler The-Compiler added component: completion Issues related to the commandline completion or history. priority: 3 - wishlist Issues which are not important and/or where it's unclear whether they're feasible. labels Jan 20, 2021
@The-Compiler
Member Author

@samyak-jain I don't really have a clear plan on how to go about this. Maybe @rcorre can say more, but my gut feeling would be that it'd go something like this:

  • First, I think we need to have classes for completions. As long as completion functions are simple functions, I don't think we have a nice way to add a good API (like updates for existing values) to them. See Use classes rather than functions for completion models #5537 for that. I also think that'd be a quite good first contribution, if that's something you'd want to look at.
  • At that point, we can start talking about what an API for updating existing completions could look like. Actually, :open is kind of a special case because it's a collection of completion "categories" (each a Qt list model). Thinking about it some more, it might even be possible already to have such a "dynamic" model as a category (which are "full" Qt models rather than functions) - but I have no idea how exactly. Either way, I still think it'd be good to think about a good qutebrowser API around it first.
  • Then, we'll need some support for OpenSearch first I'm guessing (OpenSearch support #717)? You say "I've looked at a few search engines and using OpenSearch discovery seems to work well in figuring out the auto complete url." yet the URLs you listed seem to be unofficial APIs? Did you get them via OpenSearch? Or how exactly? If we decide to implement OpenSearch support, note that we might not have a "proper" (i.e. secure) XML parser available yet, and I'm a bit reluctant about adding another dependency just for that. Perhaps it'd need to be optional, or perhaps Qt has something.
  • Finally, everything needs to fit together somehow. As for doing the request, I don't think QThreads are needed. Something asynchronous using QNetworkAccessManager (or qutebrowser.misc.httpclient) would likely be easier and simpler.
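
To make the "update existing completions" idea concrete, here is a rough pure-Python sketch. It is not qutebrowser's API: the class `SuggestionCategory` and its methods are hypothetical, and in a real implementation the replies would presumably arrive through the signals of an asynchronous request rather than direct calls. It only shows one piece of the puzzle: since suggestions change with every typed character, a reply for outdated text has to be dropped instead of overwriting newer results.

```python
from typing import Callable, List


class SuggestionCategory:
    """Hypothetical completion category whose items can change after creation."""

    def __init__(self, on_change: Callable[[List[str]], None]) -> None:
        self._on_change = on_change  # e.g. a hook that refreshes the view
        self._latest_request = 0
        self.items: List[str] = []

    def start_request(self) -> int:
        """Note that a new request was sent and return its ticket number."""
        self._latest_request += 1
        return self._latest_request

    def finish_request(self, ticket: int, results: List[str]) -> bool:
        """Apply results unless a newer request was started in the meantime."""
        if ticket != self._latest_request:
            return False  # stale reply for text the user has already changed
        self.items = list(results)
        self._on_change(self.items)
        return True


shown = []
category = SuggestionCategory(on_change=shown.append)
first = category.start_request()   # user typed "te"
second = category.start_request()  # user typed "tes" before the first reply
category.finish_request(second, ["test", "tesla"])
category.finish_request(first, ["tea", "ted"])  # arrives late and is dropped
print(category.items)
```

Nothing here blocks: the category is shown with whatever items it has, and the view is only told to refresh when a current reply comes in.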

Let me note again this is quite at the bottom of my priorities at the moment, so any review/help will likely take a lot of time - and I still reserve the right to reject PRs if I feel like this isn't worth the incurred complexity (or other costs). Not trying to discourage you if you want to move forward with this, just making clear what to expect. 🙂


@andrewcarlotti That seems mostly unrelated. The sqlite completion already does lazy loading - the problem is that there doesn't seem to be a good way to query it asynchronously, and it's very unclear (both from Qt's and sqlite's side) what's allowed to be in a separate (non-main) thread and what isn't. There might be a variety of other solutions in that direction as well (see e.g. #1099), and finally there's #3989 of course. 😉

@samyak-jain

@The-Compiler Thanks for looking into this!

Regarding #5537, I would gladly work on that issue!
Regarding OpenSearch, the Url tag of type application/x-suggestions+json contains the URL for completions.

Here is what DuckDuckGo's OpenSearch XML looks like:

<?xml version="1.0" encoding="utf-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>DuckDuckGo</ShortName>
<Description>Search DuckDuckGo</Description>
<InputEncoding>UTF-8</InputEncoding>
<LongName>DuckDuckGo Search</LongName>
<Image height="16" width="16"></Image>
<Url type="text/html" method="get" template="https://duckduckgo.com/?q={searchTerms}"/>
<Url type="application/x-suggestions+json" template="https://duckduckgo.com/ac/?q={searchTerms}&amp;type=list"/>
</OpenSearchDescription>

Other search engines are similar.
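
As a rough sketch of how the completion URL could be pulled out of such a description, the Python snippet below looks for the Url element of type application/x-suggestions+json and fills in {searchTerms}. It uses the stdlib xml.etree.ElementTree purely for illustration; whether that parser is acceptable for XML fetched from the network is a separate question. The function names are made up for this example, and the embedded description is a shortened copy of the one above.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import quote_plus

OPENSEARCH_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}
SUGGESTION_TYPE = "application/x-suggestions+json"

# Shortened copy of the DuckDuckGo description shown above.
DESCRIPTION = b"""<?xml version="1.0" encoding="utf-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>DuckDuckGo</ShortName>
<Url type="text/html" method="get" template="https://duckduckgo.com/?q={searchTerms}"/>
<Url type="application/x-suggestions+json" template="https://duckduckgo.com/ac/?q={searchTerms}&amp;type=list"/>
</OpenSearchDescription>"""


def suggestion_template(description: bytes) -> Optional[str]:
    """Return the template of the first suggestions Url element, if any."""
    root = ET.fromstring(description)
    for url in root.findall("os:Url", OPENSEARCH_NS):
        if url.get("type") == SUGGESTION_TYPE:
            return url.get("template")
    return None


def fill_template(template: str, terms: str) -> str:
    """Substitute the URL-encoded search terms into the template."""
    return template.replace("{searchTerms}", quote_plus(terms))


template = suggestion_template(DESCRIPTION)
print(template)
print(fill_template(template, "new york"))
```

This only handles the {searchTerms} placeholder; the OpenSearch spec defines further template parameters that a real implementation would need to deal with.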

Regarding requests, qutebrowser.misc.httpclient will not block the main thread, is that right?

I understand if this is not a priority, you can review this issue at your own pace. Thanks!

@The-Compiler
Member Author

Regarding #5537, I would gladly work on that issue!

Great! Could you please leave a comment over there? Then I can assign the issue to you (GitHub only shows commenters for assignees). Let's perhaps also wait for an answer from @rcorre since he mentioned he wants to pick it up, so I'm not sure if he already started work on it at some point.

Regarding OpenSearch, the Url tag of type application/x-suggestions+json contains the URL for completions.

Ah, perfect! One thing I don't understand yet: How to find the opensearch file from a website? I thought there would be something like a <link rel="search", ...> but I can't find it? Surely I'm missing something. 🙂

Regarding requests, qutebrowser.misc.httpclient will not block the main thread, is that right?

Exactly! It does the request in Qt's main loop asynchronously and then triggers the success (or error) signal.

@samyak-jain

Regarding how to get the opensearch file from the website, there doesn't seem to be a perfect solution to this unless I'm missing something.

According to the OpenSearch spec, the website is supposed to have the following inside the <head>, something like:

<link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Name of the search">

The problem is that not all search engines seem to comply with this. Notably, Google doesn't; DuckDuckGo, however, does.

It looks like the approach most browsers are taking here is:

  1. Hard code the URLs for some of the most common search engines like google. For example, Google's opensearch xml can be found here: https://www.google.com/searchdomaincheck?format=opensearch
  2. Search for opensearchdescription in the <head> tag of the search engine home page
  3. Allow people to manually enter the XML URL or the value for the completions URL itself.

I think implementing these 3 approaches should be enough.
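
Step 2 could look roughly like the following Python sketch, which scans a page for the link element described above and resolves its href against the page URL. It is only an illustration using the stdlib html.parser; the class and function names are made up, and for simplicity it accepts matching link elements anywhere in the document rather than only inside <head>.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class SearchLinkFinder(HTMLParser):
    """Collect absolute URLs of OpenSearch descriptions linked from a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.description_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "search" in rel and attributes.get("type") == OPENSEARCH_TYPE and href:
            self.description_urls.append(urljoin(self.base_url, href))


def find_description_urls(html: str, base_url: str) -> list:
    finder = SearchLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.description_urls


page = (
    '<html><head><link rel="search" href="/opensearch.xml" '
    'type="application/opensearchdescription+xml" title="Name of the search">'
    "</head><body></body></html>"
)
print(find_description_urls(page, "https://example.com/"))
```

For engines that do not advertise the link, this returns an empty list, which is where the hard-coded URLs from step 1 or the manual entry from step 3 would take over.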

Regarding xml parsing, your concerns are fair. The official python docs seem to recommend https://pypi.org/project/defusedxml/.

Not sure why this is not figured out in the stdlib, smh.

@rcorre

rcorre commented Jan 20, 2021

Qt Core appears to have an XML parser: https://doc.qt.io/qt-5/qxmlstreamreader.html. I couldn't easily find if it suffers from the same vulnerabilities as the stdlib. I'm also not sure how scared we should be of such vulnerabilities. Based on https://docs.python.org/3/library/xml.html#xml-vulnerabilities, the stdlib is safe against DTD/external entity expansion, which I think are the scary ones on a local machine. Billion laughs/exponential blowup are scary for a server, but probably just annoying for a local machine. If we're fetching XML from a limited set of trusted sources, the only risk comes from an attacker either taking over the trusted domain or redirecting you from it (e.g. via DNS poisoning), and an attacker with that capability can do far worse things through your browser than burn some CPU :)

@The-Compiler
Member Author

Agreed! But let's take this one over to #717 to keep things organized a bit 🙂

@The-Compiler
Member Author

Another idea from IRC: Simply saving and suggesting prior search terms after specifying a search engine in :open (e.g. after :open -t gmaps new york, the completion after :open -t gmaps would suggest that). Adding that one to #32 since it's unrelated to the whole OpenSearch / suggestion API topic.
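
A rough sketch of that idea in Python, with a made-up class name and an in-memory store (a real implementation would presumably persist the terms somewhere and hook into the existing completion models):

```python
from collections import defaultdict


class SearchTermHistory:
    """Hypothetical per-engine store of previously used search terms."""

    def __init__(self) -> None:
        self._terms = defaultdict(list)

    def record(self, engine: str, terms: str) -> None:
        """Remember terms for an engine, moving repeats to the front."""
        known = self._terms[engine]
        if terms in known:
            known.remove(terms)
        known.insert(0, terms)

    def suggest(self, engine: str, prefix: str = "") -> list:
        """Return remembered terms for the engine, most recent first."""
        return [t for t in self._terms.get(engine, []) if t.startswith(prefix)]


history = SearchTermHistory()
history.record("gmaps", "new york")   # e.g. after :open -t gmaps new york
history.record("gmaps", "newark")
history.record("ddg", "qutebrowser")
print(history.suggest("gmaps", "new"))
```

Because this only uses terms the user already typed, it sends nothing to third parties.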

As for the completions, over in #6066, @leo848 suggested looking at SearX completion implementations, which might be a good starting point for the most common engines.
