Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor user agent handling and introduce site-specific quirks #5157

Merged
merged 8 commits into from
Dec 23, 2019

Conversation

The-Compiler
Copy link
Member

This PR refactors user-agent handling so we don't directly use the default UA from QtWebKit/QtWebEngine anymore (#513). To do so, the user_agent setting now is a template string, getting filled from information we parse from the default UA.

With that change, we can also fix issues like #2800 properly, as we now can get a proper UA without having a QWebEngineProfile/QWebPage handy.

I also added some code so specific ScopedValue settings can be hidden from qute://configdiff, which fixes #4076. Finally, this allows us to introduce "site specific quirks", i.e. custom user agents for badly behaving websites such as Google, WhatsApp or Slack (#4810 and others).

Since this turned out to be a quite big change, I'd love some feedback. Looking at the commits, for most commits one of you made some changes in those areas, so I'd appreciate if you'd at least review those if time allows - thanks!

We now use a format string for the user_agent setting and parse both backend's
default user agents to get the needed information.

Fixes #513
No point in unpacking them into their components just to pack them again.
WhatsApp Web:
See #4445

Google Accounts:
See #5147

Slack:
See #4669 (With older QtWebEngine versions, Slack refuses to work, despite
it (seemingly) working fine.)

Dell Support

Google Docs:
See #4805 and https://bugreports.qt.io/browse/QTBUG-69652
Note that the default UA with Chrome/... shows a "browser not supported"
warning.

Fixes #4810
@jgkamat
Copy link
Member

jgkamat commented Dec 20, 2019 via email

@The-Compiler
Copy link
Member Author

Seems fine to me. The reason for unpacking was because id was tracked per Values, but if we track it globally that's not needed.

Ah, thanks for the explanation!

The UA parsing changes seems like a hack that will break at any change in the format, but I don't personally care (I use a constant UA and I'm not going to maintain that).

We've already had that hack (for getting the Chromium version) and other projects (e.g. Otter) had something similar for a long time - thanks to user agents being a legacy compatibility disaster, it's very unlikely to change.

I also want to take a look at some crash reports where the Chromium version parsing didn't work - from a quick look, most are with qutebrowser in a weird state, but I'm not sure what'll happen with the current code.

Is there an escape-hatch to truly use the un-altered user-agent from the backend in case the format changes?

Nope. I don't think one is needed, but I'll think about it some more.

I would personally resolve the quirk UAs before putting them in settings on startup, and just use constants for overrides. I haven't profiled the regression here yet, but reading the UA from config is one of the slowest parts of per-request work in qutebrowser, and I'm not looking forward to even more overhead here.

That's why this PR adds caching there 😉 So this only matters for the first request to whatever website, and it prevents paying a cost for quirk UAs you're never going to use.

@jgkamat
Copy link
Member

jgkamat commented Dec 20, 2019 via email

Copy link
Member

@toofar toofar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good. I'm sure you died a little more inside while writing this though. The web is such a wonderful place.

user_agents = {
'https://web.whatsapp.com/': no_qtwe_ua,
'https://accounts.google.com/ServiceLogin': firefox_ua,
'https://*.slack.com/*': new_chrome_ua,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slack complains on Qt 5.11 (it lets you dismiss the message now though, from memory) and doesn't complain on 5.13. I've only tested briefly on 5.12 and it doesn't appear to complain there either. So if you can bother checking for 5.11 before adding the slack one I think that would be appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this - apparently Slack is quite aggressive in blocking old Chromium versions despite them working fine (at least according to what @jgkamat said in #4669). That's why I ended up just sending Chrome/99 no matter what Qt version, so we don't need to update this a lot in the future. Maybe it'd be more reasonable to send a "real" Chrome version and only on affected Qt versions? But then we somehow need to notice when it breaks again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose tomorrow they could decide to start blocking whatever chome versions 5.12 is based on as well.

@@ -333,6 +381,8 @@ def init(args):
global_settings = WebEngineSettings(_SettingsWrapper())
global_settings.init_settings()

_init_site_specific_quirks()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put this in _update_settings() too in case people have quirks off by default but then run into a broken site and don't want to restart. Although I suppose turning off still wouldn't work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a bit confusing when turning on a setting without restart works, but turning it off requires a restart. I'm also guessing people won't disable those much - the only reason I see to disable them is for testing the quirks.

desc: 'Enable quirks (such as faked user agent headers) needed to get
specific sites to work properly.'

# emacs: '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fixes emacs' buggy syntax highlighting which sometimes highlights the rest of the file as a string... There's another instance or two elsewhere in the file as well.

default: null
default: 'Mozilla/5.0 ({os_info})
AppleWebKit/{webkit_version} (KHTML, like Gecko)
{qt_key}/{qt_version} {upstream_browser_key}/{upstream_browser_version}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any downside to sticking qutebrowser/{qutebrowser_version} in here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will probably break many sites that have strict parsing of the UA

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe (I will test!) but if so I would think the same was true of the QtWebEngine bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal here was to keep the QtWebEngine user agent 1:1 the same to what it was before, and keeping the QtWebKit user agent somewhat close to stock as well (it sends something like qutebrowser/1.8.9 if an application sets that version, but Qt/5.14.0 for applications which don't set anything).

if not config.val.content.site_specific_quirks:
return

# default_ua = ("Mozilla/5.0 ({os_info}) "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment need to stay here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep it, as it makes it easier to add new user agent strings below by copy-pasting this one and adjusting it in some way.

@The-Compiler
Copy link
Member Author

Note to self: music.amazon.com is broken as well, see this Reddit post. From a quick test, I only can make it show different error messages but I can't actually make it work...

@The-Compiler The-Compiler merged commit 9ab1ea0 into master Dec 23, 2019
@The-Compiler
Copy link
Member Author

I also want to take a look at some crash reports where the Chromium version parsing didn't work - from a quick look, most are with qutebrowser in a weird state, but I'm not sure what'll happen with the current code.

Turns out it can - I fixed that in dcf8d83 and merged everything. Thanks @jgkamat and @toofar for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

qute://configdiff shows built-in content.javascript.enabled settings
3 participants