Use page title for filename with :download #2753
Per #2652
Also cc @Kingdread, because I still think he knows the download code better than I do.
qutebrowser/browser/commands.py
@@ -1444,7 +1444,8 @@ def download(self, url=None, dest_old=None, *, mhtml_=False, dest=None):
     else:
         qnam = tab.networkaccessmanager()
         download_manager.get(self._current_url(), user_agent=user_agent,
-                             qnam=qnam, target=target)
+                             qnam=qnam, target=target,
+                             title=self._current_title())
I think passing a title here makes `get_request` needlessly complicated for only this single use case.
Instead, you should be able to construct a `downloads.FileDownloadTarget` here (similar to what's done above when `--dest` is given), and pass that to `.get()` instead. Then there shouldn't be any changes needed in `qtnetworkdownloads.py` at all.
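A rough sketch of the shape of this suggestion, using hypothetical stand-ins for `FileDownloadTarget` and the download manager (not qutebrowser's real classes, whose constructors and `get()` signature may differ): the filename is decided in the command and handed over as a target object, so the manager never needs a `title` parameter.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FileDownloadTarget:
    """Hypothetical stand-in for downloads.FileDownloadTarget (filename only)."""

    filename: str


class DownloadManager:
    """Hypothetical stand-in: get() just records the URL and target it received."""

    def __init__(self) -> None:
        self.requests: List[Tuple[str, Optional[FileDownloadTarget]]] = []

    def get(self, url: str, *, target: Optional[FileDownloadTarget] = None,
            **kwargs) -> Optional[FileDownloadTarget]:
        # No title-specific parameter: the manager only ever sees a target.
        self.requests.append((url, target))
        return target


def download_current_page(manager: DownloadManager, url: str,
                          suggested_fn: Optional[str]) -> Optional[FileDownloadTarget]:
    # The command layer wraps the title-derived name in a target object,
    # so get()/get_request() stay unchanged.
    target = FileDownloadTarget(suggested_fn) if suggested_fn else None
    return manager.get(url, target=target)
```

The design point is locality: the title logic stays in the command instead of being threaded through `get()` and `get_request()` for a single caller.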
@@ -429,7 +429,8 @@ def get_request(self, request, *, target=None, **kwargs):
                         QNetworkRequest.AlwaysNetwork)

         if request.url().scheme().lower() != 'data':
-            suggested_fn = urlutils.filename_from_url(request.url())
+            suggested_fn = (utils.sanitize_filename(title) + ".html" if title
Oh, good job at finding `utils.sanitize_filename`! I completely forgot about it.
I think this method is a bit too simplistic, as it always takes the page title, even for non-HTML pages:
(the old method produced a
I think the question is: how do we determine whether we got a sensible filename out of the URL, or when should we use the page title?
Hmm, damn. This is indeed something I didn't consider, and a good point... I have two ideas:
I'd probably add "if there's no extension", as many modern pages don't even use extensions anymore (for example, GitHub:
Alternatively, we could put this behind a flag (
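A minimal sketch of the whitelist idea, assuming the extension list from the later revision of this PR (`.html`, `.htm`, `.php`, plus the empty string for extension-less paths). `sanitize_filename` here is a simplified stand-in rather than qutebrowser's `utils.sanitize_filename`, and `pick_title_filename` is a hypothetical name:

```python
import os.path
import re
from typing import Optional

# Assumed whitelist: extensions that usually mean "this is an HTML page".
# The empty string covers extension-less paths such as GitHub's /pull/2753.
EXT_WHITELIST = {".html", ".htm", ".php", ""}


def sanitize_filename(name: str) -> str:
    """Simplified stand-in for utils.sanitize_filename: replace unsafe chars."""
    return re.sub(r'[\\/:*?"<>|]', "_", name)


def pick_title_filename(url_path: str, title: str) -> Optional[str]:
    """Return a title-based filename, or None to keep the URL-based one."""
    _, ext = os.path.splitext(url_path)
    if ext.lower() in EXT_WHITELIST and title:
        # As in the revision under review, the URL's own extension is appended.
        return sanitize_filename(title) + ext
    return None
```

A URL such as `/img/qutebrowser.png` falls outside the whitelist, so the caller keeps the filename derived from the URL instead of turning an image into `<title>.html`.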
Any update, @iordanisg?
Whoops, didn't mean to close that one, sorry.
@The-Compiler: your whitelist idea sounds good to me; I hope I'll be able to work something out soon (busy week so far). @Kingdread: thanks for the feedback!
About the failing download test: a page title is created on the fly (even if the page doesn't have one) only when running the tests.
This is getting a bit more complex than I thought originally (sorry!)
I suggest moving it into a `def suggested_fn_from_title(url, title):` (or so) function in `qutebrowser/browser/downloads.py`.
Could you please also add a unit test in `tests/unit/browser/webkit/test_downloads.py` for all those special cases? This can probably benefit from pytest's parametrization; see e.g. `test_format_seconds` in `tests/unit/utils/test_utils.py` for a simple example.
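The special cases map naturally onto a table of `(url_path, title, expected)` rows. The sketch below keeps the table in plain Python so it runs without pytest; with pytest, the same `CASES` list would be passed to `@pytest.mark.parametrize('url_path, title, expected', CASES)`. The function body and the regex-based `sanitize_filename` stand-in are assumptions modelled on the revisions shown in this thread, not the merged code:

```python
import os.path
import re

EXT_WHITELIST = {".html", ".htm", ".php", ""}


def sanitize_filename(name):
    # Simplified stand-in for qutebrowser's utils.sanitize_filename.
    return re.sub(r'[\\/:*?"<>|]', "_", name)


def suggested_fn_from_title(url_path, title=None):
    _, ext = os.path.splitext(url_path)
    if ext.lower() in EXT_WHITELIST and title:
        suggested_fn = sanitize_filename(title)
        if not suggested_fn.endswith(ext):
            suggested_fn += ext
        return suggested_fn
    return None


# (url_path, title, expected) rows, one per special case.
CASES = [
    ("/INSTALL.html", "Installing qutebrowser | qutebrowser",
     "Installing qutebrowser _ qutebrowser.html"),
    # Title already ends in the extension: it must not be doubled.
    ("/INSTALL.html", "Installing qutebrowser.html",
     "Installing qutebrowser.html"),
    # Extension not whitelisted: keep the URL-based filename.
    ("/qutebrowser.png", "qutebrowser.png", None),
    # No title at all.
    ("/page-with-no-title.html", "", None),
]
```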
qutebrowser/browser/commands.py
ext_whitelist = [".html", ".htm", ".php", ""]
_, ext = os.path.splitext(self._current_url().path())
if ext.lower() in ext_whitelist and tab.title():
    suggested_fn = utils.sanitize_filename(tab.title()) + ext
When QtWebEngine auto-generates a title, this ends up as `no title.html.html`. I'd suggest checking for `if not tab.title().endswith(ext):` before adding it.
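The suggested guard can be sketched as a small helper; `add_ext_once` is a hypothetical name, and `"no title.html"` stands in for an auto-generated title that already carries the extension:

```python
import os.path


def add_ext_once(title: str, url_path: str) -> str:
    """Append the URL's extension to `title` unless the title already ends with it."""
    _, ext = os.path.splitext(url_path)
    # str.endswith("") is always True, so extension-less URLs append nothing.
    if not title.endswith(ext):
        return title + ext
    return title
```

Note that this comparison is case-sensitive, so a title ending in `.HTML` on a `.html` URL would still get a second extension.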
        And I wait until the download is finished
        Then the downloaded file qutebrowser.png should exist

    Scenario: Using :download with no URL and no page title
It looks like QtWebEngine auto-generates a filename while QtWebKit doesn't, which also explains why you don't see it locally. After adding some unit tests (see my review comment), I'd suggest just removing this end2end test entirely (but keep the other two).
qutebrowser/browser/downloads.py
@@ -182,6 +182,26 @@ def transform_path(path):
     return path


 def suggested_fn_from_title(url, title=None):
`url` should be called `url_path` to make it clear that it isn't a full URL.
qutebrowser/browser/downloads.py
def suggested_fn_from_title(url, title=None):
    """Suggest a filename depending on the URL extension and page title.
nitpick: Add an empty line here
qutebrowser/browser/downloads.py
        url: a string with the URL path
        title: the page title string

    Returns None if the extension is not in the whitelist
Nitpick: put this under a `Return:` section, similar to `Args:`.
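A sketch of the docstring laid out with `Args:` and `Return:` sections as requested. The body follows the revision under review with the `url_path` rename applied; sanitizing is left out to keep the example short, so this is not the merged implementation:

```python
import os.path


def suggested_fn_from_title(url_path, title=None):
    """Suggest a filename depending on the URL extension and page title.

    Args:
        url_path: a string with the URL path
        title: the page title string

    Return:
        The page title as a filename, or None if the extension is not in
        the whitelist or no title is given.
    """
    ext_whitelist = [".html", ".htm", ".php", ""]
    _, ext = os.path.splitext(url_path)
    if ext.lower() in ext_whitelist and title:
        suggested_fn = title  # the real code runs utils.sanitize_filename here
        if not suggested_fn.endswith(ext):
            suggested_fn += ext
        return suggested_fn
    return None
```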
     'Installing qutebrowser _ qutebrowser.html'),
    ('http://qutebrowser.org/INSTALL.html.html',
     'Installing qutebrowser | qutebrowser',
     'Installing qutebrowser _ qutebrowser.html'),
I don't understand this test. I think that to reproduce the double-extension behavior, this should have an `INSTALL.html` filename (not `.html.html`) and a title that ends with `.html`, no?
qutebrowser/browser/downloads.py
_, ext = os.path.splitext(url)
if ext.lower() in ext_whitelist and title:
    suggested_fn = utils.sanitize_filename(title)
    if not suggested_fn.endswith(ext):
Maybe `if not suggested_fn.lower().endswith(ext.lower())`?
    ('http://qutebrowser.org/page-with-no-title.html',
     '',
     None),
])
Please also add a test with an upper-case extension (to make sure the title is used), and with an upper-case `.HTML` in the title (see comment above).
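A sketch of the case-insensitive comparison together with the two requested upper-case cases. The function mirrors the revision under review but omits sanitizing; the expected values assume the URL's extension is appended exactly as it appears in the URL:

```python
import os.path
from typing import Optional

EXT_WHITELIST = {".html", ".htm", ".php", ""}


def suggested_fn_from_title(url_path: str, title: Optional[str] = None) -> Optional[str]:
    _, ext = os.path.splitext(url_path)
    if ext.lower() in EXT_WHITELIST and title:
        suggested_fn = title  # sanitizing omitted to keep the sketch short
        # Compare case-insensitively so "... .HTML" on a ".html" URL is not
        # turned into "... .HTML.html".
        if not suggested_fn.lower().endswith(ext.lower()):
            suggested_fn += ext
        return suggested_fn
    return None


UPPERCASE_CASES = [
    # Upper-case extension in the URL: the title is still used.
    ("/INSTALL.HTML", "Installing qutebrowser", "Installing qutebrowser.HTML"),
    # Upper-case .HTML already in the title: nothing is appended.
    ("/INSTALL.html", "Installing qutebrowser.HTML", "Installing qutebrowser.HTML"),
]
```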
Looks good to me now. @Kingdread, what do you think?
qutebrowser/browser/downloads.py
_, ext = os.path.splitext(url_path)
if ext.lower() in ext_whitelist and title:
    suggested_fn = utils.sanitize_filename(title)
    if not suggested_fn.lower().endswith(ext.lower()):
From the test cases and the code, I assume that `.html` is not appended if it's not in the original URL? Is that intended? I'd assume a title like "qutebrowser home page" on "http://qutebrowser.org" would result in a file `qutebrowser home page.html`.
Also, won't this add `.php` even for HTML files? For example, you'd get an `index.php` locally, even though it's just the generated HTML content?
Hmm, good points. Maybe it'd be best to check whether it ends in either `.htm` or `.html`, and if not, just append `.html`?
Very good points, indeed. What about when there is an uppercase `.HTML` or `.HTM` in the title? Would it make sense to keep it that way in the filename or convert it to lower case?
I think it'd be fine to keep it, but I don't have a strong opinion either way. I'd go for "whatever is easier to implement" for this one.
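The rule converged on in this thread (append `.html` unless the title already ends in `.htm` or `.html` in any case, and keep the title's own casing) could look roughly like this. The whitelist and the regex-based `sanitize_filename` are assumptions for the sketch, not the merged code:

```python
import os.path
import re
from typing import Optional

EXT_WHITELIST = {".html", ".htm", ".php", ""}


def sanitize_filename(name: str) -> str:
    # Simplified stand-in for qutebrowser's utils.sanitize_filename.
    return re.sub(r'[\\/:*?"<>|]', "_", name)


def suggested_fn_from_title(url_path: str, title: Optional[str] = None) -> Optional[str]:
    _, ext = os.path.splitext(url_path)
    if ext.lower() not in EXT_WHITELIST or not title:
        return None
    suggested_fn = sanitize_filename(title)
    # The saved file is rendered HTML, so force an HTML extension instead of
    # copying .php or an empty extension from the URL. An existing .htm/.html
    # suffix is kept as-is, whatever its case.
    if not suggested_fn.lower().endswith((".htm", ".html")):
        suggested_fn += ".html"
    return suggested_fn
```

With this rule, "qutebrowser home page" on `http://qutebrowser.org` (empty URL path) becomes `qutebrowser home page.html`, and a `.php` URL no longer yields a `.php` file for what is really saved HTML.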
@Kingdread, anything else? Otherwise, this looks ready to merge to me.
No, no more comments.
And merged! Thanks @Kingdread for the reviews, and @iordanisg for the contribution and your patience!
Thank you both!