Use page title for filename with :download #2753
Per #2652
also cc @Kingdread because I still think he knows the download code better than I do
@@ -1444,7 +1444,8 @@ def download(self, url=None, dest_old=None, *, mhtml_=False, dest=None):
             else:
                 qnam = tab.networkaccessmanager()
                 download_manager.get(self._current_url(), user_agent=user_agent,
-                                     qnam=qnam, target=target)
+                                     qnam=qnam, target=target,
+                                     title=self._current_title())
The-Compiler
Jun 26, 2017
Member
I think passing a title here makes `get_request` needlessly complicated for only this single use case.
Instead, you should be able to construct a `downloads.FileDownloadTarget` here (similar to what's done above when `--dest` is given), and pass that to `.get()` instead. Then there shouldn't be any changes in `qtnetworkdownloads.py` needed at all.
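A rough sketch of the shape of that suggestion, for readers following along. `FileDownloadTarget` and `get` below are simplified, hypothetical stand-ins for `downloads.FileDownloadTarget` and `download_manager.get()`, and `download_current_page` is an invented name; none of this is qutebrowser's actual code. The point is only that the caller decides the filename and hands over a target, so the download layer needs no `title` argument.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileDownloadTarget:
    """Hypothetical stand-in for downloads.FileDownloadTarget."""
    filename: str


def get(url: str, target: Optional[FileDownloadTarget] = None) -> str:
    """Hypothetical stand-in for download_manager.get().

    Returns the filename that would be used, so the sketch can be checked.
    """
    if target is not None:
        return target.filename
    # No explicit target: fall back to the last URL path segment.
    return url.rstrip("/").rsplit("/", 1)[-1]


def download_current_page(url: str, title: str) -> str:
    """Build the target at the call site instead of passing a title down."""
    target = FileDownloadTarget(title + ".html") if title else None
    return get(url, target=target)


print(download_current_page("http://example.org/INSTALL.html",
                            "Installing qutebrowser"))
print(download_current_page("http://example.org/INSTALL.html", ""))
```

With a title the first call yields a title-based name; without one the stand-in falls back to the URL's last path segment.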
@@ -429,7 +429,8 @@ def get_request(self, request, *, target=None, **kwargs):
                                 QNetworkRequest.AlwaysNetwork)

         if request.url().scheme().lower() != 'data':
-            suggested_fn = urlutils.filename_from_url(request.url())
+            suggested_fn = (utils.sanitize_filename(title) + ".html" if title
The-Compiler
Jun 26, 2017
Member
Oh, good job at finding `utils.sanitize_filename`! I completely forgot about it 😆
I think this method is a bit too simplistic, as it always takes the page title, even for non-HTML pages:
(the old method produced a [...])
I think the question is: How do we determine if we got a sensible filename out of the URL, or when should we use the page title?
Hmm, damn. This is indeed something I didn't consider, and a good point... I have two ideas:
[...]
I'd probably add "if there's no extension", as many modern pages don't even use extensions anymore (for example, GitHub: [...]). Alternatively, we could put this behind a flag ([...]
Any update, @iordanisg?
Whoops, didn't mean to close that one, sorry.
@The-Compiler: your whitelist idea sounds good to me, I hope I'll be able to work something out soon (busy week so far). @Kingdread: thanks for the feedback!
About the failing download test: a page title is created on the fly (even if the page doesn't have one) only when running the tests.
This is getting a bit more complex than I thought originally (sorry!) I suggest moving it into a [...]
ext_whitelist = [".html", ".htm", ".php", ""]
_, ext = os.path.splitext(self._current_url().path())
if ext.lower() in ext_whitelist and tab.title():
    suggested_fn = utils.sanitize_filename(tab.title()) + ext
The-Compiler
Jul 6, 2017
Member
When QtWebEngine auto-generates a title, this ends up as `no title.html.html`. I'd suggest checking for `if not tab.title().endswith(ext):` before adding it.
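A minimal sketch of that guard, using only the standard library. The helper name `append_ext_once` is hypothetical and not part of the PR, and the title `no title.html` is inferred from the resulting filename quoted above.

```python
import os


def append_ext_once(title, ext):
    """Append ext to title unless the title already ends with it."""
    if not title.endswith(ext):
        return title + ext
    return title


# Extension taken from the URL path, as in the diff above.
_, ext = os.path.splitext("/INSTALL.html")

print("no title.html" + ext)                           # no title.html.html
print(append_ext_once("no title.html", ext))           # no title.html
print(append_ext_once("Installing qutebrowser", ext))  # Installing qutebrowser.html
```

Note that this comparison is case-sensitive, so a title ending in `.HTML` would still get a second extension.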
        And I wait until the download is finished
        Then the downloaded file qutebrowser.png should exist

    Scenario: Using :download with no URL and no page title
The-Compiler
Jul 6, 2017
Member
It looks like QtWebEngine autogenerates a filename while QtWebKit doesn't, which also explains that you don't see it locally. After adding some unittests (see my review comment), I'd suggest just removing this end2end test entirely (but keep the other two).
@@ -182,6 +182,26 @@ def transform_path(path):
     return path


+def suggested_fn_from_title(url, title=None):
The-Compiler
Jul 6, 2017
Member
`url` should be called `url_path` to make it clear that it isn't a full URL.
@@ -182,6 +182,26 @@ def transform_path(path):
     return path


+def suggested_fn_from_title(url, title=None):
+    """Suggest a filename depending on the URL extension and page title.
The-Compiler
Jul 6, 2017
Member
nitpick: Add an empty line here
        url: a string with the URL path
        title: the page title string
    Returns None if the extension is not in the whitelist
The-Compiler
Jul 6, 2017
Member
nitpick: Put this under a `Return:` section, similar to `Args:`.
     'Installing qutebrowser _ qutebrowser.html'),
    ('http://qutebrowser.org/INSTALL.html.html',
     'Installing qutebrowser | qutebrowser',
     'Installing qutebrowser _ qutebrowser.html'),
The-Compiler
Jul 6, 2017
Member
I don't understand this test - I think to resemble the behavior with the double extension, this should have an `INSTALL.html` filename (not `.html.html`) and a title which ends with `.html`, no?
_, ext = os.path.splitext(url)
if ext.lower() in ext_whitelist and title:
    suggested_fn = utils.sanitize_filename(title)
    if not suggested_fn.endswith(ext):
The-Compiler
Jul 6, 2017
Member
Maybe `if not suggested_fn.lower().endswith(ext.lower())`?
    ('http://qutebrowser.org/page-with-no-title.html',
     '',
     None),
])
The-Compiler
Jul 6, 2017
Member
Please also add a test with an upper-case extension (to make sure the title is used), and with an upper-case `.HTML` in the title (see comment above).
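The requested cases could look roughly like this. The function body is a reconstruction from the diff fragments in this thread (the `else`/`return` lines are assumed from the docstring), and `sanitize_filename` is a simplified stand-in for `utils.sanitize_filename`, not qutebrowser's implementation. Plain assertions are used here instead of the project's pytest parametrization.

```python
import os


def sanitize_filename(name, replacement="_"):
    """Simplified stand-in: replace characters that are unsafe in filenames."""
    for char in '\\/:*?"<>|':
        name = name.replace(char, replacement)
    return name


def suggested_fn_from_title(url_path, title=None):
    """Reconstruction of the helper as it stood at this point in the review."""
    ext_whitelist = [".html", ".htm", ".php", ""]
    _, ext = os.path.splitext(url_path)
    if ext.lower() in ext_whitelist and title:
        suggested_fn = sanitize_filename(title)
        # Case-insensitive check, so "Notes.HTML" does not gain a second ".html".
        if not suggested_fn.lower().endswith(ext.lower()):
            suggested_fn += ext
    else:
        suggested_fn = None
    return suggested_fn


cases = [
    # Upper-case extension in the URL path: the title is still used.
    ("/INSTALL.HTML", "Installing qutebrowser", "Installing qutebrowser.HTML"),
    # Upper-case .HTML in the title: no double extension.
    ("/INSTALL.html", "Notes.HTML", "Notes.HTML"),
    # Extension not in the whitelist, or empty title: no suggestion.
    ("/logo.png", "Logo", None),
    ("/page-with-no-title.html", "", None),
]

for url_path, title, expected in cases:
    assert suggested_fn_from_title(url_path, title) == expected
```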
Looks good to me now - @Kingdread what do you think?
_, ext = os.path.splitext(url_path)
if ext.lower() in ext_whitelist and title:
    suggested_fn = utils.sanitize_filename(title)
    if not suggested_fn.lower().endswith(ext.lower()):
Kingdread
Jul 8, 2017
Contributor
From the test cases and the code I assume that the `.html` is not appended if it's not in the original URL? Is that intended? I'd assume a title like "qutebrowser home page" on "http://qutebrowser.org" would result in a file `qutebrowser home page.html`.
Kingdread
Jul 8, 2017
Contributor
Also, won't this add `.php` even for HTML files? Like you get an `index.php` locally, even if it's just the produced HTML content?
The-Compiler
Jul 8, 2017
Member
Hmm, good points - maybe it'd be best to check if it ends in either `.htm` or `.html` and if not, just append `.html`?
iordanisg
Jul 8, 2017
Author
Contributor
Very good points, indeed. What about when there is an uppercase `.HTML` or `.HTM` in the title? Would it make sense to keep it that way in the filename or convert it to lower case?
The-Compiler
Jul 8, 2017
Member
I think it'd be fine to keep it, but I don't have a strong opinion either way. I'd go for "whatever is easier to implement" for this one.
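Putting the last few comments together, the behavior under discussion could be sketched as below: keep the whitelist check on the URL path, but always end up with an HTML extension, leaving an existing `.htm`/`.html` (in any case) untouched. This is an illustration of the proposal, not necessarily the code that was merged, and `sanitize_filename` is again a simplified stand-in for `utils.sanitize_filename`.

```python
import os


def sanitize_filename(name, replacement="_"):
    """Simplified stand-in: replace characters that are unsafe in filenames."""
    for char in '\\/:*?"<>|':
        name = name.replace(char, replacement)
    return name


def suggested_fn_from_title(url_path, title=None):
    """Suggest a title-based filename, or None to fall back to the URL."""
    ext_whitelist = [".html", ".htm", ".php", ""]
    _, ext = os.path.splitext(url_path)
    if ext.lower() in ext_whitelist and title:
        suggested_fn = sanitize_filename(title)
        # str.endswith accepts a tuple of suffixes; the title's own case is kept.
        if not suggested_fn.lower().endswith((".html", ".htm")):
            suggested_fn += ".html"
    else:
        suggested_fn = None
    return suggested_fn


print(suggested_fn_from_title("", "qutebrowser home page"))
print(suggested_fn_from_title("/index.php", "Home"))
print(suggested_fn_from_title("/INSTALL.html", "Notes.HTML"))
```

Under these assumptions, an extension-less URL and an `index.php` URL both produce a `.html` file, and an upper-case `.HTML` in the title is preserved rather than normalized, which is the "whatever is easier to implement" option.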
@Kingdread anything else? Otherwise, this looks ready to merge to me.
No, no more comments
And merged! Thanks @Kingdread for the reviews, and @iordanisg for the contribution and your patience!
Thank you both!