New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use page title for filename with :download #2753

Merged
merged 8 commits into from Jul 9, 2017

Conversation

Projects
None yet
4 participants
@iordanisg
Contributor

iordanisg commented Jun 26, 2017

This change is Reviewable

@iordanisg

This comment has been minimized.

Contributor

iordanisg commented Jun 26, 2017

Per #2652

@The-Compiler

also cc @Kingdread because I still think he knows the download code better than I do 😉

@@ -1444,7 +1444,8 @@ def download(self, url=None, dest_old=None, *, mhtml_=False, dest=None):
else:
qnam = tab.networkaccessmanager()
download_manager.get(self._current_url(), user_agent=user_agent,
qnam=qnam, target=target)
qnam=qnam, target=target,
title=self._current_title())

This comment has been minimized.

@The-Compiler

The-Compiler Jun 26, 2017

Collaborator

I think passing a title here makes get_request needlessly complicated for only this single use case.

Instead, you should be able to construct a downloads.FileDownloadTarget here (similar to what's done above when --dest is given), and pass that to .get() instead. Then there shouldn't be any changes in qtnetworkdownloads.py needed at all.

@@ -429,7 +429,8 @@ def get_request(self, request, *, target=None, **kwargs):
QNetworkRequest.AlwaysNetwork)
if request.url().scheme().lower() != 'data':
suggested_fn = urlutils.filename_from_url(request.url())
suggested_fn = (utils.sanitize_filename(title) + ".html" if title

This comment has been minimized.

@The-Compiler

The-Compiler Jun 26, 2017

Collaborator

Oh, good job at finding utils.sanitize_filename! I compltely forgot about it 😆

iordanisg added some commits Jun 26, 2017

@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented Jun 28, 2017

I think this method is a bit too simplistic, as it always takes the page title, even for non-HTML pages:

[daniel@persephone]  ~ % ls /tmp/qutebrowser-basedir-4mql1yi0/download/
'Vanamo_Logo.png (1200x1200 pixels).html'
[daniel@persephone]  ~ % file /tmp/qutebrowser-basedir-4mql1yi0/download/Vanamo_Logo.png\ \(1200x1200\ pixels\).html
/tmp/qutebrowser-basedir-4mql1yi0/download/Vanamo_Logo.png (1200x1200 pixels).html: PNG image data, 1200 x 1200, 4-bit colormap, non-interlaced

(the old method produced a Vanamo_Logo.png file, which is fine)

I think the question is: How do we determine if we got a sensible filename out of the URL, or when should we use the page title?

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jun 28, 2017

Hmm, damn. This is indeed something I didn't consider, and a good point...

I have two ideas:

  • Use the title when there's no path in the URL, and have a whitelist of file extensions where it makes sense to do so too (as a first approximation, .html, .htm, .php, but probably more)
  • Don't do this at all. 😟
@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented Jun 29, 2017

I'd probably add "if there's no extension", as many modern pages don't even use extensions anymore (for example, GitHub: https://github.com/qutebrowser/qutebrowser/pull/2753).

Alternatively, we could put this behind a flag (:download --title?), but then you would need to remember to use the flag.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jul 5, 2017

Any update, @iordanisg?

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jul 5, 2017

Whoops, didn't mean to close that one, sorry.

@The-Compiler The-Compiler reopened this Jul 5, 2017

@iordanisg

This comment has been minimized.

Contributor

iordanisg commented Jul 5, 2017

@The-Compiler: your whitelist idea sounds good to me, I hope I'll be able to work something out soon (busy week so far)

@Kingdread: thanks for the feedback!

@iordanisg

This comment has been minimized.

Contributor

iordanisg commented Jul 6, 2017

About the failing download test: a page title is created on the fly (even if the page doesn't have one) only when running the tests.

@The-Compiler

This is getting a bit more complex than I thought originally (sorry!)

I suggest moving it into a def suggested_fn_from_title(url, title): (or so) function in qutebrowser/browser/downloads.py.
Could you please also add an unittest in tests/unit/browser/webkit/test_downloads.py for all those special cases? This can probably benefit from pytest's parametrization, see e.g. test_format_seconds in tests/unit/utils/test_utils.py for a simple example.

ext_whitelist = [".html", ".htm", ".php", ""]
_, ext = os.path.splitext(self._current_url().path())
if ext.lower() in ext_whitelist and tab.title():
suggested_fn = utils.sanitize_filename(tab.title()) + ext

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017

Collaborator

When QtWebEngine auto-generates a title, this ends up as no title.html.html. I'd suggest checking for if not tab.title().endswith(ext): before adding it.

And I wait until the download is finished
Then the downloaded file qutebrowser.png should exist
Scenario: Using :download with no URL and no page title

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017

Collaborator

It looks like QtWebEngine autogenerates a filename while QtWebKit doesn't, which also explains that you don't see it locally. After adding some unittests (see my review comment), I'd suggest just removing this end2end test entirely (but keep the other two).

@@ -182,6 +182,26 @@ def transform_path(path):
return path
def suggested_fn_from_title(url, title=None):

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017

Collaborator

url should be called url_path to make it clear that it isn't a full URL.

@@ -182,6 +182,26 @@ def transform_path(path):
return path
def suggested_fn_from_title(url, title=None):
"""Suggest a filename depending on the URL extension and page title.

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017

Collaborator

nitpick: Add an empty line here

url: a string with the URL path
title: the page title string
Returns None if the extension is not in the whitelist

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017

Collaborator

nitpick: Put this under a Return: similar like with Args:

'Installing qutebrowser _ qutebrowser.html'),
('http://qutebrowser.org/INSTALL.html.html',
'Installing qutebrowser | qutebrowser',
'Installing qutebrowser _ qutebrowser.html'),

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017

Collaborator

I don't understand this test - I think to resemble the behavior with the double extension, this should have a INSTALL.html filename (not .html.html) and a title which ends with .html, no?

_, ext = os.path.splitext(url)
if ext.lower() in ext_whitelist and title:
suggested_fn = utils.sanitize_filename(title)
if not suggested_fn.endswith(ext):

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017

Collaborator

Maybe if not suggested_fn.lower().endswith(ext.lower())?

('http://qutebrowser.org/page-with-no-title.html',
'',
None),
])

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017

Collaborator

Please also add a test with an upper-case extension (to make sure the title is used), and with an upper-case .HTML in the title (see comment above).

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jul 6, 2017

Looks good to me now - @Kingdread what do you think?

_, ext = os.path.splitext(url_path)
if ext.lower() in ext_whitelist and title:
suggested_fn = utils.sanitize_filename(title)
if not suggested_fn.lower().endswith(ext.lower()):

This comment has been minimized.

@Kingdread

Kingdread Jul 8, 2017

Collaborator

From the test cases and the code I assume that the .html is not appended if it's not in the original URL? Is that intended? I'd assume a title like "qutebrowser home page" on "http://qutebrowser.org" would result in a file qutebrowser home page.html.

This comment has been minimized.

@Kingdread

Kingdread Jul 8, 2017

Collaborator

Also, won't this add .php even for HTML files? Like you get a index.php locally, even if it's just the produced HTML content?

This comment has been minimized.

@The-Compiler

The-Compiler Jul 8, 2017

Collaborator

Hmm, good points - maybe it's be best to check if it ends in either .htm or .html and if not, just append .html?

This comment has been minimized.

@iordanisg

iordanisg Jul 8, 2017

Contributor

Very good points, indeed. What about when there is an uppercase .HTML or .HTM in the title? Would it make sense to keep it that way in the filename or convert it to lower case?

This comment has been minimized.

@The-Compiler

The-Compiler Jul 8, 2017

Collaborator

I think it'd be fine to keep it, but I don't have a strong opinion either way. I'd go for "whatever is easier to implement" for this one.

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jul 8, 2017

@Kingdread anything else? Otherwise, this looks ready to merge to me.

@Kingdread

This comment has been minimized.

Collaborator

Kingdread commented Jul 9, 2017

No, no more comments 😛

@The-Compiler The-Compiler merged commit c9fd182 into qutebrowser:master Jul 9, 2017

@The-Compiler

This comment has been minimized.

Collaborator

The-Compiler commented Jul 9, 2017

And merged! Thanks @Kingdread for the reviews, and @iordanisg for the contribution and your patience! 😄

@iordanisg

This comment has been minimized.

Contributor

iordanisg commented Jul 9, 2017

Thank you both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment