Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use page title for filename with :download #2753

Merged
merged 8 commits into from Jul 9, 2017

Conversation

@iordanisg
Copy link
Contributor

@iordanisg iordanisg commented Jun 26, 2017

This change is Reviewable

@iordanisg
Copy link
Contributor Author

@iordanisg iordanisg commented Jun 26, 2017

Per #2652

Copy link
Member

@The-Compiler The-Compiler left a comment

also cc @Kingdread because I still think he knows the download code better than I do 😉

@@ -1444,7 +1444,8 @@ def download(self, url=None, dest_old=None, *, mhtml_=False, dest=None):
else:
qnam = tab.networkaccessmanager()
download_manager.get(self._current_url(), user_agent=user_agent,
qnam=qnam, target=target)
qnam=qnam, target=target,
title=self._current_title())

This comment has been minimized.

@The-Compiler

The-Compiler Jun 26, 2017
Member

I think passing a title here makes get_request needlessly complicated for only this single use case.

Instead, you should be able to construct a downloads.FileDownloadTarget here (similar to what's done above when --dest is given), and pass that to .get() instead. Then there shouldn't be any changes in qtnetworkdownloads.py needed at all.

@@ -429,7 +429,8 @@ def get_request(self, request, *, target=None, **kwargs):
QNetworkRequest.AlwaysNetwork)

if request.url().scheme().lower() != 'data':
suggested_fn = urlutils.filename_from_url(request.url())
suggested_fn = (utils.sanitize_filename(title) + ".html" if title

This comment has been minimized.

@The-Compiler

The-Compiler Jun 26, 2017
Member

Oh, good job at finding utils.sanitize_filename! I compltely forgot about it 😆

iordanisg added 3 commits Jun 26, 2017
@Kingdread
Copy link
Contributor

@Kingdread Kingdread commented Jun 28, 2017

I think this method is a bit too simplistic, as it always takes the page title, even for non-HTML pages:

[daniel@persephone]  ~ % ls /tmp/qutebrowser-basedir-4mql1yi0/download/
'Vanamo_Logo.png (1200x1200 pixels).html'
[daniel@persephone]  ~ % file /tmp/qutebrowser-basedir-4mql1yi0/download/Vanamo_Logo.png\ \(1200x1200\ pixels\).html
/tmp/qutebrowser-basedir-4mql1yi0/download/Vanamo_Logo.png (1200x1200 pixels).html: PNG image data, 1200 x 1200, 4-bit colormap, non-interlaced

(the old method produced a Vanamo_Logo.png file, which is fine)

I think the question is: How do we determine if we got a sensible filename out of the URL, or when should we use the page title?

@The-Compiler
Copy link
Member

@The-Compiler The-Compiler commented Jun 28, 2017

Hmm, damn. This is indeed something I didn't consider, and a good point...

I have two ideas:

  • Use the title when there's no path in the URL, and have a whitelist of file extensions where it makes sense to do so too (as a first approximation, .html, .htm, .php, but probably more)
  • Don't do this at all. 😟
@Kingdread
Copy link
Contributor

@Kingdread Kingdread commented Jun 29, 2017

I'd probably add "if there's no extension", as many modern pages don't even use extensions anymore (for example, GitHub: https://github.com/qutebrowser/qutebrowser/pull/2753).

Alternatively, we could put this behind a flag (:download --title?), but then you would need to remember to use the flag.

@The-Compiler
Copy link
Member

@The-Compiler The-Compiler commented Jul 5, 2017

Any update, @iordanisg?

@The-Compiler
Copy link
Member

@The-Compiler The-Compiler commented Jul 5, 2017

Whoops, didn't mean to close that one, sorry.

@The-Compiler The-Compiler reopened this Jul 5, 2017
@iordanisg
Copy link
Contributor Author

@iordanisg iordanisg commented Jul 5, 2017

@The-Compiler: your whitelist idea sounds good to me, I hope I'll be able to work something out soon (busy week so far)

@Kingdread: thanks for the feedback!

@iordanisg
Copy link
Contributor Author

@iordanisg iordanisg commented Jul 6, 2017

About the failing download test: a page title is created on the fly (even if the page doesn't have one) only when running the tests.

Copy link
Member

@The-Compiler The-Compiler left a comment

This is getting a bit more complex than I thought originally (sorry!)

I suggest moving it into a def suggested_fn_from_title(url, title): (or so) function in qutebrowser/browser/downloads.py.
Could you please also add an unittest in tests/unit/browser/webkit/test_downloads.py for all those special cases? This can probably benefit from pytest's parametrization, see e.g. test_format_seconds in tests/unit/utils/test_utils.py for a simple example.

ext_whitelist = [".html", ".htm", ".php", ""]
_, ext = os.path.splitext(self._current_url().path())
if ext.lower() in ext_whitelist and tab.title():
suggested_fn = utils.sanitize_filename(tab.title()) + ext

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017
Member

When QtWebEngine auto-generates a title, this ends up as no title.html.html. I'd suggest checking for if not tab.title().endswith(ext): before adding it.

And I wait until the download is finished
Then the downloaded file qutebrowser.png should exist

Scenario: Using :download with no URL and no page title

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017
Member

It looks like QtWebEngine autogenerates a filename while QtWebKit doesn't, which also explains that you don't see it locally. After adding some unittests (see my review comment), I'd suggest just removing this end2end test entirely (but keep the other two).

@@ -182,6 +182,26 @@ def transform_path(path):
return path


def suggested_fn_from_title(url, title=None):

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017
Member

url should be called url_path to make it clear that it isn't a full URL.

@@ -182,6 +182,26 @@ def transform_path(path):
return path


def suggested_fn_from_title(url, title=None):
"""Suggest a filename depending on the URL extension and page title.

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017
Member

nitpick: Add an empty line here

url: a string with the URL path
title: the page title string
Returns None if the extension is not in the whitelist

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017
Member

nitpick: Put this under a Return: similar like with Args:

'Installing qutebrowser _ qutebrowser.html'),
('http://qutebrowser.org/INSTALL.html.html',
'Installing qutebrowser | qutebrowser',
'Installing qutebrowser _ qutebrowser.html'),

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017
Member

I don't understand this test - I think to resemble the behavior with the double extension, this should have a INSTALL.html filename (not .html.html) and a title which ends with .html, no?

_, ext = os.path.splitext(url)
if ext.lower() in ext_whitelist and title:
suggested_fn = utils.sanitize_filename(title)
if not suggested_fn.endswith(ext):

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017
Member

Maybe if not suggested_fn.lower().endswith(ext.lower())?

('http://qutebrowser.org/page-with-no-title.html',
'',
None),
])

This comment has been minimized.

@The-Compiler

The-Compiler Jul 6, 2017
Member

Please also add a test with an upper-case extension (to make sure the title is used), and with an upper-case .HTML in the title (see comment above).

@The-Compiler
Copy link
Member

@The-Compiler The-Compiler commented Jul 6, 2017

Looks good to me now - @Kingdread what do you think?

_, ext = os.path.splitext(url_path)
if ext.lower() in ext_whitelist and title:
suggested_fn = utils.sanitize_filename(title)
if not suggested_fn.lower().endswith(ext.lower()):

This comment has been minimized.

@Kingdread

Kingdread Jul 8, 2017
Contributor

From the test cases and the code I assume that the .html is not appended if it's not in the original URL? Is that intended? I'd assume a title like "qutebrowser home page" on "http://qutebrowser.org" would result in a file qutebrowser home page.html.

This comment has been minimized.

@Kingdread

Kingdread Jul 8, 2017
Contributor

Also, won't this add .php even for HTML files? Like you get a index.php locally, even if it's just the produced HTML content?

This comment has been minimized.

@The-Compiler

The-Compiler Jul 8, 2017
Member

Hmm, good points - maybe it's be best to check if it ends in either .htm or .html and if not, just append .html?

This comment has been minimized.

@iordanisg

iordanisg Jul 8, 2017
Author Contributor

Very good points, indeed. What about when there is an uppercase .HTML or .HTM in the title? Would it make sense to keep it that way in the filename or convert it to lower case?

This comment has been minimized.

@The-Compiler

The-Compiler Jul 8, 2017
Member

I think it'd be fine to keep it, but I don't have a strong opinion either way. I'd go for "whatever is easier to implement" for this one.

@The-Compiler
Copy link
Member

@The-Compiler The-Compiler commented Jul 8, 2017

@Kingdread anything else? Otherwise, this looks ready to merge to me.

@Kingdread
Copy link
Contributor

@Kingdread Kingdread commented Jul 9, 2017

No, no more comments 😛

@The-Compiler The-Compiler merged commit c9fd182 into qutebrowser:master Jul 9, 2017
@The-Compiler
Copy link
Member

@The-Compiler The-Compiler commented Jul 9, 2017

And merged! Thanks @Kingdread for the reviews, and @iordanisg for the contribution and your patience! 😄

@iordanisg
Copy link
Contributor Author

@iordanisg iordanisg commented Jul 9, 2017

Thank you both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.