Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use page title for filename with :download #2753

Merged
merged 8 commits into from Jul 9, 2017
Merged

Use page title for filename with :download #2753

merged 8 commits into from Jul 9, 2017

Conversation

iordanisg
Copy link
Contributor

@iordanisg iordanisg commented Jun 26, 2017

This change is Reviewable

@iordanisg
Copy link
Contributor Author

Per #2652

Copy link
Member

@The-Compiler The-Compiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also cc @Kingdread because I still think he knows the download code better than I do 😉

@@ -1444,7 +1444,8 @@ def download(self, url=None, dest_old=None, *, mhtml_=False, dest=None):
else:
qnam = tab.networkaccessmanager()
download_manager.get(self._current_url(), user_agent=user_agent,
qnam=qnam, target=target)
qnam=qnam, target=target,
title=self._current_title())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think passing a title here makes get_request needlessly complicated for only this single use case.

Instead, you should be able to construct a downloads.FileDownloadTarget here (similar to what's done above when --dest is given), and pass that to .get() instead. Then there shouldn't be any changes in qtnetworkdownloads.py needed at all.

@@ -429,7 +429,8 @@ def get_request(self, request, *, target=None, **kwargs):
QNetworkRequest.AlwaysNetwork)

if request.url().scheme().lower() != 'data':
suggested_fn = urlutils.filename_from_url(request.url())
suggested_fn = (utils.sanitize_filename(title) + ".html" if title
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good job at finding utils.sanitize_filename! I compltely forgot about it 😆

@Kingdread
Copy link
Contributor

I think this method is a bit too simplistic, as it always takes the page title, even for non-HTML pages:

[daniel@persephone]  ~ % ls /tmp/qutebrowser-basedir-4mql1yi0/download/
'Vanamo_Logo.png (1200x1200 pixels).html'
[daniel@persephone]  ~ % file /tmp/qutebrowser-basedir-4mql1yi0/download/Vanamo_Logo.png\ \(1200x1200\ pixels\).html
/tmp/qutebrowser-basedir-4mql1yi0/download/Vanamo_Logo.png (1200x1200 pixels).html: PNG image data, 1200 x 1200, 4-bit colormap, non-interlaced

(the old method produced a Vanamo_Logo.png file, which is fine)

I think the question is: How do we determine if we got a sensible filename out of the URL, or when should we use the page title?

@The-Compiler
Copy link
Member

Hmm, damn. This is indeed something I didn't consider, and a good point...

I have two ideas:

  • Use the title when there's no path in the URL, and have a whitelist of file extensions where it makes sense to do so too (as a first approximation, .html, .htm, .php, but probably more)
  • Don't do this at all. 😟

@Kingdread
Copy link
Contributor

Kingdread commented Jun 29, 2017

I'd probably add "if there's no extension", as many modern pages don't even use extensions anymore (for example, GitHub: https://github.com/qutebrowser/qutebrowser/pull/2753).

Alternatively, we could put this behind a flag (:download --title?), but then you would need to remember to use the flag.

@The-Compiler
Copy link
Member

Any update, @iordanisg?

@The-Compiler
Copy link
Member

Whoops, didn't mean to close that one, sorry.

@The-Compiler The-Compiler reopened this Jul 5, 2017
@iordanisg
Copy link
Contributor Author

@The-Compiler: your whitelist idea sounds good to me, I hope I'll be able to work something out soon (busy week so far)

@Kingdread: thanks for the feedback!

@iordanisg
Copy link
Contributor Author

iordanisg commented Jul 6, 2017

About the failing download test: a page title is created on the fly (even if the page doesn't have one) only when running the tests.

Copy link
Member

@The-Compiler The-Compiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting a bit more complex than I thought originally (sorry!)

I suggest moving it into a def suggested_fn_from_title(url, title): (or so) function in qutebrowser/browser/downloads.py.
Could you please also add an unittest in tests/unit/browser/webkit/test_downloads.py for all those special cases? This can probably benefit from pytest's parametrization, see e.g. test_format_seconds in tests/unit/utils/test_utils.py for a simple example.

ext_whitelist = [".html", ".htm", ".php", ""]
_, ext = os.path.splitext(self._current_url().path())
if ext.lower() in ext_whitelist and tab.title():
suggested_fn = utils.sanitize_filename(tab.title()) + ext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When QtWebEngine auto-generates a title, this ends up as no title.html.html. I'd suggest checking for if not tab.title().endswith(ext): before adding it.

And I wait until the download is finished
Then the downloaded file qutebrowser.png should exist

Scenario: Using :download with no URL and no page title
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like QtWebEngine autogenerates a filename while QtWebKit doesn't, which also explains that you don't see it locally. After adding some unittests (see my review comment), I'd suggest just removing this end2end test entirely (but keep the other two).

@@ -182,6 +182,26 @@ def transform_path(path):
return path


def suggested_fn_from_title(url, title=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url should be called url_path to make it clear that it isn't a full URL.

@@ -182,6 +182,26 @@ def transform_path(path):
return path


def suggested_fn_from_title(url, title=None):
"""Suggest a filename depending on the URL extension and page title.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Add an empty line here

url: a string with the URL path
title: the page title string

Returns None if the extension is not in the whitelist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Put this under a Return: similar like with Args:

'Installing qutebrowser _ qutebrowser.html'),
('http://qutebrowser.org/INSTALL.html.html',
'Installing qutebrowser | qutebrowser',
'Installing qutebrowser _ qutebrowser.html'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this test - I think to resemble the behavior with the double extension, this should have a INSTALL.html filename (not .html.html) and a title which ends with .html, no?

_, ext = os.path.splitext(url)
if ext.lower() in ext_whitelist and title:
suggested_fn = utils.sanitize_filename(title)
if not suggested_fn.endswith(ext):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if not suggested_fn.lower().endswith(ext.lower())?

('http://qutebrowser.org/page-with-no-title.html',
'',
None),
])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a test with an upper-case extension (to make sure the title is used), and with an upper-case .HTML in the title (see comment above).

@The-Compiler
Copy link
Member

Looks good to me now - @Kingdread what do you think?

_, ext = os.path.splitext(url_path)
if ext.lower() in ext_whitelist and title:
suggested_fn = utils.sanitize_filename(title)
if not suggested_fn.lower().endswith(ext.lower()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the test cases and the code I assume that the .html is not appended if it's not in the original URL? Is that intended? I'd assume a title like "qutebrowser home page" on "http://qutebrowser.org" would result in a file qutebrowser home page.html.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, won't this add .php even for HTML files? Like you get a index.php locally, even if it's just the produced HTML content?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good points - maybe it's be best to check if it ends in either .htm or .html and if not, just append .html?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good points, indeed. What about when there is an uppercase .HTML or .HTM in the title? Would it make sense to keep it that way in the filename or convert it to lower case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be fine to keep it, but I don't have a strong opinion either way. I'd go for "whatever is easier to implement" for this one.

@The-Compiler
Copy link
Member

@Kingdread anything else? Otherwise, this looks ready to merge to me.

@Kingdread
Copy link
Contributor

No, no more comments 😛

@The-Compiler The-Compiler merged commit c9fd182 into qutebrowser:master Jul 9, 2017
@The-Compiler
Copy link
Member

And merged! Thanks @Kingdread for the reviews, and @iordanisg for the contribution and your patience! 😄

@iordanisg
Copy link
Contributor Author

Thank you both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants