Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] shell command's ability to open local files + tests #1710

Merged
merged 23 commits into from Jan 29, 2016

Conversation

@redapple
Copy link
Contributor

@redapple redapple commented Jan 21, 2016

This is a continuation of @side2k's #1579

Leonid Amirov and others added 8 commits Nov 2, 2015
…(): when given URI where scheme is present, but not 'http' the function gave bad result
…ttp://example.com"; "example" requests "file://example"; "./example.com" requests "file://example.com"
Continuation of #1579
@codecov-io
Copy link

@codecov-io codecov-io commented Jan 21, 2016

Current coverage is 83.32%

Merging #1710 into master will increase coverage by +0.02% as of 904dd46

Powered by Codecov. Updated on successful CI builds.

if url.startswith(pattern):
url = any_to_uri(url)
break
url = add_http_if_no_scheme(url)

This comment has been minimized.

@kmike

kmike Jan 22, 2016
Member

I think this code should be extracted to a helper function, it will make testing easier.

redapple added 3 commits Jan 22, 2016
@redapple
Copy link
Contributor Author

@redapple redapple commented Jan 22, 2016

@alecxe , @kmike , I added some tests for various cases,
but I'm not sure current implementation from #1579 covers your usecases.
For example, it doesn't test if the argument is a file, just guesses it.
Toughts?

(sorry @nyov for wrong mention)

@kmike
Copy link
Member

@kmike kmike commented Jan 22, 2016

I think it is good not to test for a file presence; IMHO example.com should be an http url even if there is example.com file in the current folder.

I'd prefer to only support explicit relative and absolute paths: index.html and subdir/index.html will be prepended with 'http://' in this case (there could be host subdir), but ./index.html and ./subdir/index.html will use file://. In this case rules will be unambiguous.

I'm not sure the current implementation is correct either - e.g. does it handle localhost?

@redapple
Copy link
Contributor Author

@redapple redapple commented Jan 22, 2016

@kmike , I added a few different tests, including localhost,
hopefully catching the most common cases.

@kmike

This comment has been minimized.

Copy link

@kmike kmike commented on tests/test_command_shell.py in 7a51d37 Jan 22, 2016

It may be cleaner to use pytest.mark.parametrize or just write a doctest.

@redapple
Copy link
Contributor Author

@redapple redapple commented Jan 22, 2016

@kmike , importing pytest seems to mess up the subsequent tests.
any idea what might be happening?

@kmike
Copy link
Member

@kmike kmike commented Jan 22, 2016

Whoa, zero ideas. Maybe a pytest bug? In Scrapy testing suite pytest is pinned to an older version because of blocker bugs in pytest 2.8; maybe these bugs are fixed and we can upgrade pytest now?

@kmike
Copy link
Member

@kmike kmike commented Jan 22, 2016

this bug was a blocker: pytest-dev/pytest#1057 - it seems to be fixed in pytest 2.8.3

@redapple
Copy link
Contributor Author

@redapple redapple commented Jan 22, 2016

Thanks @kmike , I will give it a try.

This reverts commit 1a30a77.
@redapple
Copy link
Contributor Author

@redapple redapple commented Jan 23, 2016

@kmike, I didn't manage to import pytest without failing subsequent tests, so I left the parameterized tests at the previous commit for the moment, in order to discuss tests inputs & outputs at least

parts = urlparse(url)
if not match:
scheme = "http:" if parts.netloc else "http://"
url = scheme + url

This comment has been minimized.

@kmike

kmike Jan 23, 2016
Member

add_http_if_no_scheme function changed, but its tests were not - do you know if there are any differences in behaviour in old and new version? Not sure if it matters, but old version looks more optimized. parse_url -> urlparse is a reasonable change.

This comment has been minimized.

@redapple

redapple Jan 25, 2016
Author Contributor

@kmike , I haven't checked that part of the change.
add_http_if_no_scheme seems to be used only for scrapy shell
Tests look thorough enough for usual cases

Still, I will move parts = urlparse(url) inside if not match: branch at least

@kmike
Copy link
Member

@kmike kmike commented Jan 23, 2016

@redapple I think the implementation, inputs and outputs are all good.

@redapple redapple added this to the Scrapy 1.1 milestone Jan 25, 2016
@redapple redapple added this to the Scrapy 1.1 milestone Jan 25, 2016
@redapple
Copy link
Contributor Author

@redapple redapple commented Jan 26, 2016

@alecxe , @nramirezuy , @kmike ,
are you ok with the updated docs?

scrapy shell file:///absolute/path/to/file.html

.. warning:: :command:`shell` will interpret ``index.html`` as a domain name,
not as a relative path to a local file, and will trigger a DNS lookup error::

This comment has been minimized.

@kmike

kmike Jan 26, 2016
Member

What about making this a note, not a warning? It'd be also good to explain why does it work the way it works (just by looking at command you can tell if http or file protocol will be used). Otherwise we may get pull requests to 'fix the limitation'.

This comment has been minimized.

@nramirezuy

nramirezuy Jan 26, 2016
Contributor

Yea I agree a Note seems more situated. Maybe link the Github issue where the discussion happened ?

This comment has been minimized.

@redapple

redapple Jan 26, 2016
Author Contributor

@kmike , @nramirezuy , how does that read now?
I did not include the Github conversation link, but I believe the explanation is enough.

@kmike kmike changed the title shell command's ability to open local files + tests [MRG+1] shell command's ability to open local files + tests Jan 26, 2016
@kmike
Copy link
Member

@kmike kmike commented Jan 26, 2016

👍 perfect

@alecxe
Copy link

@alecxe alecxe commented Jan 26, 2016

Looks pretty much detailed, thanks!

@nramirezuy
Copy link
Contributor

@nramirezuy nramirezuy commented Jan 26, 2016

I like it too.
How about adding something like "Too know more about" or "For more information" and link the Github issue? Also do this in the future.

@redapple
Copy link
Contributor Author

@redapple redapple commented Jan 26, 2016

@nramirezuy , maybe just include the link in the release notes. what do you think?

@nramirezuy
Copy link
Contributor

@nramirezuy nramirezuy commented Jan 26, 2016

Yea it works, I was thinking more on linking the discussion.

@@ -13,6 +15,30 @@
from scrapy.utils.spider import spidercls_for_request, DefaultSpider


def guess_scheme(url):

This comment has been minimized.

@kmike

kmike Jan 27, 2016
Member

What do you think about moving this function to scrapy.utils.url? It may be convenient to use this function to process spider arguments or arguments of custom commands.

This comment has been minimized.

@redapple

redapple Jan 28, 2016
Author Contributor

@kmike , done.

def guess_scheme(url):
"""Given an URL as string,
returns a FileURI if it looks like a file path,
otherwise returns an HTTP URL

This comment has been minimized.

@kmike

kmike Jan 28, 2016
Member

argh, a last nitpick: pep-257 says

The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

This comment has been minimized.

@redapple

redapple Jan 28, 2016
Author Contributor

Done... but I know I realized the description is wrong: if input has a scheme, it's returned as-is

This comment has been minimized.

@kmike

kmike Jan 28, 2016
Member

a good catch, we should document it

This comment has been minimized.

@redapple

redapple Jan 28, 2016
Author Contributor

@kmike , what about this?

    """Add an URL scheme if missing, returning either a File URI for
    filepath-like input or HTTP URL otherwise.
    """

This comment has been minimized.

@kmike

kmike Jan 28, 2016
Member

What about Add an URL scheme if missing: file:// for filepath-like input or http:// otherwise.?

Add an URL scheme if missing, returning either a File URI for filepath-like input or HTTP URL otherwise sounds a bit ambiguous - it is not clear if file:// or http:// url is returned always, or only if scheme is missing.

@redapple redapple force-pushed the redapple:1550-shell_file-cont branch from fee8ed6 to e9f6b98 Jan 28, 2016
from twisted.trial import unittest
from twisted.internet import defer

from scrapy.utils.testsite import SiteTest
from scrapy.utils.testproc import ProcessTest
from scrapy.utils.url import guess_scheme

This comment has been minimized.

@kmike

kmike Jan 29, 2016
Member

unused import?

This comment has been minimized.

@redapple

redapple Jan 29, 2016
Author Contributor

right... :-/

kmike added a commit that referenced this pull request Jan 29, 2016
[MRG+1] shell command's ability to open local files + tests
@kmike kmike merged commit a35aec7 into scrapy:master Jan 29, 2016
2 checks passed
2 checks passed
codecov/patch 100.00% of diff hit (target 100.00%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@kmike
Copy link
Member

@kmike kmike commented Jan 29, 2016

👍

@redapple redapple deleted the redapple:1550-shell_file-cont branch Jul 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants