Improved test coverage #525
Conversation
LGTM. /cc @pablohoffman @kmike
The deploy command has been deprecated in Scrapy and moved to scrapyd as a standalone command. Other than that, the change LGTM.
Yay tests! P.S. Does anybody know what's the difference between Build-Depends and Depends? There are some duplicated entries there; I wonder if they are needed.
# so we need to sort them for comparison
self.assertEqual(sorted(lx.extract_links(self.response), key=lambda x: x.url), [
    Link(url='http://example.com/sample2.html', text=u'sample 2'),
    Link(url='http://example.com/sample3.html', text=u'sample 3 repetition'),
redapple
Jan 30, 2014
Contributor
@alexanderlukanin13 , I've been looking at the recent Travis build failures
https://travis-ci.org/scrapy/scrapy/jobs/17914775
(working on these fixes: #570)
The last one that's bugging me is this repetition link from sgml_linkextractor.html
According to https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/linkextractors/sgml.py#L50
links = unique_list(links, key=lambda link: link.url) if self.unique else links
Shouldn't the extractor pick up the non-"repetition" link?
A local test on my machine passes with Link(url='http://example.com/sample3.html', text=u'sample 3 repetition'), so I don't know anymore :)
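For reference, the order-preserving deduplication that the `unique_list(links, key=...)` line relies on can be sketched roughly like this (a minimal standalone sketch, not the actual Scrapy implementation; the sample data is illustrative):

```python
# Order-preserving dedup: for each key, keep only the FIRST item seen.
# This mirrors the intent of scrapy.utils.python.unique / unique_list.
def unique_list(seq, key=lambda x: x):
    seen = set()
    result = []
    for item in seq:
        k = key(item)
        if k in seen:
            continue  # a later duplicate never replaces the first occurrence
        seen.add(k)
        result.append(item)
    return result

links = [("http://example.com/sample3.html", "sample 3"),
         ("http://example.com/sample3.html", "sample 3 repetition")]
print(unique_list(links, key=lambda link: link[0]))
# [('http://example.com/sample3.html', 'sample 3')]
```

Under this behavior the extractor should indeed keep the non-"repetition" link, provided the input order reaches the dedup step intact.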
redapple
Jan 30, 2014
Contributor
Ah yeah, this line shuffles the extracted links:
urlstext = set([(clean_url(url).encode(response_encoding), clean_text(text))
for url, _, text in links_text])
@dangra, @pablohoffman, @nramirezuy, @kmike:
what's the expected behaviour of RegexLinkExtractor
regarding duplicate URLs (that may have different link text)? Take the first one in the document? Random?
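To see why the `set(...)` comprehension loses document order: the `(url, text)` tuples differ in their text, so both survive the set, and since a set's iteration order is unspecified, a later dedup-by-url picks an arbitrary tuple. A small demonstration (sample data is illustrative):

```python
# Two tuples with the same URL but different text: set() treats them as
# distinct, so both survive -- and set iteration order is unspecified,
# which makes the "winner" of a later dedup-by-url arbitrary.
links_text = [("http://example.com/sample3.html", "sample 3"),
              ("http://example.com/sample3.html", "sample 3 repetition")]
urlstext = set(links_text)
print(len(urlstext))  # 2
```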
redapple
Jan 30, 2014
Contributor
FYI, fixing it as recommended by @dangra: change from set
to scrapy.utils.python.unique
in order to grab the first link.
redapple@73ae6b8
In preparation for Python 3 compatibility, these unit tests improve test coverage of some modules which use urllib functions. six is added to requirements (I assume we are going to add it anyway). Version 1.5.2 is required because there was a bug in six.moves in version 1.4. Coverage is achieved:
In the next PR, I plan to replace various urllib exports with six.moves.urllib.
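The planned migration would look roughly like this. On Python 3, `six.moves.urllib.parse` maps directly to the stdlib `urllib.parse`, so the fallback import below demonstrates the equivalence even where six is not installed (the URLs are illustrative):

```python
# six.moves.urllib gives one import path that works on Python 2 and 3
# (requires six >= 1.5.2, per the PR description).  On Python 3 it is
# just an alias for the stdlib urllib.parse.
try:
    from six.moves.urllib.parse import urlparse, urljoin
except ImportError:
    from urllib.parse import urlparse, urljoin  # Python 3 stdlib equivalent

print(urlparse("http://example.com/sample2.html").path)  # /sample2.html
print(urljoin("http://example.com/a/", "b.html"))        # http://example.com/a/b.html
```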