Improved test coverage #525

Merged — 5 commits merged into scrapy:master on Jan 15, 2014
Conversation

@alexanderlukanin13 (Contributor) commented Jan 11, 2014

In preparation for Python 3 compatibility, these unit tests improve test coverage of some modules that use urllib functions.

six is added to the requirements (I assume we are going to add it anyway). Version 1.5.2 is required because six.moves had a bug in version 1.4.

Coverage achieved:

    scrapy/contrib/downloadermiddleware/robotstxt.py   97%
    scrapy/contrib/linkextractors/htmlparser.py        96%
    scrapy/contrib/linkextractors/regex.py            100%

In the next PR, I plan to replace the various urllib imports with six.moves.urllib.
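For context, a minimal sketch of the kind of cross-version import this migration enables (not code from the PR itself; the URLs are just illustrative). six.moves.urllib resolves to urlparse/urllib on Python 2 and to urllib.parse on Python 3, so a single import works on both:

```python
# Sketch: one import path for both Python 2 and Python 3.
# Requires six>=1.5.2, where the six.moves bug mentioned above is fixed.
try:
    from six.moves.urllib.parse import urljoin, urlparse
except ImportError:  # six not installed; fall back to the Python 3 stdlib
    from urllib.parse import urljoin, urlparse

base = 'http://example.com/sample2.html'
print(urljoin(base, 'sample3.html'))  # http://example.com/sample3.html
print(urlparse(base).netloc)          # example.com
```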

@dangra (Member) commented Jan 14, 2014

LGTM. /cc @pablohoffman @kmike

@pablohoffman (Member) commented Jan 15, 2014

The deploy command has been deprecated in Scrapy and moved to scrapyd as a standalone command, scrapyd-deploy: https://github.com/scrapy/scrapyd/blob/master/bin/scrapyd-deploy. I wouldn't introduce any more changes (even tests only) in the scrapy repo; do it in the scrapyd repo instead.

The six>=1.5.2 requirement should also be declared in setup.py and debian/control (yes, it sucks having three places to define dependencies, but that's a topic for a separate discussion).
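As an illustration of the point above, a hedged sketch of how the pin might look in setup.py (not the actual Scrapy setup.py; the other two places would be a `six>=1.5.2` line in the requirements file and a versioned entry in debian/control):

```python
# Sketch only -- illustrative setup.py fragment, not Scrapy's real one.
from setuptools import setup

setup(
    name='Scrapy',
    install_requires=[
        'six>=1.5.2',  # six.moves bug fixed in 1.5.2
        # ... other dependencies ...
    ],
)
```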

Other than that, the change LGTM.

kmike added a commit that referenced this pull request on Jan 15, 2014: "Improved test coverage"

@kmike kmike merged commit c92f52c into scrapy:master on Jan 15, 2014

1 check passed — default: The Travis CI build passed
@kmike (Member) commented Jan 15, 2014

Yay tests!

P.S. Does anybody know the difference between Build-Depends and Depends? There are some duplicated entries there, and I wonder if they are all needed.

@alexanderlukanin13 alexanderlukanin13 deleted the alexanderlukanin13:urllib_test_coverage branch Jan 15, 2014
    # so we need to sort them for comparison
    self.assertEqual(sorted(lx.extract_links(self.response), key=lambda x: x.url), [
        Link(url='http://example.com/sample2.html', text=u'sample 2'),
        Link(url='http://example.com/sample3.html', text=u'sample 3 repetition'),

@redapple (Contributor) commented Jan 30, 2014
@alexanderlukanin13, I've been looking at the recent Travis build failures: https://travis-ci.org/scrapy/scrapy/jobs/17914775

(working on these fixes: #570)

The last one that's bugging me is this repetition link from sgml_linkextractor.html.

According to https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/linkextractors/sgml.py#L50

links = unique_list(links, key=lambda link: link.url) if self.unique else links

shouldn't the extractor pick up the non-"repetition" link?

A local test on my machine passes with Link(url='http://example.com/sample3.html', text=u'sample 3 repetition'), so I don't know anymore :)

@redapple (Contributor) commented Jan 30, 2014

Ah yeah, this line shuffles the extracted links:

        urlstext = set([(clean_url(url).encode(response_encoding), clean_text(text))
                        for url, _, text in links_text])

@dangra, @pablohoffman, @nramirezuy, @kmike:
what's the expected behaviour of RegexLinkExtractor for duplicate URLs (which may have different link text)? Take the first one in the document? Random?

@redapple (Contributor) commented Jan 30, 2014

FYI, I'm fixing it as recommended by @dangra (changing from set to scrapy.utils.python.unique in order to grab the first link): redapple@73ae6b8
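The behaviour difference discussed above can be sketched as follows. Building the link set with set() loses document order, so which text "wins" for a duplicate URL is arbitrary, while a first-occurrence dedup (which is what scrapy.utils.python.unique does, to my understanding) is deterministic. The unique() helper below is a self-contained stand-in written for this sketch, not the Scrapy function itself:

```python
# Stand-in for scrapy.utils.python.unique: keep only the first
# occurrence of each key, preserving document order.
def unique(seq, key=lambda x: x):
    seen = set()
    result = []
    for item in seq:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result

# Two links with the same URL but different text, in document order.
links = [
    ('http://example.com/sample3.html', 'sample 3'),
    ('http://example.com/sample3.html', 'sample 3 repetition'),
]

# set(links) has arbitrary iteration order: either text may survive.
# unique() deterministically keeps the first link in the document:
print(unique(links, key=lambda l: l[0]))
# [('http://example.com/sample3.html', 'sample 3')]
```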
