[MRG+1] Enable genspider command outside project folder #2052

Merged
merged 3 commits into from
Jul 6, 2016

stummjr
Member

@stummjr stummjr commented Jun 12, 2016

This PR enables the genspider CLI command even when the working dir is not a scrapy project.

The rationale is that some users (including me) are used to creating standalone spiders and running them with the runspider command, because it's a quick and convenient way to fire up simple spiders. Having genspider available would make it even quicker.

@codecov-io

codecov-io commented Jun 12, 2016

Current coverage is 83.32%

Merging #2052 into master will decrease coverage by <.01%

@@             master      #2052   diff @@
==========================================
  Files           161        161          
  Lines          8678       8682     +4   
  Methods           0          0          
  Messages          0          0          
  Branches       1272       1274     +2   
==========================================
+ Hits           7231       7234     +3   
  Misses         1196       1196          
- Partials        251        252     +1   

Powered by Codecov. Last updated by 759a555...081595a

@redapple redapple changed the title Enable genspider command outside project folder [MRG+1] Enable genspider command outside project folder Jun 14, 2016
@eliasdorneles
Member

+1, this is cool!

Only thing missing now is to update docs. :)

genspider is listed as a project-only command here: http://doc.scrapy.org/en/latest/topics/commands.html#available-tool-commands

The short description for genspider should also be updated, and it would be nice to mention the meaning of the arguments when using it standalone (for example, how name is now used for the spider file name).
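
To make the point about the arguments concrete, here is a rough sketch of how the output file could be chosen. The helper `spider_file_path` and its `spiders_dir` parameter are hypothetical and not Scrapy's actual code. The assumption is that a standalone run writes `<name>.py` into the current directory, while a run inside a project writes into the project's spiders directory.

```python
import os


def spider_file_path(name, spiders_dir=None):
    """Return the path a generated spider would be written to (hypothetical helper)."""
    # Assumption: outside a project there is no spiders package,
    # so the file lands in the current working directory.
    target_dir = spiders_dir if spiders_dir is not None else "."
    return os.path.join(target_dir, name + ".py")


# Standalone use: the spider name doubles as the file name.
print(spider_file_path("example"))
# Inside a project: the same name, placed in an assumed spiders directory.
print(spider_file_path("example", os.path.join("myproject", "spiders")))
```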

@stummjr
Member Author

stummjr commented Jun 20, 2016

This PR fails when the template is set to crawl, xmlfeed or csvfeed, because those templates include code to import the items module.

I'm going to look into how to fix it and then update the PR.

@stummjr
Member Author

stummjr commented Jul 1, 2016

I was looking at how to make this work for the 'crawl', 'xmlfeed' and 'csvfeed' templates. The issue is that those templates import the <classname>Item module, and we don't have such a module when generating a standalone spider.

We could solve that by removing the import and the uses of the Item class from the spider code, as is already done in the 'basic' template. The crawl template would become:

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class $classname(CrawlSpider):
    name = '$name'
    allowed_domains = ['$domain']
    start_urls = ['http://www.$domain/']

    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        pass

Or, we could employ a template engine, such as jinja2, but that looks like overkill.
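
For reference, the `$classname`, `$name` and `$domain` placeholders above follow the syntax of Python's standard `string.Template`, so plain substitution may be enough without jinja2. A minimal sketch, assuming the template is held in a string; `render_spider` is a made-up helper (not a Scrapy function), and the way it derives the class name is only an illustration:

```python
from string import Template

# The proposed crawl template, with no import of an items module.
CRAWL_TEMPLATE = Template("""\
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class $classname(CrawlSpider):
    name = '$name'
    allowed_domains = ['$domain']
    start_urls = ['http://www.$domain/']

    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        pass
""")


def render_spider(name, domain):
    # Hypothetical rule: "my_shop" becomes "MyShopSpider".
    classname = "".join(part.capitalize() for part in name.split("_")) + "Spider"
    return CRAWL_TEMPLATE.substitute(classname=classname, name=name, domain=domain)


source = render_spider("example", "example.com")
# Syntax check only; compiling does not execute the code or import scrapy.
compile(source, "example.py", "exec")
print(source)
```

Since the rendered file no longer refers to a project module, it should be usable on its own with runspider.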

Thoughts?

@eliasdorneles
Member

+1 on changing the crawl template.
Since we've added support for dict items, we should probably not assume the user is declaring items.
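
As a rough illustration of the dict-based style, a callback could yield plain dicts and need no items module at all. In this sketch the field names `url` and `title` are made up, `response` is assumed to be a Scrapy response, and the function is written outside a spider class so it can be read on its own:

```python
def parse_item(response):
    # No Item class is imported or declared; a plain dict carries the data.
    yield {
        "url": response.url,
        "title": response.css("title::text").extract_first(),
    }
```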

@eliasdorneles
Member

@stummjr looks good!
Can you update the docs?
Will be ready to merge after that. :)

@eliasdorneles
Member

The build is failing because of an unrelated coverage error.
I bet it's the new version of pytest-cov released today.
Try updating test/requirements.txt with the version specifier pytest-cov<=2.3.0 to see if the build passes.

@eliasdorneles
Member

sorry, I should've said pytest-cov!=2.3.0!

@eliasdorneles
Member

Thanks @stummjr !

@eliasdorneles eliasdorneles merged commit 0ab7c1f into scrapy:master Jul 6, 2016
eliasdorneles added a commit that referenced this pull request Jul 8, 2016
[backport][1.1] Enable genspider command outside project folder (PR #2052)