Integrate parts of Scrapy CrawlSpider Rule flexibility into DDS #11

Open
holgerd77 opened this issue Aug 6, 2012 · 4 comments
@holgerd77
Owner

Scrapy's CrawlSpider Rules (http://doc.scrapy.org/en/0.14/topics/spiders.html) are a more powerful tool for crawling pages from different URLs that follow a certain pattern than the pagination currently implemented in DDS.

See the following Google Groups discussion thread for reference:
https://groups.google.com/forum/?fromgroups#!topic/django-dynamic-scraper/tQJMpcbqbfc

It would be desirable to integrate at least a part of it.

Ideas:

  • Applying a single "allow" rule could be integrated as a pagination type together with the pagination_append_str attribute, without changing the DB structure.
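
To make the idea more concrete, here is a minimal sketch in plain Python of what a single "allow" rule does: it keeps only those links whose URL matches a regular expression, which is how Scrapy's link extractors treat `allow` patterns. Everything DDS-specific here is an assumption for illustration only: the helper name `filter_links_by_allow` is hypothetical, the URLs are made up, and storing the regex fragment in `pagination_append_str` is just one possible reading of the idea, not existing DDS behaviour.

```python
import re


def filter_links_by_allow(links, allow_pattern):
    """Keep only the links whose URL matches the allow regex (re.search)."""
    regex = re.compile(allow_pattern)
    return [url for url in links if regex.search(url)]


# Hypothetical example data, not taken from DDS or from this thread.
base_url = "http://example.com/articles"

# Assumption: the existing attribute would hold a regex fragment
# instead of a literal string to append to the base URL.
pagination_append_str = r"/page/\d+"

links_found_on_page = [
    "http://example.com/articles/page/2",
    "http://example.com/articles/page/3",
    "http://example.com/about",
    "http://example.com/articles/item/17",
]

# Build the full allow pattern from the escaped base URL plus the fragment.
allow_pattern = re.escape(base_url) + pagination_append_str
followed = filter_links_by_allow(links_found_on_page, allow_pattern)
print(followed)
```

In this sketch only the two `/page/<number>` links would be followed; the other links are ignored because they do not match the pattern.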
@kevinwan

kevinwan commented Aug 7, 2012

Could we inherit from CrawlSpider, in the same way that you implemented DjangoBaseSpider by inheriting from BaseSpider?

What do you think about that?

@holgerd77
Owner Author

I think it should be no problem to replace BaseSpider with CrawlSpider, but that still leaves the task of integrating some of its functionality into DDS in an appropriate way. Or would this replacement alone already help you in some way?

@undernewmanagement

Is there still interest in this feature? My team might want to develop this feature.

@holgerd77
Owner Author

Definitely still of interest. Before you start developing, it would be helpful if you laid out here how you would implement this feature and how it fits into the existing DDS structure with regard to code, DB and admin UI.

It is also a prerequisite for a new feature to be accepted that all the unit tests pass, see:
http://django-dynamic-scraper.readthedocs.org/en/latest/development.html#running-the-test-suite

If you want to make a pull request, ping me before issuing it; I would create a separate experimental branch for merging.

Cheers
Holger
