Add a page which explains important Scrapy concepts in a single page #1569

Open
kmike opened this issue Oct 29, 2015 · 4 comments

Comments

@kmike
Member

kmike commented Oct 29, 2015

As @plafl said: "Scrapy is very extensible but that has a cost too. There are too many concepts: spiders, items, middlewares, pipelines, exporters, extensions, signals, settings. As a newcomer I would like to know which problem they solve."

+1 :) I think we should add a page which will explain all this in a single place - what are these concepts, when and why to use them.

@kmike kmike added the docs label Oct 29, 2015
@Granitosaurus
Contributor

Isn't that pretty much http://doc.scrapy.org/en/latest/topics/architecture.html?highlight=scrapy%20architecture ? Which is my favorite page in the docs.

@kmike
Member Author

kmike commented Nov 5, 2015

@Granitas yeah, it is close, a good catch.

This page is not targeted at beginners, though. For example, for both downloader and spider middleware it says just "They provide a convenient mechanism for extending Scrapy functionality by plugging custom code." without explaining which custom code should go into a spider middleware and which should go into a downloader middleware. You can figure it out by meditating over the architecture overview picture, but that is not an easy task if you're just starting. Also, it doesn't explain extensions at all.
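
To make the distinction concrete, here is a minimal sketch of the two middleware kinds. It assumes the hook signatures Scrapy documented around the time of this thread (`process_request`, `process_response`, `process_spider_output`); the class names, the `X-Example` header, and the title-length rule are hypothetical and only for illustration. Middlewares are plain classes, so the sketch does not import Scrapy, and the demo uses `SimpleNamespace` stand-ins instead of real `Request` objects.

```python
from types import SimpleNamespace


class CustomHeaderDownloaderMiddleware:
    """Downloader middleware: sits between the Engine and the Downloader.

    A natural place for code that touches raw requests and responses,
    such as headers, proxies, or retries.
    """

    def process_request(self, request, spider):
        # Adjust the outgoing request before it is downloaded.
        request.headers["X-Example"] = "1"  # hypothetical header
        return None  # None lets Scrapy keep processing this request

    def process_response(self, request, response, spider):
        # Pass the downloaded response through unchanged.
        return response


class DropShortTitlesSpiderMiddleware:
    """Spider middleware: sits between the Engine and the Spider.

    A natural place for code that post-processes what spider callbacks
    yield (items and follow-up requests).
    """

    def process_spider_output(self, response, result, spider):
        for element in result:
            # Hypothetical rule: drop dict items whose title is too short.
            if isinstance(element, dict) and len(element.get("title", "")) < 3:
                continue
            yield element


if __name__ == "__main__":
    # Stand-in for a Request; a real one exposes a dict-like `headers` too.
    fake_request = SimpleNamespace(headers={})
    CustomHeaderDownloaderMiddleware().process_request(fake_request, spider=None)
    print(fake_request.headers)

    scraped = [{"title": "ok"}, {"title": "long enough"}]
    kept = DropShortTitlesSpiderMiddleware().process_spider_output(None, scraped, None)
    print(list(kept))
```

In a real project such classes would be enabled through the `DOWNLOADER_MIDDLEWARES` and `SPIDER_MIDDLEWARES` settings, each mapping a class path (for example a hypothetical `myproject.middlewares.CustomHeaderDownloaderMiddleware`) to an order number.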

kmike added a commit that referenced this issue Mar 25, 2016
* spiders don't have to work on specific domains;
* explain what to use Downloader middleware for
  and what to use Spider middleware for;
* Engine no longer locates spiders based on domains;
* "Spider middleware output direction" step was missing.

See also: GH-1569.
redapple pushed a commit that referenced this issue Mar 31, 2016
* spiders don't have to work on specific domains;
* explain what to use Downloader middleware for
  and what to use Spider middleware for;
* Engine no longer locates spiders based on domains;
* "Spider middleware output direction" step was missing.

See also: GH-1569.
@darshanime
Contributor

I can write this page. I have a few questions to get started:

  1. Is the page aimed at extension developers?
  2. Should it read like an article, or should it contain code?
  3. What is the desired length of the article (how many words)?
  4. Which concepts should it focus on?

@Gallaecio
Member

This seems to be along the lines of a question I recently tried to answer on StackOverflow: https://stackoverflow.com/q/54421455/939364
