
[MRG+1] some improvements to overview page #1106

Merged
Merged 7 commits into scrapy:master on Mar 27, 2015

Conversation

eliasdorneles (Member)

Hey folks!

Here is my proposal for addressing issue #609 (it replaces PR #1023, while keeping some of its ideas).

Since the overview is "the pitch", I tried my best to make it short and to the point.

Summary of the changes:

  • Added example spider showcasing both scraping and crawling (link following)
  • Wrote an explanation of what the code does, without delving much into details
  • Summarized table of features in the end, and reordered them based on my gut feeling
  • Cut some text
  • Cut some more

Note: the example spider also showcases the features from PRs #1081 and #1086, which I assume will also be in the Scrapy 1.0 release.

So, what do you think, does this look good?

Thank you!

In the ``parse`` callback, we scrape the links to the questions and
yield a few more requests to be processed, registering for them
the method ``parse_question`` as the callback to be called when the
requests are complete.
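The pattern the excerpt describes can be sketched without Scrapy at all. Below is a toy model of the request/callback flow, using a hypothetical in-memory "site" in place of real HTTP responses (the `Request` class, page data, and driver loop are all illustrative, not Scrapy's actual internals, which schedule and process requests asynchronously):

```python
# Toy model of the request/callback flow described above -- not Scrapy
# itself, just a sketch of the idea, with hypothetical page data standing
# in for real HTTP responses.

class Request:
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback

# Hypothetical "site": a listing page linking to two question pages.
PAGES = {
    "/questions": ["/questions/1", "/questions/2"],
    "/questions/1": {"title": "First question"},
    "/questions/2": {"title": "Second question"},
}

def parse(url):
    # Scrape the links and yield a few more requests, registering
    # parse_question as the callback for each of them.
    for link in PAGES[url]:
        yield Request(link, callback=parse_question)

def parse_question(url):
    # Yield the scraped item for a single question page.
    yield PAGES[url]

def crawl(start_url, start_callback):
    # The scheduler keeps a queue of pending requests; callbacks may
    # yield new requests (queued) or scraped items (collected).
    queue, items = [Request(start_url, start_callback)], []
    while queue:
        request = queue.pop(0)
        for result in request.callback(request.url):
            if isinstance(result, Request):
                queue.append(result)
            else:
                items.append(result)
    return items

print(crawl("/questions", parse))
# -> [{'title': 'First question'}, {'title': 'Second question'}]
```

In real Scrapy the queue is driven by an asynchronous engine, so many requests are in flight at once rather than processed one by one as in this sketch.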
Member

I think here we should explain that these requests are scheduled and processed asynchronously.

Member Author

Right, let me fix that!


For more information about XPath see the `XPath reference`_.
Here you notice one of the main advantages about Scrapy: requests are
Member

there was a single request in start_urls; it should be easier to see the advantage with the requests sent from the parse method

Member Author

Yeah, I was in doubt about that one... Better move it one paragraph down.

kmike (Member) commented Mar 26, 2015

@eliasdorneles a good overview, I like it 👍

I'm trying to attack it from the position of a person who can hack together a spider using requests + concurrent.futures + pyquery + json. Please don't take it as criticism :) Why should such a person bother with Scrapy?
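For concreteness, the hand-rolled alternative kmike describes might look roughly like this (a minimal sketch: `fetch` is a stand-in for a real HTTP call, which in practice would use requests + pyquery; the URLs are made up):

```python
# Sketch of the "requests + concurrent.futures" approach: fan out over a
# fixed URL list with a thread pool. fetch() fakes the network call by
# deriving a "title" from the URL itself.
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for requests.get(url) + parsing the page with pyquery.
    return {"url": url, "title": url.rsplit("/", 1)[-1].title()}

def scrape_all(urls):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order even though fetches run concurrently.
        return list(pool.map(fetch, urls))

items = scrape_all(["https://example.com/first", "https://example.com/second"])
print(items)
```

This covers the concurrency part, but link following, scheduling, retries, throttling, and export formats are exactly the things such a script ends up reimplementing, which is the overview's pitch for Scrapy.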

eliasdorneles (Member Author)

Don't worry @kmike, I appreciate your feedback a good deal, always good points! :)

eliasdorneles (Member Author)

Hey @kmike -- I've just updated it to address your concerns and did some more editing.
Can you please have a look again?
Thank you!

provide any API or mechanism to access that info programmatically. Scrapy can
help you extract that information.
Once you're ready to dive in more, you can :ref:`follow the tutorial
and build a full-blown Scrapy project <intro-tutorial>`.
Member

I think this note can be moved to the end - it is unclear if users should continue reading the overview, or if they should go to the tutorial. It seems the reason you've put it here is that in addition to 'scrapy runspider' there is 'scrapy crawl' with full-blown project support, and you wanted to mention it. We can add project support to the list of Scrapy advantages - Scrapy helps to organize the code, so that projects with tens and hundreds of spiders are still manageable.

kmike (Member) commented Mar 26, 2015

//cc @pablohoffman @shaneaevans @dangra and everyone else - thoughts? Use https://github.com/eliasdorneles/scrapy/blob/overview-page-improvements/docs/intro/overview.rst link to read it.

I think this introduction is nearly perfect :)
+1 to merge it once we have the required PRs merged.

@kmike kmike changed the title some improvements to overview page [MRG+1] some improvements to overview page Mar 26, 2015
nyov (Contributor) commented Mar 27, 2015

Everyone Else here. Looks good, I like it.

Well, meh, I actually clicked on that AAWS link, thought I would get some info on how to extract API data. Amazon is big enough, maybe this could point to somewhere else, like something on http://www.programmableweb.com/ ?

I would ask for a single, tiny ``response.xpath`` query in the spider example, just to let old-timers know they aren't deprecated yet :)
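The kind of tiny query nyov asks for would read something like ``response.xpath('//a/@href')`` in a Scrapy spider. The same path idea can be shown with only the standard library, since ElementTree supports a small XPath subset (the HTML snippet here is made up for illustration):

```python
# A tiny XPath-style query using the stdlib ElementTree, whose findall()
# accepts a limited XPath subset ('.//a' standing in for '//a').
import xml.etree.ElementTree as ET

html = "<div><a href='/questions/1'>Q1</a><a href='/questions/2'>Q2</a></div>"
root = ET.fromstring(html)

# Collect the href attribute of every <a> element in the document.
links = [a.get("href") for a in root.findall(".//a")]
print(links)  # -> ['/questions/1', '/questions/2']
```

Scrapy's own selectors run full XPath 1.0 over real (possibly broken) HTML, so this is only an approximation of what ``response.xpath`` does.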

Some minor things like misplaced or missing commas, but that can be ignored.

curita added a commit that referenced this pull request Mar 27, 2015
[MRG+1] some improvements to overview page
@curita curita merged commit f4e241a into scrapy:master Mar 27, 2015
eliasdorneles (Member Author)

Hey @nyov -- since this is already merged, please feel free to send a PR to fix those commas or whatever. =)

4 participants