-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] docs: update overview spider code to use toscrape.com and minor changes #2249
Conversation
So, this will replace the spider example code from the overview that scrapes questions from StackOverflow by a spider scraping quotes (much like the one in the tutorial), and upates the text around it to be consistent. There are also minor wording changes plus a small Sphinx/reST syntax fix on the features list at the bottom (it was creating a definition list, causing one line to be bold).
of them as they finish. | ||
an argument. In the ``parse`` callback we loop through the quote elements | ||
using a CSS Selector, yield a Python dict with the extracted quote text and author, | ||
look for a link to the next page and schedules another request using the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and schedules another request using the same
: schedules
should be in 1st person of plural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
Current coverage is 83.36% (diff: 100%)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
'link': response.url, | ||
} | ||
next_page = response.css('li.next a::attr("href")').extract_first() | ||
if next_page: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think next_page is None
could be slightly better because an empty string is a valid relative URL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
hey @kmike, I'll merge this to move things forward, feel free to point out if you have any other concern. |
So, this will replace the spider example code from the overview that
scrapes questions from StackOverflow by a spider scraping quotes (much
like the one in the tutorial), and upates the text around it to be
consistent.
There are also minor wording changes plus a small Sphinx/reST syntax fix
on the features list at the bottom (it was creating a definition list,
causing one line to be bold).
Does this look good?
Thanks!