[WIP] scrapy-streaming examples #4

Merged
merged 11 commits into from Aug 19, 2016


5 participants

@aron-bordin
Member
aron-bordin commented Jun 1, 2016 edited

PR Overview

I'll be updating this branch with scrapy-streaming examples, while implementing them in the project development.

Examples

  1. check_response_status - This spider opens a list of domains and checks which domains return a valid status.
  2. extract_dmoz_links - This example is covered in the quickstart section. It gets a list of websites with Python-related articles.
  3. request_image - This demo shows how to download binary data using scrapy-streaming.
  4. request_utf8 - How to crawl webpages with UTF-8 data.
| Example | Python | R | Java | More languages... |
|---------|--------|---|------|-------------------|
| 1       | x      |   |      |                   |
| 2       | x      |   |      |                   |
| 3       | x      |   |      |                   |
| 4       | x      | x |      |                   |
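For readers unfamiliar with how these examples are structured: scrapy-streaming external spiders talk to Scrapy by exchanging newline-delimited JSON messages over stdin/stdout. Below is a minimal sketch in the spirit of the `check_response_status` example; the message `type` values and field names (`spider`, `start_urls`, `response`, `status`, `url`) are assumptions based on the draft Communication Protocol (#5), not the final API.

```python
import json
import sys

# Hypothetical domain list for illustration.
DOMAINS = ["http://example.com", "http://example.org"]


def spider_message(name, start_urls):
    """Build the initial JSON message that registers the external spider.

    Field names are assumptions from the draft protocol (#5).
    """
    return {"type": "spider", "name": name, "start_urls": start_urls}


def check_status(response_msg):
    """Return True if the crawled page answered with a 2xx status code."""
    return 200 <= response_msg.get("status", 0) < 300


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Announce the spider, then read newline-delimited JSON messages back.
    stdout.write(json.dumps(spider_message("check_response_status", DOMAINS)) + "\n")
    stdout.flush()
    for line in stdin:
        msg = json.loads(line)
        if msg.get("type") == "response":
            verdict = "OK" if check_status(msg) else "FAILED"
            print(msg.get("url"), verdict, file=sys.stderr)


if __name__ == "__main__":
    main()
```

The sketch keeps the protocol-handling functions pure (no I/O) so the status-checking logic can be exercised without a running Scrapy process.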
@codecov-io
codecov-io commented Jun 1, 2016 edited

Current coverage is 88.21% (diff: 100%)

Merging #4 into master will not change coverage

@@             master         #4   diff @@
==========================================
  Files            11         11          
  Lines           246        246          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits            217        217          
  Misses           29         29          
  Partials          0          0          

Powered by Codecov. Last update e3f8e33...a1edab3

@aron-bordin aron-bordin referenced this pull request Jun 1, 2016
Open

Communication Protocol #5

@aron-bordin
Member
aron-bordin commented Jun 9, 2016 edited

This folder will contain as many practical scrapy-streaming examples as possible, along with some usage tips. Is it OK to have spiders that use popular websites, such as GitHub, here?

(I removed the extract_github_data example from the quickstart section in this PR, so I'd like to confirm whether I can keep it here.)
cc: @eLRuLL , @redapple

-- edited:
Currently, examples [3] and [4] use a GitHub spider.

@eLRuLL
Member
eLRuLL commented Jun 16, 2016

According to GitHub's robots.txt file, everything is disallowed for normal user agents. Maybe we could download those responses once and build some examples from the saved files? What do you think, @redapple?

@redapple
Contributor

@eLRuLL , I would not include GitHub, especially since they have a very comprehensive API.
@stummjr is working on a Scrapy (or Scrapinghub) maintained live website to build scraping examples against. Not sure whether it is live yet, though.

@stummjr
Member
stummjr commented Jun 16, 2016

@redapple it's live on http://dev.scrapinghub.com/~valdir but it's gonna be moved to a proper place soon.

@aron-bordin aron-bordin referenced this pull request Jun 27, 2016
Open

R package helper #8

@eLRuLL
Member
eLRuLL commented Jul 11, 2016

@aron-bordin please remove the Github examples.

This was referenced Jul 20, 2016
@eLRuLL eLRuLL merged commit dd41de4 into scrapy-plugins:master Aug 19, 2016

3 checks passed

codecov/patch Coverage not affected when comparing e3f8e33...a1edab3
Details
codecov/project 88.21% (+0.00%) compared to e3f8e33
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details