[WIP] scrapy-streaming examples #4

eLRuLL merged 11 commits into scrapy-plugins:master on Aug 19, 2016



5 participants

aron-bordin commented Jun 1, 2016

PR Overview

I'll keep updating this branch with scrapy-streaming examples as they are implemented during the project's development.


  1. check_response_status - This spider opens a list of domains and checks which of them return a valid status.
  2. extract_dmoz_links - This example is covered in the quickstart section. It collects a list of websites with Python-related articles.
  3. request_image - This demo shows how to download binary data using scrapy-streaming.
  4. request_utf8 - This demo shows how to crawl web pages containing UTF-8 data.
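
| Example | Python | R | Java | More languages ... |
|---------|--------|---|------|--------------------|
| 1       | x      |   |      |                    |
| 2       | x      |   |      |                    |
| 3       | x      |   |      |                    |
| 4       | x      | x |      |                    |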
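
To make the first example more concrete, here is a minimal sketch of the status-checking logic in plain Python. It assumes that scrapy-streaming exchanges line-delimited JSON messages with the external spider; the message fields used (`type`, `id`, `url`, `status`) are an illustrative guess at that protocol rather than its documented format, and the helper names and domain list are hypothetical. Engine output is simulated so the snippet runs without Scrapy.

```python
import json

# ASSUMPTION: scrapy-streaming exchanges line-delimited JSON messages with an
# external spider over stdin/stdout. The message types and fields used here
# ("request", "response", "id", "url", "status") are an illustrative
# approximation of that protocol, not a verified copy of its specification.


def encode(message):
    """Serialize one protocol message as a single JSON line."""
    return json.dumps(message) + "\n"


def request_messages(domains):
    """Build one request message per domain, reusing the URL as the request id."""
    return [{"type": "request", "id": url, "url": url} for url in domains]


def is_valid_status(status):
    """Treat any 2xx HTTP status code as valid."""
    return 200 <= status < 300


def collect_statuses(lines):
    """Map each request id to True/False according to its response status."""
    report = {}
    for line in lines:
        message = json.loads(line)
        if message.get("type") == "response":
            report[message["id"]] = is_valid_status(message["status"])
    return report


# Usage with simulated engine output instead of a live Scrapy process.
domains = ["http://example.com", "http://example.org"]  # hypothetical input list
outgoing = [encode(m) for m in request_messages(domains)]
incoming = [
    encode({"type": "response", "id": "http://example.com", "status": 200}),
    encode({"type": "response", "id": "http://example.org", "status": 404}),
]
report = collect_statuses(incoming)
print(report)  # {'http://example.com': True, 'http://example.org': False}
```

Counting only 2xx codes as valid is a design choice. Note also that with Scrapy's default settings, non-2xx responses may be filtered by `HttpErrorMiddleware` before they reach a spider, so a real check would likely also need to handle error messages from the engine.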
codecov-io commented Jun 1, 2016

Current coverage is 88.21% (diff: 100%)

Merging #4 into master will not change coverage

@@             master         #4   diff @@
  Files            11         11          
  Lines           246        246          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
  Hits            217        217          
  Misses           29         29          
  Partials          0          0          

Powered by Codecov. Last update e3f8e33...a1edab3
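
For reference, the 88.21% figure follows from the counts in the table above, assuming Codecov's usual ratio of hits to all tracked lines (hits plus misses plus partials):

```math
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} = \frac{217}{217 + 29 + 0} = \frac{217}{246} \approx 0.8821 = 88.21\%
```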

aron-bordin referenced this pull request on Jun 1, 2016: Communication Protocol #5

aron-bordin commented Jun 9, 2016

This folder will contain as many practical examples of scrapy-streaming as possible, plus some usage tips. Is it OK to have spiders here that target popular websites, such as GitHub?

(In this PR I removed the extract_github_data example from the quickstart section, so I'd like to confirm whether I can keep it here.)
cc: @eLRuLL, @redapple

Edit: currently, examples 3 and 4 use a GitHub spider.

eLRuLL commented Jun 16, 2016

According to GitHub's robots.txt, everything is disallowed for generic user agents. Maybe we could download those responses once and build the examples from the saved files? What do you think, @redapple?
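
The robots.txt point can be illustrated with Python's standard `urllib.robotparser`. The rules below are hypothetical and are not a copy of GitHub's actual file; they only show how a blanket `Disallow: /` under `User-agent: *` refuses every path to a crawler with no more specific entry, while a crawler named in its own group is judged by that group instead. The user-agent string `my-scrapy-bot` is also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL rules for illustration only; they are not a copy of
# GitHub's actual robots.txt.
SAMPLE_RULES = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network fetch is needed here.
parser.parse(SAMPLE_RULES.splitlines())

# A generic user agent falls through to the "*" group and is refused everywhere.
generic_allowed = parser.can_fetch("my-scrapy-bot", "https://github.com/scrapy/scrapy")
# A crawler named in its own group follows that group's rules instead.
named_allowed = parser.can_fetch("Googlebot", "https://github.com/scrapy/scrapy")
print(generic_allowed, named_allowed)  # False True
```

Scrapy can apply this kind of check automatically through its `ROBOTSTXT_OBEY` setting.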


@eLRuLL, I would not include GitHub, especially since it has a very comprehensive API.
@stummjr is working on a live website, maintained by Scrapy (or Scrapinghub), to build scraping examples against. I'm not sure whether it is live yet, though.

stummjr commented Jun 16, 2016

@redapple, it's live at http://dev.scrapinghub.com/~valdir, but it's going to be moved to a proper place soon.

aron-bordin referenced this pull request on Jun 27, 2016: R package helper #8

eLRuLL commented Jul 11, 2016

@aron-bordin please remove the Github examples.

This was referenced Jul 20, 2016
eLRuLL merged commit dd41de4 into scrapy-plugins:master on Aug 19, 2016

3 checks passed

codecov/patch Coverage not affected when comparing e3f8e33...a1edab3
codecov/project 88.21% (+0.00%) compared to e3f8e33
continuous-integration/travis-ci/pr The Travis CI build passed