Streaming commands / Communication #3

Merged 5 commits into master on Jun 1, 2016




PR Overview


This PR adds the crawl and streaming commands.

crawl works as in Scrapy: inside a project, it first tries to load a Scrapy spider and, if none is found, falls back to loading an external spider.
streaming is used to run external spiders without creating a project, using streaming path_of_command or streaming path -a arg1,arg2 -a extra_arg


There is initial work in the streaming core. Right now it's possible to connect to an external process, and I've done some work on the communication layer.

This communication layer is not yet tested, because it is still under development (next week's work) and may be modified.


I've implemented this class to buffer the incoming data and process it only after receiving an entire line (message).
I've added MessageError raises in this class and in the CommunicationMap. This exception is used to catch problems in the JSON format provided by the external spider.
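The buffering idea can be sketched roughly like this. MessageError is the exception mentioned above; the LineBuffer class, its name, and its API are illustrative assumptions, not the PR's actual implementation:

```python
import json


class MessageError(Exception):
    """Raised when the external spider sends malformed JSON."""


class LineBuffer(object):
    def __init__(self, delimiter='\n'):
        self.delimiter = delimiter
        self._buffer = ''

    def feed(self, data):
        """Accumulate a raw chunk; return any complete messages it yields."""
        self._buffer += data
        # Everything before the last delimiter is complete; the tail stays
        # buffered until more data arrives.
        *lines, self._buffer = self._buffer.split(self.delimiter)
        messages = []
        for line in lines:
            try:
                messages.append(json.loads(line))
            except ValueError:
                raise MessageError('invalid message: %r' % line)
        return messages
```

A caller would feed it whatever the transport delivers, and only complete lines come back decoded, regardless of how the chunks were split.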

codecov-io commented May 29, 2016 edited

Current coverage is 88.31%

Merging #3 into master will decrease coverage by 0.57%

@@             master         #3   diff @@
  Files             5         11     +6   
  Lines            72        248   +176   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
+ Hits             64        219   +155   
- Misses            8         29    +21   
  Partials          0          0          

Powered by Codecov. Last updated by 303d882...cb17209

@aron-bordin aron-bordin changed the title from [WIP] Streaming commands / Communication to Streaming commands / Communication May 31, 2016
aron-bordin added some commits May 27, 2016
@aron-bordin aron-bordin intial streaming 4792ca2
@aron-bordin aron-bordin initial communication procotol 65c5903
@aron-bordin aron-bordin commands tests 360aae7
@aron-bordin aron-bordin separating communication logic from protocol 4327520
@aron-bordin aron-bordin streaming args and readme
aron-bordin commented May 31, 2016 edited

This PR is ready to be tested.

This adds the streaming and crawl commands to scrapy.


You can use the simple spider defined here:

streaming command

Put it somewhere and mark it as executable.

Then, you can execute it using scrapy streaming path_of_command (optionally passing arguments with -a, as described above)

crawl command

Inside a Scrapy project, add the external.json file as defined here
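As a rough illustration only (the real schema is defined in the linked docs, and these field names are guesses), an entry in external.json might map a spider name to the command that launches it:

```json
{
    "name": "external_spider",
    "command": "python",
    "args": ["path/to/my_spider.py"]
}
```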

Then, you can use scrapy list to get a list of spiders, including external ones.

Then, you can run it using scrapy crawl spider_name. Output:

2016-05-30 22:17:20 [scrapy] INFO: Scrapy 1.2.0dev2 started (bot: scrapybot)
2016-05-30 22:17:20 [scrapy] INFO: Overridden settings: {}
2016-05-30 22:17:20 [root] INFO: working


These commands are documented here:

cc: @eLRuLL

@aron-bordin aron-bordin referenced this pull request Jun 1, 2016

Communication Protocol #5

@eLRuLL eLRuLL merged commit bea89a7 into scrapy-plugins:master Jun 1, 2016

1 of 3 checks passed

codecov/patch 84.41% of diff hit (target 88.89%)
codecov/project 88.21% (-0.66%) compared to 303d882
continuous-integration/travis-ci/pr The Travis CI build passed