Install scrapy-sink using pip:
pip install scrapy-sink
- Add the SINK_ADDR setting to the settings.py of your Scrapy project like this:
SINK_ADDR = 'http://api.mysite.com/v1/scraped-results'
Alternatively, you can set it only as an environment variable.
- Enable the pipeline by adding it to ITEM_PIPELINES in your settings.py file:
ITEM_PIPELINES = {
'scrapy_sink.pipelines.SinkPipeline': 9999,
}
The pipeline should run after your preprocessing pipelines and after any persistence pipeline (such as one that saves to a database), so give it a higher order value than those.
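To illustrate the ordering, here is a minimal sketch of a settings.py fragment. The two myproject pipelines are hypothetical placeholders for your own preprocessing and persistence steps; only the scrapy_sink entry comes from this README. Scrapy runs item pipelines in ascending order of their values, so the largest value runs last.

```python
# settings.py (sketch). The "myproject" entries are hypothetical examples.
ITEM_PIPELINES = {
    'myproject.pipelines.CleanItemPipeline': 100,  # hypothetical preprocessing step
    'myproject.pipelines.DatabasePipeline': 300,   # hypothetical persistence step
    'scrapy_sink.pipelines.SinkPipeline': 9999,    # highest value, so it runs last
}

# Illustration only: list the pipelines in the order Scrapy would run them.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(run_order)
```

The last two lines are not needed in a real project; they only show that the sink pipeline ends up at the end of the chain.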
- Set the environment variable SINK_ADDR to the target URL that receives the scraped results:
export SINK_ADDR=http://api.mysite.com/v1/scraped-results
- Import the Sink class into your script:
from scrapy_sink.simple import Sink
- Initialize a Sink instance:
sink = Sink('your_site_name')
- Post the result through the Sink instance:
# TODO: implement the payload extraction step
payload = parse_scraped_page_as_dict(response.text)
sink.feed(payload)
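The parse_scraped_page_as_dict helper is left for you to implement. As one possible starting point, here is a minimal sketch that uses only the Python standard library. The payload fields (title and length) are assumptions chosen for illustration, not a format required by scrapy-sink.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def parse_scraped_page_as_dict(html):
    """Hypothetical helper: turn raw HTML into a flat dict payload."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return {
        'title': ''.join(parser.title_parts).strip(),  # assumed field
        'length': len(html),                           # assumed field
    }


payload = parse_scraped_page_as_dict(
    '<html><head><title> Demo page </title></head></html>'
)
print(payload)
```

The resulting dict can then be passed to sink.feed(payload) as in the last step above.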
No need to change your code.
Please use GitHub issues.
PRs are always welcome.