scrapy-datadog-extension is a Scrapy extension that sends metrics (Scrapy stats) from your spider executions to Datadog.
There is no public pre-packaged version yet. If you want to use it, you will have to clone the project and make it installable from your `requirements.txt` (e.g. by pointing at your clone with a git URL).
First, you will need to add the extension to the `EXTENSIONS` dict in your `settings.py` file. For example:
```python
EXTENSIONS = {
    'scrapy-datadog-extension': 1,
}
```
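The integer value is the usual Scrapy component order; for extensions it is mostly irrelevant, since extensions generally do not depend on each other, but the entry must be present (and not `None`) for the extension to be enabled.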
Then you need to provide the following variables, directly from the Scrapinghub settings of your jobs:
- `DATADOG_API_KEY`: your Datadog API key.
- `DATADOG_APP_KEY`: your Datadog APP key.
- `DATADOG_CUSTOM_TAGS`: list of tags to bind on metrics.
- `DATADOG_CUSTOM_METRICS`: sub-list of metrics to send to Datadog.
- `DATADOG_METRICS_PREFIX`: the prefix to apply to all of your metrics, e.g. `kp.`
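For reference, here is a sketch of what these settings could look like if set directly in `settings.py` (all values below are placeholders, not real keys, tags, or metric names):

```python
# Placeholder values -- in practice, provide these from the
# Scrapinghub settings of your jobs, and keep the keys secret.
DATADOG_API_KEY = '0123456789abcdef0123456789abcdef'
DATADOG_APP_KEY = 'fedcba9876543210fedcba9876543210'
DATADOG_CUSTOM_TAGS = ['env:production']
DATADOG_CUSTOM_METRICS = ['item_scraped_count', 'response_received_count']
DATADOG_METRICS_PREFIX = 'kp.'
```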
Basically, on the `spider_closed` signal, this extension collects the Scrapy stats associated with a given project/spider/job and extracts the variables listed in a `stats_to_collect` list. Custom variables are also added (a sketch of the collection step follows the list below):
- `elapsed_time`: a simple computation of `finish_time - start_time`.
- `done`: a simple counter, acting like a ping to indicate that a job ran regularly.
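To make this concrete, here is a minimal sketch of the collection step, assuming a hypothetical extension class wired to `spider_closed`. This is an illustrative approximation, not the actual implementation: the class name and the hard-coded `stats_to_collect` entries are assumptions.

```python
from scrapy import signals


class StatsCollector:
    """Illustrative sketch of the collection step, not the real code."""

    # Assumed defaults; the real extension keeps its own list.
    stats_to_collect = ['item_scraped_count', 'elapsed_time', 'done']

    def __init__(self, crawler):
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls(crawler)
        crawler.signals.connect(ext.spider_closed,
                                signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider):
        stats = self.crawler.stats.get_stats()

        # Custom variables added on top of the raw Scrapy stats.
        # Note: start_time/finish_time are datetimes set by Scrapy's
        # CoreStats extension.
        stats['elapsed_time'] = (
            stats['finish_time'] - stats['start_time']
        ).total_seconds()
        stats['done'] = 1  # ping-like counter

        # Keep only the stats we want to ship to Datadog.
        return {key: stats[key]
                for key in self.stats_to_collect
                if key in stats}
```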
At the end, we have a list of metrics, with associated tags (to enable better filtering from Datadog):
- `project`: the Scrapinghub project ID.
- `spider_name`: the Scrapinghub spider name, as defined in the spider class.
Then, everything is sent to Datadog using the Datadog API.
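For illustration, sending one such metric through the datadogpy client looks roughly like this (the keys, values, and tags below are placeholders):

```python
from datadog import api, initialize

initialize(api_key='<DATADOG_API_KEY>', app_key='<DATADOG_APP_KEY>')

# One tagged metric point; the extension sends one of these per
# collected stat, with the configured prefix applied to its name.
api.Metric.send(
    metric='kp.item_scraped_count',
    points=1312,
    tags=['project:123456', 'spider_name:my_spider'],
)
```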
- Sometimes, when the `spider_closed` handler is executed right after the job completion, some Scrapy stats are missing, so we send an incomplete list of metrics, which prevents us from relying 100% on this extension.
- Include the name of the project/spider/job instead of simply sending its ID.
- Make the `stats_to_collect` list configurable from the ScrapingHub spider settings console.
- Find a way to ensure that all the Scrapy stats are collected before sending them.