Add failed and success count stats to feedstorage backends #4850

Merged

Conversation

joaquingx (Contributor) commented Oct 17, 2020

Resolves #3947
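
The change counts, per storage backend, how many feed deliveries succeed or fail, roughly by attaching a callback and an errback to the deferred returned by the storage's store() call and incrementing a stat named after the backend class. A minimal runnable sketch of that idea (the class and function names below are illustrative, not the exact code in this diff):

from twisted.internet import defer


class FileFeedStorage:
    # Stand-in with the same name as Scrapy's backend; only the class
    # name matters here, since it becomes the stat key suffix.
    pass


class StatsStub:
    # Minimal stand-in for crawler.stats, just to keep the sketch self-contained.
    def __init__(self):
        self.stats = {}

    def inc_value(self, key):
        self.stats[key] = self.stats.get(key, 0) + 1


def track_store_result(d, storage, stats):
    # Success or failure of storage.store() increments
    # feedexport/success_count/<ClassName> or feedexport/failed_count/<ClassName>.
    name = type(storage).__name__
    d.addCallbacks(
        lambda _: stats.inc_value(f"feedexport/success_count/{name}"),
        lambda _: stats.inc_value(f"feedexport/failed_count/{name}"),
    )
    return d


stats = StatsStub()
track_store_result(defer.succeed(None), FileFeedStorage(), stats)
track_store_result(defer.fail(KeyError("foo")), FileFeedStorage(), stats)
print(stats.stats)
# {'feedexport/success_count/FileFeedStorage': 1, 'feedexport/failed_count/FileFeedStorage': 1}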

Example:

import scrapy
from scrapy.crawler import CrawlerProcess


class QuotesToScrapeSpider(scrapy.Spider):
    name = "quotes"

    custom_settings = {
        "DOWNLOAD_DELAY": 1,
        "COOKIES_ENABLED": False,  # disable cookies; the setting is COOKIES_ENABLED, not COOKIES_DISABLED
        "CONCURRENT_REQUESTS": 5,
        "FEEDS": {
            "file:///tmp/tmp-%(batch_time)s.json": {
                "format": "json",
            },
            "s3://mybucket/path/to/export-%(batch_time)s.csv": {
                "format": "csv",
            },
        },
        "FEED_EXPORT_BATCH_ITEM_COUNT": 5,
    }

    def start_requests(self):
        yield scrapy.Request(url='http://quotes.toscrape.com/', callback=self.parse)

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "quote": quote.css("span.text::text").extract(),
                "author": quote.css("small.author::text").extract(),
                "tags": quote.css("a.tag::text").extract(),
            }
            break  # yield only the first quote per page to keep the example output small
        next_page = response.css("li.next a::attr(href)").extract_first()  # avoid shadowing the next() builtin
        if next_page:
            yield scrapy.Request(url=response.urljoin(next_page), callback=self.parse)


process = CrawlerProcess()
process.crawl(QuotesToScrapeSpider)
process.start()

If the S3 storage fails to store the feed, the stats will look like this:

{'downloader/request_bytes': 2692,
 'downloader/request_count': 10,
 'downloader/request_method_count/GET': 10,
 'downloader/response_bytes': 23026,
 'downloader/response_count': 10,
 'downloader/response_status_count/200': 10,
 'elapsed_time_seconds': 11.61577,
 'feedexport/failed_count/S3FeedStorage': 2,
 'feedexport/success_count/FileFeedStorage': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 10, 17, 20, 41, 6, 16674),
 'item_scraped_count': 10,
 'log_count/DEBUG': 218,
 'log_count/ERROR': 2,
 'log_count/INFO': 16,
 'memusage/max': 70389760,
 'memusage/startup': 70389760,
 'request_depth_max': 9,
 'response_received_count': 10,
 'scheduler/dequeued': 10,
 'scheduler/dequeued/memory': 10,
 'scheduler/enqueued': 10,
 'scheduler/enqueued/memory': 10,
 'start_time': datetime.datetime(2020, 10, 17, 20, 40, 54, 400904)}
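
Since these counters live in the regular crawler stats alongside everything else, a wrapper script can inspect them once the crawl finishes and react to delivery failures. A sketch, reusing the spider above (this failure check is a hypothetical consumer, not part of this PR):

from scrapy.crawler import CrawlerProcess

process = CrawlerProcess()
crawler = process.create_crawler(QuotesToScrapeSpider)
process.crawl(crawler)
process.start()  # blocks until the crawl finishes

failed = {
    key: value
    for key, value in crawler.stats.get_stats().items()
    if key.startswith("feedexport/failed_count/")
}
if failed:
    print(f"Some feed exports failed: {failed}")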

Ready to review 😄

codecov bot commented Oct 17, 2020

Codecov Report

Merging #4850 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #4850   +/-   ##
=======================================
  Coverage   87.86%   87.87%           
=======================================
  Files         160      160           
  Lines        9749     9755    +6     
  Branches     1439     1437    -2     
=======================================
+ Hits         8566     8572    +6     
  Misses        926      926           
  Partials      257      257           
Impacted Files Coverage Δ
scrapy/extensions/feedexport.py 95.32% <100.00%> (+0.10%) ⬆️

joaquingx changed the title from "[WIP] Add failed and success count to slot store errback" to "[WIP] Add failed and success count stats to feedstorage backends" on Oct 19, 2020
joaquingx force-pushed the set-stats-for-feed-exporter-extension branch from 14b207b to 44cc533 on Oct 21, 2020
joaquingx changed the title from "[WIP] Add failed and success count stats to feedstorage backends" to "Add failed and success count stats to feedstorage backends" on Oct 22, 2020
joaquingx (Contributor, Author) commented:

CI failed and I'm not sure why -> https://travis-ci.org/github/scrapy/scrapy/jobs/738154019#L182. Can you help me here, please, @Gallaecio?

eLRuLL (Member) commented Oct 23, 2020:

> CI failed and I'm not sure why -> https://travis-ci.org/github/scrapy/scrapy/jobs/738154019#L182. Can you help me here, please, @Gallaecio?

@joaquingx it looks like a temporary problem. I think @Gallaecio should be able to restart the job, but you can also trigger a new run yourself by pushing another commit.

joaquingx force-pushed the set-stats-for-feed-exporter-extension branch from 71713fd to c5f06de on Oct 23, 2020
Gallaecio (Member) left a review comment:

Nice!

elacuesta (Member) commented:

This is great, thanks. I was wondering if you would consider changing the approach regarding this small bit:

diff --git tests/test_feedexport.py tests/test_feedexport.py
index 1a77cec7..2ce8d7ff 100644
--- tests/test_feedexport.py
+++ tests/test_feedexport.py
@@ -8,6 +8,7 @@ import tempfile
 import warnings
 from abc import ABC, abstractmethod
 from collections import defaultdict
+from contextlib import ExitStack
 from io import BytesIO
 from logging import getLogger
 from pathlib import Path
@@ -782,8 +783,11 @@ class FeedExportTest(FeedExportTestBase):
             },
         }
         crawler = get_crawler(ItemSpider, settings)
-        with MockServer() as mockserver, \
-                mock.patch("scrapy.extensions.feedexport.FileFeedStorage.store", side_effect=KeyError("foo")):
+        with ExitStack() as stack:
+            mockserver = stack.enter_context(MockServer())
+            stack.enter_context(
+                mock.patch("scrapy.extensions.feedexport.FileFeedStorage.store", side_effect=KeyError("foo"))
+            )
             yield crawler.crawl(mockserver=mockserver)
         self.assertIn("feedexport/failed_count/FileFeedStorage", crawler.stats.get_stats())
         self.assertEqual(crawler.stats.get_value("feedexport/failed_count/FileFeedStorage"), 1)

I took it from this SO answer. Please excuse my nitpicking; this is not a blocker in any sense, I just really don't like backslash line breaks 😅
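
For reference, a standalone sketch of the ExitStack pattern used in that diff: each enter_context() call pushes a context manager onto the stack, and all of them are exited in reverse order when the with block ends, with no backslash continuations needed.

from contextlib import ExitStack
from unittest import mock

with ExitStack() as stack:
    # Equivalent to nesting these two context managers in a single
    # with statement, but stays readable for any number of them.
    stack.enter_context(mock.patch("builtins.print"))
    f = stack.enter_context(open(__file__))
    first_line = f.readline()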

joaquingx (Contributor, Author) commented:

@elacuesta Hey, thanks, that does improve the code. Changes are done!

Gallaecio merged commit 85604e1 into scrapy:master on Nov 11, 2020
Merging this pull request closed issue #3947: set/inc stat values when exporter extension exports the data successfully.