Add SPIDERMON_MONITOR_SKIPPING_RULES docs to settings (#447)
* Move SPIDERMON_MONITOR_SKIPPING_RULES docs to settings page and improve it

* Update docs/source/settings.rst

Co-authored-by: Adrián Chaves <adrian@chaves.io>

---------

Co-authored-by: Adrián Chaves <adrian@chaves.io>
VMRuiz and Gallaecio committed May 7, 2024
1 parent 562c024 commit 3ae805b
Showing 2 changed files with 52 additions and 41 deletions.
52 changes: 52 additions & 0 deletions docs/source/settings.rst
@@ -355,3 +355,55 @@ Considering the spider returns the following items:
'spidermon_item_scraped_count/dict/field4/field4.1/field4.1.2': 1
'spidermon_item_scraped_count/dict/field4/field4.1/field4.1.3': 1
SPIDERMON_MONITOR_SKIPPING_RULES
--------------------------------
Default: ``None``

A dictionary whose keys are the names of the monitors to be skipped and whose values are lists of skip rules. Each rule is either a function or a list defining a skip condition.

A skip rule based on a stat value is a list that follows the pattern:

``["stat_name", "comparison_operator", "threshold_value"]``

Here, ``stat_name`` is the name of the Scrapy stat being evaluated, ``comparison_operator`` is the comparison to perform (e.g. ``"=="``, ``"<"``, ``">="``), and ``threshold_value`` is the value the stat is compared against.
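
To make the three-element pattern concrete, here is a minimal sketch of how such a rule could be checked against a dictionary of Scrapy stats. It is not spidermon's implementation: the helper ``stat_rule_is_met`` and the ``COMPARISONS`` table are hypothetical (only ``">"`` mapping to ``operator.gt`` is visible in the ``BaseScrapyMonitor.ops`` excerpt), and treating a stat that was never recorded as "rule not met" is an assumption.

```python
import operator

# Hypothetical operator table. Only ">" -> operator.gt is confirmed by the
# BaseScrapyMonitor.ops excerpt; the remaining entries are assumptions.
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def stat_rule_is_met(stats, rule):
    """Return True if a ["stat_name", "comparison_operator", "threshold_value"]
    rule holds for the given stats dictionary (hypothetical helper)."""
    stat_name, comparison, threshold = rule
    if stat_name not in stats:
        # Assumption: a stat that was never recorded does not satisfy the rule.
        return False
    return COMPARISONS[comparison](stats[stat_name], threshold)


stats = {"item_scraped_count": 0, "log_count/ERROR": 3}
print(stat_rule_is_met(stats, ["item_scraped_count", "==", 0]))  # True
print(stat_rule_is_met(stats, ["log_count/ERROR", ">=", 5]))  # False
```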

Custom skip rules can also be defined as Python functions. Each function receives a single argument, the monitor being evaluated (conventionally named ``monitor``), and returns ``True`` if the monitor should be skipped or ``False`` otherwise.
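
A minimal sketch of such a function, assuming the monitor exposes the crawl stats as ``monitor.data.stats`` (the attribute used by the example in the former ``BaseScrapyMonitor`` docstring). The name ``skip_if_no_items`` is hypothetical, and the ``SimpleNamespace`` objects only stand in for real monitor instances so the snippet runs without Scrapy or spidermon.

```python
from types import SimpleNamespace


def skip_if_no_items(monitor):
    """Return True (skip) when no item was scraped.

    Assumes the stats are reachable as monitor.data.stats, as in the
    example from the former BaseScrapyMonitor docstring.
    """
    return "item_scraped_count" not in monitor.data.stats


# Stand-ins that only mimic the attribute path used above; in practice
# spidermon passes the real monitor instance.
empty_run = SimpleNamespace(data=SimpleNamespace(stats={}))
busy_run = SimpleNamespace(data=SimpleNamespace(stats={"item_scraped_count": 42}))

print(skip_if_no_items(empty_run))  # True
print(skip_if_no_items(busy_run))  # False
```

To use it, list the function under the monitor name, for example ``"Field Coverage Monitor": [skip_if_no_items]``.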

Below are examples illustrating how skip rules can be configured in the settings.

Example #1: Skip a monitor based on stat values

.. code-block:: python

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        custom_settings = {
            "SPIDERMON_FIELD_COVERAGE_RULES": {
                "dict/quote": 1,
                "dict/author": 1,
            },
            "SPIDERMON_MONITOR_SKIPPING_RULES": {
                "Field Coverage Monitor": [["item_scraped_count", "==", 0]],
            },
        }

Example #2: Skip a monitor based on a custom function

.. code-block:: python

    import datetime

    import scrapy


    def skip_function(monitor):
        # Skip the monitor on Fridays (weekday() returns 4 for Friday)
        return datetime.datetime.today().weekday() == 4


    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        custom_settings = {
            "SPIDERMON_FIELD_COVERAGE_RULES": {
                "dict/quote": 1,
                "dict/author": 1,
            },
            "SPIDERMON_MONITOR_SKIPPING_RULES": {
                "Field Coverage Monitor": [skip_function],
            },
        }
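
Since each value is a list, stat-based rules and functions can be listed together for the same monitor. The sketch below shows one way such a mixed list could be evaluated. It is a hypothetical helper (``should_skip``), not spidermon's code; it assumes the stats are available as ``monitor.data.stats`` and that a monitor is skipped as soon as any one of its rules is met, which this page does not state explicitly.

```python
import operator
from types import SimpleNamespace

# Assumed operator table; spidermon's own BaseScrapyMonitor.ops may differ.
COMPARISONS = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt,
    "<=": operator.le, "==": operator.eq, "!=": operator.ne,
}


def should_skip(monitor_name, skipping_rules, monitor):
    """Hypothetical helper: True if any rule registered for monitor_name is met.

    Callables receive the monitor; lists are read as
    ["stat_name", "comparison_operator", "threshold_value"].
    """
    stats = monitor.data.stats  # assumed attribute path
    for rule in skipping_rules.get(monitor_name, []):
        if callable(rule):
            if rule(monitor):
                return True
        else:
            stat_name, comparison, threshold = rule
            if stat_name in stats and COMPARISONS[comparison](stats[stat_name], threshold):
                return True
    return False


def no_items_recorded(monitor):
    return "item_scraped_count" not in monitor.data.stats


# Illustrative rules: one stat-based condition and one custom function.
SKIPPING_RULES = {
    "Field Coverage Monitor": [
        ["log_count/ERROR", ">", 10],
        no_items_recorded,
    ],
}

no_items = SimpleNamespace(data=SimpleNamespace(stats={}))
healthy = SimpleNamespace(data=SimpleNamespace(stats={"item_scraped_count": 5}))
noisy = SimpleNamespace(
    data=SimpleNamespace(stats={"item_scraped_count": 5, "log_count/ERROR": 20})
)

print(should_skip("Field Coverage Monitor", SKIPPING_RULES, no_items))  # True
print(should_skip("Field Coverage Monitor", SKIPPING_RULES, healthy))  # False
print(should_skip("Field Coverage Monitor", SKIPPING_RULES, noisy))  # True
print(should_skip("Some Other Monitor", SKIPPING_RULES, no_items))  # False
```

Monitors without an entry in the dictionary are never skipped in this sketch, because ``skipping_rules.get`` falls back to an empty list.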
41 changes: 0 additions & 41 deletions spidermon/contrib/scrapy/monitors/base.py
@@ -10,47 +10,6 @@


class BaseScrapyMonitor(Monitor, SpiderMonitorMixin):
    """
    Monitor can be skipped based on conditions given in the settings.
    The purpose is to skip a monitor based on stat value or any custom
    function. A scenario could be skipping the Field Coverage Monitor
    when a spider produced no items. Following is a code block of
    examples of how we can configure the skip rules in settings.
    Example #1: skip rules based on stat values
    .. code-block:: python
        class QuotesSpider(scrapy.Spider):
            name = "quotes"
            custom_settings = {
                "SPIDERMON_FIELD_COVERAGE_RULES": {
                    "dict/quote": 1,
                    "dict/author": 1,
                },
                "SPIDERMON_MONITOR_SKIPPING_RULES": {
                    "Field Coverage Monitor": [["item_scraped_count", "==", 0]],
                }
            }
    Example #2: skip rules based on a custom function
    .. code-block:: python
        def skip_function(monitor):
            return "item_scraped_count" not in monitor.data.stats
        class QuotesSpider(scrapy.Spider):
            name = "quotes"
            custom_settings = {
                "SPIDERMON_FIELD_COVERAGE_RULES": {
                    "dict/quote": 1,
                    "dict/author": 1,
                },
                "SPIDERMON_MONITOR_SKIPPING_RULES": {
                    "Field Coverage Monitor": [skip_function],
                }
            }
    """

    longMessage = False
    ops = {
        ">": operator.gt,