Support restricting the coverage monitor fields #416
Comments
Skipping the coverage of certain patterns may not solve the problem: when the data is not well structured, we may get unexpected keys from our targets. Regarding the stats limit in Scrapy Cloud, maybe changing what HubStorageStatsCollector uploads as stats is a more reliable alternative. It is specific to the platform, so we don't introduce new Zyte-specific code to Spidermon.
The alternative of
If
If
What do you think???
I like the depth approach the most 👍 Mauricio implemented
Fixed by #433
Background
Currently, the coverage monitor tracks and reports the coverage of all fields, including nested fields (i.e., keys inside dictionary values). It follows all nested field levels.
This sometimes isn't desired, especially for dictionaries that have non-standardized keys. The number of tracked fields can grow very large if there's no fixed schema for those dictionaries or if there are too many nested levels. This is particularly troublesome when running spiders in Scrapy Cloud, as there's a hard limit on the stats' storage size.
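To make the problem concrete, here is a minimal sketch of counting field occurrences across all nested levels. It is not Spidermon's actual implementation: the `count_nested_fields` helper, the `/` separator in the stat names, and the sample items are assumptions made for illustration only.

```python
from collections import Counter


def count_nested_fields(item, counter, prefix=""):
    """Count every key in `item`, following nested dictionaries at all levels.

    Hypothetical helper for illustration; not part of Spidermon.
    """
    for key, value in item.items():
        name = f"{prefix}{key}"
        counter[name] += 1
        if isinstance(value, dict):
            # Recurse with the parent key as a prefix ("/" is an assumed separator).
            count_nested_fields(value, counter, prefix=f"{name}/")


# Two items whose "specs" dictionaries do not share a schema.
items = [
    {"name": "a", "specs": {"color": "red", "size": {"w": 1}}},
    {"name": "b", "specs": {"weight": 2}},
]

field_counts = Counter()
for item in items:
    count_nested_fields(item, field_counts)
```

With only two items, the counter already holds more entries than there are top-level fields, because each distinct nested key becomes its own entry. With non-standardized keys, the number of entries grows with the data rather than with the schema.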
Alternatives
SPIDERMON_LIST_FIELDS_COVERAGE_LEVELS, but for dictionaries.
<key: field count> pair could be stored as a separate entry in a Scrapy Cloud collection to avoid triggering its size limitation.
All of them could be viable and coexist as they address different parts of the problem.
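As a rough illustration of the "levels, but for dictionaries" idea, the sketch below stops descending into nested dictionaries once a depth limit is reached. The `count_fields` function and its `max_depth` parameter are hypothetical names chosen for this example; they do not describe how Spidermon or any existing setting actually behaves.

```python
from collections import Counter


def count_fields(item, counter, max_depth=None, prefix="", depth=0):
    """Count keys in `item`, descending at most `max_depth` nested levels.

    `max_depth=None` means no limit; `max_depth=0` counts top-level keys only.
    Hypothetical helper for illustration; not part of Spidermon.
    """
    for key, value in item.items():
        name = f"{prefix}{key}"
        counter[name] += 1
        can_descend = max_depth is None or depth < max_depth
        if isinstance(value, dict) and can_descend:
            count_fields(value, counter, max_depth, f"{name}/", depth + 1)


item = {"name": "a", "specs": {"color": "red", "size": {"w": 1}}}

top_only = Counter()
count_fields(item, top_only, max_depth=0)

one_level = Counter()
count_fields(item, one_level, max_depth=1)
```

Here `top_only` contains just `name` and `specs`, while `one_level` also contains `specs/color` and `specs/size` but not `specs/size/w`. A limit like this bounds how deep the stats can go, though not how many distinct keys appear at an allowed level.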
Let me know what you think!