New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add parse_with_rules method and log a warning when _parse_response is… #5856
base: master
Are you sure you want to change the base?
Conversation
… called directly or when it is overriden.
Better warning message. Co-authored-by: Adrián Chaves <adrian@chaves.io>
…scrapy into fix-issue-4463
scrapy/spiders/crawl.py
Outdated
"will be removed in future Scrapy releases. Please use the " | ||
"CrawlSpider.parse_with_rules method instead." | ||
) | ||
cb_kwargs.pop("direct_call", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting strategy. I think it might be OK to go this route, as long as we use a more descriptive and less public name, e.g. _direct_parse_response_call
.
However, there may be better approaches here.
For example, you could define a new private method, move the body of _parse_response
to that method, and call that method from both _parse_method
and parse_with_rules
. In _parse_method
, however, you also log a warning. Then you do not need to mess up with cb_kwargs, which is meant to end users, not for internal handlings.
It might also be good to log these warnings only once, and not once per callback call. So, you could run the method inheritance check on __init__
, and define a private boolean attribute in __init__
as well to keep track of whether or not _parse_response
has been called, to only log a warning on the first of such calls.
Finally, I think the parse_with_rules
should be async as well. If you tried that already and had issues, mind that I think you need to await your internal call to _parse_response
(or to the new private method) for things to work as expected. That is, rather than:
async def parse_with_rules(…):
return self._parse_response(…)
I think you need:
async def parse_with_rules(…):
return await self._parse_response(…)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the approach that you suggested but I was not able to make parse_with_rules
and _parse_response
async. It raises this error
return await self._parse_with_rules(response, callback, cb_kwargs, follow)
TypeError: object async_generator can't be used in 'await' expression
if I put async/await
in the methods. I think it is because of yield on the body of the original function. It is working with only the _parse_with_rules
being async but I don't know how much the others functions not being async would affect the operation of other parts of the code.
I also created 3 tests to check if the warnings would be raising properly. Should I add more to test other stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_parse_with_rules
is an async generator so you cannot await it, you need:
async def _parse_response(self, response, callback, cb_kwargs, follow=True):
...
async for request_or_item in self._parse_with_rules(response, callback, cb_kwargs, follow):
yield request_or_item
Codecov Report
@@ Coverage Diff @@
## master #5856 +/- ##
==========================================
+ Coverage 88.85% 88.87% +0.01%
==========================================
Files 162 162
Lines 11057 11072 +15
Branches 1801 1803 +2
==========================================
+ Hits 9825 9840 +15
Misses 954 954
Partials 278 278
|
warnings.warn( | ||
"The CrawlSpider._parse_response method is deprecated: it " | ||
"will be removed in future Scrapy releases. Please use the " | ||
"CrawlSpider.parse_with_rules method instead." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about mentioning where the problem is? Users may not realize they are subclassing CrawlSpider, or where, or they may be using a Scrapy plugin, or subclassing a CrawlSpider subclass from a third-party, etc.
warnings.warn( | |
"The CrawlSpider._parse_response method is deprecated: it " | |
"will be removed in future Scrapy releases. Please use the " | |
"CrawlSpider.parse_with_rules method instead." | |
) | |
cls_path = f"{cls.__module__}.{cls.__qualname__}" | |
warnings.warn( | |
f"The CrawlSpider._parse_response method, that the " | |
f"{cls_path} class reimplements, is deprecated: it will be " | |
f"removed in future Scrapy releases. Please use the " | |
f"CrawlSpider.parse_with_rules method instead." | |
) |
cls = self.__class__ | ||
self._is_warned = False | ||
is_overriden = method_is_overridden(cls, CrawlSpider, "_parse_response") | ||
if is_overriden: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should maintain backward compatibility: if they indeed override _parse_response
, we need to call it where needed
I think something like this may work:
if is_overriden: | |
if is_overriden: | |
self.parse_with_rules = self._parse_response |
if not self._is_warned: | ||
self._is_warned = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to distinguish warning from overriding and warning from use.
If you override, you get 1 warning. If you use, you get 1 warning. If you do both, you get 2 warnings.
warnings.warn( | ||
"The CrawlSpider._parse_response method is deprecated: it " | ||
"will be removed in future Scrapy releases. Please use the " | ||
"CrawlSpider.parse_with_rules method instead." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should make the warning appear on the called line, the one that actually calls the method, rather than on the method itself.
warnings.warn( | |
"The CrawlSpider._parse_response method is deprecated: it " | |
"will be removed in future Scrapy releases. Please use the " | |
"CrawlSpider.parse_with_rules method instead." | |
) | |
warnings.warn( | |
"The CrawlSpider._parse_response method is deprecated: it " | |
"will be removed in future Scrapy releases. Please use the " | |
"CrawlSpider.parse_with_rules method instead.", | |
stacklevel=2, | |
) |
… called directly or when it is overriden.
This is a working in progress. Although the code is working when I tested it manually, it lacks automated tests. I don't know yet how to make the test work with async methods.
Fixes #4463