docs: Remove top-level reactor imports from CrawlerProcess/CrawlerRunner examples #6374
Conversation
I've just recalled we have test cases for similar things, in …
Co-authored-by: Andrey Rakhmatullin <wrar@wrar.name>
@wRAR I've taken a look, and it seems that sleeping.py is the only one that would be good to add to the docs (let me know if you find other interesting ones), e.g. scenarios where the user needs to wait a few seconds for something to get processed by the site, but doesn't want to I/O-block the other requests:

```python
from time import time
from urllib.parse import parse_qsl

from twisted.internet import reactor
from twisted.internet.defer import Deferred

from scrapy import Spider
from scrapy.crawler import CrawlerRunner
from scrapy.http import Request
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings


class MySpider1(Spider):
    name = "my_spider"
    io_block = False  # change to simulate I/O blocking

    def start_requests(self):
        for x in range(10):
            yield Request(url=f"https://httpbin.org/anything?sleep_for={x}")

    def parse(self, response):
        from twisted.internet import reactor

        d = Deferred()
        start = time()

        def yield_item():
            end = time()
            self.logger.info(f"[finished] zZzZZ {sleep_for}, took: {end - start}")
            yield response.json()

        query_params = dict(parse_qsl(response.url.split("?")[-1]))
        sleep_for = int(query_params["sleep_for"])
        if sleep_for % 2 == 0:
            self.logger.info(f"[start] zZzZ... {sleep_for}")
            if self.io_block:
                from time import sleep

                sleep(sleep_for)
                return yield_item()
            else:
                reactor.callLater(sleep_for, d.callback, yield_item())
                return d
        else:
            return response.json()


settings = get_project_settings()
configure_logging(settings)
runner = CrawlerRunner(settings)
d = runner.crawl(MySpider1)
d.addBoth(lambda _: reactor.stop())
reactor.run()
```

But it seems too verbose to me; is there a way to simplify this, so we don't need to write …?
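One possible simplification, sketched here as an assumption rather than the example this PR actually documents: Scrapy supports `async def` callbacks, and under the asyncio reactor (`TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"`) the hand-built `Deferred` plus `reactor.callLater` collapses into a plain `await asyncio.sleep(...)`. The scheduling pattern itself reduces to stdlib asyncio:

```python
import asyncio
import time


async def delayed_item(sleep_for: float) -> dict:
    # Non-blocking sleep: while this coroutine waits, the event loop
    # keeps servicing the other "responses".
    start = time.monotonic()
    await asyncio.sleep(sleep_for)
    return {"sleep_for": sleep_for, "took": round(time.monotonic() - start, 2)}


async def main() -> list:
    # Process several delayed responses concurrently; total wall time is
    # close to the longest delay, not the sum of all delays.
    return await asyncio.gather(*(delayed_item(s) for s in (0.1, 0.2, 0.3)))


if __name__ == "__main__":
    for item in asyncio.run(main()):
        print(item)
```

Inside a spider this would look like `async def parse(self, response): await asyncio.sleep(sleep_for); yield response.json()`, with no manual `Deferred` wiring in the callback.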
Codecov Report: all modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #6374      +/-   ##
==========================================
- Coverage   85.00%   84.92%   -0.08%
==========================================
  Files         161      162       +1
  Lines       11962    12112     +150
  Branches     1872     1839      -33
==========================================
+ Hits        10168    10286     +118
- Misses       1512     1538      +26
- Partials      282      288       +6
```
I see I phrased that poorly. I meant that we should add (or modify) test cases so that the things we have put into the docs are actually tested; can you please check that, if you haven't already? As for …
(I've forgotten that I wanted the test changes if they are needed; please don't merge this without them.)
Oh yeah, I will check later.
@wRAR Things that I noticed while creating the test:

To get the exception, we need to define:

```python
import sys

from scrapy import Spider
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging


class NoRequestsSpider(Spider):
    name = "no_request"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        return []


configure_logging({"LOG_FORMAT": "%(levelname)s: %(message)s", "LOG_LEVEL": "DEBUG"})

INSTALL_REACTOR = False
if INSTALL_REACTOR:
    from scrapy.utils.reactor import install_reactor

    install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")

runner = CrawlerRunner()
d = runner.crawl(NoRequestsSpider)

from twisted.internet import reactor
from twisted.python import log

d.addErrback(log.err)
d.addErrback(lambda _: sys.exit(1))  # wrapped in a lambda so exit only happens if the errback fires
d.addBoth(lambda _: reactor.stop())
reactor.run()
```

Let me know if you think we should add this test, or whether there is a better way to write this script.
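A side note on the `addErrback(sys.exit(1))` line in the script above: `Deferred.addErrback` expects a callable, so passing the *result* of calling `sys.exit(1)` would raise `SystemExit` immediately, at chain-construction time, before any error occurs. A stdlib-only sketch of that contract (the `MiniDeferred` class is hypothetical, purely to illustrate; it is not Twisted's `Deferred`):

```python
class MiniDeferred:
    """Toy stand-in for a Deferred, only to show the callable contract."""

    def __init__(self):
        self.errbacks = []

    def addErrback(self, fn):
        # fn is stored now and invoked later with the failure,
        # so it must be a callable, not the result of a call.
        if not callable(fn):
            raise TypeError(f"addErrback expected a callable, got {fn!r}")
        self.errbacks.append(fn)
        return self

    def fire_error(self, failure):
        for fn in self.errbacks:
            fn(failure)


d = MiniDeferred()
results = []

# Wrong: sys.exit(1) would execute here and raise SystemExit immediately:
#   d.addErrback(sys.exit(1))

# Right: wrap the call in a lambda so it only runs when the errback fires.
d.addErrback(lambda failure: results.append(failure))
d.fire_error("boom")
```

The same reasoning applies to `addBoth(lambda _: reactor.stop())`: without the lambda, `reactor.stop()` would stop the reactor right away instead of when the deferred fires.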
Not having the error logging is #6047, or at least directly related to it. Not sure why we need to call …
Nice, so it's good to merge then!
Thanks!
fix #6361