Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] reactor handlesignals #6030

Closed
wants to merge 2 commits into from

Conversation

GeorgeA92
Copy link
Contributor

@GeorgeA92 GeorgeA92 commented Aug 29, 2023

Fix to #6024

From my observations of scrapy code and related twisted pull request I conclude that call of reactor._handleSignals() or it's equivalent from new twisted version - are not needed inside install_shutdown_handlers to run scrapy.

At least on my local environment with both new/old twisted versions I didn't noticed anything unexpected.

@codecov
Copy link

codecov bot commented Aug 29, 2023

Codecov Report

Merging #6030 (8c99f44) into master (0e7fc60) will decrease coverage by 0.01%.
The diff coverage is n/a.

❗ Current head 8c99f44 differs from pull request most recent head 47bcf1a. Consider uploading reports for the commit 47bcf1a to get more accurate results

@@            Coverage Diff             @@
##           master    #6030      +/-   ##
==========================================
- Coverage   88.92%   88.92%   -0.01%     
==========================================
  Files         162      162              
  Lines       11445    11443       -2     
  Branches     1861     1861              
==========================================
- Hits        10178    10176       -2     
  Misses        962      962              
  Partials      305      305              
Files Changed Coverage Δ
scrapy/utils/ossignal.py 75.00% <ø> (-2.78%) ⬇️

@wRAR
Copy link
Member

wRAR commented Aug 29, 2023

I think Twisted sets handlers for INT and BREAK conditionally but for TERM unconditionally, so I would expect our TERM handler to be overridden in at least some cases if this line is removed.

@wRAR
Copy link
Member

wRAR commented Aug 30, 2023

After thinking about it for some more I think it makes sense that calling or not calling this doesn't make a difference, whatever the call order. But can you please check if with old Twisted a simple scrapy crawl has the TERM handler last set by Scrapy or by Twisted?

@GeorgeA92
Copy link
Contributor Author

In CrawlerProcess.start. reactor.run - called with installSignalHandlers=False which means that reactor will not apply it's default handlers. And as far as I understand in this case handlers from scrapy install_shutdown_handlers will be the only applied signal handlers

scrapy/scrapy/crawler.py

Lines 389 to 397 in a320e5f

if install_signal_handlers:
install_shutdown_handlers(self._signal_shutdown)
resolver_class = load_object(self.settings["DNS_RESOLVER"])
resolver = create_instance(resolver_class, self.settings, self, reactor=reactor)
resolver.install_on_reactor()
tp = reactor.getThreadPool()
tp.adjustPoolsize(maxthreads=self.settings.getint("REACTOR_THREADPOOL_MAXSIZE"))
reactor.addSystemEventTrigger("before", "shutdown", self.stop)
reactor.run(installSignalHandlers=False) # blocking call

While if we will run scrapy with CrawlerRunner with simply reactor.run() as documented on https://docs.scrapy.org/en/latest/topics/practices.html#running-multiple-spiders-in-the-same-process - reactor will run with reactor default signal handlers (and no scrapy install_shutdown_handlers as it called in CrawlerProcess only)

@wRAR
Copy link
Member

wRAR commented Aug 30, 2023

reactor.run - called with installSignalHandlers=False

Oh!

Then yeah, it probably doesn't make sense. OTOH twisted.internet.posixbase.PosixReactorBase._handleSignals() also installs a SIGCHLD handler, but I don't know how useful is it for Scrapy.

@wRAR
Copy link
Member

wRAR commented Sep 7, 2023

After discussing this with @kmike we want this:

  1. An integration test that makes sure that spawning a spider process and sending it a Ctrl-C (or a similar signal) once leads to a graceful shutdown while sending it twice leads to an immediate exit. We may already have such test, I haven't checked.
  2. Making sure that spawning a process with subprocess and with reactor.spawnProcess() works on a POSIX OS. It will also be interesting to check if this process becomes a zombie or is reaped correctly, both with old and new code. It will also be interesting to check twisted.internet.utils.getProcessValue() as it may depend on SIGCHLD handling.
  3. Making sure that the new code works on Scrapy Cloud as expected.

@Gallaecio
Copy link
Member

We may already have such test, I haven't checked.

I don’t think we do. It may be useful for #4749 as well.

@wRAR
Copy link
Member

wRAR commented Sep 20, 2023

For (2):

  • subprocess.run() doesn't leave a zombie
  • getProcessValue() returns the exit code and doesn't leave a zombie
  • reactor.spawnProcess leaves a zombie and doesn't fire the protocol's processEnded handler

The last point is an important difference, I don't want to break this. So we want to either call the SIGCHLD-related Twisted code directly in install_shutdown_handlers, or remove installSignalHandlers=False and try to work with that.

For the first option it's again calling some private code which I don't want to do. For the second option it means Twisted will overwrite our SIGTERM and SIGBREAK handlers but leave our SIGINT handler. We can try finding a later point at which to install our handlers.

@wRAR
Copy link
Member

wRAR commented Sep 20, 2023

https://stackoverflow.com/questions/74929947/twisted-application-ignoring-a-certain-unix-signal-is-it-possible suggests reactor.addSystemEventTrigger('after', 'startup', install_your_handlers)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants