New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
analyzers, cache: Allow to shut down gracefully #392
Conversation
cmd/analyzer/analyzer.go
Outdated
select { | ||
case <-signalChan: | ||
// We received the first Ctrl+C; try to shut down gracefully. | ||
a.Shutdown() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the caller of Start, either runAnalyzer or func in rootMain calls .Shutdown. should only have to return, rather than putting these a.Shutdown calls in various places. are we able to return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh are we not able to return because we don't cancel the context until something calls Shutdown thus calling cancelAnalyzers? seems like it would be more natural to receive the signal and only call cancelAnalyzers. no a.Shutdown(). and let us return normally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that might also let us extract this into a helper that only creates a ctx that cancels on ctrl+c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand this comment. Is your main concern that the caller of Start()
now needs to be careful to call Shutdown()
, therefore a clean shutdown is not guaranteed? Not sure what kind of returning you had in mind; where and why would the callers want to return?
This might be relevant: The analysis Service
(which owns/controls all the analyzers; the other service we have is the API service) was originally designed with explicit cleanup, i.e. with Start+Shutdown. So the Service's Shutdown()
is always called when appropriate, e.g. here (and in another spot; there are only two calls to Start() anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, referring to that. i.e. it's (or was previously at least) up to the thing using the analyzer Service
(1) to call its Start and (2) to call its Shutdown.
my understanding of this change is that Shutdown is now something that sometimes gets called by the service itself inside Start, sometimes from outside. in fact, if I'm following this right, both?
I feel it's a smaller change and less complicated to cancel the context and let the subordinate analyzers finish, let the Start return, and let the thing using the analyzer Service
call Shutdown as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shutdown is now something that sometimes gets called by the service itself inside Start, sometimes from outside
That was indeed funky. Thank you for noticing.
Your observation caused me to think about the code more, and I noticed more smells that I then refactored away:
- The
Shutdown()
method of theService
had to be called by the caller ofService::Start()
. Then it makes more sense to just calldefer Shutdown()
fromStart()
and not burden the caller. We do so now. - The code introduced by this PR repeated that sin (Start+Shutdown) at the Analyzer level. I changed it now to also use
defer
. - Several
Shutdown()
methods throughout the codebase were IMO misleadingly named. They did not actively cause anything to shut down; they performed resource cleanup and assumed that any active processing had completed. I renamed them appropriately, IIRC to eithercleanup()
orClose()
.
I also refactored the code around this comment thread so it no longer waits for ctrl+C in a separate goroutine; I find the new approach much easier to grok.
I added all these changes as a separate commit, should you prefer to view them that way, but I think it makes more sense to diff against main.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oo looking good
little asymmetry, but don't let this distract you: initializing those resources is in a public function NewMain
etc, while cleaning them up is in a private method cleanup
. once you make a service, you have no way out other than to Start it and let Start clean things up when it's done
62cb459
to
a3150be
Compare
analyzer/runtime/runtime.go
Outdated
func (m *Main) Start() { | ||
ctx := context.Background() | ||
func (m *Main) Start(ctx context.Context) { | ||
defer m.cleanup() | ||
|
||
if err := m.prework(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also pass the ctx
into prework
so that this cancelable context is also used in SendBatch
there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, are there any other context.Background's around?
8467cd6
to
14f8632
Compare
@@ -143,6 +149,14 @@ func (m *Main) Start() { | |||
m.cfg.Range.From, m.cfg.Range.To)) | |||
} | |||
|
|||
func (m *Main) cleanup() { | |||
if err := m.cfg.Source.Close(); err != nil { | |||
m.logger.Error("failed to cleanly close consensus data source", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consensus -> runtime (I think?)
cmd/analyzer/analyzer.go
Outdated
a.logger.Info("all analyzers have exited cleanly") | ||
return | ||
case <-signalChan: | ||
a.logger.Info("received second interrupt, forcing exit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is a little sophisticated. do we specifically want to program a custom second-ctrl-c behavior? if not, we could unregister the handler and let the second ctrl+c do whatever the default action is. exit the program immediately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it! Implemented it now. For my taste, the code is about the same complexity to understand though. There's no more explicit handling of a second ctrl+C, which is nicer and simpler (and more agnostic of surrounding code, which might theoretically have its own handlers); that's good. But there is the additional cognitive burden of tracking whether the custom signal interceptor is active or not.
I also feel the image is harder to grok than the code :). Also, not sure it's correct; the far left path takes three ctrl+C to reach "process exited"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, in the previous code, the second ctrl+c makes this function return, which transfers control to the defer
s. first, the defer to unregister the signal notifier. second, to call cleanup
. cleanup
I'm pretty sure is fast, it's only closing the target storage postgresql db connection. but what happens after that? the analyzer goroutines may still be running. I think go keeps the process alive until those goroutines finish. but with no signal handler installed, a third ctrl+c does make the program exit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
fa44ffa
to
c4695c5
Compare
This PR was motivated by #360 which introduces an on-disk cache (KV store), but does not close it cleanly when exiting. This PR adds a cleanup mechanism to the cache, and (partly by necessity, partly for code hygiene) extends the cleanup mechanism to all the analyzers.
At a high level, all analyzers are now provided with an external
context.Context
in theirStart()
method, and they are on the lookout for thectx.Done()
channel closing. The top-level code closes the channel on regular program termination and on SIGINT (Ctrl+C).There is an additional change piggybacking on this PR:
cbor
to serialize its keys. It previously usedgob
, which in my testing complained for certain keys. (I think null pointers? I didn't investigate further, as I think it's preferable to not introduce new serialization formats (here, gob) into the project anyway.)Testing: Ran in three different scenarios, verified that KV store (which is the furthest down the stack of cleanup methods) closes correctly: