force quit ipfs-cluster-service on second ctrl-c #358
Conversation
Force-pushed from 5376ed0 to a6fb614.
Force-pushed from a6fb614 to 8f236da.
Apart from some minor differences in opinion about naming, this LGTM. I'm personally not worried about code climate cognitive complexity.
```go
	return cfg, &cfgs{clusterCfg, apiCfg, ipfshttpCfg, consensusCfg, trackerCfg, monCfg, diskInfCfg, numpinInfCfg}
}

type cfgs struct {
```
Thanks for this
My pleasure 😄 The giant function signatures were bugging me.
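For context, the refactor being praised here replaces a nine-value return from `makeConfigs` with a manager plus a single struct of component configs. A rough sketch of the shape follows; the placeholder types are illustrative stand-ins, not the real ipfs-cluster config types:

```go
package main

// Placeholder component config types; the real ones live in the
// respective ipfs-cluster packages (api, ipfshttp, consensus, ...).
type (
	clusterConfig  struct{}
	apiConfig      struct{}
	ipfshttpConfig struct{}
)

// cfgs groups the component configs into a single value, so that
// makeConfigs can return two things (manager + cfgs) instead of nine,
// and call sites no longer need a row of blank identifiers.
type cfgs struct {
	clusterCfg  *clusterConfig
	apiCfg      *apiConfig
	ipfshttpCfg *ipfshttpConfig
	// ...remaining component configs elided
}

func main() {
	_ = cfgs{} // compile check only; real values come from makeConfigs
}
```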
ipfs-cluster-service/main.go (outdated)

```diff
@@ -222,20 +222,20 @@ configuration.
 		},
 		Action: func(c *cli.Context) error {
 			userSecret, userSecretDefined := userProvidedSecret(c.Bool("custom-secret"))
-			cfg, clustercfg, _, _, _, _, _, _, _ := makeConfigs()
-			defer cfg.Shutdown() // wait for saves
+			cfgmgr, cfgs := makeConfigs()
```
This is very subjective, but I'm wondering if there is a name for this variable that's easier to read. I can't pin down exactly what it is about this name that trips me up, so it might just be me. Some alternative ideas: just `mgr`? Just `cfg`? `cfgMgr`? `configManager`?
I'm happy to do `cfgMgr`; it is more idiomatic anyway. @ZenGround0 @hsanjuan to clarify: the config manager loads the initial configuration from file, generates multiaddrs for the new node, and then saves that configuration back to file?
Yeah, that sounds mostly right. The config manager registers all of the other configs in `makeConfigs` to "manage" them. It only generates default values in the case of a call to the `init` subcommand, and from then on loads the saved config values after registration. As you've noticed, the manager watches for certain changes to its registered component configs and then saves the new values. Also, a cluster calls the manager's `Validate` function to validate each registered component config during initialization.
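To make that lifecycle concrete, here is a minimal sketch of the register/validate pattern described above; `ComponentConfig`, `Manager`, and `exampleConfig` are simplified stand-ins, not the actual API of ipfs-cluster's config package:

```go
package main

import (
	"errors"
	"fmt"
)

// ComponentConfig is a simplified stand-in for what each component
// config (api, ipfshttp, consensus, ...) would implement.
type ComponentConfig interface {
	Default() error  // fill in default values (what `init` uses)
	Validate() error // check that loaded values are usable
}

// Manager is a simplified stand-in for the config manager: it holds
// every registered component config so it can manage them as a group.
type Manager struct {
	registered []ComponentConfig
}

// RegisterComponent hands a component config to the manager; after
// registration the manager would load saved values into it (elided).
func (m *Manager) RegisterComponent(c ComponentConfig) {
	m.registered = append(m.registered, c)
}

// Validate walks every registered component config, mirroring how a
// cluster validates each one during initialization.
func (m *Manager) Validate() error {
	for i, c := range m.registered {
		if err := c.Validate(); err != nil {
			return fmt.Errorf("config %d invalid: %v", i, err)
		}
	}
	return nil
}

// exampleConfig is a trivial component config for demonstration.
type exampleConfig struct{ addr string }

func (e *exampleConfig) Default() error { e.addr = "/ip4/127.0.0.1/tcp/9094"; return nil }
func (e *exampleConfig) Validate() error {
	if e.addr == "" {
		return errors.New("addr not set")
	}
	return nil
}

func main() {
	mgr := &Manager{}
	cfg := &exampleConfig{}
	cfg.Default() // what `init` would do; otherwise values come from file
	mgr.RegisterComponent(cfg)
	fmt.Println("validate:", mgr.Validate())
}
```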
Force-pushed from 8f236da to 23d69e9.
@ZenGround0 naming change has been made 👍 and thanks for the explanation of the config manager.
Small things, LGTM otherwise.
It may need a rebase though, when I merge the libp2p-http stuff. We'll see.
ipfs-cluster-service/main.go (outdated)

```go
	}
	go func() {
		alreadyExiting = true
		cluster.Shutdown()
```
`err =` missing.
ipfs-cluster-service/main.go (outdated)

```go
	for {
		select {
		case <-signalChan:
			err = cluster.Shutdown()
			checkErr("shutting down cluster", err)
			if alreadyExiting {
```
This is potentially racy? If 2 signals are received quickly, it may try to read `alreadyExiting` at the same time it's writing it. The easiest solution is to move `alreadyExiting = true` out of the goroutine.
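A self-contained sketch of the fix being suggested: the flag assignment moves out of the goroutine, so both the read and the write of `alreadyExiting` happen in the signal loop. Here `shutdown` is an illustrative stand-in for `cluster.Shutdown()`, and the loop is a reconstruction, not the PR's exact code:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// shutdown stands in for cluster.Shutdown(): the potentially slow
// cleanup that a forced exit would cut short.
func shutdown() error {
	fmt.Println("shutting down cluster...")
	return nil
}

func main() {
	signalChan := make(chan os.Signal, 20)
	signal.Notify(signalChan, syscall.SIGINT, syscall.SIGTERM)

	alreadyExiting := false
	for range signalChan {
		if alreadyExiting {
			os.Exit(1) // second signal: force quit
		}
		// The write happens here, in the same goroutine that reads the
		// flag above, so two quick signals no longer race on it.
		alreadyExiting = true
		go func() {
			if err := shutdown(); err != nil {
				fmt.Fprintln(os.Stderr, "error shutting down cluster:", err)
			}
			os.Exit(0)
		}()
	}
}
```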
Force-pushed from 23d69e9 to c4ec779.
Cool, thanks! @lanzafame needs rebase though!
Force-pushed from c4ec779 to b94b959.
@hsanjuan rebase done 👍
@lanzafame I have been thinking... because forcing an exit is a potentially really bad thing, we shouldn't allow the second ctrl-c too close to the first one. So we should discard any second ctrl-c that happens within, say, 10 seconds of the first one.
After 10 seconds, we should inform the user that, from that point, she can force-close cluster: something like "Shutdown is taking long. Press Ctrl-c again to manually kill cluster. Note that this may corrupt the local cluster state."
Also, let's fix that code climate warning.
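A sketch of that timing gate, assuming a plain signal loop; the 10-second window and the message wording come from the comment above, everything else is illustrative (and the thread below settles on a different scheme):

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	signalChan := make(chan os.Signal, 20)
	signal.Notify(signalChan, syscall.SIGINT, syscall.SIGTERM)

	var firstSignal time.Time
	for range signalChan {
		if firstSignal.IsZero() {
			firstSignal = time.Now()
			// ...trigger the graceful shutdown here...
			go func() {
				time.Sleep(10 * time.Second)
				fmt.Println("Shutdown is taking long. Press Ctrl-c again to " +
					"manually kill cluster. Note that this may corrupt the " +
					"local cluster state.")
			}()
			continue
		}
		if time.Since(firstSignal) < 10*time.Second {
			continue // discard a second ctrl-c that comes too soon
		}
		os.Exit(1) // force quit after the grace period
	}
}
```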
@hsanjuan I have run the codeclimate analysis locally on the branch and it isn't complaining about the cyclomatic complexity of
I understand where you are coming from with this, as I had the same thought, but is there really much benefit to forcing the user to wait 10 seconds? I agree that we should definitely print the warning message regarding corrupting the cluster state, but I think we should leave it up to the user to decide if they want to risk corrupting their cluster state. IMHO, a prompt to confirm force quitting might be a better option. Thoughts?
Force-pushed from b94b959 to afc1131.
@lanzafame what I'm fearing is that you can kill go-ipfs with a double ctrl-c without any consequences, but in cluster you might interrupt the snapshotting process and screw the snapshot. So this should only be done if snapshotting gets stuck. We should also avoid an accidental double ctrl-c. Perhaps we can make it 3 ctrl-cs? One shuts down, the second prints a warning about a third, and the third kills cluster?
@hsanjuan the latest push I did used the `yesNoPrompt` on the second ctrl-c to make sure that the user definitely wants to quit with the possibility of corruption.
@lanzafame I'm not much into interactive questioning.
It may be that the signal does not come from a ctrl-c either, but from a kill. In general I think it's an anti-pattern when a daemon which is supposed to be running in the background blocks and asks something interactively.
Plus, unmuted logs don't respect the set log levels, and errors should not be lost because we're waiting on some user input.
Plus, a second ctrl-c re-launches the shutdown goroutine.
And a third ctrl-c would relaunch the shutdown goroutine and re-ask while the other prompt is still waiting to read an answer.
Can we do the 3 ctrl-c option? 1) shutdown 2) warning message 3) kill
Good point. I will do the triple.
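A minimal sketch of that triple-ctrl-c scheme: the `ctrlcCount` switch matches the shape of the later diff in this thread, but the rest is a reconstruction under that assumption, not the merged code:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	signalChan := make(chan os.Signal, 20)
	signal.Notify(signalChan, syscall.SIGINT, syscall.SIGTERM)

	ctrlcCount := 0
	for range signalChan {
		ctrlcCount++
		switch ctrlcCount {
		case 1:
			// First ctrl-c: start the graceful shutdown in the background.
			go func() {
				// cluster.Shutdown() would run here.
				os.Exit(0)
			}()
		case 2:
			// Second ctrl-c: warn only, so a kill takes deliberate intent.
			fmt.Println("Shutdown is in progress. Press Ctrl-c again to " +
				"force-kill cluster. This may corrupt the local state.")
		case 3:
			// Third ctrl-c: force quit.
			os.Exit(1)
		}
	}
}
```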
Force-pushed from afc1131 to 685f58e.
@hsanjuan changes up 👍
@lanzafame something is funky with the sharness tests. Can you have a look?
LGTM, but tests should be green. Also, holding this off until I have released 0.3.5.
My guess is that our sharness helper function `cluster_kill` now has the power to force a shutdown and is causing the corrupted state that we are warning about. We could make `cluster_kill` parse output and hold off for a longer backoff if it sees the first signal has been received. Increasing the backoff time entirely might also give some relief to the sharness tests, but is probably going to be racy.
but it only
duh, thanks for the correction
It seems that the following test is the first to fail and is also the cause of cascading failures in subsequent tests:
But when running the sharness tests locally, this one passes, and at the time of commenting none of the others have failed (will edit if they do). I have kicked off the Travis CI builds to see if they pass; if they don't, I will dig in, but the failure seems rather unrelated to the changed code.
ipfs-cluster-service/main.go (outdated)

```go
	switch ctrlcCount {
	case 1:
		go func() {
			time.Sleep(1 * time.Minute)
```
Of course, stupid me forgets to remove the sleep statement before pushing up 🤦‍♂️. I needed it to test the cluster 'hanging'.
Refactor daemon() to reduce code complexity.
Refactor configuration in ipfs-cluster-service.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
Force-pushed from 685f58e to 31bce16.
@hsanjuan @ZenGround0 So my
Resolves: #258.