
force quit ipfs-cluster-service on second ctrl-c #358

Merged

merged 1 commit into master from feat/cmd/svc/force-quit on Apr 4, 2018

Conversation

lanzafame
Contributor

Resolves: #258.

@ghost ghost assigned lanzafame Mar 23, 2018
@ghost ghost added the status/in-progress In progress label Mar 23, 2018
@coveralls

coveralls commented Mar 23, 2018

Coverage Status

Coverage decreased (-0.1%) to 66.352% when pulling 685f58e on feat/cmd/svc/force-quit into f8acd4f on master.

ZenGround0
ZenGround0 previously approved these changes Mar 23, 2018
Collaborator

@ZenGround0 left a comment

Apart from some minor differences of opinion about naming, this LGTM. I'm personally not worried about the Code Climate cognitive complexity.

return cfg, &cfgs{clusterCfg, apiCfg, ipfshttpCfg, consensusCfg, trackerCfg, monCfg, diskInfCfg, numpinInfCfg}
}

type cfgs struct {
Collaborator

Thanks for this

Contributor Author

My pleasure 😄 The giant function signatures were bugging me.

@@ -222,20 +222,20 @@ configuration.
},
Action: func(c *cli.Context) error {
userSecret, userSecretDefined := userProvidedSecret(c.Bool("custom-secret"))
cfg, clustercfg, _, _, _, _, _, _, _ := makeConfigs()
defer cfg.Shutdown() // wait for saves
cfgmgr, cfgs := makeConfigs()
Collaborator

This is very subjective, but I'm wondering if there is a name for this variable that's easier to read. I can't pin down exactly what it is about this name that trips me up, so it might just be me. Some alternative ideas: just mgr? Just cfg? cfgMgr? configManager?

Contributor Author

I'm happy to do cfgMgr; it is more idiomatic anyway. @ZenGround0 @hsanjuan to clarify, the config manager loads the initial configuration from file, generates multiaddrs for the new node, and then saves that configuration back to file?

Collaborator

Yeah, that sounds mostly right. The config manager registers all of the other configs in makeConfigs so it can "manage" them. It only generates default values in the case of a call to the init subcommand; from then on it loads the saved config values after registration. As you've noticed, the manager watches for certain changes to its registered component configs and then saves the new values. Also, a cluster calls the manager's Validate function to validate each registered component config during initialization.

@lanzafame
Contributor Author

@ZenGround0 naming change has been made 👍 and thanks for the explanation of the config manager

Collaborator

@hsanjuan left a comment

Small things, lgtm otherwise.

It may need a rebase though, when I merge the libp2p-http stuff. We'll see.

}
go func() {
alreadyExiting = true
cluster.Shutdown()
Collaborator

err = missing

for {
select {
case <-signalChan:
err = cluster.Shutdown()
checkErr("shutting down cluster", err)
if alreadyExiting {
Collaborator

This is potentially racy. If 2 signals are received quickly, it may try to read alreadyExiting at the same time it's writing it. Easiest solution is to move alreadyExiting = true out of the goroutine.

hsanjuan
hsanjuan previously approved these changes Mar 26, 2018
@hsanjuan
Collaborator

Cool, thanks!

@lanzafame needs rebase though!

@lanzafame
Contributor Author

@hsanjuan rebase done 👍

Collaborator

@hsanjuan left a comment

@lanzafame I have been thinking... because forcing an exit is a potentially really bad thing, we shouldn't allow the second ctrl-c too close to the first one. So we should discard any second ctrl-c that happens within, say, 10 seconds of the first one.

After 10 seconds, we should inform the user that, from that point, she can force-close cluster: something like Shutdown is taking long. Press Ctrl-c again to manually kill cluster. Note that this may corrupt the local cluster state.

@hsanjuan
Collaborator

Also, let's fix that code climate warning.

@lanzafame
Contributor Author

> Also, let's fix that code climate warning.

@hsanjuan I have run the codeclimate analysis locally on the branch and it isn't complaining about the cyclomatic complexity of daemon there. I am guessing it hasn't properly grabbed the latest changes.

> I have been thinking... because forcing an exit is a potentially really bad thing, we shouldn't allow the second ctrl-c too close to the first one. So we should discard any second ctrl-c that happens within, say, 10 seconds of the first one.

I understand where you are coming from with this; I had the same thought. But is there really much benefit to forcing the user to wait 10 seconds? I agree that we should definitely print the warning message about corrupting the cluster state, but I think we should leave it up to the user to decide whether they want to risk it. IMHO, a prompt to confirm force quitting might be a better option. Thoughts?

@hsanjuan
Collaborator

@lanzafame what I'm fearing is that you can kill go-ipfs with double ctrl-c without any consequences, but in cluster you might interrupt the snapshotting process and screw the snapshot. So this should only be done if snapshotting gets stuck. We should also avoid accidental double ctrl-c.

Perhaps we can make it 3 ctrl-c. One shutdown, second prints warning about a third. Third kills cluster?

@lanzafame
Contributor Author

lanzafame commented Mar 28, 2018 via email

Collaborator

@hsanjuan left a comment

@lanzafame I'm not much into interactive questioning.

It may be that the signal does not come from a ctrl-c either, but from a kill. In general I think it's an anti-pattern when a daemon, which is supposed to be running in the background, blocks and asks something interactively.

Plus, unmuted logs don't respect the set log levels, and errors should not be lost because we're waiting on some user input.

Plus, a second ctrl-c re-launches the shutdown goroutine.

And a third ctrl-c would relaunch the shutdown goroutine and re-ask while the other prompt is still waiting to read an answer.

Can we do the 3 ctrl-c option? 1) shutdown 2) warning message 3) kill

@lanzafame
Contributor Author

lanzafame commented Mar 28, 2018 via email

@lanzafame
Contributor Author

@hsanjuan changes up 👍

@hsanjuan
Collaborator

@lanzafame something is funky with sharness tests. Can you have a look?

hsanjuan
hsanjuan previously approved these changes Mar 28, 2018
Collaborator

@hsanjuan left a comment

LGTM, but tests should be green. Also, holding this off until I have released 0.3.5.

@ZenGround0
Collaborator

My guess is that our sharness helper function cluster-kill now has the power to force a shutdown and is causing the corrupted state that we are warning about.

We could make cluster_kill parse output and hold off for a longer backoff if it sees the first signal has been received. Increasing the backoff time entirely might also give some relief to sharness tests but is probably going to be racy.

@hsanjuan
Collaborator

but cluster-kill only sends the signal once, so it shouldn't affect anything

@ZenGround0
Collaborator

Duh, thanks for the correction.

@lanzafame
Contributor Author

It seems that the following test is the first to fail and also is the cause of cascading failures in subsequent tests:

*** t0025-ctl-status-report-commands.sh ***
not ok 1 - cluster-ctl can read id
#	
#	    id=`cluster_id`
#	    [ -n "$id" ] && ( ipfs-cluster-ctl id | egrep -q "$id" )
#	

But when running sharness tests locally, this one passes, and at the time of commenting none of the others have failed (will edit if they do). I have kicked off the Travis CI builds to see if they pass; if they don't, I will dig in, but the failure seems rather unrelated to the changed code.

switch ctrlcCount {
case 1:
go func() {
time.Sleep(1 * time.Minute)
Contributor Author

Of course, stupid me forgot to remove the sleep statement before pushing up 🤦‍♂️. I needed it to test cluster 'hanging'.

Refactor daemon() to reduce code complexity.

Refactor configuration in ipfs-cluster-service.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
@lanzafame
Contributor Author

@hsanjuan @ZenGround0 So my time.Sleep stuff-up was what was causing the weird errors.

@hsanjuan hsanjuan added RFM Ready for Merge and removed status/in-progress In progress labels Mar 29, 2018
@hsanjuan hsanjuan merged commit c651aed into master Apr 4, 2018
@hsanjuan hsanjuan deleted the feat/cmd/svc/force-quit branch April 4, 2018 10:22
@hsanjuan hsanjuan mentioned this pull request Apr 5, 2018