
Ensure idempotence #306

Closed · bridgetkromhout wants to merge 1 commit

Conversation

@bridgetkromhout (Collaborator) commented Jul 15, 2018

When I had an off-by-one error in the IPs file, I ended up with unexpected results:

  • The /etc/hosts file would preserve incorrect previous mappings between nodeN names and IPs (a problem when nodes must be grouped correctly to reach each other, for security-group reasons)

  • k8s would preserve previous erroneous attempts at cluster configuration

This is my attempt to make a re-run with a group of different IPs actually succeed. Works in my testing but I'm interested in what you think, @jpetazzo.
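One way to make the /etc/hosts update converge on re-runs is to delete any stale nodeN entry before appending the current mapping. This is a minimal sketch under assumptions, not the actual workshopctl code; the `update_hosts` name, the ips-file argument, and the nodeN naming scheme are hypothetical:

```shell
# Hypothetical sketch: idempotent /etc/hosts update. For each IP in the list,
# remove any previous mapping for that node name (whatever IP it pointed at),
# then append the current one, so a re-run with different IPs converges.
update_hosts() {
    ips_file=$1     # file with one IP per line, in node order
    hosts_file=$2   # e.g. /etc/hosts
    n=1
    while read -r ip; do
        # Drop any stale line ending in this node name...
        sed -i "/[[:space:]]node$n\$/d" "$hosts_file"
        # ...then append the current mapping.
        echo "$ip node$n" >> "$hosts_file"
        n=$((n+1))
    done < "$ips_file"
}
```

Re-running with a corrected IPs file then replaces the wrong mappings instead of accumulating them.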

@bridgetkromhout (Collaborator, Author) commented Jul 15, 2018

I have one cluster this doesn't fix:

70.37.61.214
70.37.60.126
70.37.56.140

which should actually be:

70.37.59.121
70.37.53.116
70.37.61.214

@bridgetkromhout (Collaborator, Author) commented Jul 15, 2018

Also: 70.37.59.121
70.37.53.116
70.37.63.79

which should be:

70.37.60.126
70.37.56.140
70.37.63.79

@bridgetkromhout (Collaborator, Author)

Looks like if node1 was originally a different host, ~docker/.ssh/authorized_keys on nodes 2 and 3 probably needs to be re-copied from node1.
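The re-copy itself could also be made idempotent, so re-runs after node1 changes add the new key without duplicating existing entries. A minimal sketch under assumptions (the `ensure_key` helper and its arguments are hypothetical, not part of workshopctl):

```shell
# Hypothetical sketch: ensure node1's current public key is present in a
# node's authorized_keys. Append only if the exact line is missing, so
# repeated runs after node1 is rebuilt never duplicate entries.
ensure_key() {
    pubkey=$1       # contents of node1's id_rsa.pub (one line)
    auth_file=$2    # path to ~docker/.ssh/authorized_keys on the target node
    grep -qxF "$pubkey" "$auth_file" 2>/dev/null || echo "$pubkey" >> "$auth_file"
}
```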

@bridgetkromhout (Collaborator, Author)

I think the off-by-one was due to a mis-merge or mis-split of files of IPs.

@bridgetkromhout (Collaborator, Author)

I've generated all my items for the workshop at OSCON, but it's probably worth vetting this and getting it in, in general.

@jpetazzo (Owner)

Just to keep track of what we just discussed live:

When I initially wrote the scripts, I wanted to make it possible to re-run them during the workshop, e.g. to fix a catastrophic deployment issue. Therefore, I tried my best not to affect clusters that work correctly.

This PR would execute kubeadm reset, which wipes out the configuration of the cluster ... and would defeat that purpose.

We have (at least) the following options:

  • add a new command, e.g. workshopctl reset <settingsfile> <tagname> to basically execute kubeadm reset everywhere (and perhaps other reset-y commands)
  • ditch away my initial assumptions, and execute kubeadm reset systematically (as the PR proposes)
  • switch behavior with a flag (as you mentioned when we chatted about it)
  • ohhh another idea: as we have workshopctl test, we could have workshopctl testandreset (or whatever!) that would run tests and reset the clusters that fail the tests; then all we have to do is rerun the deployment command, and it will run only on the reset clusters

WDYT?
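The last option above (test, then reset only the failures) can be sketched as a small loop. This is an illustration only; `testandreset`, `test_cluster`, and `reset_cluster` are hypothetical stand-ins for the real checks and for running `kubeadm reset` on a cluster's nodes:

```shell
# Hypothetical sketch of the "testandreset" idea: test each cluster and reset
# only the ones that fail, so a subsequent deployment pass (which skips
# healthy clusters) only touches the reset ones.
testandreset() {
    for cluster in "$@"; do
        if ! test_cluster "$cluster"; then
            reset_cluster "$cluster"
        fi
    done
}
```

The nice property is that the existing "don't touch working clusters" assumption is preserved: healthy clusters are never reset.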

@bridgetkromhout (Collaborator, Author)

add a new command, e.g. workshopctl reset to basically execute kubeadm reset everywhere (and perhaps other reset-y commands)

I'm fine with this (although the flag switch also sounds simple).

ditch away my initial assumptions, and execute kubeadm reset systematically (as the PR proposes)

I think your initial assumptions are valid, though! (Now that I understand them.)

switch behavior with a flag (as you mentioned when we chatted about it)

I think this may be the simplest to implement?

ohhh another idea: as we have workshopctl test, we could have workshopctl testandreset (or whatever!) that would run tests and reset the clusters that fail the tests; then all we have to do is rerun the deployment command, and it will run only on the reset clusters

This appears to be more complicated than we really need?

@jpetazzo (Owner)

I'd go with a workshopctl reset or workshopctl kubereset command (just because no other command has flags so it would be weird to add flag parsing just for that odd one).

It should be fairly easy to implement; want me to do it?

@jpetazzo jpetazzo added the tools label Aug 27, 2018
@bridgetkromhout (Collaborator, Author)

I'd go with a workshopctl reset or workshopctl kubereset command (just because no other command has flags so it would be weird to add flag parsing just for that odd one).

Probably kubereset in case we want to add a "reset all the things" later?

It should be fairly easy to implement; want me to do it?

Regretfully I have zero bandwidth right now; although I don't want to create unnecessary labor for you, I can't do it myself at the moment.
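The kubereset subcommand being discussed could be as simple as running `kubeadm reset` on every node. A hedged sketch, not the merged implementation; `run_on` is a hypothetical stand-in for however workshopctl executes a command on a remote node (e.g. over SSH):

```shell
# Hypothetical sketch of "workshopctl kubereset": wipe the kubeadm state on
# every node passed in, leaving room for a broader "reset all the things"
# command later. kubeadm's --force flag skips the interactive confirmation.
kubereset() {
    for node in "$@"; do
        run_on "$node" "sudo kubeadm reset --force"
    done
}
```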

@jpetazzo jpetazzo closed this in f543b54 Sep 29, 2018