-
Notifications
You must be signed in to change notification settings - Fork 38.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
e2e.go: Add -deployment, add a kops deployment method #33518
Conversation
cc @kubernetes/test-infra-maintainers @kubernetes/sig-aws |
7b3088e
to
c21276b
Compare
c21276b
to
b4a3166
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting! This looks great although I have concerns about the download bit.
return finishRunning("up", exec.Command("./hack/e2e-internal/e2e-up.sh")) | ||
} | ||
|
||
// Is the e2e cluster up? | ||
func IsUp() bool { | ||
func (b bash) IsUp() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you sync up with Joe who is switching this to error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and rebased.
if err != nil { | ||
return fmt.Errorf("error creating deployer: %v", err) | ||
} | ||
defer deploy.Destroy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we xmlWrap this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviated, killed Destroy
in the last rev after offline convo.
var err error | ||
binaryURL := os.Getenv("KOPS_BINARY_DIR_URL") | ||
if binaryURL == "" { | ||
binaryURL, err = download("https://storage.googleapis.com/kops-ci/bin/latest-ci.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this code live?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repo is at https://github.com/kubernetes/kops
} | ||
|
||
func download(url string) (string, error) { | ||
resp, err := http.Get(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this handle intermittent flakes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviated by next rev.
return nil, fmt.Errorf("KOPS_STATE_STORE must be set to a valid S3 path for kops deployment") | ||
} | ||
// Presume the kops binary was supplied to us. | ||
binary := os.Getenv("KOPS_BINARY") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we make this a flag. I would like us to move away from magical environment variables that are hard to discover.
What do you think about making the download and cleanup of the kops binary happen outside e2e.go? These two tasks do not seem germane to running tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked offline, and then I worked some more with this. I think I can do the download entirely in shell in .yaml
pretty easily, so I'm not sure it's worth the separate binary. I reworked it to be all flags and take the necessary bits coming in, and I can iterate further from there.
PS: Please sync up with @spxtr about how to test this before committing it. Also note that this requires pushing a new kubekins-e2e image once it is committed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kops interaction LGTM!
var err error | ||
binaryURL := os.Getenv("KOPS_BINARY_DIR_URL") | ||
if binaryURL == "" { | ||
binaryURL, err = download("https://storage.googleapis.com/kops-ci/bin/latest-ci.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repo is at https://github.com/kubernetes/kops
log.Printf("Can't get cluster size, sleeping: %v", err) | ||
continue | ||
} | ||
if n < k.nodes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The master is registered as a node, as is not included in --node-count, so I think we want nodes + 1
(at least until we start testing HA master :-) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I meant to fix that, oops. :)
// Assume that if we already have it, it's good. | ||
return nil | ||
} | ||
if err := finishRunning("kops export", exec.Command(k.binary, "export", "kubecfg", k.cluster)); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice, although actually kops update
will actually already have exported it for you - as you inevitably (?) want the kubecfg if you're dealing with the cluster...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was actually so you could point it at an existing kops-cluster and just say -test
and it would Just Work, which is way better than GCE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, in case the kubecfg
section is unclear, this is striving to ensure that if you point it at a specific cluster, it tests that cluster, hence the /tmp
kubecfg that's basically isolated to the cluster in question. There's a longstanding badness in the GCE bash-isms where it's just implicitly adding it to the default kubecfg, which can get sliced if you run -up -test
in parallel, e.g.
func (k kops) Down() error { | ||
// We do a "kops get" first so the exit status of "kops delete" is | ||
// more sensical in the case of a non-existant cluster. ("kops | ||
// delete" will exit with status 1 on a non-existant cluster) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should treat that as not-an-error in kops? Or maybe a different exit code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on that. I actually didn't feel like fighting the exit status. Maybe it should be a 0 exit code for delete
, but a 1 exit code for get
?
b4a3166
to
8193aa1
Compare
This looks like a great start, I will need to spend some time to understand how this can be used with kubeadm. |
@errordeveloper: Yeah, I gave @mikedanese a heads up I was doing this, so it should be a reasonable start. |
fe223e9
to
33ff8c5
Compare
Jenkins verification failed for commit 33ff8c577a5651dafa55b6d21fae0af3bb3c19a9. Full PR test history. The magic incantation to run this job again is |
33ff8c5
to
3f92d11
Compare
This is dipping a toe in the water. Companion to kubernetes/kubernetes#33518, and I won't merge it until that's in and a new docker image is pushed.
@zmerlynn thanks for that, so @mikedanese actually mentioned it to on @kubernetes/sig-cluster-lifecycle call today. I am now thinking about it. Until now, I was only thinking about sort of acceptance tests for kubeadm, where we have rpm/deb packages, dockerd and systemd all setup and run kubeadm, which is all much lower level then this, and we need to detect breakages at the level early, but we do need e2e test to run on top soon also. |
@errordeveloper: My goal here is actually to get |
+1 @errordeveloper I think those are two different kinds of layers you're mentioning. "Acceptance testing" for kubeadm (which is more lightweight) also sounds good to me indeed. |
3f92d11
to
ce28398
Compare
I confirmed manually from build logs that, even though some of the builds were failing, the builds were doing what we wanted. I just popped the manual testing commit. |
Jenkins GCI GKE smoke e2e failed for commit ce28398. Full PR test history. The magic incantation to run this job again is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. It might be worth making "kops-nodes" just "nodes" and including that behavior in the bash deployer, but that can be done in the future if we want.
@@ -157,6 +167,11 @@ func run() error { | |||
} | |||
} | |||
|
|||
deploy, err := getDeployer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest moving this into main
and passing the deployer to run
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if err != nil { | ||
return nil, err | ||
} | ||
defer f.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly worried about relying on the existence of a temp file after we've closed it, but it's probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ioutil.TempFile semantics leave the file until you unlink it, so this should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
// TODO(zmerlynn): More cluster validation. This should perhaps be | ||
// added to kops and not here, but this is a fine place to loop | ||
// for now. | ||
for stop := time.Now().Add(10 * time.Minute); time.Now().Before(stop); time.Sleep(30 * time.Second) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty neat.
|
||
func (k kops) ClusterSize() (int, error) { | ||
if err := k.SetupKubecfg(); err != nil { | ||
return -1, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to return -1 over 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. If anyone is looking at anything but err, I want to throw it in their face. :)
if err := k.SetupKubecfg(); err != nil { | ||
return -1, err | ||
} | ||
o, err := exec.Command("kubectl", "get", "nodes", "--no-headers").Output() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this is independent of the deployer.
I'm a little wary of parsing raw command output, but I think this is relatively innocuous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it out of the kops
impl so that all it did was call the deployer SetupKubecfg
This splits off all the bash stuff into an interface, and plumbs through a separate interface to bring up a cluster using "kops" instead. Right now it assumes kops == AWS.
ce28398
to
d905478
Compare
@spxtr: PTAL. Also have no idea what's going on with @k8s-ci-robot right now, but the GKE builder is failing consistently. |
if err != nil { | ||
return nil, err | ||
} | ||
defer f.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
Jenkins GCI GCE e2e failed for commit d905478. Full PR test history. The magic incantation to run this job again is |
Jenkins GKE smoke e2e failed for commit d905478. Full PR test history. The magic incantation to run this job again is |
Automatic merge from submit-queue |
This is dipping a toe in the water. Companion to kubernetes/kubernetes#33518, and I won't merge it until that's in and a new docker image is pushed.
This is dipping a toe in the water. Companion to kubernetes/kubernetes#33518, and I won't merge it until that's in and a new docker image is pushed.
This is dipping a toe in the water. Companion to kubernetes/kubernetes#33518, and I won't merge it until that's in and a new docker image is pushed.
This is dipping a toe in the water. Companion to kubernetes/kubernetes#33518, and I won't merge it until that's in and a new docker image is pushed.
What this PR does / why we need it: Adds a
kops
deployment method to e2e.go, so we can add full e2e coverage for akops
based bringup.Special notes for your reviewer: A timely review would be appreciated given the wide-ish touchpoints through the file. I just had a pretty bad rebase here.
Release note:
This splits off all the bash stuff into an interface, and plumbs
through a separate interface to bring up a cluster using "kops"
instead. Right now it assumes kops == AWS.
This change is