
e2e.go: Add -deployment, add a kops deployment method #33518

Merged
merged 1 commit into from
Sep 27, 2016


zmerlynn
Member

@zmerlynn zmerlynn commented Sep 26, 2016

What this PR does / why we need it: Adds a kops deployment method to e2e.go, so we can add full e2e coverage for a kops based bringup.

Special notes for your reviewer: A timely review would be appreciated given the wide-ish touchpoints through the file. I just had a pretty bad rebase here.

Release note:

Adds the -deployment option to e2e.go and the ability to run e2e.go against a `kops` deployment.

This splits off all the bash stuff into an interface, and plumbs
through a separate interface to bring up a cluster using "kops"
instead. Right now it assumes kops == AWS.
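To make the split concrete, here is a minimal Go sketch of a deployer interface with bash and kops implementations chosen by the -deployment flag. The method set (Up, IsUp, SetupKubecfg, Down) is pieced together from names in the review hunks. `IsUp` returning `error` assumes the bool-to-error switch discussed in review. The method bodies are stubs. This is not the code in this PR.

```go
package main

import "fmt"

// deployer abstracts how the e2e cluster is brought up, inspected and torn
// down. This method set is an assumption based on names in the review hunks.
type deployer interface {
	Up() error
	IsUp() error // assumes IsUp moves from bool to error
	SetupKubecfg() error
	Down() error
}

// bash would wrap the existing hack/e2e-internal/*.sh scripts; stubbed here.
type bash struct{}

func (b bash) Up() error           { return nil } // e.g. ./hack/e2e-internal/e2e-up.sh
func (b bash) IsUp() error         { return nil }
func (b bash) SetupKubecfg() error { return nil }
func (b bash) Down() error         { return nil }

// kops would drive a kops binary against AWS; stubbed here.
type kops struct {
	binary  string // path to the kops binary
	cluster string // kops cluster name
	nodes   int    // worker count, excluding the master
}

func (k kops) Up() error           { return nil }
func (k kops) IsUp() error         { return nil }
func (k kops) SetupKubecfg() error { return nil }
func (k kops) Down() error         { return nil }

// Compile-time checks that both implementations satisfy deployer.
var (
	_ deployer = bash{}
	_ deployer = kops{}
)

// getDeployer maps the value of the -deployment flag to an implementation.
func getDeployer(name string) (deployer, error) {
	switch name {
	case "bash":
		return bash{}, nil
	case "kops":
		return kops{}, nil
	default:
		return nil, fmt.Errorf("unknown deployment strategy %q", name)
	}
}
```

Keeping the flag-to-implementation mapping in a single function lets main build the deployer once and pass it wherever it is needed.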



@zmerlynn
Member Author

cc @kubernetes/test-infra-maintainers @kubernetes/sig-aws

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Sep 26, 2016
@zmerlynn zmerlynn added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Sep 26, 2016
Contributor

@fejta fejta left a comment


Exciting! This looks great, although I have concerns about the download bit.

return finishRunning("up", exec.Command("./hack/e2e-internal/e2e-up.sh"))
}

// Is the e2e cluster up?
-func IsUp() bool {
+func (b bash) IsUp() bool {
Contributor


Will you sync up with Joe, who is switching this to return an error?

Member Author


Done and rebased.

if err != nil {
return fmt.Errorf("error creating deployer: %v", err)
}
defer deploy.Destroy()
Contributor


Should we xmlWrap this?

Member Author


Obviated: I killed Destroy in the last rev after an offline conversation.

var err error
binaryURL := os.Getenv("KOPS_BINARY_DIR_URL")
if binaryURL == "" {
binaryURL, err = download("https://storage.googleapis.com/kops-ci/bin/latest-ci.txt")
Contributor


Where does this code live?

Member


}

func download(url string) (string, error) {
resp, err := http.Get(url)
Contributor


How does this handle intermittent flakes?

Member Author


Obviated by next rev.

return nil, fmt.Errorf("KOPS_STATE_STORE must be set to a valid S3 path for kops deployment")
}
// Presume the kops binary was supplied to us.
binary := os.Getenv("KOPS_BINARY")
Contributor


I would prefer we make this a flag. I would like us to move away from magical environment variables that are hard to discover.

What do you think about making the download and cleanup of the kops binary happen outside e2e.go? These two tasks do not seem germane to running tests.

Member Author


Talked offline, and then I worked some more with this. I think I can do the download entirely in shell in .yaml pretty easily, so I'm not sure it's worth the separate binary. I reworked it to be all flags and take the necessary bits coming in, and I can iterate further from there.

@fejta
Contributor

fejta commented Sep 27, 2016

PS: Please sync up with @spxtr about how to test this before committing it. Also note that this requires pushing a new kubekins-e2e image once it is committed.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 27, 2016
Member

@justinsb justinsb left a comment


kops interaction LGTM!

var err error
binaryURL := os.Getenv("KOPS_BINARY_DIR_URL")
if binaryURL == "" {
binaryURL, err = download("https://storage.googleapis.com/kops-ci/bin/latest-ci.txt")
Member


log.Printf("Can't get cluster size, sleeping: %v", err)
continue
}
if n < k.nodes {
Member


The master is registered as a node and is not included in --node-count, so I think we want nodes + 1 (at least until we start testing HA masters :-) )

Member Author


Yeah, I meant to fix that, oops. :)

// Assume that if we already have it, it's good.
return nil
}
if err := finishRunning("kops export", exec.Command(k.binary, "export", "kubecfg", k.cluster)); err != nil {
Member

@justinsb justinsb Sep 27, 2016


This is nice, although kops update will already have exported it for you, since you inevitably (?) want the kubecfg if you're dealing with the cluster...

Member Author


That was actually so you could point it at an existing kops-cluster and just say -test and it would Just Work, which is way better than GCE.

Member Author


Also, in case the kubecfg section is unclear: this strives to ensure that if you point it at a specific cluster, it tests that cluster, hence the /tmp kubecfg that's isolated to the cluster in question. There's a longstanding badness in the GCE bash-isms where they implicitly add the cluster to the default kubecfg, which can get clobbered if you run, e.g., -up -test in parallel.
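A minimal sketch of the isolation idea, assuming the per-cluster kubecfg path is passed to every kubectl or kops invocation through the KUBECONFIG environment variable. kubectl honors KUBECONFIG; whether a given kops version's `export kubecfg` does too is worth checking. The helper names are hypothetical.

```go
package main

import (
	"os"
	"os/exec"
)

// kubecfgReady reports whether path already holds a non-empty kubeconfig,
// in the spirit of the "if we already have it, it's good" shortcut quoted
// in the review.
func kubecfgReady(path string) bool {
	fi, err := os.Stat(path)
	return err == nil && fi.Size() > 0
}

// withKubecfg builds a command whose KUBECONFIG points at a per-cluster file
// instead of the user's default kubeconfig, so parallel runs against
// different clusters cannot clobber each other's contexts. os/exec uses the
// last value for duplicate keys, so the appended entry overrides any
// inherited KUBECONFIG.
func withKubecfg(kubecfg, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Env = append(os.Environ(), "KUBECONFIG="+kubecfg)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}
```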

func (k kops) Down() error {
// We do a "kops get" first so the exit status of "kops delete" is
// more sensical in the case of a non-existent cluster. ("kops
// delete" will exit with status 1 on a non-existent cluster)
Member


Do you think we should treat that as not-an-error in kops? Or maybe a different exit code?

Member Author


I'm not clear on that. I actually didn't feel like fighting the exit status. Maybe it should be a 0 exit code for delete, but a 1 exit code for get?

@zmerlynn
Member Author

@fejta: PTAL. Syncing with @spxtr now.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 27, 2016
@errordeveloper
Member

This looks like a great start, I will need to spend some time to understand how this can be used with kubeadm.

@zmerlynn
Member Author

@errordeveloper: Yeah, I gave @mikedanese a heads-up that I was doing this, so it should be a reasonable start.

@k8s-ci-robot
Contributor

Jenkins verification failed for commit 33ff8c577a5651dafa55b6d21fae0af3bb3c19a9. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

zmerlynn added a commit to zmerlynn/test-infra that referenced this pull request Sep 27, 2016
This is dipping a toe in the water. Companion to
kubernetes/kubernetes#33518, and I won't merge
it until that's in and a new docker image is pushed.
@errordeveloper
Member

@zmerlynn thanks for that; @mikedanese actually mentioned it on the @kubernetes/sig-cluster-lifecycle call today, and I am now thinking about it. Until now, I was only thinking about acceptance tests for kubeadm, where we have rpm/deb packages, dockerd and systemd all set up and then run kubeadm. That is all much lower level than this, and we need to detect breakages at that level early, but we also need e2e tests to run on top soon.

@zmerlynn
Member Author

@errordeveloper: My goal here is actually to get a basic kops deployment into submit queue / PR builder territory, so that if someone totally trashes AWS for some obscure reason, it's really easy to see. See kubernetes/test-infra#681 for a proposed updown job, which is a copy of one I added for GKE that just runs a barebones deployment, tests networking, and tears the cluster down. In the past I've found the networking conformance tests to be an excellent, very short sniff test for basic cluster operation (they take under a minute to run, so they make a really good smoke test).

@luxas
Member

luxas commented Sep 27, 2016

+1

@errordeveloper I think those are two different kinds of layers you're mentioning.
Indeed, the cluster has to be deployed first (via kube-up, kops or kubeadm), but after that I think the e2e process should be pretty much the same.

"Acceptance testing" for kubeadm (which is more lightweight) also sounds good to me indeed.
But we need a place (maybe test-infra) where we can put the few kubeadm commands needed for bootstrapping a cluster in an automated fashion in order to be able to test kubeadm properly.

@zmerlynn
Member Author

I confirmed manually from build logs that, even though some of the builds were failing, the builds were doing what we wanted. I just popped the manual testing commit.

@k8s-ci-robot
Contributor

Jenkins GCI GKE smoke e2e failed for commit ce28398. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Contributor

@spxtr spxtr left a comment


Looks good overall. It might be worth making "kops-nodes" just "nodes" and including that behavior in the bash deployer, but that can be done in the future if we want.

@@ -157,6 +167,11 @@ func run() error {
}
}

deploy, err := getDeployer()
Contributor


I would suggest moving this into main and passing the deployer to run.

Member Author


Done

if err != nil {
return nil, err
}
defer f.Close()
Contributor


I'm slightly worried about relying on the existence of a temp file after we've closed it, but it's probably fine.

Member Author


The ioutil.TempFile semantics leave the file until you unlink it, so this should be fine.

Contributor


Sounds good.

// TODO(zmerlynn): More cluster validation. This should perhaps be
// added to kops and not here, but this is a fine place to loop
// for now.
for stop := time.Now().Add(10 * time.Minute); time.Now().Before(stop); time.Sleep(30 * time.Second) {
Contributor


This is pretty neat.


func (k kops) ClusterSize() (int, error) {
if err := k.SetupKubecfg(); err != nil {
return -1, err
Contributor


Any reason to return -1 over 0?

Member Author


Yeah. If anyone is looking at anything but err, I want to throw it in their face. :)

if err := k.SetupKubecfg(); err != nil {
return -1, err
}
o, err := exec.Command("kubectl", "get", "nodes", "--no-headers").Output()
Contributor


Seems like this is independent of the deployer.

I'm a little wary of parsing raw command output, but I think this is relatively innocuous.

Member Author


I moved it out of the kops impl, so all it does now is call the deployer's SetupKubecfg.
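Here is a minimal sketch of the node count. It assumes one node per line of `kubectl get nodes --no-headers` output (the command quoted in the review) and keeps the -1-on-error convention zmerlynn describes. Splitting the parse into its own function makes the "parsing raw command output" concern testable without a cluster. The helper names are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"os/exec"
)

// countNodes counts the non-empty lines of `kubectl get nodes --no-headers`
// output; with headers suppressed, each line is one node. Blank lines are
// skipped defensively.
func countNodes(out []byte) int {
	n := 0
	s := bufio.NewScanner(bytes.NewReader(out))
	for s.Scan() {
		if len(bytes.TrimSpace(s.Bytes())) > 0 {
			n++
		}
	}
	return n
}

// clusterSize runs kubectl and returns the node count. On failure it returns
// -1 rather than 0, so a caller that ignores err gets an obviously bogus size
// instead of a plausible-looking empty cluster.
func clusterSize() (int, error) {
	out, err := exec.Command("kubectl", "get", "nodes", "--no-headers").Output()
	if err != nil {
		return -1, err
	}
	return countNodes(out), nil
}
```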

@zmerlynn
Member Author

@spxtr: PTAL. Also have no idea what's going on with @k8s-ci-robot right now, but the GKE builder is failing consistently.


@zmerlynn zmerlynn added lgtm "Looks good to me", indicates that a PR is ready to be merged. retest-not-required labels Sep 27, 2016
@k8s-ci-robot
Contributor

Jenkins GCI GCE e2e failed for commit d905478. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GKE smoke e2e failed for commit d905478. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 1f33091 into kubernetes:master Sep 27, 2016
@zmerlynn zmerlynn deleted the e2e-kops-up-2 branch September 27, 2016 22:41
zmerlynn added a commit to zmerlynn/test-infra that referenced this pull request Sep 28, 2016
This is dipping a toe in the water. Companion to
kubernetes/kubernetes#33518, and I won't merge
it until that's in and a new docker image is pushed.
zmerlynn added a commit to zmerlynn/test-infra that referenced this pull request Oct 3, 2016
This is dipping a toe in the water. Companion to
kubernetes/kubernetes#33518, and I won't merge
it until that's in and a new docker image is pushed.
Labels: area/test, lgtm, release-note, size/L
9 participants