e2e: add ssh mcd test #541

Merged
merged 1 commit into from Mar 21, 2019

Member

kikisdeliveryservice commented Mar 11, 2019

Test verifies that MCD updates MCP, daemons and
writes new ssh keys to node filesystems.

Closes #546

Member Author

kikisdeliveryservice commented Mar 12, 2019

Hmm, this passed before I added the Done check, but I'm seeing lots of timeouts in CI that seem unrelated to the MCO.

/retest

Member Author

kikisdeliveryservice commented Mar 12, 2019

/retest

Contributor

cgwalters commented Mar 13, 2019

This is probably OK, but we could go the extra mile and actually verify that the SSH key ends up on the node's filesystem. Basically the API equivalent of oc rsh pods/machine-config-daemon-xyz ls /rootfs/var/roothome/.ssh/authorized_keys.

Member Author

kikisdeliveryservice commented Mar 13, 2019

@cgwalters Yep, I was thinking doing a real check would make sense too! First pass was me poking around with the e2e for the first time. Ty!

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch from daf0563 to 145baa5 Mar 14, 2019

Member Author

kikisdeliveryservice commented Mar 14, 2019

Ok so I think I was checking the Done annotation incorrectly before. Trying something new.

Update: Ok cool, that worked!

Member

ashcrow commented Mar 14, 2019

level=info msg="Waiting up to 30m0s for the cluster to initialize..."
level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"

Member Author

kikisdeliveryservice commented Mar 14, 2019

@ashcrow we are hitting aws issues all over openshift rn.

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch 2 times, most recently from dd77dc6 to a52163b Mar 15, 2019

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch from a52163b to e96aeff Mar 19, 2019

@openshift-ci-robot openshift-ci-robot added size/M and removed size/L labels Mar 19, 2019

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch from e96aeff to c54ca87 Mar 19, 2019

Member Author

kikisdeliveryservice commented Mar 19, 2019

For tomorrow: I need to figure out a way to do the following (oc debug isn't working on my local cluster):

  • oc get pods -n openshift-machine-config-operator --field-selector spec.nodeName=<node name>
  • oc rsh -n openshift-machine-config-operator machine-config-daemon-zt85f
  • cat /rootfs/home/core/.ssh/authorized_keys and search the file for the test key

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch from ef9ad17 to 7c647e2 Mar 19, 2019

Member Author

kikisdeliveryservice commented Mar 19, 2019

basically i have a worker node name. so i need to get the daemon pod name that goes along with it, then rsh into that daemon and grab the authorized_keys file and check that the test key is inside.

the way to get a list of all the mcd is here:

mcdList, err := cs.Pods("openshift-machine-config-operator").List(listOptions)

so i should be able to somehow filter that list to get the daemon pod whose spec.nodeName matches the worker's node.Name, without having to use exec.Command to execute oc get pods -n openshift-machine-config-operator --field-selector ...

maybe some sort of filter for spec.nodeName on cs.Pods().List or cs.Pods().Get()?
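
One way this could work, sketched with the standard library only: client-go's `ListOptions` has a `FieldSelector` string field, and `spec.nodeName` is a field that pods can be selected on, so the same selector used with `oc get pods` can be passed to `List`. The `Pod` struct, `nodeFieldSelector`, `podOnNode`, and the pod and node names below are hypothetical stand-ins, not code from this PR.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for the two pod fields this lookup needs.
type Pod struct {
	Name     string
	NodeName string // mirrors spec.nodeName
}

// nodeFieldSelector builds the selector string for server-side filtering,
// e.g. as the FieldSelector value of the list options.
func nodeFieldSelector(nodeName string) string {
	return "spec.nodeName=" + nodeName
}

// podOnNode is the client-side alternative: scan an unfiltered list and
// return the first pod scheduled on nodeName.
func podOnNode(pods []Pod, nodeName string) (Pod, error) {
	for _, p := range pods {
		if p.NodeName == nodeName {
			return p, nil
		}
	}
	return Pod{}, fmt.Errorf("no pod found on node %q", nodeName)
}

func main() {
	// Made-up pod and node names for illustration.
	pods := []Pod{
		{Name: "machine-config-daemon-aaaaa", NodeName: "worker-0"},
		{Name: "machine-config-daemon-bbbbb", NodeName: "worker-1"},
	}
	fmt.Println(nodeFieldSelector("worker-1"))
	p, err := podOnNode(pods, "worker-1")
	fmt.Println(p.Name, err)
}
```

The namespace may contain other pods on the same node, so a label selector for the daemon pods would probably be needed alongside the field selector.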

Contributor

cgwalters commented Mar 19, 2019

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch from 30c1796 to 7923d26 Mar 20, 2019

Member

runcom commented Mar 20, 2019

/approve

pending LGTM until this works and passes and others review this

(great work with this!)

Member Author

kikisdeliveryservice commented Mar 20, 2019

thanks for your suggestions @runcom - they are really helpful!!!

I'll squash once i get this working.

Member Author

kikisdeliveryservice commented Mar 20, 2019

level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-jfqwkc5d-57a9f.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"

looking through the logs the timeout might be due to InstallerControllerFailed (kube-apiserver), so going to try retest

/test e2e-aws-op

Member Author

kikisdeliveryservice commented Mar 20, 2019

pretty sure grep is failing bc it's being done in a different shell than the oc rsh.
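
A sketch of one way around the shell problem, assuming the file can be read with a single `oc rsh ... cat` call: capture the output and search it in Go, instead of piping to grep in a separate local shell. `catAuthorizedKeysCmd` and `hasKey` are hypothetical helpers; the namespace, pod name, and file path are the ones quoted earlier in this thread, and the key strings are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// catAuthorizedKeysCmd builds (but does not run) an oc rsh invocation that
// prints the node's authorized_keys file through the daemon pod.
func catAuthorizedKeysCmd(podName string) *exec.Cmd {
	return exec.Command("oc", "rsh",
		"-n", "openshift-machine-config-operator",
		podName,
		"cat", "/rootfs/home/core/.ssh/authorized_keys")
}

// hasKey reports whether any line of content contains key.
func hasKey(content, key string) bool {
	if key == "" {
		return false
	}
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), key) {
			return true
		}
	}
	return false
}

func main() {
	cmd := catAuthorizedKeysCmd("machine-config-daemon-zt85f")
	fmt.Println(cmd.Args)

	// In a real test the content would come from cmd.Output();
	// this sample stands in for it so the sketch runs without a cluster.
	sample := "ssh-rsa EXISTINGKEY core\nssh-rsa TESTKEY test\n"
	fmt.Println(hasKey(sample, "ssh-rsa TESTKEY test"))
}
```

Doing the search in the test process keeps the remote command down to a plain `cat`, so nothing depends on how the pipe and quoting are interpreted on either side of `oc rsh`.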

Member Author

kikisdeliveryservice commented Mar 20, 2019

cc: @runcom

GOCACHE=off go test -timeout 50m -v${WHAT:+ -run="$WHAT"} ./test/e2e/
=== RUN   TestMCDToken
--- PASS: TestMCDToken (0.45s)
=== RUN   TestMCDeployed
--- PASS: TestMCDeployed (2418.86s)
=== RUN   TestUpdateSSH
panic: test timed out after 50m0s

goroutine 2196 [running]:
testing.(*M).startAlarm.func1()
	/usr/local/go/src/testing/testing.go:1240 +0xfc
created by time.goFunc
	/usr/local/go/src/time/sleep.go:172 +0x44

goroutine 1 [chan receive, 9 minutes]:

Because TestMCDeployed takes about 40 minutes (2418.86s), it's blocking my test from ever passing: only about 10 minutes of the 50-minute timeout are left for all of the other tests in our suite to run.

I believe that my test might be working correctly, as I see no error output on this pass. I will change TestMCDeployed from testing 10 MCs to 2 in my branch's test file to give my PR more time to run. If it works, we'll have to decide what to do, because I've seen TestMCDeployed consistently take ~40 min for 10 MCs.

cc:@runcom

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch 2 times, most recently from d38ea54 to bb85053 Mar 20, 2019

Member Author

kikisdeliveryservice commented Mar 20, 2019

/retest

@kikisdeliveryservice kikisdeliveryservice changed the title from "[WIP] e2e: first pass adding e2e for ssh via mcd" to "[WIP] e2e: add ssh mcd test" Mar 20, 2019

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch 3 times, most recently from 9011194 to 01f5991 Mar 21, 2019

Member Author

kikisdeliveryservice commented Mar 21, 2019

Awesome!! Got my e2e ssh test working! Will clean up commits tomorrow and pick up #563 to see if it fixes the time out issue.

Member

runcom commented Mar 21, 2019

Awesome!!!!!!

add e2e test for ssh updates via mcd
Test verifies that MCD updates MCP, daemons and
writes new ssh keys to node filesystems.

@kikisdeliveryservice kikisdeliveryservice force-pushed the kikisdeliveryservice:e2e-ssh branch from 01f5991 to 872bbc6 Mar 21, 2019

@kikisdeliveryservice kikisdeliveryservice changed the title from "[WIP] e2e: add ssh mcd test" to "e2e: add ssh mcd test" Mar 21, 2019

Member Author

kikisdeliveryservice commented Mar 21, 2019

@runcom timeout issues fixed, I think your PR did the trick!

Member

runcom left a comment

/lgtm

openshift-ci-robot commented Mar 21, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit bca2c1f into openshift:master Mar 21, 2019

4 of 5 checks passed

tide Not mergeable. Needs lgtm label.
ci/prow/e2e-aws Job succeeded.
ci/prow/e2e-aws-op Job succeeded.
ci/prow/images Job succeeded.
ci/prow/unit Job succeeded.

@kikisdeliveryservice kikisdeliveryservice deleted the kikisdeliveryservice:e2e-ssh branch Mar 21, 2019
