First prow integration test: sinker #20451
Conversation
/test pull-test-infra-integration
/retest
The integration test took 6 min 40 sec: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/test-infra/20451/pull-test-infra-integration/1349025501456371712
awesome
# Install nginx and wait for it ready
echo "Install nginx on kind cluster"
kubectl --context=${CONTEXT} apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/kind/deploy.yaml
Can we rely on this URL to stay this way? How likely is this to break?
Yeah agreed, we should refer to a concrete revision here rather than just master.
Good point, pinned to a revision rather than master
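For illustration, the pinned form would look something like the following; the exact revision chosen in the PR isn't shown here, so the tag below is illustrative:

```sh
# Pin the ingress-nginx manifest to a release tag instead of master.
kubectl --context="${CONTEXT}" apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.41.2/deploy/static/provider/kind/deploy.yaml
```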
help: "https://kind.sigs.k8s.io/docs/user/local-registry/"
EOF

# Install nginx and wait for it ready |
We don't seem to wait for it here though.
Good catch, the wait was deferred to a later step to make this run faster; removed the comment.
prow/test/integration/setup-prow.sh
Outdated
echo "Push test image to registry" | ||
docker pull busybox | ||
docker tag busybox:latest localhost:5000/busybox:latest | ||
docker push localhost:5000/busybox:latest |
Won't we hit the Docker Hub rate limits?
Good point. What do you think about craning this image to gcr?
We already publish an alpine image with Prow. I think that is suitable for this. gcr.io/k8s-prow/alpine
Great, done
Looks like we're still tagging the image as `busybox` and referencing that in the pod. That works, but it'd be better to name it accurately.
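A sketch of the accurately named version being suggested (assuming the published `gcr.io/k8s-prow/alpine` image is used as-is; the `latest` tags are illustrative):

```sh
# Mirror the published Prow alpine image into the local registry under its real name.
docker pull gcr.io/k8s-prow/alpine:latest
docker tag gcr.io/k8s-prow/alpine:latest localhost:5000/alpine:latest
docker push localhost:5000/alpine:latest
```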
// if err != nil {
// 	t.Fatalf("Failed stat %q: %v", defaultKubeconfig, err)
// }
// t.Logf("Stat of %q: %v\n\n%v\n\n%v", defaultKubeconfig, stat.Mode(), stat.Sys(), stat)
debug code? do we need to keep this?
yeah, deleted
Awesome to see some progress here :)
data:
  oauth: ZmFrZW9hdXRodG9rZW4K # From 'fakeoauthtoken'
---
apiVersion: apiextensions.k8s.io/v1beta1
Would it be possible to use the manifests from the config directory (maybe the starter-s3.yaml) to make sure they are correct?
That would also be my preference, since the majority of the manifests are identical. But there will be slight differences in the deployment config, such as the image path, a future github-endpoint for GitHub-related integration tests, and other services that need mocks. These could be handled by various methods, such as kustomize, but that might need some maintenance. What do you think?
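For example, a minimal kustomize overlay of the kind being floated here might look like this; the file paths and image names are hypothetical, not from the PR:

```yaml
# kustomization.yaml (hypothetical overlay for the integration-test cluster)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../../config/prow/cluster/starter-s3.yaml
images:
# Redirect component images to the local kind registry.
- name: gcr.io/k8s-prow/sinker
  newName: localhost:5000/sinker
  newTag: latest
```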
prowjob_namespace: default
pod_namespace: test-pods
log_level: debug
You can just omit the job config file, it isn't mandatory (I cannot comment on an empty file).
done
prow/test/integration/test.sh
Outdated
CURRENT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd -P )"

if [[ -n "${CI:-}" ]]; then
I would just check for the presence of the `kind` binary rather than making assumptions about where it is and isn't present.
Done
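A common shape for that check (a sketch; the actual install steps are elided):

```sh
# Only install kind if it isn't already on the PATH.
if ! command -v kind >/dev/null 2>&1; then
  echo "Install kind"
  # ...download kind and chmod +x it here...
fi
```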
chmod +x /usr/bin/kind

# TODO(chaodaiG): remove this once bazel is installed in test image
echo "Install bazel for prow"
Same here, checking for the presence of bazel rather than for running in CI makes more sense IMHO.
yep, done
`which bazel` would be more robust by avoiding assuming the installation path. If you want to ensure a specific bazel version, check with `bazel --version`. This installation doesn't create a binary called `bazel` or add anything to the PATH, so I wouldn't expect the `bazel` command below to work.
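In the same spirit as the `kind` check above, a sketch of what the suggested guard could look like (the pinned version string is illustrative):

```sh
# Install bazel only if it is missing or not the expected version.
WANT_BAZEL="3.4.1"  # illustrative version
if ! command -v bazel >/dev/null 2>&1 || [[ "$(bazel --version)" != "bazel ${WANT_BAZEL}" ]]; then
  echo "Install bazel for prow"
  # ...install the pinned bazel version here...
fi
```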
done
I still don't think this works? If I have bazel installed, but don't have the correct bazel version, this will download the correct version, but then continue to use the version I originally had installed.
What is the need for requiring such a specific bazel version?
This solves the problem where the test image has bazel installed but not at the required version; it seemed to me that bazel is smart enough to figure out which version to use?
for _, tt := range tests {
	tt := tt
	name := tt.name
this isn't needed, since you already capture tt
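In other words, the capture alone suffices; a minimal illustration of the idiom:

```go
for _, tt := range tests {
	tt := tt // capture the range variable for the parallel subtest
	t.Run(tt.name, func(t *testing.T) {
		t.Parallel()
		// use tt directly; a separate name := tt.name copy is redundant
	})
}
```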
done
prow/test/integration/test/setup.go
Outdated
	return defaultKubeconfig
}

func NewClients(configPath, clusterName string) (*kubernetes.Clientset, *prow.Clientset, error) {
I can only recommend using the controller-runtime client for this, as it is one client that allows you to interact with all object kinds.
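A sketch of that approach, assuming the Prow ProwJob types are registered into the scheme (names follow the usual conventions, not necessarily the code as merged):

```go
import (
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	prowjobv1 "k8s.io/test-infra/prow/apis/prowjobs/v1"
	ctrlruntimeclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// newClient returns a single client that can read and write both core
// objects (Pods) and Prow CRs (ProwJobs).
func newClient(cfg *rest.Config) (ctrlruntimeclient.Client, error) {
	if err := prowjobv1.AddToScheme(scheme.Scheme); err != nil {
		return nil, err
	}
	return ctrlruntimeclient.New(cfg, ctrlruntimeclient.Options{Scheme: scheme.Scheme})
}
```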
Done
}
t.Logf("Pod is running: %s", name)

// Make sure pod is deleted, it'll take roughly 2 minutes
This takes two minutes with a five second resync period, are you sure?
The deletion action starts pretty fast, but completion of the deletion can take more than 1 minute
Delete(ctx, name, v1.DeleteOptions{})
})

if tt.hasCRD {
err, please rename to `hasCR`, all tests have the CRD.
done
}
return !exist, nil
})
pods, err := kubeClient.CoreV1().Pods(testpodNamespace).List(ctx, v1.ListOptions{})
Can we just capture the `exists` variable in this scope and avoid the second list, pod iteration, etc., and end the test right here?
Done
Force-pushed from edd9f28 to 713a6c5
/test pull-test-infra-integration
Force-pushed from 713a6c5 to 29d2c2d
/test pull-test-infra-integration
Force-pushed from 29d2c2d to 043da32
/test pull-test-infra-integration
Thanks for working on this Chao, very exciting!
hack/bazel.sh
Outdated
@@ -20,7 +20,7 @@ set -o errexit
 set -o pipefail

 code=0
-(set -o xtrace && bazel "$@") || code=$?
+(set -o xtrace && bazel "$@" --test_tag_filters=-e2e) || code=$?
Was this change included accidentally? This script isn't used and this would probably be a breaking change for existing uses of the script.
Oh, it looks like you're trying to prevent the integration tests from running unless specifically requested. Can we achieve that in a better way? This prevents `hack/bazel.sh` from being used with the `--test_tag_filters` flag, since it will already be specified. Also, this assumes that bazel is always invoked via this script, which is not the case.
Makes sense. I can use an env var or something of the sort to skip the integration test if it's not specified, what do you think?
That could work. A flag would be a bit more explicit. It wouldn't be ideal for the test to noop and produce successful junit results when skipped, but IIRC Go's testing package provides a way to explicitly mark tests as skipped.
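A minimal sketch of that skip pattern (the flag name is illustrative, not from the PR):

```go
package integration

import (
	"flag"
	"testing"
)

var runIntegration = flag.Bool("run-integration", false, "run prow integration tests")

func TestSinker(t *testing.T) {
	if !*runIntegration {
		// Reported as skipped in the junit output instead of silently passing.
		t.Skip("integration test; pass -run-integration to enable")
	}
	// ...
}
```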
@@ -0,0 +1,99 @@
apiVersion: v1
nit: Please rename this file, it has more than just the namespace.
done
prow/test/integration/setup-prow.sh
Outdated
set -o errexit

CURRENT_REPO="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd -P )"
Nit: This variable name is misleading, this is not the repo root, but rather the bash source dir (`prow/test/integration`).
Ack, done
prow/test/integration/test/setup.go
Outdated
}

func NewClientsFromConfig(cfg *rest.Config) (*kubernetes.Clientset, error) {
	kubeClient, err := kubernetes.NewForConfig(cfg)
nit: We could just use `kubernetes.NewForConfig(cfg)` directly. This function isn't really needed.
This function is obsolete as well, deleted
if err := kubeClient.Create(ctx, &prowjob); err != nil {
	t.Fatalf("Failed creating prowjob: %v", err)
}
t.Logf("Finished creating CRD: %s", tt.name)
nit: In a couple places we say CRD rather than CR or PJ.
updated
"orphaned-pod", | ||
false, | ||
true, | ||
}, |
We'll also want to test some more scenarios like the following:
- completed, non-orphaned pods are deleted after the `terminatedPodTTL` expires.
- pods not created by prow are not deleted.
- prowjobs (not pods) are deleted after `maxProwJobAge` has passed.

I figure this PR is more an initial prototype for integration testing though, so we don't need to add these just yet if you'd rather focus on just validating this integration testing pattern.
Agreed that the above scenarios all need to be tested; they're not added in this PR since, as you mentioned, this is more for validating the pattern.
}
t.Logf("Finished creating pod: %s", tt.name)

// Make sure pod is running
This races with sinker deleting the pod. To safely test the orphaned pod case, I'd expect a PJ to be created before the pod is created, wait for the pod to start, then delete the PJ to orphan the pod. That should prevent sinker from seeing an orphaned pod until after we've confirmed the pod was successfully created.
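Sketched out, the suggested ordering looks roughly like this (the `waitForPodRunning` helper is hypothetical; the client calls follow the style seen elsewhere in the test):

```go
// 1. Create the ProwJob first so sinker never sees the pod as orphaned early.
if err := kubeClient.Create(ctx, &prowjob); err != nil {
	t.Fatalf("Failed creating prowjob: %v", err)
}
// 2. Create the pod and confirm it is running.
if err := kubeClient.Create(ctx, &pod); err != nil {
	t.Fatalf("Failed creating pod: %v", err)
}
waitForPodRunning(t, ctx, kubeClient, pod.Name) // hypothetical helper
// 3. Now orphan the pod by deleting its ProwJob, and let sinker clean it up.
if err := kubeClient.Delete(ctx, &prowjob); err != nil {
	t.Fatalf("Failed deleting prowjob: %v", err)
}
```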
ack, done
prowjob_namespace: default
pod_namespace: test-pods
log_level: debug
It would be handy to allow dumping the Prow component logs into an output dir (`$ARTIFACTS` in CI) so that we can more easily debug integration test failures.
I assume this needs to be done manually, something like `k get logs svc/sinker -f > $ARTIFACTS/prowlogs/sinker &`, what do you think?
It can also be done with client-go, but it might be easier with `kubectl`. We don't need to stream it though; as some kind of post step, we can just dump the log of each pod into a file named after the pod, or something like that.
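A sketch of such a post step (the context, namespace handling, and output paths are illustrative):

```sh
# After the tests finish, dump the log of every pod into $ARTIFACTS, one file per pod.
mkdir -p "${ARTIFACTS}/prowlogs"
for pod in $(kubectl --context="${CONTEXT}" get pods -o name); do
  kubectl --context="${CONTEXT}" logs "${pod}" > "${ARTIFACTS}/prowlogs/${pod#pod/}.log" || true
done
```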
done
Yes I'd expect logs to be dumped with kubectl at the end if $ARTIFACTS is populated (or better yet use a CLI arg/flag).
And I have verified that the dumped logs are under artifacts: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/test-infra/20451/pull-test-infra-integration/1349801426494164992
/test pull-test-infra-integration
prow/test/integration/test/setup.go
Outdated
	return *clusterContext
}

func getDefaultKubeconfig(cfg string) string {
all of this is something the `clientcfg.ConfigLoader` already does with its default ruleset
Good to learn, done
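For reference, the default-ruleset approach with client-go's clientcmd package looks roughly like this (a sketch, not the merged code):

```go
import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// loadConfig resolves a kubeconfig the standard way: $KUBECONFIG, then
// ~/.kube/config, with an optional explicit path and context override.
func loadConfig(kubeconfig, context string) (*rest.Config, error) {
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	rules.ExplicitPath = kubeconfig // empty means "use the defaults"
	return clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		rules, &clientcmd.ConfigOverrides{CurrentContext: context},
	).ClientConfig()
}
```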
/test pull-test-infra-integration
@cjwagner, instead of using a build tag for the integration test, a test flag
Force-pushed from 576433a to f8003df
/test pull-test-infra-integration
@petr-muller, @alvaroaleman, @cjwagner, I believe I have addressed all comments, could you take another look?
/hold
It would be nice to enable reporting for the new presubmit though, so that it appears below PRs where people explicitly triggered it.
The integration test introduced in kubernetes#20451 works as expected; make it report to GitHub before making it required for presubmit.
/test pull-test-infra-integration
/hold cancel
overrides := clientcmd.ConfigOverrides{}
// Override the cluster name if provided.
if clusterName != "" {
	overrides.Context.Cluster = clusterName
I'm pretty sure overwriting the `Context.Cluster` would be problematic if the values actually differed, since the cluster needs to be associated with the correct user (`AuthInfo`). That being said, I don't know if the values will ever differ in practice, so this might be fine anyways.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, chaodaiG, cjwagner

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
This is the first integration test added for prow, as designed in https://docs.google.com/document/d/1hIHIoApoR4OUs_esBDE7A778wi-jUEZcr2-a0zVTqW0/edit.
The integration test deploys prow components in a KIND cluster and tests prow functions inside the cluster.
/assign @cjwagner @alvaroaleman @fejta