First prow integration test: sinker #20451

Merged


chaodaiG
Contributor

This is the first integration test added for prow, as designed in https://docs.google.com/document/d/1hIHIoApoR4OUs_esBDE7A778wi-jUEZcr2-a0zVTqW0/edit.

The integration test deploys Prow components in a KIND cluster and tests Prow functionality inside the cluster.

/assign @cjwagner @alvaroaleman @fejta

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/prow Issues or PRs related to prow sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jan 12, 2021
@chaodaiG
Contributor Author

/test pull-test-infra-integration

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 12, 2021
@chaodaiG
Contributor Author

/retest

@matthyx
Contributor

matthyx commented Jan 12, 2021

awesome
/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 12, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2021

# Install nginx and wait for it ready
echo "Install nginx on kind cluster"
kubectl --context=${CONTEXT} apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/kind/deploy.yaml
Member

Can we rely on this URL to stay this way? How likely is this to break?

Member

Yeah, agreed. We should refer to a concrete revision here rather than just master.

Contributor Author

Good point, pinned to a revision rather than master
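For reference, a pinned manifest URL can be built and checked as in the Go sketch below. It assumes the same raw.githubusercontent.com path as the script above; the ref passed in (a tag or commit SHA) is a placeholder, and the path has to exist at whichever ref is chosen. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ingressNginxKindManifest returns the raw.githubusercontent.com URL of the
// ingress-nginx KIND manifest at a fixed git ref (tag or commit SHA).
func ingressNginxKindManifest(ref string) (string, error) {
	// Floating branch names defeat the purpose of pinning, so reject them.
	switch strings.ToLower(ref) {
	case "", "master", "main", "head":
		return "", fmt.Errorf("ref %q is not pinned; use a tag or commit SHA", ref)
	}
	return "https://raw.githubusercontent.com/kubernetes/ingress-nginx/" + ref +
		"/deploy/static/provider/kind/deploy.yaml", nil
}
```

A pinned ref keeps the test reproducible; bumping ingress-nginx then becomes an explicit change in review rather than a silent upstream edit.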

help: "https://kind.sigs.k8s.io/docs/user/local-registry/"
EOF

# Install nginx and wait for it ready
Member

We don't seem to wait for it here though.

Contributor Author

Good catch. The wait was deferred to a later step to make this run faster; I removed the comment.

echo "Push test image to registry"
docker pull busybox
docker tag busybox:latest localhost:5000/busybox:latest
docker push localhost:5000/busybox:latest
Member

Won't we hit the dockerhub rate limits?

Contributor Author

Good point. What do you think about copying this image to GCR with crane?

Member

We already publish an alpine image with Prow. I think that is suitable for this. gcr.io/k8s-prow/alpine

Contributor Author

Great, done

Member

Looks like we're still tagging the image as busybox and referencing that in the pod. That works, but it'd be better to name it accurately.
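A minimal sketch of deriving an accurately named local-registry tag from the source image, so that pulling gcr.io/k8s-prow/alpine yields localhost:5000/alpine:latest instead of a busybox tag. The helper localRegistryTag is hypothetical; the script itself would still use docker pull/tag/push with the resulting name.

```go
package main

import (
	"fmt"
	"strings"
)

// localRegistryTag maps a source image (e.g. "gcr.io/k8s-prow/alpine") to a
// tag in the KIND local registry that keeps the image's own name.
func localRegistryTag(registry, source, tag string) string {
	name := source
	// Keep only the last path element, dropping registry and project.
	if i := strings.LastIndex(name, "/"); i >= 0 {
		name = name[i+1:]
	}
	// Drop any existing tag or digest from that element.
	if i := strings.IndexAny(name, ":@"); i >= 0 {
		name = name[:i]
	}
	return fmt.Sprintf("%s/%s:%s", registry, name, tag)
}
```

The pod spec would then reference the same computed name, so the manifest and the pushed image cannot drift apart.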

// if err != nil {
// t.Fatalf("Failed stat %q: %v", defaultKubeconfig, err)
// }
// t.Logf("Stat of %q: %v\n\n%v\n\n%v", defaultKubeconfig, stat.Mode(), stat.Sys(), stat)
Member

debug code? do we need to keep this?

Contributor Author

yeah, deleted

Member

@alvaroaleman alvaroaleman left a comment

Awesome to see some progress here :)

data:
oauth: ZmFrZW9hdXRodG9rZW4K # From 'fakeoauthtoken'
---
apiVersion: apiextensions.k8s.io/v1beta1
Member

Would it be possible to use the manifests from the config directory (maybe the starter-s3.yaml) to make sure they are correct?

Contributor Author

That would also be my preference, since the majority of the manifests are identical. However, there will be slight differences in the deployment config, such as the image path, a future github-endpoint for GitHub-related integration tests, and other services that need mocks. These can be handled in various ways, such as kustomize, but that might need some maintenance. What do you think?
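Separately from where the manifests live, the Secret value in the snippet above, ZmFrZW9hdXRodG9rZW4K, decodes to fakeoauthtoken followed by a newline, which is what echo fakeoauthtoken | base64 produces. The sketch below wraps Go's encoding/base64 in a hypothetical helper to show that the trailing newline changes the encoded value, which matters if a consumer compares the token byte for byte.

```go
package main

import "encoding/base64"

// secretValue returns the base64 encoding stored in a Secret's data field.
// The input is encoded byte for byte, so a trailing "\n" (as added by echo)
// becomes part of the secret.
func secretValue(plain string) string {
	return base64.StdEncoding.EncodeToString([]byte(plain))
}
```

secretValue("fakeoauthtoken\n") yields ZmFrZW9hdXRodG9rZW4K, matching the manifest, while secretValue("fakeoauthtoken") yields ZmFrZW9hdXRodG9rZW4=. Using the Secret's stringData field, or printf instead of echo, avoids the stray newline.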


prowjob_namespace: default
pod_namespace: test-pods
log_level: debug
Member

You can just omit the job config file; it isn't mandatory. (I cannot comment on an empty file.)

Contributor Author

done



CURRENT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd -P )"

if [[ -n "${CI:-}" ]]; then
Member

I would just check for the presence of the kind binary rather than making assumptions about where it is and isn't present

Contributor Author

Done
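In Go test code, the same "check for the binary, not the environment" idea maps onto exec.LookPath, which searches the directories in PATH much like which or command -v. The helper name below is hypothetical; the script version would use command -v kind.

```go
package main

import "os/exec"

// missingBinaries returns the names of tools that cannot be found on PATH,
// instead of inferring their presence from whether we run in CI.
func missingBinaries(names ...string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}
```

A setup step could call missingBinaries("kind", "bazel") and install only what is reported missing.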

chmod +x /usr/bin/kind

# TODO(chaodaiG): remove this once bazel is installed in test image
echo "Install bazel for prow"
Member

Same here: checking for the presence of bazel rather than for running in CI makes more sense, IMHO.

Contributor Author

yep, done

Member

Using which bazel would be more robust, since it avoids assuming the installation path. If you want to ensure a specific bazel version, check with bazel --version. This installation doesn't create a binary called bazel or add anything to the PATH, so I wouldn't expect the bazel command below to work.

Contributor Author

done

Member

I still don't think this works. If I have bazel installed but not the correct version, this will download the correct version but then continue to use the version I originally had installed.
Why do we need to require such a specific bazel version?

Contributor Author

This solves the problem where the test image has bazel installed, but not at the required version. My impression was that bazel is smart enough to figure out which version to use?
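A sketch of the version check suggested above, assuming bazel --version prints a single line of the form "bazel X.Y.Z". It only decides whether the installed version matches; as noted in review, installing a second version does not help unless that binary is the one invoked afterwards (for example, by placing it first on PATH). The function names and the required version string are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBazelVersion extracts X.Y.Z from "bazel --version" output, assumed
// to look like "bazel 3.4.1".
func parseBazelVersion(out string) (string, error) {
	fields := strings.Fields(out)
	if len(fields) != 2 || fields[0] != "bazel" {
		return "", fmt.Errorf("unexpected bazel --version output: %q", out)
	}
	return fields[1], nil
}

// needsInstall reports whether the bazel found on PATH differs from the
// required version. If it does, the caller must also make sure the newly
// installed binary is the one invoked afterwards.
func needsInstall(versionOutput, required string) (bool, error) {
	got, err := parseBazelVersion(versionOutput)
	if err != nil {
		return false, err
	}
	return got != required, nil
}
```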


for _, tt := range tests {
tt := tt
name := tt.name
Member

this isn't needed, since you already capture tt

Contributor Author

done
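A minimal illustration of the capture question, using a hypothetical table type with the renamed hasCR field. The tt := tt copy gives each closure its own loop variable, which was required before Go 1.22; a further name := tt.name copy adds nothing, because tt.name can be read from the captured copy.

```go
package main

// testCase is a hypothetical stand-in for the sinker test table.
type testCase struct {
	name  string
	hasCR bool
}

// buildSubtests returns one closure per case, each reading tt.name directly.
func buildSubtests(tests []testCase) []func() string {
	var subtests []func() string
	for _, tt := range tests {
		tt := tt // per-iteration copy; redundant (but harmless) on Go 1.22+
		subtests = append(subtests, func() string { return tt.name })
	}
	return subtests
}
```

With Go 1.22 or later the range variable is already per-iteration, so the tt := tt line itself becomes redundant as well.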

return defaultKubeconfig
}

func NewClients(configPath, clusterName string) (*kubernetes.Clientset, *prow.Clientset, error) {
Member

I can only recommend using the controller-runtime client for this, as it is a single client that lets you interact with all object kinds.

Contributor Author

Done

}
t.Logf("Pod is running: %s", name)

// Make sure pod is deleted, it'll take roughly 2 minutes
Member

This takes two minutes with a five-second resync period? Are you sure?

Contributor Author

The deletion starts quickly, but completing it can take more than a minute.

Delete(ctx, name, v1.DeleteOptions{})
})

if tt.hasCRD {
Member

Err, please rename this to hasCR; all tests have the CRD.

Contributor Author

done

}
return !exist, nil
})
pods, err := kubeClient.CoreV1().Pods(testpodNamespace).List(ctx, v1.ListOptions{})
Member

Can we just capture the exist variable in this scope, avoid the second list and the pod iteration, and end the test right here?

Contributor Author

Done

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2021
@chaodaiG chaodaiG force-pushed the first-integration-test-sinker branch from edd9f28 to 713a6c5 Compare January 13, 2021 18:36
@chaodaiG
Contributor Author

/test pull-test-infra-integration

@chaodaiG chaodaiG force-pushed the first-integration-test-sinker branch from 713a6c5 to 29d2c2d Compare January 13, 2021 19:55
@chaodaiG
Contributor Author

/test pull-test-infra-integration

@chaodaiG chaodaiG force-pushed the first-integration-test-sinker branch from 29d2c2d to 043da32 Compare January 13, 2021 20:18
@chaodaiG
Contributor Author

/test pull-test-infra-integration

Member

@cjwagner cjwagner left a comment

Thanks for working on this Chao, very exciting!

hack/bazel.sh
@@ -20,7 +20,7 @@ set -o errexit
set -o pipefail

code=0
(set -o xtrace && bazel "$@") || code=$?
(set -o xtrace && bazel "$@" --test_tag_filters=-e2e) || code=$?
Member

Was this change included accidentally? This script isn't used and this would probably be a breaking change for existing uses of the script.

Member

Oh it looks like you're trying to prevent the integration tests from running unless specifically requested. Can we achieve that a better way? This prevents hack/bazel.sh from being used with the --test_tag_filters flag since it will already be specified. Also this assumes that bazel is always invoked via this script which is not the case.

Contributor Author

Makes sense. I can use an env var or something similar to skip the integration test when it's not requested. What do you think?

Member

That could work. A flag would be a bit more explicit. It wouldn't be ideal for the test to noop and produce successful junit results when skipped, but IIRC Go's testing package provides a way to explicitly mark tests as skipped.

@@ -0,0 +1,99 @@
apiVersion: v1
Member

nit: Please rename this file, it has more than just the namespace.

Contributor Author

done


set -o errexit

CURRENT_REPO="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd -P )"
Member

Nit: This variable name is misleading, this is not the repo root, but rather the bash source dir (prow/test/integration).

Contributor Author

Ack, done

}

func NewClientsFromConfig(cfg *rest.Config) (*kubernetes.Clientset, error) {
kubeClient, err := kubernetes.NewForConfig(cfg)
Member

nit: We could just use kubernetes.NewForConfig(cfg) directly. This function isn't really needed.

Contributor Author

This function is obsolete as well, deleted

if err := kubeClient.Create(ctx, &prowjob); err != nil {
t.Fatalf("Failed creating prowjob: %v", err)
}
t.Logf("Finished creating CRD: %s", tt.name)
Member

nit: In a couple places we say CRD rather than CR or PJ.

Contributor Author

updated

"orphaned-pod",
false,
true,
},
Member

We'll also want to test some more scenarios like the following:

  1. completed, non-orphaned pods are deleted after the terminatedPodTTL expires.
  2. pods not created by prow are not deleted.
  3. prowjobs (not pods) are deleted after maxProwJobAge has passed.

I figure this PR is more an initial prototype for integration testing though so we don't need to add these just yet if you'd rather focus on just validating this integration testing pattern.

Contributor Author

Agreed that all of the above scenarios need to be tested. They are not added in this PR, since, as you mentioned, this one is mainly for validating the pattern.

}
t.Logf("Finished creating pod: %s", tt.name)

// Make sure pod is running
Member

This races with sinker deleting the pod. To safely test the orphaned pod case, I'd expect a PJ to be created before the pod is created, wait for the pod to start, then delete the PJ to orphan the pod. That should prevent sinker from seeing an orphaned pod until after we've confirmed the pod was successfully created.

Contributor Author

ack, done


prowjob_namespace: default
pod_namespace: test-pods
log_level: debug
Member

It would be handy to allow dumping the Prow component logs into an output dir ($ARTIFACTS in CI) so that we can more easily debug integration test failures.

Contributor Author

I assume this needs to be done manually, with something like kubectl logs svc/sinker -f > $ARTIFACTS/prowlogs/sinker &. What do you think?

Member

It can also be done with client-go, but it might be easier with kubectl. We don't need to stream it, though; as a post step, we can just dump the logs of all pods into files named after the pods, or something like that.

Contributor Author

done

Member

Yes, I'd expect logs to be dumped with kubectl at the end if $ARTIFACTS is populated (or, better yet, via a CLI arg/flag).


@chaodaiG
Contributor Author

/test pull-test-infra-integration


return *clusterContext
}

func getDefaultKubeconfig(cfg string) string {
Member

all of this is something the clientcfg.ConfigLoader already does with its default ruleset

Contributor Author

Good to learn, done

@chaodaiG
Contributor Author

/test pull-test-infra-integration

@chaodaiG
Contributor Author

/test pull-test-infra-integration

@chaodaiG
Contributor Author

@cjwagner, instead of using a build tag for the integration test, a test flag --run-integration-test was added to the test suite; if it is not provided, the tests won't run.

@chaodaiG chaodaiG force-pushed the first-integration-test-sinker branch from 576433a to f8003df Compare January 14, 2021 21:31
@chaodaiG
Contributor Author

/test pull-test-infra-integration

@chaodaiG
Contributor Author

@petr-muller, @alvaroaleman, @cjwagner, I believe I have addressed all comments. Could you take another look?

Member

@alvaroaleman alvaroaleman left a comment

/hold

It would be nice to enable reporting for the new presubmit, though, so that it appears on PRs where people explicitly triggered it.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 15, 2021
chaodaiG added a commit to chaodaiG/test-infra that referenced this pull request Jan 15, 2021
The integration test introduced in kubernetes#20451 works as expected; make it report to GitHub before making it a required presubmit.
@chaodaiG
Contributor Author

/test pull-test-infra-integration

Member

@cjwagner cjwagner left a comment

/hold cancel

overrides := clientcmd.ConfigOverrides{}
// Override the cluster name if provided.
if clusterName != "" {
overrides.Context.Cluster = clusterName
Member

I'm pretty sure overwriting the Context.Cluster would be problematic if the values actually differed since the cluster needs to be associated with the correct user (AuthInfo). That being said I don't know if the values will ever differ in practice so this might be fine anyways.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, chaodaiG, cjwagner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 603a3a0 into kubernetes:master Jan 15, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 15, 2021
@chaodaiG chaodaiG deleted the first-integration-test-sinker branch January 21, 2021 15:34
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/prow Issues or PRs related to prow cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

7 participants