Make GCI nodes mount non tmpfs, ext* & bind mounts using an external mounter #36267

Merged 6 commits on Nov 9, 2016

Contributor

@vishh vishh commented Nov 4, 2016

This PR downloads the stage1 & gci-mounter ACIs as part of cluster bring up instead of downloading them dynamically from gcr.io, which was the cause for #36206.

I have also optimized the containerized mounter to pre-load the mounter image once to avoid fetch latency while using it.

Original PR which got reverted: #35821

GCI nodes use an external mounter script to mount NFS & GlusterFS storage volumes

@mtaufen Node e2e is not re-enabled in this PR.

cc @jingxu97



@vishh vishh added release-note Denotes a PR that will be considered when it comes time to generate release notes. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 4, 2016
@vishh vishh added this to the v1.5 milestone Nov 4, 2016
@@ -185,7 +192,7 @@ function install-kube-binary-config {
chmod -R 755 "${kube_bin}"

# Install rkt binary to allow mounting storage volumes in GCI
install-rkt
install-gci-mounter-tools
Contributor

change the comment too?

Contributor Author

Done

Contributor

jingxu97 commented Nov 4, 2016

@mtaufen could you also help review this PR? Thanks!

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 4, 2016
@vishh vishh force-pushed the gci-mounter-scope branch 2 times, most recently from 835fbd0 to ee30af7 Compare November 4, 2016 23:19
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Nov 4, 2016
Contributor

jingxu97 commented Nov 7, 2016

@k8s-bot gci gke e2e test this

Contributor

mtaufen commented Nov 7, 2016

For the following tests the cluster fails to initialize within 300 seconds:

  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-gce-etcd3
  • pull-kubernetes-e2e-gce-gci
  • pull-kubernetes-kubemark-e2e-gce

The verification test wants us to run ./hack/update_owners.py:

I1107 08:42:07.156] Verifying hack/make-rules/../../hack/verify-test-owners.sh
I1107 08:42:08.482] # OUTDATED TESTS (6):
I1107 08:42:08.482] Generated release_1_5 clientset should create v2alpha1 scheduleJobs, delete scheduleJobs, watch scheduleJobs -- caesarxuchao
I1107 08:42:08.483] ScheduledJob should not emit unexpected warnings -- soltysh
I1107 08:42:08.483] ScheduledJob should not schedule jobs when suspended -- soltysh
I1107 08:42:08.483] ScheduledJob should not schedule new jobs when ForbidConcurrent -- soltysh
I1107 08:42:08.483] ScheduledJob should replace jobs when ReplaceConcurrent -- soltysh
I1107 08:42:08.483] ScheduledJob should schedule multiple jobs concurrently -- soltysh
I1107 08:42:08.483] # NEW TESTS (11):
I1107 08:42:08.484] CronJob should not emit unexpected warnings
I1107 08:42:08.484] CronJob should not schedule jobs when suspended
I1107 08:42:08.484] CronJob should not schedule new jobs when ForbidConcurrent
I1107 08:42:08.484] CronJob should replace jobs when ReplaceConcurrent
I1107 08:42:08.484] CronJob should schedule multiple jobs concurrently
I1107 08:42:08.484] Federation daemonsets DaemonSet objects should be created and deleted successfully
I1107 08:42:08.485] Federation deployments Deployment objects should be created and deleted successfully
I1107 08:42:08.485] Federation deployments Federated Deployment should create and update matching deployments in underling clusters
I1107 08:42:08.485] Generated release_1_5 clientset should create v2alpha1 cronJobs, delete cronJobs, watch cronJobs
I1107 08:42:08.485] Kubectl alpha client Kubectl run CronJob should create a CronJob
I1107 08:42:08.485] k8s.io/kubernetes/test/e2e_node/system
I1107 08:42:08.485] 
I1107 08:42:08.486] ERROR: the test list has changed
I1107 08:42:08.489] Run ./hack/update_owners.py to fix it

These two also timed out waiting for cluster initialization:

  • pull-kubernetes-e2e-gke
W1107 08:51:33.535] ERROR: (gcloud.container.clusters.create) Operation [<Operation
W1107 08:51:33.535]  name: u'operation-1478536254707-2383fe35'
W1107 08:51:33.535]  operationType: OperationTypeValueValuesEnum(CREATE_CLUSTER, 1)
W1107 08:51:33.535]  selfLink: u'https://test-container.sandbox.googleapis.com/v1/projects/154064302321/zones/us-central1-f/operations/operation-1478536254707-2383fe35'
W1107 08:51:33.535]  status: StatusValueValuesEnum(DONE, 3)
W1107 08:51:33.535]  statusMessage: u'Timed out waiting for cluster initialization. Cluster API may not be available.'
W1107 08:51:33.535]  targetLink: u'https://test-container.sandbox.googleapis.com/v1/projects/154064302321/zones/us-central1-f/clusters/e2e-gke-agent-pr-20-0'
W1107 08:51:33.536]  zone: u'us-central1-f'>] finished with error: Timed out waiting for cluster initialization. Cluster API may not be available.
  • pull-kubernetes-e2e-gke-gci
W1107 09:59:11.300] ERROR: (gcloud.container.clusters.create) Operation [<Operation
W1107 09:59:11.301]  name: u'operation-1478540312573-574d1447'
W1107 09:59:11.301]  operationType: OperationTypeValueValuesEnum(CREATE_CLUSTER, 1)
W1107 09:59:11.301]  selfLink: u'https://test-container.sandbox.googleapis.com/v1/projects/943316867058/zones/us-central1-f/operations/operation-1478540312573-574d1447'
W1107 09:59:11.301]  status: StatusValueValuesEnum(DONE, 3)
W1107 09:59:11.301]  statusMessage: u'Timed out waiting for cluster initialization. Cluster API may not be available.'
W1107 09:59:11.302]  targetLink: u'https://test-container.sandbox.googleapis.com/v1/projects/943316867058/zones/us-central1-f/clusters/e2e-gke-agent-pr-82-0'
W1107 09:59:11.302]  zone: u'us-central1-f'>] finished with error: Timed out waiting for cluster initialization. Cluster API may not be available.

}

# Garbage collect old rkt containers on exit
trap gc EXIT

${RKT_BINARY} run --stage1-name="coreos.com/rkt/stage1-fly:1.18.0" \
if [[ ! $(${RKT_BINARY} image list | grep ${MOUNTER_IMAGE}) ]]; then
${RKT_BINARY} fetch file://${MOUNTER_ACI}
Contributor

@vishh this rkt fetch does not work because of a signature issue. I found a way to make it work by adding the --insecure-options option:

${RKT_BINARY} fetch --insecure-options=image file://${MOUNTER_ACI}

Contributor Author

Oops. Fixing it now.

Contributor Author

Fixed
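
For readers following this thread, the pre-load step with the suggested fix applied might look roughly like the sketch below. It is an illustration under stated assumptions, not the code that was merged: the function name `preload_mounter_image` and the default values of `MOUNTER_IMAGE` and `MOUNTER_ACI` are placeholders. Only the `image list` check and the `fetch --insecure-options=image` call come from the discussion above.

```shell
#!/usr/bin/env bash
# Placeholder defaults (assumptions); the real script sets these elsewhere.
RKT_BINARY="${RKT_BINARY:-rkt}"
MOUNTER_IMAGE="${MOUNTER_IMAGE:-gci-mounter}"
MOUNTER_ACI="${MOUNTER_ACI:-/tmp/gci-mounter.aci}"

# Hypothetical helper: load the mounter ACI into the local rkt store once,
# so later mount calls do not pay the fetch latency.
preload_mounter_image() {
  # Nothing to do if the image is already listed in the local store.
  if "${RKT_BINARY}" image list | grep -q -- "${MOUNTER_IMAGE}"; then
    return 0
  fi
  # The ACI is a local file without a signature, so image verification
  # is disabled for this fetch, as suggested in the review comment.
  "${RKT_BINARY}" fetch --insecure-options=image "file://${MOUNTER_ACI}"
}
```

Calling `preload_mounter_image` before the first mount would fetch the image at most once. Using `grep -q` in the `if` condition avoids the command substitution inside `[[ ... ]]` while keeping the same check.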

@k8s-ci-robot
Contributor

Jenkins GCI GKE smoke e2e failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCI GCE e2e failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCE etcd3 e2e failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins Kubemark GCE e2e failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GKE smoke e2e failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins unit/integration failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCE Node e2e failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot node e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCE e2e failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2016
@k8s-ci-robot
Contributor

Jenkins verification failed for commit 96e23163d7c90191438e53fb739503bfe3e16917. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Added a make rule `make upload` to audit and automate release artifact
uploads to GCS.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Contributor

mtaufen commented Nov 8, 2016

WRT verification test, you probably need to:

  • rebase. I'm guessing the error `type *KubeletConfiguration has no field or method ExperimentalRuntimeIntegrationType` occurs because that field was renamed to EnableCRI in dcce768
  • run hack/update-codecgen.sh

Update the gci-mounter sha1 number
…ring runtime

Signed-off-by: Vishnu kannan <vishnuk@google.com>
Signed-off-by: Vishnu kannan <vishnuk@google.com>
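
The sha1 update in the commit above suggests that the pre-downloaded ACIs are pinned by checksum. A minimal sketch of such a check follows, assuming `sha1sum` from coreutils is available on the node. The function `verify_sha1` and its arguments are hypothetical names for illustration and are not taken from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-1 digest against an expected value.
# Usage: verify_sha1 <file> <expected-hex-digest>
verify_sha1() {
  local file="$1"
  local expected="$2"
  local actual
  # sha1sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha1sum "${file}" | awk '{print $1}')"
  if [[ "${actual}" != "${expected}" ]]; then
    echo "sha1 mismatch for ${file}: expected ${expected}, got ${actual}" >&2
    return 1
  fi
}
```

A bring-up script could call this right after downloading each ACI and abort on a non-zero return, so a stale pinned digest fails fast instead of surfacing later as a mount error.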
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2016
Contributor

jingxu97 commented Nov 8, 2016

@k8s-bot node e2e test this

Contributor

jingxu97 commented Nov 8, 2016

@k8s-bot gci gce e2e test this

Contributor

jingxu97 commented Nov 8, 2016

@k8s-bot gci gke e2e test this

Contributor Author

vishh commented Nov 8, 2016

@jingxu97 @mtaufen @saad-ali this PR is ready to be merged.

Contributor

jingxu97 commented Nov 8, 2016

The tests passed. I reran a few just to make sure. @mtaufen, do you have any other comments? Otherwise, LGTM.

@jingxu97 jingxu97 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 9, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 6983262 into kubernetes:master Nov 9, 2016
Contributor Author

vishh commented Nov 9, 2016

@saad-ali @jingxu97 Is this expected to be cherry-picked into v1.5?

Member

saad-ali commented Nov 9, 2016

> @saad-ali @jingxu97 Is this expected to be cherry-picked into v1.5?

No cherry pick needed. We will fast forward the release-1.5 branch to pick up changes before the code freeze is lifted.

dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
7 participants