Windows tests failing on GCE with "Couldn't delete ns" timeouts #76666

pjh · 2019-04-16T18:24:35Z

Which jobs are failing:
https://testgrid.k8s.io/google-windows#windows-gce&width=20&show-stale-tests=
https://testgrid.k8s.io/google-windows#windows-prototype&width=20&show-stale-tests=
https://testgrid.k8s.io/google-windows#windows-gce-1.14&width=20&show-stale-tests=

These test jobs started timing out on 04-15. We believe this is due to a change in the Windows image that is being used for the Windows VMs; yesterday the default image changed from windows-server-1809-dc-core-for-containers-v20190312 to windows-server-1809-dc-core-for-containers-v20190411.

Note that the failure in the windows-gce job was obfuscated by a Docker API version problem which was fixed by #76621.

The text was updated successfully, but these errors were encountered:

pjh · 2019-04-16T18:25:01Z

/assign peterhornyack
/cc yujuhong
/sig windows

k8s-ci-robot · 2019-04-16T18:25:03Z

@pjh: GitHub didn't allow me to assign the following users: peterhornyack.

Note that only kubernetes members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign peterhornyack
/cc yujuhong
/sig windows

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pjh · 2019-04-16T18:25:14Z

/assign pjh

pjh · 2019-04-16T18:35:07Z

Version information for our March Windows image:

2019/04/15 17:38:33 windows-startup-script-ps1: GCE Windows image:
2019/04/15 17:38:33 windows-startup-script-ps1: projects/windows-cloud/global/images/windows-server-1809-dc-core-for-containers-v20190312

2019/04/15 17:38:33 windows-startup-script-ps1: Source        Description      HotFixID      InstalledBy          InstalledOn              
2019/04/15 17:38:33 windows-startup-script-ps1: ------        -----------      --------      -----------          -----------              
2019/04/15 17:38:33 windows-startup-script-ps1: E2E-B9585E... Update           KB4486553     NT AUTHORITY\SYSTEM  3/13/2019 12:00:00 AM    
2019/04/15 17:38:33 windows-startup-script-ps1: E2E-B9585E... Security Update  KB4470788     NT AUTHORITY\SYSTEM  3/12/2019 12:00:00 AM    
2019/04/15 17:38:33 windows-startup-script-ps1: E2E-B9585E... Update           KB4480056     NT AUTHORITY\SYSTEM  3/12/2019 12:00:00 AM    
2019/04/15 17:38:33 windows-startup-script-ps1: E2E-B9585E... Security Update  KB4489899     NT AUTHORITY\SYSTEM  3/13/2019 12:00:00 AM

Version information for our April Windows image (captured manually since the timed-out test runs are failing to grab the artifacts from the Windows nodes):

PS C:\Users\peterhornyack> $client = New-Object Net.WebClient
PS C:\Users\peterhornyack> $client.Headers.Add('Metadata-Flavor', 'Google')
PS C:\Users\peterhornyack> $client.DownloadString("http://metadata.google.intern
al/computeMetadata/v1/instance/image").Trim()
projects/windows-cloud/global/images/windows-server-1809-dc-core-for-containers-
v20190411
PS C:\Users\peterhornyack> Get-Hotfix | Out-String 

Source        Description      HotFixID      InstalledBy          InstalledOn   
------        -----------      --------      -----------          -----------   
E2E-TEST-P... Update           KB4486553     NT AUTHORITY\SYSTEM  4/12/2019 ... 
E2E-TEST-P... Update           KB4480056     NT AUTHORITY\SYSTEM  4/12/2019 ... 
E2E-TEST-P... Security Update  KB4493510     NT AUTHORITY\SYSTEM  4/12/2019 ... 
E2E-TEST-P... Security Update  KB4493509     NT AUTHORITY\SYSTEM  4/12/2019 ...

pjh · 2019-04-16T18:35:51Z

cc @yujuhong

pjh · 2019-04-16T18:38:09Z

Example failing test run: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-windows-gce-k8sbeta/1117867576618127362. There are about 18 failures that look like this:

Apr 15 19:37:56.069: INFO: Couldn't delete ns: "pods-2328": namespace pods-2328 was not deleted with limit: timed out waiting for the condition, pods remaining: 1 (&errors.errorString{s:"namespace pods-2328 was not deleted with limit: timed out waiting for the condition, pods remaining: 1"})

These numerous failures are causing the test run to hit the 2-hour timeout.

pjh · 2019-04-16T18:53:13Z

cc @PatrickLang @dineshgovindasamy

We're seeing test failures that correlate with us starting to use a new Windows Server 1809 image. Previously we were running our 1809 nodes with these hotfixes: KB4470788, KB4489899. Yesterday we started running with those hotfixes removed and these hotfixes applied instead: KB4493510, KB4493509. All of these KBs are described as Security Updates.

Does anyone have any ideas why these hotfix changes seem to be causing these failures?: Couldn't delete ns: "pods-2328": namespace pods-2328 was not deleted with limit: timed out waiting for the condition.

pjh · 2019-04-16T20:28:53Z

Dinesh looked at the OS version differences and did not find anything obviously related to namespace or endpoint deletion. Thanks a lot for checking.

Another thing that may have changed when our image changed is the Docker version that's installed. Looking into this now.

pjh · 2019-04-16T21:25:13Z

Our Docker version changed from Docker version 18.09.3, build 142dfcedca to Docker version 18.09.5, build be4553c277. This seems like the probable cause of the behavior change but I haven't confirmed this.

Working on gathering HNS etc. traces from a Windows node while reproducing the failure.

yujuhong · 2019-04-16T22:17:44Z

What we are seeing is that pause containers sometimes cannot be terminated:

failed to terminate hccshim container [container=1d39bdf3f6b6ab167742163e85ba2d41dca4d4bda0b87fcf749e8e35375f5d0b error=container 1d39bdf3f6b6ab167742163e85ba2d41dca4d4bda0b87fcf749e8e35375f5d0b encountered an error during Term
inate: hcsshim: the handle has already been closed process=init pid=14648 signal=9 namespace=moby module=libcontainerd]

This doesn't always happen. Some tests still pass.

yujuhong · 2019-04-16T22:39:56Z

We are having a hard time stopping the pause container (e2eteam/pause:3.1) in GCE with the latest Windows server image.

Kubelet is full of timeout messages:

Container "docker://1d39bdf3f6b6ab167742163e85ba2d41dca4d4bda0b87fcf749e8e35375f5d0b" termination failed with gracePeriod 2: rpc error: code = Unknown desc = operation timeout: context deadline exceeded

Docker log is full of errors:

failed to terminate hccshim container [container=1d39bdf3f6b6ab167742163e85ba2d41dca4d4bda0b87fcf749e8e35375f5d0b error=container 1d39bdf3f6b6ab167742163e85ba2d41dca4d4bda0b87fcf749e8e35375f5d0b encountered an error during Term
inate: hcsshim: the handle has already been closed process=init pid=14648 signal=9 namespace=moby module=libcontainerd]

When I tried stopping the pause container manually with docker stop, the command basically hangs.

UPDATE: The command finished after 10min, and the container was stopped successfully

@PatrickLang Do you have any idea why this may have happened?

pjh · 2019-04-16T22:57:06Z

I've uploaded test artifacts from a repro of the issue at https://github.com/pjh/kubernetes/tree/76666-timeouts/e2e-artifacts-timeout-repro-1/e2e-test-peterhornyack-windows-node-group-31ht. That directory contains an HNS trace that I captured using the commands shared at #75421 (comment).

I didn't manage to collect logs using the collectlogs.ps1 script yet.

pjh · 2019-04-17T00:56:09Z

Update: I reproduced the error again, this time with collectlogs.ps1 output from before and after running the e2e test. Artifacts are here, repro commands for posterity here.

pjh · 2019-04-17T01:16:21Z

Also, I was able to reproduce the issue in 3 out of 5 manual runs with --ginkgo.focus="should\sserve\smultiport\sendpoints\sfrom\spods".

madhanrm · 2019-04-17T01:36:34Z

I0417 00:18:25.405448 1524 fake_cpu_manager.go:45] [fake cpumanager] RemoveContainer (container id: c62d22d668a805bb2138d5387c7baa1c5d427cfc4e0ab15ceb54ff7df001fd93) => [Kubelet] Container is removed or attempted to be stopped

5735 None 2019-04-16T17:18:26.4763887 0.0000269 3992 1972 Microsoft_Windows_HyperV_Compute ContainerStopped: containerId=c62d22d668a805bb2138d5387c7baa1c5d427cfc4e0ab15ceb54ff7df001fd93,peakMemoryUsageKB=74739712,processCount=21,reason=0,runtimeMs=5545,type=WindowsContainer => [VmCompute] Container is stopped.

The container stats is getting queried for ever. Looks like it is not getting cleaned up.

E0417 00:20:25.064542 1524 remote_runtime.go:250] StopContainer "c62d22d668a805bb2138d5387c7baa1c5d427cfc4e0ab15ceb54ff7df001fd93" from runtime service failed: rpc error: code = Unknown desc = operation timeout: context deadline exceeded => [Kubelet] Docker client times out. Dockerd logs doesn't have time stamp, so it is difficult to sequence it with other logs.

[DockerD]
Container c62d22d668a805bb2138d5387c7baa1c5d427cfc4e0ab15ceb54ff7df001fd93 is not running
Container c62d22d668a805bb2138d5387c7baa1c5d427cfc4e0ab15ceb54ff7df001fd93 failed to exit within 30 seconds of signal 15 - using the force

madhanrm · 2019-04-17T01:44:46Z

the handle has already been closed

This is coming from GitHub
https://github.com/Microsoft/hcsshim/blob/4e93834c8eda905fd06c8d71e6f982faa84294a9/internal/hcs/system.go#L331

yujuhong · 2019-04-17T01:56:08Z

Yep, it's from the hcsshim. That's why I filed an issue there to see if anyone could help shed some light on why we're getting this error frequently: microsoft/hcsshim#567

pjh · 2019-04-17T16:29:57Z

Forcing the Docker version to be 18.09.3 (instead of 18.09.5) with our April 1809 image did not help with this issue. However, pinning our Windows nodes back to our 1809 image from March did restore our tests: https://testgrid.k8s.io/google-windows#windows-prototype&width=20.

This seems to indicate that the problem is in HCS or hcsshim, as Madhan and Yu-Ju noted above.

@PatrickLang @dineshgovindasamy @madhanrm @daschott how can I tell what has changed within HCS / hcsshim between the Windows 1809 images that we generated in March and April? We currently dump Get-Hotfix output when bringing up our nodes (https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/windows/k8s-node-setup.psm1#L142) to aid in diagnosing issues like this, but it sounds like the KBs that we have installed or not installed do not capture the state of HCS. What additional commands can we run that will tell us if something changed in HCS or hcsshim?

This is to work around kubernetes#76666.

daschott · 2019-05-18T01:19:31Z

@pjh can you try
cmd.exe /c wusa.exe /uninstall /KB4486553 /norestart

You will probably have to restart, even with the /norestart.

EDIT: Syntax may be KB:4486553,kb:4486553, kb4486553, or just 4486553, I can't quite remember but you should know when it passes through :)

pjh · 2019-05-20T18:04:45Z

You will probably have to restart, even with the /norestart.

Understood, I'm just restarting manually a bit later on.

I tried running cmd.exe /c wusa.exe /uninstall /kb:4486553 /quiet /norestart but that doesn't work either. Here's what I found:

None of these do anything when I run them in my PowerShell script - the hotfix is still present after a reboot:

& wusa /uninstall /kb:4486553 /quiet /norestart
& cmd.exe /c wusa.exe /uninstall /kb:4486553 /quiet /norestart
& wusa /uninstall /kb:4486553 /norestart

This command hangs forever, presumably waiting for a confirmation that never comes:

& cmd.exe /c wusa.exe /uninstall /kb:4486553 /norestart

Any other ideas for a workaround? Thanks.

PatrickLang · 2019-05-21T09:31:43Z

is KB4486553 the right KB? That's a .Net update, I don't think it includes Windows changes?

pjh · 2019-05-21T18:14:52Z

I thought @madhanrm said it was KB4486553, but maybe I'm misunderstanding.

* test: remove k8s.io/apiextensions-apiserver from framework There are two reason why this is useful: 1. less code to vendor into external users of the framework The following dependencies become obsolete due to this change (from `dep`): (8/23) Removed unused project github.com/grpc-ecosystem/go-grpc-prometheus (9/23) Removed unused project github.com/coreos/etcd (10/23) Removed unused project github.com/globalsign/mgo (11/23) Removed unused project github.com/go-openapi/strfmt (12/23) Removed unused project github.com/asaskevich/govalidator (13/23) Removed unused project github.com/mitchellh/mapstructure (14/23) Removed unused project github.com/NYTimes/gziphandler (15/23) Removed unused project gopkg.in/natefinch/lumberjack.v2 (16/23) Removed unused project github.com/go-openapi/errors (17/23) Removed unused project github.com/go-openapi/analysis (18/23) Removed unused project github.com/go-openapi/runtime (19/23) Removed unused project sigs.k8s.io/structured-merge-diff (20/23) Removed unused project github.com/go-openapi/validate (21/23) Removed unused project github.com/coreos/go-systemd (22/23) Removed unused project github.com/go-openapi/loads (23/23) Removed unused project github.com/munnerz/goautoneg 2. works around kubernetes#75338 which currently breaks vendoring Some recent changes to crd_util.go must now be pulling in the broken k8s.io/apiextensions-apiserver packages, because it was still working in revision 2e90d92 (as demonstrated by https://github.com/intel/pmem-CSI/tree/586ae281ac2810cb4da6f1e160cf165c7daf0d80). * update Bazel files * test: fix golint warnings in crd_util.go Because the code was moved, golint is now active. Because users of the code must adapt to the new location of the code, it makes sense to also change the API at the same time to address the style comments from golint ("struct field ApiGroup should be APIGroup", same for ApiExtensionClient). * fix race condition issue for smb mount on windows change var name * stop vsphere cloud provider from spamming logs with `failed to patch IP` Fixes: kubernetes#75236 * Remove reference to USE_RELEASE_NODE_BINARIES. This variable was used for development purposes and was accidentally introduced in kubernetes@f0f7829. This is its only use in the tree: https://github.com/kubernetes/kubernetes/search?q=USE_RELEASE_NODE_BINARIES&unscoped_q=USE_RELEASE_NODE_BINARIES * Clear conntrack entries on 0 -> 1 endpoint transition with externalIPs As part of the endpoint creation process when going from 0 -> 1 conntrack entries are cleared. This is to prevent an existing conntrack entry from preventing traffic to the service. Currently the system ignores the existance of the services external IP addresses, which exposes that errant behavior This adds the externalIP addresses of udp services to the list of conntrack entries that get cleared. Allowing traffic to flow Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com> * Move to golang 1.12.1 official image We used 1.12.0 + hack to download 1.12.1 binaries as we were in a rush on friday since the images were not published at that time. Let's remove the hack now and republish the kube-cross image Change-Id: I3ffff3283b6ca755320adfca3c8f4a36dc1c2b9e * fix-kubeadm-init-output * Mark audit e2e tests as flaky * Bump kube-cross image to 1.12.1-2 * Restore username and password kubectl flags * build/gci: bump CNI version to 0.7.5 * Add/Update CHANGELOG-1.14.md for v1.14.0-rc.1. * Restore machine readability to the print-join-command output The output of `kubeadm token create --print-join-command` should be usable by batch scripts. This issue was pointed out in: kubernetes/kubeadm#1454 * bump required minimum go version to 1.12.1 (strings package compatibility) * Bump go-openapi/jsonpointer and go-openapi/jsonreference versions xref: kubernetes#75653 Signed-off-by: Jorge Alarcon Ochoa <alarcj137@gmail.com> * Kubernetes version v1.14.1-beta.0 openapi-spec file updates * Add/Update CHANGELOG-1.14.md for v1.14.0. * 1.14 release notes fixes * Add flag to enable strict ARP * Do not delete existing VS and RS when starting * Update Cluster Autscaler version to 1.14.0 No changes since 1.14.0-beta.2 Changelog: https://github.com/kubernetes/autoscaler/releases/tag/cluster-autoscaler-1.14.0 * Fix Windows to read VM UUIDs from serial numbers Certain versions of vSphere do not have the same value for product_uuid and product_serial. This mimics the change in kubernetes#59519. Fixes kubernetes#74888 * godeps: update vmware/govmomi to v0.20 release * vSphere: add token auth support for tags client SAML auth support for the vCenter rest API endpoint came to govmomi a bit after Zone support came to vSphere Cloud Provider. Fixes kubernetes#75511 * vsphere: govmomi rest API simulator requires authentication * gce: configure: validate SA has storage scope If the VM SA doesn't have storage scope associated, don't use the token in the curl request or the request will fail with 403. * fix-external-etcd * Update gcp images with security patches [stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes. [fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes. [fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [metatada-proxy addon] Bump prometheus-to-sd v0.5.0 to pick up security fixes. * kubeadm: fix "upgrade plan" not working without k8s version If the k8s version argument passed to "upgrade plan" is missing the logic should perform the following actions: - fetch a "stable" version from the internet. - if that fails, fallback to the local client version. Currentely the logic fails because the cfg.KubernetesVersion is defaulted to the version of the existing cluster, which then causes an early exit without any ugprade suggestions. See app/cmd/upgrade/common.go::enforceRequirements(): configutil.FetchInitConfigurationFromCluster(..) Fix that by passing the explicit user value that can also be "". This will then make the "offline getter" treat it as an explicit desired upgrade target. In the future it might be best to invert this logic: - if no user k8s version argument is passed - default to the kubeadm version. - if labels are passed (e.g. "stable"), fetch a version from the internet. * Disable GCE agent address management on Windows nodes. With this metadata key set, "GCEWindowsAgent: GCE address manager status: disabled" will appear in the VM's serial port output during boot. Tested: PROJECT=${CLOUDSDK_CORE_PROJECT} KUBE_GCE_ENABLE_IP_ALIASES=true NUM_WINDOWS_NODES=2 NUM_NODES=2 KUBERNETES_NODE_PLATFORM=windows go run ./hack/e2e.go -- --up cluster/gce/windows/smoke-test.sh cat > iis.yaml <<EOF apiVersion: v1 kind: Pod metadata: name: iis labels: app: iis spec: containers: - image: mcr.microsoft.com/windows/servercore/iis imagePullPolicy: IfNotPresent name: iis-server ports: - containerPort: 80 protocol: TCP nodeSelector: beta.kubernetes.io/os: windows tolerations: - effect: NoSchedule key: node.kubernetes.io/os operator: Equal value: windows1809 EOF kubectl create -f iis.yaml kubectl expose pod iis --type=LoadBalancer --name=iis kubectl get services curl http://<service external IP address> * kube-aggregator: bump openapi aggregation log level * Explicitly flush headers when proxying * fix-kubeadm-upgrade-12-13-14 * GCE/Windows: disable stackdriver logging agent The logging service could not be stopped at times, causing node startup failures. Disable it until the issue is fixed. * Finish saving test results on failure The conformance image should be saving its results regardless of the results of the tests. However, with errexit set, when ginkgo gets test failures it exits 1 which prevents saving the results for Sonobuoy to pick up. Fixes: kubernetes#76036 * Avoid panic in cronjob sorting This change handles the case where the ith cronjob may have its start time set to nil. Previously, the Less method could cause a panic in case the ith cronjob had its start time set to nil, but the jth cronjob did not. It would panic when calling Before on a nil StartTime. * Removed cleanup for non-current kube-proxy modes in newProxyServer() * Depricated --cleanup-ipvs flag in kube-proxy * Fixed old function signature in kube-proxy tests. * Revert "Deprecated --cleanup-ipvs flag in kube-proxy" This reverts commit 4f1bb2b. * Revert "Fixed old function signature in kube-proxy tests." This reverts commit 29ba1b0. * Fixed --cleanup-ipvs help text * Check for required name parameter in dynamic client The Create, Delete, Get, Patch, Update and UpdateStatus methods in the dynamic client all expect the name parameter to be non-empty, but did not validate this requirement, which could lead to a panic. Add explicit checks to these methods. * Fix empty array expansion error in cluster/gce/util.sh Empty array expansion causes "unbound variable" error in bash 4.2 and bash 4.3. * Improve volume operation metrics * Add e2e tests * ensuring that logic is checking for differences in listener * Kubernetes version v1.14.2-beta.0 openapi-spec file updates * Delete only unscheduled pods if node doesn't exist anymore. * Add/Update CHANGELOG-1.14.md for v1.14.1. * Use Node-Problem-Detector v0.6.3 on GCI * proxy: Take into account exclude CIDRs while deleting legacy real servers * kubeadm: Don't error out on join with --cri-socket override In the case where newControlPlane is true we don't go through getNodeRegistration() and initcfg.NodeRegistration.CRISocket is empty. This forces DetectCRISocket() to be called later on, and if there is more than one CRI installed on the system, it will error out, while asking for the user to provide an override for the CRI socket. Even if the user provides an override, the call to DetectCRISocket() can happen too early and thus ignore it (while still erroring out). However, if newControlPlane == true, initcfg.NodeRegistration is not used at all and it's overwritten later on. Thus it's necessary to supply some default value, that will avoid the call to DetectCRISocket() and as initcfg.NodeRegistration is discarded, setting whatever value here is harmless. Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com> * Bump coreos/go-semver The https://github.com/coreos/go-semver/ dependency has formally release v0.3.0 at commit e214231b295a8ea9479f11b70b35d5acf3556d9b. This is the commit point we've been using, but the hack/verify-godeps.sh script notices the discrepancy and causes ci-kubernetes-verify job to fail. Fixes: kubernetes#76526 Signed-off-by: Tim Pepper <tpepper@vmware.com> * Fix Azure SLB support for multiple backend pools Azure VM and vmssVM support multiple backend pools for the same SLB, but not for different LBs. * Restore metrics-server using of IP addresses This preference list matches is used to pick prefered field from k8s node object. It was introduced in metrics-server 0.3 and changed default behaviour to use DNS instead of IP addresses. It was merged into k8s 1.12 and caused breaking change by introducing dependency on DNS configuration. * refactor detach azure disk retry operation * move disk lock process to azure cloud provider fix comments fix import keymux check error add unit test for attach/detach disk funcs * Fix concurrent map access in Portworx create volume call Fixes kubernetes#76340 Signed-off-by: Harsh Desai <harsh@portworx.com> * Fix race condition between actual and desired state in kublet volume manager This PR fixes the issue kubernetes#75345. This fix modified the checking volume in actual state when validating whether volume can be removed from desired state or not. Only if volume status is already mounted in actual state, it can be removed from desired state. For the case of mounting fails always, it can still work because the check also validate whether pod still exist in pod manager. In case of mount fails, pod should be able to removed from pod manager so that volume can also be removed from desired state. * fix validation message: apiServerEndpoints -> apiServerEndpoint * add shareName param in azure file storage class skip create azure file if it exists * Update Cluster Autoscaler to 1.14.2 * Create the "internal" firewall rule for kubemark master. This is equivalent to the "internal" firewall rule that is created for the regular masters. The main reason for doing it is to allow prometheus scraping metrics from various kubemark master components, e.g. kubelet. Ref. kubernetes/perf-tests#503 * fix disk list corruption issue * Restrict builds to officially supported platforms Prior to this change, including windows/amd64 in KUBE_BUILD_PLATFORMS would, for example, attempt to build the server binaries/tars/images for Windows, which is not supported. This can break downstream build steps. * Fix verify godeps failure github.com/evanphx/json-patch added a new tag at the same sha this morning: https://github.com/evanphx/json-patch/releases/tag/v4.2.0 This confused godeps. This PR updates our file to match godeps expectation. Fixes issue 77238 * Upgrade Stackdriver Logging Agent addon image from 1.6.0 to 1.6.8. * Test kubectl cp escape * Properly handle links in tar * Bump debian-iptables versions to v11.0.2. * os exit when option is true * Pin GCE Windows node image to 1809 v20190312. This is to work around kubernetes#76666. * Update the dynamic volume limit in GCE PD Currently GCE PD support 128 maximum disks attached to a node for all machines types except shared-core. This PR updates the limit number to date. Change-Id: Id9dfdbd24763b6b4138935842c246b1803838b78 * Use consistent imageRef during container startup * Replace vmss update API with instance-level update API commit * Cleanup codes that not required any more * Add unit tests * Upgrade compute API to version 2019-03-01 * Update vendors * Fix issues because of rebase * Pick up security patches for fluentd-gcp-scaler by upgrading to version 0.5.2 * Short-circuit quota admission rejection on zero-delta updates * Accept admission request if resource is being deleted * Error when etcd3 watch finds delete event with nil prevKV * Bump addon-manager to v9.0.1 - Rebase image on debian-base:v1.0.0. * Remove terminated pod from summary api. Signed-off-by: Lantao Liu <lantaol@google.com> * Expect the correct object type to be removed * check if Memory is not nil for container stats * Update to go 1.12.4 * Update to go 1.12.5 * Some remaining fixes.

PatrickLang · 2019-05-22T07:55:34Z

http://www.catalog.update.microsoft.com/Search.aspx?q=KB4497934%20 - I think this is the right fix.

pjh · 2019-05-22T20:36:32Z

Yes, thanks everyone @ Microsoft for getting this fix out. I've just started installing it on some of our Windows test clusters, I'll update + close this bug when I can confirm the problem is resolved.

pjh · 2019-05-24T19:51:21Z

Unfortunately I'm still seeing "Couldn't delete ns" in our test results after enabling the update with the fix. The failures don't seem to be as frequent as when I initially reported this issue; they don't happen in every test run, but they have still caused some test runs to time out.

Here's an example from 2019-05-23. Five tests failed with messages like:

May 24 01:15:19.604: Couldn't delete ns: "windows-volumes-8050": namespace windows-volumes-8050 was not deleted with limit: timed out waiting for the condition, pods remaining: 1 (&errors.errorString{s:"namespace windows-volumes-8050 was not deleted with limit: timed out waiting for the condition, pods remaining: 1"})

And the entire test run failed after 2 hours. In the Windows node serial output you can see KB4497934 is present the Get-Hotfix output before the Windows node joins the cluster.

Another test run from the same day finished in 1h32m without any "couldn't delete ns" failures though.

@madhanrm @dineshgovindasamy any thoughts? Do you need us to collect debug information again?

@PatrickLang is this happening in clusters on Azure even with the KB in place?

pjh · 2019-05-28T17:08:47Z

The problem isn't happening in every test run now, but maybe 1 out of 5 or so.

dineshgovindasamy · 2019-05-28T20:10:03Z

@madhanrm @nagiesek Can you guys look into this?

@pjh Can you please collect logs and send us the logs?

madhanrm · 2019-05-28T21:20:16Z

At this point, we don't know if this is failing because of HNS. We need HNS traces for the duration of the run & Kubelet logs to figure out what is happening.

pjh · 2019-05-28T22:29:09Z

Ok, I've started running manual test runs with tracing enabled, will share them when I repro the issue.

pjh · 2019-05-30T17:21:57Z

Ok, I reproduced the issue again yesterday evening, on my fourth or fifth try - the first few runs did not demonstrate the "couldn't delete ns" failure. My Windows nodes are running an image created with May's Windows updates plus KB4497934 installed.

The failing test log is here - at least a dozen tests failed with the "Couldn't delete ns" timeout message. The kubelet logs for each node are in subdirectories of the artifacts directory. Finally, I captured HNS traces (commands) on each of the three Windows nodes - those can be downloaded from these links for nodes 1vjk, 9q9v and jpxd. Note that two of the three .etl trace files hit the 250MB limit, hopefully they still contain useful data.

@madhanrm @dineshgovindasamy please let me know if that's helpful or if I can help collect any additional data.

pjh · 2019-05-30T17:35:54Z

I also gathered collectlogs.ps1 output from each of the Windows nodes and put it in the subdirectories here. It's difficult for me to automate running that script so I just ran it manually, several hours after the tests completed. Let me know if those logs contain anything useful or not.

madhanrm · 2019-05-30T18:33:42Z

Ok, I reproduced the issue again yesterday evening, on my fourth or fifth try - the first few runs did not demonstrate the "couldn't delete ns" failure. My Windows nodes are running an image created with May's Windows updates plus KB4497934 installed.
The failing test log is here - at least a dozen tests failed with the "Couldn't delete ns" timeout message. The kubelet logs for each node are in subdirectories of the artifacts directory. Finally, I captured HNS traces (commands) on each of the three Windows nodes - those can be downloaded from these links for nodes 1vjk, 9q9v and jpxd. Note that two of the three .etl trace files hit the 250MB limit, hopefully they still contain useful data.
@madhanrm @dineshgovindasamy please let me know if that's helpful or if I can help collect any additional data.

Looking

madhanrm · 2019-05-30T19:09:02Z

From: e2e.out

May 29 18:46:15.580: INFO: Couldn't delete ns: "replication-controller-217": namespace replication-controller-217 was not deleted with limit: timed out waiting for the condition, pods remaining: 1 (&errors.errorString{s:"namespace replication-controller-217 was not deleted with limit: timed out waiting for the condition, pods remaining: 1"})

From Kubelet.log - e2e-test-peterhornyack-windows-node-group-jpxd

I0530 01:33:25.651866    2420 cni.go:328] Adding replication-controller-217_pod-adoption/3a1190909b64ed6f224e47e42ba82908d87c555daddd8323b21b3023be01b4f7 to network win-bridge/l2bridge netns "none"
I0530 01:33:26.170048    2420 cni.go:334] Added replication-controller-217_pod-adoption/3a1190909b64ed6f224e47e42ba82908d87c555daddd8323b21b3023be01b4f7 to network l2bridge: IP4:{IP:{IP:10.64.1.71 Mask:ffffff00} Gateway:10.64.1.2 Routes:[]}, DNS:{Nameservers:[10.0.0.10] Domain: Search:[cluster.local] Options:[]}


I0530 01:48:13.722691    2420 fake_cpu_manager.go:45] [fake cpumanager] RemoveContainer (container id: fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5)
I0530 01:48:13.722691    2420 kuberuntime_container.go:581] Killing container "docker://fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5" with 30 second grace period

I0530 01:48:19.603129    2420 kuberuntime_container.go:587] Container "docker://fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5" exited normally


E0530 01:54:16.772196    2420 remote_runtime.go:271] RemoveContainer "fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5" from runtime service failed: rpc error: code = Unknown desc = failed to remove container "fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5": Error response from daemon: unable to remove filesystem for fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5: remove C:\ProgramData\docker\containers\fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5: The directory is not empty.
E0530 01:54:16.789808    2420 remote_runtime.go:271] RemoveContainer "fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5" from runtime service failed: rpc error: code = Unknown desc = failed to remove container "fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5": Error response from daemon: unable to remove filesystem for fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5: GetFileAttributesEx C:\ProgramData\docker\containers\fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5\fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5-json.log: Access is denied.

I0530 01:54:16.754630    2420 kubelet_pods.go:917] Pod "pod-adoption_replication-controller-217(6d00fa10-c0dd-4126-a9ab-96b91853e227)" is terminated, but some containers have not been cleaned up: {ID:{Type:docker ID:fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5} Name:pod-adoption State:exited CreatedAt:2019-05-30 01:33:26.3229959 +0000 GMT StartedAt:2019-05-30 01:33:28.4812663 +0000 GMT FinishedAt:2019-05-30 01:48:19.1602423 +0000 GMT ExitCode:-1073741077 Image:e2eteam/nginx:1.14-alpine ImageID:docker-pullable://e2eteam/nginx@sha256:31014429d17373c204a65373c84517d0a45be53bc522c0c49eba01464aee7a14 Hash:683191760 RestartCount:0 Reason:Error Message:} 

E0530 01:54:16.772196    2420 remote_runtime.go:271] RemoveContainer "fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5" from runtime service failed: rpc error: code = Unknown desc = failed to remove container "fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5": Error response from daemon: unable to remove filesystem for fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5: remove C:\ProgramData\docker\containers\fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5: The directory is not empty.


I0530 01:54:51.370901    2420 fake_cpu_manager.go:45] [fake cpumanager] RemoveContainer (container id: fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5)
I0530 01:54:51.370901    2420 kuberuntime_container.go:581] Killing container "docker://fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5" with 30 second grace period
I0530 01:54:51.371876    2420 kuberuntime_container.go:587] Container "docker://fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5" exited normally
I0530 01:54:51.371876    2420 kuberuntime_container.go:802] Removing container "fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5"
I0530 01:54:51.371876    2420 fake_cpu_manager.go:45] [fake cpumanager] RemoveContainer (container id: fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5)

I0530 01:54:53.672026    2420 kubelet_pods.go:917] Pod "pod-adoption_replication-controller-217(6d00fa10-c0dd-4126-a9ab-96b91853e227)" is terminated, but some containers have not been cleaned up: {ID:{Type:docker ID:fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5} Name:pod-adoption State:exited CreatedAt:2019-05-30 01:33:26.3229959 +0000 GMT StartedAt:2019-05-30 01:33:28.4812663 +0000 GMT FinishedAt:2019-05-30 01:48:19.1602423 +0000 GMT ExitCode:-1073741077 Image:e2eteam/nginx:1.14-alpine ImageID:docker-pullable://e2eteam/nginx@sha256:31014429d17373c204a65373c84517d0a45be53bc522c0c49eba01464aee7a14 Hash:683191760 RestartCount:0 Reason:Error Message:} 


I0530 01:58:23.695228    2420 status_manager.go:519] Pod "pod-adoption_replication-controller-217(6d00fa10-c0dd-4126-a9ab-96b91853e227)" fully terminated and removed from etcd
I0530 01:58:23.698508    2420 status_manager.go:479] Pod "pod-adoption" (6d00fa10-c0dd-4126-a9ab-96b91853e227) does not exist on the server

This is not the same issue as before. I dont see any hung container in hcs.
Docker remove container is failing with Access denied errors.

madhanrm · 2019-05-30T19:19:07Z

Do you happen to have the machine in this repro state?
For now, I can only speculate here that some other process (kubelet.exe ?) is holding an handle to the container log files @ C:\ProgramData\docker\containers\fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5\fa683f4d7e7fdf94a686145ba0a44ae077f95334a3ff999ffb5f161023e27bf5-json.log

while trying to delete the container, which could fail.

But I need a machine in this state to know what is happening to this container and why deletion of container is failing.

madhanrm · 2019-05-30T19:20:42Z

The issue "Couldn't delete ns" would come anytime a container deletion fails, for various reasons.
Can we start a separate issue, to track this, and not confuse users reading this with multiple issues?

pjh · 2019-05-30T23:24:20Z

@madhanrm thanks so much for taking a look! I can definitely create a new issue so that we can close this one.

I still have the Windows nodes running after these test failures. It's unlikely that I'll be able to give you access to them, but I'd be glad to do further debugging; if you have specific suggestions please share them. If you think it could be a problem with leaked/stuck file handles maybe I can install Sysinternals and use Process Explorer to take a look?

I don't have time to investigate further today but I'll create a separate issue tomorrow and then will continue debugging there.

dineshgovindasamy · 2019-06-04T16:42:39Z

@pjh can we please close this issue and open a new issue to track this different issue?
ProcessExplorer or handle.exe(sysinternals) would be the first step to find out who is holding the handle and we can from there.

pjh · 2019-06-04T17:33:19Z

Yes, sorry for the delay, I haven't had a chance to investigate further. I'll create the new issue and then close this one.

pjh · 2019-06-05T19:24:42Z

I looked at my test nodes which were still running but handle.exe did not reveal anything; C:\ProgramData\docker\containers\ is empty on all the nodes, so whatever was seemingly holding on to those files for too long has released them by now.

I'll try to repro again and capture this debug information closer to when the test fails. I'll file a new issue when I'm able to do that.

Thanks again @madhanrm for taking a look at the new logs last week.

* test: remove k8s.io/apiextensions-apiserver from framework There are two reason why this is useful: 1. less code to vendor into external users of the framework The following dependencies become obsolete due to this change (from `dep`): (8/23) Removed unused project github.com/grpc-ecosystem/go-grpc-prometheus (9/23) Removed unused project github.com/coreos/etcd (10/23) Removed unused project github.com/globalsign/mgo (11/23) Removed unused project github.com/go-openapi/strfmt (12/23) Removed unused project github.com/asaskevich/govalidator (13/23) Removed unused project github.com/mitchellh/mapstructure (14/23) Removed unused project github.com/NYTimes/gziphandler (15/23) Removed unused project gopkg.in/natefinch/lumberjack.v2 (16/23) Removed unused project github.com/go-openapi/errors (17/23) Removed unused project github.com/go-openapi/analysis (18/23) Removed unused project github.com/go-openapi/runtime (19/23) Removed unused project sigs.k8s.io/structured-merge-diff (20/23) Removed unused project github.com/go-openapi/validate (21/23) Removed unused project github.com/coreos/go-systemd (22/23) Removed unused project github.com/go-openapi/loads (23/23) Removed unused project github.com/munnerz/goautoneg 2. works around kubernetes#75338 which currently breaks vendoring Some recent changes to crd_util.go must now be pulling in the broken k8s.io/apiextensions-apiserver packages, because it was still working in revision 2e90d92 (as demonstrated by https://github.com/intel/pmem-CSI/tree/586ae281ac2810cb4da6f1e160cf165c7daf0d80). * update Bazel files * test: fix golint warnings in crd_util.go Because the code was moved, golint is now active. Because users of the code must adapt to the new location of the code, it makes sense to also change the API at the same time to address the style comments from golint ("struct field ApiGroup should be APIGroup", same for ApiExtensionClient). * fix race condition issue for smb mount on windows change var name * stop vsphere cloud provider from spamming logs with `failed to patch IP` Fixes: kubernetes#75236 * Remove reference to USE_RELEASE_NODE_BINARIES. This variable was used for development purposes and was accidentally introduced in kubernetes@f0f7829. This is its only use in the tree: https://github.com/kubernetes/kubernetes/search?q=USE_RELEASE_NODE_BINARIES&unscoped_q=USE_RELEASE_NODE_BINARIES * Clear conntrack entries on 0 -> 1 endpoint transition with externalIPs As part of the endpoint creation process when going from 0 -> 1 conntrack entries are cleared. This is to prevent an existing conntrack entry from preventing traffic to the service. Currently the system ignores the existance of the services external IP addresses, which exposes that errant behavior This adds the externalIP addresses of udp services to the list of conntrack entries that get cleared. Allowing traffic to flow Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com> * Move to golang 1.12.1 official image We used 1.12.0 + hack to download 1.12.1 binaries as we were in a rush on friday since the images were not published at that time. Let's remove the hack now and republish the kube-cross image Change-Id: I3ffff3283b6ca755320adfca3c8f4a36dc1c2b9e * fix-kubeadm-init-output * Mark audit e2e tests as flaky * Bump kube-cross image to 1.12.1-2 * Restore username and password kubectl flags * build/gci: bump CNI version to 0.7.5 * Add/Update CHANGELOG-1.14.md for v1.14.0-rc.1. * Restore machine readability to the print-join-command output The output of `kubeadm token create --print-join-command` should be usable by batch scripts. This issue was pointed out in: kubernetes/kubeadm#1454 * bump required minimum go version to 1.12.1 (strings package compatibility) * Bump go-openapi/jsonpointer and go-openapi/jsonreference versions xref: kubernetes#75653 Signed-off-by: Jorge Alarcon Ochoa <alarcj137@gmail.com> * Kubernetes version v1.14.1-beta.0 openapi-spec file updates * Add/Update CHANGELOG-1.14.md for v1.14.0. * 1.14 release notes fixes * Add flag to enable strict ARP * Do not delete existing VS and RS when starting * Update Cluster Autscaler version to 1.14.0 No changes since 1.14.0-beta.2 Changelog: https://github.com/kubernetes/autoscaler/releases/tag/cluster-autoscaler-1.14.0 * Fix Windows to read VM UUIDs from serial numbers Certain versions of vSphere do not have the same value for product_uuid and product_serial. This mimics the change in kubernetes#59519. Fixes kubernetes#74888 * godeps: update vmware/govmomi to v0.20 release * vSphere: add token auth support for tags client SAML auth support for the vCenter rest API endpoint came to govmomi a bit after Zone support came to vSphere Cloud Provider. Fixes kubernetes#75511 * vsphere: govmomi rest API simulator requires authentication * gce: configure: validate SA has storage scope If the VM SA doesn't have storage scope associated, don't use the token in the curl request or the request will fail with 403. * fix-external-etcd * Update gcp images with security patches [stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes. [fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes. [fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [metatada-proxy addon] Bump prometheus-to-sd v0.5.0 to pick up security fixes. * kubeadm: fix "upgrade plan" not working without k8s version If the k8s version argument passed to "upgrade plan" is missing the logic should perform the following actions: - fetch a "stable" version from the internet. - if that fails, fallback to the local client version. Currentely the logic fails because the cfg.KubernetesVersion is defaulted to the version of the existing cluster, which then causes an early exit without any ugprade suggestions. See app/cmd/upgrade/common.go::enforceRequirements(): configutil.FetchInitConfigurationFromCluster(..) Fix that by passing the explicit user value that can also be "". This will then make the "offline getter" treat it as an explicit desired upgrade target. In the future it might be best to invert this logic: - if no user k8s version argument is passed - default to the kubeadm version. - if labels are passed (e.g. "stable"), fetch a version from the internet. * Disable GCE agent address management on Windows nodes. With this metadata key set, "GCEWindowsAgent: GCE address manager status: disabled" will appear in the VM's serial port output during boot. Tested: PROJECT=${CLOUDSDK_CORE_PROJECT} KUBE_GCE_ENABLE_IP_ALIASES=true NUM_WINDOWS_NODES=2 NUM_NODES=2 KUBERNETES_NODE_PLATFORM=windows go run ./hack/e2e.go -- --up cluster/gce/windows/smoke-test.sh cat > iis.yaml <<EOF apiVersion: v1 kind: Pod metadata: name: iis labels: app: iis spec: containers: - image: mcr.microsoft.com/windows/servercore/iis imagePullPolicy: IfNotPresent name: iis-server ports: - containerPort: 80 protocol: TCP nodeSelector: beta.kubernetes.io/os: windows tolerations: - effect: NoSchedule key: node.kubernetes.io/os operator: Equal value: windows1809 EOF kubectl create -f iis.yaml kubectl expose pod iis --type=LoadBalancer --name=iis kubectl get services curl http://<service external IP address> * kube-aggregator: bump openapi aggregation log level * Explicitly flush headers when proxying * fix-kubeadm-upgrade-12-13-14 * GCE/Windows: disable stackdriver logging agent The logging service could not be stopped at times, causing node startup failures. Disable it until the issue is fixed. * Finish saving test results on failure The conformance image should be saving its results regardless of the results of the tests. However, with errexit set, when ginkgo gets test failures it exits 1 which prevents saving the results for Sonobuoy to pick up. Fixes: kubernetes#76036 * Avoid panic in cronjob sorting This change handles the case where the ith cronjob may have its start time set to nil. Previously, the Less method could cause a panic in case the ith cronjob had its start time set to nil, but the jth cronjob did not. It would panic when calling Before on a nil StartTime. * Removed cleanup for non-current kube-proxy modes in newProxyServer() * Depricated --cleanup-ipvs flag in kube-proxy * Fixed old function signature in kube-proxy tests. * Revert "Deprecated --cleanup-ipvs flag in kube-proxy" This reverts commit 4f1bb2b. * Revert "Fixed old function signature in kube-proxy tests." This reverts commit 29ba1b0. * Fixed --cleanup-ipvs help text * Check for required name parameter in dynamic client The Create, Delete, Get, Patch, Update and UpdateStatus methods in the dynamic client all expect the name parameter to be non-empty, but did not validate this requirement, which could lead to a panic. Add explicit checks to these methods. * Fix empty array expansion error in cluster/gce/util.sh Empty array expansion causes "unbound variable" error in bash 4.2 and bash 4.3. * Improve volume operation metrics * Add e2e tests * ensuring that logic is checking for differences in listener * Kubernetes version v1.14.2-beta.0 openapi-spec file updates * Delete only unscheduled pods if node doesn't exist anymore. * Add/Update CHANGELOG-1.14.md for v1.14.1. * Use Node-Problem-Detector v0.6.3 on GCI * proxy: Take into account exclude CIDRs while deleting legacy real servers * kubeadm: Don't error out on join with --cri-socket override In the case where newControlPlane is true we don't go through getNodeRegistration() and initcfg.NodeRegistration.CRISocket is empty. This forces DetectCRISocket() to be called later on, and if there is more than one CRI installed on the system, it will error out, while asking for the user to provide an override for the CRI socket. Even if the user provides an override, the call to DetectCRISocket() can happen too early and thus ignore it (while still erroring out). However, if newControlPlane == true, initcfg.NodeRegistration is not used at all and it's overwritten later on. Thus it's necessary to supply some default value, that will avoid the call to DetectCRISocket() and as initcfg.NodeRegistration is discarded, setting whatever value here is harmless. Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com> * Bump coreos/go-semver The https://github.com/coreos/go-semver/ dependency has formally release v0.3.0 at commit e214231b295a8ea9479f11b70b35d5acf3556d9b. This is the commit point we've been using, but the hack/verify-godeps.sh script notices the discrepancy and causes ci-kubernetes-verify job to fail. Fixes: kubernetes#76526 Signed-off-by: Tim Pepper <tpepper@vmware.com> * Fix Azure SLB support for multiple backend pools Azure VM and vmssVM support multiple backend pools for the same SLB, but not for different LBs. * Restore metrics-server using of IP addresses This preference list matches is used to pick prefered field from k8s node object. It was introduced in metrics-server 0.3 and changed default behaviour to use DNS instead of IP addresses. It was merged into k8s 1.12 and caused breaking change by introducing dependency on DNS configuration. * refactor detach azure disk retry operation * move disk lock process to azure cloud provider fix comments fix import keymux check error add unit test for attach/detach disk funcs * Fix concurrent map access in Portworx create volume call Fixes kubernetes#76340 Signed-off-by: Harsh Desai <harsh@portworx.com> * Fix race condition between actual and desired state in kublet volume manager This PR fixes the issue kubernetes#75345. This fix modified the checking volume in actual state when validating whether volume can be removed from desired state or not. Only if volume status is already mounted in actual state, it can be removed from desired state. For the case of mounting fails always, it can still work because the check also validate whether pod still exist in pod manager. In case of mount fails, pod should be able to removed from pod manager so that volume can also be removed from desired state. * fix validation message: apiServerEndpoints -> apiServerEndpoint * add shareName param in azure file storage class skip create azure file if it exists * Update Cluster Autoscaler to 1.14.2 * Create the "internal" firewall rule for kubemark master. This is equivalent to the "internal" firewall rule that is created for the regular masters. The main reason for doing it is to allow prometheus scraping metrics from various kubemark master components, e.g. kubelet. Ref. kubernetes/perf-tests#503 * fix disk list corruption issue * Restrict builds to officially supported platforms Prior to this change, including windows/amd64 in KUBE_BUILD_PLATFORMS would, for example, attempt to build the server binaries/tars/images for Windows, which is not supported. This can break downstream build steps. * Fix verify godeps failure github.com/evanphx/json-patch added a new tag at the same sha this morning: https://github.com/evanphx/json-patch/releases/tag/v4.2.0 This confused godeps. This PR updates our file to match godeps expectation. Fixes issue 77238 * Upgrade Stackdriver Logging Agent addon image from 1.6.0 to 1.6.8. * Test kubectl cp escape * Properly handle links in tar * Bump debian-iptables versions to v11.0.2. * os exit when option is true * Pin GCE Windows node image to 1809 v20190312. This is to work around kubernetes#76666. * Update the dynamic volume limit in GCE PD Currently GCE PD support 128 maximum disks attached to a node for all machines types except shared-core. This PR updates the limit number to date. Change-Id: Id9dfdbd24763b6b4138935842c246b1803838b78 * Use consistent imageRef during container startup * Replace vmss update API with instance-level update API commit * Cleanup codes that not required any more * Add unit tests * Upgrade compute API to version 2019-03-01 * Update vendors * Fix issues because of rebase * Pick up security patches for fluentd-gcp-scaler by upgrading to version 0.5.2 * Short-circuit quota admission rejection on zero-delta updates * Accept admission request if resource is being deleted * Error when etcd3 watch finds delete event with nil prevKV * Bump addon-manager to v9.0.1 - Rebase image on debian-base:v1.0.0. * Remove terminated pod from summary api. Signed-off-by: Lantao Liu <lantaol@google.com> * Expect the correct object type to be removed * check if Memory is not nil for container stats * Fix eviction dry-run * Update k8s-dns-node-cache image version This revised image resolves kubernetes dns#292 by updating the image from `k8s-dns-node-cache:1.15.2` to `k8s-dns-node-cache:1.15.2` * Update to go 1.12.4 * Update to go 1.12.5 * fix incorrect prometheus metrics fix left incorrect metrics * In GuaranteedUpdate, retry on any error if we are working with stale data * BoundServiceAccountTokenVolume: fix InClusterConfig * Don't create a RuntimeClassManager without a KubeClient * Kubernetes version v1.14.3-beta.0 openapi-spec file updates * Add/Update CHANGELOG-1.14.md for v1.14.2. * fix CVE-2019-11244: `kubectl --http-cache=<world-accessible dir>` creates world-writeable cached schema files * Upgrade Azure network API version to 2018-07-01 * Update godeps * Terminate watchers when watch cache is destroyed * honor overridden tokenfile, add InClusterConfig override tests * Don't use mapfile as it isn't bash 3 compatible * fix unbound array variable * fix unbound variable release.sh * Don't use declare -g in build * Check KUBE_SERVER_PLATFORMS existence when compile kubectl on platform other than linux/amd64, we need to check the KUBE_SERVER_PLATFORMS array emptiness before assign it. the example command is: make WHAT=cmd/kubectl KUBE_BUILD_PLATFORMS="darwin/amd64 windows/amd64" * Backport of kubernetes#78137: godeps: update vmware/govmomi to v0.20.1 Cannot cherry-pick kubernetes#78137 (go mod vs godep) Includes fix for SAML token auth with vSphere and zones API Issue kubernetes#77360 See also: kubernetes#75742 * fix: failed to close kubelet->API connections on heartbeat failure * Revert "Use consistent imageRef during container startup" This reverts commit 26e3c86. * fix azure retry issue when return 2XX with error fix comments * Disable graceful termination for udp

* test: remove k8s.io/apiextensions-apiserver from framework There are two reason why this is useful: 1. less code to vendor into external users of the framework The following dependencies become obsolete due to this change (from `dep`): (8/23) Removed unused project github.com/grpc-ecosystem/go-grpc-prometheus (9/23) Removed unused project github.com/coreos/etcd (10/23) Removed unused project github.com/globalsign/mgo (11/23) Removed unused project github.com/go-openapi/strfmt (12/23) Removed unused project github.com/asaskevich/govalidator (13/23) Removed unused project github.com/mitchellh/mapstructure (14/23) Removed unused project github.com/NYTimes/gziphandler (15/23) Removed unused project gopkg.in/natefinch/lumberjack.v2 (16/23) Removed unused project github.com/go-openapi/errors (17/23) Removed unused project github.com/go-openapi/analysis (18/23) Removed unused project github.com/go-openapi/runtime (19/23) Removed unused project sigs.k8s.io/structured-merge-diff (20/23) Removed unused project github.com/go-openapi/validate (21/23) Removed unused project github.com/coreos/go-systemd (22/23) Removed unused project github.com/go-openapi/loads (23/23) Removed unused project github.com/munnerz/goautoneg 2. works around kubernetes#75338 which currently breaks vendoring Some recent changes to crd_util.go must now be pulling in the broken k8s.io/apiextensions-apiserver packages, because it was still working in revision 2e90d92 (as demonstrated by https://github.com/intel/pmem-CSI/tree/586ae281ac2810cb4da6f1e160cf165c7daf0d80). * update Bazel files * test: fix golint warnings in crd_util.go Because the code was moved, golint is now active. Because users of the code must adapt to the new location of the code, it makes sense to also change the API at the same time to address the style comments from golint ("struct field ApiGroup should be APIGroup", same for ApiExtensionClient). * fix race condition issue for smb mount on windows change var name * stop vsphere cloud provider from spamming logs with `failed to patch IP` Fixes: kubernetes#75236 * Remove reference to USE_RELEASE_NODE_BINARIES. This variable was used for development purposes and was accidentally introduced in kubernetes@f0f7829. This is its only use in the tree: https://github.com/kubernetes/kubernetes/search?q=USE_RELEASE_NODE_BINARIES&unscoped_q=USE_RELEASE_NODE_BINARIES * Clear conntrack entries on 0 -> 1 endpoint transition with externalIPs As part of the endpoint creation process when going from 0 -> 1 conntrack entries are cleared. This is to prevent an existing conntrack entry from preventing traffic to the service. Currently the system ignores the existance of the services external IP addresses, which exposes that errant behavior This adds the externalIP addresses of udp services to the list of conntrack entries that get cleared. Allowing traffic to flow Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com> * Move to golang 1.12.1 official image We used 1.12.0 + hack to download 1.12.1 binaries as we were in a rush on friday since the images were not published at that time. Let's remove the hack now and republish the kube-cross image Change-Id: I3ffff3283b6ca755320adfca3c8f4a36dc1c2b9e * fix-kubeadm-init-output * Mark audit e2e tests as flaky * Bump kube-cross image to 1.12.1-2 * Restore username and password kubectl flags * build/gci: bump CNI version to 0.7.5 * Add/Update CHANGELOG-1.14.md for v1.14.0-rc.1. * Restore machine readability to the print-join-command output The output of `kubeadm token create --print-join-command` should be usable by batch scripts. This issue was pointed out in: kubernetes/kubeadm#1454 * bump required minimum go version to 1.12.1 (strings package compatibility) * Bump go-openapi/jsonpointer and go-openapi/jsonreference versions xref: kubernetes#75653 Signed-off-by: Jorge Alarcon Ochoa <alarcj137@gmail.com> * Kubernetes version v1.14.1-beta.0 openapi-spec file updates * Add/Update CHANGELOG-1.14.md for v1.14.0. * 1.14 release notes fixes * Add flag to enable strict ARP * Do not delete existing VS and RS when starting * Update Cluster Autscaler version to 1.14.0 No changes since 1.14.0-beta.2 Changelog: https://github.com/kubernetes/autoscaler/releases/tag/cluster-autoscaler-1.14.0 * Fix Windows to read VM UUIDs from serial numbers Certain versions of vSphere do not have the same value for product_uuid and product_serial. This mimics the change in kubernetes#59519. Fixes kubernetes#74888 * godeps: update vmware/govmomi to v0.20 release * vSphere: add token auth support for tags client SAML auth support for the vCenter rest API endpoint came to govmomi a bit after Zone support came to vSphere Cloud Provider. Fixes kubernetes#75511 * vsphere: govmomi rest API simulator requires authentication * gce: configure: validate SA has storage scope If the VM SA doesn't have storage scope associated, don't use the token in the curl request or the request will fail with 403. * fix-external-etcd * Update gcp images with security patches [stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes. [fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes. [fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [metatada-proxy addon] Bump prometheus-to-sd v0.5.0 to pick up security fixes. * kubeadm: fix "upgrade plan" not working without k8s version If the k8s version argument passed to "upgrade plan" is missing the logic should perform the following actions: - fetch a "stable" version from the internet. - if that fails, fallback to the local client version. Currentely the logic fails because the cfg.KubernetesVersion is defaulted to the version of the existing cluster, which then causes an early exit without any ugprade suggestions. See app/cmd/upgrade/common.go::enforceRequirements(): configutil.FetchInitConfigurationFromCluster(..) Fix that by passing the explicit user value that can also be "". This will then make the "offline getter" treat it as an explicit desired upgrade target. In the future it might be best to invert this logic: - if no user k8s version argument is passed - default to the kubeadm version. - if labels are passed (e.g. "stable"), fetch a version from the internet. * Disable GCE agent address management on Windows nodes. With this metadata key set, "GCEWindowsAgent: GCE address manager status: disabled" will appear in the VM's serial port output during boot. Tested: PROJECT=${CLOUDSDK_CORE_PROJECT} KUBE_GCE_ENABLE_IP_ALIASES=true NUM_WINDOWS_NODES=2 NUM_NODES=2 KUBERNETES_NODE_PLATFORM=windows go run ./hack/e2e.go -- --up cluster/gce/windows/smoke-test.sh cat > iis.yaml <<EOF apiVersion: v1 kind: Pod metadata: name: iis labels: app: iis spec: containers: - image: mcr.microsoft.com/windows/servercore/iis imagePullPolicy: IfNotPresent name: iis-server ports: - containerPort: 80 protocol: TCP nodeSelector: beta.kubernetes.io/os: windows tolerations: - effect: NoSchedule key: node.kubernetes.io/os operator: Equal value: windows1809 EOF kubectl create -f iis.yaml kubectl expose pod iis --type=LoadBalancer --name=iis kubectl get services curl http://<service external IP address> * kube-aggregator: bump openapi aggregation log level * Explicitly flush headers when proxying * fix-kubeadm-upgrade-12-13-14 * GCE/Windows: disable stackdriver logging agent The logging service could not be stopped at times, causing node startup failures. Disable it until the issue is fixed. * Finish saving test results on failure The conformance image should be saving its results regardless of the results of the tests. However, with errexit set, when ginkgo gets test failures it exits 1 which prevents saving the results for Sonobuoy to pick up. Fixes: kubernetes#76036 * Avoid panic in cronjob sorting This change handles the case where the ith cronjob may have its start time set to nil. Previously, the Less method could cause a panic in case the ith cronjob had its start time set to nil, but the jth cronjob did not. It would panic when calling Before on a nil StartTime. * Removed cleanup for non-current kube-proxy modes in newProxyServer() * Depricated --cleanup-ipvs flag in kube-proxy * Fixed old function signature in kube-proxy tests. * Revert "Deprecated --cleanup-ipvs flag in kube-proxy" This reverts commit 4f1bb2b. * Revert "Fixed old function signature in kube-proxy tests." This reverts commit 29ba1b0. * Fixed --cleanup-ipvs help text * Check for required name parameter in dynamic client The Create, Delete, Get, Patch, Update and UpdateStatus methods in the dynamic client all expect the name parameter to be non-empty, but did not validate this requirement, which could lead to a panic. Add explicit checks to these methods. * Fix empty array expansion error in cluster/gce/util.sh Empty array expansion causes "unbound variable" error in bash 4.2 and bash 4.3. * Improve volume operation metrics * Add e2e tests * ensuring that logic is checking for differences in listener * Kubernetes version v1.14.2-beta.0 openapi-spec file updates * Delete only unscheduled pods if node doesn't exist anymore. * Add/Update CHANGELOG-1.14.md for v1.14.1. * Use Node-Problem-Detector v0.6.3 on GCI * proxy: Take into account exclude CIDRs while deleting legacy real servers * kubeadm: Don't error out on join with --cri-socket override In the case where newControlPlane is true we don't go through getNodeRegistration() and initcfg.NodeRegistration.CRISocket is empty. This forces DetectCRISocket() to be called later on, and if there is more than one CRI installed on the system, it will error out, while asking for the user to provide an override for the CRI socket. Even if the user provides an override, the call to DetectCRISocket() can happen too early and thus ignore it (while still erroring out). However, if newControlPlane == true, initcfg.NodeRegistration is not used at all and it's overwritten later on. Thus it's necessary to supply some default value, that will avoid the call to DetectCRISocket() and as initcfg.NodeRegistration is discarded, setting whatever value here is harmless. Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com> * Bump coreos/go-semver The https://github.com/coreos/go-semver/ dependency has formally release v0.3.0 at commit e214231b295a8ea9479f11b70b35d5acf3556d9b. This is the commit point we've been using, but the hack/verify-godeps.sh script notices the discrepancy and causes ci-kubernetes-verify job to fail. Fixes: kubernetes#76526 Signed-off-by: Tim Pepper <tpepper@vmware.com> * Fix Azure SLB support for multiple backend pools Azure VM and vmssVM support multiple backend pools for the same SLB, but not for different LBs. * Restore metrics-server using of IP addresses This preference list matches is used to pick prefered field from k8s node object. It was introduced in metrics-server 0.3 and changed default behaviour to use DNS instead of IP addresses. It was merged into k8s 1.12 and caused breaking change by introducing dependency on DNS configuration. * refactor detach azure disk retry operation * move disk lock process to azure cloud provider fix comments fix import keymux check error add unit test for attach/detach disk funcs * Fix concurrent map access in Portworx create volume call Fixes kubernetes#76340 Signed-off-by: Harsh Desai <harsh@portworx.com> * Fix race condition between actual and desired state in kublet volume manager This PR fixes the issue kubernetes#75345. This fix modified the checking volume in actual state when validating whether volume can be removed from desired state or not. Only if volume status is already mounted in actual state, it can be removed from desired state. For the case of mounting fails always, it can still work because the check also validate whether pod still exist in pod manager. In case of mount fails, pod should be able to removed from pod manager so that volume can also be removed from desired state. * fix validation message: apiServerEndpoints -> apiServerEndpoint * add shareName param in azure file storage class skip create azure file if it exists * Update Cluster Autoscaler to 1.14.2 * Create the "internal" firewall rule for kubemark master. This is equivalent to the "internal" firewall rule that is created for the regular masters. The main reason for doing it is to allow prometheus scraping metrics from various kubemark master components, e.g. kubelet. Ref. kubernetes/perf-tests#503 * fix disk list corruption issue * Restrict builds to officially supported platforms Prior to this change, including windows/amd64 in KUBE_BUILD_PLATFORMS would, for example, attempt to build the server binaries/tars/images for Windows, which is not supported. This can break downstream build steps. * Fix verify godeps failure github.com/evanphx/json-patch added a new tag at the same sha this morning: https://github.com/evanphx/json-patch/releases/tag/v4.2.0 This confused godeps. This PR updates our file to match godeps expectation. Fixes issue 77238 * Upgrade Stackdriver Logging Agent addon image from 1.6.0 to 1.6.8. * Test kubectl cp escape * Properly handle links in tar * Bump debian-iptables versions to v11.0.2. * os exit when option is true * Pin GCE Windows node image to 1809 v20190312. This is to work around kubernetes#76666. * Update the dynamic volume limit in GCE PD Currently GCE PD support 128 maximum disks attached to a node for all machines types except shared-core. This PR updates the limit number to date. Change-Id: Id9dfdbd24763b6b4138935842c246b1803838b78 * Use consistent imageRef during container startup * Replace vmss update API with instance-level update API commit * Cleanup codes that not required any more * Add unit tests * Upgrade compute API to version 2019-03-01 * Update vendors * Fix issues because of rebase * Pick up security patches for fluentd-gcp-scaler by upgrading to version 0.5.2 * Short-circuit quota admission rejection on zero-delta updates * Accept admission request if resource is being deleted * Error when etcd3 watch finds delete event with nil prevKV * Bump addon-manager to v9.0.1 - Rebase image on debian-base:v1.0.0. * Remove terminated pod from summary api. Signed-off-by: Lantao Liu <lantaol@google.com> * Expect the correct object type to be removed * check if Memory is not nil for container stats * Fix eviction dry-run * Update k8s-dns-node-cache image version This revised image resolves kubernetes dns#292 by updating the image from `k8s-dns-node-cache:1.15.2` to `k8s-dns-node-cache:1.15.2` * Update to go 1.12.4 * Update to go 1.12.5 * Bump ip-masq-agent version to v2.3.0 * fix incorrect prometheus metrics fix left incorrect metrics * In GuaranteedUpdate, retry on any error if we are working with stale data * BoundServiceAccountTokenVolume: fix InClusterConfig * Don't create a RuntimeClassManager without a KubeClient * Kubernetes version v1.14.3-beta.0 openapi-spec file updates * Add/Update CHANGELOG-1.14.md for v1.14.2. * fix CVE-2019-11244: `kubectl --http-cache=<world-accessible dir>` creates world-writeable cached schema files * Upgrade Azure network API version to 2018-07-01 * Update godeps * Terminate watchers when watch cache is destroyed * honor overridden tokenfile, add InClusterConfig override tests * Don't use mapfile as it isn't bash 3 compatible * fix unbound array variable * fix unbound variable release.sh * Don't use declare -g in build * Check KUBE_SERVER_PLATFORMS existence when compile kubectl on platform other than linux/amd64, we need to check the KUBE_SERVER_PLATFORMS array emptiness before assign it. the example command is: make WHAT=cmd/kubectl KUBE_BUILD_PLATFORMS="darwin/amd64 windows/amd64" * Backport of kubernetes#78137: godeps: update vmware/govmomi to v0.20.1 Cannot cherry-pick kubernetes#78137 (go mod vs godep) Includes fix for SAML token auth with vSphere and zones API Issue kubernetes#77360 See also: kubernetes#75742 * fix: failed to close kubelet->API connections on heartbeat failure * Revert "Use consistent imageRef during container startup" This reverts commit 26e3c86. * fix azure retry issue when return 2XX with error fix comments * Disable graceful termination for udp * cherry pick of 017f57a, had to do a very simple merge of BUILD * Fix memory leak from not closing hcs container handles * Fix volume mount tests issue for windows For windows node, security context is disabled. This PR fixes a bug so that fsGroup will not be applied to pods that run on windows node. Change-Id: Id9870416d2ad8ef791b3b4896d6747a2adbada2f * Kubernetes version v1.14.4-beta.0 openapi-spec file updates * Add/Update CHANGELOG-1.14.md for v1.14.3. * Fix kubectl apply skew test with extra properties * fix: update vm if detach a non-existing disk fix gofmt issue * picked up extra unnecessary dep in merge at least verify build thinks its unnecessary * Move CSIDriver Lister to the controller * Fix incorrect procMount defaulting * vSphere: allow SAML token delegation Issue kubernetes#77360 * Use any host that mounts the datastore to create Volume Also, This change makes zone to work per datacenter and cleans up dummy vms. There can be multiple datastores found for a given name. The datastore name is unique only within a datacenter. So this commit returns a list of datastores for a given datastore name in FindDatastoreByName() method. The calles are responsible to handle or find the right datastore to use among those returned. * ipvs: fix string check for IPVS protocol during graceful termination Signed-off-by: Andrew Sy Kim <kiman@vmware.com> * fix flexvol stuck issue due to corrupted mnt point fix comments about PathExists fix comments revert change in PathExists func * Avoid the default server mux * Ignore cgroup pid support if related feature gates are disabled * kubelet: retry pod sandbox creation when containers were never created If kubelet never gets past sandbox creation (i.e., never attempted to create containers for a pod), it should retry the sandbox creation on failure, regardless of the restart policy of the pod. * Default resourceGroup should be used when value of annotation azure-load-balancer-resource-group is empty string * fix kubelet can not delete orphaned pod directory when the kubelet's root directory symbolically links to another device's directory * Allow unit test to pass on machines without ipv6 * Fix AWS DHCP option set domain names causing garbled InternalDNS or Hostname addresses on Node * Fix closing of dirs in doSafeMakeDir This fixes the issue where "childFD" from syscall.Openat is assigned to a local variable inside the for loop, instead of the correct one in the function scope. This results in that when trying to close the "childFD" in the function scope, it will be equal to "-1", instead of the correct value. * There are various reasons that the HPA will decide not the change the current scale. Two important ones are when missing metrics might change the direction of scaling, and when the recommended scale is within tolerance of the current scale. The way that ReplicaCalculator signals it's desire to not change the current scale is by returning the current scale. However the current scale is from scale.Status.Replicas and can be larger than scale.Spec.Replicas (e.g. during Deployment rollout with configured surge). This causes a positive feedback loop because scale.Status.Replicas is written back into scale.Spec.Replicas, further increasing the current scale. This PR fixes the feedback loop by plumbing the replica count from spec through horizontal.go and replica_calculator.go so the calculator can punt with the right value. * edit google dns hostname

pjh added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Apr 16, 2019

k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 16, 2019

k8s-ci-robot added sig/windows Categorizes an issue or PR as relevant to SIG Windows. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 16, 2019

k8s-ci-robot assigned pjh Apr 16, 2019

pjh mentioned this issue Apr 16, 2019

Force Docker version to 18.09.3 pjh/kubernetes#77

Merged

pjh mentioned this issue Apr 16, 2019

Reenable Stackdriver 1.9. #76632

Closed

pjh mentioned this issue Apr 17, 2019

Pin Windows node image to 1809 v20190312. pjh/kubernetes#78

Merged

yujuhong mentioned this issue Apr 17, 2019

encountered an error during Terminate: hcsshim: the handle has already been closed microsoft/hcsshim#567

Open

pjh added a commit to pjh/kubernetes that referenced this issue Apr 17, 2019

Pin GCE Windows node image to 1809 v20190312.

81cc384

This is to work around kubernetes#76666.

pjh mentioned this issue Apr 17, 2019

Pin GCE Windows node image to 1809 v20190312. #76722

Merged

nagiesek mentioned this issue Apr 17, 2019

GCE/Windows: error while ProvisionEndpoint(): Element not found #75421

Closed

pjh added a commit to pjh/kubernetes that referenced this issue May 30, 2019

Repro of kubernetes#76666 on v20190514 with KB4497934.

cca1e7a

PatrickLang added this to Backlog in SIG-Windows Jun 4, 2019

pjh closed this as completed Jun 5, 2019

SIG-Windows automation moved this from Backlog to Done (v1.15) Jun 5, 2019

Windows tests failing on GCE with "Couldn't delete ns" timeouts #76666

Windows tests failing on GCE with "Couldn't delete ns" timeouts #76666

Comments

pjh commented Apr 16, 2019

pjh commented Apr 16, 2019

k8s-ci-robot commented Apr 16, 2019

pjh commented Apr 16, 2019

pjh commented Apr 16, 2019

pjh commented Apr 16, 2019

pjh commented Apr 16, 2019

pjh commented Apr 16, 2019

pjh commented Apr 16, 2019

pjh commented Apr 16, 2019

yujuhong commented Apr 16, 2019

yujuhong commented Apr 16, 2019 • edited Loading

pjh commented Apr 16, 2019

pjh commented Apr 17, 2019

pjh commented Apr 17, 2019

madhanrm commented Apr 17, 2019

madhanrm commented Apr 17, 2019

yujuhong commented Apr 17, 2019

pjh commented Apr 17, 2019

daschott commented May 18, 2019 • edited Loading

pjh commented May 20, 2019

PatrickLang commented May 21, 2019

pjh commented May 21, 2019

PatrickLang commented May 22, 2019

pjh commented May 22, 2019

pjh commented May 24, 2019

pjh commented May 28, 2019

dineshgovindasamy commented May 28, 2019

madhanrm commented May 28, 2019

pjh commented May 28, 2019

pjh commented May 30, 2019

pjh commented May 30, 2019

madhanrm commented May 30, 2019

madhanrm commented May 30, 2019

madhanrm commented May 30, 2019

madhanrm commented May 30, 2019

pjh commented May 30, 2019

dineshgovindasamy commented Jun 4, 2019

pjh commented Jun 4, 2019

pjh commented Jun 5, 2019

yujuhong commented Apr 16, 2019 •

edited

Loading

daschott commented May 18, 2019 •

edited

Loading