New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LCOW: Add UVM debugability by grabbing logs before tear-down #34846

Merged
merged 2 commits into from Sep 19, 2017

Conversation

Projects
None yet
7 participants
@jhowardmsft
Contributor

jhowardmsft commented Sep 13, 2017

Signed-off-by: John Howard jhoward@microsoft.com

@rn @johnstep - Adds the ability for docker to grab logs from the utility VM before it is torn down in advanced debugging mode (the daemon must be in debug mode itself, plus "OPENGCS_DEBUG_ENABLE" must be set in its environment.)

You can see an example of what it looks like from Microsoft/opengcs#134.

It's a far-from-perfect solution, but pretty much the best that is possible in the RS3 release.

LCOW: Add GCS debugging
Signed-off-by: John Howard <jhoward@microsoft.com>
@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 13, 2017

Contributor

@friism FYI

Contributor

jhowardmsft commented Sep 13, 2017

@friism FYI

@rn

Just a minor style nit but otherwise LGTM. I've used this yesterday and it was very helpful for debugging indeed.

@@ -58,6 +59,8 @@ func (ctr *container) start(attachStdio StdioCallback) error {
return err
}
defer ctr.debugGCS()

This comment has been minimized.

@rn

rn Sep 15, 2017

Member

Given that defer ctr.debugGCS() is called both on the error path and the normal path, wouldn't it be cleaner to write this (and the above) as:

err = ctr.hcsContainer.Start()
defer ctr.debugGCS()
if err != nil {
...
@rn

rn Sep 15, 2017

Member

Given that defer ctr.debugGCS() is called both on the error path and the normal path, wouldn't it be cleaner to write this (and the above) as:

err = ctr.hcsContainer.Start()
defer ctr.debugGCS()
if err != nil {
...

This comment has been minimized.

@jhowardmsft

jhowardmsft Sep 15, 2017

Contributor

Ah no. If I deferred the one after start, it wouldn't fire as terminate would have been called by the time the defer was executed, and there wouldn't be a UVM around to debug. Make sense?

@jhowardmsft

jhowardmsft Sep 15, 2017

Contributor

Ah no. If I deferred the one after start, it wouldn't fire as terminate would have been called by the time the defer was executed, and there wouldn't be a UVM around to debug. Make sense?

This comment has been minimized.

@rn

rn Sep 16, 2017

Member

yes, sorry, missed the subtlety.

@rn

rn Sep 16, 2017

Member

yes, sorry, missed the subtlety.

@rn

rn approved these changes Sep 16, 2017

Not a maintainer, but this LGTM and I've used it for debug last week and it was very useful

@@ -58,6 +59,8 @@ func (ctr *container) start(attachStdio StdioCallback) error {
return err
}
defer ctr.debugGCS()

This comment has been minimized.

@rn

rn Sep 16, 2017

Member

yes, sorry, missed the subtlety.

@rn

rn Sep 16, 2017

Member

yes, sorry, missed the subtlety.

@thaJeztah

LGTM ping @simonferquel PTAL

Show outdated Hide outdated libcontainerd/utils_windows.go
@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Sep 17, 2017

Member

vendor CI is failing;" 😢

05:05:23 The result of vndr differs
05:05:23 
05:05:23  D vendor/github.com/gotestyourself/gotestyourself/skip/skip.go
05:05:23 
05:05:23 Please vendor your package with github.com/LK4D4/vndr.
05:05:23 
Member

thaJeztah commented Sep 17, 2017

vendor CI is failing;" 😢

05:05:23 The result of vndr differs
05:05:23 
05:05:23  D vendor/github.com/gotestyourself/gotestyourself/skip/skip.go
05:05:23 
05:05:23 Please vendor your package with github.com/LK4D4/vndr.
05:05:23 
@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 17, 2017

Contributor

Weird! Prob won't get time to do today though

Contributor

jhowardmsft commented Sep 17, 2017

Weird! Prob won't get time to do today though

@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 18, 2017

Contributor

Truly bizarre. vndr for me showed no difference - both a full run and a specific vndr of gotestyourself/gotestyourself v1.1.0. So manually fixed up....

jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $ grep gotestyourself vendor.conf
github.com/gotestyourself/gotestyourself v1.1.0
jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $ vndr github.com/gotestyourself/gotestyourself v1.1.0
2017/09/18 09:27:10 Collecting initial packages
2017/09/18 09:27:11 Download dependencies
2017/09/18 09:27:13     Clone github.com/gotestyourself/gotestyourself, revision v1.1.0
2017/09/18 09:27:14     Finished clone github.com/gotestyourself/gotestyourself
2017/09/18 09:27:14 Dependencies downloaded. Download time: 1.0286058s
2017/09/18 09:27:14 Collecting all dependencies
2017/09/18 09:27:24 Clean vendor dir from unused packages
2017/09/18 09:27:24 Success
2017/09/18 09:27:24 Running time: 13.7501532s
jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $ git status
On branch jjh/debuggcs
Your branch is up-to-date with 'msdocker/jjh/debuggcs'.
nothing to commit, working directory clean
jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $ ls -l vendor/github.com/gotestyourself/gotestyourself/skip/
total 4
-rw-rw-rw- 1 jhoward jhoward 3413 Sep 18 09:27 skip.go
jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $
Contributor

jhowardmsft commented Sep 18, 2017

Truly bizarre. vndr for me showed no difference - both a full run and a specific vndr of gotestyourself/gotestyourself v1.1.0. So manually fixed up....

jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $ grep gotestyourself vendor.conf
github.com/gotestyourself/gotestyourself v1.1.0
jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $ vndr github.com/gotestyourself/gotestyourself v1.1.0
2017/09/18 09:27:10 Collecting initial packages
2017/09/18 09:27:11 Download dependencies
2017/09/18 09:27:13     Clone github.com/gotestyourself/gotestyourself, revision v1.1.0
2017/09/18 09:27:14     Finished clone github.com/gotestyourself/gotestyourself
2017/09/18 09:27:14 Dependencies downloaded. Download time: 1.0286058s
2017/09/18 09:27:14 Collecting all dependencies
2017/09/18 09:27:24 Clean vendor dir from unused packages
2017/09/18 09:27:24 Success
2017/09/18 09:27:24 Running time: 13.7501532s
jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $ git status
On branch jjh/debuggcs
Your branch is up-to-date with 'msdocker/jjh/debuggcs'.
nothing to commit, working directory clean
jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $ ls -l vendor/github.com/gotestyourself/gotestyourself/skip/
total 4
-rw-rw-rw- 1 jhoward jhoward 3413 Sep 18 09:27 skip.go
jhoward@jhoward-z420:[~/go/src/github.com/docker/docker]: $
@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 18, 2017

Contributor

CI vendoring passed. Removing failing CI label :)

Contributor

jhowardmsft commented Sep 18, 2017

CI vendoring passed. Removing failing CI label :)

@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 18, 2017

Contributor

The janky failure doesn't look related - seems to be failing on multiple PRs.

If I'm reading it right, it's docker-py tests which are failing. @shin any ideas?

02:09:28.668  generated xml file: /go/src/github.com/docker/docker/bundles/test-docker-py/results.xml 
02:09:28.669 =========================== short test summary info ============================
02:09:28.669 XFAIL ../../../../../docker-py/tests/integration/api_container_test.py::CreateContainerTest::test_create_with_init_path
02:09:28.669   init-path removed in 17.05.0
02:09:28.669 XFAIL ../../../../../docker-py/tests/integration/api_container_test.py::CreateContainerTest::test_create_with_storage_opt
02:09:28.669   Not supported on most drivers
02:09:28.670 XFAIL ../../../../../docker-py/tests/integration/models_images_test.py::ImageCollectionTest::test_build_with_error
02:09:28.670   Engine 1.13 responds with status 500
02:09:28.670 SKIP [1] tests/integration/api_image_test.py:270: Doesn't work inside a container - FIXME
02:09:28.670 SKIP [1] tests/integration/api_service_test.py:287: API version is too low (< 1.30)
02:09:28.670 SKIP [1] tests/integration/api_network_test.py:454: API version is too low (< 1.29)
02:09:28.670 SKIP [1] /docker-py/tests/helpers.py:66: Feature requires Docker Engine experimental mode
02:09:28.671 SKIP [1] tests/integration/api_healthcheck_test.py:51: API version is too low (< 1.29)
02:09:28.671 SKIP [1] tests/integration/api_build_test.py:191: API version is too low (< 1.29)
02:09:28.671 SKIP [1] tests/integration/api_service_test.py:300: API version is too low (< 1.27)
02:09:28.671 SKIP [1] /docker-py/tests/integration/api_swarm_test.py:24: Test stalls the engine on 1.12.0
02:09:28.671 SKIP [1] tests/integration/models_services_test.py:84: Makes Swarm unstable?
02:09:28.671 ============== 244 passed, 9 skipped, 3 xfailed in 313.20 seconds ==============
02:09:28.700 ---> Making bundle: .integration-daemon-stop (in bundles/test-docker-py)
Contributor

jhowardmsft commented Sep 18, 2017

The janky failure doesn't look related - seems to be failing on multiple PRs.

If I'm reading it right, it's docker-py tests which are failing. @shin any ideas?

02:09:28.668  generated xml file: /go/src/github.com/docker/docker/bundles/test-docker-py/results.xml 
02:09:28.669 =========================== short test summary info ============================
02:09:28.669 XFAIL ../../../../../docker-py/tests/integration/api_container_test.py::CreateContainerTest::test_create_with_init_path
02:09:28.669   init-path removed in 17.05.0
02:09:28.669 XFAIL ../../../../../docker-py/tests/integration/api_container_test.py::CreateContainerTest::test_create_with_storage_opt
02:09:28.669   Not supported on most drivers
02:09:28.670 XFAIL ../../../../../docker-py/tests/integration/models_images_test.py::ImageCollectionTest::test_build_with_error
02:09:28.670   Engine 1.13 responds with status 500
02:09:28.670 SKIP [1] tests/integration/api_image_test.py:270: Doesn't work inside a container - FIXME
02:09:28.670 SKIP [1] tests/integration/api_service_test.py:287: API version is too low (< 1.30)
02:09:28.670 SKIP [1] tests/integration/api_network_test.py:454: API version is too low (< 1.29)
02:09:28.670 SKIP [1] /docker-py/tests/helpers.py:66: Feature requires Docker Engine experimental mode
02:09:28.671 SKIP [1] tests/integration/api_healthcheck_test.py:51: API version is too low (< 1.29)
02:09:28.671 SKIP [1] tests/integration/api_build_test.py:191: API version is too low (< 1.29)
02:09:28.671 SKIP [1] tests/integration/api_service_test.py:300: API version is too low (< 1.27)
02:09:28.671 SKIP [1] /docker-py/tests/integration/api_swarm_test.py:24: Test stalls the engine on 1.12.0
02:09:28.671 SKIP [1] tests/integration/models_services_test.py:84: Makes Swarm unstable?
02:09:28.671 ============== 244 passed, 9 skipped, 3 xfailed in 313.20 seconds ==============
02:09:28.700 ---> Making bundle: .integration-daemon-stop (in bundles/test-docker-py)
@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 18, 2017

Contributor

Sorry @shin - I meant @shin-. ^^

Contributor

jhowardmsft commented Sep 18, 2017

Sorry @shin - I meant @shin-. ^^

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Sep 18, 2017

Member

Wonder if there's a bug in vndr; there was an issue on one of the other LCOW PR's as well with the gotestyourself vendoring

Member

thaJeztah commented Sep 18, 2017

Wonder if there's a bug in vndr; there was an issue on one of the other LCOW PR's as well with the gotestyourself vendoring

@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 19, 2017

Contributor

ping @simonferquel PTAL. Thanks!

Contributor

jhowardmsft commented Sep 19, 2017

ping @simonferquel PTAL. Thanks!

@simonferquel

LGTM

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Sep 19, 2017

Member

😞 looks like we now have a missing dependency, but vendor check says it's ok?

09:10:29 integration/service/inspect_test.go:22:2:warning: undeclared name: skip (gosimple)
09:10:29 integration/service/inspect_test.go:15:2:warning: could not import github.com/gotestyourself/gotestyourself/skip (cannot find package "github.com/gotestyourself/gotestyourself/skip" in any of: (gosimple)

@dnephin vndr bug?

Member

thaJeztah commented Sep 19, 2017

😞 looks like we now have a missing dependency, but vendor check says it's ok?

09:10:29 integration/service/inspect_test.go:22:2:warning: undeclared name: skip (gosimple)
09:10:29 integration/service/inspect_test.go:15:2:warning: could not import github.com/gotestyourself/gotestyourself/skip (cannot find package "github.com/gotestyourself/gotestyourself/skip" in any of: (gosimple)

@dnephin vndr bug?

@dnephin

This comment has been minimized.

Show comment
Hide comment
@dnephin

dnephin Sep 19, 2017

Member

What's even more bizarre is that the powerpc and z builds actually PASS, even though the log claims that this package was removed. I see it running the TestInspect which uses this skip(). So something is very broken on those builds.

I'm not sure what's up with vndr but it works for everyone else on linux.

btw, the dependency is removed in this PR. It needs to be restored.

@jhowardmsft could you run vndr -verbose and paste the log somewhere?

Member

dnephin commented Sep 19, 2017

What's even more bizarre is that the powerpc and z builds actually PASS, even though the log claims that this package was removed. I see it running the TestInspect which uses this skip(). So something is very broken on those builds.

I'm not sure what's up with vndr but it works for everyone else on linux.

btw, the dependency is removed in this PR. It needs to be restored.

@jhowardmsft could you run vndr -verbose and paste the log somewhere?

@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 19, 2017

Contributor

I installed an Ubuntu 17.04 instance and ran vndr from there. Let's see what happens this time first...

Contributor

jhowardmsft commented Sep 19, 2017

I installed an Ubuntu 17.04 instance and ran vndr from there. Let's see what happens this time first...

@dnephin

This comment has been minimized.

Show comment
Hide comment
@dnephin

dnephin Sep 19, 2017

Member

but vendor check says it's ok?

Right that's concerning too. I'll look into the old vendor job to see if the log has anything

Member

dnephin commented Sep 19, 2017

but vendor check says it's ok?

Right that's concerning too. I'll look into the old vendor job to see if the log has anything

Revendor Microsoft/opengcs @ v0.3.4
Signed-off-by: John Howard <jhoward@microsoft.com>
@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 19, 2017

Contributor

That seemed OK, all jobs got to start running integration tests and vendor passed. So squashed the revendor commits and pushed update. Fingers crossed.

Contributor

jhowardmsft commented Sep 19, 2017

That seemed OK, all jobs got to start running integration tests and vendor passed. So squashed the revendor commits and pushed update. Fingers crossed.

@jhowardmsft

This comment has been minimized.

Show comment
Hide comment
@jhowardmsft

jhowardmsft Sep 19, 2017

Contributor

Yay, green!!!!

Contributor

jhowardmsft commented Sep 19, 2017

Yay, green!!!!

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Sep 19, 2017

Member

whooooop!

Member

thaJeztah commented Sep 19, 2017

whooooop!

@thaJeztah thaJeztah merged commit 7cbbbb9 into moby:master Sep 19, 2017

7 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 36937 has succeeded
Details
janky Jenkins build Docker-PRs 45599 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 5983 has succeeded
Details
vendor Jenkins build Docker-PRs-vendor 3836 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 17183 has succeeded
Details
z Jenkins build Docker-PRs-s390x 5784 has succeeded
Details

@jhowardmsft jhowardmsft deleted the Microsoft:jjh/debuggcs branch Sep 19, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment