Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsenter: improve debug logging #2460

Closed
wants to merge 1 commit into from
Closed

nsenter: improve debug logging #2460

wants to merge 1 commit into from

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Jun 7, 2020

In order to make 'runc --debug' actually useful for debugging nsexec
bugs, provide information about all the internal operations when in
debug mode.

Signed-off-by: Aleksa Sarai asarai@suse.de

In order to make 'runc --debug' actually useful for debugging nsexec
bugs, provide information about all the internal operations when in
debug mode.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
@cyphar
Copy link
Member Author

cyphar commented Jun 7, 2020

For reference, this is now the information you can get with runc --debug:

DEBU[0000] nsexec[3746]: => nsexec container setup      
DEBU[0000] nsexec-0[3746]: ~> nsexec stage-0            
DEBU[0000] nsexec-0[3746]: spawn stage-1                
DEBU[0000] nsexec-0[3746]: -> stage-1 synchronisation loop 
DEBU[0000] nsexec-1[3753]: ~> nsexec stage-1            
DEBU[0000] nsexec-1[3753]: unshare remaining namespace (except cgroupns) 
DEBU[0000] nsexec-1[3753]: spawn stage-2                
DEBU[0000] nsexec-1[3753]: request stage-0 to forward stage-2 pid (3756) 
DEBU[0000] nsexec-0[3746]: stage-1 requested pid to be forwarded 
DEBU[0000] nsexec-2[1]: ~> nsexec stage-2               
DEBU[0000] nsexec-1[3753]: signal completion to stage-0 
DEBU[0000] nsexec-0[3746]: forward stage-1 (3753) and stage-2 (3756) pids to runc 
DEBU[0000] nsexec-1[3753]: <~ nsexec stage-1            
DEBU[0000] nsexec-0[3746]: stage-1 complete             
DEBU[0000] nsexec-0[3746]: <- stage-1 synchronisation loop 
DEBU[0000] nsexec-0[3746]: -> stage-2 synchronisation loop 
DEBU[0000] nsexec-0[3746]: signalling stage-2 to run    
DEBU[0000] nsexec-2[1]: signal completion to stage-0    
DEBU[0000] nsexec-0[3746]: stage-2 complete             
DEBU[0000] nsexec-0[3746]: <- stage-2 synchronisation loop 
DEBU[0000] nsexec-0[3746]: <~ nsexec stage-0            
DEBU[0000] nsexec-2[1]: <= nsexec container setup       
DEBU[0000] nsexec-2[1]: booting up go runtime ...

@AkihiroSuda
Copy link
Member

CI failing

not ok 5 runc create (rootless + limits + no cgrouppath + no permission) fails with informative error
# (in test file tests/integration/cgroups.bats, line 94)
#   `[[ ${lines[1]} == *"rootless needs no limits + no cgrouppath when no permission is granted for cgroups"* ]] || [[ ${lines[1]} == *"cannot set pids limit: container could not join or create cgroup"* ]]' failed
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc spec --rootless (status=0):
# 
# runc run -d --console-socket /tmp/console.sock test_cgroups_permissions (status=1):
# time="2020-06-07T15:41:37Z" level=error msg="failed to decode \"{\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-0[2265]: update /proc/2266/uid_map to '0 1000 1\\n\" to json: invalid character '\\n' in string literal"
# time="2020-06-07T15:41:37Z" level=error msg="failed to decode \"'\\\"}\\n\" to json: invalid character '\\'' looking for beginning of value"
# time="2020-06-07T15:41:37Z" level=error msg="failed to decode \"{\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-0[2265]: update /proc/2266/gid_map to '0 1000 1\\n\" to json: invalid character '\\n' in string literal"
# time="2020-06-07T15:41:37Z" level=error msg="failed to decode \"'\\\"}\\n\" to json: invalid character '\\'' looking for beginning of value"
# time="2020-06-07T15:41:37Z" level=warning msg="signal: killed"
# time="2020-06-07T15:41:37Z" level=error msg="container_linux.go:353: starting container process caused: process_linux.go:450: container init caused: process_linux.go:416: setting cgroup config for procHooks process caused: cannot set pids limit: container could not join or create cgroup"

@cyphar
Copy link
Member Author

cyphar commented Jun 8, 2020

Right, it's because we're generating JSON in C and the log contains newlines. I'll need to change the logging so that invalid-in-JSON characters get ASCII-escaped.

@kolyshkin
Copy link
Contributor

Did a rebase and json escaping in #2487

@kolyshkin
Copy link
Contributor

Using < and > in logging is somewhat problematic since logrus.JSONFormatter escapes them, e.g.

{"level":"debug","msg":"nsexec-1[512861]: \u003c~ nsexec stage-1","time":"2020-06-25T10:25:06-07:00"}
{"level":"debug","msg":"nsexec-0[512859]: stage-1 complete","time":"2020-06-25T10:25:06-07:00"}
{"level":"debug","msg":"nsexec-0[512859]: \u003c- stage-1 synchronisation loop","time":"2020-06-25T10:25:06-07:00"}
{"level":"debug","msg":"nsexec-0[512859]: -\u003e stage-2 synchronisation loop","time":"2020-06-25T10:25:06-07:00"}

@cyphar
Copy link
Member Author

cyphar commented Jun 25, 2020

\u003c in a JSON string is identical in meaning to <, so while the JSON output is a little fruity it still means the same thing.

@cyphar
Copy link
Member Author

cyphar commented Jun 25, 2020

I also want to sit down and properly switch us over to pkg/errors all throughout runc. I might actually sit down and switch this PR to doing that as well so we can properly debug some more esoteric issues (in recent years I've found myself modifying runc to add annotations to errors more than once).

@kolyshkin
Copy link
Contributor

\u003c in a JSON string is identical in meaning to <, so while the JSON output is a little fruity it still means the same thing.

@kolyshkin kolyshkin closed this Jun 29, 2020
@kolyshkin
Copy link
Contributor

kolyshkin commented Jul 13, 2020

Hmm I don't remember why I closed it, most probably just clicked the wrong button; sorry about that.

Feel free to reopen and pick up json escaping stuff from #2487. I was still unable to fix CI failures in debug.json though :( Update: fixed

@kolyshkin kolyshkin reopened this Jul 13, 2020
@kolyshkin kolyshkin mentioned this pull request Mar 6, 2021
@kolyshkin
Copy link
Contributor

Carrying in #2836

@kolyshkin kolyshkin closed this Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants