apiserver: command exec in pod randomly drops stdin #112834

pohly · 2022-10-03T19:15:20Z

What happened?

I was using

kubernetes/test/e2e/framework/exec_util.go

Lines 49 to 84 in 725993a

    
           // ExecWithOptions executes a command in the specified container, 
        
           // returning stdout, stderr and error. `options` allowed for 
        
           // additional parameters to be passed. 
        
           func (f *Framework) ExecWithOptions(options ExecOptions) (string, string, error) { 
        
           	if !options.Quiet { 
        
           		Logf("ExecWithOptions %+v", options) 
        
           	} 
        
           	config, err := LoadConfig() 
        
           	ExpectNoError(err, "failed to load restclient config") 
        
           	const tty = false 
        
           	Logf("ExecWithOptions: Clientset creation") 
        
           	req := f.ClientSet.CoreV1().RESTClient().Post(). 
        
           		Resource("pods"). 
        
           		Name(options.PodName). 
        
           		Namespace(options.Namespace). 
        
           		SubResource("exec"). 
        
           		Param("container", options.ContainerName) 
        
           	req.VersionedParams(&v1.PodExecOptions{ 
        
           		Container: options.ContainerName, 
        
           		Command:   options.Command, 
        
           		Stdin:     options.Stdin != nil, 
        
           		Stdout:    options.CaptureStdout, 
        
           		Stderr:    options.CaptureStderr, 
        
           		TTY:       tty, 
        
           	}, scheme.ParameterCodec) 
        
           	var stdout, stderr bytes.Buffer 
        
           	Logf("ExecWithOptions: execute(POST %s)", req.URL()) 
        
           	err = execute("POST", req.URL(), config, options.Stdin, &stdout, &stderr, tty) 
        
           	if options.PreserveWhitespace { 
        
           		return stdout.String(), stderr.String(), err 
        
           	} 
        
           	return strings.TrimSpace(stdout.String()), strings.TrimSpace(stderr.String()), err 
        
           }

(similar to kubectl exec -i) with dd as command and some data as stdin to create a file inside a container. This worked fine most of the time during local development. When used inside a Prow job, I ran into test flakes that could be traced down to dd returning after writing zero bytes although stdin was non-empty. No error was returned.

What did you expect to happen?

The data should be transferred reliably.

How can we reproduce it (as minimally and precisely as possible)?

Sorry, no test for this right now.

Anything else we need to know?

No response

Kubernetes version

462f412 (roughly current master)

Cloud provider

kind

OS version

No response

Install tools

kind

Container runtime (CRI) and version (if applicable)

containerd (from master)

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

The text was updated successfully, but these errors were encountered:

It turned out to be unreliable (see kubernetes#112834). Encoding the data inside the command as input for base64 is a workaround that is fine for small amounts of data. It becomes less efficient and/or unusable for large amounts.

neolit123 · 2022-10-08T11:17:15Z

/sig testing

aojea · 2022-10-10T23:33:41Z

maybe something breaking the spdy encapsulation?
it can be between e2e -- apiserver --- kubelet -- containerd

pohly · 2022-10-11T06:11:14Z

Yes, could be anywhere. Probably a good first step would be to write a stress test for this functionality.

aojea · 2022-10-11T08:54:35Z

yeah, it suffered of memory leaks in the past, I don't think that this feature has been stress tested

It turned out to be unreliable (see kubernetes#112834). Encoding the data inside the command as input for base64 is a workaround that is fine for small amounts of data. It becomes less efficient and/or unusable for large amounts.

k8s-triage-robot · 2023-01-09T08:57:23Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pohly · 2023-01-09T09:15:16Z

/sig apimachinery

This is used by test/e2e/framework, but ultimately this is caused somewhere else.

/remove-lifecycle stale

k8s-ci-robot · 2023-01-09T09:15:18Z

@pohly: The label(s) sig/apimachinery cannot be applied, because the repository doesn't have them.

In response to this:

/sig apimachinery

This is used by test/e2e/framework, but ultimately this is caused somewhere else.

/remove-lifecycle stale

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pohly · 2023-01-09T09:51:03Z

/sig api-machinery

cici37 · 2023-01-12T18:14:52Z

We are at api machinery bugscrub meeting.
Thanks for the tagging! Looks like the root cause hasn't been identified and test/e2e/storage/drivers/proxy/io.go could be used to reproduce?
@jpbetz Would you mind to take a look or help with redirecting to the right person? Thanks

pohly · 2023-01-13T07:58:52Z

I have asked a colleague to write a stress test for this. He won't be able to solve it, but it should be a start towards identifying the root cause.

/cc @hj-johannes-lee

aojea · 2023-01-13T23:10:11Z

the tests in pkg/kubelet/cri/streaming/server_test.go may serve as base

cici37 · 2023-01-24T21:22:02Z

/cc @hj-johannes-lee @pohly
Would you mind sharing if the stress test is planned or if there is any other plan on this? Thank you!

hj-johannes-lee · 2023-01-25T13:02:00Z

@cici37 Hi, I am working on it, but could not reproduce the exact problem.! We were supposed to have a meeting and discuss this matter, but we couldn't.
What I did until now is as follows:

Deploying a busybox pod.
Using ExecWithOptions with stdin option (here, data is read from a file that has a long enough contents) for creating a file inside a container.
Comparing the value of stdout read from the container to the value of data read from the file.
Looping the previous process so it will continue until the case is caught.

Let me discuss with @pohly asap and come with the "working" stress test.!

hj-johannes-lee · 2023-01-25T20:42:32Z

@cici37 @pohly
I just submitted a pr, but I know it should be improved in many ways.!

cici37 · 2023-02-14T21:20:47Z

/triage accepted

shawkins · 2023-03-06T21:54:11Z

We in the fabric8 java client are seeing similar problems fabric8io/kubernetes-client#4910 with exec stdin across a variety of our supported http clients. We're also seeing a problem the other direction on stdOut. Occasionally we'll see the web socket close being sent to before all of the stdOut data messages have been received. We are considering a workaround of waiting for the exit code message, but it is definitely not expected behavior to see additional messages after close.

aojea · 2023-03-06T23:57:37Z

websocket has a known issue #89899 (comment) and is reliable than the spdy implementation

shawkins · 2023-03-07T15:15:40Z

@aojea that known issue is fairly easy to workaround, at least for our internal usage scenarios. When you are done with stdIn, you send the web socket close. As noted in that issue it's not a great solution because you won't reliably see command completion - but the api server should process everything that was sent. Isn't SPDY deprecated?

The problem identified here is that the apiserver websocket seems to be the handling close out of order - when sending data it will process the incoming close prior to fully processing data messages it has already received, and on the send side it will send the close ahead of pending data messages.

aojea · 2023-03-07T19:24:33Z

Isn't SPDY deprecated?

#7452

but is the one used by kubectl exec/portforward 😄

There was one person trying to move on to websockets by default but it seems that is hard to keep the compatibility between versions #60140 (comment) and the ROI doesn't seem to be enough to try such complicated change

The problem identified here is that the apiserver websocket seems to be the handling close out of order - when sending data it will process the incoming close prior to fully processing data messages it has already received, and on the send side it will send the close ahead of pending data messages.

the close before the data is sent is similar to this #60140 (comment), but I'm just doing some quick correlations here , you should verify and double check those links

shawkins · 2023-03-09T13:30:27Z

the close before the data is sent is similar to this #60140 (comment), but I'm just doing some quick correlations here , you should verify and double check those links

My apologies. After reviewing these issues and the relevant net, apiserver, and kubernetes code, the issues that we are hitting are definitely within the fabric8 kubernetes client code. I'll get those addressed and see how often, if at all, we're actually hitting this or #60140

k8s-triage-robot · 2024-03-08T13:41:22Z

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

cici37 · 2024-04-11T16:47:40Z

/triage accepted

XciD · 2024-04-11T20:23:12Z

Hello,

Not sure it's the same issue. I don't succeed to use dd with stdin with kube exec.

echo 'test' > file

cat file | kubectl exec -i pod -- dd of=/home/user/file
0+0 records in
0+0 records out
0 bytes copied, 0.000927544 s, 0.0 kB/s

kubectl exec pod -- ls -lh /home/user/file
-rw-r--r-- 1 user user 0 Apr 11 22:19 /home/user/file

But running:

kubectl exec -it pod -- dd of=/home/user/file

And typing something with \d works

Also this works:

cat file | k exec -i pod -- bash -c 'cat -- | dd of=/tmp/file'

pohly added the kind/bug Categorizes issue or PR as related to a bug. label Oct 3, 2022

k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 3, 2022

k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 8, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2023

k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 9, 2023

hj-johannes-lee mentioned this issue Jan 25, 2023

exec_util: add execWithOptions stress test #115323

Closed

pohly mentioned this issue Feb 13, 2023

Promote port forwarding e2e tests to conformance #112778

Closed

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 14, 2023

shawkins mentioned this issue Feb 23, 2023

Issues with upload / failures with PodIT fabric8io/kubernetes-client#4910

Closed

shawkins mentioned this issue Mar 8, 2023

Failures with PodIT.copyFile fabric8io/kubernetes-client#4923

Closed

pohly mentioned this issue Jan 22, 2024

apiserver: command exec drops individual characters #122913

Closed

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 8, 2024

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

apiserver: command exec in pod randomly drops stdin #112834

apiserver: command exec in pod randomly drops stdin #112834

pohly commented Oct 3, 2022 •

edited

Loading

neolit123 commented Oct 8, 2022

aojea commented Oct 10, 2022

pohly commented Oct 11, 2022

aojea commented Oct 11, 2022

k8s-triage-robot commented Jan 9, 2023

pohly commented Jan 9, 2023

k8s-ci-robot commented Jan 9, 2023

pohly commented Jan 9, 2023

cici37 commented Jan 12, 2023

pohly commented Jan 13, 2023

aojea commented Jan 13, 2023

cici37 commented Jan 24, 2023

hj-johannes-lee commented Jan 25, 2023

hj-johannes-lee commented Jan 25, 2023

cici37 commented Feb 14, 2023

shawkins commented Mar 6, 2023 •

edited

Loading

aojea commented Mar 6, 2023

shawkins commented Mar 7, 2023 •

edited

Loading

aojea commented Mar 7, 2023

shawkins commented Mar 9, 2023

k8s-triage-robot commented Mar 8, 2024

cici37 commented Apr 11, 2024

XciD commented Apr 11, 2024 •

edited

Loading

apiserver: command exec in pod randomly drops stdin #112834

apiserver: command exec in pod randomly drops stdin #112834

Comments

pohly commented Oct 3, 2022 • edited Loading

What happened?

What did you expect to happen?

How can we reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

Kubernetes version

Cloud provider

OS version

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

neolit123 commented Oct 8, 2022

aojea commented Oct 10, 2022

pohly commented Oct 11, 2022

aojea commented Oct 11, 2022

k8s-triage-robot commented Jan 9, 2023

pohly commented Jan 9, 2023

k8s-ci-robot commented Jan 9, 2023

pohly commented Jan 9, 2023

cici37 commented Jan 12, 2023

pohly commented Jan 13, 2023

aojea commented Jan 13, 2023

cici37 commented Jan 24, 2023

hj-johannes-lee commented Jan 25, 2023

hj-johannes-lee commented Jan 25, 2023

cici37 commented Feb 14, 2023

shawkins commented Mar 6, 2023 • edited Loading

aojea commented Mar 6, 2023

shawkins commented Mar 7, 2023 • edited Loading

aojea commented Mar 7, 2023

shawkins commented Mar 9, 2023

k8s-triage-robot commented Mar 8, 2024

cici37 commented Apr 11, 2024

XciD commented Apr 11, 2024 • edited Loading

pohly commented Oct 3, 2022 •

edited

Loading

shawkins commented Mar 6, 2023 •

edited

Loading

shawkins commented Mar 7, 2023 •

edited

Loading

XciD commented Apr 11, 2024 •

edited

Loading