
x/net/http2: Server-side processing delay on one request will block other unrelated requests #62722

Open
ghost opened this issue Sep 19, 2023 · 9 comments
Labels: NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)


ghost commented Sep 19, 2023

Our application is experiencing unexpected stalls when uploading results back to a server. After some investigation, we determined that this is likely an implementation bug in HTTP/2 flow control. RFC 7540, section 2 (page 5) explicitly states:

Multiplexing of requests is achieved by having each HTTP request/response exchange associated with its own stream (Section 5). Streams are largely independent of each other, so a blocked or stalled request or response does not prevent progress on other streams.

But it's actually very easy to stall the server, sometimes permanently. This bug has denial-of-service potential.

This issue has been reported before as #40816 and remains open.

What version of Go are you using (go version)?

go1.18.1

Does this issue reproduce with the latest release?

Don't know; it also reproduces in go1.19.

What did you do?

The program below demonstrates the problem. Three requests are sent by the same client to different URLs on the same server. One of the URLs has a hard-coded 5-second "processing" delay. Each request sends a 10MB dummy payload to the server. Upon executing the program, all three requests will stall for 5 seconds.

In case of #40816, the processing delay was a shared lock which resulted in a server-wide deadlock (denial of service potential).

You'll need to provide your own (self-signed) TLS certificates for the program to work.

package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"sync"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	server := &http.Server{
		Addr: ":12345",
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			log.Printf("receiving post to %s", r.URL.Path)
			if r.URL.Path == "/2" {
				// Simulate slow processing before the handler reads the body.
				time.Sleep(time.Second * 5)
			}
			_, err := io.Copy(io.Discard, r.Body)
			if err != nil {
				log.Fatal(err)
			}
			log.Printf("received post to %s", r.URL.Path)
		}),
	}
	// net/http enables HTTP/2 automatically on TLS listeners.
	go server.ListenAndServeTLS("public.crt", "private.key")
	client := &http.Client{
		Transport: &http2.Transport{
			TLSClientConfig: &tls.Config{
				InsecureSkipVerify: true,
			},
			DisableCompression: true,
			AllowHTTP:          false,
		},
	}
	// Give the server a moment to start before sending requests.
	time.Sleep(time.Second)
	var wg sync.WaitGroup
	wg.Add(3)
	go post_big_file(&wg, client, 1)
	go post_big_file(&wg, client, 2)
	go post_big_file(&wg, client, 3)
	wg.Wait()
}

func post_big_file(wg *sync.WaitGroup, client *http.Client, index int) {
	defer wg.Done()
	log.Printf("posting big file %d", index)
	file, err := os.Open("/dev/zero")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	reader := io.LimitReader(file, 1024*1024*10)
	url := fmt.Sprintf("https://localhost:12345/%d", index)
	response, err := client.Post(url, "text/plain", reader)
	if err != nil {
		log.Fatal(err)
	}
	defer response.Body.Close()
	_, err = io.Copy(io.Discard, response.Body)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("big file %d posted", index)
}
@seankhliao seankhliao changed the title http2: Server-side processing delay on one request will block other unrelated requests x/net/http2: Server-side processing delay on one request will block other unrelated requests Sep 19, 2023
@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 25, 2023

thanm commented Sep 25, 2023

@neild @tombergan per owners


bsponge commented Oct 17, 2023

OK, from what I was able to find, the bug looks like this:
In the beginning, data is sent from the client on 3 streams, but only 2 of those streams are read regularly (the remaining one is waiting for 5s in this case). The connection that carries these 3 streams has a limited capacity for incoming data: the amount of data the client may send is governed by the WINDOW_UPDATE frames sent by the server, and those frames are only sent each time a portion of the request body is read. The call chain on the server side is roughly:

n, err = b.pipe.Read(p)
  → b.conn.noteBodyReadFromHandler(b.stream, n, err)
  → case sc.bodyReadCh <- http2bodyReadMsg{st, n}:
  → https://github.com/golang/go/blob/7241fee9b06da568251617ce3a715fae3e9f2881/src/net/http/h2_bundle.go#L4784C4-L4784C4
  → sc.sendWindowUpdate(nil, n) // conn-level

While the data from one of the streams is not consumed, the connection's inflow capacity shrinks. Eventually, the last portion of data the connection window allows is sent on the stream whose handler is sleeping. The server then waits for the connection window to grow again, but only the blocked handler can make that happen. When the sleep finally ends, the data from the "waiting" stream is read and the server starts sending WINDOW_UPDATE frames again.

@neild @tombergan


bsponge commented Oct 17, 2023

It would be great to have some mechanism that checks whether the data sent on a stream is actually being read. I'm afraid that reserving a portion of the connection capacity for each possible stream is neither optimal nor (I believe) RFC-compliant. I'm just thinking out loud; I don't know what the common approach to this is in other implementations.


ghost commented Oct 18, 2023

Yes, per the protocol specification the client must respect the window size set by the server. So I'm guessing that once the server's receive window is full, the client runs out of "credit" and stops sending any further data. Since there is no more data for the server to process, everything stalls.

According to the RFC, the WINDOW_UPDATE frame can adjust the window size both for the connection as a whole and per stream. Thus the server should be able to adjust the connection and stream window sizes so that the client can continue sending data on non-blocked streams.

Based on your comment, it looks like the server isn't adjusting the window sizes properly. It should reduce the window size of blocked streams to stop the client from sending further data that will not be processed, and it should also increase the connection window size so that the client can keep sending data on unblocked streams.

Alternatively, the server would need to buffer up to the maximum window size for each blocked stream while increasing the connection window size, so that the client can send more data on unblocked streams.


bsponge commented Oct 18, 2023

So what you propose is to increase the connection window size each time a new stream is created? As a result, the connection window would be the sum of the window sizes of all streams?


ghost commented Oct 18, 2023

That's a blunt way of solving the problem, yes. But at minimum, the connection window size should be increased when readers on the server are starving.


bsponge commented Oct 18, 2023

OK, I will try to come up with something.


bsponge commented Oct 21, 2023

I have rethought your idea.

If I understand the RFC correctly (https://www.rfc-editor.org/rfc/rfc7540#section-6.9.3), reducing the initial window size applies to all streams, so other requests would also be affected. Alternatively, we could reduce the window size for all streams when one of them is filling up, but I don't like that approach; or maybe I missed something.

My proposal would be as follows:
If the inflow window sizes of both a stream and the connection are 0, we increase the connection window size by some value (maybe the default initial window, 65535?). Furthermore, we mark the stream so that the connection window is decreased by that value once the server reads the stream's buffer. I think we can assume that the stream is unblocked once its buffer is read, and it seems better to bring the connection back to its initial state.


bsponge commented Oct 21, 2023

On the other hand, if more than one stream gets blocked, the connection window size would have to be increased several times, and the condition above would no longer hold.

It would require tracking the sum of the bytes buffered on blocked streams; when that sum reaches the connection's maximum window size, we would know it's time to increase the window.

@seankhliao seankhliao added this to the Unplanned milestone Jul 13, 2024