Uploading large files sometimes gives a timeout error #1818

Closed
moracabanas opened this issue Apr 9, 2022 · 13 comments · Fixed by #1827

@moracabanas

moracabanas commented Apr 9, 2022

I am running MinIO on TrueNAS. The mc client works fine: I uploaded 500 GB of photos at my full line speed with no issues.

Now I am testing the Console and getting upload errors.

Uploads start normally.


Some large uploads that have finished show as red and pending.


Then, after waiting a couple of minutes, some of them turn green and OK, but others give a 500 error.


ERROR:

message: "Put \"https://s3.example.com/test/%5BOne%20Pace%5D%5B1-7%5D%20Romance%20Dawn%20%5B1080p%5D/%5BOne%20Pace%5D%5B1%5D%20Romance%20Dawn%2001%20%5B1080p%5D%5BFB72C13F%5D.mkv\": net/http: timeout awaiting response headers"

@harshavardhana
Member

Console uploads as a single stream; there should be no timeouts here, AFAIK.

@harshavardhana
Member

Which version of MinIO is this @moracabanas ?

@moracabanas
Author

moracabanas commented Apr 9, 2022

I am using RELEASE.2022-04-01T03-41-39Z with docker-compose and nginx as a reverse proxy.

I also tested the recommended nginx settings below, without luck:

ignore_invalid_headers off;
client_max_body_size 0; # I know this is being applied, as it fixed a headers issue when using mc to upload large files.
proxy_buffering off;

When I check mc admin trace <minio alias>, I can't see the [200 OK] s3.PutObject event until the file is write-committed, but in all cases the file is there once it reaches 100% and I refresh the page.

So, as I understand it, the [200 OK] s3.PutObject response is taking longer than ResponseHeaderTimeout, which I guess defaults to time.Minute in https://github.com/minio/minio-go/blob/a5859807301c980117f2ca0d9bfa1bfdb55fbe27/transport.go#L54
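
To illustrate what I think is happening, here is a tiny self-contained repro I would expect to fail the same way (timings scaled down to seconds; the sleeping handler stands in for the slow write commit):

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

func main() {
	// The handler stands in for MinIO: it takes a long time to
	// "write-commit" before sending any response headers.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(3 * time.Second)
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	// Same shape as minio-go's DefaultTransport, with the timeout
	// scaled from time.Minute down to one second.
	client := &http.Client{Transport: &http.Transport{ResponseHeaderTimeout: time.Second}}

	req, _ := http.NewRequest(http.MethodPut, srv.URL+"/test/file.mkv", strings.NewReader("payload"))
	_, err := client.Do(req)
	fmt.Println(err) // Put "...": net/http: timeout awaiting response headers
}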

@harshavardhana
Member

message: "Put "https://s3.example.com/test/%5BOne%20Pace%5D%5B1-7%5D%20Romance%20Dawn%20%5B1080p%5D/%5BOne%20Pace%5D%5B1%5D%20Romance%20Dawn%2001%20%5B1080p%5D%5BFB72C13F%5D.mkv\": net/http: timeout awaiting response headers"

Okay, this is because a single PUT operation takes longer than the configured ResponseHeaderTimeout, since you have a slow network. We simply need to remove this value or keep it high enough to cater to slow networks.

            ResponseHeaderTimeout: time.Minute,
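
For context, leaving that field unset (its zero value) means net/http places no limit on waiting for the server's response headers; a minimal sketch of the suggested change:

// Sketch of the proposed fix: with ResponseHeaderTimeout left at its
// zero value, the transport no longer aborts while the server commits.
tr := &http.Transport{
	Proxy: http.ProxyFromEnvironment,
	// ResponseHeaderTimeout intentionally omitted (0 = no limit).
	IdleConnTimeout:     time.Minute,
	TLSHandshakeTimeout: 10 * time.Second,
}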

@moracabanas
Author

moracabanas commented Apr 9, 2022

This is the debugging I could do for a single problematic upload:

Tracing and Troubleshooting

  1. Select a file to upload.
  2. The upload begins at my network's full speed. There is no mc admin trace output about this yet.
  3. Once the file reaches 100% in the MinIO Console, it logs this trace:
2022-04-09T22:02:56:000 [200 OK] s3.GetBucketLocation s3.example.com/test/?location=  192.168.1.1       264µs       ↑ 161 B ↓ 444 B
  4. About a minute after this state, I get [200 OK] s3.PutObject, as you can see in this log:
2022-04-09T22:03:23:000 [200 OK] s3.PutObject s3.example.com/test/PixelExperience_dipper-12.1-20220401-1927-OFFICIAL.zip 192.168.1.1       1m40.045418s  ↑ 1.7 GiB ↓ 325 B

But in the Console frontend I got errors:


"Put "https://s3.example.com/test/PixelExperience_dipper-12.1-20220401-1927-OFFICIAL.zip": net/http: timeout awaiting response headers"

Once I refresh, the file is properly uploaded.

Questions

How is my network slow if I am uploading at a stable 300 Mbps?
Shouldn't it be uploaded already once it reaches 100% of the upload state? There is something I am misunderstanding here.
Why does mc admin trace not trace the upload until it reaches 100% of the streaming state?
Where could I adjust the ResponseHeaderTimeout default value? It is hardcoded to 1 minute.

Sorry about the number of questions and my misunderstanding.

Thanks!

@dvaldivia
Collaborator

Bear in mind that mc can leverage multipart uploads while the web browser can't; this may affect the overall upload speed. Perhaps we can rewrite the upload to behave like a multipart upload for these scenarios.

@moracabanas
Author

moracabanas commented Apr 10, 2022

Could I propose a change to
https://github.com/minio/minio-go/blob/a5859807301c980117f2ca0d9bfa1bfdb55fbe27/transport.go#L54 ?
That way I could override these values with ENV variables in my deployment and cover my slow-write-commit setup use case.
I suspect the web upload implementation fails whenever streaming (single-part is all that's available) plus write commit for the whole file exceeds ResponseHeaderTimeout.
This limits proper event handling for the MinIO Console in any case where streaming plus write commit takes longer than ResponseHeaderTimeout.

I think this is caused because the upload fires an upload XHR (*1) which doesn't get its PUT response (*3) from the server until the file has been streamed (*2) and then "write-committed" to storage (*3). I show this behaviour in the picture below:

[screenshot: timeline of (1) upload XHR, (2) streaming, (3) write commit and PUT response]

I would do one of the following four things to fix this behaviour.

  1. Reimplement web uploading so multipart uploads get faster callbacks on setups with slow write commits (spinning rust).
  2. Wait for the file to be streamed before firing the upload XHR, so only the server's write commit has to be awaited, not the whole streaming plus write commit.
  3. Add enough events between streaming and the server's s3.PutObject (write commit finished) so the MinIO Console gets better callbacks from the start of streaming through to the s3.PutObject event that fulfils the initial upload XHR.
  4. Let users increase ResponseHeaderTimeout with an ENV variable, so you can wait as long as needed for the write-commit response (s3.PutObject), e.g.:
...
(Please consider that this is the first time I have written a single line of Go 😅)

var DefaultTransport = func(secure bool) (*http.Transport, error) {
	tr := &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		MaxIdleConns:          256,
		MaxIdleConnsPerHost:   16,
-		ResponseHeaderTimeout: time.Minute,
-		IdleConnTimeout:       time.Minute,
+		ResponseHeaderTimeout: envDuration("RESPONSE_HEADER_TIMEOUT", time.Minute),
+		IdleConnTimeout:       envDuration("IDLE_CONN_TIMEOUT", time.Minute),
		TLSHandshakeTimeout:   10 * time.Second,
		ExpectContinueTimeout: 10 * time.Second,
		// Set this value so that the underlying transport round-tripper
		// doesn't try to auto decode the body of objects with
		// content-encoding set to `gzip`.
		//
		// Refer:
		//    https://golang.org/src/net/http/transport.go?h=roundTrip#L1843
		DisableCompression: true,
	}
...
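
Since Go has no `||` defaulting like JavaScript, I guess a real version would also need a small helper, something like this (envDuration is just a name I made up; it needs "os" and "time" imported):

func envDuration(key string, fallback time.Duration) time.Duration {
	// Accept Go duration strings, e.g. RESPONSE_HEADER_TIMEOUT=5m;
	// fall back to the compiled-in default when unset or malformed.
	if v := os.Getenv(key); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return fallback
}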

Feel free to correct any misconception as I've never touched Go before.

Thank you!

@harshavardhana
Member

  • Reimplement web uploading so multipart uploads get faster callbacks on setups with slow write commits (spinning rust).

That will not work - multipart is expensive for a browser UI. The correct fix is to not have a ResponseHeaderTimeout for this PUT call, since its duration is unbounded.

@dvaldivia
Collaborator

I think ResponseHeaderTimeout could be responsible for other timeouts the Console suffers from when MinIO takes a long time to reply, such as object listing.

If we stage the files in the browser we could do multi-part upload, and stop/resume support for both uploads and downloads, which is great for large files @harshavardhana

@moracabanas
Author

moracabanas commented Apr 10, 2022

Does staging files mean you could implement an upload/listing solution where you make chunks and get callbacks more frequently in the Console?
In that case you could tune the chunk size so the Console stays responsive without bloating the browser with too many requests.

What about handling s3.PutObjectPart events over websockets? That way you could support large uploads with play/pause in the Console, with less impact on browser performance and better memory handling, and without flooding the browser with tons of XHRs, at the cost of a harder implementation. @dvaldivia

@dvaldivia
Collaborator

I wouldn't use websockets to transfer large files; it'd be too chatty. But if we do 64 MiB chunks, we can be more efficient even over plain HTTP(S).

And yes, staging the files would allow us to pause/resume across sessions, as long as multiple STS sessions can commit a multipart put object, which I believe is possible.
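
For reference, minio-go can already do this kind of chunking when you hand it a part size; a rough sketch (endpoint, credentials, bucket, and file names are placeholders):

package main

import (
	"context"
	"log"
	"os"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("s3.example.com", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("large.mkv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Size -1 selects the streaming multipart path; PartSize of 64 MiB
	// means each part is its own request that completes (and can be
	// retried) on its own, instead of one PUT waiting for the whole commit.
	_, err = client.PutObject(context.Background(), "test", "large.mkv", f, -1,
		minio.PutObjectOptions{PartSize: 64 << 20})
	if err != nil {
		log.Fatal(err)
	}
}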

@harshavardhana
Member

If we stage the files in the browser we could do multi-part upload, and stop/resume support for both uploads and downloads, which is great for large files @harshavardhana

Multipart is going to consume a lot of memory - staging files is not correct. The current implementation is fine, we just have to remove ResponseHeaderTimeout, that's all.

That is what mc does:

tr := &http.Transport{
	Proxy: http.ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout:   10 * time.Second,
		KeepAlive: 15 * time.Second,
	}).DialContext,
	MaxIdleConnsPerHost:   256,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 10 * time.Second,
	// Set this value so that the underlying transport round-tripper
	// doesn't try to auto decode the body of objects with
	// content-encoding set to `gzip`.
	//
	// Refer:
	//    https://golang.org/src/net/http/transport.go?h=roundTrip#L1843
	DisableCompression: true,
}

@harshavardhana
Member

For cancelable calls, there should be a top-level context, as needed by the caller.
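
Roughly, the caller would own the context, along these lines (the function name and parameters are illustrative):

import (
	"context"
	"io"

	"github.com/minio/minio-go/v7"
)

// uploadWithCancel: with no transport-level ResponseHeaderTimeout, the
// PUT waits as long as the commit takes, yet the caller can still
// abandon it by cancelling ctx (e.g. wired to an abort button).
func uploadWithCancel(ctx context.Context, client *minio.Client,
	bucket, object string, r io.Reader, size int64) error {
	_, err := client.PutObject(ctx, bucket, object, r, size, minio.PutObjectOptions{})
	return err
}

The caller pairs it with context.WithCancel (or context.WithTimeout) and invokes cancel() to abort.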
