Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http hangs (timeouts) since 9.2.7.0 #6078

Closed
kares opened this issue Feb 18, 2020 · 50 comments
Closed

net/http hangs (timeouts) since 9.2.7.0 #6078

kares opened this issue Feb 18, 2020 · 50 comments

Comments

@kares
Copy link
Member

kares commented Feb 18, 2020

the issue was reported at MinIO's tracker but it seems to be a JRuby bug (given that it also occurs against other similar S3-like backends e.g. Ceph). esp. since MRI works as well as JRuby 9.2.6.0 (JRuby 9.2.7.0 seems to be the first to exhibit problems)

reproducer is a bit demanding, one needs MinIO setup + aws gem (might try dissecting later) :

gem install aws-sdk -v 2.11.436
docker run -p 9000:9000 minio/minio server /data

visit http://localhost:9000/minio/ and create a bucket: 'test'

require 'aws-sdk'

file_path = ARGV[0]
bucket = ARGV[1] || 'test'

admin = password = 'minioadmin'

aws_options_hash = {
  endpoint: 'http://localhost:9000',
  access_key_id: admin,
  secret_access_key: password,
  region: 'us-east-1',
  force_path_style: true,
  http_wire_trace: true
}

bucket = Aws::S3::Bucket.new(bucket, aws_options_hash)

obj = bucket.object(File.basename(file_path))
obj.upload_file(file_path, {})

NOTE: the issue reproduces with bigger files that are split into multipart upload, reproduction is not 100% (one needs 2-3 tries) :

jruby -v aws_minio.rb lib/target/jruby-stdlib-9.2.10.0-SNAPSHOT.jar test

initially thought this might be related to a stdlib update but there was none in 9.2.7.0
(only in 9.2.6.0 - which is working: 7b964fd#diff-07931640fbf96ee2b8fc616794b0a314)

also noticed the net/http patches aws does are not there in their 3.x line, but I tried disabling their patching and the issue still occurs.

connection(s) will tend to hang and raise timeouts (aws seems to do auto-retry by default) :

read 0 bytes
Conn keep-alive

Conn close because of error Net::ReadTimeout
Conn close because of error Net::ReadTimeout
Conn close because of error Net::ReadTimeout
Conn close because of error Net::ReadTimeout
opening connection to localhost:9000...
opening connection to localhost:9000...

here's a failure stack using JRuby 9.2.9.0 :

```
2020-02-18T15:49:55.472+01:00 [main] INFO TraceType : Exception raised: Net::ReadTimeout : Net::ReadTimeout
2020-02-18T15:49:55.473+01:00 [main] INFO TraceType : Backtrace generated:
              rbuf_fill at /opt/local/rvm/rubies/jruby-9.2.9.0/lib/ruby/stdlib/net/protocol.rb:181
              readuntil at /opt/local/rvm/rubies/jruby-9.2.9.0/lib/ruby/stdlib/net/protocol.rb:157
               readline at /opt/local/rvm/rubies/jruby-9.2.9.0/lib/ruby/stdlib/net/protocol.rb:167
       read_status_line at /opt/local/rvm/rubies/jruby-9.2.9.0/lib/ruby/stdlib/net/http/response.rb:40
               read_new at /opt/local/rvm/rubies/jruby-9.2.9.0/lib/ruby/stdlib/net/http/response.rb:29
  new_transport_request at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/patches.rb:29
                  catch at org/jruby/RubyKernel.java:1193
  new_transport_request at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/patches.rb:26
                request at /opt/local/rvm/rubies/jruby-9.2.9.0/lib/ruby/stdlib/net/http.rb:1474
                request at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/connection_pool.rb:330
               transmit at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/handler.rb:72
                session at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/handler.rb:121
            session_for at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/connection_pool.rb:96
                session at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/handler.rb:119
               transmit at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/handler.rb:71
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/net_http/handler.rb:45
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/plugins/content_length.rb:12
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_request_signer.rb:88
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_request_signer.rb:23
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_host_id.rb:14
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/xml/error_handler.rb:8
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_request_signer.rb:65
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_redirects.rb:15
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/retry_errors.rb:108
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_dualstack.rb:32
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_accelerate.rb:49
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_md5s.rb:31
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_iad_regional_endpoint.rb:31
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_expect_100_continue.rb:21
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_bucket_name_restrictions.rb:12
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/rest/handler.rb:7
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/user_agent.rb:12
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/plugins/endpoint.rb:41
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/param_validator.rb:21
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/plugins/raise_response_errors.rb:14
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_sse_cpk.rb:19
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_dualstack.rb:24
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/s3_accelerate.rb:34
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:20
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/idempotency_token.rb:18
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/aws-sdk-core/plugins/param_converter.rb:20
                   call at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/plugins/response_target.rb:21
           send_request at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/request.rb:70
             put_object at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-core-2.11.436/lib/seahorse/client/base.rb:207
             put_object at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-resources-2.11.436/lib/aws-sdk-resources/services/s3/file_uploader.rb:42
              open_file at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-resources-2.11.436/lib/aws-sdk-resources/services/s3/file_uploader.rb:49
             put_object at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-resources-2.11.436/lib/aws-sdk-resources/services/s3/file_uploader.rb:41
                 upload at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-resources-2.11.436/lib/aws-sdk-resources/services/s3/file_uploader.rb:34
            upload_file at /opt/local/rvm/gems/jruby-9.2.9.0@logstash/gems/aws-sdk-resources-2.11.436/lib/aws-sdk-resources/services/s3/object.rb:252
```
@kvmuralidhar
Copy link

kvmuralidhar commented Feb 25, 2020

Hi there,

Anyone had a chance to review this request ?

Thanks
Murali

@kvmuralidhar
Copy link

kvmuralidhar commented Feb 27, 2020

Hi There,

I tried the new version of jruby (9.2.10.0) and it's timing out as well.

Thanks
Murali

@headius
Copy link
Member

headius commented Mar 2, 2020

Investigating.

@headius
Copy link
Member

headius commented Mar 2, 2020

I was not able to run this successfully on 9.2.6.0, so either I'm doing something wrong or the issue regressed earlier than that.

[] ~/projects/jruby $ jruby aws_minio.rb lib/jruby.jar 
Ignoring jruby-launcher-1.1.10-java because its extensions are not built. Try: gem pristine jruby-launcher --version 1.1.10
opening connection to localhost:9000...
opened
<- "PUT /test/jruby.jar HTTP/1.1\r\nContent-Type: \r\nAccept-Encoding: \r\nUser-Agent: aws-sdk-ruby2/2.11.436 jruby/2.5.3 java resources\r\nExpect: 100-continue\r\nContent-Md5: x6WwLblKtE+02JyD0Q8MRQ==\r\nX-Amz-Date: 20200302T193300Z\r\nHost: localhost:9000\r\nX-Amz-Content-Sha256: 8b35d9b59dac47f5e65d0930ad9512c99c8ea7b358de44067ce6177a2f1d37f1\r\nAuthorization: AWS4-HMAC-SHA256 Credential=minioadmin/20200302/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=be9c44b11191f0f893437bd773f26294204b2e2b89d81ff0d64e4d3c6deb2fe2\r\nContent-Length: 14999593\r\nAccept: */*\r\n\r\n"
-> "HTTP/1.1 100 Continue\r\n"
-> "\r\n"
Conn close because of error Net::ReadTimeout
Conn close because of error Net::ReadTimeout

@headius
Copy link
Member

headius commented Mar 2, 2020

Ok I've also tested 9.2.5.0, 9.2.4.0, 9.2.0.0, and 9.1.0.0 and they all display the same timeouts.

Did this ever work?

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 2, 2020

Hi Charles,

I just tried with jruby version 9.2.6.0 and got the same error. It works with jruby version 9.1.13.0.

Thanks
Murali

@headius
Copy link
Member

headius commented Mar 2, 2020

@kvmuralidhar It still times out for me on 9.1.13.0. Perhaps I'm doing something wrong? I believe I followed @kares instructions correctly, and CRuby does appear to upload successfully.

It fails consistently, as well... I have not had to retry to get it to fail.

I am on Darwin, on OpenJDK 8 (AdoptOpenJDK build).

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 2, 2020

Hi Charles,

JRuby version is => jruby 9.1.13.0 (2.3.3) 2017-09-06 8e1c115 OpenJDK 64-Bit Server VM 25.242-b08 on 1.8.0_242-b08 +jit [linux-x86_64]

I am using similar code as Kares to upload a file. With jruby 9.2.6.0 and higher, it consistently fails after three tries.

#!/usr/bin/env ruby
require 'aws-sdk'
#
Aws.config.update(
    endpoint: 'http://nceph.dest.com:80',
    access_key_id: 'admin',
    secret_access_key: 'xxxxxx',
    region: 'ceph-region',
    force_path_style: true,
    http_wire_trace: true
)

s3 = Aws::S3::Resource.new
obj = s3.bucket('lstest').object('ls.s3.8d2e7a16-0280-43ee-8084-bdb821184bad.2020-02-19T19.39.part0.txt.gz')
obj.upload_file('/u/users/elastic/ls.s3.8d2e7a16-0280-43ee-8084-bdb821184bad.2020-02-19T19.39.part0.txt.gz')

Thanks
Murali

@headius
Copy link
Member

headius commented Mar 2, 2020

Thank you for the information. I am still unable to get this to work on any version of JRuby as far back as 9.1.0.0.

FYI if you want to mark multiple lines as a code region, you must put ``` on its own line before and after those lines. I have fixed your comment.

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 2, 2020

Hi Charles,

Thank you for fixing my comment. I am uploading files to CEPH storage with an s3 compatible service. @kares , do you have any ideas ?

Thanks
Murali

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 3, 2020

Hi Charles,

I setup a minio server locally and was able to upload the file with jruby version 9.1.13.0. Did run into issues with jruby version 9.2.6.0. This is what I did to setup minio server locally.

docker pull minio/minio
 docker run -p 9000:9000 -e "MINIO_ACCESS_KEY=admin" -e "MINIO_SECRET_KEY=admin" -v /mnt/data/logstash-working-dir/data:/data -v /mnt/data/logstash-working-dir/minio/config:/root/.minio minio/minio server /data

-----------------------
Script to upload the file
-------------------------

cd jruby/bin
./jruby -S gem install aws-sdk -v 2.11.453
export LD_LIBRARY_PATH=/jruby/lib
jruby/bin/jruby --debug -J-Djava.library.path=$LD_LIBRARY_PATH  /u/users/elastic/test.rb

Thanks
Murali

@Goltergaul
Copy link

Goltergaul commented Mar 4, 2020

i also see this issue with jruby-9.2.9.0, it started after an upgrade from jruby-9.1.10.0

@Goltergaul
Copy link

Goltergaul commented Mar 4, 2020

i believe it might have something to do with the file itself. i inspected the debug output of mri and jruby-9.2.9.0 and the headers and urls it uses for the multipart upload do not differ. So i'd assume its the body that might cause a problem? For me it works to reproduce the issue with this file of random bytes i generated: https://easyupload.io/5au2w3

@muralivenkataraman
Copy link

muralivenkataraman commented Mar 4, 2020

@Goltergaul, I am able to upload the same file (.jar or .gz) with the older version of jruby (9.1.13.0) but just not any version > 9.2.6.0.

@Goltergaul
Copy link

Goltergaul commented Mar 5, 2020

yes thats also what i observed (works with jruby-9.1.10.0, but not with 9.2.9.0) i also tried 9.2.11.0, does also not work

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 5, 2020

Hi Charles,

Greetings. Have you had a chance to work on this issue ?

Thanks
Murali

@headius
Copy link
Member

headius commented Mar 5, 2020

@kvmuralidhar Not since yesteday, no!

I will try again to find a working version tomorrow. There may be a platform difference (I am on MacOS).

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 12, 2020

Hi Charles,

Greetings. Were you able to find a working version ?

Thanks
Murali

@headius
Copy link
Member

headius commented Mar 17, 2020

I am still unable to reproduce this working.

I am able to upload the same file (.jar or .gz) with the older version of jruby (9.1.13.0)

[] ~/projects/jruby $ jruby -v
jruby 9.1.13.0 (2.3.3) 2017-09-06 8e1c115 OpenJDK 64-Bit Server VM 25.222-b10 on 1.8.0_222-b10 +jit [darwin-x86_64]

[] ~/projects/jruby $ jruby aws_minio.rb lib/jruby.jar 
opening connection to localhost:9000...
opened
<- "PUT /test/jruby.jar HTTP/1.1\r\nContent-Type: \r\nAccept-Encoding: \r\nUser-Agent: aws-sdk-ruby2/2.11.436 jruby/2.3.3 java resources\r\nExpect: 100-continue\r\nContent-Md5: ytfj1pDCAko1Iaj71HZ47w==\r\nX-Amz-Date: 20200317T203542Z\r\nHost: localhost:9000\r\nX-Amz-Content-Sha256: 77c59154d133e7472d73a7e523375dc1603e60f8a992cb8e4563fa30c4c3c7f1\r\nAuthorization: AWS4-HMAC-SHA256 Credential=minioadmin/20200317/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=cfee5208fa3694fb741cb7de7c24e2c94b24caab5873ef307a3eeb88b832384f\r\nContent-Length: 15369310\r\nAccept: */*\r\n\r\n"
-> "HTTP/1.1 100 Continue\r\n"
-> "\r\n"
Conn close because of error Net::ReadTimeout
Conn close because of error Net::ReadTimeout
opening connection to localhost:9000...
opened

yes thats also what i observed (works with jruby-9.1.10.0, but not with 9.2.9.0)

[] ~/projects/jruby $ jruby -v
jruby 9.1.10.0 (2.3.3) 2017-05-25 b09c48a OpenJDK 64-Bit Server VM 25.222-b10 on 1.8.0_222-b10 +jit [darwin-x86_64]

[] ~/projects/jruby $ jruby aws_minio.rb lib/jruby.jar 
opening connection to localhost:9000...
opened
<- "PUT /test/jruby.jar HTTP/1.1\r\nContent-Type: \r\nAccept-Encoding: \r\nUser-Agent: aws-sdk-ruby2/2.11.436 jruby/2.3.3 java resources\r\nExpect: 100-continue\r\nContent-Md5: ytfj1pDCAko1Iaj71HZ47w==\r\nX-Amz-Date: 20200317T204503Z\r\nHost: localhost:9000\r\nX-Amz-Content-Sha256: 77c59154d133e7472d73a7e523375dc1603e60f8a992cb8e4563fa30c4c3c7f1\r\nAuthorization: AWS4-HMAC-SHA256 Credential=minioadmin/20200317/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=32e149ab54ab096fc6a19d0c5d9356c9f12a24ff884a297b29dadd4d23338d17\r\nContent-Length: 15369310\r\nAccept: */*\r\n\r\n"
-> "HTTP/1.1 100 Continue\r\n"
-> "\r\n"
Conn close because of error Net::ReadTimeout
Conn close because of error Net::ReadTimeout
opening connection to localhost:9000...
opened

Help me out folks, I'm not sure what I'm doing wrong. I have not been able to get the example to work successfully on any JRuby 9k version.

@headius
Copy link
Member

headius commented Mar 17, 2020

Maybe someone can prove to me it works by pasting a session based on the original report? I'm following the original instructions (on Darwin).

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 18, 2020

Hi Charles,

Greetings. I do not have access to Darwin so not sure if I could help there. Just wanted to see if you are able to upload a file to the minio server using curl statement ?

Thanks
Murali

@muralivenkataraman
Copy link

muralivenkataraman commented Mar 28, 2020

Hi Charles,

We are kind of on a stand still because of this issue. How can I help ?

Thanks
Murali

@Adithya-copart
Copy link

Adithya-copart commented Mar 31, 2020

@headius I was able to follow the instructions and reproduce a failure in 9.2.9.0.
It is successful in 9.1.10.0.

Eliminating the threads in aws-sdk and adding a break statement after the first multipart request, here are the differences:

9.2.9.0

<- "PUT /test/test.pdf?partNumber=1&uploadId=20810512-669d-44f1-8077-387019de8d8b HTTP/1.1\r\nContent-Type: \r\nAccept-Encoding: \r\nUser-Agent: aws-sdk-ruby2/2.11.436 jruby/2.5.7 java resources\r\nExpect: 100-continue\r\nContent-Md5: 4lpdKzOF5sdEdPXDpDzo/Q==\r\nX-Amz-Date: 20200331T001405Z\r\nHost: localhost:9000\r\nX-Amz-Content-Sha256: 8ae2921ed724d13fc04151b87b9667195c8d59570142a7eefca603b2cea3b174\r\nAuthorization: AWS4-HMAC-SHA256 Credential=minioadmin/20200331/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=ba3ef026342f924b313417814a6b2193feebee485d776bfaaae03b7158c79f34\r\nContent-Length: 5242880\r\nAccept: */*\r\n\r\n"
-> "HTTP/1.1 100 Continue\r\n"
-> "\r\n"

# Raises a read timeout here.

9.1.10.0

<- "PUT /test/test.pdf?partNumber=1&uploadId=86a81c86-2124-47c5-92c0-debcd5c91ab0 HTTP/1.1\r\nContent-Type: \r\nAccept-Encoding: \r\nUser-Agent: aws-sdk-ruby2/2.11.436 jruby/2.3.3 java resources\r\nExpect: 100-continue\r\nContent-Md5: 4lpdKzOF5sdEdPXDpDzo/Q==\r\nX-Amz-Date: 20200331T001018Z\r\nHost: localhost:9000\r\nX-Amz-Content-Sha256: 8ae2921ed724d13fc04151b87b9667195c8d59570142a7eefca603b2cea3b174\r\nAuthorization: AWS4-HMAC-SHA256 Credential=minioadmin/20200331/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=988b9814b154eccd26fe55bda23e45ebcc8af455e7fdc167da4efe8ab2af5029\r\nContent-Length: 5242880\r\nAccept: */*\r\n\r\n"
-> "HTTP/1.1 100 Continue\r\n"
-> "\r\n"

-> "HTTP/1.1 200 OK\r\n"
-> "Accept-Ranges: bytes\r\n"
-> "Content-Length: 0\r\n"
-> "Content-Security-Policy: block-all-mixed-content\r\n"
-> "ETag: \"e25a5d2b3385e6c74474f5c3a43ce8fd\"\r\n"
-> "Server: MinIO/RELEASE.2020-03-25T07-03-04Z\r\n"
-> "Vary: Origin\r\n"
-> "X-Amz-Request-Id: 16013B00CE107B21\r\n"
-> "X-Xss-Protection: 1; mode=block\r\n"
-> "Date: Tue, 31 Mar 2020 00:10:18 GMT\r\n"
-> "\r\n"
reading 0 bytes...
-> ""
read 0 bytes
Conn keep-alive

The subsequent 200 response with no body does not show up in 9.2.9.0. It is related to the following request-response in minio.

localhost PUT /test/queued_20200211114951_0002.pdf?partNumber=1&uploadId=f634541e-97a7-441e-92b5-f68ca3412528
localhost Host: localhost:9000
localhost Authorization: AWS4-HMAC-SHA256 Credential=minioadmin/20200331/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=b68727b07dabc7ec0ac54b9c6855da9cf790aef67b3cd7bc5a6a8e36cf3ab7de
localhost Content-Length: 5242880
localhost Content-Type: 
localhost X-Amz-Content-Sha256: 8ae2921ed724d13fc04151b87b9667195c8d59570142a7eefca603b2cea3b174
localhost Accept: */*
localhost Content-Md5: 4lpdKzOF5sdEdPXDpDzo/Q==
localhost Expect: 100-continue
localhost User-Agent: aws-sdk-ruby2/2.11.436 jruby/2.3.3 java resources
localhost X-Amz-Date: 20200331T001603Z
localhost Accept-Encoding: 
localhost <BODY>

localhost [RESPONSE] [00:16:04.124] [ Duration 177.315ms  ↑ 5.0 MiB  ↓ 259 B ]
localhost 200 OK
localhost Server: MinIO/RELEASE.2020-03-25T07-03-04Z
localhost Vary: Origin
localhost X-Amz-Request-Id: 16013B513F051E0D
localhost X-Xss-Protection: 1; mode=block
localhost Accept-Ranges: bytes
localhost Content-Length: 0
localhost Content-Security-Policy: block-all-mixed-content
localhost ETag: "e25a5d2b3385e6c74474f5c3a43ce8fd"
localhost <BODY>
localhost 

@Adithya-copart
Copy link

Adithya-copart commented Mar 31, 2020

Looks similar to https://bugs.ruby-lang.org/issues/14451.

net/httpis patched in aws-sdk and the same patch is applied against 9.2.9.0 and 9.2.1.0.

@headius
Copy link
Member

headius commented Mar 31, 2020

I have still been unable to get any version to work, but I'm proceeding to dig into debugging why current JRuby 9.2 times out.

So far I've figured out that the AWS client requests HTTP Continue from the server, which the net/http subsystem waits for correctly. However after the continue, the client loop appears to go back in asking for another header that never arrives.

It seems like after the Continue has been received, the data is not being sent. Trying to figure out why.

@headius
Copy link
Member

headius commented Mar 31, 2020

Ok, the data is being sent, via a copy_stream call in net/http/generic_request.rb. Now the question is why the server doesn't respond.

@headius
Copy link
Member

headius commented Mar 31, 2020

Running minio locally I can get some admin output, but it isn't so helpful... these messages appear after the client times out each request:

14:21:14.230 [400 Bad Request] s3.PutObject localhost:9000/test/jruby.jar 127.0.0.1         34.824802s   ↑ 4.8 MiB ↓ 609 B
14:21:58.704 [400 Bad Request] s3.PutObject localhost:9000/test/jruby.jar 127.0.0.1         36.624553s   ↑ 5.2 MiB ↓ 609 B
14:23:06.598 [400 Bad Request] s3.PutObject localhost:9000/test/jruby.jar 127.0.0.1         1m0.079286s  ↑ 5.3 MiB ↓ 609 B
14:24:06.623 [400 Bad Request] s3.PutObject localhost:9000/test/jruby.jar 127.0.0.1         1m0.017263s  ↑ 132 B ↓ 609 B
14:24:16.946 [400 Bad Request] s3.PutObject localhost:9000/test/jruby.jar 127.0.0.1         9.928029s    ↑ 4.2 MiB ↓ 609 B

@headius
Copy link
Member

headius commented Mar 31, 2020

Logging the communication with a little proxy script yields this result:

in2out: "PUT /test/jruby.jar HTTP/1.1\r\nContent-Type: \r\nAccept-Encoding: \r\nUser-Agent: aws-sdk-ruby2/2.11.436 jruby/2.6.5 java resources\r\nExpect: 100-continue\r\nContent-Md5: G4RMo9phQi1le/o067UK5w==\r\nX-Amz-Date: 20200331T143842Z\r\nHost: localhost:9001\r\nX-Amz-Content-Sha256: db5a4e5b66f3b02cff8dc36d851976e8fc13bf41833bb0c9069712cab06db288\r\nAuthorization: AWS4-HMAC-SHA256 Credential=minioadmin/20200331/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=bb550848c6c0608791bceab4f50b1d0c1eb9905ea4d1e9ec614145493539c052\r\nContent-Length: 15444499\r\nAccept: */*\r\n\r\n"
out2in: "HTTP/1.1 100 Continue\r\n\r\n"
in2out: "PK\x03\x04\x14\x00\b\b\b\x00\xBBq~P\x00
...

The file is sent in its entirety, but then the server doesn't respond.

@headius
Copy link
Member

headius commented Mar 31, 2020

After comparing proxy output with CRuby, it's pretty clear that the entire file isn't getting sent. Investigating IO.copy_stream now to see if somehow it's quietly giving up before the full stream content has been sent.

@headius
Copy link
Member

headius commented Mar 31, 2020

Bug dates back to the original implementation of this code in 728fd53:

Implement IO.copy_stream for non-file input. Fixes #1103.

This implementation is still pretty naive, just expecting -1 to
indicate that the incoming channel has reached a terminating
point. With the amount of wrapping the JDK does around the
actual file descriptors, it may be the best we can do without
building more of our IO layer using native calls.

Pretty naive indeed.

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 31, 2020

Hi Charles,

Thank you :-).

Murali

@enebo
Copy link
Member

enebo commented Mar 31, 2020

@headius What amazes me is how much write must tend to write out the full amount...

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 31, 2020

Hi Charles,

How do I apply this change to test ?

Thanks
Murali

@Goltergaul
Copy link

Goltergaul commented Mar 31, 2020

@headius wow cool debugging! <3 That sounds like a rather serious bug though since it would mean that data was lost in those cases?

@headius
Copy link
Member

headius commented Mar 31, 2020

@Goltergaul Yes, data would be lost and copy_stream would report that everything was written.

It appears the paths that use a file for input follow different logic that should continue to try to write. However that only works with JDK FileChannel and most files in JRuby are opened using a native file descriptor and jnr-enxio's NativeDeviceChannel. That falls back on channel-to-channel transfer which had this bug in it.

Sadly there's no specs or tests for this behavior, so it was never caught until now.

@headius
Copy link
Member

headius commented Mar 31, 2020

Ruby proxy script I used, for posterity:

require 'socket'

proxy = TCPServer.new(ARGV[0].to_i)
out = TCPSocket.new(ARGV[1], ARGV[2].to_i)
inn = proxy.accept

inn2out = Thread.new do
  while true
    x = inn.read_nonblock(1024, exception: false)
    next if x == :wait_readable
    next if x == nil
    out.write(x)
    puts "in2out: #{x.inspect[0..100]} ..."
  end
end

out2in = Thread.new do
  while true
    x = out.read_nonblock(1024, exception: false)
    next if x == :wait_readable
    next if x == nil
    inn.write(x)
    puts "out2in: #{x.inspect[0..100]} ..."
  end
end

[inn2out, out2in].each(&:join)

headius added a commit that referenced this issue Mar 31, 2020
When the receiving stream doesn't accept all data written by
IO.copy_stream, the runtime should continue attempting to write
that data until what it read has been consumed.

See #6078 for an example bug in JRuby.
@headius
Copy link
Member

headius commented Mar 31, 2020

Sorry for the delay on this one, folks, but it will be fixed in the next release (9.3).

I was never able to get this to work on any version of JRuby, since this copy_stream function is very old and always had the bug. It is a mystery why it worked for some of you.

You can test this using our snapshot builds once this build finishes:

https://travis-ci.org/github/jruby/jruby/builds/669417554

The snapshots are here:

https://oss.sonatype.org/content/repositories/snapshots/org/jruby/jruby-dist/9.3.0.0-SNAPSHOT/

@kvmuralidhar
Copy link

kvmuralidhar commented Mar 31, 2020

Hi Charles,

Thanks again for your help with this :).

Murali

@headius
Copy link
Member

headius commented Apr 1, 2020

@enebo I think when I wrote this logic I just blanked out the fact that a single rb_io_write call in Ruby will loop until it writes out everything you've given it. I replaced that with WriteableByteChannel.write which does not and didn't think about the return value.

@enebo
Copy link
Member

enebo commented Apr 1, 2020

@headius but in how many years we did not see this reported as a front and center problem meaning in most cases the write did write all the data? That is what I am impressed with.

@kares kares added this to the JRuby 9.3.0.0 milestone Apr 1, 2020
@kvmuralidhar
Copy link

kvmuralidhar commented Apr 1, 2020

Hi Charles,

I tried uploading a 21Mb file with the 9.3.0.0 snapshot and got the same failure as before.

Thanks
Murali

@headius
Copy link
Member

headius commented Apr 1, 2020

@kvmuralidhar Could you show me the jruby -v output from that snapshot?

@headius
Copy link
Member

headius commented Apr 1, 2020

@enebo Yeah this feature is rarely used in general, and usually not with sockets, so I suppose folks never saw this bug. When either end is a file, we use a different path that does ensure everything gets written, so this only affects non-file to non-file, where the target may do partial reads.

@kvmuralidhar
Copy link

kvmuralidhar commented Apr 1, 2020

Hi Charles,

Here is the output

 /mnt/data/elasticStack/jruby9300/bin/jruby -v
jruby 9.3.0.0-SNAPSHOT (2.5.7) 2020-03-24 2a0c1e6cfc OpenJDK 64-Bit Server VM 25.242-b08 on 1.8.0_242-b08 +jit [linux-x86_64]

Thanks
Murali

@Adithya-copart
Copy link

Adithya-copart commented Apr 1, 2020

Works for me in a mac.
Used rvm install jruby-head though.

@kvmuralidhar
Copy link

kvmuralidhar commented Apr 1, 2020

Hi Charles,

looks like a user error as the build I am testing is as of 3/24

Thanks
Murali

@kvmuralidhar
Copy link

kvmuralidhar commented Apr 1, 2020

Hi Charles,

All good - the upload works as expected!

Thanks
Murali

@headius
Copy link
Member

headius commented Apr 2, 2020

@Adithya-copart @kvmuralidhar Thank you for confirming!

eregon pushed a commit to ruby/spec that referenced this issue Apr 5, 2020
When the receiving stream doesn't accept all data written by
IO.copy_stream, the runtime should continue attempting to write
that data until what it read has been consumed.

See jruby/jruby#6078 for an example bug in JRuby.
@kvmuralidhar
Copy link

kvmuralidhar commented Jun 5, 2020

Sorry for the delay on this one, folks, but it will be fixed in the next release (9.3).

I was never able to get this to work on any version of JRuby, since this copy_stream function is very old and always had the bug. It is a mystery why it worked for some of you.

You can test this using our snapshot builds once this build finishes:

https://travis-ci.org/github/jruby/jruby/builds/669417554

The snapshots are here:

https://oss.sonatype.org/content/repositories/snapshots/org/jruby/jruby-dist/9.3.0.0-SNAPSHOT/

Hi Charles,

Any idea on the release date for 9.3.0.0 ?

Thanks
Murali

kares pushed a commit to kares/jruby that referenced this issue Jul 30, 2020
On target streams that may do partial reads, the logic here fails
to write all content read from the source stream. The return value
of Channel.write is never checked, but the total bytes written is
incremented, resulting in silently losing data.

The patch loops until all bytes have been written to the output
channel after each read.

CRuby does similar logic. We may need to add additional blocking
write or thread event checks here, in case the target stream
blocks during the write loop, but that will require the different
`transfer` paths to be expanded to support nonblocking IO
channels.

Fixes jruby#6078
kares added a commit that referenced this issue Jul 30, 2020
On target streams that may do partial reads, the logic here fails
to write all content read from the source stream. The return value
of Channel.write is never checked, but the total bytes written is
incremented, resulting in silently losing data.

The patch loops until all bytes have been written to the output
channel after each read.

CRuby does similar logic. We may need to add additional blocking
write or thread event checks here, in case the target stream
blocks during the write loop, but that will require the different
`transfer` paths to be expanded to support nonblocking IO
channels.

Fixes #6078

Back-port of bc9a5ac
Co-authored-by: Charles Oliver Nutter <headius@headius.com>
@kares kares modified the milestones: JRuby 9.3.0.0, 9.2.13.0 Jul 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants