Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

A race condition between checking the file size and reading the file content. #117

Closed
Teshootub7 opened this Issue · 5 comments

2 participants

Morita Sho Hiroshi Nakamura
Morita Sho

Hi,

When uploading a file, HTTPClient checks the file size for calculating Content-Length, then read the content of the file until EOF. This can occur a race condition.
If file size was changed AFTER checking the file size and BEFORE reading the content, the value of Content-Length and actual body length can differ. It makes the HTTP request invalid.

It would be nice if it records the file size on calculating Content-Length, and read the file content only for the recored file size.
I made a patch to achieve this. The method readbytes are taken from readbytes.rb that was included in Ruby 1.8.

diff --git a/lib/httpclient/http.rb b/lib/httpclient/http.rb
index 2945540..8cde071 100644
--- a/lib/httpclient/http.rb
+++ b/lib/httpclient/http.rb
@@ -449,6 +449,7 @@ module HTTP
         @body = nil
         @size = nil
         @positions = nil
+        @sizes = nil
         @chunk_size = nil
       end

@@ -456,6 +457,7 @@ module HTTP
       def init_request(body = nil, boundary = nil)
         @boundary = boundary
         @positions = {}
+        @sizes = {}
         set_content(body, boundary)
         @chunk_size = DEFAULT_CHUNK_SIZE
       end
@@ -542,7 +544,7 @@ module HTTP
           # bear in mind that server may not support it. at least ruby's CGI doesn't.
           @body = body
           remember_pos(@body)
-          @size = body.respond_to?(:size) ? body.size - body.pos : nil
+          @size = remember_size(body)
         elsif boundary and Message.multiparam_query?(body)
           @body = build_query_multipart_str(body, boundary)
           @size = @body.size
@@ -561,11 +563,43 @@ module HTTP
         io.pos = @positions[io] if @positions.key?(io)
       end

+      def remember_size(io)
+        pos = 0
+        begin
+          pos = io.pos if io.respond_to?(:pos)
+        rescue Errno::ESPIPE
+        end
+        if io.respond_to?(:lstat)
+          @sizes[io] = io.lstat.size - pos
+        elsif io.respond_to?(:size)
+          @sizes[io] = io.size
+          @sizes[io] -= pos if @sizes[io]
+        else
+          nil
+        end
+      end
+
+      def readbytes(io, n, outbuf)
+        io.read(n, outbuf)
+        if outbuf == nil
+          raise EOFError, "End of file reached"
+        end
+        if outbuf.bytesize < n
+          raise "data truncated"
+        end
+        outbuf
+      end
+
       def dump_file(io, dev)
         buf = ''
-        while !io.read(@chunk_size, buf).nil?
+        size = @sizes[io]
+        while size > @chunk_size
+          readbytes( io, @chunk_size, buf )
           dev << buf
+          size -= @chunk_size
         end
+        readbytes( io, size, buf )
+        dev << buf
       end

       def dump_chunks(io, dev)
@@ -596,21 +630,14 @@ module HTTP
           @as_stream = false
         end

-        def add(part)
+        def add(part, size = nil)
           if Message.file?(part)
             @as_stream = true
             @body << part
-            if part.respond_to?(:lstat)
-              @size += part.lstat.size
-            elsif part.respond_to?(:size)
-              if sz = part.size
-                @size += sz
-              else
-                @size = nil
-              end
-            else
-              # use chunked upload
+            if size.nil?
               @size = nil
+            elsif @size
+              @size += size
             end
           elsif @body[-1].is_a?(String)
             @body[-1] += part.to_s
@@ -634,9 +661,11 @@ module HTTP
         parts = Parts.new
         query.each do |attr, value|
           value ||= ''
+          size = nil
           headers = ["--#{boundary}"]
           if Message.file?(value)
             remember_pos(value)
+            size = remember_size(value)
             param_str = params_from_file(value).collect { |k, v|
               "#{k}=\"#{v}\""
             }.join("; ")
@@ -656,12 +685,15 @@ module HTTP
             h.each do |h_key, h_val|
               headers << %{#{h_key}: #{h_val}} if h_key != :content
             end
-            remember_pos(value) if Message.file?(value)
+            if Message.file?(value)
+              remember_pos(value)
+              size = remember_size(value)
+            end
           else
             headers << %{Content-Disposition: form-data; name="#{attr}"}
           end
           parts.add(headers.join(CRLF) + CRLF + CRLF)
-          parts.add(value)
+          parts.add(value, size)
           parts.add(CRLF)
         end
         parts.add("--#{boundary}--" + CRLF + CRLF) # empty epilogue
diff --git a/test/test_httpclient.rb b/test/test_httpclient.rb
index ba26b03..4358b95 100644
--- a/test/test_httpclient.rb
+++ b/test/test_httpclient.rb
@@ -754,7 +754,7 @@ EOS
     assert_match(/\r\n2\r\n/m, res.content)
     assert_match(/\r\nContent-Disposition: form-data; name="3"; filename=""\r\n/m, res.content)
     assert_match(/\r\nContent-Length:/m, str.string)
-    assert_equal(3, myio.called)
+    assert_equal(2, myio.called)
   end

   def test_post_with_io_nosize # streaming + chunked post

Thanks,

Hiroshi Nakamura
Owner

Thanks! Looks like it's a bug but I'm not sure I really understand possible situations this bug affects. Can you create testcases that fail with current implementation and run fine with this patch?

Morita Sho

Thanks for your reply.
I think this race condition is "possibility". It will not happens easily.

A possible situation is when uploading a log file in /var/log.

  • HTTPClient first determines the file size of the log file.
  • Some daemon appends a new entry to the log file.
  • HTTPClient read the content of the log file.

It results the file size and the content size HTTPClient read can differ.

Or, some user may executes "cat /dev/urandom > file_to_upload" for unknown reason, and upload 'file_to_upload' while the command is running.

Anyway, the problem is the difference between the file size HTTPClient determined and the content size HTTPClient read. If HTTPClient determined the file size is n bytes, then HTTPClient should read only n bytes, not read until EOF. To simulate this situation, I made a following testcase.

diff --git a/test/test_httpclient.rb b/test/test_httpclient.rb
index ba26b03..c2d35b5 100644
--- a/test/test_httpclient.rb
+++ b/test/test_httpclient.rb
@@ -771,6 +771,20 @@ EOS
     assert_match(/\r\nTransfer-Encoding: chunked\r\n/m, str.string)
   end

+  def test_post_with_io_size_mismatch
+    myio = StringIO.new("45")
+    def myio.size
+      1
+    end
+    @client.debug_dev = str = StringIO.new
+    res = @client.post(serverurl + 'servlet', {1=>2, 3=>myio})
+    assert_match(/\r\nContent-Disposition: form-data; name="1"\r\n/m, res.content)
+    assert_match(/\r\n2\r\n/m, res.content)
+    assert_match(/\r\nContent-Disposition: form-data; name="3"; filename=""\r\n/m, res.content)
+    assert_match(/\r\n4\r\n/m, res.content)
+    assert_match(/\r\nContent-Length: /m, str.string)
+  end
+
   def test_post_async
     param = {'1'=>'2', '3'=>'4'}
     conn = @client.post_async(serverurl + 'servlet', param)

In this testcase, myio#size returns 1 but the actual content that HTTPClient will read is 2 bytes. It results the Content-Length header in HTTP Request is 283, but the actual request body is 284 bytes.

POST /servlet HTTP/1.1
Content-Type: multipart/form-data; boundary=fe8e0fe127f774dadaacb9521042a53ecebe8a0c
Accept: */*
Date: Sun, 07 Oct 2012 12:48:03 GMT
Content-Length: 283
Host: localhost:34248

--fe8e0fe127f774dadaacb9521042a53ecebe8a0c
Content-Disposition: form-data; name="1"

2
--fe8e0fe127f774dadaacb9521042a53ecebe8a0c
Content-Disposition: form-data; name="3"; filename=""
Content-Type: application/octet-stream

45
--fe8e0fe127f774dadaacb9521042a53ecebe8a0c--

The testcase I made should fail as following:

% rake test TEST=./test/test_httpclient.rb TESTOPTS=-ntest_post_with_io_size_mismatch
[...]
test_post_with_io_size_mismatch(TestHTTPClient)
/home/qw/tmp/httpclient/test/test_httpclient.rb:784:in `test_post_with_io_size_mismatch'
     781:     assert_match(/\r\nContent-Disposition: form-data; name="1"\r\n/m, res.content)
     782:     assert_match(/\r\n2\r\n/m, res.content)
     783:     assert_match(/\r\nContent-Disposition: form-data; name="3"; filename=""\r\n/m, res.content)
  => 784:     assert_match(/\r\n4\r\n/m, res.content)
     785:     assert_match(/\r\nContent-Length: /m, str.string)
     786:   end
     787: 
<> expected but was
<>

diff:
  nil
===============================================================================


Finished in 0.034659499 seconds.

1 tests, 4 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
0% passed

After applying the patch I posted, the testcase can be passed.

Thanks,

Hiroshi Nakamura nahi closed this issue from a commit
Hiroshi Nakamura Do not read IO till EOF when the size is given
HTTPClient checks if the given IO responds to :size and set
"Content-Length" header if the size is given.  But it read too much till
EOF.  Should read just as the specified size.

To make the logic simple, now HTTPClient doesn't try to calculate the
whole size of request body for Content-Length if the request body is a
multipart and contains an IO as a parameter.

Fixes #117.
1270672
Hiroshi Nakamura nahi closed this in 1270672
Hiroshi Nakamura
Owner

Thanks for the test! I included it in my fix commit (with a little modification.) To make the logic simple, I decided to not try to calculate the whole size of request body for Content-Length if the request body is a multipart and contains an IO as a parameter. Please reopen the ticket if it looks bad!

Hiroshi Nakamura nahi referenced this issue from a commit
Hiroshi Nakamura Allow chunked multipart POST of sized IO
Based on the fix for #117, this commit allows chunked multipart POST of
sized IO.  Fixes #112.
13c3f87
Morita Sho

Thanks for your work.
But I'm afraid of compatibility if request body is sent using chunked encoding when uploading a file.

A comment in lib/httpclient/http.rb described about chunked encoding. It said:

bear in mind that server may not support it. at least ruby's CGI doesn't

I'm not sure how many servers support (or not support) chunked encoding, but I think it would be better to send with Content-Length if it is possible to calculate Content-Length.
As a server side, providing Content-Length in HTTP request makes it easy to decide whether or not to accept the file upload by checking Content-Length before reading the whole content body.

Hiroshi Nakamura nahi reopened this
Hiroshi Nakamura
Owner

Thanks, I guess #121 hits this problem...

Hiroshi Nakamura nahi closed this issue from a commit
Hiroshi Nakamura fix: avoid inconsistent Content-Length and actual body
If lengths of all posted arguments are known HTTPClient sends
'Content-Length' as a sum length of all arguments. But the length of
actual body was wrong because it read as much as possible regardless of
what IO#size returned. So if the file is getting bigger while HTTPClient
is processing a request the request has inconsistent Content-Length and
body.

This bug is found, and the fix is proposed both by @Teshootub7. Thank
you very much for patient trouble shooting!

Fixes #117.
b42c6da
Hiroshi Nakamura nahi closed this in b42c6da
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.