Skip to content

IO performance #2166

Closed
hosiawak opened this Issue Feb 18, 2013 · 10 comments

8 participants

@hosiawak
Rubinius member
# test.txt is a 40 MB file created by 
# cat /dev/random > test.txt
# and hitting ctrl + c after a few seconds

File.open('test.txt', 'r') do |f|
  f.each_byte {}
end

$ time ruby-head test.rb

real 0m1.918s
user 0m1.899s
sys 0m0.018s

$ time ./bin/rbx test.rb

real 0m41.168s
user 0m41.117s
sys 0m0.243s

I noticed similar performance on other IO methods, eg. IO#getc is 6x slower, IO#gets is 12x slower on rbx.

Running with -Xprofile shows a lot of locking/unlocking. I'm wondering if adding vm_object_lock/unlock to the JIT would optimize most these calls as a first step.

@hosiawak
Rubinius member
hosiawak commented Mar 4, 2013

Without Rubinius.synchronize locks in io.rb, io18.rb and io19.rb the benchmark above completes in 8 seconds
File.read is 2x faster.

The locks were added as a result of #244

@ChristianSi

I noticed this performance iissue when iterating a file line by line -- a pretty common use case. A very simple example:

#!/usr/bin/env ruby

# Count and print the lines in a file.
def count_lines(file)
  count = 0
  file.each_line do |line|
    count += 1
  end
  puts "#{count} lines"
end

# Main program: process ARGV or stdin
if ARGV.empty?
  count_lines $stdin
else
  ARGV.each do |filename|
    File.open(filename) { |file| count_lines file }
  end
end

Running this scipt on a 124456-line, 5.2 MB text file, ruby 1.9.3 takes 0.04 secs, rubinius master 1.13 secs. Ouch!

@ioquatix

Well, it looks like the performance issue is due to the excessive locking in the underlying IO buffer.

1/ Is the current implementation a bottleneck, using some kind of channel based approach rather than a mutex?

2/ Should the underlying IO operations support multi-threaded access at the expense of performance?

3/ Wouldn't it be far more practical to lock the IO buffers at a higher level, and then perhaps push this burden to the library implementers who are choosing to use multithreads.

4/ Underlying OS read/write calls aren't necessarily synchronous for multithreads - is this something that rbx should be trying to guarantee?

@dbussink
Rubinius member

Actually, last time I checked the overhead of locking was fairly limited. Still significant, but not the largest contributing factor (anymore).

Also we don't use a channel approach anymore and haven't for a long time:

https://github.com/rubinius/rubinius/blob/master/kernel/common/io.rb#L65

@dbussink
Rubinius member

BTW, how did you reach this conclusion? Any benchmark / profiling that shows that locking currently is the biggest problem here?

@ioquatix

@dbussink, see the reply by @hosiawak

@YorickPeterse
Rubinius member

Currently looking into this using the following basic benchmark:

require 'benchmark'
require 'tempfile'

handle   = Tempfile.new('rbx-bench')
max_size = 10 * 1024 * 1024
amount   = 10

while handle.size <= max_size
  handle.write(File.read('/dev/urandom', 1024 * 1024))
end

Benchmark.bmbm(20) do |bench|
  bench.report '#each_byte' do
    amount.times do
      handle.rewind
      handle.each_byte {}
    end
  end
end

On Rbx you get the following timings:

Rehearsal -------------------------------------------------------
#each_byte           62.446660   0.360000  62.806660 ( 62.801676)
--------------------------------------------- total: 62.806660sec

                          user     system      total        real
#each_byte           61.106660   0.336667  61.443327 ( 61.435222)

On MRI however you get this:

Rehearsal --------------------------------------------------------
#each_byte             5.350000   0.030000   5.380000 (  5.376409)
----------------------------------------------- total: 5.380000sec

                           user     system      total        real
#each_byte             5.420000   0.010000   5.430000 (  5.428334)

Sadly the profiler is currently broken so I can't run the benchmark with that enabled.

@undying
undying commented Feb 18, 2014

Also found an issue with network IO.

require 'fiber'
require 'eventmachine'
require 'em-http-request'

def async(url)
  f = Fiber.current
  http = EM::HttpRequest.new(url).get
  http.errback { 
    f.resume
    EM.stop
  }
  http.callback {
    f.resume
    EM.stop
  }
  Fiber.yield
end

EM.epoll
EM.run {
  Fiber.new {
    async('http://www.yandex.ru')
  }.resume
}
kron@kron /tmp % strace -c ruby r.rb                                                                                                                                                   23:32:59
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 27.03    0.000010           0        58           mprotect
 24.32    0.000009           0      1074       758 open
 16.22    0.000006           0       507       278 stat
 16.22    0.000006           0       811           lstat
 16.22    0.000006           0       161       157 ioctl
  0.00    0.000000           0       502        23 read
  0.00    0.000000           0         1           write
  0.00    0.000000           0       324         1 close
  0.00    0.000000           0       701           fstat
  0.00    0.000000           0         2           poll
  0.00    0.000000           0        57           lseek
  0.00    0.000000           0        90           mmap
  0.00    0.000000           0        16           munmap
  0.00    0.000000           0        93           brk
  0.00    0.000000           0        16           rt_sigaction
  0.00    0.000000           0        28           rt_sigprocmask
  0.00    0.000000           0         1           writev
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           pipe
  0.00    0.000000           0         2           socket
  0.00    0.000000           0         2         1 connect
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         2         1 shutdown
  0.00    0.000000           0         1           getpeername
  0.00    0.000000           0         2           setsockopt
  0.00    0.000000           0         2           getsockopt
  0.00    0.000000           0         1           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           uname
  0.00    0.000000           0       289           fcntl
  0.00    0.000000           0         4           getdents
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0         6           getrlimit
  0.00    0.000000           0       101           getuid
  0.00    0.000000           0       101           getgid
  0.00    0.000000           0       166           geteuid
  0.00    0.000000           0       102           getegid
  0.00    0.000000           0         1           sigaltstack
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         7         1 futex
  0.00    0.000000           0         5           sched_getaffinity
  0.00    0.000000           0         1           epoll_create
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           clock_gettime
  0.00    0.000000           0        26           epoll_wait
  0.00    0.000000           0         7           epoll_ctl
  0.00    0.000000           0         5         3 openat
  0.00    0.000000           0        26           ppoll
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         2           pipe2
------ ----------- ----------- --------- --------- ----------------
100.00    0.000037                  5315      1224 total
kron@kron /tmp % strace -c rbx r.rb                                                                                                                                                    23:44:17
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 59.00    0.001000         500         2           poll
 29.09    0.000493          33        15           munmap
  5.78    0.000098           0      7798      7047 stat
  3.30    0.000056           0      5497           lstat
  0.83    0.000014           0      1804           getdents
  0.53    0.000009           0      1330        19 read
  0.47    0.000008           0       635         5 open
  0.41    0.000007           0       902           openat
  0.35    0.000006           0      1538         1 close
  0.24    0.000004           0       891           fcntl
  0.00    0.000000           0        35           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0        86           mmap
  0.00    0.000000           0       323           mprotect
  0.00    0.000000           0        71           brk
  0.00    0.000000           0        28           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0         1           writev
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           pipe
  0.00    0.000000           0        22           select
  0.00    0.000000           0         2           socket
  0.00    0.000000           0         2         1 connect
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         2         1 shutdown
  0.00    0.000000           0         1           getpeername
  0.00    0.000000           0         2           setsockopt
  0.00    0.000000           0         2           getsockopt
  0.00    0.000000           0         5           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         3           uname
  0.00    0.000000           0         2           getcwd
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0       417           geteuid
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0       203         4 futex
  0.00    0.000000           0         1           epoll_create
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0        22           epoll_wait
  0.00    0.000000           0         7           epoll_ctl
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.001695                 21664      7079 total
kron@kron ~ % ruby --version
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]
kron@kron ~ % rbx --version
rubinius 2.2.5.n50 (2.1.0 56d2edb4 2014-02-19 JI) [x86_64-linux-gnu]
kron@kron ~ % uname -a
Linux kron 3.13.2-gentoo #1 SMP Fri Feb 14 15:14:06 MSK 2014 x86_64 Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz GenuineIntel GNU/Linux
@YorickPeterse YorickPeterse added a commit to rubinius/rubinius-benchmark that referenced this issue Mar 19, 2014
@YorickPeterse YorickPeterse Benchmark for File#each_byte.
This benchmark was added for rubinius/rubinius#2166
to make it easier to run and keep track of. It does also benchmark File#rewind
but I'm not sure if there's a way around that. The alternative would be to use
File.open() for each iteration but then we'd end up benchmarking even more.
97aa8bd
@tak1n
Rubinius member
tak1n commented May 23, 2015

Running @ChristianSi script on a 6MB file.

Rubinius 2.5.5:

rubinius-bugs ➤ ruby -v                                                                                                                                                                                git:master*
rubinius 2.5.5.c16 (2.1.0 621953b6 2015-05-21 3.5.0 JI) [x86_64-linux-gnu]
rubinius-bugs ➤ time ./file_io.rb large.txt                                                                                                                                                            git:master*
14949 lines
./file_io.rb large.txt  0,27s user 0,03s system 102% cpu 0,295 total

Ruby 2.2.2:

rubinius-bugs ➤ ruby -v                                                                                                                                                                                git:master*
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
rubinius-bugs ➤ time ./file_io.rb large.txt                                                                                                                                                            git:master*
14949 lines
./file_io.rb large.txt  0,02s user 0,01s system 99% cpu 0,030 total

Also tried a script with IO#each_byte:

# require 'rubinius/profiler' if RUBY_ENGINE == 'rbx'

large_file  = File.join(File.dirname(__FILE__), 'large.txt')

large_handle  = File.open(large_file, 'r')

# if RUBY_ENGINE == 'rbx'
#   profiler = Rubinius::Profiler::Instrumenter.new
#   profiler.start
# end

large_handle.each_byte {}
large_handle.rewind

# if RUBY_ENGINE == 'rbx'
#   profiler.stop
#   profiler.show
# end

Gives me following results:

Rubinius 2.5.5

rubinius-bugs ➤ time rbx io_performance.rb                                                                                                                                                             git:master*
rbx io_performance.rb  1,32s user 0,02s system 114% cpu 1,174 total

Ruby 2.2.2

rubinius-bugs ➤ time ruby io_performance.rb                                                                                                                                                            git:master*
ruby io_performance.rb  0,24s user 0,00s system 100% cpu 0,248 total

And finally ran the benchmark in rubinius/rubinius-benchmark
https://github.com/rubinius/rubinius-benchmark/blob/master/core/file/bench_each_byte.rb

Rubinius 2.5.5:

rubinius-bugs ➤ rbx bench_io.rb                                                                                                                                                                        git:master*
Calculating -------------------------------------
File.read a small file
                        87.000  i/100ms
File.read a medium file
                        10.000  i/100ms
File.read a large file
                         1.000  i/100ms
-------------------------------------------------
File.read a small file
                        884.497  (± 2.4%) i/s -      4.437k
File.read a medium file
                         99.991  (± 2.0%) i/s -    500.000 
File.read a large file
                          1.979  (± 0.0%) i/s -     10.000 

Ruby 2.2.2:

rubinius-bugs ➤ ruby bench_io.rb                                                                                                                                                                       git:master*
Calculating -------------------------------------
File.read a small file
                       199.000  i/100ms
File.read a medium file
                        22.000  i/100ms
File.read a large file
                         1.000  i/100ms
-------------------------------------------------
File.read a small file
                          1.996k (± 0.8%) i/s -     10.149k
File.read a medium file
                        223.381  (± 0.9%) i/s -      1.122k
File.read a large file
                          4.451  (± 0.0%) i/s -     23.000 
@brixen
Rubinius member
brixen commented Dec 28, 2015

There are a number of IO related performance optimizations possible, but they should be approached with specific PRs. There is an ffi-io branch to replace the C++ primitives with FFI-based methods. That would be a good start.

@brixen brixen closed this Dec 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.