IO performance #2166

Closed
hosiawak opened this Issue Feb 18, 2013 · 10 comments

Projects

None yet

8 participants

@hosiawak
Rubinius member
# test.txt is a 40 MB file created by 
# cat /dev/random > test.txt
# and hitting ctrl + c after a few seconds

File.open('test.txt', 'r') do |f|
  f.each_byte {}
end

$ time ruby-head test.rb

real 0m1.918s
user 0m1.899s
sys 0m0.018s

$ time ./bin/rbx test.rb

real 0m41.168s
user 0m41.117s
sys 0m0.243s

I noticed similar performance on other IO methods, eg. IO#getc is 6x slower, IO#gets is 12x slower on rbx.

Running with -Xprofile shows a lot of locking/unlocking. I'm wondering if adding vm_object_lock/unlock to the JIT would optimize most these calls as a first step.

@hosiawak
Rubinius member

Without Rubinius.synchronize locks in io.rb, io18.rb and io19.rb the benchmark above completes in 8 seconds
File.read is 2x faster.

The locks were added as a result of #244

@ChristianSi

I noticed this performance iissue when iterating a file line by line -- a pretty common use case. A very simple example:

#!/usr/bin/env ruby

# Count and print the lines in a file.
def count_lines(file)
  count = 0
  file.each_line do |line|
    count += 1
  end
  puts "#{count} lines"
end

# Main program: process ARGV or stdin
if ARGV.empty?
  count_lines $stdin
else
  ARGV.each do |filename|
    File.open(filename) { |file| count_lines file }
  end
end

Running this scipt on a 124456-line, 5.2 MB text file, ruby 1.9.3 takes 0.04 secs, rubinius master 1.13 secs. Ouch!

@ioquatix

Well, it looks like the performance issue is due to the excessive locking in the underlying IO buffer.

1/ Is the current implementation a bottleneck, using some kind of channel based approach rather than a mutex?

2/ Should the underlying IO operations support multi-threaded access at the expense of performance?

3/ Wouldn't it be far more practical to lock the IO buffers at a higher level, and then perhaps push this burden to the library implementers who are choosing to use multithreads.

4/ Underlying OS read/write calls aren't necessarily synchronous for multithreads - is this something that rbx should be trying to guarantee?

@dbussink
Rubinius member

Actually, last time I checked the overhead of locking was fairly limited. Still significant, but not the largest contributing factor (anymore).

Also we don't use a channel approach anymore and haven't for a long time:

https://github.com/rubinius/rubinius/blob/master/kernel/common/io.rb#L65

@dbussink
Rubinius member

BTW, how did you reach this conclusion? Any benchmark / profiling that shows that locking currently is the biggest problem here?

@ioquatix

@dbussink, see the reply by @hosiawak

@YorickPeterse
Rubinius member

Currently looking into this using the following basic benchmark:

require 'benchmark'
require 'tempfile'

handle   = Tempfile.new('rbx-bench')
max_size = 10 * 1024 * 1024
amount   = 10

while handle.size <= max_size
  handle.write(File.read('/dev/urandom', 1024 * 1024))
end

Benchmark.bmbm(20) do |bench|
  bench.report '#each_byte' do
    amount.times do
      handle.rewind
      handle.each_byte {}
    end
  end
end

On Rbx you get the following timings:

Rehearsal -------------------------------------------------------
#each_byte           62.446660   0.360000  62.806660 ( 62.801676)
--------------------------------------------- total: 62.806660sec

                          user     system      total        real
#each_byte           61.106660   0.336667  61.443327 ( 61.435222)

On MRI however you get this:

Rehearsal --------------------------------------------------------
#each_byte             5.350000   0.030000   5.380000 (  5.376409)
----------------------------------------------- total: 5.380000sec

                           user     system      total        real
#each_byte             5.420000   0.010000   5.430000 (  5.428334)

Sadly the profiler is currently broken so I can't run the benchmark with that enabled.

@undying

Also found an issue with network IO.

require 'fiber'
require 'eventmachine'
require 'em-http-request'

def async(url)
  f = Fiber.current
  http = EM::HttpRequest.new(url).get
  http.errback { 
    f.resume
    EM.stop
  }
  http.callback {
    f.resume
    EM.stop
  }
  Fiber.yield
end

EM.epoll
EM.run {
  Fiber.new {
    async('http://www.yandex.ru')
  }.resume
}
kron@kron /tmp % strace -c ruby r.rb                                                                                                                                                   23:32:59
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 27.03    0.000010           0        58           mprotect
 24.32    0.000009           0      1074       758 open
 16.22    0.000006           0       507       278 stat
 16.22    0.000006           0       811           lstat
 16.22    0.000006           0       161       157 ioctl
  0.00    0.000000           0       502        23 read
  0.00    0.000000           0         1           write
  0.00    0.000000           0       324         1 close
  0.00    0.000000           0       701           fstat
  0.00    0.000000           0         2           poll
  0.00    0.000000           0        57           lseek
  0.00    0.000000           0        90           mmap
  0.00    0.000000           0        16           munmap
  0.00    0.000000           0        93           brk
  0.00    0.000000           0        16           rt_sigaction
  0.00    0.000000           0        28           rt_sigprocmask
  0.00    0.000000           0         1           writev
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           pipe
  0.00    0.000000           0         2           socket
  0.00    0.000000           0         2         1 connect
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         2         1 shutdown
  0.00    0.000000           0         1           getpeername
  0.00    0.000000           0         2           setsockopt
  0.00    0.000000           0         2           getsockopt
  0.00    0.000000           0         1           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           uname
  0.00    0.000000           0       289           fcntl
  0.00    0.000000           0         4           getdents
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0         6           getrlimit
  0.00    0.000000           0       101           getuid
  0.00    0.000000           0       101           getgid
  0.00    0.000000           0       166           geteuid
  0.00    0.000000           0       102           getegid
  0.00    0.000000           0         1           sigaltstack
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         7         1 futex
  0.00    0.000000           0         5           sched_getaffinity
  0.00    0.000000           0         1           epoll_create
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           clock_gettime
  0.00    0.000000           0        26           epoll_wait
  0.00    0.000000           0         7           epoll_ctl
  0.00    0.000000           0         5         3 openat
  0.00    0.000000           0        26           ppoll
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         2           pipe2
------ ----------- ----------- --------- --------- ----------------
100.00    0.000037                  5315      1224 total
kron@kron /tmp % strace -c rbx r.rb                                                                                                                                                    23:44:17
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 59.00    0.001000         500         2           poll
 29.09    0.000493          33        15           munmap
  5.78    0.000098           0      7798      7047 stat
  3.30    0.000056           0      5497           lstat
  0.83    0.000014           0      1804           getdents
  0.53    0.000009           0      1330        19 read
  0.47    0.000008           0       635         5 open
  0.41    0.000007           0       902           openat
  0.35    0.000006           0      1538         1 close
  0.24    0.000004           0       891           fcntl
  0.00    0.000000           0        35           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0        86           mmap
  0.00    0.000000           0       323           mprotect
  0.00    0.000000           0        71           brk
  0.00    0.000000           0        28           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0         1           writev
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           pipe
  0.00    0.000000           0        22           select
  0.00    0.000000           0         2           socket
  0.00    0.000000           0         2         1 connect
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         2         1 shutdown
  0.00    0.000000           0         1           getpeername
  0.00    0.000000           0         2           setsockopt
  0.00    0.000000           0         2           getsockopt
  0.00    0.000000           0         5           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         3           uname
  0.00    0.000000           0         2           getcwd
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0       417           geteuid
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0       203         4 futex
  0.00    0.000000           0         1           epoll_create
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0        22           epoll_wait
  0.00    0.000000           0         7           epoll_ctl
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.001695                 21664      7079 total
kron@kron ~ % ruby --version
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]
kron@kron ~ % rbx --version
rubinius 2.2.5.n50 (2.1.0 56d2edb4 2014-02-19 JI) [x86_64-linux-gnu]
kron@kron ~ % uname -a
Linux kron 3.13.2-gentoo #1 SMP Fri Feb 14 15:14:06 MSK 2014 x86_64 Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz GenuineIntel GNU/Linux
@YorickPeterse YorickPeterse added a commit to rubinius/rubinius-benchmark that referenced this issue Mar 19, 2014
@YorickPeterse YorickPeterse Benchmark for File#each_byte.
This benchmark was added for rubinius/rubinius#2166
to make it easier to run and keep track of. It does also benchmark File#rewind
but I'm not sure if there's a way around that. The alternative would be to use
File.open() for each iteration but then we'd end up benchmarking even more.
97aa8bd
@tak1n
Rubinius member

Running @ChristianSi script on a 6MB file.

Rubinius 2.5.5:

rubinius-bugs ➤ ruby -v                                                                                                                                                                                git:master*
rubinius 2.5.5.c16 (2.1.0 621953b6 2015-05-21 3.5.0 JI) [x86_64-linux-gnu]
rubinius-bugs ➤ time ./file_io.rb large.txt                                                                                                                                                            git:master*
14949 lines
./file_io.rb large.txt  0,27s user 0,03s system 102% cpu 0,295 total

Ruby 2.2.2:

rubinius-bugs ➤ ruby -v                                                                                                                                                                                git:master*
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
rubinius-bugs ➤ time ./file_io.rb large.txt                                                                                                                                                            git:master*
14949 lines
./file_io.rb large.txt  0,02s user 0,01s system 99% cpu 0,030 total

Also tried a script with IO#each_byte:

# require 'rubinius/profiler' if RUBY_ENGINE == 'rbx'

large_file  = File.join(File.dirname(__FILE__), 'large.txt')

large_handle  = File.open(large_file, 'r')

# if RUBY_ENGINE == 'rbx'
#   profiler = Rubinius::Profiler::Instrumenter.new
#   profiler.start
# end

large_handle.each_byte {}
large_handle.rewind

# if RUBY_ENGINE == 'rbx'
#   profiler.stop
#   profiler.show
# end

Gives me following results:

Rubinius 2.5.5

rubinius-bugs ➤ time rbx io_performance.rb                                                                                                                                                             git:master*
rbx io_performance.rb  1,32s user 0,02s system 114% cpu 1,174 total

Ruby 2.2.2

rubinius-bugs ➤ time ruby io_performance.rb                                                                                                                                                            git:master*
ruby io_performance.rb  0,24s user 0,00s system 100% cpu 0,248 total

And finally ran the benchmark in rubinius/rubinius-benchmark
https://github.com/rubinius/rubinius-benchmark/blob/master/core/file/bench_each_byte.rb

Rubinius 2.5.5:

rubinius-bugs ➤ rbx bench_io.rb                                                                                                                                                                        git:master*
Calculating -------------------------------------
File.read a small file
                        87.000  i/100ms
File.read a medium file
                        10.000  i/100ms
File.read a large file
                         1.000  i/100ms
-------------------------------------------------
File.read a small file
                        884.497  (± 2.4%) i/s -      4.437k
File.read a medium file
                         99.991  (± 2.0%) i/s -    500.000 
File.read a large file
                          1.979  (± 0.0%) i/s -     10.000 

Ruby 2.2.2:

rubinius-bugs ➤ ruby bench_io.rb                                                                                                                                                                       git:master*
Calculating -------------------------------------
File.read a small file
                       199.000  i/100ms
File.read a medium file
                        22.000  i/100ms
File.read a large file
                         1.000  i/100ms
-------------------------------------------------
File.read a small file
                          1.996k (± 0.8%) i/s -     10.149k
File.read a medium file
                        223.381  (± 0.9%) i/s -      1.122k
File.read a large file
                          4.451  (± 0.0%) i/s -     23.000 
@brixen
Rubinius member

There are a number of IO related performance optimizations possible, but they should be approached with specific PRs. There is an ffi-io branch to replace the C++ primitives with FFI-based methods. That would be a good start.

@brixen brixen closed this Dec 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment