CSV is much slower than MRI #985

Closed
aubergene opened this Issue Jun 8, 2011 · 24 comments

Comments

Projects
None yet
@aubergene

** edited to require fastercsv in 1.8 as I forgot the standard CSV library in 1.8 is very slow

begin
  require "faster_csv"
  FasterCSV.build_csv_interface
rescue LoadError
  require "csv"
end

CSV.foreach('thirty-megs.csv') do |row|
end

ruby 1.8.7 (2011-02-18 patchlevel 334) [i686-darwin10.7.0]

real 0m20.300s
user 0m20.118s
sys 0m0.117s

ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]

real 0m12.281s
user 0m11.952s
sys 0m0.159s

rubinius 2.0.0dev (1.8.7 23af0f7 yyyy-mm-dd JI) [x86_64-apple-darwin10.7.0]

real 1m39.303s
user 1m39.536s
sys 0m0.713s

@headius

This comment has been minimized.

Show comment Hide comment
@headius

headius Jun 8, 2011

Contributor

Do you have a link to this "thirty-megs.csv" file?

Contributor

headius commented Jun 8, 2011

Do you have a link to this "thirty-megs.csv" file?

@aubergene

This comment has been minimized.

Show comment Hide comment
@aubergene

aubergene Jun 9, 2011

Yes, sorry, it is 1 millions lines like this

Source,Target,Type,Weight
193690,208136,Undirected,0.2081897050
193690,206471,Undirected,0.2890272079
193690,209291,Undirected,0.2465690519

Yes, sorry, it is 1 millions lines like this

Source,Target,Type,Weight
193690,208136,Undirected,0.2081897050
193690,206471,Undirected,0.2890272079
193690,209291,Undirected,0.2465690519
@headius

This comment has been minimized.

Show comment Hide comment
@headius

headius Jun 9, 2011

Contributor

I think this may mostly be a factor of the 1.9 library for csv being so much faster than the 1.8 version, which is the version that ships in Rubinius currently.

I ran your benchmark with JRuby, since we have both modes in place today. The 1.9 mode csv.rb is quite a bit faster than 1.8's version:

~/projects/jruby ➔ time jruby --1.9 -X+C csv_bench.rb 
real    0m20.552s
user    0m27.023s
sys 0m1.227s

~/projects/jruby ➔ time ruby1.9 csv_bench.rb 
real    0m16.079s
user    0m14.569s
sys 0m0.193s

~/projects/jruby ➔ time jruby -X+C csv_bench.rb 
real    0m53.274s
user    0m53.407s
sys 0m1.242s

So the comparison of Rubinius in 1.8 mode to Ruby 1.9 is probably not valid here. I attempted to try Rubinius with the 1.9 csv.rb, but it uses 1.9 syntax and I don't have a 1.9-enabled Rubinius build handy.

Contributor

headius commented Jun 9, 2011

I think this may mostly be a factor of the 1.9 library for csv being so much faster than the 1.8 version, which is the version that ships in Rubinius currently.

I ran your benchmark with JRuby, since we have both modes in place today. The 1.9 mode csv.rb is quite a bit faster than 1.8's version:

~/projects/jruby ➔ time jruby --1.9 -X+C csv_bench.rb 
real    0m20.552s
user    0m27.023s
sys 0m1.227s

~/projects/jruby ➔ time ruby1.9 csv_bench.rb 
real    0m16.079s
user    0m14.569s
sys 0m0.193s

~/projects/jruby ➔ time jruby -X+C csv_bench.rb 
real    0m53.274s
user    0m53.407s
sys 0m1.242s

So the comparison of Rubinius in 1.8 mode to Ruby 1.9 is probably not valid here. I attempted to try Rubinius with the 1.9 csv.rb, but it uses 1.9 syntax and I don't have a 1.9-enabled Rubinius build handy.

@aubergene

This comment has been minimized.

Show comment Hide comment
@aubergene

aubergene Jun 9, 2011

Oh, yes I forgot it's 1.8 I re-ran using fastercsv-1.5.4 which I understand is very similar to CSV in 1.9, it's quicker than before, but still slow.

ruby-1.8.7-p334
real    0m19.323s
user    0m19.154s
sys     0m0.114s

rbx-2.0.0pre
real    1m37.845s
user    1m38.044s
sys     0m0.685s

Oh, yes I forgot it's 1.8 I re-ran using fastercsv-1.5.4 which I understand is very similar to CSV in 1.9, it's quicker than before, but still slow.

ruby-1.8.7-p334
real    0m19.323s
user    0m19.154s
sys     0m0.114s

rbx-2.0.0pre
real    1m37.845s
user    1m38.044s
sys     0m0.685s
@headius

This comment has been minimized.

Show comment Hide comment
@headius

headius Jun 9, 2011

Contributor

Ok...I'll shut up then. At least the comparison is a bit more valid now. Thanks for re-running!

Contributor

headius commented Jun 9, 2011

Ok...I'll shut up then. At least the comparison is a bit more valid now. Thanks for re-running!

@pgrzegorczyk

This comment has been minimized.

Show comment Hide comment
@pgrzegorczyk

pgrzegorczyk Jun 18, 2011

Contributor

I made a patch improving performance when indexing String object. More info in pull request #1027.

31MB file times:

  • ruby 1.9.2-p180:
    ruby test.rb 8,96s user 0,06s system 99% cpu 9,028 total
  • ruby 1.8.7-p334:
    ruby test.rb 197,54s user 0,52s system 99% cpu 3:18,26 total
  • rubinius 1.2.4dev (1.8.7 5cb7a7d yyyy-mm-dd JI):
    ruby test.rb 168,26s user 0,60s system 100% cpu 2:48,36 total
  • rubinius-patched:
    ruby test.rb 116,09s user 0,40s system 100% cpu 1:55,85 total
Contributor

pgrzegorczyk commented Jun 18, 2011

I made a patch improving performance when indexing String object. More info in pull request #1027.

31MB file times:

  • ruby 1.9.2-p180:
    ruby test.rb 8,96s user 0,06s system 99% cpu 9,028 total
  • ruby 1.8.7-p334:
    ruby test.rb 197,54s user 0,52s system 99% cpu 3:18,26 total
  • rubinius 1.2.4dev (1.8.7 5cb7a7d yyyy-mm-dd JI):
    ruby test.rb 168,26s user 0,60s system 100% cpu 2:48,36 total
  • rubinius-patched:
    ruby test.rb 116,09s user 0,40s system 100% cpu 1:55,85 total
@hosiawak

This comment has been minimized.

Show comment Hide comment
@hosiawak

hosiawak Jun 19, 2011

Member

83341ab and e1b6562 improve this benchmark's perf by about 50% on my machine, please confirm. Thanks.

Member

hosiawak commented Jun 19, 2011

83341ab and e1b6562 improve this benchmark's perf by about 50% on my machine, please confirm. Thanks.

@aubergene

This comment has been minimized.

Show comment Hide comment
@aubergene

aubergene Jun 20, 2011

Forgive my ignorance, I looked around for how to apply this patch to Rubinius but wasn't sure what I should do or how to do it? I'm happy to follow instructions.

Also I updated the test so it will include FasterCsv for 1.8 and CSV in 1.9 as CSV is very slow in 1.8 anyway.

Forgive my ignorance, I looked around for how to apply this patch to Rubinius but wasn't sure what I should do or how to do it? I'm happy to follow instructions.

Also I updated the test so it will include FasterCsv for 1.8 and CSV in 1.9 as CSV is very slow in 1.8 anyway.

@hosiawak

This comment has been minimized.

Show comment Hide comment
@hosiawak

hosiawak Jun 20, 2011

Member

If you're using rvm:

rvm install rbx-head

If not get the latest head with:

git clone https://github.com/rubinius/rubinius.git
cd rubinius
rake

to build it, then ./bin/rbx path_to_benchmark.rb to run it.

Member

hosiawak commented Jun 20, 2011

If you're using rvm:

rvm install rbx-head

If not get the latest head with:

git clone https://github.com/rubinius/rubinius.git
cd rubinius
rake

to build it, then ./bin/rbx path_to_benchmark.rb to run it.

@aubergene

This comment has been minimized.

Show comment Hide comment
@aubergene

aubergene Jun 22, 2011

# rubinius 1.2.4dev (1.8.7 7ae451a1 yyyy-mm-dd JI) [x86_64-apple-darwin10.7.0]
real    1m39.720s
user    1m39.645s
sys 0m0.506s
# rubinius 1.2.4dev (1.8.7 7ae451a1 yyyy-mm-dd JI) [x86_64-apple-darwin10.7.0]
real    1m39.720s
user    1m39.645s
sys 0m0.506s
@hosiawak

This comment has been minimized.

Show comment Hide comment
@hosiawak

hosiawak Jun 23, 2011

Member

aubergene, you're using a build from June 8 7ae451a . Please follow the documentation to build the latest master and try that. You should see 50% speedup (about 40s). still slower than mri but we're getting there.

Member

hosiawak commented Jun 23, 2011

aubergene, you're using a build from June 8 7ae451a . Please follow the documentation to build the latest master and try that. You should see 50% speedup (about 40s). still slower than mri but we're getting there.

@aubergene

This comment has been minimized.

Show comment Hide comment
@aubergene

aubergene Jun 23, 2011

Thanks hosiawak, I ran it twice and got these results, I got results about 30% faster which is still good.

rubinius 1.2.4dev (1.8.7 83b015818124294a769d9ffb39736d9c52fb00b9 yyyy-mm-dd JI) [x86_64-apple-darwin10.7.0]
real    1m10.265s
user    1m11.112s
sys 0m0.617s

Thanks hosiawak, I ran it twice and got these results, I got results about 30% faster which is still good.

rubinius 1.2.4dev (1.8.7 83b015818124294a769d9ffb39736d9c52fb00b9 yyyy-mm-dd JI) [x86_64-apple-darwin10.7.0]
real    1m10.265s
user    1m11.112s
sys 0m0.617s
@frodsan

This comment has been minimized.

Show comment Hide comment
@frodsan

frodsan Nov 17, 2012

Contributor

@aubergene Is this still an issue?

Contributor

frodsan commented Nov 17, 2012

@aubergene Is this still an issue?

@aubergene

This comment has been minimized.

Show comment Hide comment
@aubergene

aubergene Nov 19, 2012

@frodsan just ran the tests again, different machine, but seems like it's still similar to the previous runs, RBX takes approximate twice the time.

ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.3.0]
real    0m13.149s
user    0m12.780s
sys 0m0.087s
rubinius 2.0.0rc1 (1.8.7 75596a44 2012-11-02 JI) [x86_64-apple-darwin11.4.2]
real    0m37.987s
user    0m38.263s
sys 0m0.269s

@frodsan just ran the tests again, different machine, but seems like it's still similar to the previous runs, RBX takes approximate twice the time.

ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.3.0]
real    0m13.149s
user    0m12.780s
sys 0m0.087s
rubinius 2.0.0rc1 (1.8.7 75596a44 2012-11-02 JI) [x86_64-apple-darwin11.4.2]
real    0m37.987s
user    0m38.263s
sys 0m0.269s
@MSNexploder

This comment has been minimized.

Show comment Hide comment
@MSNexploder

MSNexploder Apr 16, 2013

Looks like this is still an issue.

rubinius 2.0.0.rc1 (1.9.3 96db2d8e yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
33,09s user 0,17s system 106% cpu 31,336 total

rubinius 2.0.0.rc1 (2.0.0 96db2d8e yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
32,87s user 0,16s system 105% cpu 31,283 total

ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]
10,42s user 0,04s system 99% cpu 10,462 total

ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.2.0]
10,09s user 0,04s system 99% cpu 10,147 total

I did some simple profiling: https://gist.github.com/MSNexploder/5398947
Around 40% of the time is spent in Rubinius::Splitter.split

Looks like this is still an issue.

rubinius 2.0.0.rc1 (1.9.3 96db2d8e yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
33,09s user 0,17s system 106% cpu 31,336 total

rubinius 2.0.0.rc1 (2.0.0 96db2d8e yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
32,87s user 0,16s system 105% cpu 31,283 total

ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]
10,42s user 0,04s system 99% cpu 10,462 total

ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.2.0]
10,09s user 0,04s system 99% cpu 10,147 total

I did some simple profiling: https://gist.github.com/MSNexploder/5398947
Around 40% of the time is spent in Rubinius::Splitter.split

@leocassarani

This comment has been minimized.

Show comment Hide comment
@leocassarani

leocassarani Apr 17, 2013

Member

01593ae and 04ca45e should have improved this by making Rubinius::Splitter.split considerably faster in the specific cases that the CSV library is using.

Running a test similar to @aubergene's shows a distinct improvement:

rubinius 2.0.0.rc1 (1.9.3 04ca45e0 yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
13.40s user 0.13s system 109% cpu 12.353 total

ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]
8.69s user 0.12s system 99% cpu 8.819 total

@MSNexploder could you try running your benchmarks against master and report back?

Member

leocassarani commented Apr 17, 2013

01593ae and 04ca45e should have improved this by making Rubinius::Splitter.split considerably faster in the specific cases that the CSV library is using.

Running a test similar to @aubergene's shows a distinct improvement:

rubinius 2.0.0.rc1 (1.9.3 04ca45e0 yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
13.40s user 0.13s system 109% cpu 12.353 total

ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]
8.69s user 0.12s system 99% cpu 8.819 total

@MSNexploder could you try running your benchmarks against master and report back?

@MSNexploder

This comment has been minimized.

Show comment Hide comment
@MSNexploder

MSNexploder Apr 17, 2013

Way better 👏

rubinius 2.0.0.rc1 (1.9.3 04ca45e0 yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
17,11s user 0,11s system 108% cpu 15,818 total

rubinius 2.0.0.rc1 (2.0.0 04ca45e0 yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
17,27s user 0,11s system 108% cpu 15,978 total

ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]
10,25s user 0,04s system 99% cpu 10,293 total

ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.2.0]
10,21s user 0,04s system 99% cpu 10,265 total

I also updated my profiling gist: https://gist.github.com/MSNexploder/5398947
Looks like IO performance is the next bottleneck - Similar to #2166

Way better 👏

rubinius 2.0.0.rc1 (1.9.3 04ca45e0 yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
17,11s user 0,11s system 108% cpu 15,818 total

rubinius 2.0.0.rc1 (2.0.0 04ca45e0 yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]
17,27s user 0,11s system 108% cpu 15,978 total

ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]
10,25s user 0,04s system 99% cpu 10,293 total

ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.2.0]
10,21s user 0,04s system 99% cpu 10,265 total

I also updated my profiling gist: https://gist.github.com/MSNexploder/5398947
Looks like IO performance is the next bottleneck - Similar to #2166

@ghost

This comment has been minimized.

Show comment Hide comment
@ghost

ghost Apr 17, 2013

Wow, what an improvement from two years ago. Great work!

ghost commented Apr 17, 2013

Wow, what an improvement from two years ago. Great work!

@YorickPeterse

This comment has been minimized.

Show comment Hide comment
@YorickPeterse

YorickPeterse Oct 28, 2013

Member

Performance update:

Consider the following script:

require 'csv'
require 'benchmark'

timings = []
amount  = 5

amount.times do
  timing = Benchmark.measure { CSV.foreach('/tmp/rubinius_985.csv') { |row| } }
  timings << timing.real
end

min = timings.min.round(3)
max = timings.max.round(3)
avg = (timings.inject(:+) / timings.length).round(3)

puts "Min: #{min} sec"
puts "Max: #{max} sec"
puts "Avg: #{avg} sec"

And the following 15MB input file: http://downloads.yorickpeterse.com/files/rubinius_985.csv (MD5 is 05fc37e8026a09cea0519377dcfdcf58 for the paranoid).

Running the above script using time ruby /tmp/test.rb yields the following results on MRI 2.0 p247:

Min: 3.12 sec
Max: 3.144 sec
Avg: 3.13 sec

real    0m15.718s
user    0m15.680s
sys     0m0.043s

And on Rbx 2.1:

Min: 5.734 sec
Max: 6.808 sec
Avg: 6.08 sec

real    0m33.564s
user    0m33.123s
sys     0m0.307s

So it's no longer 8 times as slow, only a "mere" 2 times.

Member

YorickPeterse commented Oct 28, 2013

Performance update:

Consider the following script:

require 'csv'
require 'benchmark'

timings = []
amount  = 5

amount.times do
  timing = Benchmark.measure { CSV.foreach('/tmp/rubinius_985.csv') { |row| } }
  timings << timing.real
end

min = timings.min.round(3)
max = timings.max.round(3)
avg = (timings.inject(:+) / timings.length).round(3)

puts "Min: #{min} sec"
puts "Max: #{max} sec"
puts "Avg: #{avg} sec"

And the following 15MB input file: http://downloads.yorickpeterse.com/files/rubinius_985.csv (MD5 is 05fc37e8026a09cea0519377dcfdcf58 for the paranoid).

Running the above script using time ruby /tmp/test.rb yields the following results on MRI 2.0 p247:

Min: 3.12 sec
Max: 3.144 sec
Avg: 3.13 sec

real    0m15.718s
user    0m15.680s
sys     0m0.043s

And on Rbx 2.1:

Min: 5.734 sec
Max: 6.808 sec
Avg: 6.08 sec

real    0m33.564s
user    0m33.123s
sys     0m0.307s

So it's no longer 8 times as slow, only a "mere" 2 times.

@jc00ke

This comment has been minimized.

Show comment Hide comment
@jc00ke

jc00ke May 16, 2014

Member

@YorickPeterse your CSV is 404'ing so I uploaded a new one here.

MRI 2.1.2p95

$> time ruby csv.rb sample.csv
Min: 2.55 sec
Max: 2.585 sec
Avg: 2.57 sec
ruby csv.rb sample.csv  12.84s user 0.05s system 99% cpu 12.903 total

rubinius 2.2.6.n135 (2.1.0 06d6f25c 2014-05-15 JI) [x86_64-linux-gnu]

$> time ./bin/rbx csv.rb sample.csv
Min: 3.791 sec
Max: 4.758 sec
Avg: 4.007 sec
./bin/rbx csv.rb sample.csv  22.36s user 0.24s system 110% cpu 20.539 total

1.6x slower nowadays.

What's our goal for this issue? Match or beat MRI?

Member

jc00ke commented May 16, 2014

@YorickPeterse your CSV is 404'ing so I uploaded a new one here.

MRI 2.1.2p95

$> time ruby csv.rb sample.csv
Min: 2.55 sec
Max: 2.585 sec
Avg: 2.57 sec
ruby csv.rb sample.csv  12.84s user 0.05s system 99% cpu 12.903 total

rubinius 2.2.6.n135 (2.1.0 06d6f25c 2014-05-15 JI) [x86_64-linux-gnu]

$> time ./bin/rbx csv.rb sample.csv
Min: 3.791 sec
Max: 4.758 sec
Avg: 4.007 sec
./bin/rbx csv.rb sample.csv  22.36s user 0.24s system 110% cpu 20.539 total

1.6x slower nowadays.

What's our goal for this issue? Match or beat MRI?

@tmornini

This comment has been minimized.

Show comment Hide comment
@tmornini

tmornini May 18, 2014

Contributor

@jc00ke I've always appreciated JRuby's performance stance: if it's slower than MRI in any way (other than startup time), it's a bug.

There was a time when this didn't make sense for Rubinius, because just about everything was far slower (as clearly seen above). Perhaps it's time for Rubinius to adopt this policy?

Contributor

tmornini commented May 18, 2014

@jc00ke I've always appreciated JRuby's performance stance: if it's slower than MRI in any way (other than startup time), it's a bug.

There was a time when this didn't make sense for Rubinius, because just about everything was far slower (as clearly seen above). Perhaps it's time for Rubinius to adopt this policy?

@jc00ke

This comment has been minimized.

Show comment Hide comment
@jc00ke

jc00ke May 18, 2014

Member

@tmornini That would work for me. A policy of sorts would be helpful.

Member

jc00ke commented May 18, 2014

@tmornini That would work for me. A policy of sorts would be helpful.

@codeforkjeff

This comment has been minimized.

Show comment Hide comment
@codeforkjeff

codeforkjeff Jan 17, 2015

I'm seeing csv in MRI as 6x's faster than Rubinius.

This is my test script:

# cat read_csv.rb

require 'csv'

t_start = Time.now.to_f

count = 0
# 10,000 row file, ~4.5MB
# available at http://codefork.com/schoenberg_small.csv
CSV.foreach("schoenberg_small.csv") do |row|
  count += 1
end

elapsed = Time.now.to_f - t_start
rate = count / elapsed

puts "Finished in #{elapsed}s, at rate of #{rate} rows/sec"

Here's the output:

# ruby -v
ruby 2.1.5p273 (2014-11-13) [x86_64-linux-gnu]

# ruby read_csv.rb
Finished in 1.3033735752105713s, at rate of 7572.656211328679 rows/sec

# rbx -v
rubinius 2.4.1 (2.1.0 b332f133 2014-12-04 3.5.0 JI) [x86_64-linux-gnu]

# rbx read_csv.rb
Finished in 7.770043134689331s, at rate of 1270.2632184801419 rows/sec

Here's the profiler output: https://gist.github.com/codeforkjeff/c9fef79b5b091a911018

String#count is taking 36% of the total time, which seems high?

I'm seeing csv in MRI as 6x's faster than Rubinius.

This is my test script:

# cat read_csv.rb

require 'csv'

t_start = Time.now.to_f

count = 0
# 10,000 row file, ~4.5MB
# available at http://codefork.com/schoenberg_small.csv
CSV.foreach("schoenberg_small.csv") do |row|
  count += 1
end

elapsed = Time.now.to_f - t_start
rate = count / elapsed

puts "Finished in #{elapsed}s, at rate of #{rate} rows/sec"

Here's the output:

# ruby -v
ruby 2.1.5p273 (2014-11-13) [x86_64-linux-gnu]

# ruby read_csv.rb
Finished in 1.3033735752105713s, at rate of 7572.656211328679 rows/sec

# rbx -v
rubinius 2.4.1 (2.1.0 b332f133 2014-12-04 3.5.0 JI) [x86_64-linux-gnu]

# rbx read_csv.rb
Finished in 7.770043134689331s, at rate of 1270.2632184801419 rows/sec

Here's the profiler output: https://gist.github.com/codeforkjeff/c9fef79b5b091a911018

String#count is taking 36% of the total time, which seems high?

@chuckremes

This comment has been minimized.

Show comment Hide comment
@chuckremes

chuckremes Jan 17, 2015

Member

Might be an easy opportunity for some performance tuning. Want to try?

On Jan 16, 2015, at 9:09 PM, Jeff Chiu notifications@github.com wrote:

I'm seeing csv in MRI as 6x's faster than Rubinius.

This is my test script:

cat read_csv.rb

require 'csv'

t_start = Time.now.to_f

count = 0

10,000 row file, ~4.5MB

available at http://codefork.com/schoenberg_small.csv

CSV.foreach("schoenberg_small.csv") do |row|
count += 1
end

elapsed = Time.now.to_f - t_start
rate = count / elapsed

puts "Finished in #{elapsed}s, at rate of #{rate} rows/sec"
Here's the output:

ruby -v

ruby 2.1.5p273 (2014-11-13) [x86_64-linux-gnu]

ruby read_csv.rb

Finished in 1.3033735752105713s, at rate of 7572.656211328679 rows/sec

rbx -v

rubinius 2.4.1 (2.1.0 b332f13 2014-12-04 3.5.0 JI) [x86_64-linux-gnu]

rbx read_csv.rb

Finished in 7.770043134689331s, at rate of 1270.2632184801419 rows/sec
Here's the profiler output: https://gist.github.com/codeforkjeff/c9fef79b5b091a911018 https://gist.github.com/codeforkjeff/c9fef79b5b091a911018
String#count is taking 36% of the total time, which seems high?


Reply to this email directly or view it on GitHub #985 (comment).

Member

chuckremes commented Jan 17, 2015

Might be an easy opportunity for some performance tuning. Want to try?

On Jan 16, 2015, at 9:09 PM, Jeff Chiu notifications@github.com wrote:

I'm seeing csv in MRI as 6x's faster than Rubinius.

This is my test script:

cat read_csv.rb

require 'csv'

t_start = Time.now.to_f

count = 0

10,000 row file, ~4.5MB

available at http://codefork.com/schoenberg_small.csv

CSV.foreach("schoenberg_small.csv") do |row|
count += 1
end

elapsed = Time.now.to_f - t_start
rate = count / elapsed

puts "Finished in #{elapsed}s, at rate of #{rate} rows/sec"
Here's the output:

ruby -v

ruby 2.1.5p273 (2014-11-13) [x86_64-linux-gnu]

ruby read_csv.rb

Finished in 1.3033735752105713s, at rate of 7572.656211328679 rows/sec

rbx -v

rubinius 2.4.1 (2.1.0 b332f13 2014-12-04 3.5.0 JI) [x86_64-linux-gnu]

rbx read_csv.rb

Finished in 7.770043134689331s, at rate of 1270.2632184801419 rows/sec
Here's the profiler output: https://gist.github.com/codeforkjeff/c9fef79b5b091a911018 https://gist.github.com/codeforkjeff/c9fef79b5b091a911018
String#count is taking 36% of the total time, which seems high?


Reply to this email directly or view it on GitHub #985 (comment).

@aubergene aubergene closed this Nov 19, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment