Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Axlsx performance #61

Closed
jurriaan opened this Issue Mar 23, 2012 · 18 comments

Comments

Projects
None yet
4 participants
Collaborator

jurriaan commented Mar 23, 2012

Just did a quick benchmark to see how well axlsx performs..
Plain csv is 40 times faster than standard axlsx.. I think there's room for improvement ;)
Streaming is also faster than Package#serialize?

Results:

                           user     system      total        real
axlsx_noautowidth          7.730000   0.110000   7.840000 (  8.188405)
axlsx                      12.640000   0.260000  12.900000 ( 13.422525)
axlsx_shared               38.700000   0.300000  39.000000 ( 39.085976)
axlsx_stream               11.380000   0.180000  11.560000 ( 11.591120)
csv                        0.270000   0.020000   0.290000 (  0.336972)

Code:

#!/usr/bin/env ruby -s
# -*- coding: utf-8 -*-
$:.unshift "#{File.dirname(__FILE__)}/../lib"
require 'axlsx'
require 'csv'

require 'benchmark'
row = []
input = (32..126).to_a.pack('U*').chars.to_a
20.times { row << input.shuffle.join}
times = 1000
Benchmark.bm(100) do |x|
  x.report('axlsx_noautowidth') {
    p = Axlsx::Package.new
    wb = p.workbook

    #A Simple Workbook

    wb.add_worksheet do |sheet|
      times.times do 
        sheet << row
      end
    end
    p.use_autowidth = false
    p.serialize("example.xlsx")
  }
  x.report('axlsx') {
    p = Axlsx::Package.new
    wb = p.workbook

    #A Simple Workbook

    wb.add_worksheet do |sheet|
      times.times do 
        sheet << row
      end
    end
    p.serialize("example.xlsx")
  }

  x.report('axlsx_shared') {
    p = Axlsx::Package.new
    wb = p.workbook

    #A Simple Workbook

    wb.add_worksheet do |sheet|
      times.times do 
        sheet << row
      end
    end
    p.use_shared_strings = true
    p.serialize("example.xlsx")
  }

  x.report('axlsx_stream') {
    p = Axlsx::Package.new
    wb = p.workbook

    #A Simple Workbook

    wb.add_worksheet do |sheet|
      times.times do
        sheet << row
      end
    end

    s = p.to_stream()
    File.open('example_streamed.xlsx', 'w') { |f| f.write(s.read) }
  }
  x.report('csv') {
    CSV.open("example.csv", "wb") do |csv|
      times.times do 
        csv << row
      end
    end
  }
end

@ghost ghost assigned randym Mar 27, 2012

Owner

randym commented Mar 27, 2012

Fast enough?

                              user     system      total        real
axlsx_noautowidth         1.560000   0.030000   1.590000 (  1.717595)
axlsx                     4.360000   0.140000   4.500000 (  5.748329)
axlsx_shared              6.880000   0.160000   7.040000 (  9.325648)
axlsx_stream              4.320000   0.120000   4.440000 (  5.642124)
csv                       0.240000   0.010000   0.250000 (  0.301004)
Collaborator

jurriaan commented Mar 27, 2012

It's much better! ;) I don't understand why stream is faster every time, but the results are too close to say streaming is faster right now ;).
If I've time I'll look how I can improve the performance. 18 times slower than csv is still not very impressive ;)
Maybe it's possible to reduce the time spend on auto width.. I'll take a look at the perftools.rb output :)

Owner

randym commented Mar 27, 2012

Comparing OOXML to CSV is a bit unrealistic I think. They are simply different domains.
I wonder how this looks against something like odf-reports?

That said - If we can get this running something close to "ten times slower" than something as trivial as CSV - well it would be one hell of an accomplishment.

Collaborator

jurriaan commented Mar 27, 2012

That's true, comparing to CSV is somewhat unrealistic.
But, waiting 6 seconds for generating a 1000 row xlsx is still too much imho. ;)
We went from 40x slower to 18x slower. It should be possible to get close to 10x slower.
I think Excel is much faster.

We should also add more XLS(X)/ODF report generators to the benchmark for a better comparison.

Collaborator

jurriaan commented Mar 27, 2012

@randym See commit b6843a2 for some cleanup I did.

Owner

randym commented Mar 28, 2012

After re-writing autowidth to implement TC_COL properly:

                  user     system      total        real
axlsx_noautowidth 0.810000   0.020000   0.830000 (  0.836274)
axlsx             1.430000   0.160000   1.590000 (  1.776305)
axlsx_shared      9.360000   0.160000   9.520000 (  9.662113)
axlsx_stream      1.320000   0.110000   1.430000 (  1.429806)
csv               0.260000   0.020000   0.280000 (  0.296828)

However - this means if you apply styles after creating rows (e.g. cell.sz = 100) the width will not automatically update.

I'd say 5 times slower than CSV is pretty damn fast!
Should we close this for now? Or do we want to keep trying to make this faster?

Collaborator

jurriaan commented Mar 28, 2012

It's ok to close this, but on my system axlsx is still ~15 times slower

@joekain joekain added a commit to joekain/axlsx that referenced this issue Mar 31, 2012

@joekain joekain Build self_hash up from INLINE_STYLES
Iterate over each value in INLINE_STYLES instead of iterating over
each value in instances_values and rejecting unwanted items.  This
version proceses fewer values and runs a little faster.

Issue #61 - Axlsx performance
8190b14
Collaborator

joekain commented Apr 1, 2012

With the change in Pull request #72 I see the following improvement in performance:

master:
                           user     system      total        real
axlsx_noautowidth      1.250000   0.110000   1.360000 (  1.355938)
axlsx                  1.260000   0.120000   1.380000 (  1.383181)
axlsx_shared           9.700000   0.190000   9.890000 (  9.890973)
axlsx_stream           1.080000   0.110000   1.190000 (  1.192197)
csv                    0.200000   0.010000   0.210000 (  0.217332)

                       user     system      total        real
axlsx_noautowidth      1.260000   0.120000   1.380000 (  1.374209)
axlsx                  1.290000   0.120000   1.410000 (  1.422443)
axlsx_shared           9.730000   0.190000   9.920000 (  9.923795)
axlsx_stream           1.080000   0.120000   1.200000 (  1.205190)
csv                    0.200000   0.010000   0.210000 (  0.206190)

                           user     system      total        real
axlsx_noautowidth      1.260000   0.130000   1.390000 (  1.390412)
axlsx                  1.270000   0.120000   1.390000 (  1.391234)
axlsx_shared           9.800000   0.200000  10.000000 (  9.998483)
axlsx_stream           1.090000   0.110000   1.200000 (  1.204623)
csv                    0.180000   0.010000   0.190000 (  0.191816)



Pull request #72
                           user     system      total        real
axlsx_noautowidth      1.250000   0.110000   1.360000 (  1.368082)
axlsx                  1.290000   0.120000   1.410000 (  1.406685)
axlsx_shared           8.480000   0.180000   8.660000 (  8.668626)
axlsx_stream           1.120000   0.110000   1.230000 (  1.238980)
csv                    0.190000   0.010000   0.200000 (  0.195624)

                           user     system      total        real
axlsx_noautowidth      1.250000   0.110000   1.360000 (  1.365485)
axlsx                  1.280000   0.120000   1.400000 (  1.391993)
axlsx_shared           8.550000   0.200000   8.750000 (  8.763224)
axlsx_stream           1.090000   0.120000   1.210000 (  1.206772)
csv                    0.190000   0.010000   0.200000 (  0.208227)

                           user     system      total        real
axlsx_noautowidth      1.310000   0.130000   1.440000 (  1.443642)
axlsx                  1.300000   0.120000   1.420000 (  1.423780)
axlsx_shared           8.490000   0.190000   8.680000 (  8.680640)
axlsx_stream           1.170000   0.120000   1.290000 (  1.295424)
csv                    0.190000   0.010000   0.200000 (  0.196803)
Owner

randym commented Apr 1, 2012

Ill be pushing a large change set today to bring support for JRuby online and a few other performance patches as well in preparation for the release tonight.

Owner

randym commented Apr 1, 2012

Where we stand for 1.1.0

                          user     system      total        real
axlsx_noautowidth     1.160000   0.030000   1.190000 (  1.202237)
axlsx                 1.710000   0.150000   1.860000 (  2.298028)
axlsx_shared          2.650000   0.150000   2.800000 (  2.864474)
axlsx_stream          1.650000   0.140000   1.790000 (  1.848033)
csv                   0.270000   0.020000   0.290000 (  0.291206)

@randym randym closed this Apr 1, 2012

daveed commented Apr 3, 2012

Hi,

Is the support for JRuby pushed yet? Released version is still 1.0.18 and not 1.1.0...
Cheers.

Owner

randym commented Apr 3, 2012

@mazhout sorry mate - lots of firefighting at the day job since last week.
Jruby (And Rubinius) support is up and active on master, I just have not released an official gem to rubygems yet.

Time to get that done!

Owner

randym commented Apr 3, 2012

releasing 1.1.0 now

Collaborator

jurriaan commented Apr 3, 2012

👍

daveed commented Apr 3, 2012

Cheers mate!

So basically if I just jruby -S gem install axlsx it should just work?

Coz I have this error because of RMagick with C dependencies... :(

Any help would be awesome :)

Thanks,

  • David

On Apr 3, 2012, at 8:14 PM, Jurriaan Pruisreply@reply.github.com wrote:

👍


Reply to this email directly or view it on GitHub:
#61 (comment)

Owner

randym commented Apr 3, 2012

@mazhout

Can you give us the error info along with your system details?
I would guess without any information - that you need to install Image Magic or some other dependency.

daveed commented Apr 10, 2012

@randym

System details:

mazhout:sb/ (master✗) $ uname -a 
Linux mazhout 2.6.32-21-generic #32-Ubuntu SMP Fri Apr 16 08:10:02 UTC 2010 i686 GNU/Linux

mazhout:sb/ (master✗) $ jruby -v
jruby 1.6.7 (ruby-1.8.7-p357) (2012-02-22 3e82bc8) (OpenJDK Server VM 1.6.0_20) [linux-i386-java]

mazhout:sb/ (master✗) $ jruby -S gem list rmagick

*** LOCAL GEMS ***

rmagick (2.13.1)
rmagick4j (0.3.7)

Image Magic is installed. but I still have the following error using the console:

>> require 'axlsx'
=> []
>> 
?> p = Axlsx::Package.new
=> #<Axlsx::Package:0x1d5c103 @app=#<Axlsx::App:0xd95e69>, @workbook=nil, @core=#<Axlsx::Core:0xc569fd @creator="axlsx">>
>> wb = p.workbook
=> #<Axlsx::Workbook:0x895972 @charts=#<Axlsx::SimpleTypedList:0x14f840f @locked_at=nil, @list=[], @allowed_types=[Axlsx::Chart], @serialize_as=nil>, @tables=#<Axlsx::SimpleTypedList:0x890a59 @locked_at=nil, @list=[], @allowed_types=[Axlsx::Table], @serialize_as=nil>, @worksheets=#<Axlsx::SimpleTypedList:0x65d075 @locked_at=nil, @list=[], @allowed_types=[Axlsx::Worksheet], @serialize_as=nil>, @images=#<Axlsx::SimpleTypedList:0x1edf119 @locked_at=nil, @list=[], @allowed_types=[Axlsx::Pic], @serialize_as=nil>, @styles=#<Axlsx::Styles:0x1aaa07f @fills=#<Axlsx::SimpleTypedList:0x10f0424 @locked_at=2, @list=[#<Axlsx::Fill:0x1338621 @fill_type=#<Axlsx::PatternFill:0x1ab0c3a @patternType=:none>>, #<Axlsx::Fill:0xafc935 @fill_type=#<Axlsx::PatternFill:0xc2c549 @patternType=:gray125>>], @allowed_types=[Axlsx::Fill], @serialize_as="fills">, @tableStyles=#<Axlsx::TableStyles:0xc12ae3 @locked_at=0, @defaultTableStyle="TableStyleMedium9", @list=[], @serialize_as=nil, @allowed_types=[Axlsx::TableStyle], @defaultPivotStyle="PivotStyleLight16">, @cellStyles=#<Axlsx::SimpleTypedList:0x869739 @locked_at=1, @list=[#<Axlsx::CellStyle:0x14fa707 @builtinId=0, @name="Normal", @xfId=0>], @allowed_types=[Axlsx::CellStyle], @serialize_as="cellStyles">, @cellStyleXfs=#<Axlsx::SimpleTypedList:0x1c88831 @locked_at=1, @list=[#<Axlsx::Xf:0x18b628 @fontId=0, @fillId=0, @numFmtId=0, @borderId=0>], @allowed_types=[Axlsx::Xf], @serialize_as="cellStyleXfs">, @fonts=#<Axlsx::SimpleTypedList:0x1a9dbac @locked_at=1, @list=[#<Axlsx::Font:0x9e31f0 @name="Arial", @family=1, @sz=11>], @allowed_types=[Axlsx::Font], @serialize_as="fonts">, @numFmts=#<Axlsx::SimpleTypedList:0x275a2e @locked_at=2, @list=[#<Axlsx::NumFmt:0x15f0269 @numFmtId=100, @formatCode="yyyy/mm/dd">, #<Axlsx::NumFmt:0x976434 @numFmtId=101, @formatCode="yyyy/mm/dd hh:mm:ss">], @allowed_types=[Axlsx::NumFmt], @serialize_as="numFmts">, @cellXfs=#<Axlsx::SimpleTypedList:0x2fbdd3 @locked_at=3, @list=[#<Axlsx::Xf:0x15a95c3 @fontId=0, @fillId=0, @xfId=0, @numFmtId=0, @borderId=0>, #<Axlsx::Xf:0x1e54ad @fontId=0, @fillId=0, @xfId=0, @numFmtId=0, @borderId=1>, #<Axlsx::Xf:0x1492cfe @fontId=0, @applyNumberFormat=1, @fillId=0, @xfId=0, @numFmtId=14, @borderId=0>], @allowed_types=[Axlsx::Xf], @serialize_as="cellXfs">, @borders=#<Axlsx::SimpleTypedList:0xe489c6 @locked_at=2, @list=[#<Axlsx::Border:0x4e2837 @prs=#<Axlsx::SimpleTypedList:0x1326f8f @locked_at=nil, @list=[], @allowed_types=[Axlsx::BorderPr], @serialize_as=nil>>, #<Axlsx::Border:0x1fa6c48 @prs=#<Axlsx::SimpleTypedList:0x1daed02 @locked_at=nil, @list=[#<Axlsx::BorderPr:0x7fafa4 @style=:thin, @color=#<Axlsx::Color:0xe43d5e @rgb="FF000000">, @name=:left>, #<Axlsx::BorderPr:0x7909a7 @style=:thin, @color=#<Axlsx::Color:0x68e679 @rgb="FF000000">, @name=:right>, #<Axlsx::BorderPr:0xf3360c @style=:thin, @color=#<Axlsx::Color:0x1615685 @rgb="FF000000">, @name=:top>, #<Axlsx::BorderPr:0x1775e79 @style=:thin, @color=#<Axlsx::Color:0x1db9b4f @rgb="FF000000">, @name=:bottom>], @allowed_types=[Axlsx::BorderPr], @serialize_as=nil>>], @allowed_types=[Axlsx::Border], @serialize_as="borders">, @dxfs=#<Axlsx::SimpleTypedList:0x68d91 @locked_at=0, @list=[], @allowed_types=[Axlsx::Xf], @serialize_as="dxfs">>, @use_autowidth=true, @drawings=#<Axlsx::SimpleTypedList:0x73c8a9 @locked_at=nil, @list=[], @allowed_types=[Axlsx::Drawing], @serialize_as=nil>>
>> wb.add_worksheet(:name => "Basic Worksheet") do |sheet|
?>   sheet.add_row ["First Column", "Second", "Third"]
>>   sheet.add_row [1, 2, 3]
>> end
LoadError: load error: RMagick2 -- java.lang.UnsatisfiedLinkError: /usr/lib/jruby/lib/ruby/gems/1.8/gems/rmagick-2.13.1/lib/RMagick2.so: undefined symbol: rb_frame_last_func
from org/jruby/RubyKernel.java:1033:in `require'
from /usr/lib/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:182:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:547:in `new_constants_in'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:546:in `new_constants_in'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:182:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/rmagick-2.13.1/lib/RMagick.rb:11:in `(root)'
from org/jruby/RubyKernel.java:1033:in `require'
from /usr/lib/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:182:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:547:in `new_constants_in'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:546:in `new_constants_in'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:182:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/rmagick-2.13.1/lib/RMagick.rb:109:in `initialize'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/axlsx-1.1.0/lib/axlsx/workbook/workbook.rb:156:in `add_worksheet'
from (irb):5:in `evaluate'
from org/jruby/RubyKernel.java:1083:in `eval'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:158:in `eval_input'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:271:in `signal_status'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:155:in `eval_input'
from org/jruby/RubyKernel.java:1410:in `loop'
from org/jruby/RubyKernel.java:1183:in `catch'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:154:in `eval_input'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:71:in `start'
from org/jruby/RubyKernel.java:1183:in `catch'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:70:in `start'
from /usr/lib/jruby/bin/jirb:13:in `(root)'>> 
Owner

randym commented Apr 10, 2012

moved to: #78

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment