Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

mruby is bits slow #980

Open
mattn opened this Issue Mar 12, 2013 · 42 comments

Comments

Projects
None yet
Contributor

mattn commented Mar 12, 2013

http://www.quredo.net/2012/04/mruby-%E8%BB%BD%E9%87%8Fruby-%E3%82%BD%E3%83%BC%E3%82%B9%E5%85%AC%E9%96%8B/ Sorry, in japanese

http://hnakamur.github.com/blog/2012/05/04/fib-39-benchmark-in-lua/

I get benchmark result in few minutes ago on Windwos XP 3GHz

def fib n
  return n if n < 2
  fib(n-2) + fib(n-1)
end

puts fib(39)
$ w32time ruby fib.rb
63245986
real    22.799
system  0.046
user    17.515

$ w32time mruby fib.rb
63245986
real    61.330
system  0.000
user    60.000

My mruby is latest in today. cruby is ruby 2.0.0p0 (2013-02-24) [i386-mingw32]

Contributor

mattn commented Mar 12, 2013

Below is a gprof result.

https://gist.github.com/mattn/5139544

root@porteus:~/mruby_test/mruby# uname -a
Linux porteus 3.7.5-porteus #1 SMP PREEMPT Mon Jan 28 12:02:20 UTC 2013 i686 AMD C-60 APU with Radeon(tm) HD Graphics AuthenticAMD GNU/Linux

root@porteus:~/mruby_test/mruby# time mruby ~/fib.rb
63245986

real 2m53.117s
user 2m52.396s
sys 0m0.122s
root@porteus:~/mruby_test/mruby# time ruby ~/fib.rb
63245986

real 1m22.941s
user 1m20.038s
sys 0m0.751s

Member

bovi commented Mar 14, 2013

I didn't do any big tests yet but do you use the same optimisation options in GCC for Ruby and mruby? The default settings for mruby are development friendly but for performance tests the debugging feature should probably be disabled.

Member

bovi commented Mar 14, 2013

Funny. Actually when I remove the debugging symbols I get slightly worse performance. So just ignore my last post.

With -g

vagrant@mruby:~/mruby$ time ./bin/mruby fib.rb 
63245986

real    0m49.381s
user    0m47.367s
sys 0m1.500s

Without -g

vagrant@mruby:~/mruby$ time ./bin/mruby fib.rb 
63245986

real    0m54.505s
user    0m52.603s
sys 0m1.380s
Owner

matz commented Mar 14, 2013

On my linux box, mruby is only slightly slower than CRuby (trunk).
Maybe something we should fix on Windows. Memory allocation perhaps?

Contributor

monaka commented Mar 20, 2013

On WinXP/MinGW, it requires 12seconds to execute mrbtest. Meanwhile on OSX and Ubuntu, they requires about 1second.
The difference is caused by the infinite loop test.
Does this give us some hints?

Contributor

skandhas commented Mar 20, 2013

@matz @monaka @mattn
I think a possible reason is that the sizeof(mrb_value) > 32 bit, so mrb_value cannot be passed by registers on 32bit OS(like as WinXP).
And mrb_value is used everywhere in mruby core, like parameters, return values, ....
Adverse effects is accumulated ,so mruby-32bit-version is slower than 64bit version .

Contributor

monaka commented Mar 20, 2013

@skandhas , I thought about sizeof(mrb_value), too. So I tested on Ubuntu-10.04-LTS_32bit.
It runs similar to OSX_x86-64 and Ubuntu_64bit.

Contributor

monaka commented Mar 20, 2013

And it should be whole functions slower if it is caused by sizeof(mrb_value). As there are so many mrb_values.
But we have no evidences at least now.

Contributor

cremno commented Mar 23, 2013

This is indeed Windows-specific. Memory (re-)allocation is not as fast/good as on Linux. I think it's best to reduce calls to the memory allocation function of the VM (the default one allocf in state.c uses our culprit realloc) by using a higher value for MRB_STACK_GROWTH (vm.c - also, I think this should be moved to mrbconf.h) or not extending the stack size linear but rather doubling it (but keep linear growth for devices with low memory -> mrbconf.h).

Besides fib can easily be written iteratively if you need a faster computation.

beoran commented Mar 31, 2013

I made the change to make the stack grow linearly since that prevents the stack from getting too deep on (infinite) recursion. I propose to make malloc pluggable, and suggest the use of something like nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) for Windows XP.

Contributor

kurodash commented Apr 1, 2013

Visual Studio 2012 Performance Profiler Result
Call tree summary and HotLine summary
https://gist.github.com/kurodash/5282929

Contributor

kurodash commented Apr 1, 2013

Visual Studio 2012 Performance Profiler Result
Call tree summary and HotLine summary
https://gist.github.com/kurodash/5282929

My impressions:

  • stack_extend() uses CPU 14.27%
  • stack_clear() uses CPU 4.85%
    • Why does RITE clear a stack area, when stack is grow? In an ordinary system, when you want a zero value, it 'PUSH 0' to stack explicitly.
  • kh_get_mt() uses 12.00% (Probably this is search of a virtual function table?)
  • There is many HOT LINE in mrb_run()
    • When I look at an assembler VS2012 generated, there are very much shift operation of OPERAND and array operation to a register. maybe.
Contributor

mattn commented Apr 1, 2013

I had making sure that replacing stack_clear with zerofilling of memset make little faster.

Owner

matz commented Apr 1, 2013

The reason for stack_clear() is that default value of local variables are nil.
It might be possible to reduce clear regions since we know how many local variables are.

Contributor

mattn commented Apr 1, 2013

Yes, and the reason to use stack_clear() is that mrb_value contains pointer typed field. And zero filling doesn't mean setting NULL into the pointer feild in C99.

Contributor

kurodash commented Apr 1, 2013

The reason for stack_clear() is that default value of local variables are nil.
It might be possible to reduce clear regions since we know how many local variables are.

Thank you for your reply.
It's an interesting idea.

Contributor

schmurfy commented Apr 22, 2013

I just did the test and I got really different results but in a good way :)

ruby 2.0.0-p0

$ time ruby benchmark/fib39.rb
63245986

real  0m44.590s
user  0m44.466s
sys 0m0.077s

mruby (master)

$ time mruby benchmark/fib39.rb
63245986

real  0m16.508s
user  0m16.481s
sys 0m0.012s

I am on a Mac OS X 10.8 macbookpro, mruby was compiled with nearly default options (compiler is llvm and some mrbgems were added), mri ruby was compiled by rbenv which should use default options.

beoran commented Apr 22, 2013

Well that's both good and bad news, since that means that it's compiler dependent whether mruby is fast or not. Probably llvm does a better job at optimizing mruby than gcc/mingw for some reason. The question remains of course: "Why?"...

Member

carsonmcdonald commented Apr 22, 2013

If you are using the latest version of mruby then the performance change you are seeing is probably related to 84acf4e

Contributor

schmurfy commented Apr 23, 2013

since the issue was opened 20 days ago yes I also think changes in the codebase might be responsible, I don't really trust the good old fibonacci test to be that meaningful too ;)
It would be interesting to issue resulting from others to compare, could you run the benchmark again with master @mattn ?

Contributor

mattn commented Apr 23, 2013

ok, I'll take benchmark result in later.

Contributor

mattn commented Apr 23, 2013

I used default build_config.rb

old result (same as above)

$ w32time mruby fib.rb
63245986
real    61.330
system  0.000
user    60.000

current

$ mruby fib.rb
63245986
real    55.498
system  0.000
user    55.500
Contributor

kyab commented Apr 29, 2014

By 077ccaf, now it should be much faster on Windows.

@mattn Where can I get w32time command?

Contributor

mattn commented Apr 30, 2014

Hmm, I don't get faster result than few days ago before the commit.
I get benchmark result just now using timeit(You can get it from Windows Resource Kit 2003).

c:\dev\mruby>timeit bin\mruby.exe fib.rb
63245986

Version Number:   Windows NT 6.1 (Build 7601)
Exit Time:        9:05 am, Wednesday, April 30 2014
Elapsed Time:     0:00:16.975
Process Time:     0:00:16.926
System Calls:     258271
Context Switches: 83873
Page Faults:      150358
Bytes Read:       250229030
Bytes Written:    365478
Bytes Other:      275574
Contributor

mattn commented Apr 30, 2014

Ah, MRB_STACK_EXTEND_DOUBLING is not default. Plz wait.

Contributor

mattn commented Apr 30, 2014

Enabled MRB_STACK_EXTEND_DOUBLING but not faster than disabled one.

sample1

c:\dev\mruby>timeit bin\mruby.exe fib.rb
63245986

Version Number:   Windows NT 6.1 (Build 7601)
Exit Time:        9:09 am, Wednesday, April 30 2014
Elapsed Time:     0:00:17.113
Process Time:     0:00:16.972
System Calls:     2093352
Context Switches: 343863
Page Faults:      133229
Bytes Read:       857192804
Bytes Written:    138203156
Bytes Other:      316269

sample2

c:\dev\mruby>timeit bin\mruby.exe fib.rb
63245986

Version Number:   Windows NT 6.1 (Build 7601)
Exit Time:        9:11 am, Wednesday, April 30 2014
Elapsed Time:     0:00:16.676
Process Time:     0:00:16.660
System Calls:     168257
Context Switches: 45270
Page Faults:      10070
Bytes Read:       14630642
Bytes Written:    8133593
Bytes Other:      300683
Contributor

kyab commented Apr 30, 2014

oops I also confirmed MRB_STACK_EXTEND_DOUBLING does not make sense for fib.rb..

BTW, in my Windows7 I could not run timeit.exe so I use PowerShell's Measure-Command commandlet.

C:\Users\koji>powershell
Windows PowerShell
Copyright (C) 2009 Microsoft Corporation. All rights reserved.

PS C:\Users\koji> Measure-Command {Start-Process E:\work\mruby\mruby\bin\mruby.exe C:\Users\koji\Documents\fib.rb -wait}



Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 35
Milliseconds      : 176
Ticks             : 351760154
TotalDays         : 0.00040712980787037
TotalHours        : 0.00977111538888889
TotalMinutes      : 0.586266923333333
TotalSeconds      : 35.1760154
TotalMilliseconds : 35176.0154


Contributor

mattn commented May 1, 2014

My result.

PS C:\dev\mruby> Measure-Command {Start-Process bin\mruby.exe fib.rb -wait}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 18
Milliseconds      : 543
Ticks             : 185437501
TotalDays         : 0.000214626737268519
TotalHours        : 0.00515104169444444
TotalMinutes      : 0.309062501666667
TotalSeconds      : 18.5437501
TotalMilliseconds : 18543.7501

beoran commented May 2, 2014

I think that simply setting MRB_STACK_GROWTH to a high value on windows, perhaps even to 1<<16 or so is the simplest solution to reduce the amount of memory allocations while not compromising on embedded safety and security. But perhaps there are other bottle necks than memory allocation as well?

SDkie commented Oct 19, 2014

I can give some time to get the figures for ruby and mruby.
I already have compared the performance on ruby and jruby in 2013, check the excel https://docs.google.com/spreadsheet/ccc?key=0AhvHecClbpW-dEx5bkhzU0VDcEgtR2ZBQmF4QVFfekE#gid=0

jarthod commented Feb 4, 2016

Hi,
I've recently started writing a small exercise in various languages (including ruby and mruby) to understand them better and benchmark them. And I was quite surprise to found mruby (1.2.0) to be more than 10 times slower than ruby (2.3.0).

Language CPU time Clock time
Ruby 146 ms 170 ms
mruby 2,000 ms 2,000 ms

You can find all the details here: https://github.com/jarthod/words_path

The code is a bit more complete, it does file read and parsing, and then a BFS between two words. I'm using these two gems, so the ruby and mruby code is almost the same (I could have used a hash directly):

  conf.gem github: 'yui-knk/mruby-set'
  conf.gem github: 'iij/mruby-io'

I'm not sure how to explain this gap, I also tried to compile the mruby version but the performance was exactly the same. Let me know If you have any lead, and if I can help improving this ☺ (I know ruby and C very well)

Contributor

furunkel commented Feb 4, 2016

@jarthod, are you sure you compiled with -O2 or -O3, AFAIK optimizations are not enabled by default.

Owner

matz commented Feb 4, 2016

I looked in to profile, and found out utf8len spent most of the time. I will investigate the issue later.

jarthod commented Feb 4, 2016

Interesting, we could try switching to bytesize see if there's a difference. I'll try that tonight, along with the optimization flags. What's the best way to profile with mruby ?

Owner

matz commented Feb 4, 2016

I'd suggest remove MRB_UTF8_STRING from your config (if you have).
Currently, we don't provide profiler for mruby. This case, I used gprof.
RIght now, I am working on removing some bottlenecks from mruby, it makes word_path 2-3 times faster, but not as much as CRuby.

@matz matz added a commit that referenced this issue Feb 4, 2016

@matz matz cache mrb_regexp_p(); ref #980 5e514c9

jarthod commented Feb 4, 2016

Ok i've tried commenting the enable_debug line and the time dropped to 850ms, much better.
I also tried adding -O3 like this:

  conf.cc do |cc|
    cc.flags = [ENV['CFLAGS'] || %w(-O3)]
  end

Is it the right way ?
But without any changes, maybe it is by default ? or it has no impact. I confirmed the flag was being used, If I just put something which doesn't exists the compilation fails.

I don't seem to have MRB_UTF8_STRING in my config, the define is commented in my mrbconf.h

I see you made a few optimization commits @matz, I'll try with master soon!

Contributor

furunkel commented Feb 4, 2016

@jarthod: Without debug enabled, -O3 might be the default.

jarthod commented Feb 4, 2016

I just tried with mruby-master and saw a slight increase in performance due to the latest commits I guess:

$ time mruby-1.2.0/bin/mruby words_path.mrb jina pray
jina -> dina -> dins -> bins -> bias -> bras -> bray -> pray

real    0m0.886s
user    0m0.878s
sys 0m0.008s

$ time mruby-master/bin/mruby words_path.mrb jina pray
jina -> dina -> dins -> bins -> bias -> bras -> bray -> pray

real    0m0.822s
user    0m0.804s
sys 0m0.016s

BTW I tried using bytesize instead of length but it didn't change anything.

jarthod commented Feb 4, 2016

I've managed to go down to 666ms by:

  1. Replacing mruby-set with a simple Hash, it's just a wrapper anyway
  2. Removing the call to readline and call .each directly, it avoid creating a big unnecessary intermediary array.

see: jarthod/words_path@b1f9638

Owner

matz commented Feb 6, 2016

I forget about https://github.com/miura1729/mruby-profiler
I am not sure it still works though.

jarthod commented Feb 6, 2016

Interesting, I couldn't compile it yet with mruby 1.2 or master but thanks for the tip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment