New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Bignum string parsing #1258

Closed
headius opened this Issue Nov 23, 2013 · 6 comments

Comments

Projects
None yet
5 participants
@headius
Member

headius commented Nov 23, 2013

JRuby currently uses Java's BigInteger to handle String#to_i when it might need a Bignum. However, this algorithm is quite a bit slower than the one in MRI 2.1.

system ~/projects/jruby/jruby-tmp $ rvm ruby-head do ruby -v -rjson
-rbenchmark -e 'p Benchmark.realtime { ("1" * 1000000).to_i }'
ruby 2.1.0dev (2013-09-30) [x86_64-darwin12.5.0]
0.895446

system ~/projects/jruby/jruby-tmp $ rvm jruby-1.7.8 do ruby -v -rjson
-rbenchmark -e 'p Benchmark.realtime { ("1" * 1000000).to_i }'
jruby 1.7.8 (1.9.3p392) 2013-11-18 0ce429e on Java HotSpot(TM) 64-Bit
Server VM 1.8.0-ea-b115 +indy [darwin-x86_64]
24.387

system ~/projects/jruby/jruby-tmp $ jruby -v -rjson -rbenchmark -e 'p
Benchmark.realtime { ("1" * 1000000).to_i }'
jruby 9000.dev (2.1.0.dev) 2013-11-23 f73ef68 on Java HotSpot(TM)
64-Bit Server VM 1.8.0-ea-b115 +indy [darwin-x86_64]
24.227

I had a quick look at the algorithm and I don't think it would be hard to mimic.

@dbrady

This comment has been minimized.

Show comment
Hide comment
@dbrady

dbrady Nov 23, 2013

Taking a look at this now. This is my first foray into JRuby, so anybody who wants to leapfrog me, feel free. :-)

dbrady commented Nov 23, 2013

Taking a look at this now. This is my first foray into JRuby, so anybody who wants to leapfrog me, feel free. :-)

@giuan

This comment has been minimized.

Show comment
Hide comment
@giuan

giuan Nov 24, 2013

This is a ruby implemetation.
Faster then mri 2.0; no big gain for jruby.
Jruby is 10x slower then mri 2.0 in my test
(jruby 1.7.8 (1.9.3p392) 2013-11-14 0ce429e on Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18 [darwin-x86_64])

require 'benchmark'
def to_fixnum str
  str.to_i
end
$max_digit_fixnum = 18

def to_bignum str
  size = str.size
  return to_fixnum(str) if size <= 18
  # str to array base 10**$max_digit_fixnum least significant digit first
  digits = []
  j = size - 1
  i = j - $max_digit_fixnum + 1
  while j >= 0
    s = str[i..j]
    digits << to_fixnum(s)
    j = i-1;
    i = j - $max_digit_fixnum + 1
    i = 0 if i < 0
  end
  # reduce digits from base b10x to b10x*b10x 
  n = digits.size
  b10x = 10**$max_digit_fixnum
  while n > 1
    i = 0; j = 0
    while i < n.div(2)
      digits[i] = digits[j] + digits[j+1]*b10x
      i += 1; j += 2
    end
    if (j == n-1) 
      digits[i] = digits[j]
      i += 1
    end
    n = i
    b10x = b10x*b10x
  end
  digits[0]
end

# to_bignum without method str.to_i
def to_bignum2 str
  size = str.size
  i = size - 1
  digits = []
  while i > 0
    digits << (str[i].ord + 10*(str[i-1].ord) - 11 *('0'.ord))
    i -= 2;
  end
  digits << (str[i].ord - '0'.ord) if i == 0
  n = digits.size
  p10 = 10*10
  while n > 1
    i = 0; j = 0
    while i < n.div(2)
      digits[i] = digits[j] + digits[j+1]*p10
      i += 1; j += 2
    end
    if (j == n-1) 
      digits[i] = digits[j]
      i += 1
    end
    n = i
    p10 = p10*p10
  end
  digits[0]
end
puts Benchmark.realtime {10000.times {to_bignum("1"*100)}}
puts Benchmark.realtime {10000.times {to_bignum2("1"*100)}}
puts

x_my = 0
x = 0
str = "1" * 1000*1000
puts Benchmark.realtime { x_my = to_bignum(str)}
puts Benchmark.realtime { x = (str).to_i }
puts Benchmark.realtime { x_my = to_bignum2(str)}
puts Benchmark.realtime { x = (str).to_i }

puts
puts x_my.size
puts (x_my - x)

giuan commented Nov 24, 2013

This is a ruby implemetation.
Faster then mri 2.0; no big gain for jruby.
Jruby is 10x slower then mri 2.0 in my test
(jruby 1.7.8 (1.9.3p392) 2013-11-14 0ce429e on Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18 [darwin-x86_64])

require 'benchmark'
def to_fixnum str
  str.to_i
end
$max_digit_fixnum = 18

def to_bignum str
  size = str.size
  return to_fixnum(str) if size <= 18
  # str to array base 10**$max_digit_fixnum least significant digit first
  digits = []
  j = size - 1
  i = j - $max_digit_fixnum + 1
  while j >= 0
    s = str[i..j]
    digits << to_fixnum(s)
    j = i-1;
    i = j - $max_digit_fixnum + 1
    i = 0 if i < 0
  end
  # reduce digits from base b10x to b10x*b10x 
  n = digits.size
  b10x = 10**$max_digit_fixnum
  while n > 1
    i = 0; j = 0
    while i < n.div(2)
      digits[i] = digits[j] + digits[j+1]*b10x
      i += 1; j += 2
    end
    if (j == n-1) 
      digits[i] = digits[j]
      i += 1
    end
    n = i
    b10x = b10x*b10x
  end
  digits[0]
end

# to_bignum without method str.to_i
def to_bignum2 str
  size = str.size
  i = size - 1
  digits = []
  while i > 0
    digits << (str[i].ord + 10*(str[i-1].ord) - 11 *('0'.ord))
    i -= 2;
  end
  digits << (str[i].ord - '0'.ord) if i == 0
  n = digits.size
  p10 = 10*10
  while n > 1
    i = 0; j = 0
    while i < n.div(2)
      digits[i] = digits[j] + digits[j+1]*p10
      i += 1; j += 2
    end
    if (j == n-1) 
      digits[i] = digits[j]
      i += 1
    end
    n = i
    p10 = p10*p10
  end
  digits[0]
end
puts Benchmark.realtime {10000.times {to_bignum("1"*100)}}
puts Benchmark.realtime {10000.times {to_bignum2("1"*100)}}
puts

x_my = 0
x = 0
str = "1" * 1000*1000
puts Benchmark.realtime { x_my = to_bignum(str)}
puts Benchmark.realtime { x = (str).to_i }
puts Benchmark.realtime { x_my = to_bignum2(str)}
puts Benchmark.realtime { x = (str).to_i }

puts
puts x_my.size
puts (x_my - x)
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 29, 2013

Member

@dbrady Did you make any progress?

@giuan Your implementation is intriguing! It does indeed appear to be quite a bit faster than the Java BigInteger algorithm. I have not been able to confirm that it is 100% compatible, but I will explore that a bit now.

Member

headius commented Nov 29, 2013

@dbrady Did you make any progress?

@giuan Your implementation is intriguing! It does indeed appear to be quite a bit faster than the Java BigInteger algorithm. I have not been able to confirm that it is 100% compatible, but I will explore that a bit now.

@dbrady

This comment has been minimized.

Show comment
Hide comment
@dbrady

dbrady Nov 29, 2013

@headius Not yet. I've been fettered by my newness to the ecosystem (e.g. not knowing what tools are available to debug or inspect what's going on). I'm hoping to look at this again this weekend, but again, if anybody wants to leapfrog me on this, feel free. I'm not going to come back and say "hey! I was working on that!" :-)

dbrady commented Nov 29, 2013

@headius Not yet. I've been fettered by my newness to the ecosystem (e.g. not knowing what tools are available to debug or inspect what's going on). I'm hoping to look at this again this weekend, but again, if anybody wants to leapfrog me on this, feel free. I'm not going to come back and say "hey! I was working on that!" :-)

@giuan

This comment has been minimized.

Show comment
Hide comment
@giuan

giuan Nov 30, 2013

Hedius, i tryed three different algorithms

  1. Classic the most efficient from literature
  2. Sum of power
  3. The transformation of rappresentation to base B*B until there is one digit

All three are more efficent with a large group(512) of decimal digit converted with #to_i.

On MRI 2.0 the best (10X) is algorithm 3. This puzzles me!

On JRuby algorithm 1 is the best.

The code

require 'benchmark'
def to_fixnum str
  str.to_i
end

def to_bignum_classic str, n_digists = 512
  str = str.gsub(/_/,'')
  size = str.size
  p10_n = 10**(n_digists)
  digits_10_n, r = size.divmod(n_digists)
  if r == 0
    bign = 0
  else
    bign = to_fixnum(str[0..r-1])
  end
  j = r
  digits_10_n.times do 
    bign *= p10_n
    bign += to_fixnum(str[j..(j + n_digists - 1)])
    j += n_digists
  end
  bign
end

def to_bignum_pow str,  n_digists = 512
  str = str.gsub(/_/,'')
  b_n = 1
  b = 10**(n_digists)
  bign = 0
  j = str.size - 1
  i = j - n_digists + 1
  while j >= 0
    bign += b_n*to_fixnum(str[i..j])
    j = i - 1
    i = j - n_digists + 1
    i = 0 if i < 0
    b_n *= b
  end
  bign
end

def to_bignum_base_10x(str, n_digists = 512)
  str = str.gsub(/_/,'')
  size = str.size
  return to_fixnum(str) if size <= n_digists
  # str to array base 10**$max_digit_fixnum least significant digit first
  digits = []
  j = size - 1
  i = j - n_digists + 1
  while j >= 0
    s = str[i..j]
    digits << to_fixnum(s)
    j = i - 1
    i = j - n_digists + 1
    i = 0 if i < 0
  end
  # reduce digits from base b10x to b10x*b10x
  n = digits.size
  b10x = 10**n_digists
  while n > 1
    i = 0; j = 0
    while i < n.div(2)
      digits[i] = digits[j] + digits[j + 1]*b10x
      i += 1; j += 2
    end
    if j == n-1
      digits[i] = digits[j]
      i += 1
    end
    n = i
    b10x = b10x*b10x
  end
  digits[0]
end

x_pow = 0
x_classic = 0
x_base_10x = 0
x = 0
n = 50000
k = 50000/n
k = 1 if k == 0
str = "1234567891" * n
n_digists = 512

3.times do
  puts "#{n*10} decimal digits; use str.to_i for #{n_digists}"
  puts('to_bignum_pow:      ' + Benchmark.realtime { k.times {x_pow = to_bignum_pow(str, n_digists)}}.to_s)
  puts('to_bignum_classic:  ' + Benchmark.realtime { k.times {x_classic = to_bignum_classic(str, n_digists)}}.to_s)
  puts('to_bignum_base_10x: ' + Benchmark.realtime { k.times {x_base_10x = to_bignum_base_10x(str, n_digists)}}.to_s)
  puts('to_i:               ' + Benchmark.realtime { k.times {x = str.to_i }}.to_s)
  puts
end

puts x.size

puts 'to_bignum_pow error' if x_pow - x != 0
puts 'to_bignum_classic error' if x_classic - x != 0
puts 'to_bignum_base_10x error' if x_base_10x - x != 0

giuan commented Nov 30, 2013

Hedius, i tryed three different algorithms

  1. Classic the most efficient from literature
  2. Sum of power
  3. The transformation of rappresentation to base B*B until there is one digit

All three are more efficent with a large group(512) of decimal digit converted with #to_i.

On MRI 2.0 the best (10X) is algorithm 3. This puzzles me!

On JRuby algorithm 1 is the best.

The code

require 'benchmark'
def to_fixnum str
  str.to_i
end

def to_bignum_classic str, n_digists = 512
  str = str.gsub(/_/,'')
  size = str.size
  p10_n = 10**(n_digists)
  digits_10_n, r = size.divmod(n_digists)
  if r == 0
    bign = 0
  else
    bign = to_fixnum(str[0..r-1])
  end
  j = r
  digits_10_n.times do 
    bign *= p10_n
    bign += to_fixnum(str[j..(j + n_digists - 1)])
    j += n_digists
  end
  bign
end

def to_bignum_pow str,  n_digists = 512
  str = str.gsub(/_/,'')
  b_n = 1
  b = 10**(n_digists)
  bign = 0
  j = str.size - 1
  i = j - n_digists + 1
  while j >= 0
    bign += b_n*to_fixnum(str[i..j])
    j = i - 1
    i = j - n_digists + 1
    i = 0 if i < 0
    b_n *= b
  end
  bign
end

def to_bignum_base_10x(str, n_digists = 512)
  str = str.gsub(/_/,'')
  size = str.size
  return to_fixnum(str) if size <= n_digists
  # str to array base 10**$max_digit_fixnum least significant digit first
  digits = []
  j = size - 1
  i = j - n_digists + 1
  while j >= 0
    s = str[i..j]
    digits << to_fixnum(s)
    j = i - 1
    i = j - n_digists + 1
    i = 0 if i < 0
  end
  # reduce digits from base b10x to b10x*b10x
  n = digits.size
  b10x = 10**n_digists
  while n > 1
    i = 0; j = 0
    while i < n.div(2)
      digits[i] = digits[j] + digits[j + 1]*b10x
      i += 1; j += 2
    end
    if j == n-1
      digits[i] = digits[j]
      i += 1
    end
    n = i
    b10x = b10x*b10x
  end
  digits[0]
end

x_pow = 0
x_classic = 0
x_base_10x = 0
x = 0
n = 50000
k = 50000/n
k = 1 if k == 0
str = "1234567891" * n
n_digists = 512

3.times do
  puts "#{n*10} decimal digits; use str.to_i for #{n_digists}"
  puts('to_bignum_pow:      ' + Benchmark.realtime { k.times {x_pow = to_bignum_pow(str, n_digists)}}.to_s)
  puts('to_bignum_classic:  ' + Benchmark.realtime { k.times {x_classic = to_bignum_classic(str, n_digists)}}.to_s)
  puts('to_bignum_base_10x: ' + Benchmark.realtime { k.times {x_base_10x = to_bignum_base_10x(str, n_digists)}}.to_s)
  puts('to_i:               ' + Benchmark.realtime { k.times {x = str.to_i }}.to_s)
  puts
end

puts x.size

puts 'to_bignum_pow error' if x_pow - x != 0
puts 'to_bignum_classic error' if x_classic - x != 0
puts 'to_bignum_base_10x error' if x_base_10x - x != 0
@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Feb 17, 2017

Member

This was fixed by #3211 above. We are still slower than MRI but many times faster than we were (on both 1.7.x and 9.x). Perhaps we can improve further but this is good enough to declare victory for today.

Member

enebo commented Feb 17, 2017

This was fixed by #3211 above. We are still slower than MRI but many times faster than we were (on both 1.7.x and 9.x). Perhaps we can improve further but this is good enough to declare victory for today.

@enebo enebo closed this Feb 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment