
Add a numeric decoder written in C #25

Merged (2 commits, Jun 23, 2018)
Conversation

jeremyevans
Contributor

This is about 10% faster than a pure-ruby decoder.

Passing an Integer to BigDecimal is not significantly different from
passing a String, and passing a Flonum is actually slower. Integers
probably aren't faster because BigDecimal converts them to C strings
internally:

https://github.com/ruby/ruby/blob/trunk/ext/bigdecimal/bigdecimal.c#L290-L292
https://github.com/ruby/ruby/blob/trunk/ext/bigdecimal/bigdecimal.c#L302-L306

Flonums are slower because BigDecimal converts them to Rational
and then does some processing on the Rational:

https://github.com/ruby/ruby/blob/trunk/ext/bigdecimal/bigdecimal.c#L247-L278

Here's the code I originally had that tested the Integer and Flonum
approaches (I removed it after benchmarking showed those approaches
were the same speed or slower):

```c
	VALUE bd;
	char *found;
	if ((found = strchr(val, '.'))) {
		if (len <= 16) {
			bd = DBL2NUM(rb_cstr_to_dbl(val, 1));
			len -= found - val - 1;
			return rb_funcall(rb_cObject, s_id_BigDecimal, 2, bd, INT2NUM(len));
		}
		bd = rb_str_new(val, len);
	} else {
		bd = pg_text_dec_integer(conv, val, len, tuple, field, enc_idx);
	}

	return rb_funcall(rb_cObject, s_id_BigDecimal, 1, bd);
```
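The argument-type tradeoff in the removed code can be seen from plain Ruby: BigDecimal accepts a String directly, an Integer, or a Float with a required precision argument (the significant-digit count the C path above computed from the decimal point position). A minimal sketch, independent of pg:

```ruby
require 'bigdecimal'

# Three ways to build the same value; all go through different internal
# paths in bigdecimal.c, as the source links above show.
from_string  = BigDecimal('1.25')
from_integer = BigDecimal(125) / 100   # Integer path: converted to a C string internally
from_float   = BigDecimal(1.25, 4)     # Float path: goes through Rational; precision required

puts from_string == from_integer  # true
puts from_string == from_float    # true
```

All three compare equal; the difference the benchmarks measured is construction cost, not result.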

This also includes a text encoder written in Ruby.
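The encoder calls `to_s('F')` rather than plain `to_s` because BigDecimal's default string form is scientific notation, while `'F'` produces the fixed-point form PostgreSQL expects for a numeric literal. A small illustration (the exact default exponent formatting varies slightly across Ruby versions):

```ruby
require 'bigdecimal'

d = BigDecimal('123456790123.12')
puts d.to_s        # scientific form, e.g. "0.12345679012312e12"
puts d.to_s('F')   # "123456790123.12" -- plain fixed-point form
```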

Example test/benchmark code:

```ruby
require 'pg'
require 'bigdecimal'
require 'benchmark/ips'

int_max = 10**18
small_float_range = 0...15
large_float_range = 0...1000

log_cols = ((ENV['BENCH_LOG_COLS'] || 6).to_i)
log_rows = ((ENV['BENCH_LOG_ROWS'] || 6).to_i)
cols = 2**log_cols

raise "BENCH_LOG_COLS must be <= 10" if log_cols > 10
raise "BENCH_LOG_ROWS must be <= 10" if log_rows > 10

if ENV['PURE_RUBY'] == '1'
  class NumericDecoder < PG::SimpleDecoder
    def decode(string, tuple=nil, field=nil)
      BigDecimal(string)
    end
  end
  class NumericEncoder < PG::SimpleEncoder
    def encode(decimal)
      decimal.to_s('F')
    end
  end
  PG::BasicTypeRegistry.register_type(0, 'numeric', NumericEncoder, NumericDecoder)
end

conn = PG.connect
unless ENV['ALL_STRINGS'] == '1'
  conn.type_map_for_queries = PG::BasicTypeMapForQueries.new conn
  conn.type_map_for_results = PG::BasicTypeMapForResults.new conn
end

conn.exec("BEGIN")
at_exit{conn.exec("ROLLBACK")}
conn.exec("CREATE TABLE int_numeric_test (#{(0...cols).map{|i| "d#{i} numeric(40, 2) DEFAULT '#{(rand*int_max).to_i}'"}.join(', ')})")
conn.exec("CREATE TABLE small_float_numeric_test (#{(0...cols).map{|i| "d#{i} numeric(15, 2) DEFAULT '#{s = small_float_range.map{rand(10)}.join; s[-3] = '.'; s}'"}.join(', ')})")
conn.exec("CREATE TABLE large_float_numeric_test (#{(0...cols).map{|i| "d#{i} numeric(1000, 10) DEFAULT '#{s = large_float_range.map{rand(10)}.join; s[-10] = '.'; s}'"}.join(', ')})")

conn.exec("INSERT INTO int_numeric_test DEFAULT VALUES")
conn.exec("INSERT INTO small_float_numeric_test DEFAULT VALUES")
conn.exec("INSERT INTO large_float_numeric_test DEFAULT VALUES")
log_rows.times do
  conn.exec("INSERT INTO int_numeric_test SELECT * FROM int_numeric_test")
  conn.exec("INSERT INTO small_float_numeric_test SELECT * FROM small_float_numeric_test")
  conn.exec("INSERT INTO large_float_numeric_test SELECT * FROM large_float_numeric_test")
end

['int_numeric_test', 'small_float_numeric_test', 'large_float_numeric_test'].each do |v|
  conn.exec( "SELECT d0 FROM #{v} LIMIT 1" ) do |res|
    v = res.getvalue(0, 0)
    print "Example #{v} value: #{v.is_a?(BigDecimal) ? v.to_s('F') : v}\n"
  end
end
puts

small = '123456790123.12'
large = ('123456790'*10) << '.' << ('012345679')
puts "Basic Tests"
numeric_tests = [
  '1',
  '1.0',
  '1.2',
  small,
  large,
]
numeric_tests.each do |d|
  conn.exec("SELECT #{d}::numeric") do |res|
    v = res.getvalue(0, 0)
    print "Test decimal value: #{d} should equal #{v.is_a?(BigDecimal) ? v.to_s('F') : v}\n"
  end
end

conn.exec_params("SELECT $1::numeric, $2::numeric", [BigDecimal(1), BigDecimal(large)]) do |res|
  v = res.getvalue(0, 0)
  print "Test bigdecimal text encoder values: 1 should equal #{v.is_a?(BigDecimal) ? v.to_s('F') : v}\n"
  v = res.getvalue(0, 1)
  print "Test bigdecimal text encoder values: #{large} should equal #{v.is_a?(BigDecimal) ? v.to_s('F') : v}\n"
end

Benchmark.ips do |x|
  x.warmup = -1
  ['int_numeric_test', 'small_float_numeric_test', 'large_float_numeric_test'].each do |v|
    sql = "SELECT * FROM #{v}"
    x.report(v) do
      conn.exec(sql) do |res|
        ntuples = res.ntuples
        recnum = 0
        while recnum < ntuples
          converted_rec = {}
          fieldnum = 0
          while fieldnum < cols
            res.getvalue(recnum, fieldnum)
            fieldnum += 1
          end
          recnum += 1
        end
      end
    end
  end
end

=begin
Integer Numeric
Strings: ~275 ips
Pure Ruby BigDecimal: ~110 ips
C BigDecimal: ~120 ips

Small Float Numeric
Strings: ~300 ips
Pure Ruby BigDecimal: ~115 ips
C BigDecimal: ~126 ips

Large Float Numeric
Strings: ~10 ips
Pure Ruby BigDecimal: ~7.3 ips
C BigDecimal: ~7.3 ips
=end
```
@SamSaffron

Another important note here is that this is the first time pg is getting a default decimal mapping, which is a big deal because `select 1.2` will now return a decimal, where prior to this change it came back as the string "1.2" (the old mapper was commented out).
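To make the before/after concrete, here is a hard-coded sketch (no database involved; the column value is inlined rather than fetched) of how the same result behaves as a String versus a BigDecimal:

```ruby
require 'bigdecimal'

raw   = "1.2"              # what `select 1.2` returned before the default mapping
typed = BigDecimal("1.2")  # what the numeric decoder now produces

puts raw + raw                  # "1.21.2" -- String concatenation, a common source of bugs
puts (typed + typed).to_s('F')  # "2.4"    -- actual numeric addition
```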

```diff
@@ -166,6 +179,27 @@
 end
 end

 it "should do numeric type conversions" do
 [0].each do |format|
 small = '123456790123.12'
```
Contributor
Something's up with the indentation here.

Contributor Author
Thanks, I just pushed another commit to fix that.

@larskanis merged commit 8337969 into ged:master on Jun 23, 2018
@larskanis
Collaborator

Thank you - merged! Out of curiosity: Is this related to Sequel in any way?

@jeremyevans
Contributor Author

No, Sequel doesn't use pg's typecasting. If sequel_pg is used, then it does the typecasting in C, otherwise Sequel does it in ruby. I think Sequel has converted numeric columns to BigDecimal since before I took over maintenance.

@larskanis
Collaborator

Anyway, your contributions are greatly appreciated!

Sequel still uses the query params typecasting (which is to_s for all non-strings by default). You could use something like this commit to Rails to enable typecasting for the most basic types. This saves some object allocations when sending data.

@jeremyevans
Contributor Author

I wouldn't be against accepting a patch for that, and may implement it if I have time. Sequel still supports old versions of pg, though, so it would have to test for support before using it.

@larskanis
Collaborator

This has worked since pg-0.18.0, but compatibility with postgres-pr and jdbc surely needs to be considered.

@jeremyevans
Contributor Author

I think Sequel supports back to pg-0.8.0. postgres-pr and jdbc don't matter in this case as parameters are not used on postgres-pr, and jdbc uses the jdbc adapter and not the postgres adapter.

@larskanis
Collaborator

But sequel_pg requires pg >= 0.18.0, so that it could be added there?

@jeremyevans
Contributor Author

sequel_pg doesn't handle anything related to input parameters, it only handles decoding results, so it wouldn't make sense to add it to sequel_pg. It should be added to Sequel, but made conditional (respond_to?(:type_map_for_queries=)).
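The conditional suggested above is a standard respond_to? feature check: only install a query type map when the installed pg version exposes the setter. A runnable sketch, where `FakeOldConn`/`FakeNewConn` are stand-ins invented here so the pattern executes without a real connection:

```ruby
# Stand-in classes, not part of pg: one models a connection object from a
# pg version without type_map_for_queries=, the other a version with it.
class FakeOldConn; end
class FakeNewConn
  attr_accessor :type_map_for_queries
end

# Install the map only when the connection supports it; report what happened.
def install_query_type_map(conn, map)
  return false unless conn.respond_to?(:type_map_for_queries=)
  conn.type_map_for_queries = map
  true
end

puts install_query_type_map(FakeOldConn.new, :map)  # false -- silently skipped
puts install_query_type_map(FakeNewConn.new, :map)  # true  -- map installed
```

With real pg, `map` would be a `PG::BasicTypeMapForQueries.new(conn)`.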

junaruga pushed a commit to junaruga/ruby-pg that referenced this pull request Sep 9, 2019
Add Ruby 2.5, and add ruby-head as allow_failure.
kamipo added a commit to kamipo/rails that referenced this pull request Apr 27, 2020
This is required for rails#39063 to use `PG::TextDecoder::Numeric`.

Ref ged/ruby-pg#25.

The pg gem 1.1.0 was released on August 24, 2018, so I think it is
good timing to bump the required version to improve and clean up the
code base.

https://rubygems.org/gems/pg/versions
4 participants