
Add a numeric decoder written in C #25

Merged (2 commits, Jun 23, 2018)
Conversation

jeremyevans
Contributor

This is about 10% faster than a pure-ruby decoder.

Passing an Integer to BigDecimal is not significantly different from
passing a String, and passing a Flonum is actually slower. Integers
probably aren't faster because BigDecimal converts them to C strings
internally:

https://github.com/ruby/ruby/blob/trunk/ext/bigdecimal/bigdecimal.c#L290-L292
https://github.com/ruby/ruby/blob/trunk/ext/bigdecimal/bigdecimal.c#L302-L306

Flonums are slower because BigDecimal converts them to Rational
and then does some processing on the Rational:

https://github.com/ruby/ruby/blob/trunk/ext/bigdecimal/bigdecimal.c#L247-L278

Here's the code I originally had that tested the Integer and Flonum
approaches (I removed it after benchmarking showed those approaches
were the same speed or slower):

```c
	VALUE bd;
	char *found;
	if ((found = strchr(val, '.'))) {
		if (len <= 16) {
			bd = DBL2NUM(rb_cstr_to_dbl(val, 1));
			len -= found - val - 1;
			return rb_funcall(rb_cObject, s_id_BigDecimal, 2, bd, INT2NUM(len));
		}
		bd = rb_str_new(val, len);
	} else {
		bd = pg_text_dec_integer(conv, val, len, tuple, field, enc_idx);
	}

	return rb_funcall(rb_cObject, s_id_BigDecimal, 1, bd);
```
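The argument-type tradeoff in the removed code can be seen from plain Ruby: BigDecimal accepts a String directly, an Integer, or a Float with a required precision argument (the significant-digit count the C path above computed from the decimal point position). A minimal sketch, independent of pg:

```ruby
require 'bigdecimal'

# Three ways to build the same value; all go through different internal
# paths in bigdecimal.c, as the source links above show.
from_string  = BigDecimal('1.25')
from_integer = BigDecimal(125) / 100   # Integer path: converted to a C string internally
from_float   = BigDecimal(1.25, 4)     # Float path: goes through Rational; precision required

puts from_string == from_integer  # true
puts from_string == from_float    # true
```

All three compare equal; the difference the benchmarks measured is construction cost, not result.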

This also includes a text encoder written in Ruby.
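The encoder calls `to_s('F')` rather than plain `to_s` because BigDecimal's default string form is scientific notation, while `'F'` produces the fixed-point form PostgreSQL expects for a numeric literal. A small illustration (the exact default exponent formatting varies slightly across Ruby versions):

```ruby
require 'bigdecimal'

d = BigDecimal('123456790123.12')
puts d.to_s        # scientific form, e.g. "0.12345679012312e12"
puts d.to_s('F')   # "123456790123.12" -- plain fixed-point form
```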

Example test/benchmark code:

```ruby
require 'pg'
require 'bigdecimal'
require 'benchmark/ips'

int_max = 10**18
small_float_range = 0...15
large_float_range = 0...1000

log_cols = ((ENV['BENCH_LOG_COLS'] || 6).to_i)
log_rows = ((ENV['BENCH_LOG_ROWS'] || 6).to_i)
cols = 2**log_cols

raise "BENCH_LOG_COLS must be <= 10" if log_cols > 10
raise "BENCH_LOG_ROWS must be <= 10" if log_rows > 10

if ENV['PURE_RUBY'] == '1'
  class NumericDecoder < PG::SimpleDecoder
    def decode(string, tuple=nil, field=nil)
      BigDecimal(string)
    end
  end
  class NumericEncoder < PG::SimpleEncoder
    def encode(decimal)
      decimal.to_s('F')
    end
  end
  PG::BasicTypeRegistry.register_type(0, 'numeric', NumericEncoder, NumericDecoder)
end

conn = PG.connect
unless ENV['ALL_STRINGS'] == '1'
  conn.type_map_for_queries = PG::BasicTypeMapForQueries.new conn
  conn.type_map_for_results = PG::BasicTypeMapForResults.new conn
end

conn.exec("BEGIN")
at_exit{conn.exec("ROLLBACK")}
conn.exec("CREATE TABLE int_numeric_test (#{(0...cols).map{|i| "d#{i} numeric(40, 2) DEFAULT '#{(rand*int_max).to_i}'"}.join(', ')})")
conn.exec("CREATE TABLE small_float_numeric_test (#{(0...cols).map{|i| "d#{i} numeric(15, 2) DEFAULT '#{s = small_float_range.map{rand(10)}.join; s[-3] = '.'; s}'"}.join(', ')})")
conn.exec("CREATE TABLE large_float_numeric_test (#{(0...cols).map{|i| "d#{i} numeric(1000, 10) DEFAULT '#{s = large_float_range.map{rand(10)}.join; s[-10] = '.'; s}'"}.join(', ')})")

conn.exec("INSERT INTO int_numeric_test DEFAULT VALUES")
conn.exec("INSERT INTO small_float_numeric_test DEFAULT VALUES")
conn.exec("INSERT INTO large_float_numeric_test DEFAULT VALUES")
log_rows.times do
  conn.exec("INSERT INTO int_numeric_test SELECT * FROM int_numeric_test")
  conn.exec("INSERT INTO small_float_numeric_test SELECT * FROM small_float_numeric_test")
  conn.exec("INSERT INTO large_float_numeric_test SELECT * FROM large_float_numeric_test")
end

['int_numeric_test', 'small_float_numeric_test', 'large_float_numeric_test'].each do |v|
  conn.exec( "SELECT d0 FROM #{v} LIMIT 1" ) do |res|
    v = res.getvalue(0, 0)
    print "Example #{v} value: #{v.is_a?(BigDecimal) ? v.to_s('F') : v}\n"
  end
end
puts

small = '123456790123.12'
large = ('123456790'*10) << '.' << ('012345679')
puts "Basic Tests"
numeric_tests = [
  '1',
  '1.0',
  '1.2',
  small,
  large,
]
numeric_tests.each do |d|
  conn.exec("SELECT #{d}::numeric") do |res|
    v = res.getvalue(0, 0)
    print "Test decimal value: #{d} should equal #{v.is_a?(BigDecimal) ? v.to_s('F') : v}\n"
  end
end

conn.exec_params("SELECT $1::numeric, $2::numeric", [BigDecimal(1), BigDecimal(large)]) do |res|
  v = res.getvalue(0, 0)
  print "Test bigdecimal text encoder values: 1 should equal #{v.is_a?(BigDecimal) ? v.to_s('F') : v}\n"
  v = res.getvalue(0, 1)
  print "Test bigdecimal text encoder values: #{large} should equal #{v.is_a?(BigDecimal) ? v.to_s('F') : v}\n"
end

Benchmark.ips do |x|
  x.warmup = -1
  ['int_numeric_test', 'small_float_numeric_test', 'large_float_numeric_test'].each do |v|
    sql = "SELECT * FROM #{v}"
    x.report(v) do
      conn.exec(sql) do |res|
        ntuples = res.ntuples
        recnum = 0
        while recnum < ntuples
          converted_rec = {}
          fieldnum = 0
          while fieldnum < cols
            res.getvalue(recnum, fieldnum)
            fieldnum += 1
          end
          recnum += 1
        end
      end
    end
  end
end

=begin
Integer Numeric
Strings: ~275 ips
Pure Ruby BigDecimal: ~110 ips
C BigDecimal: ~120 ips

Small Float Numeric
Strings: ~300 ips
Pure Ruby BigDecimal: ~115 ips
C BigDecimal: ~126 ips

Large Float Numeric
Strings: ~10 ips
Pure Ruby BigDecimal: ~7.3 ips
C BigDecimal: ~7.3 ips
=end
```
@SamSaffron

Another important note here is that this is the first time pg is getting a default decimal mapping, which is a big deal because `select 1.2` will now return a decimal, where prior to this change it came back as the string "1.2" (the old mapper was commented out).
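To make the before/after concrete, here is a hard-coded sketch (no database involved; the column value is inlined rather than fetched) of how the same result behaves as a String versus a BigDecimal:

```ruby
require 'bigdecimal'

raw   = "1.2"              # what `select 1.2` returned before the default mapping
typed = BigDecimal("1.2")  # what the numeric decoder now produces

puts raw + raw                  # "1.21.2" -- String concatenation, a common source of bugs
puts (typed + typed).to_s('F')  # "2.4"    -- actual numeric addition
```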

```diff
@@ -166,6 +179,27 @@
 end
 end

 it "should do numeric type conversions" do
 [0].each do |format|
 small = '123456790123.12'
```
Contributor
Something's up with the indentation here.

Contributor Author
Thanks, I just pushed another commit to fix that.

@larskanis merged commit 8337969 into ged:master on Jun 23, 2018
@larskanis
Collaborator

Thank you - merged! Out of curiosity: Is this related to Sequel in any way?

@jeremyevans
Contributor Author

No, Sequel doesn't use pg's typecasting. If sequel_pg is used, then it does the typecasting in C, otherwise Sequel does it in ruby. I think Sequel has converted numeric columns to BigDecimal since before I took over maintenance.

@larskanis
Collaborator

Anyway, your contributions are greatly appreciated!

Sequel still uses the query params typecasting (which is to_s for all non-strings by default). You could use something like this commit to Rails to enable typecasting for the most basic types. This saves some object allocations when sending data.

@jeremyevans
Contributor Author

I wouldn't be against accepting a patch for that, and may implement it if I have time. Sequel still supports old versions of pg, though, so it would have to test for support before using it.

@larskanis
Collaborator

This has worked since pg-0.18.0, but compatibility with postgres-pr and jdbc surely needs to be considered.

@jeremyevans
Contributor Author

I think Sequel supports back to pg-0.8.0. postgres-pr and jdbc don't matter in this case as parameters are not used on postgres-pr, and jdbc uses the jdbc adapter and not the postgres adapter.

@larskanis
Collaborator

But sequel_pg requires pg >= 0.18.0, so that it could be added there?

@jeremyevans
Contributor Author

sequel_pg doesn't handle anything related to input parameters, it only handles decoding results, so it wouldn't make sense to add it to sequel_pg. It should be added to Sequel, but made conditional (respond_to?(:type_map_for_queries=)).
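The conditional suggested above is a standard respond_to? feature check: only install a query type map when the installed pg version exposes the setter. A runnable sketch, where `FakeOldConn`/`FakeNewConn` are stand-ins invented here so the pattern executes without a real connection:

```ruby
# Stand-in classes, not part of pg: one models a connection object from a
# pg version without type_map_for_queries=, the other a version with it.
class FakeOldConn; end
class FakeNewConn
  attr_accessor :type_map_for_queries
end

# Install the map only when the connection supports it; report what happened.
def install_query_type_map(conn, map)
  return false unless conn.respond_to?(:type_map_for_queries=)
  conn.type_map_for_queries = map
  true
end

puts install_query_type_map(FakeOldConn.new, :map)  # false -- silently skipped
puts install_query_type_map(FakeNewConn.new, :map)  # true  -- map installed
```

With real pg, `map` would be a `PG::BasicTypeMapForQueries.new(conn)`.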

junaruga pushed a commit to junaruga/ruby-pg that referenced this pull request Sep 9, 2019
Add Ruby 2.5, and add ruby-head as allow_failure.
kamipo added a commit to kamipo/rails that referenced this pull request Apr 27, 2020
This is required for rails#39063 to use `PG::TextDecoder::Numeric`.

Ref ged/ruby-pg#25.

The pg gem 1.1.0 was released on August 24, 2018, so I think it is
good timing to bump the required version to improve and clean up the
code base.

https://rubygems.org/gems/pg/versions
4 participants