
PG::TextEncoder::Numeric only handles BigDecimal #310

Closed
brocktimus opened this issue Dec 2, 2019 · 9 comments · Fixed by #312

@brocktimus

Lots of backstory to this but I'll keep it down to minimal bug report.

As the subject says, PG::TextEncoder::Numeric only handles the BigDecimal class explicitly. This means that a Float passed to it falls through and stays a Float; later in processing this raises TypeError (no implicit conversion of Float into String).

This was using a type map extracted from the connection itself. Floats are being passed in because the data source produces Floats. I'm assuming the Numeric TextEncoder should handle other subclasses of Numeric too?

I fixed this with a monkey patch as follows:

class PG::TextEncoder::Numeric
  def encode(value)
    return value.to_s('F') if value.is_a?(BigDecimal)
    return value.to_s if value.is_a?(Numeric)
    value
  end
end
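For reference, the patched logic can be exercised without a database connection at all. This is a standalone sketch of the same encode behaviour; the method name is mine for illustration, not part of the pg gem:

```ruby
require 'bigdecimal'

# Standalone sketch of the monkey-patched encode logic above.
# encode_numeric_text is an illustrative name, not pg API.
def encode_numeric_text(value)
  return value.to_s('F') if value.is_a?(BigDecimal)  # 'F' = plain float notation
  return value.to_s if value.is_a?(Numeric)          # Float, Integer, etc.
  value                                              # pass anything else through
end

encode_numeric_text(BigDecimal('0.5'))  # => "0.5"
encode_numeric_text(0.5)                # => "0.5"
encode_numeric_text(2)                  # => "2"
```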

Happy to submit a PR but unsure what way we want to skin this cat.

Other obvious variant would be using a case statement like is done for InetAddr.
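That case-statement variant might look something like this (a standalone sketch only; the method name is illustrative and the real encoder lives in ruby-pg's C extension):

```ruby
require 'bigdecimal'

# Sketch of the case-statement variant, mirroring the style used for
# InetAddr. encode_numeric_case is an illustrative name, not pg API.
def encode_numeric_case(value)
  case value
  when BigDecimal then value.to_s('F')  # plain notation, no exponent
  when Numeric    then value.to_s       # Float, Integer, etc.
  else value                            # pass non-numerics through
  end
end
```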

We could explicitly check for Float and raise NotImplementedError for anything else that is a Numeric, on the basis that those cases might need to be checked and added later. An error that can be traced back here would help: I had to dig around for a while because the encoder "worked" fine previously. It was the extension code that raised the TypeError, and since that wasn't visible in the stack trace it was hard to track down.

Loving the work otherwise!

@larskanis
Collaborator

Thank you @brocktimus for investigating this! Could you please post a small snippet that demonstrates the issue? In particular what the encoded value is and what you expect instead?

@brocktimus
Author

I can formalize a failing test if you'd like, but the following snippet demonstrates it.

encoder = PG::TextEncoder::Numeric.new

# It works generally
encoder.encode(0.5) == '0.5'

# BigDecimal behaves the same as Float
decimal = BigDecimal(0.5, 2)
encoder.encode(decimal) == encoder.encode(decimal.to_f)

Right now it's just returning the value passed in for anything Numeric that isn't a BigDecimal. I'm expecting a String out of encode that maps onto what Postgres expects, for all Numeric types.

If you've already got a test file I should append the spec to, just point me in that direction.

I haven't thought about the other Numeric types like Rational or Complex, although I'm unsure whether they matter or should maybe just raise an error. Integer is already covered by its own class underneath TextEncoder, so Float was the only hole in Numeric I could think of that "makes sense" for the converter.
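The stricter idea could be sketched like this: encode Float and Integer, raise for other Numeric subclasses such as Rational or Complex so the failure is traceable. Again a standalone sketch with an illustrative name, not pg API:

```ruby
require 'bigdecimal'

# Sketch: encode the common numeric classes, fail loudly and traceably
# for the rest. encode_numeric_strict is an illustrative name, not pg API.
def encode_numeric_strict(value)
  case value
  when BigDecimal     then value.to_s('F')
  when Float, Integer then value.to_s
  when Numeric
    raise NotImplementedError, "no text encoding defined for #{value.class}"
  else value
  end
end
```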

@larskanis
Collaborator

Did I understand it right, that your issue is like this?

c = PG.connect
tm = PG::TypeMapByColumn.new([PG::TextEncoder::Numeric.new])
c.exec_params("SELECT $1::Numeric", [2], 0, tm).to_a
# expected: [{"numeric"=>"2"}]
# but raises: TypeError (no implicit conversion of Integer into String)
c.exec_params("SELECT $1::Numeric", [2.0], 0, tm).to_a
# expected: [{"numeric"=>"2.0"}]
# but raises: TypeError (no implicit conversion of Float into String)

PG::BasicTypeMapForQueries maps only BigDecimal to PG::TextEncoder::Numeric and Float to PG::TextEncoder::Float. So am I right that you're using your own typemap and not PG::BasicTypeMapForQueries?

@brocktimus
Author

brocktimus commented Dec 2, 2019

Right, sorry, I managed to completely miss PG::TextEncoder::Float yesterday. I must have been in a rush and overlooked it! Otherwise I would have constructed my own type map.

So the full story is that I'm doing some large COPYs, since COPY is much faster than even INSERT with multiple VALUES. The examples I found said to use PG::TextEncoder::CopyRow, and that got the whole thing working.

Then I realised that all timestamp precision was being rounded off, because that encoder effectively calls to_s on the timestamp, rounding to the nearest second. So I read more code and found this example, which indicated that a different coder would maintain precision (it did, by the way, fantastic!).

# Simple set of rules for type casting common PostgreSQL types from Ruby

Which is exactly what I wanted. So I ran this code...

res = conn.exec("SELECT 0.5 LIMIT 0") # Not really, but simpler for this use case.
btm = PG::BasicTypeMapBasedOnResult.new(conn)
tm = btm.build_column_map(res)
# => #<PG::TypeMapByColumn numeric:0>

As can be seen, the type map now contains a numeric encoder. So I would expect the following code to work, since the type map was derived from the query above and I'm putting data into the same columns:

row_encoder = PG::TextEncoder::CopyRow.new type_map: tm
conn.copy_data( "COPY copytable FROM STDIN", row_encoder ) do |res|
  conn.put_copy_data [0.5]
end

However it raises TypeError (no implicit conversion of Float into String), because the Numeric encoder is used rather than Float. Realistically the same problem would occur when inserting an Integer into a Numeric column (which I'd also expect to be OK?); it's just that my dataset was entirely Floats in the Numeric column.

So a fuller but still minimal example of what I'd expect to work is the following. There might be some errors, but the gist is there; I just copied the example from the code and modified it to suit.

conn.exec( "CREATE TEMP TABLE numerics (n NUMERIC, an NUMERIC[])" )

# Retrieve table OIDs per empty result set.
res = conn.exec( "SELECT * FROM numerics LIMIT 0" )
# Build a type map for common ruby to database type encoders.
btm = PG::BasicTypeMapBasedOnResult.new(conn)
# Build a PG::TypeMapByColumn with encoders suitable for copytable.
tm = btm.build_column_map( res )
row_encoder = PG::TextEncoder::CopyRow.new type_map: tm

conn.copy_data( "COPY numerics FROM STDIN", row_encoder ) do |res|
  decimal = BigDecimal(0.5, 1)
  conn.put_copy_data [decimal, [decimal, decimal]]
  conn.put_copy_data [decimal.to_f, [decimal.to_f, decimal.to_f]]
  conn.put_copy_data [decimal.to_i, [decimal.to_i, decimal.to_i]]
  conn.put_copy_data [decimal, [decimal, decimal.to_i, decimal.to_i]]
end

FYI, where I'm using COPY I'm going into multiple tables, so generic code that pulls the type map from the DB before copying rows works great, rather than statically constructing it ahead of time.

Although I don't think static construction would help even if the dataset were mixed like the example above. Any ideas for setting up a row_encoder for a numeric column that can handle any of the Numeric types passed in?
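One plain-Ruby way to sketch that kind of dispatch, i.e. picking a text representation per Ruby class for a single NUMERIC column (illustrative only; names here are mine, not pg API):

```ruby
require 'bigdecimal'

# Illustrative sketch (not pg API): dispatch on the Ruby class of the
# value to pick a text representation suitable for a NUMERIC column.
NUMERIC_TEXT_ENCODERS = {
  BigDecimal => ->(v) { v.to_s('F') },  # must come before the others:
  Integer    => ->(v) { v.to_s },       # BigDecimal is also a Numeric
  Float      => ->(v) { v.to_s },
}.freeze

def encode_for_numeric_column(value)
  _klass, enc = NUMERIC_TEXT_ENCODERS.find { |klass, _| value.is_a?(klass) }
  raise TypeError, "unsupported value class #{value.class}" unless enc
  enc.call(value)
end
```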

larskanis added a commit to larskanis/ruby-pg that referenced this issue Dec 6, 2019
Since PostgreSQL accepts mostly the same format for all 3 data types, we could shrink to one encoder.

Fixes ged#310
larskanis added a commit to larskanis/ruby-pg that referenced this issue Dec 6, 2019
Previously it was only tested through BasicTypeMapping but that didn't handle Integer and Float inputs.

Related to ged#310
@larskanis
Collaborator

larskanis commented Dec 6, 2019

#312 fixes the above failing code samples.

@brocktimus
Author

Thanks! The Numeric encoder in C is a much cleaner changeset.

Just wondering, am I using type maps / encoders properly in my above code sample?

Based on your comment in #312, it sounds like there was a way to encode all of these using the BasicTypeMap directly, with switching based on input type?

@larskanis
Collaborator

@brocktimus I tried to properly document the type map system of ruby-pg, but its declarative nature makes it somewhat difficult to understand.

PG::BasicTypeMapBasedOnResult is a derivation of PG::TypeMapByOid, populated with all the encoders from the PG::BasicTypeRegistry. So it uses PostgreSQL's type OIDs to determine which encoders to use. COPY doesn't provide any type OIDs, but a result with "LIMIT 0" does, and that can be used to build a PG::TypeMapByColumn which can be applied to copy_data.

On the other hand PG::BasicTypeMapForQueries is a PG::TypeMapByClass derivation, which selects the encoder based on the ruby class of input values. Your example could also use a PG::BasicTypeMapForQueries like so:

# Use the ruby class based type map for database type encoders.
btm = PG::BasicTypeMapForQueries.new(conn)
row_encoder = PG::TextEncoder::CopyRow.new type_map: btm

conn.copy_data( "COPY numerics FROM STDIN", row_encoder ) do |res|
  decimal = BigDecimal(0.5, 1)
  conn.put_copy_data [decimal, [decimal, decimal]]
  conn.put_copy_data [decimal.to_f, [decimal.to_f, decimal.to_f]]
  conn.put_copy_data [decimal.to_i, [decimal.to_i, decimal.to_i]]
  conn.put_copy_data [decimal, [decimal, decimal.to_i, decimal.to_i]]
end

However this is less precise than selection by type OID. For instance, a Ruby Array could be stored as a PostgreSQL array type (the default), but also as a JSON object or as a composite type record. By the way, I noticed two issues while running the above code, which I fixed in commits df29bf7 and 6953952. So thank you for this valuable bug report! And you're doing everything right - it was ruby-pg's fault.

@brocktimus
Author

Thanks for the fixes!

Not nagging, but I noticed some commits referencing a possible 1.2.0 release. Is that imminent? I'm just asking because I'm commenting up the hack I wrote in a project, so I can remove it when the gem update comes out.

@larskanis
Collaborator

@ged will cut a new release in the next days.
