
PG::TextEncoder::Numeric only handles BigDecimal #310

Closed
brocktimus opened this issue Dec 2, 2019 · 9 comments · Fixed by #312

@brocktimus

Lots of backstory to this but I'll keep it down to minimal bug report.

As the subject says, PG::TextEncoder::Numeric only handles the BigDecimal class explicitly. This means that a Float passed to it falls through and stays a Float; later in processing this raises TypeError (no implicit conversion of Float into String).

This was using a type map extracted from the connection itself. Floats are being passed in because the data source produces Floats. I'm assuming the Numeric TextEncoder should handle other subclasses of Numeric too?

I fixed this with a monkey patch as follows:

class PG::TextEncoder::Numeric
  def encode(value)
    return value.to_s('F') if value.is_a?(BigDecimal)
    return value.to_s if value.is_a?(Numeric)
    value
  end
end
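For reference, the patched logic can be exercised without a database connection at all. This is a standalone sketch of the same encode behaviour; the method name is mine for illustration, not part of the pg gem:

```ruby
require 'bigdecimal'

# Standalone sketch of the monkey-patched encode logic above.
# encode_numeric_text is an illustrative name, not pg API.
def encode_numeric_text(value)
  return value.to_s('F') if value.is_a?(BigDecimal)  # 'F' = plain float notation
  return value.to_s if value.is_a?(Numeric)          # Float, Integer, etc.
  value                                              # pass anything else through
end

encode_numeric_text(BigDecimal('0.5'))  # => "0.5"
encode_numeric_text(0.5)                # => "0.5"
encode_numeric_text(2)                  # => "2"
```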

Happy to submit a PR but unsure what way we want to skin this cat.

Other obvious variant would be using a case statement like is done for InetAddr.
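That case-statement variant might look something like this (a standalone sketch only; the method name is illustrative and the real encoder lives in ruby-pg's C extension):

```ruby
require 'bigdecimal'

# Sketch of the case-statement variant, mirroring the style used for
# InetAddr. encode_numeric_case is an illustrative name, not pg API.
def encode_numeric_case(value)
  case value
  when BigDecimal then value.to_s('F')  # plain notation, no exponent
  when Numeric    then value.to_s       # Float, Integer, etc.
  else value                            # pass non-numerics through
  end
end
```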

We could explicitly check for Float and raise NotImplementedError for anything else that is a Numeric, on the basis that those cases might need to be checked and added later. An error that can be traced back here would help: I had to dig around for a while because the encoder "worked" fine previously. It was the extension code that raised the TypeError, and since that wasn't visible in the stack trace it was hard to track down.

Loving the work otherwise!

@larskanis
Collaborator

Thank you @brocktimus for investigating this! Could you please post a small snippet that demonstrates the issue? In particular what the encoded value is and what you expect instead?

@brocktimus
Author

I can formalize a failing test if you'd like, but the following snippet demonstrates it.

encoder = PG::TextEncoder::Numeric.new

# It works generally
encoder.encode(0.5) == '0.5'

# BigDecimal behaves the same as Float
decimal = BigDecimal(0.5, 2)
encoder.encode(decimal) == encoder.encode(decimal.to_f)

Right now it's just returning the value passed in for anything Numeric that isn't a BigDecimal. I'm expecting a String out of encode that maps onto what Postgres expects, for all Numeric types.

If you've already got a test file I should append the spec to, just point me in that direction.

I haven't thought about the other Numeric types like Rational or Complex, although I'm unsure whether they matter or should maybe just raise an error. Integer is already covered by its own class underneath TextEncoder, so Float was the only hole in Numeric I could think of that "makes sense" for the converter.
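The stricter idea could be sketched like this: encode Float and Integer, raise for other Numeric subclasses such as Rational or Complex so the failure is traceable. Again a standalone sketch with an illustrative name, not pg API:

```ruby
require 'bigdecimal'

# Sketch: encode the common numeric classes, fail loudly and traceably
# for the rest. encode_numeric_strict is an illustrative name, not pg API.
def encode_numeric_strict(value)
  case value
  when BigDecimal     then value.to_s('F')
  when Float, Integer then value.to_s
  when Numeric
    raise NotImplementedError, "no text encoding defined for #{value.class}"
  else value
  end
end
```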

@larskanis
Collaborator

Did I understand it right, that your issue is like this?

c = PG.connect
tm = PG::TypeMapByColumn.new([PG::TextEncoder::Numeric.new])
c.exec_params("SELECT $1::Numeric", [2], 0, tm).to_a
# expected: [{"numeric"=>"2"}]
# but raises: TypeError (no implicit conversion of Integer into String)
c.exec_params("SELECT $1::Numeric", [2.0], 0, tm).to_a
# expected: [{"numeric"=>"2.0"}]
# but raises: TypeError (no implicit conversion of Float into String)

PG::BasicTypeMapForQueries maps only BigDecimal to PG::TextEncoder::Numeric and Float to PG::TextEncoder::Float. So am I right that you're using your own typemap and not PG::BasicTypeMapForQueries?

@brocktimus
Author

brocktimus commented Dec 2, 2019

Right, sorry, I managed to completely miss PG::TextEncoder::Float yesterday. I must have been in a rush and overlooked it! Otherwise I would have constructed my own type map.

So the full story is that I'm doing some large COPYs, since COPY is much faster than even INSERT with multiple VALUES. The examples I found said to use PG::TextEncoder::CopyRow, and that got the whole thing working.

Then I realised that all timestamp precision was being rounded off, because that encoder effectively calls to_s on the timestamp, rounding to the nearest second. So I read more code and found this example, which indicated that a different coder would maintain precision (it did, by the way, fantastic!).

# Simple set of rules for type casting common PostgreSQL types from Ruby

Which is exactly what I wanted. So I ran this code...

res = conn.exec("SELECT 0.5 LIMIT 0") # Not really, but simpler for this use case.
btm = PG::BasicTypeMapBasedOnResult.new(conn)
tm = btm.build_column_map(res)
# => #<PG::TypeMapByColumn numeric:0>

As can be seen, the type map now contains a numeric encoder. So I would expect the following code to work, since the type map was derived from the query above and I'm putting data into the same columns:

row_encoder = PG::TextEncoder::CopyRow.new type_map: tm
conn.copy_data( "COPY copytable FROM STDIN", row_encoder ) do |res|
  conn.put_copy_data [0.5]
end

However it raises TypeError (no implicit conversion of Float into String), because the Numeric encoder is used rather than Float. Realistically the same problem would occur when inserting an Integer into a Numeric column (which I'd also expect to be OK?); it's just that my dataset was entirely Floats in the Numeric column.

So a fuller but still minimal example of what I'd expect to work is the following. There might be some errors, but the gist is there; I just copied the example from the code and modified it to suit.

conn.exec( "CREATE TEMP TABLE numerics (n NUMERIC, an NUMERIC[])" )

# Retrieve table OIDs per empty result set.
res = conn.exec( "SELECT * FROM numerics LIMIT 0" )
# Build a type map for common ruby to database type encoders.
btm = PG::BasicTypeMapBasedOnResult.new(conn)
# Build a PG::TypeMapByColumn with encoders suitable for copytable.
tm = btm.build_column_map( res )
row_encoder = PG::TextEncoder::CopyRow.new type_map: tm

conn.copy_data( "COPY numerics FROM STDIN", row_encoder ) do |res|
  decimal = BigDecimal(0.5, 1)
  conn.put_copy_data [decimal, [decimal, decimal]]
  conn.put_copy_data [decimal.to_f, [decimal.to_f, decimal.to_f]]
  conn.put_copy_data [decimal.to_i, [decimal.to_i, decimal.to_i]]
  conn.put_copy_data [decimal, [decimal, decimal.to_i, decimal.to_i]]
end

FYI, where I'm using COPY I'm going into multiple tables, so generic code that pulls the type map from the DB before copying rows works great, rather than statically constructing it ahead of time.

Although I don't think static construction would help even if the dataset were mixed like the example above. Any ideas for setting up a row_encoder for a numeric column that can handle any of the Numeric types passed in?
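One plain-Ruby way to sketch that kind of dispatch, i.e. picking a text representation per Ruby class for a single NUMERIC column (illustrative only; names here are mine, not pg API):

```ruby
require 'bigdecimal'

# Illustrative sketch (not pg API): dispatch on the Ruby class of the
# value to pick a text representation suitable for a NUMERIC column.
NUMERIC_TEXT_ENCODERS = {
  BigDecimal => ->(v) { v.to_s('F') },  # must come before the others:
  Integer    => ->(v) { v.to_s },       # BigDecimal is also a Numeric
  Float      => ->(v) { v.to_s },
}.freeze

def encode_for_numeric_column(value)
  _klass, enc = NUMERIC_TEXT_ENCODERS.find { |klass, _| value.is_a?(klass) }
  raise TypeError, "unsupported value class #{value.class}" unless enc
  enc.call(value)
end
```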

larskanis added a commit to larskanis/ruby-pg that referenced this issue Dec 6, 2019
Since PostgreSQL accepts mostly the same format for all 3 data types, we could shrink to one encoder.

Fixes ged#310
larskanis added a commit to larskanis/ruby-pg that referenced this issue Dec 6, 2019
Previously it was only tested through BasicTypeMapping but that didn't handle Integer and Float inputs.

Related to ged#310
@larskanis
Collaborator

larskanis commented Dec 6, 2019

#312 fixes the above failing code samples.

@brocktimus
Author

Thanks! The Numeric encoder in C is a much cleaner changeset.

Just wondering, am I using type maps / encoders properly in my above code sample?

Based on your comment in #312, it sounds like there was a way to encode all of these using the BasicTypeMap directly, with switching based on input type?

@larskanis
Collaborator

@brocktimus I tried to properly document the type map system of ruby-pg, but its declarative nature makes it somewhat difficult to understand.

PG::BasicTypeMapBasedOnResult is a derivation of PG::TypeMapByOid, populated with all the encoders from the PG::BasicTypeRegistry. So it uses PostgreSQL's type OIDs to determine which encoders to use. COPY doesn't provide any type OIDs, but a result with "LIMIT 0" does, and that can be used to build a PG::TypeMapByColumn which can be applied to copy_data.

On the other hand PG::BasicTypeMapForQueries is a PG::TypeMapByClass derivation, which selects the encoder based on the ruby class of input values. Your example could also use a PG::BasicTypeMapForQueries like so:

# Use the ruby class based type map for database type encoders.
btm = PG::BasicTypeMapForQueries.new(conn)
row_encoder = PG::TextEncoder::CopyRow.new type_map: btm

conn.copy_data( "COPY numerics FROM STDIN", row_encoder ) do |res|
  decimal = BigDecimal(0.5, 1)
  conn.put_copy_data [decimal, [decimal, decimal]]
  conn.put_copy_data [decimal.to_f, [decimal.to_f, decimal.to_f]]
  conn.put_copy_data [decimal.to_i, [decimal.to_i, decimal.to_i]]
  conn.put_copy_data [decimal, [decimal, decimal.to_i, decimal.to_i]]
end

However this is less precise than selection by type OID. For instance, a Ruby Array could be stored as a PostgreSQL array type (the default), but also as a JSON object or as a composite type record. By the way, I noticed two issues while running the above code, which I fixed in commits df29bf7 and 6953952. So thank you for this valuable bug report! And you're doing everything right - it was ruby-pg's fault.

@brocktimus
Author

Thanks for the fixes!

Not nagging, but I noticed some commits referencing a possible 1.2.0 release. Is that imminent? I'm just asking because I'm commenting up the hack I wrote in a project, so I can remove it when the gem update comes out.

@larskanis
Collaborator

@ged will cut a new release in the next days.
