Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pg sets the encoding on PG::Connection objects #280

Closed
ged opened this issue Jul 16, 2018 · 5 comments
Closed

pg sets the encoding on PG::Connection objects #280

ged opened this issue Jul 16, 2018 · 5 comments
Labels
1.0.0 bug Library major migrated Migrated from issues on Bitbucket and/or Rubyforge

Comments

@ged
Copy link
Owner

ged commented Jul 16, 2018

Original report by Benoit Daloze (Bitbucket: eregon, GitHub: eregon).


pg sets the encoding with ENCODING_SET_INLINED/rb_enc_set_index on PG::Connection objects. Pg uses a macro PG_ENCODING_SET_NOCHECK() in pg.h.

This seems not supported in latest ruby trunk, as Koichi Sasada explicitly added a check against it in ruby/ruby@d0fb73a , and as a result the program crashes when rb_enc_set_index() is called on a non-encoding capable object.

To observe this, we need to change PG_ENCODING_SET_NOCHECK() to just call rb_enc_set_index() as ENCODING_SET_INLINED() doesn't contain the check currently, but likely has unsupported semantics and only works by luck that there are < 128 encodings.

Then with ruby-trunk we can see (the program assume a postgres db named "test" exists):

$ ruby -Ilib -rpg -e 'p PG.connect(dbname: "test")'

/home/eregon/code/ruby-pg/lib/pg.rb:56: [BUG] enc_set_index: not capable object
ruby 2.6.0dev (2018-07-16 trunk 63979) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0020 e:000019 CFUNC  :initialize
c:0004 p:---- s:0017 e:000016 CFUNC  :new
c:0003 p:0016 s:0012 e:000011 METHOD /home/eregon/code/ruby-pg/lib/pg.rb:56
c:0002 p:0013 s:0007 e:000005 EVAL   -e:1 [FINISH]
c:0001 p:0000 s:0003 E:001c50 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
/home/eregon/code/ruby-pg/lib/pg.rb:56:in `connect'
/home/eregon/code/ruby-pg/lib/pg.rb:56:in `new'
/home/eregon/code/ruby-pg/lib/pg.rb:56:in `initialize'

-- C level backtrace information -------------------------------------------
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_vm_bugreport+0xcf9) [0x55cbdfb9a1b9] vm_dump.c:704
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_bug+0xd0) [0x55cbdfb8d200] error.c:595
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_enc_set_index+0xfe) [0x55cbdfb77a4e] encoding.c:822
/home/eregon/code/ruby-pg/lib/pg_ext.so(pgconn_init+0x168) [0x7ff1d355d2c8]
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_call0_body.constprop.314+0x288) [0x55cbdfaf0648] vm_eval.c:87
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_call0+0xdf) [0x55cbdfaf168f] vm_eval.c:60
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_funcallv+0x2b) [0x55cbdfaf1d7b] vm_eval.c:595
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_class_s_new+0x21) [0x55cbdf9e9ad1] object.c:2153
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_call_cfunc+0xfe) [0x55cbdfaea6ce] vm_insnhelper.c:1934
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_call_method+0xe3) [0x55cbdfaedae3] vm_insnhelper.c:2424
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_exec_core+0x13e) [0x55cbdfaf511e] /home/eregon/code/ruby/insns.def:779
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_exec+0x96) [0x55cbdfaec266] vm.c:1807
/home/eregon/prefix/ruby-trunk/bin/ruby(ruby_exec_internal+0xc4) [0x55cbdf967c84] eval.c:261
/home/eregon/prefix/ruby-trunk/bin/ruby(ruby_run_node+0x2f) [0x55cbdf96c1ef] eval.c:325
/home/eregon/prefix/ruby-trunk/bin/ruby(main+0x5f) [0x55cbdf9670df] ./include/ruby/intern.h:295

I think one solution is to store the encoding in a separate field, for instance in t_pg_connection. If you think Ruby should keep supporting rb_enc_set_index() on non-encoding-capable objects, please file an issue at https://bugs.ruby-lang.org/

I'm just reporting this as I noticed the change by Koichi and knew from trying to run pg with TruffleRuby that pg uses rb_enc_set_index() on the PG::Connection object.

@ged
Copy link
Owner Author

ged commented Jul 17, 2018

Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).


Thank you @eregon for notifying us and for your analysis!

Storing the encoding in the metadata of a T_DATA object was a handy trick to save some space in the connection and the result. However if it's in the way of some future improvements in MRI, I think we can find an alternative solution for it. I'll check this and will make ruby-pg compatible with ruby-trunk.

Regarding TruffleRuby, please let me know when it is ready to be supported by ruby-pg. We already had some Rubinius compatibility in the past for parts that are too MRI specific.

Kind Regards,
Lars

@ged
Copy link
Owner Author

ged commented Jun 5, 2019

Original comment by Chris Bandy (Bitbucket: cbandy, GitHub: cbandy).


Is this still an issue? I am able to ruby -rpg -e 'p PG.connect(ENV["DATABASE_URL"])' on MRI ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux].

@ged
Copy link
Owner Author

ged commented Jun 5, 2019

Original comment by Benoit Daloze (Bitbucket: eregon, GitHub: eregon).


Yes. The fix should be:

  • Use rb_enc_set_index() instead of PG_ENCODING_SET_NOCHECK, which is intrusive and bypasses sanity checks.
  • Then we should notice invalid cases where the encoding is not set on a encoding-capable object (only String/Regexp/IO/Symbol), as the backtrace above.

@ged ged modified the milestone: Pending Oct 8, 2019
@ged ged added the migrated Migrated from issues on Bitbucket and/or Rubyforge label Oct 8, 2019
@larskanis
Copy link
Collaborator

This is fixed by commit cfb90ef

@eregon
Copy link
Contributor

eregon commented Oct 9, 2019

Thanks for the fix and congrats on the GitHub migration.

larskanis added a commit to larskanis/ruby-pg that referenced this issue Oct 13, 2019
…bjects

The previous solution used MRI's internal bits to store the connection encoding.
This still works, but is only supported for String and some other types of VALUEs since ruby-2.6.
Non-encoding capable objects are no longer allowed to use these bits.

To save some bytes of struct size of PG::Result, we're using bit fields for enc_idx and autoclear.

This patch also removes the cache from pgconn_external_encoding().
It was useful in the past, when the encoding retrievel was slow.
In the meantime the cache did an improvement of factor 3 only.
Therefore as external_encoding is a very infrequently called method, there is no need for a cache any longer.

Fixes ged#280
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1.0.0 bug Library major migrated Migrated from issues on Bitbucket and/or Rubyforge
Projects
None yet
Development

No branches or pull requests

3 participants