
Failing test case for PG Array parsing into UTF-8 strings #573

Closed
wants to merge 1 commit into from
Conversation

francois
Contributor

I'm aggregating multiple rows into a PG Array and turning that into JSON. The problem is that Sequel doesn't return UTF-8 strings, but ASCII-8BIT strings instead. Sequel already knows the database's encoding, so I would assume it can return properly encoded strings.

The following gist allows us to reproduce the problem (Sequel 3.40.0): https://gist.github.com/3955094

Any advice on how to turn this into a passing spec?
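A minimal, database-free illustration of the symptom (the literals are hypothetical; in the real case the bytes come back from the driver):

```ruby
# A driver that ignores encodings hands back raw bytes tagged as binary.
raw = "caf\xC3\xA9".b          # the UTF-8 bytes of "café", tagged ASCII-8BIT
raw.encoding                   # #<Encoding:ASCII-8BIT>
raw == "café"                  # false: same bytes, incompatible encodings

# What we expect instead: the same bytes tagged with the database's encoding.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
fixed == "café"                # true
```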

@jeremyevans
Owner

In general, Sequel doesn't deal with encodings at all. Encodings are the responsibility of the database driver, not Sequel.

In this case, the problem is not in Sequel, but in sequel_pg. Sequel's pure-ruby PostgreSQL array parser appears to handle encodings correctly, but the C version of the array parser in sequel_pg does not. There probably need to be some rb_enc_associate_index calls on the strings created by rb_str_new near https://github.com/jeremyevans/sequel_pg/blob/master/ext/sequel_pg/sequel_pg.c#L166. I'll try to fix that in the near future.

I don't think we should add a spec for this to Sequel itself, since the encoding of the returned result depends on the encoding of the database, which is outside of Sequel's control. Your example is expected to fail on a default install of PostgreSQL (which uses SQL_ASCII as the default database encoding); it is only expected to pass if the user manually sets the encoding of the PostgreSQL database to UTF8 (either as the cluster default via initdb, or when using createdb).

I suppose you could check that the text version of the column has the same encoding as the members of the array, but that's a very specific check, and once the issue is fixed in sequel_pg, regressions are unlikely, so such a spec seems unwarranted.

@jeremyevans
Owner

I've released a new version of sequel_pg, 1.6.1, that should fix the encoding issue. Please give it a shot and post an issue in the sequel_pg bug tracker if it doesn't work as expected.

@francois
Contributor Author

I didn't know where to report the bug, but with sequel_pg 1.6.1 this is fixed. Thanks for your prompt action! You really deserve your reputation for maintaining a great product!
