Failing test case for PG Array parsing into UTF-8 strings #573

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
2 participants
Contributor

francois commented Oct 25, 2012

I aggregate multiple rows into a PG Array and turning that into JSON. The problem is Sequel doesn't return UTF-8 strings, but ASCII-8BIT strings instead. Sequel already knows about the database's encoding, so I would assume it can return properly encoded strings.

The following gist allows us to reproduce the problem (Sequel 3.40.0): https://gist.github.com/3955094

Any advice on how to turn this into a passing spec?

Owner

jeremyevans commented Oct 25, 2012

In general, Sequel doesn't deal with encodings at all. Encodings are the responsibility of the database driver, not Sequel.

In this case, the problem is not in Sequel, but in sequel_pg. Sequel's pure ruby PostgreSQL array parser appears to handle the encodings correctly, but the C version of the array parser in sequel_pg does not. There probably needs to be some rb_enc_associate_index calls to the strings created by rb_str_new near https://github.com/jeremyevans/sequel_pg/blob/master/ext/sequel_pg/sequel_pg.c#L166. I'll try to fix that in the near future.

I don't think we should add a spec for this to Sequel itself, since the encoding of the returned result depends on the encoding of the database, which is outside of Sequel's control. Your example is expected to fail on a default install of PostgreSQL (which has SQL_ASCII as the default database encoding), it's only expected to pass if the user manually sets the encoding of the PostgreSQL database to UTF8 (either by default for the cluster via initdb, or when using createdb).

I suppose you could check that the text version of the column is the same encoding as the members of the array, but that's such a specific check, and once the issue is fixed in sequel_pg, I think regressions are unlikely and such a spec is unwarranted.

Owner

jeremyevans commented Oct 25, 2012

I've released a new version of sequel_pg, 1.6.1, that should fix the encoding issue. Please give it a shot and post an issue in the sequel_pg bug tracker if it doesn't work as expected.

Contributor

francois commented Oct 26, 2012

I didn't know where to report the bug, but with sequel_pg 1.6.1 this is fixed. Thanks for your prompt action! You really deserve your reputation for maintaining a great product!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment