
Failing test case for PG Array parsing into UTF-8 strings #573

Closed
wants to merge 1 commit into from
Conversation

francois
Contributor

I'm aggregating multiple rows into a PG Array and turning that into JSON. The problem is that Sequel doesn't return UTF-8 strings, but ASCII-8BIT strings instead. Sequel already knows the database's encoding, so I would assume it can return properly encoded strings.

The following gist allows us to reproduce the problem (Sequel 3.40.0): https://gist.github.com/3955094

Any advice on how to turn this into a passing spec?
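A minimal, database-free illustration of the symptom (the literals are hypothetical; in the real case the bytes come back from the driver):

```ruby
# A driver that ignores encodings hands back raw bytes tagged as binary.
raw = "caf\xC3\xA9".b          # the UTF-8 bytes of "café", tagged ASCII-8BIT
raw.encoding                   # #<Encoding:ASCII-8BIT>
raw == "café"                  # false: same bytes, incompatible encodings

# What we expect instead: the same bytes tagged with the database's encoding.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
fixed == "café"                # true
```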

@jeremyevans
Owner

In general, Sequel doesn't deal with encodings at all. Encodings are the responsibility of the database driver, not Sequel.

In this case, the problem is not in Sequel, but in sequel_pg. Sequel's pure-ruby PostgreSQL array parser appears to handle encodings correctly, but the C version of the array parser in sequel_pg does not. There probably need to be some rb_enc_associate_index calls on the strings created by rb_str_new near https://github.com/jeremyevans/sequel_pg/blob/master/ext/sequel_pg/sequel_pg.c#L166. I'll try to fix that in the near future.

I don't think we should add a spec for this to Sequel itself, since the encoding of the returned result depends on the encoding of the database, which is outside of Sequel's control. Your example is expected to fail on a default install of PostgreSQL (which uses SQL_ASCII as the default database encoding); it is only expected to pass if the user manually sets the encoding of the PostgreSQL database to UTF8 (either as the cluster default via initdb, or when using createdb).

I suppose you could check that the text version of the column has the same encoding as the members of the array, but that's a very specific check, and once the issue is fixed in sequel_pg, regressions are unlikely, so such a spec seems unwarranted.

@jeremyevans
Owner

I've released a new version of sequel_pg, 1.6.1, that should fix the encoding issue. Please give it a shot and post an issue in the sequel_pg bug tracker if it doesn't work as expected.

@francois
Contributor Author

I didn't know where to report the bug, but with sequel_pg 1.6.1 this is fixed. Thanks for your prompt action! You really deserve your reputation for maintaining a great product!
