New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postico doesn't properly display Unicode sequences with U+000000 #523

Closed
andrewgross opened this Issue Apr 17, 2018 · 7 comments

Comments

Projects
None yet
3 participants
@andrewgross

andrewgross commented Apr 17, 2018

You can use this template for reporting bugs.
Feel free to remove parts that are not relevant, or just write something free-form if you prefer.

What did you do?

When viewing data with the NULL bye, U+000000, Postico will truncate display of that data instead of properly interpreting the character sequence.

Here is the data in Postico:

Here is the data in Python:

Notice the Python data is significantly longer. Here is a JSON file with the sample data.

{"unicode_string": "\u5361\u6c76\u59ae 2014\u79cb\u51ac\u65b0\u6b3e\u5973\u88c5\u4fee\u8eab\u663e\u7626\u9542\u7a7a\u9488\u0000\u7ec7\u0000\u6253\u5e95\u886b \u591a\u0000\u8272\u0000\u9ad8\u0000\u5f39 \u7d2b\u8272 \u5747\u7801", "simple_string": "foo", "simple_string2": "bar"}

If you happen to have access to Redshift, you can reproduce this easily. Here is a link to a Gist covering the exact issue with relation to Redshift, and also steps to reproduce.

What did you expect to happen?

I expected to see the full unicode string displayed, instead of only up to the first NULL byte.

What actually happened?

I only see a partial string, which stops at the first NULL byte.

What software versions are you using?

Postico version: 1.3.3
(select "About Postico" from the Postico menu)

macOS version: 10.12.6
(selecting "About this Mac" from the Apple menu)

PostgreSQL version: PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.1980
(execute the command "SELECT VERSION();" to find out)

@andrewgross

This comment has been minimized.

andrewgross commented Apr 17, 2018

Not a 100% sure if this is a Postico issue or a macOS issue. Not looking forward to dealing with Apple.

@jakob

This comment has been minimized.

Owner

jakob commented Apr 18, 2018

It's a Postico issue. PostgreSQL does not allow zero characters in strings (at least in the default UNICODE encoding), so Postico assumes that textual data returned from the server does not contain null bytes.

It should be an easy fix.

@andrewgross

This comment has been minimized.

andrewgross commented Apr 18, 2018

Got it, chalk up another one to Redshift weirdness. Thanks for investigating.

@jakob

This comment has been minimized.

Owner

jakob commented Apr 18, 2018

I've done some investigation, and I'm pretty it's a bug in Redshifts data loading. VARCHARs shouldn't contain zero bytes.

I've tried to insert a string with a 0 byte into a VARCHAR column. PostgreSQL absolutely refuses to do that (raises an error: invalid byte sequence for encoding "UTF8": 0x00, no matter how hard I try). I even tried using a hex editor to insert a 0 byte in the middle of the string. If I do that, PostgreSQL truncates the string there. So it seems in PostgreSQL, there is no way to get a 0 byte into a VARCHAR.

I also tried inserting 0 bytes into Redshift. I couldn't find a way to do that either with normal INSERT statements. Redshift just truncated the string at the zero byte.

But somehow you did manage to insert a zero byte, and it seemed like an easy fix to make Postico accept zero bytes, so I did that. Postico build bot should post a link as soon as CI is done testing.

I had no way to test if the fix actually works, so I'd appreciate your feedback.

@andrewgross

This comment has been minimized.

andrewgross commented Apr 18, 2018

Awesome, thanks for digging in. To reproduce on Redshift I had to use the COPY command on JSON. Writing the following to a JSON file worked for me:

{"unicode_string": "\u5361\u6c76\u59ae 2014\u79cb\u51ac\u65b0\u6b3e\u5973\u88c5\u4fee\u8eab\u663e\u7626\u9542\u7a7a\u9488\u0000\u7ec7\u0000\u6253\u5e95\u886b \u591a\u0000\u8272\u0000\u9ad8\u0000\u5f39 \u7d2b\u8272 \u5747\u7801", "simple_string": "foo", "simple_string2": "bar"}

As far as I can tell Redshift is doing some weird stuff with the COPY/UNLOAD commands that gets around the normal Postgres ingestion routine, hence the ability to ingest \u0000

@postico-bot

This comment has been minimized.

postico-bot commented Apr 18, 2018

We have been working on this issue!

jakob: Accept invalid strings with \0 bytes #523

Download Build B2610, or check out the list of recent builds.

@andrewgross

This comment has been minimized.

andrewgross commented Apr 18, 2018

Excellent, it works now!

Current (Buildbot Build) Version

Failing Version (1.3.3):

Thanks for the quick turnaround!

@jakob jakob closed this Apr 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment