Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array text encoding #80

Merged
merged 4 commits into from
Oct 25, 2013
Merged

Array text encoding #80

merged 4 commits into from
Oct 25, 2013

Conversation

glenebob
Copy link
Contributor

Francisco,

I ran the test suite against 7.3.21, and discovered that array text encoding did not work, because the array[] construct was new in 7.4. So, I re-wrote the array text value encoding code, yet again. What is that, 37 times now? Now it uses the more universal '{"", ""}' syntax. The result is that the array tests run cleanly on all final revisions back to 7.3.21.

The different quoting and escaping combinations with plain text encoding will drive a person crazy.

-Glen

Encode text arrays using the '{"", ""}' syntax, rather than the array['', ''] syntax.
@glenebob
Copy link
Contributor Author

There is a caveat above. Tests TestBug1010488ArrayParameterWithNullValue and TestBug1010675ArrayParameterWithNullValue() now fail on < 8.2.

On < 8.2, using the syntax array[1, 2, NULL]::int4[] silently produces an empty array. Using the syntax '{1, 2, NULL}' produces an error.

On 8.2+, either syntax produces the expected array. It was a Postgres bug, not an Npgsql bug.

The tests are not complete. If they compared the output to the input, rather than simply looking for an error, the tests would have failed with the old array syntax, too.

Of course, Npgsql still can't handle null array elements on arrays of primitive types. That's a design flaw that appears to be particularly nasty to resolve without breaking existing code.

-Glen

@franciscojunior
Copy link
Member

That's excellent, Glen!

Do you think the array[] construct is easier to understand? Or has a better performance?

I know you already fixed in this PR to work on pre array[] construct, but I'm asking that because if the new syntax has some advantages, maybe we could add an exception to the case of servers below 7.4? This way we wouldn't need to support a lower performance format.

I don't know how many people use below 7.4 postgresql version, but I think it may be few compared to the number of 7.4+ installations.

What do you think? Sorry to add even more considerations to the mix of so many changes to array text value encoding code :)

@franciscojunior
Copy link
Member

Ah, and please, don't get me wrong. I'm ok with your approach of using old style array syntax. I just added the comment because maybe there would be benefits from using the new syntax instead of the old one and I'd like to say that I'm ok with adding an exceptional handling of the old syntax for servers below 7.4 only.

@glenebob
Copy link
Contributor Author

Hi Francisco,

Welcome back!

It didn't occur to me to compare performance between the two formats. I've
done that now, by running the text array speed test against my local 9.2
instance. The results:

master:

Elapsed: 00:10.02, CPU: 00:09.188458, Queries: 177; 17.66/second, 19.26/CPU
second
Elapsed: 00:10.05, CPU: 00:08.860856, Queries: 184; 18.30/second, 20.77/CPU
second
Elapsed: 00:10.00, CPU: 00:09.313259, Queries: 189; 18.89/second, 20.29/CPU
second
Elapsed: 00:10.04, CPU: 00:09.094858, Queries: 182; 18.13/second, 20.01/CPU
second

PR #80:

Elapsed: 00:10.02, CPU: 00:09.094858, Queries: 179; 17.86/second, 19.68/CPU
second
Elapsed: 00:10.02, CPU: 00:09.500460, Queries: 196; 19.56/second, 20.63/CPU
second
Elapsed: 00:10.04, CPU: 00:09.313259, Queries: 187; 18.61/second, 20.08/CPU
second
Elapsed: 00:10.05, CPU: 00:09.516061, Queries: 189; 18.79/second, 19.86/CPU
second

I'd call that a wash :)

As far as understandability goes, I don't really see a big difference.
But, keep in mind that PG doesn't ever appear to encode arrays using the
new format; if you do (SELECT array[1,2,3]), PG will always return it as
'{1,2,3}', so in that sense the old format seems a little easier to
understand. Also, in the extended mode, the new format isn't supported at
all. So, in total, using the newer format is a bit of an oddity, really.

You can also see that, while the text and bytea encoders got a little more
complicated, the array encoder is now a bit simpler, since it only knows
one format.

-Glen

On Thu, Oct 24, 2013 at 7:53 AM, Francisco Figueiredo Jr. <
notifications@github.com> wrote:

That's excellent, Glen!

Do you think the array[] construct is easier to understand? Or has a
better performance?

I know you already fixed in this PR to work on pre array[] construct, but
I'm asking that because if the new syntax has some advantages, maybe we
could add an exception to the case of servers below 7.4? This way we
wouldn't need to support a lower performance format.

I don't know how many people use below 7.4 postgresql version, but I think
it may be few compared to the number of 7.4+ installations.

What do you think? Sorry to add even more considerations to the mix of so
many changes to array text value encoding code :)


Reply to this email directly or view it on GitHubhttps://github.com//pull/80#issuecomment-26999108
.

@franciscojunior
Copy link
Member

Hi. Glen!

Thanks! The vacation was very nice. Now I'm refilled to work more on Npgsql :)

Excellent to know the new representation doesn't add any negative performance effects and in fact, as you are saying, it seems to be easier to understand as now we are dealing with only one format :)

Excellent work, Glen!

Expand use of the hash table to control all aspects of string and bytea quoting and escaping.
Optimize the hash function to make as few decisions as possible on current PG version.
@glenebob
Copy link
Contributor Author

This last commit seems to run a little faster, since there are now less decisions being made when encoding strings and bytea data to text.

@franciscojunior
Copy link
Member

Excellent, Glen!
Merging now...

franciscojunior added a commit that referenced this pull request Oct 25, 2013
Array text encoding

I ran the test suite against 7.3.21, and discovered that array text encoding did not work, because the array[] construct was new in 7.4. So, I re-wrote the array text value encoding code, yet again. What is that, 37 times now? Now it uses the more universal '{"", ""}' syntax. The result is that the array tests run cleanly on all final revisions back to 7.3.21.

The different quoting and escaping combinations with plain text encoding will drive a person crazy.

-Glen
@franciscojunior franciscojunior merged commit 06731d4 into npgsql:master Oct 25, 2013
@glenebob glenebob mentioned this pull request Oct 25, 2013
franciscojunior added a commit that referenced this pull request Oct 25, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants