Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make non-updating queries use readonly unless there was a mutation before #963

Merged
merged 8 commits into from
Dec 21, 2022

Conversation

yrashk
Copy link
Contributor

@yrashk yrashk commented Dec 15, 2022

Since we're approaching a series of SPI improvements, I think this is time to add this to the mix.


Problem: not using readonly flag for SPI commands

We're potentially losing out on some optimizations.

Solution: track use of mutating commands

In my unscientific benchmarks, I saw a 4-5% performance increase when doing repetitive SELECT queries as read-only vs. read-write.

This is an API-breaking change (again), but it would avoid leaving performance on the table for little convenience gain.


Note: I've made get_one/two/three helpers use update always as there's no distinction between select/update in these. While slightly suboptimal, this works and again shows the potential/subjective shortcomings of these helpers.


This PR will cause conflicts with #912 and #885 but they are not very difficult to resolve. Just need to figure out the order.


A message to pgx users: my ability to actively contribute to pgx largely depends on your support. Please consider sponsoring my work

We're potentially losing out on some optimizations.

Solution: track use of mutating commands

In my unscientific benchmarks, I saw a 4-5% performance increase when
doing repetitive SELECT queries as read-only vs. read-write.

This is an API breaking change (again) but it would avoid leaving
performance on the table for little convenience gain.
However, we always assume them to be non-readonly.

Solution: split the API to `open_cursor` and `open_cursor_mut`

This way we can ensure we're squeezing out performance in the right
place, when feasible.
@workingjubilee
Copy link
Member

Note: timescale/timescaledb-toolkit#529

@eeeebbbbrrrr
Copy link
Contributor

I just used GitHub's "resolve conflicts" UI. Lets see if things even compile.

@eeeebbbbrrrr
Copy link
Contributor

The merge I did through GH's UI didn't compile (totally not surprised). Maybe it's close. It does seem tho that I can push to your branch, so let me try doing that on this one before you do anything else...

@eeeebbbbrrrr
Copy link
Contributor

Yeah, scratch that. I don't understand what I see locally. I add your repo as a remote and checkout the yurii/spi-readonly branch, which puts me in a detached head mode. So I guess I can't push to it. I really suck at git.

@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

Please let me merge develop myself. I've merged the one with correct resolutions.

@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

Should be good now.

@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

It will need another (slightly non-trivial, but I can do it for sure) merge after #885 is in, as they will have a conflict with regard to cursors (nothing major, I'll fix it)

Solution: fix the signature
@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

Working on conflict resolution.

Copy link
Contributor

@eeeebbbbrrrr eeeebbbbrrrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine to merge as-is. EDIT: found one little nit.

I'm just still unsure this even does it "right". The warning from the Postgres doc...

//    It is generally unwise to mix read-only and read-write commands within a single function
//    using SPI; that could result in very confusing behavior, since the read-only queries
//    would not see the results of any database updates done by the read-write queries.

... just isn't very helpful in guiding the user to determining which mode they ought to operate.

I interpret that note as saying "Within a SPI connection you need to decide if all queries are read only or not. Any mixture requires read_only=false."

I looked through SPI usages in the Postgres sources, and there aren't many, but they don't mix/match the read_only flag under the same SPI connection. So like, that doesn't give us much guidance here.

I bet we'll have some bug reports around this in the future. I have no idea how to handle it. The only thing that comes to mind is two different SpiConnections. One that is read_only=false, and one that is read_only=true, and the user has to pick up front which one they want.

So rather than the flag being on SpiClient it'd be on SpiConnection.

I don't like that idea very much but I wonder if it's the only way to best honor the warning from the Postgres sources? Not that the recent merges haven't been, but this idea would be a pretty big impact on our Spi API.

@eeeebbbbrrrr
Copy link
Contributor

but this idea would be a pretty big impact on our Spi API.

Hmm. Maybe not. Maybe it's just adding a Spi::connect_mut(). All the convenience methods like Spi::get_one() would stay read_only=true, delegating to Spi::connect() like they do now.

Thoughts?

@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

I've updated this PR to be based off recent develop.

pgx/src/spi.rs Show resolved Hide resolved
@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

I read the warning from PostgreSQL differently (and I may have seen further explanations elsewhere). The way I read it is that as long you haven't done any "read-write" ops, "readonly" is fine. This is what I've implemented. This is what my tests have also confirmed.

It is generally unwise to mix read-only and read-write commands within a single function using SPI; that could result in very confusing behavior, since the read-only queries would not see the results of any database updates done by the read-write queries.

(emphasis mine)

@eeeebbbbrrrr
Copy link
Contributor

I mean, I suppose it could be saying, "once you go read_only=false, don't go back". If that's what they mean that'd be great. Where did you see other discussion around this? I think I can be convinced to interpret it that way because that does make sense. The outcome of "SELECT"s prior to the first "UPDATE" don't matter for the next "SELECT". What matters is that it see the outcome of that "UPDATE". Which yeah, this does.

It's unnecessary as client already has it (and error-prone)

Solution: get it from the client
@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

I mean, I suppose it could be saying, "once you go read_only=false, don't go back". If that's what they mean that'd be great. Where did you see other discussion around this? I think I can be convinced to interpret it that way because that does make sense. The outcome of "SELECT"s prior to the first "UPDATE" don't matter for the next "SELECT". What matters is that it see the outcome of that "UPDATE". Which yeah, this does.

I'll try to find if there's something else, but I strongly believe this is the case. It makes sense and my reading of _SPI_execute_plan confirms this (what it does with snapshot manipulation)

@eeeebbbbrrrr
Copy link
Contributor

It makes sense and my reading of _SPI_execute_plan confirms this (what it does with snapshot manipulation)

Do we need a test that tries to execute an UPDATE statement via a read_only=true SpiClient? I mean, we don't need to test that Postgres works, but probably that we're passing in the right flag. I'm guessing Postgres will raise a ERRCODE_READ_ONLY_SQL_TRANSACTION in this situation -- never tried.

Solution: write a test that shows it
@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

It makes sense and my reading of _SPI_execute_plan confirms this (what it does with snapshot manipulation)

Do we need a test that tries to execute an UPDATE statement via a read_only=true SpiClient? I mean, we don't need to test that Postgres works, but probably that we're passing in the right flag. I'm guessing Postgres will raise a ERRCODE_READ_ONLY_SQL_TRANSACTION in this situation -- never tried.

I wrote a test like this and pushed it to this PR.

@eeeebbbbrrrr
Copy link
Contributor

I wrote a test like this and pushed it to this PR.

Yeah, nice.

I can't see anything else here. Can you? I'm now convinced this code interprets the Postgres docs correctly. And we'll have our discussion here in case something terrible pops up in the future that proves us wrong!

@yrashk
Copy link
Contributor Author

yrashk commented Dec 20, 2022

I think this is good but there will be one issue to address in the future.

Namely, when you pass the client into PgTryBuilder for update (since mut ref can't cross it), you might want to be able to recover it. Generally speaking, this is the the territory that's most safe with SubTransaction and I've done this recovery mechanism already, and it's tiny and I'll be able to introduce it in #912 or as a followup.

I don't think this is an issue we can or should address right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants