Make non-updating queries use `readonly` unless there was a mutation before #963

yrashk · 2022-12-15T01:24:05Z

Since we're approaching a series of SPI improvements, I think it's time to add this to the mix.

Problem: not using readonly flag for SPI commands

We're potentially losing out on some optimizations.

Solution: track use of mutating commands

In my unscientific benchmarks, I saw a 4-5% performance increase when doing repetitive SELECT queries as read-only vs. read-write.
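The sketch below makes "track use of mutating commands" concrete. It is not pgx's actual code: `Client`, `select`, and `update` are hypothetical stand-ins, and each method simply returns the `read_only` flag it would hand to SPI. The client starts out read-only. The first mutating call switches it to read-write for good, so later reads still see earlier writes.

```rust
/// Hypothetical stand-in for an SPI client (not pgx's actual `SpiClient`).
/// Each method returns the `read_only` flag it would pass to SPI.
pub struct Client {
    // Starts as true; flipped to false by the first mutating command.
    readonly: bool,
}

impl Client {
    pub fn new() -> Self {
        Client { readonly: true }
    }

    /// Non-mutating query: read-only unless a mutation already happened,
    /// so it still sees the effects of earlier writes.
    pub fn select(&self, _sql: &str) -> bool {
        self.readonly
    }

    /// Mutating query: needs `&mut self` and switches the client to
    /// read-write for the rest of its lifetime ("don't go back").
    pub fn update(&mut self, _sql: &str) -> bool {
        self.readonly = false;
        self.readonly
    }
}
```

Because `update` takes `&mut self`, the state change shows up in the signature: only code that holds a mutable borrow can flip the client to read-write.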

This is an API-breaking change (again), but it would avoid leaving performance on the table for little convenience gain.

Note: I've made the get_one/two/three helpers always use `update`, because they make no distinction between select and update. This is slightly suboptimal, but it works, and it again shows the potential (if subjective) shortcomings of these helpers.

This PR will cause conflicts with #912 and #885 but they are not very difficult to resolve. Just need to figure out the order.

A message to pgx users: my ability to actively contribute to pgx largely depends on your support. Please consider sponsoring my work.

workingjubilee · 2022-12-15T23:36:36Z

Note: timescale/timescaledb-toolkit#529

eeeebbbbrrrr · 2022-12-20T16:55:55Z

I just used GitHub's "resolve conflicts" UI. Let's see if things even compile.

eeeebbbbrrrr · 2022-12-20T17:04:07Z

The merge I did through GH's UI didn't compile (totally not surprised). Maybe it's close. It does seem tho that I can push to your branch, so let me try doing that on this one before you do anything else...

eeeebbbbrrrr · 2022-12-20T17:07:21Z

Yeah, scratch that. I don't understand what I see locally. I added your repo as a remote and checked out the yurii/spi-readonly branch, which put me in detached HEAD mode. So I guess I can't push to it. I really suck at git.

yrashk · 2022-12-20T17:23:30Z

Please let me merge develop myself. I've merged the one with correct resolutions.

yrashk · 2022-12-20T17:32:38Z

Should be good now.

yrashk · 2022-12-20T17:33:57Z

It will need another merge after #885 is in (slightly non-trivial, but I can definitely do it), because the two will conflict around cursors. Nothing major; I'll fix it.

yrashk · 2022-12-20T23:01:45Z

Working on conflict resolution.

eeeebbbbrrrr

~~I think this is fine to merge as-is.~~ EDIT: found one little nit.

I'm just still unsure this even does it "right". The warning from the Postgres doc...

> It is generally unwise to mix read-only and read-write commands within a single function using SPI; that could result in very confusing behavior, since the read-only queries would not see the results of any database updates done by the read-write queries.

... just isn't very helpful in guiding the user to determining which mode they ought to operate.

I interpret that note as saying "Within a SPI connection you need to decide if all queries are read only or not. Any mixture requires read_only=false."

I looked through SPI usages in the Postgres sources, and there aren't many, but they don't mix/match the read_only flag under the same SPI connection. So like, that doesn't give us much guidance here.

I bet we'll have some bug reports around this in the future. I have no idea how to handle it. The only thing that comes to mind is two different SpiConnections. One that is read_only=false, and one that is read_only=true, and the user has to pick up front which one they want.

So rather than the flag being on SpiClient it'd be on SpiConnection.

I don't like that idea very much, but I wonder if it's the only way to fully honor the warning from the Postgres sources? Not that the recent merges haven't had an impact too, but this idea would have a pretty big impact on our Spi API.

eeeebbbbrrrr · 2022-12-20T23:06:07Z

> but this idea would have a pretty big impact on our Spi API.

Hmm. Maybe not. Maybe it's just adding a Spi::connect_mut(). All the convenience methods like Spi::get_one() would stay read_only=true, delegating to Spi::connect() like they do now.

Thoughts?
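Here is a rough sketch of the connection-level alternative proposed above, where the mode is chosen once per connection. `Spi`, `Connection`, and both `connect` functions are hypothetical simplifications, not pgx's real signatures. The PR as merged kept per-client tracking instead.

```rust
/// Hypothetical connection whose mode is fixed for its whole lifetime.
pub struct Connection {
    read_only: bool,
}

impl Connection {
    /// The flag every command on this connection would pass to SPI.
    pub fn read_only(&self) -> bool {
        self.read_only
    }
}

pub struct Spi;

impl Spi {
    /// Read-only connection; convenience helpers like `get_one` would use this.
    pub fn connect<R>(f: impl FnOnce(&Connection) -> R) -> R {
        f(&Connection { read_only: true })
    }

    /// Read-write connection for callers that intend to mutate.
    pub fn connect_mut<R>(f: impl FnOnce(&mut Connection) -> R) -> R {
        f(&mut Connection { read_only: false })
    }
}
```

The trade-off is that callers must pick the mode up front, even when a mutation may or may not happen later.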

yrashk · 2022-12-20T23:09:12Z

I've updated this PR to be based off recent develop.

pgx/src/spi.rs

yrashk · 2022-12-20T23:12:13Z

I read the warning from PostgreSQL differently (and I may have seen further explanations elsewhere). The way I read it, as long as you haven't done any read-write operations, read-only mode is fine. This is what I've implemented, and my tests confirm it.

> It is generally unwise to mix read-only and read-write commands within a single function using SPI; that could result in very confusing behavior, since the read-only queries would not see the results of any database updates done by the read-write queries.

(emphasis mine)

eeeebbbbrrrr · 2022-12-20T23:17:10Z

I mean, I suppose it could be saying, "once you go read_only=false, don't go back". If that's what they mean, that'd be great. Where did you see other discussion around this? I think I can be convinced to interpret it that way, because it does make sense. The outcomes of "SELECT"s prior to the first "UPDATE" don't matter for the next "SELECT". What matters is that it sees the outcome of that "UPDATE". Which, yeah, this does.

yrashk · 2022-12-20T23:22:01Z

> I mean, I suppose it could be saying, "once you go read_only=false, don't go back". If that's what they mean, that'd be great. Where did you see other discussion around this? I think I can be convinced to interpret it that way, because it does make sense. The outcomes of "SELECT"s prior to the first "UPDATE" don't matter for the next "SELECT". What matters is that it sees the outcome of that "UPDATE". Which, yeah, this does.

I'll try to find out whether there's anything else, but I strongly believe this is the case. It makes sense, and my reading of `_SPI_execute_plan` (what it does with snapshot manipulation) confirms it.

eeeebbbbrrrr · 2022-12-20T23:25:00Z

> It makes sense and my reading of _SPI_execute_plan confirms this (what it does with snapshot manipulation)

Do we need a test that tries to execute an UPDATE statement via a read_only=true SpiClient? I mean, we don't need to test that Postgres works, but we probably should test that we're passing in the right flag. I'm guessing Postgres will raise an ERRCODE_READ_ONLY_SQL_TRANSACTION in this situation -- never tried.

yrashk · 2022-12-20T23:31:27Z

> It makes sense and my reading of _SPI_execute_plan confirms this (what it does with snapshot manipulation)

> Do we need a test that tries to execute an UPDATE statement via a read_only=true SpiClient? I mean, we don't need to test that Postgres works, but we probably should test that we're passing in the right flag. I'm guessing Postgres will raise an ERRCODE_READ_ONLY_SQL_TRANSACTION in this situation -- never tried.

I wrote a test like this and pushed it to this PR.
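The shape of such a test can be modeled outside Postgres. In this toy version, `fake_spi_execute` is a hypothetical stand-in for SPI that rejects mutating statements when `read_only` is true. Real Postgres decides this from the parsed statement rather than from a keyword prefix, and this sketch makes no claim about which error code it raises. As discussed above, the test's job is to check that the non-mutating entry point really passes `read_only=true`.

```rust
/// Toy stand-in for SPI (hypothetical): rejects INSERT/UPDATE/DELETE when
/// `read_only` is true. Real Postgres inspects the parsed statement instead.
fn fake_spi_execute(sql: &str, read_only: bool) -> Result<(), String> {
    let first = sql
        .split_whitespace()
        .next()
        .unwrap_or("")
        .to_ascii_uppercase();
    let mutating = matches!(first.as_str(), "INSERT" | "UPDATE" | "DELETE");
    if read_only && mutating {
        Err(format!("{first} is not allowed in read-only mode"))
    } else {
        Ok(())
    }
}

/// Hypothetical client whose non-mutating entry point forwards its flag.
struct Client {
    readonly: bool,
}

impl Client {
    fn select(&self, sql: &str) -> Result<(), String> {
        fake_spi_execute(sql, self.readonly)
    }
}
```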

eeeebbbbrrrr · 2022-12-20T23:33:25Z

> I wrote a test like this and pushed it to this PR.

Yeah, nice.

I can't see anything else here. Can you? I'm now convinced this code interprets the Postgres docs correctly. And we'll have our discussion here in case something terrible pops up in the future that proves us wrong!

yrashk · 2022-12-20T23:52:57Z

I think this is good, but there will be one issue to address in the future.

Namely, when you pass the client into PgTryBuilder for an update (since a mutable reference can't cross it), you might want to be able to recover it afterwards. Generally speaking, this territory is safest with SubTransaction. I've already written this recovery mechanism; it's tiny, and I'll be able to introduce it in #912 or as a follow-up.

I don't think this is an issue we can or should address right now.

yrashk added 2 commits December 14, 2022 17:19

Problem: cursors may or may not be readonly-friendly

01d1d24

However, we always assume them to be non-readonly. Solution: split the API into `open_cursor` and `open_cursor_mut`. This way we can ensure we're squeezing out performance in the right place, when feasible.
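Below is a minimal sketch of the cursor split, assuming a hypothetical `Client` and `Cursor` rather than pgx's real types. `open_cursor` inherits the client's current mode. `open_cursor_mut` is for queries that might mutate, so it marks the client read-write, just as a plain update would.

```rust
/// Hypothetical client illustrating the `open_cursor` / `open_cursor_mut` split.
pub struct Client {
    readonly: bool,
}

/// A cursor remembers the `read_only` flag it was opened with.
pub struct Cursor {
    pub read_only: bool,
}

impl Client {
    pub fn new() -> Self {
        Client { readonly: true }
    }

    /// For queries known not to mutate: keeps the client's current mode.
    pub fn open_cursor(&self, _sql: &str) -> Cursor {
        Cursor { read_only: self.readonly }
    }

    /// For queries that may mutate: conservatively switches the client to read-write.
    pub fn open_cursor_mut(&mut self, _sql: &str) -> Cursor {
        self.readonly = false;
        Cursor { read_only: false }
    }
}
```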

eeeebbbbrrrr mentioned this pull request Dec 20, 2022

Build upon #902: Spi Error Handling #969

Merged

yrashk force-pushed the spi-readonly branch from 3a15b54 to d1b78d2 December 20, 2022 17:22

Merge remote-tracking branch 'origin/develop' into spi-readonly

652dfa8

yrashk force-pushed the spi-readonly branch from d1b78d2 to 652dfa8 December 20, 2022 17:31

Problem: one example fails to compile

d6695bc

Solution: fix the signature

eeeebbbbrrrr approved these changes Dec 20, 2022

Merge remote-tracking branch 'origin/develop' into spi-readonly

38c0a3d

eeeebbbbrrrr requested changes Dec 20, 2022

pgx/src/spi.rs

Rename _phantom to __marker as per @eeeebbbbrrrr's request

a7ed61c

Problem: passing readonly flag to Query::execute

2d2cd6f

It's unnecessary as client already has it (and error-prone) Solution: get it from the client
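This sketch illustrates the idea behind this commit using hypothetical names. `Query::execute` reads the flag from the client it is given instead of taking a separate `read_only` argument, so the two can never disagree.

```rust
/// Hypothetical client; the read-only state lives here and nowhere else.
pub struct Client {
    readonly: bool,
}

impl Client {
    pub fn is_readonly(&self) -> bool {
        self.readonly
    }
}

/// Hypothetical `Query` trait: no caller-supplied `read_only` parameter.
pub trait Query {
    fn execute(&self, client: &Client) -> String;
}

impl Query for &str {
    fn execute(&self, client: &Client) -> String {
        // Derive the flag from the client instead of trusting a separate bool.
        let read_only = client.is_readonly();
        format!("{} (read_only={})", self, read_only)
    }
}
```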

Problem: how do we know we enforce readonly?

1038cb7

Solution: write a test that shows it

eeeebbbbrrrr merged commit e6135a9 into pgcentralfoundation:develop Dec 21, 2022

eeeebbbbrrrr mentioned this pull request Dec 27, 2022

SpiClient::select not seeing changes from Spi::run in same transaction #983

Closed

eeeebbbbrrrr mentioned this pull request Jan 10, 2023

Update version to 0.7.0-beta.0 #1000

Merged

sumerman mentioned this pull request Feb 15, 2023

Extend UTF-8 detection in PGX init and test #1041

Merged
