Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpiTupleTable is unsound #1209

Open
david-monroe opened this issue Jul 13, 2023 · 5 comments
Open

SpiTupleTable is unsound #1209

david-monroe opened this issue Jul 13, 2023 · 5 comments

Comments

@david-monroe
Copy link

david-monroe commented Jul 13, 2023

While investigating memory reclamation of SpiTupleTable, I realized that its lifetime is not bound by the lifetime of the surrounding SPI connection. Therefore, SPI_finish frees the memory referenced by the SpiTupleTable without restricting the lifetime of the SpiTupleTable.

CREATE OR REPLACE FUNCTION three()
    RETURNS TEXT LANGUAGE plrust
AS
$$
    // create the cursor we actually care about
    let mut res = Spi::connect(|c| c.open_cursor("select 'hello' from generate_series(1, 10000)", None).fetch(10000).unwrap());

    // here we just perform some allocations to make sure that the previous cursor gets invalidated
    for i in 0..1000 {
        Spi::connect(|c| c.open_cursor("select 1", None).fetch(1).unwrap());
    }

    // later elements are probably more likely to point to deallocated memory
    for i in 0..1000 {
        res.next();
    }

    // segfault
    Ok(res.next().unwrap().get::<String>(1)?)
$$;

select three();
2023-07-13 17:37:11.654 UTC [51] LOG:  server process (PID 58) was terminated by signal 11: Segmentation fault
2023-07-13 17:37:11.654 UTC [51] DETAIL:  Failed process was running: select three()
2023-07-13 17:37:11.654 UTC [51] LOG:  terminating any other active server processes
2023-07-13 17:37:11.655 UTC [51] LOG:  all server processes terminated; reinitializing

While this example uses cursors, the same is possibly true for plain queries as well.

@eeeebbbbrrrr
Copy link
Contributor

Thanks. We'll look into this immediately.

@eeeebbbbrrrr eeeebbbbrrrr added the bug Something isn't working label Jul 13, 2023
eeeebbbbrrrr added a commit to eeeebbbbrrrr/pgrx that referenced this issue Jul 13, 2023
Cleanup lifetimes such that a `Spi::connect()` block can't returned things that should be borrowed from the lifetime of that SPI connection, which includes cursors (`SpiCursor`).
@eeeebbbbrrrr
Copy link
Contributor

eeeebbbbrrrr commented Jul 13, 2023

#1210 fixes this such that the above code won't compile anymore, which I think is the correct answer here.

My pgrx-based interpretation of what this code should look like is:

#[pg_extern]
fn issue1209() -> Result<Option<String>, Box<dyn std::error::Error>> {
    let res = Spi::connect(|c| {
        let mut cursor = c.open_cursor("SELECT 'hello' FROM generate_series(1, 10000)", None);
        let table = cursor.fetch(10000)?;
        table.into_iter().map(|row| row.get(1)).collect::<Result<Vec<_>, _>>()
    })?;

    Ok(res.first().cloned().flatten())
}

The function body ought to be paste-able into a plrust function.

eeeebbbbrrrr added a commit that referenced this issue Jul 13, 2023
* Fix issue #1209

Cleanup lifetimes such that a `Spi::connect()` block can't return things that should be borrowed from the lifetime of that SPI connection, which includes cursors (`SpiCursor`).

Specifically, we were able to return a `SpiTupleTable` from within the `Spi::connect(|c| ...)` closure, which is strictly wrong.  We were also able to return `PreparedStatement`s, which is also wrong.

We also ensure that datums are `FromDatum`'d in the proper memory context when retrieving from an `SpiHeapTupleEntry`.
@workingjubilee
Copy link
Member

workingjubilee commented Jul 14, 2023

We added some positive tests in #1210, but to be completely sure, we want a negative test that confirms compile-fail on the now-banned usage. This will require setting up ui_test or something similar.

eeeebbbbrrrr added a commit that referenced this issue Jul 14, 2023
* Fix issue #1209

Cleanup lifetimes such that a `Spi::connect()` block can't return things that should be borrowed from the lifetime of that SPI connection, which includes cursors (`SpiCursor`).

Specifically, we were able to return a `SpiTupleTable` from within the `Spi::connect(|c| ...)` closure, which is strictly wrong.  We were also able to return `PreparedStatement`s, which is also wrong.

We also ensure that datums are `FromDatum`'d in the proper memory context when retrieving from an `SpiHeapTupleEntry`.
@eeeebbbbrrrr
Copy link
Contributor

Hi @david-monroe! We've released pgrx v0.9.8 with the fix for this, and I'm wrapping up a v0.10.0-beta.1 as well.

Tomorrow (Friday July 14, 2023) we'll put out a new PL/Rust with this fix. I need some time to incorporate some unit tests over there first.

Thanks for the report and the re-creates. Please let us know about any other soundness and/or usability issues you run across.

@david-monroe
Copy link
Author

david-monroe commented Jul 14, 2023

I'm surprised that you think this issue is closed. Yes, the approach above no longer works, but it seems that you have not thoroughly addressed the lifetime issues.

To issues that came to mind:

  1. By allocating into the parent context, you're now leaking all &str, &[u8] until the end of the transaction. It's no longer possible to use plrust for large datasets containing such types.
CREATE OR REPLACE FUNCTION four()
    RETURNS TEXT LANGUAGE plrust
AS
$$
    let tmp: &'static str = Spi::connect(|c| {
        c.select("select repeat('x', 1024 * 1024)", None, None).unwrap().next().unwrap().get(1).unwrap().unwrap()
    });
    Ok(None)
$$;

begin;

select four(); -- every call to four increases memory usage by 1MB.
  1. It's still possible to corrupt memory by using the incorrect FromDatum implementation:
CREATE OR REPLACE FUNCTION three()
    RETURNS TEXT LANGUAGE plrust
AS
$$
    use std::cell::RefCell;
    thread_local! {
        static RES: RefCell<Option<&'static str>> = RefCell::new(None);
    }

    RES.with(|res| {
        let mut res = res.borrow_mut();
        if res.is_none() {
            let tmp: &'static str = Spi::connect(|c| {
                c.select("select repeat('x', 1024 * 1024)", None, None).unwrap().next().unwrap().get(1).unwrap().unwrap()
            });
            *res = Some(tmp);
        }
        Ok(Some(res.unwrap().to_string()))
    })
$$;

select three();
2023-07-14 09:15:27.875 UTC [51] LOG:  server process (PID 3961) was terminated by signal 11: Segmentation fault
2023-07-14 09:15:27.875 UTC [51] DETAIL:  Failed process was running: select three()

This is a bit hard to reproduce. But it's clear from the code what is going wrong.


I think a thorough solution to this would also ensure that SpiTupleTable properly deallocates memory. If SpiTupleTable properly managed its memory and extracted datums were bound by the lifetime of the SpiTupleTable, then you would not have to allocate into the parent context and neither would you be able to ever construct a &'static str.

eeeebbbbrrrr added a commit to eeeebbbbrrrr/pgrx that referenced this issue Jul 14, 2023
The intent of SPI has been that it use whatever `CurrentMemoryContext` is prior to calling `Spi::connect()` to allocate Datums.  This allows the `Spi::connect()` closure to return type instances that might be backed by Postgres-allocated memory (`&str` and `&[u8]`, specifically).

While operating in Spi-land, we're inside a short-lived "SPI Proc" memory context -- we enter it upon `SPI_connect()` and it's free'd at `SPI_finish()`.  Unfortunately, an incorrect assumption was made that this memory context's parent is `CurrentMemoryContext` at the time `SPI_connect()` is called.  This is not the case.

Postgres attaches this "SPI Proc" context to `TopTransactionContext` instead.  As such, pgrx' use of `PgMemoryContexts::CurrentMemoryContext.parent()` to find the context for datum allocation is the wrong one.

This PR adjusts the Spi wrapper internals such that the `SpiClient` holds onto a `PgMemoryContext` and that it is set to `CurrentMemoryContext` **before** we call `SPI_connect().`

This resolves the first complaint from pgcentralfoundation#1209 (comment) that Spi is leaky.
eeeebbbbrrrr added a commit that referenced this issue Sep 5, 2023
This is the final release of v0.10.0. Thanks everyone for the beta
testing, pull requests, issues, and patience.

As always, install `cargo-pgrx` with `cargo install cargo-pgrx --locked`
and update your extension Cargo.toml files to use the `0.10.0` pgrx
dependencies.

This release includes support for Postgres 16RC1. Support for the
previous betas has been removed. As such, a fresh `cargo pgrx init` is
required.

## What's Changed Since v0.10.0-beta.4

* Fix `GetMemoryChunkContext` port by @workingjubilee in
#1273
* Better error messages when `pg_config` isn't found. by @eeeebbbbrrrr
in #1271
* Make `PostgresHash` also need `Eq` by @workingjubilee in
#1264
* Memoize git hash and extension metadata by @levkk in
#1274
* move to pg16rc1 by @eeeebbbbrrrr in
#1276
* Fix bgworker template up to 0.10.0-beta.4 by @workingjubilee in
#1270

## New Contributors
* @levkk made their first contribution in
#1274

**Changelog**:
v0.10.0-beta.4...v0.10.0

---

v0.10.0's full set of changes throughout the entire beta period are:

* Postgres 16beta1 Support by @eeeebbbbrrrr in
#1169
* Support building against macOS universal binaries by @clowder in
#1166
* list specific versions in feature gates by @eeeebbbbrrrr in
#1175
* Fix bug with converting a `pg_sys::Datum` into a `pgrx::Date` by
@eeeebbbbrrrr in #1177
* Fix Arrays with leading nulls by @eeeebbbbrrrr in
#1180
* Disable hello_versioned_so test by @workingjubilee in
#1192
* doc: fix link broken by @yihong0618 in
#1181
* fcinfo: fix incorrect length set in unsafe code by @Sasasu in
#1190
* update to pg16beta2 support by @eeeebbbbrrrr in
#1188
* Array-walking is aligned by @workingjubilee in
#1191
* Implement PGRXSharedMemory for Deque by @feikesteenbergen in
#1170
* Include security labels header by @daamien in
#1189
* Fixes macos-11 tests by @BradyBonnette in
#1197
* Pgcentralfoundation updates again by @eeeebbbbrrrr in
#1200
* Update version to 0.10.0-beta.0 by @eeeebbbbrrrr in
#1201
* Testing help by @eeeebbbbrrrr in
#1203
* Type testability cleanup by @eeeebbbbrrrr in
#1204
* Try to smartly propagate fs errors by @workingjubilee in
#1186
* Fix issue #1209 by @eeeebbbbrrrr in
#1210
* Type roundtrip tests by @eeeebbbbrrrr in
#1185
* Update version to 0.10.0-beta.1 by @eeeebbbbrrrr in
#1213
* Add a workaround for the pg16/homebrew/icu4c situation by @thomcc in
#1206
* Add security policy by @johnrballard in
#1207
* `AnyNumeric` is no longer backed by Postgres-allocated memory by
@eeeebbbbrrrr in #1216
* Modularize pgrx::spi by @workingjubilee in
#1219
* Stop SpiClient soundness from regressing by @workingjubilee in
#1214
* Add foreign table headers by @workingjubilee in
#1226
* Modularize the interior of pgrx-pg-sys by @workingjubilee in
#1227
* Initial valgrind support by @thomcc in
#1218
* Add support for handling SIGINT and SIGCHLD from bgworker by @JelteF
in #1229
* Ignores UI tests for MUSL environments by @BradyBonnette in
#1235
* Add a env flag that can be set to skip `#[pg_test]`-generated tests.
by @thomcc in #1239
* Fix issue #1076: Properly handle dependency graph of `Result<T, _>` by
@eeeebbbbrrrr in #1241
* Cleanup the error when cargo-pgrx version doesn't match Cargo.toml by
@eeeebbbbrrrr in #1240
* Add operator and cache related api by @VoVAllen in
#1242
* Addresses cargo-pgrx error reporting by @BradyBonnette in
#1238
* Update version to 0.10.0-beta.2 by @eeeebbbbrrrr in
#1244
* Bump cargo-metadata and clap-cargo by @thomcc in
#1246
* Derive Clone for Inet by @JelteF in
#1251
* Correct docs for datetime `From` impls by @workingjubilee in
#1253
* Only enable line tables for profile.dev by @thomcc in
#1249
* Remove references to master branch by @thomcc in
#1243
* Ensure bindgen gets all the `cppflags` it needs (on macOS, anyway) by
@thomcc in #1247
* update for pg16beta3 support by @eeeebbbbrrrr in
#1254
* Update version to 0.10.0-beta.3 by @eeeebbbbrrrr in
#1255
* Add proptest support by @workingjubilee in
#1258
* Misc reformatting and typo fixes by @workingjubilee in
#1260
* spi: simplify (optimize?) Datum preparation by @vrmiguel in
#1256
* Assume commutation when deriving PostgresEq by @workingjubilee in
#1261
* Demand Ord for PostgresOrd by @workingjubilee in
#1262
* Fix pgrx install causing postgresql coredump by @Sasasu in
#1263
* Update version to 0.10.0-beta.4 by @workingjubilee in
#1267

## New Contributors
* @clowder made their first contribution in
#1166
* @yihong0618 made their first contribution in
#1181
* @Sasasu made their first contribution in
#1190
* @daamien made their first contribution in
#1189
* @johnrballard made their first contribution in
#1207
* @VoVAllen made their first contribution in
#1242
* @vrmiguel made their first contribution in
#1256


**Full Changelog**:
v0.9.8...v0.10.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants