fix: make DB Buffer use the up to date schema #25001

Merged: 2 commits into main on May 16, 2024
Conversation

@mgattozzi (Contributor)

Alternate Title: The DB Schema only ever has one table

This is a story of subtle bugs, gnashing of teeth, and hair pulling. Gather round as I tell you the tale of an Arc that pointed to an outdated schema.

In #24954 we introduced an Index for the database, as this will allow us to perform faster queries. When that code went in, this check came with it:

```rust
if !self.table_buffers.contains_key(&table_name) {
    // TODO: this check shouldn't be necessary. If the table doesn't exist in the catalog
    // and we've gotten here, it means we're dropping a write.
    if let Some(table) = self.db_schema.get_table(&table_name) {
        self.table_buffers.insert(
            table_name.clone(),
            TableBuffer::new(segment_key.clone(), &table.index_columns()),
        );
    } else {
        return;
    }
}
```

Adding the return there let us continue on with our day and make the tests pass. However, just because the tests passed didn't mean the code was correct, as I would soon find out. With follow-up ticket #24955 created, we merged the changes and I began to debug the issue.

Note that we assumed we were dropping a single write due to limits, because the limits test is what failed. What followed was a chase of a few days to prove that the limits weren't what was failing. It took a while, but the conclusion was that the limits weren't causing it. The investigation did, however, expose the fact that a database only ever had one table, which was weird.

I then began to dig into this a bit more. Why would there only be one table? We weren't just dropping one write; we were dropping all but one write, or so it seemed. Many printlns and hours later it became clear that we were actually updating the schema. It existed in the Catalog, but not behind the pointer to the schema in the DatabaseBuffer struct. So what gives?

Well, we need to look at [another piece of code](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L540-L541).

In the `validate_or_insert_schema_and_partitions` function for the WriteBuffer we have this bit of code:

```rust
// The (potentially updated) DatabaseSchema to return to the caller.
let mut schema = Cow::Borrowed(schema);
```
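
For anyone less familiar with `Cow`, here is a minimal standalone sketch (not from the PR; a plain string stands in for the schema) of the clone-on-write behavior in play: the first mutation turns the borrowed value into a private owned copy, and the original is never touched.

```rust
use std::borrow::Cow;

fn main() {
    let original = String::from("schema-v1");

    // Start out borrowing the caller's value, like `Cow::Borrowed(schema)` above.
    let mut schema: Cow<'_, str> = Cow::Borrowed(original.as_str());

    // The first mutation clones the borrowed value into an owned copy;
    // everything after this point edits that copy, never `original`.
    schema.to_mut().push_str("+new-table");

    assert_eq!(original, "schema-v1");          // the original is untouched
    assert_eq!(schema, "schema-v1+new-table");  // only the Cow's copy changed
    assert!(matches!(schema, Cow::Owned(_)));   // and it is now an owned clone
}
```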

We pass in a reference to the schema in the catalog. However, when we [go a bit further down](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L565-L568) we see this code:

```rust
    let schema = match schema {
        Cow::Owned(s) => Some(s),
        Cow::Borrowed(_) => None,
    };
```

What this means is that if we make a change, we clone the original and update the clone. We *aren't* making a change to the original schema. When we go back up the call stack we get to [this bit of code](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L456-L460):

```rust
    if let Some(schema) = result.schema.take() {
        debug!("replacing schema for {:?}", schema);

        catalog.replace_database(sequence, Arc::new(schema))?;
    }
```

We are updating the catalog with the new schema, but how does that work?

```rust
        inner.databases.insert(db.name.clone(), db);
```

Oh. Oh no. We're just overwriting it. That means the DatabaseBuffer has an Arc to the *old* schema, not the *new* one, so the buffer gets the first copy of the schema containing the first new table, but *none* of the later ones. The solution is to make sure the buffer holds a pointer to the Catalog instead, so the DatabaseBuffer always sees the most up to date schema and only the Catalog handles the schema itself. This commit makes those changes.
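
To make the failure mode concrete, here is a small self-contained sketch (the `Schema` type and names are illustrative stand-ins, not the real InfluxDB types): once the buffer clones the `Arc` out of the map, overwriting the map entry with a new `Arc` does nothing to the clone the buffer is holding.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for DatabaseSchema: just a list of table names.
struct Schema {
    tables: Vec<String>,
}

fn main() {
    // The "catalog": database name -> current schema.
    let mut catalog: HashMap<String, Arc<Schema>> = HashMap::new();
    catalog.insert(
        "db".to_string(),
        Arc::new(Schema { tables: vec!["cpu".to_string()] }),
    );

    // The "buffer" clones the Arc once and keeps it.
    let buffered_schema = Arc::clone(&catalog["db"]);

    // Later, a new table is added by building a new schema and overwriting the
    // catalog entry, which is what `inner.databases.insert(...)` does above.
    catalog.insert(
        "db".to_string(),
        Arc::new(Schema { tables: vec!["cpu".to_string(), "mem".to_string()] }),
    );

    // The catalog sees both tables, but the buffer's Arc still points at the
    // old allocation and only ever sees the first one.
    assert_eq!(catalog["db"].tables.len(), 2);
    assert_eq!(buffered_schema.tables.len(), 1);
}
```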

This was a very, very subtle mutability/pointer bug sitting at the intersection of valid borrow checking and some writes making it in, but luckily we caught it. It does mean, though, that until this fix lands, changes between the Index PR and now should be considered subtly broken and shouldn't be used for anything beyond writing to a single table per DB.

TL;DR We should ask the Catalog what the schema is as it contains the up to date version of it.

Closes #24955

@mgattozzi (Contributor, Author)

Alternatively we can put the schema behind a RwLock and update it that way instead, but I figured this would also be fine.
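
For comparison, a rough sketch of that alternative (the types and names here are hypothetical, not the actual InfluxDB ones): the buffer would hold an `Arc<RwLock<Arc<Schema>>>`, the catalog would swap the inner `Arc` on updates, and every reader would see the latest schema at the cost of taking the read lock on each access.

```rust
use std::sync::{Arc, RwLock};

struct Schema {
    tables: Vec<String>,
}

// One shared, swappable schema handle: the catalog replaces the inner Arc on
// updates, and anyone holding the outer handle always reads the latest value.
type SharedSchema = Arc<RwLock<Arc<Schema>>>;

fn update(shared: &SharedSchema, new_schema: Schema) {
    *shared.write().unwrap() = Arc::new(new_schema);
}

fn current(shared: &SharedSchema) -> Arc<Schema> {
    Arc::clone(&shared.read().unwrap())
}

fn main() {
    let shared: SharedSchema = Arc::new(RwLock::new(Arc::new(Schema {
        tables: vec!["cpu".to_string()],
    })));

    // What the DatabaseBuffer would hold instead of a bare Arc<Schema>.
    let buffer_view = Arc::clone(&shared);

    update(&shared, Schema { tables: vec!["cpu".to_string(), "mem".to_string()] });

    // Unlike a plain Arc<Schema>, the buffer's handle observes the update.
    assert_eq!(current(&buffer_view).tables.len(), 2);
}
```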

@hiltontj (Contributor) left a comment

Eesh... that was a doozie, thanks for sticking it out. One suggestion in-line, but otherwise LGTM. I am sure @pauldix will want to review as well, though.

@pauldix (Member) left a comment

Ahhhh that's a tricky one. There's one concern I have with this fix. Now every buffer into a table will call into the catalog to grab the db schema. So that's grabbing a read lock and doing a hash table lookup to get the database.

Most of the time when users do writes, they write into many tables at the same time. So that means we'd do this hash lookup for the DB and grab the lock for every table that gets written into (this can be in the hundreds or thousands), while the schema gets updated a single time for this (and thus the catalog gets updated once).

Ideally, the `buffer_table_batch` function would take the `db_schema` as an argument; that way the `buffer_writes` function on `OpenBufferSegment` could look up the schema a single time and then pass it down to all the calls to `buffer_table_batch`.
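
A rough sketch of that shape, with heavily simplified stand-in types and signatures (the real `Catalog`, `buffer_writes`, and `buffer_table_batch` look different): the catalog is consulted once per database write, and the resulting schema handle is passed down to every table batch.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for the real catalog and schema types.
struct DatabaseSchema {}

struct Catalog {
    databases: HashMap<String, Arc<DatabaseSchema>>,
}

impl Catalog {
    // In the real code this would take a read lock; either way it is one lookup.
    fn db_schema(&self, db_name: &str) -> Option<Arc<DatabaseSchema>> {
        self.databases.get(db_name).cloned()
    }
}

struct TableBatch {}

fn buffer_writes(catalog: &Catalog, db_name: &str, batches: &HashMap<String, TableBatch>) {
    // One catalog lookup per database write...
    let Some(db_schema) = catalog.db_schema(db_name) else { return };

    // ...then the same schema handle is handed to every table batch, instead of
    // each batch taking the lock and doing its own hash lookup.
    for (table_name, batch) in batches {
        buffer_table_batch(&db_schema, table_name, batch);
    }
}

fn buffer_table_batch(_db_schema: &Arc<DatabaseSchema>, _table_name: &str, _batch: &TableBatch) {
    // Buffer the rows for this table using the schema's index columns, etc.
}

fn main() {
    let catalog = Catalog {
        databases: HashMap::from([("db".to_string(), Arc::new(DatabaseSchema {}))]),
    };
    let batches = HashMap::from([("cpu".to_string(), TableBatch {})]);
    buffer_writes(&catalog, "db", &batches);
}
```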

@mgattozzi (Contributor, Author)

@pauldix I updated it to look up the schema once per DB and pass it in so that each table can read it for that particular DB. I think that limits the number of times we take a lock on the catalog while still fixing the issue.

@pauldix (Member) left a comment

Looks good, thanks!

@mgattozzi merged commit 2381cc6 into main on May 16, 2024
12 checks passed
@mgattozzi deleted the mgattozzi/limits branch on May 16, 2024 at 15:08