fix: make DB Buffer use the up to date schema #25001

Merged: 2 commits into main on May 16, 2024
Conversation

@mgattozzi (Contributor)

Alternate Title: The DB Schema only ever has one table

This is a story of subtle bugs, gnashing of teeth, and hair pulling. Gather round as I tell you the tale of an Arc that pointed to an outdated schema.

In #24954 we introduced an Index for the database, as this will allow us to perform faster queries. When that code went in, this check came with it:

```rust
if !self.table_buffers.contains_key(&table_name) {
    // TODO: this check shouldn't be necessary. If the table doesn't exist in the catalog
    // and we've gotten here, it means we're dropping a write.
    if let Some(table) = self.db_schema.get_table(&table_name) {
        self.table_buffers.insert(
            table_name.clone(),
            TableBuffer::new(segment_key.clone(), &table.index_columns()),
        );
    } else {
        return;
    }
}
```

Adding the return there let us continue on with our day and make the tests pass. However, just because the tests passed didn't mean the code was correct, as I would soon find out. With follow-up ticket #24955 created, we merged the changes and I began to debug the issue.

Note that we assumed we were dropping a single write due to limits, because the limits test is what failed. What followed was a chase of a few days to prove that the limits weren't what was failing. It took a while, but the conclusion was that the limits weren't causing it. The investigation did, however, expose the fact that a database only ever had one table, which was weird.

I then began to dig into this a bit more. Why would there only be one table? We weren't just dropping one write; we were dropping all but one write, or so it seemed. Many printlns and hours later it became clear that we were actually updating the schema. It existed in the Catalog, but not behind the pointer to the schema in the DatabaseBuffer struct. So what gives?

Well, we need to look at [another piece of code](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L540-L541).

In the `validate_or_insert_schema_and_partitions` function for the WriteBuffer we have this bit of code:

```rust
// The (potentially updated) DatabaseSchema to return to the caller.
let mut schema = Cow::Borrowed(schema);
```
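
For anyone less familiar with `Cow`, here is a minimal standalone sketch (not from the PR; a plain string stands in for the schema) of the clone-on-write behavior in play: the first mutation turns the borrowed value into a private owned copy, and the original is never touched.

```rust
use std::borrow::Cow;

fn main() {
    let original = String::from("schema-v1");

    // Start out borrowing the caller's value, like `Cow::Borrowed(schema)` above.
    let mut schema: Cow<'_, str> = Cow::Borrowed(original.as_str());

    // The first mutation clones the borrowed value into an owned copy;
    // everything after this point edits that copy, never `original`.
    schema.to_mut().push_str("+new-table");

    assert_eq!(original, "schema-v1");          // the original is untouched
    assert_eq!(schema, "schema-v1+new-table");  // only the Cow's copy changed
    assert!(matches!(schema, Cow::Owned(_)));   // and it is now an owned clone
}
```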

We pass in a reference to the schema in the catalog. However, when we [go a bit further down](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L565-L568) we see this code:

```rust
    let schema = match schema {
        Cow::Owned(s) => Some(s),
        Cow::Borrowed(_) => None,
    };
```

What this means is that if we make a change, we clone the original and update the clone. We *aren't* making a change to the original schema. When we go back up the call stack we get to [this bit of code](https://github.com/influxdata/influxdb/blob/8f72bf06e13e2269db973b9c2852e092239c701e/influxdb3_write/src/write_buffer/mod.rs#L456-L460):

```rust
    if let Some(schema) = result.schema.take() {
        debug!("replacing schema for {:?}", schema);

        catalog.replace_database(sequence, Arc::new(schema))?;
    }
```

We are updating the catalog with the new schema, but how does that work?

```rust
        inner.databases.insert(db.name.clone(), db);
```

Oh. Oh no. We're just overwriting it. That means the DatabaseBuffer has an Arc to the *old* schema, not the *new* one, so the buffer gets the first copy of the schema containing the first new table, but *none* of the later ones. The solution is to make sure the buffer holds a pointer to the Catalog instead, so the DatabaseBuffer always sees the most up to date schema and only the Catalog handles the schema itself. This commit makes those changes.
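
To make the failure mode concrete, here is a small self-contained sketch (the `Schema` type and names are illustrative stand-ins, not the real InfluxDB types): once the buffer clones the `Arc` out of the map, overwriting the map entry with a new `Arc` does nothing to the clone the buffer is holding.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for DatabaseSchema: just a list of table names.
struct Schema {
    tables: Vec<String>,
}

fn main() {
    // The "catalog": database name -> current schema.
    let mut catalog: HashMap<String, Arc<Schema>> = HashMap::new();
    catalog.insert(
        "db".to_string(),
        Arc::new(Schema { tables: vec!["cpu".to_string()] }),
    );

    // The "buffer" clones the Arc once and keeps it.
    let buffered_schema = Arc::clone(&catalog["db"]);

    // Later, a new table is added by building a new schema and overwriting the
    // catalog entry, which is what `inner.databases.insert(...)` does above.
    catalog.insert(
        "db".to_string(),
        Arc::new(Schema { tables: vec!["cpu".to_string(), "mem".to_string()] }),
    );

    // The catalog sees both tables, but the buffer's Arc still points at the
    // old allocation and only ever sees the first one.
    assert_eq!(catalog["db"].tables.len(), 2);
    assert_eq!(buffered_schema.tables.len(), 1);
}
```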

This was a very, very subtle mutability/pointer bug sitting at the intersection of valid borrow checking and some writes making it in, but luckily we caught it. It does mean, though, that until this fix lands, changes between the Index PR and now should be considered subtly broken and shouldn't be used for anything beyond writing to a single table per DB.

TL;DR We should ask the Catalog what the schema is as it contains the up to date version of it.

Closes #24955

@mgattozzi (Contributor, Author)

Alternatively we can put the schema behind a RwLock and update it that way instead, but I figured this would also be fine.
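
For comparison, a rough sketch of that alternative (the types and names here are hypothetical, not the actual InfluxDB ones): the buffer would hold an `Arc<RwLock<Arc<Schema>>>`, the catalog would swap the inner `Arc` on updates, and every reader would see the latest schema at the cost of taking the read lock on each access.

```rust
use std::sync::{Arc, RwLock};

struct Schema {
    tables: Vec<String>,
}

// One shared, swappable schema handle: the catalog replaces the inner Arc on
// updates, and anyone holding the outer handle always reads the latest value.
type SharedSchema = Arc<RwLock<Arc<Schema>>>;

fn update(shared: &SharedSchema, new_schema: Schema) {
    *shared.write().unwrap() = Arc::new(new_schema);
}

fn current(shared: &SharedSchema) -> Arc<Schema> {
    Arc::clone(&shared.read().unwrap())
}

fn main() {
    let shared: SharedSchema = Arc::new(RwLock::new(Arc::new(Schema {
        tables: vec!["cpu".to_string()],
    })));

    // What the DatabaseBuffer would hold instead of a bare Arc<Schema>.
    let buffer_view = Arc::clone(&shared);

    update(&shared, Schema { tables: vec!["cpu".to_string(), "mem".to_string()] });

    // Unlike a plain Arc<Schema>, the buffer's handle observes the update.
    assert_eq!(current(&buffer_view).tables.len(), 2);
}
```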

@hiltontj (Contributor) left a comment

Eesh... that was a doozie, thanks for sticking it out. One suggestion in-line, but otherwise LGTM. I am sure @pauldix will want to review as well, though.

@pauldix (Member) left a comment

Ahhhh that's a tricky one. There's one concern I have with this fix. Now every buffer into a table will call into the catalog to grab the db schema. So that's grabbing a read lock and doing a hash table lookup to get the database.

Most of the time when users do writes, they write into many tables at the same time. So that means we'd do this hash lookup for the DB and grab the lock for every table that gets written into (this can be in the hundreds or thousands), while the schema gets updated a single time for this (and thus the catalog gets updated once).

Ideally, the `buffer_table_batch` function would take the `db_schema` as an argument; that way the `buffer_writes` function on `OpenBufferSegment` could look up the schema a single time and then pass it down to all the calls to `buffer_table_batch`.
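
A rough sketch of that shape, with heavily simplified stand-in types and signatures (the real `Catalog`, `buffer_writes`, and `buffer_table_batch` look different): the catalog is consulted once per database write, and the resulting schema handle is passed down to every table batch.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for the real catalog and schema types.
struct DatabaseSchema {}

struct Catalog {
    databases: HashMap<String, Arc<DatabaseSchema>>,
}

impl Catalog {
    // In the real code this would take a read lock; either way it is one lookup.
    fn db_schema(&self, db_name: &str) -> Option<Arc<DatabaseSchema>> {
        self.databases.get(db_name).cloned()
    }
}

struct TableBatch {}

fn buffer_writes(catalog: &Catalog, db_name: &str, batches: &HashMap<String, TableBatch>) {
    // One catalog lookup per database write...
    let Some(db_schema) = catalog.db_schema(db_name) else { return };

    // ...then the same schema handle is handed to every table batch, instead of
    // each batch taking the lock and doing its own hash lookup.
    for (table_name, batch) in batches {
        buffer_table_batch(&db_schema, table_name, batch);
    }
}

fn buffer_table_batch(_db_schema: &Arc<DatabaseSchema>, _table_name: &str, _batch: &TableBatch) {
    // Buffer the rows for this table using the schema's index columns, etc.
}

fn main() {
    let catalog = Catalog {
        databases: HashMap::from([("db".to_string(), Arc::new(DatabaseSchema {}))]),
    };
    let batches = HashMap::from([("cpu".to_string(), TableBatch {})]);
    buffer_writes(&catalog, "db", &batches);
}
```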

@mgattozzi (Contributor, Author)

@pauldix I updated it to look up the schema once per DB and pass it in so that each table can read it for that particular DB. I think that limits the number of times we take a lock on the catalog while still fixing the issue.

@pauldix (Member) left a comment

Looks good, thanks!

@mgattozzi merged commit 2381cc6 into main on May 16, 2024
12 checks passed
@mgattozzi deleted the mgattozzi/limits branch on May 16, 2024 at 15:08