[clickhouse] Fix DB initialisation on cluster nodes #7454
Conversation
Thanks for the find and fix @karencfv. Your solution looks to me like it should work, but I do wonder why we are wiping the databases from the client in the first place. Leaving that possibility open in production seems like a major faux pas. We still need to solve schema updates for ClickHouse, but wiping is just scary. I wonder if instead we can remove the wipes from the client altogether to fix this. I'd feel more comfortable with that.

Maybe we should wait for @bnaecker to come back from vacation and discuss.
I am 100% with you on this, yeah, I'll be on standby until @bnaecker returns.
If we remove the ability to wipe from the client somehow, how would the replicated tests run? Those spin up a single cluster, and continually initialize / wipe the DB to ensure a clean slate between them.
Can we not add separate wiping code only to the tests?
Probably! I didn't mean it wasn't possible, only that we need to make sure that still happens if we remove the actual public method calls for doing that. There might be other things that rely on wiping, but we should be able to find the callsites to those methods to answer that.
Sounds good! I think the objection was mainly having …
I think I'm missing some context here. The method … https://github.com/oxidecomputer/omicron/blob/main/oximeter/collector/src/agent.rs#L90-L120

This all predates the admin server work. We needed a way to initialize the database somewhere. Initially, that was just in … The second callsite is in the … https://github.com/oxidecomputer/omicron/blob/main/clickhouse-admin/src/context.rs#L427 which is in the … These two seem to be doing very similar things. Are they conflicting now? Should we be using only one of them? That is, should we still have … Related, the …
Initialising the database via clickhouse-admin was introduced here: #6903. I am unsure about the specifics as to why the database needs to be started by reconfigurator when it's a single node (@andrewjstone or @plotnick will have more insight into this). For a replicated cluster, the database needs to be initialised by reconfigurator because we add and remove nodes, and sadly ClickHouse does not copy over the schema when a new node is added to the cluster. With what you're saying, it appears we could have oximeter stop initialising the database and leave that all to clickhouse-admin? WDYT?

I considered using the …
The original reason for putting schema initialization in the admin server is the scary note immediately preceding … Since reconfigurator execution happens automatically and asynchronously from each nexus, we can't serialize initialization requests from the clients (nexus). So we force everyone to go to the admin server, and use an explicit mutex there to serialize requests.

FWIW, I do think that splitting initialization from wiping makes sense. I tried to use the machinery that was there for single-node, but the multi-node case is clearly more complex. I do not have strong opinions on exactly what that refactor should look like.
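For illustration, a sketch of the serialization pattern described here, with hypothetical names (`AdminContext`, `init_db`); the real clickhouse-admin context is linked above and differs in detail:

```rust
// Sketch of serializing initialization through the admin server: every
// Nexus-driven request funnels through one endpoint, and a tokio Mutex
// guarantees at most one initialization runs at a time.
use std::sync::Arc;
use tokio::sync::Mutex;

#[derive(Clone)]
pub struct AdminContext {
    init_lock: Arc<Mutex<()>>,
}

impl AdminContext {
    pub fn new() -> Self {
        Self { init_lock: Arc::new(Mutex::new(())) }
    }

    pub async fn init_db(&self) -> anyhow::Result<()> {
        // Concurrent callers queue here rather than racing each other.
        let _guard = self.init_lock.lock().await;
        // Check the current schema version and apply any DDL (elided).
        Ok(())
    }
}
```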
I'm not sure, but that seems reasonable. It's also possible we want to get rid of the …
Excellent! I've updated this PR with that change. Thankfully, the impact on the tests was minimal 😅

I agree that we should be clearer about whose responsibility it is to make sure the database exists and has the right schema. Your proposal of tentatively removing …
Yep, that's totally fine!
Thanks @bnaecker! Might I request a ✅ on this PR?
Ask and ye shall receive @karencfv!
        
          
oximeter/db/src/client/mod.rs (Outdated)
```diff
-// If we try to upgrade to a newer version, we'll drop old data.
+// If we try to upgrade to a newer version, we expect a failure when
+// re-initilaising the client.
```
Nit: spelling: `re-initialising`
```diff
     self.init_replicated_db().await?;
 }
-} else if version > expected_version {
+} else if version != expected_version {
```
No real action necessary, but this is part of why I think we probably want to move away from oximeter doing this work at all.

Suppose oximeter starts up, and is sitting at the loop waiting for the DB to be at its expected version. Meanwhile, we're separately upgrading the DB past oximeter's expected version. This condition might feasibly be true only temporarily: oximeter would then continue expecting the DB to be at its version, while the updater continues to move beyond it.

I'm not 100% sure what to do about this. We might need to commit to upgrading oximeter first, before applying any schema updates. We might also need some explicit way to release oximeter, so that it waits until the updater (the admin server, presumably) tells it to start operating.
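A sketch of the race being described, with hypothetical names and a placeholder version number (this is not the actual oximeter startup code):

```rust
// Illustration of the race: oximeter waits for the DB to report exactly
// its expected version. If an updater pushes the schema *past* that
// version between polls, the condition is never met again and this loop
// spins forever.
use std::time::Duration;

const EXPECTED_VERSION: u64 = 42; // placeholder value

async fn wait_for_schema(read_db_version: impl Fn() -> u64) {
    loop {
        if read_db_version() == EXPECTED_VERSION {
            return;
        }
        // A concurrent schema upgrade can land here, jumping beyond
        // EXPECTED_VERSION before the next poll.
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```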
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, yeah, those are really good points. I'll take this into account for #7488. Will work on that next, btw.
Sounds good, thanks @karencfv. Let me know if I can help or you need a rubber duck!
Thanks! Will do!
When running some manual tests with reconfigurator adding nodes to an existing cluster, I noticed that every time I added a new node, all of the other nodes would delete their existing `oximeter` database and create a new `oximeter` database with its tables from scratch. This is not the behaviour we want: the other nodes should continue as they are, and only the new node should have the database and schema initialised.

The unwanted behaviour when adding a new node can be seen in the following logs of a node that already existed and was part of the cluster. The wipe issues `DROP DATABASE IF EXISTS oximeter ON CLUSTER oximeter_cluster SYNC;` and deletes every `oximeter` database in the nodes.

To fix this issue, we now only wipe the database associated with the node we're targeting. Here are the logs after applying the fix: