Add replication support for ClickHouse state table + Fixes ListMigration query for ClickHouse #520
I've been coming back to this PR on/off, trying to think through the implementation from the perspective of a "general purpose" migration tool. On the surface, everything looks reasonable with the exception of changing the engine.
For context, another PR #530 came in requesting another engine.
Should the default engine be `MergeTree`, but allow this to be configurable? Maybe we could put our heads together on how to solve this so the tool can be useful for all or most ClickHouse users? cc @iamluc @arunk-s

I think PR #530 is on the right track in setting a custom engine, and afaics this would satisfy:

- `MergeTree()`
- `ReplicatedMergeTree()`
- `KeeperMap()`

Other questions: do the `ORDER BY` or `PARTITION BY` clauses need to be configurable? Hopefully not, so we can keep this as generic as possible.

Last thought: leaving `ENGINE = %s` to the user may be a footgun, and I wish we could do more upfront validation, but then we would have to account for all the possible ClickHouse engine types, and that seems like a lot of work to maintain.

Maybe there's a world where the user is allowed to supply a template for the table, as long as it contains:

- `version_id`
- `is_applied` (legacy field, deprecated)
- `tstamp`

Not sure I like this idea, since goose should own the table it creates to ensure the identifiers match up across all implementations.
Hi,
I had a quick look at the PR and it looks good to me (and could replace my own PR #530)
My only concern is about the engine. I'm not a ClickHouse expert and didn't know about the `KeeperMap` engine, but it seems it must be explicitly enabled (https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map), which may not be possible with hosted services.

I suggest adding the engine as a new option.
Having an option to toggle the underlying table engine can definitely work!

I think these settings are affected by the choice of engine, so yeah, they'll need to be configurable if a custom engine is allowed.

I can understand the concern that hosted services may not have `KeeperMap`, though I doubt that platforms like ClickHouse Cloud disallow creating tables with this engine.

Definitely, convention over configuration is generally desirable; however, in this particular case I'm afraid the abstractions will leak a bit in order to properly support the underlying database. But in any case, we need a sensible default and only a selected number of choices.
The library does quite well supporting different databases but that does come at a cost I'm afraid.
I'll be happy to lend a hand with ClickHouse specific topics though it can take a bit of time for me to catch up.
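As a rough illustration of "engine as a new option" with a sensible default, a functional-options pattern could carry the choice from the public API down to the ClickHouse layer. The option names (`WithEngine`, `WithOrderBy`) and the config struct are hypothetical, not goose's actual API; only the default of `MergeTree()` is taken from the discussion above.

```go
package main

import "fmt"

// clickhouseConfig is a hypothetical per-dialect configuration.
type clickhouseConfig struct {
	engine  string
	orderBy string
}

// Option mutates the configuration; callers only set what they need.
type Option func(*clickhouseConfig)

func WithEngine(engine string) Option {
	return func(c *clickhouseConfig) { c.engine = engine }
}

func WithOrderBy(expr string) Option {
	return func(c *clickhouseConfig) { c.orderBy = expr }
}

// newConfig applies options over sensible defaults, so the common case
// needs no configuration at all.
func newConfig(opts ...Option) clickhouseConfig {
	c := clickhouseConfig{engine: "MergeTree()", orderBy: "tstamp"}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(WithEngine("ReplicatedMergeTree()"))
	fmt.Printf("ENGINE = %s ORDER BY %s\n", c.engine, c.orderBy)
}
```

This keeps "convention over configuration" intact: users who say nothing get the default, and the leaky parts (`ORDER BY`, engine) stay confined to two optional knobs.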
I believe this is fixing a bug? (I think you mentioned this in the description)
The core goose library relies on a non-user-defined order for when migrations get applied; this is typically a sequential id in all other databases. It seems here we were using `version_id`, but this is incorrect. In lieu of a sequential id, we should be using time.
Yeah. There is no equivalent of an auto-incrementing ID in ClickHouse, and it doesn't provide any way to get a globally incremented unique ID across its nodes.

So I relied on timestamps. With just second precision it was non-deterministic, as migrations could have the same timestamp. With nanosecond precision that chance is almost non-existent, and the implementation further augments it with the use of the KeeperMap engine.
When I initially added a "store" and "querier" the idea was to have the core goose library only interact with the store, and have no idea about the underlying queries.
The store abstracts the underlying querier which holds the raw string queries for the different database technologies.
Not sure if this made it easier or harder to evolve, but we should probably think about whether there's a more elegant way to now thread options from the core library to those two internal layers.
Yeah, I struggled with this as well. I didn't find an easier way to hook up options all the way to the implementation-specific details. I tried to minimize the leaky abstraction(s), but I think it will have to be exposed one way or another.
If you have any suggestions, I can try those out.