Clickhouse Cluster Support #29

poundifdef · 2023-10-13T12:46:36Z

We want to be able to support cluster operations for creating tables.

When you create a database or table in ClickHouse, you have to specify an "ON CLUSTER" clause to make sure the command is run on every database server in the cluster. You will need to run clickhouse-keeper as part of your Docker setup. Keeper is ClickHouse's replacement for ZooKeeper, which allows different DB servers to coordinate.
When we run a CREATE DATABASE/TABLE or ALTER TABLE command in ClickHouse, we should be able to specify an "ON CLUSTER" clause. We should add a new configuration param on the User object, GetCluster() string, and use its value as the cluster name. If the user does not have a cluster assigned (the cluster is blank), we should omit the ON CLUSTER clause.
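
A minimal sketch of the optional clause, assuming a simple query builder. Only GetCluster() comes from the description above; the `staticUser` type, the `plainIdent` validation rule, and the function names are hypothetical and chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// User mirrors only the part of the User object named in this issue.
type User interface {
	GetCluster() string
}

// staticUser is a hypothetical implementation used for illustration.
type staticUser struct{ cluster string }

func (u staticUser) GetCluster() string { return u.cluster }

// plainIdent is an assumed rule: only simple identifiers are accepted, so the
// value can be placed in the query without quoting.
var plainIdent = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// onClusterClause returns " ON CLUSTER <name>", or "" when the user has no
// cluster assigned.
func onClusterClause(u User) (string, error) {
	cluster := u.GetCluster()
	if cluster == "" {
		return "", nil
	}
	if !plainIdent.MatchString(cluster) {
		return "", fmt.Errorf("invalid cluster name %q", cluster)
	}
	return " ON CLUSTER " + cluster, nil
}

// createDatabaseSQL builds a CREATE DATABASE statement for the given user.
func createDatabaseSQL(u User, db string) (string, error) {
	if !plainIdent.MatchString(db) {
		return "", fmt.Errorf("invalid database name %q", db)
	}
	clause, err := onClusterClause(u)
	if err != nil {
		return "", err
	}
	return "CREATE DATABASE IF NOT EXISTS " + db + clause, nil
}
```

With a user on cluster "A", createDatabaseSQL(staticUser{cluster: "A"}, "events") gives `CREATE DATABASE IF NOT EXISTS events ON CLUSTER A`; with a blank cluster the clause is dropped. The same clause would sit after the table name in CREATE TABLE and ALTER TABLE.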

amrullahfarook · 2023-10-14T13:43:07Z

Hey there @poundifdef! I came across Scratch DB and was interested in contributing, can I take up this issue?

poundifdef · 2023-10-16T14:24:14Z

Sure.

poundifdef · 2023-10-30T17:22:27Z

I think there need to be 2 data structures: one for users and one for servers.

The users data structure looks like this:

{
  "users": [
    {"api_key": "A","cluster": "A"},
    {"api_key": "B","cluster": "B"}
   ]
}

The servers structure would look like this:

{
  "servers": {
    "1.1.1.1": [
      {"cluster": "A", "shard": "A1"},
      {"cluster": "B", "shard": "B1"}
    ],
    "2.2.2.2": [
      {"cluster": "A", "shard": "A2"},
      {"cluster": "B", "shard": "B1"}
    ]
  }
}

Now users.cluster ties each user to their cluster, and the same cluster name appears in the servers structure.
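
As a rough sketch, the two structures could be loaded and joined like this. It assumes both the "users" and "servers" keys live in one JSON document, which the comment above does not specify; the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"sort"
)

// UserEntry is one element of the "users" array.
type UserEntry struct {
	APIKey  string `json:"api_key"`
	Cluster string `json:"cluster"`
}

// Placement says which shard of which cluster a server hosts.
type Placement struct {
	Cluster string `json:"cluster"`
	Shard   string `json:"shard"`
}

// Config assumes both structures are stored in a single document.
type Config struct {
	Users   []UserEntry            `json:"users"`
	Servers map[string][]Placement `json:"servers"`
}

// parseConfig decodes the JSON document into a Config.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// hostsForAPIKey returns the cluster assigned to an API key and the sorted
// list of servers hosting any shard of that cluster. It returns "" and nil
// when the key is unknown or has no cluster.
func hostsForAPIKey(cfg Config, apiKey string) (string, []string) {
	cluster := ""
	for _, u := range cfg.Users {
		if u.APIKey == apiKey {
			cluster = u.Cluster
			break
		}
	}
	if cluster == "" {
		return "", nil
	}
	var hosts []string
	for host, placements := range cfg.Servers {
		for _, p := range placements {
			if p.Cluster == cluster {
				hosts = append(hosts, host)
				break
			}
		}
	}
	sort.Strings(hosts)
	return cluster, hosts
}
```

For the sample data, API key "A" resolves to cluster "A" hosted on 1.1.1.1 and 2.2.2.2.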

The servers structure should translate directly to a ClickHouse cluster XML config. For example, the above would translate to:

<clickhouse>
    <remote_servers>
        <A>
            <shard>
                <!-- shard A1 -->
                <replica>
                    <host>1.1.1.1</host>
                </replica>
            </shard>
            <shard>
                <!-- shard A2 -->
                <replica>
                    <host>2.2.2.2</host>
                </replica>
            </shard>
        </A>
        <B>
            <shard>
                <!-- shard B1, 2 replicas -->
                <replica>
                    <host>1.1.1.1</host>
                </replica>
                <replica>
                    <host>2.2.2.2</host>
                </replica>
            </shard>
        </B>
    </remote_servers>
</clickhouse>

I would also be open to the inverse: having the config store data that mirrors the config ClickHouse requires, and translating from that to figure out which clusters are hosted on which servers.

At the end of this project, we should be able to do the following:

Ensure that we are performing database operations on the correct cluster depending on which API key was used
Write algorithms to choose which individual shard or replica to write data to
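
The selection algorithm is not decided here. As one possible starting point, this sketch hashes a routing key to pick a shard and rotates through that shard's replicas. Both strategies and all names (`Cluster`, `Shard`, `pickShard`, `pickReplica`) are assumptions for illustration.

```go
package main

import (
	"hash/fnv"
	"sync/atomic"
)

// Shard is a named shard and the hosts holding its replicas.
type Shard struct {
	Name     string
	Replicas []string
}

// Cluster holds the shards of one cluster plus a round-robin counter.
type Cluster struct {
	Shards []Shard
	next   atomic.Uint64
}

// pickShard maps a routing key (for example a table name) to a shard by
// hashing, so the same key always lands on the same shard while the shard
// list is unchanged. It returns nil for a cluster with no shards.
func (c *Cluster) pickShard(key string) *Shard {
	if len(c.Shards) == 0 {
		return nil
	}
	h := fnv.New32a()
	h.Write([]byte(key))
	return &c.Shards[h.Sum32()%uint32(len(c.Shards))]
}

// pickReplica rotates through the replicas of a shard. The counter is shared
// across the whole cluster, which is good enough to spread writes.
func (c *Cluster) pickReplica(s *Shard) string {
	if s == nil || len(s.Replicas) == 0 {
		return ""
	}
	n := c.next.Add(1) - 1
	return s.Replicas[n%uint64(len(s.Replicas))]
}
```

Plain modulo hashing remaps most keys when a shard is added or removed, so a real implementation might prefer consistent hashing or an explicit mapping.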

chumaumenze mentioned this issue Oct 27, 2023

Setup Clickhouse Cluster Support #37

Closed

poundifdef closed this as completed Mar 16, 2024
