TSBS Supplemental Guide: ClickHouse

ClickHouse is an open source, column-oriented database management system that can generate analytical reports in real time using SQL queries. This supplemental guide explains how the data generated for TSBS is stored, the additional flags available when using the data importer (tsbs_load_clickhouse), and the additional flags available for the query runner (tsbs_run_queries_clickhouse). This guide should be read after the main README.

Data format

Data generated by tsbs_generate_data for ClickHouse is serialized in a "pseudo-CSV" format, along with a custom header at the beginning. The header is several lines long:

  • one line composed of a comma-separated list of tag labels, with the literal string tags as the first value in the list
  • one or more lines composed of a comma-separated list of field labels, with the table name as the first value in the list
  • a blank line

An example for the cpu-only use case:

tags,hostname,region,datacenter,rack,os,arch,team,service,service_version,service_environment
cpu,usage_user,usage_system,usage_idle,usage_nice,usage_iowait,usage_irq,usage_softirq,usage_steal,usage_guest,usage_guest_nice
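
To make the header layout concrete, here is a minimal sketch of how the header could be read back. It is not the loader's actual code; the function name parse_header is hypothetical, and the sketch assumes only the three-part layout described above (a tags line, one or more field-label lines, then a blank line).

```python
def parse_header(lines):
    """Split the pseudo-CSV header into tag labels and per-table field labels.

    Hypothetical helper for illustration; not part of TSBS.
    """
    it = iter(lines)
    first = next(it).rstrip("\n").split(",")
    if first[0] != "tags":
        raise ValueError("header must begin with a line starting with 'tags'")
    tag_labels = first[1:]

    tables = {}
    for raw in it:
        line = raw.rstrip("\n")
        if line == "":
            break  # the blank line terminates the header
        name, *field_labels = line.split(",")
        tables[name] = field_labels
    return tag_labels, tables


header = [
    "tags,hostname,region,datacenter,rack,os,arch,team,service,service_version,service_environment",
    "cpu,usage_user,usage_system,usage_idle,usage_nice,usage_iowait,usage_irq,usage_softirq,usage_steal,usage_guest,usage_guest_nice",
    "",
]
tag_labels, tables = parse_header(header)
print(tag_labels)
print(tables)
```

The first value of each field-label line names the table, so the result maps each table name to its list of field labels.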

Following this, each reading is composed of two rows:

  1. a comma-separated list of tag values for the reading, with the literal string tags as the first value in the list
  2. a comma-separated list of field values for the reading, with the table the reading belongs to as the first value and the timestamp as the second value

An example for the cpu-only use case:

tags,host_0,eu-central-1,eu-central-1b,21,Ubuntu15.10,x86,SF,6,0,test
cpu,1451606400000000000,58.1317132304976170,2.6224297271376256,24.9969495069947882,61.5854484633778867,22.9481393231639395,63.6499207106198313,6.4098777048301052,44.8799140503027445,80.5028770761136201,38.2431182911542820
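
The two-row layout of a reading can be sketched as follows. This is an illustrative parser, not the loader's implementation; parse_reading is a hypothetical name, and interpreting the timestamp as nanoseconds since the Unix epoch is an assumption based on the magnitude of the example value.

```python
from datetime import datetime, timezone


def parse_reading(tags_row, fields_row):
    """Combine the tags row and the fields row of one reading.

    Hypothetical helper for illustration; not part of TSBS.
    """
    tag_parts = tags_row.split(",")
    if tag_parts[0] != "tags":
        raise ValueError("first row of a reading must start with 'tags'")

    # Fields row: table name, timestamp, then the field values.
    table, raw_ts, *raw_values = fields_row.split(",")
    ts_ns = int(raw_ts)  # assumption: nanoseconds since the Unix epoch
    return {
        "table": table,
        "tags": tag_parts[1:],
        "timestamp_ns": ts_ns,
        "time": datetime.fromtimestamp(ts_ns // 1_000_000_000, tz=timezone.utc),
        "values": [float(v) for v in raw_values],
    }


reading = parse_reading(
    "tags,host_0,eu-central-1,eu-central-1b,21,Ubuntu15.10,x86,SF,6,0,test",
    "cpu,1451606400000000000,58.1317132304976170,2.6224297271376256,"
    "24.9969495069947882,61.5854484633778867,22.9481393231639395,"
    "63.6499207106198313,6.4098777048301052,44.8799140503027445,"
    "80.5028770761136201,38.2431182911542820",
)
print(reading["table"], reading["time"].isoformat(), len(reading["values"]))
```

Tag values and field values line up positionally with the labels given in the header, so pairing them with the header's label lists recovers named columns.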

tsbs_load_clickhouse Additional Flags

-host (type: string, default: localhost)

Hostname of the ClickHouse server.

-user (type: string, default: default)

User to use to connect to the ClickHouse server. The default user really is named default.

-password (type: string, default: ``)

Password to use to connect to the ClickHouse server. The default password is empty.

Miscellaneous

-hash-workers (type: boolean, default: false)

Whether to consistently hash data across the multiple insert workers by the value of the primary (first) tag. For datasets with larger numbers of devices, this option helps improve data locality on disk which can lead to better query performance. For datasets with smaller numbers of devices, it is typically not necessary.
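
The idea behind this flag can be sketched as follows: every reading whose primary tag has the same value is routed to the same worker. This is only an illustration of consistent assignment by hashing; the choice of 32-bit FNV-1a and the names fnv1a_32 and worker_for are assumptions made for the example, not a description of the loader's internals.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (chosen here for illustration only)."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by FNV prime, keep 32 bits
    return h


def worker_for(primary_tag: str, num_workers: int) -> int:
    """Map a primary tag value (e.g. a hostname) to a worker index."""
    return fnv1a_32(primary_tag.encode("utf-8")) % num_workers


num_workers = 4
assignments = {host: worker_for(host, num_workers)
               for host in ("host_0", "host_1", "host_2", "host_3", "host_4")}
print(assignments)
```

Because the worker index depends only on the tag value, all rows for a given device pass through one worker, which is what gives the improved locality described above.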

-write-profile (type: string, default: none)

File to output periodic CPU and memory statistics. Useful for understanding system performance while writing data to the database.


tsbs_run_queries_clickhouse Additional Flags

-hosts (type: string, default: localhost)

Comma-separated list of hostnames for the ClickHouse servers. Workers are connected to a server in a round-robin fashion.
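
Round-robin assignment can be illustrated with a short sketch. The function name assign_hosts and the hostnames ch1, ch2, and ch3 are hypothetical; the query runner's actual implementation may differ in detail.

```python
def assign_hosts(hosts_flag: str, num_workers: int) -> list:
    """Return the host each worker would connect to, in worker order.

    Hypothetical helper for illustration; not part of TSBS.
    """
    hosts = [h.strip() for h in hosts_flag.split(",") if h.strip()]
    if not hosts:
        raise ValueError("at least one host is required")
    # Worker i takes host i modulo the number of hosts.
    return [hosts[i % len(hosts)] for i in range(num_workers)]


plan = assign_hosts("ch1,ch2,ch3", 5)
for worker, host in enumerate(plan):
    print(f"worker {worker} -> {host}")
```

With more workers than hosts, the assignment wraps around, so the load is spread as evenly as the worker count allows.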

-user (type: string, default: default)

User to use to connect to the ClickHouse server. The default user really is named default.

-password (type: string, default: ``)

Password to use to connect to the ClickHouse server. The default password is empty.


How to run a test: Ubuntu 16.04 LTS example

Install ClickHouse

Add the ClickHouse repository

sudo bash -c "echo 'deb http://repo.yandex.ru/clickhouse/deb/stable/ main/' > /etc/apt/sources.list.d/clickhouse.list"

Add the key and update the package list

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4    # optional
sudo apt-get update

Install binaries

sudo apt-get install -y clickhouse-client clickhouse-server

More details on getting started with ClickHouse are available in the ClickHouse documentation.

Ensure ClickHouse is running

sudo service clickhouse-server restart

Set up TSBS

Install golang

sudo apt install golang-1.9

Add the Go binaries to PATH for convenience and set the GOPATH environment variable

echo 'export PATH="$HOME/gocode/bin:/usr/lib/go-1.9/bin:$PATH"' >> ~/.bashrc
echo 'export GOPATH="$HOME/gocode"' >> ~/.bashrc

Apply PATH and GOPATH

source ~/.bashrc

Create initial Go folders

mkdir -p $GOPATH/{bin,src}

Get and build TSBS

go get github.com/timescale/tsbs
cd $GOPATH/src/github.com/timescale/tsbs/cmd
go get ./...
go install ./...

Run the test

cd $GOPATH/src/github.com/timescale/tsbs/scripts

Generate the test dataset. This may take some time.

FORMATS=clickhouse ./generate_data.sh

Generate the test query set. This should not take much time.

FORMATS=clickhouse ./generate_queries.sh

Load the dataset

./load_clickhouse.sh

Run the test query set. This example limits both the number of concurrent workers and the number of test queries to run. If you have powerful hardware, feel free to raise these limits.

NUM_WORKERS=1 MAX_QUERIES=10 ./run_queries_clickhouse.sh

The results are written to the /tmp/bulk_queries/result_queries_clickhouse* files.