Skip to content
Modern Ruby database driver for ClickHouse
Ruby Other
  1. Ruby 99.3%
  2. Other 0.7%
Branch: master
Clone or download

Latest commit

shlima Add CHANGELOG
Signed-off-by: Aliaksandr Shylau <alex.shilov.by@gmail.com>
Latest commit e3f8b08 Mar 26, 2020

Files

Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
bin Release 1.1.0 Nov 10, 2019
doc Add logo Nov 9, 2019
lib Release 1.2.6 Mar 26, 2020
log Add Connection Nov 8, 2019
spec Datetime64 field type support (#3) Mar 26, 2020
tmp Add Connection Nov 8, 2019
.gitignore Update Girignore Nov 9, 2019
.rspec initial commit Nov 8, 2019
.rubocop.yml Gemfile versions bump Dec 9, 2019
.travis.yml Add pretty logging Nov 9, 2019
CHANGELOG.md Add CHANGELOG Mar 26, 2020
Gemfile initial commit Nov 8, 2019
Gemfile.lock
LICENCE.txt
Makefile
README.md Update Readme Nov 11, 2019
Rakefile initial commit Nov 8, 2019
click_house.gemspec Add Connection Nov 8, 2019
docker-compose.yml Add Connection Nov 8, 2019

README.md

ClickHouse Ruby driver

pipeline status Code Climate Gem Version

# Requires modern Ruby (>= 2.5), tested with Yandex.Clickhouse v 19.16.2.2
gem install click_house

A modern Ruby database driver for ClickHouse. ClickHouse is a high-performance column-oriented database management system developed by Yandex which operates Russia's most popular search engine.

This development was inspired by currently unmaintainable alternative but rewritten and well tested

Why use the HTTP interface and not the TCP interface?

Well, the developers of ClickHouse themselves discourage using the TCP interface.

TCP transport is more specific, we don't want to expose details. Despite we have full compatibility of protocol of different versions of client and server, we want to keep the ability to "break" it for very old clients. And that protocol is not too clean to make a specification.

Yandex uses HTTP interface for working from Java and Perl, Python and Go as well as shell scripts.

TOC

Configuration

ClickHouse.config do |config|
  config.logger = Logger.new(STDOUT)
  config.adapter = :net_http
  config.database = 'metrics'
  config.url = 'http://localhost:8123'
  config.timeout = 60
  config.open_timeout = 3
  config.ssl_verify = false
  
  # or provide connection options separately
  config.scheme = 'http' 
  config.host = 'localhost' 
  config.port = 'port' 
  
  # if you use HTTP basic Auth
  config.username = 'user' 
  config.password = 'password' 
end

Alternative, you can assign configuration parameters via a hash

ClickHouse.config.assign(logger: Logger.new(STDOUT))

Now you are able to communicate with ClickHouse:

ClickHouse.connection.ping #=> true

You can easily build a new raw connection and override any configuration parameter (such as database name, connection address)

@connection = ClickHouse::Connection.new(ClickHouse::Config.new(logger: Rails.logger))
@connection.ping

Usage

ClickHouse.connection.ping #=> true
ClickHouse.connection.replicas_status #=> true

ClickHouse.connection.databases #=> ["default", "system"]
ClickHouse.connection.create_database('metrics', if_not_exists: true, engine: nil, cluster: nil)
ClickHouse.connection.drop_database('metrics', if_exists: true, cluster: nil)

ClickHouse.connection.tables #=> ["visits"]
ClickHouse.connection.describe_table('visits') #=> [{"name"=>"id", "type"=>"FixedString(16)", "default_type"=>""}]
ClickHouse.connection.table_exists?('visits', temporary: nil) #=> true
ClickHouse.connection.drop_table('visits', if_exists: true, temporary: nil, cluster: nil)
ClickHouse.connection.create_table(*) # see <Create a table> section
ClickHouse.connection.truncate_table('name', if_exists: true, cluster: nil)
ClickHouse.connection.truncate_tables(['table_1', 'table_2'], if_exists: true, cluster: nil)
ClickHouse.connection.truncate_tables # will truncate all tables in database
ClickHouse.connection.rename_table('old_name', 'new_name', cluster: nil)
ClickHouse.connection.rename_table(%w[table_1 table_2], %w[new_1 new_2], cluster: nil)

Queries

Select All

Select all type-casted result set

@result = ClickHouse.connection.select_all('SELECT * FROM visits')

# all enumerable methods are delegated like #each, #map, #select etc
@result.to_a #=> [{"date"=>#<Date: 2000-01-01>, "id"=>1}]

# you can access raw data
@result.meta #=> [{"name"=>"date", "type"=>"Date"}, {"name"=>"id", "type"=>"UInt32"}] 
@result.data #=> [{"date"=>"2000-01-01", "id"=>1}, {"date"=>"2000-01-02", "id"=>2}] 
@result.statistics #=> {"elapsed"=>0.0002271, "rows_read"=>2, "bytes_read"=>12}

Select Value

Select value returns exactly one type-casted value

ClickHouse.connection.select_value('SELECT COUNT(*) from visits') #=> 0
ClickHouse.connection.select_value("SELECT toDate('2019-01-01')") #=> #<Date: 2019-01-01>
ClickHouse.connection.select_value("SELECT toDateOrZero(NULL)") #=> nil

Select One

Returns a record hash with the column names as keys and column values as values.

ClickHouse.connection.select_one('SELECT date, SUM(id) AS sum FROM visits GROUP BY date')
#=> {"date"=>#<Date: 2000-01-01>, "sum"=>1}

Execute Raw SQL

By default, gem provides parser for JSON and CSV response formats. Type conversion available for the JSON.

# format not specified
response = ClickHouse.connection.execute <<~SQL
  SELECT count(*) AS counter FROM rspec
SQL

response.body #=> "2\n"

# JSON
response = ClickHouse.connection.execute <<~SQL
  SELECT count(*) AS counter FROM rspec FORMAT JSON
SQL

response.body #=> {"meta"=>[{"name"=>"counter", "type"=>"UInt64"}], "data"=>[{"counter"=>"2"}], "rows"=>1, "statistics"=>{"elapsed"=>0.0002412, "rows_read"=>2, "bytes_read"=>4}}

# CSV
response = ClickHouse.connection.execute <<~SQL
  SELECT count(*) AS counter FROM rspec FORMAT CSV
SQL

response.body #=> [["2"]]

# You may use any format supported by ClickHouse
response = ClickHouse.connection.execute <<~SQL
  SELECT count(*) AS counter FROM rspec FORMAT RowBinary
SQL

response.body #=> "\u0002\u0000\u0000\u0000\u0000\u0000\u0000\u0000" 

Insert

When column names and values are transferred separately

ClickHouse.connection.insert('table', columns: %i[id name]) do |buffer|
  buffer << [1, 'Mercury']
  buffer << [2, 'Venus']
end 

ClickHouse.connection.insert('table', columns: %i[id name], values: [[1, 'Mercury'], [2, 'Venus']])
#=> true

When rows are passed as a hash

ClickHouse.connection.insert('table', values: [{ name: 'Sun', id: 1 }, { name: 'Moon', id: 2 }])

ClickHouse.connection.insert('table') do |buffer|
  buffer << { name: 'Sun', id: 1 }
  buffer << { name: 'Moon', id: 2 }
end
#=> true

Create a table

Create table using DSL

ClickHouse.connection.create_table('visits', if_not_exists: true, engine: 'MergeTree(date, (year, date), 8192)') do |t|
  t.FixedString :id, 16
  t.UInt16      :year
  t.Date        :date
  t.DateTime    :time, 'UTC'
  t.Decimal     :money, 5, 4
  t.String      :event
  t.UInt32      :user_id
  t.UInt32      :Float32
end

Create nullable columns

ClickHouse.connection.create_table('visits', engine: 'TinyLog') do |t|
  t.UInt16 :id, 16, nullable: true
end

Set column options

ClickHouse.connection.create_table('visits', engine: 'MergeTree(date, (year, date), 8192)') do |t|
  t.UInt16  :year
  t.Date    :date
  t.UInt16  :id, 16, default: 0, ttl: 'date + INTERVAL 1 DAY'
end

Define column with custom SQL

ClickHouse.connection.create_table('visits', engine: 'TinyLog') do |t|
  t << "vendor Enum('microsoft' = 1, 'apple' = 2)"
  t << "tags Array(String)"
end

Define nested structures

ClickHouse.connection.create_table('visits', engine: 'TinyLog') do |t|
  t.UInt8 :id
  t.Nested :json do |n|
    n.UInt8 :cid
    n.Date  :created_at
    n.Date  :updated_at
  end
end

Set table options

ClickHouse.connection.create_table('visits', 
  order: 'year',
  ttl: 'date + INTERVAL 1 DAY',
  sample: 'year',
  settings: 'index_granularity=8192',
  primary_key: 'year',
  engine: 'MergeTree') do |t|
  t.UInt16  :year
  t.Date    :date
end

Create table with raw SQL

ClickHouse.connection.execute <<~SQL
  CREATE TABLE visits(int Nullable(Int8), date Nullable(Date)) ENGINE TinyLog
SQL

Alter table

Alter table with DSL

ClickHouse.connection.add_column('table', 'column_name', :UInt64, default: nil, if_not_exists: nil, after: nil, cluster: nil)
ClickHouse.connection.drop_column('table', 'column_name', if_exists: nil, cluster: nil)
ClickHouse.connection.clear_column('table', 'column_name', partition: 'partition_name', if_exists: nil, cluster: nil)
ClickHouse.connection.modify_column('table', 'column_name', type: :UInt64, default: nil, if_exists: false, cluster: nil)

Alter table with SQL

# By SQL in argument
ClickHouse.connection.alter_table('table', 'DROP COLUMN user_id', cluster: nil)

# By SQL in a block
ClickHouse.connection.alter_table('table', cluster: nil) do 
  <<~SQL
    MOVE PART '20190301_14343_16206_438' TO VOLUME 'slow'
  SQL  
end

Type casting

By default gem provides all necessary type casting, but you may overwrite or define your own logic

class DateType
  def cast(value)
    Date.parse(value)
  end 
  
  def serialize(value)
    value.strftime('%Y-%m-%d')
  end
end

ClickHouse.add_type('Date', DateType.new)

Actually serialize function is not used for now, but you may use it manually:

time_type = DateTimeType.new
string_type = FixedStringType.new 

ClickHouse.connection.insert('table', columns: %i[name time]) do |buffer|
  buffer << [string_type.serialize('a' * 1000, 20), time.serialize(Time.current, 'Europe/Moscow')]
end

## alternatively
data = @records.map do |record|
  {
    id: record.id,
    time: ClickHouse.types['DateTime(%s)'].serialize(Time.current)
  }
end

If native type supports arguments, define type with %s argument:

class DateTimeType
  def cast(value, time_zone)
    Time.parse("#{value} #{time_zone}")
  end 
end

ClickHouse.add_type('DateTime(%s)', DateTimeType.new)

if you need to redefine all built-in types with your implementation, just clear the default type system:

ClickHouse.types.clear
ClickHouse.types # => {}
ClickHouse.types.default #=> #<ClickHouse::Type::UndefinedType:0x00007fc1cfabd630>

Using with a connection pool

require 'connection_pool'

ClickHouse.connection = ConnectionPool.new(size: 2) do
  ClickHouse::Connection.new(ClickHouse::Config.new(url: 'http://replica.example.com'))
end

ClickHouse.connection.with do |conn|
  conn.tables
end

Using with Rails

# config/click_house.yml

default: &default
  url: http://localhost:8123
  timeout: 60
  open_timeout: 3

development:
  database: ecliptic_development
  <<: *default

test:
  database: ecliptic_test
  <<: *default
# config/initializers/click_house.rb

ClickHouse.config do |config|
  config.logger = Rails.logger
  config.assign(Rails.application.config_for('click_house'))
end 
# lib/tasks/click_house.rake
namespace :click_house do
  task prepare: :environment do
    @environments = Rails.env.development? ? %w[development test] : [Rails.env]
  end

  task drop: :prepare do
    @environments.each do |env|
      config = ClickHouse.config.clone.assign(Rails.application.config_for('click_house', env: env))
      connection = ClickHouse::Connection.new(config)
      connection.drop_database(config.database, if_exists: true)
    end
  end

  task create: :prepare do
    @environments.each do |env|
      config = ClickHouse.config.clone.assign(Rails.application.config_for('click_house', env: env))
      connection = ClickHouse::Connection.new(config)
      connection.create_database(config.database, if_not_exists: true)
    end
  end
end

Prepare the ClickHouse database:

rake click_house:drop click_house:create

If your are using SQL Database in Rails, you can manage ClickHouse migrations using ActiveRecord::Migration mechanism

class CreateAdvertVisits < ActiveRecord::Migration[6.0]
  def up
    ClickHouse.connection.create_table('visits', engine: 'MergeTree(date, (account_id, advert_id), 512)') do |t|
      t.UInt16   :account_id
      t.UInt16   :user_id
      t.Date     :date
      t.DateTime :time, 'UTC'
      t.UInt32   :ipv4
      t.UInt64   :ipv6
      t.UInt32   :device_type_id
      t.UInt32   :os_family_id
    end
  end

  def down    
    ClickHouse.connection.drop_table('visits')
  end
end

Using with ActiveRecord

if you use ActiveRecord, you can use the ORM query builder by using fake models (empty tables must be present in the SQL database create_table :visits)

class ClickHouseRecord < ActiveRecord::Base
  self.abstract_class = true

  class << self
    def agent
      ClickHouse.connection
    end

    def insert(*argv, &block)
      agent.insert(table_name, *argv, &block)
    end

    def select_one
      agent.select_one(current_scope.to_sql)
    end

    def select_value
      agent.select_value(current_scope.to_sql)
    end

    def select_all
      agent.select_all(current_scope.to_sql)
    end
  end
end
# FAKE MODEL FOR ClickHouse 
class Visit < ClickHouseRecord
  scope :with_os, -> { where.not(os_family_id: nil) }
end

Visit.with_os.select('COUNT(*) as counter').group(:ipv4).select_all
#=> [{ 'ipv4' => 1455869, 'counter' => 104 }]

Visit.with_os.select('COUNT(*)').select_value 
#=> 20_345_678
 
Visit.where(user_id: 1).select_one 
#=> { 'ipv4' => 1455869, 'user_id' => 1 }  

Using with RSpec

You can clear the data table before each test with RSpec

RSpec.configure do |config|
  config.before(:each, truncate_click_house: true) do
    ClickHouse.connection.truncate_tables
  end
end
RSpec.describe Api::MetricsCountroller, truncate_click_house: true do 
  it { }
  it { }
end

Development

make dockerize
rspec
rubocop
You can’t perform that action at this time.