joker1007/embulk-output-cassandra

Cassandra output plugin for Embulk

Apache Cassandra output plugin for Embulk.

Compatibility

embulk-output-cassandra | embulk | datastax-driver-core
0.6.x | 0.11.x or later | 4.x
0.5.x | 0.9.x or later (may not work on 0.11.x) | 3.11.x

Breaking Changes

0.6.0

  • A timestamp column accepts a string in Java's ISO_INSTANT format.
  • A timestamp column accepts long and double values as epoch milliseconds. (before: epoch seconds)
  • A date column accepts a long value as days from epoch. (before: not supported)
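
The interpretations listed above correspond to standard java.time conversions. The following is a minimal sketch of those conversions, not the plugin's actual code: the class and method names are hypothetical, and truncating the fractional part of a double is an assumption.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper illustrating the 0.6.0 value interpretations; not part of the plugin.
final class TimestampInterpretation {
    // A string is parsed with Java's ISO_INSTANT format, e.g. "2020-01-01T00:00:00Z".
    static Instant fromString(String value) {
        return Instant.from(DateTimeFormatter.ISO_INSTANT.parse(value));
    }

    // A long is read as epoch milliseconds (before 0.6.0: epoch seconds).
    static Instant fromLong(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // A double is also read as epoch milliseconds; dropping the fraction is an assumption.
    static Instant fromDouble(double epochMillis) {
        return Instant.ofEpochMilli((long) epochMillis);
    }

    // For a date column, a long is read as the number of days since 1970-01-01.
    static LocalDate dateFromLong(long daysFromEpoch) {
        return LocalDate.ofEpochDay(daysFromEpoch);
    }
}
```

When upgrading from 0.5.x, input that carried epoch seconds would need to be multiplied by 1000 to keep the same timestamps.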

Overview

  • Plugin type: output
  • Load all or nothing: no
  • Resume supported: yes
  • Cleanup supported: no

Caution

Currently, the version of the netty components used by this plugin conflicts with the one used by embulk-core.

This problem is very severe.

I tested this plugin on embulk-0.9.7, but future Embulk versions may break this plugin.

Supported Data Types

CQL Type | Embulk Type | Description
ascii | string, boolean, long, double, timestamp, json | uses toString or toJson
bigint | string, boolean (as 0 or 1), long, double |
blob | unsupported |
boolean | boolean, long, double | 0 == false, 1 == true
counter | unsupported |
date | string, long, timestamp | long as days from epoch, timestamp as UTC timestamp
decimal | string, boolean (as 0 or 1), long, double |
double | string, boolean (as 0 or 1), long, double |
float | string, boolean (as 0 or 1), long, double |
inet | string |
int | string, boolean (as 0 or 1), long, double | an overflowed value is reset to 0
list | json |
map (only text keys are supported) | json |
set | json |
smallint | string, boolean (as 0 or 1), long, double | an overflowed value is reset to 0
text | string, boolean, long, double, timestamp, json | uses toString or toJson
time | string, long, double, timestamp | long and double as nanoseconds of day, timestamp as UTC timestamp
timestamp | string, long, double, timestamp | string in Java's ISO_INSTANT format, long and double as epoch millis
timeuuid | null |
uuid | null |
varchar | string, boolean, long, double, timestamp, json | uses toString or toJson
varint | string, boolean (as 0 or 1), long, double |
UDT | unsupported |
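
A few of the rules in the table can be illustrated in code. The sketch below shows one reading of "boolean (as 0 or 1)", "overflowed value is reset to 0", and "long as nanoseconds of day". It is an illustration under those assumptions, not the plugin's implementation, and the class and method names are hypothetical.

```java
import java.time.LocalTime;

// Hypothetical helper illustrating some conversion rules from the table; not part of the plugin.
final class ColumnCoercionSketch {
    // boolean into a numeric column: true becomes 1, false becomes 0.
    static long booleanToLong(boolean value) {
        return value ? 1L : 0L;
    }

    // int column: a long outside the 32-bit range is reset to 0 instead of wrapping around.
    static int toIntOrZero(long value) {
        return (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) ? 0 : (int) value;
    }

    // smallint column: a long outside the 16-bit range is reset to 0.
    static short toSmallintOrZero(long value) {
        return (value < Short.MIN_VALUE || value > Short.MAX_VALUE) ? (short) 0 : (short) value;
    }

    // time column: a long is read as nanoseconds since midnight.
    static LocalTime timeFromLong(long nanoOfDay) {
        return LocalTime.ofNanoOfDay(nanoOfDay);
    }
}
```

Because an overflowed value silently becomes 0, it may be worth validating ranges upstream when loading into int or smallint columns.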

Insert Behavior

If an Embulk record does not have a column, that column is treated as unset. If a record with the same key already exists, the existing value of that column is not touched.
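
The effect of an unset column can be modeled with plain maps. This is only an in-memory sketch of the behavior described above, not how the plugin or Cassandra stores data. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of "unset" columns; not part of the plugin.
final class UnsetColumnSketch {
    // Columns present in the record overwrite the stored values;
    // columns absent from the record (unset) keep whatever is already stored.
    static Map<String, Object> applyInsert(Map<String, Object> existingRow,
                                           Map<String, Object> record) {
        Map<String, Object> result = new LinkedHashMap<>(existingRow);
        result.putAll(record);
        return result;
    }
}
```

For example, inserting {id: 1, age: 21} over an existing row {id: 1, name: "a", age: 20} leaves name as "a" and changes age to 21.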

Counter table

This plugin supports counter tables.

However, a counter table supports only increment and decrement updates.

Because of this, the plugin uses each input value as the increment value.

For example, if the input data is {id: 1, count: 5}, the executed statement is UPDATE tablename SET count = count + 5 WHERE id = 1
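
To make the shape of that statement concrete, here is a minimal sketch that assembles a counter UPDATE from column names. It is not the plugin's code: the class and method names are hypothetical, and using `?` bind markers instead of literal values is an assumption of this sketch.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical builder for a counter UPDATE statement; not part of the plugin.
final class CounterStatementSketch {
    // Each counter column is incremented by a bound value; key columns go into the WHERE clause.
    static String buildCounterUpdate(String table,
                                     List<String> keyColumns,
                                     List<String> counterColumns) {
        String setClause = counterColumns.stream()
                .map(c -> c + " = " + c + " + ?")
                .collect(Collectors.joining(", "));
        String whereClause = keyColumns.stream()
                .map(k -> k + " = ?")
                .collect(Collectors.joining(" AND "));
        return "UPDATE " + table + " SET " + setClause + " WHERE " + whereClause;
    }
}
```

With table `tablename`, key column `id`, and counter column `count`, this produces `UPDATE tablename SET count = count + ? WHERE id = ?`, and the values 5 and 1 from the example record would be bound in that order. A negative input value decrements the counter.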

Configuration

  • hosts: list of seed hosts (list, required)
  • port: port number of the Cassandra cluster (integer, default: 9042)
  • username: cluster username (string, default: null)
  • password: cluster password (string, default: null)
  • cluster_name: cluster name (string, default: null)
  • keyspace: target keyspace name (string, required)
  • table: target table name (string, required)
  • mode: one of insert, update, or delete (string, default: "insert")
  • if_not_exists: add "IF NOT EXISTS" to the INSERT query (boolean, default: false)
  • if_exists: add "IF EXISTS" to the UPDATE query (boolean, default: false)
  • ttl: add "TTL" to the INSERT query (integer, default: null)
  • idempotent: treat the INSERT query as idempotent (boolean, default: false)
  • connect_timeout: connect timeout in milliseconds (integer, default: 5000)
  • request_timeout: timeout of each request in milliseconds (integer, default: 12000)

Example

out:
  type: cassandra
  hosts:
    - 127.0.0.1
  port: 9042
  keyspace: sample_keyspace
  table: sample_table
  idempotent: true

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously
