Prototype for saving Graphite data from Kafka into Parquet
See blog post for motivation and more details.
- Java 8
- SBT to build the project
- Kafka running on a single node, as described in the quickstart guide
- anything that produces data in the Graphite plaintext protocol, such as collectd with the write_graphite plugin
Getting Graphite data into Kafka:
nc -4l localhost 2003 | kafkacat -P -b localhost -t metrics -K ' '
This feeds all data received on localhost:2003 into the metrics topic, with the metric name as the message key and the remaining value timestamp pair as the payload.
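To illustrate the key/payload split, here is a minimal Python sketch of what happens to a single Graphite plaintext line. It assumes the line is split at the first space, which is how the -K ' ' delimiter is expected to behave. The names split_key_payload, parse_metric and Metric are hypothetical helpers for illustration only and are not part of this project.

```python
from typing import NamedTuple, Tuple


class Metric(NamedTuple):
    """One Graphite data point (hypothetical type, for illustration)."""
    name: str
    value: float
    timestamp: int


def split_key_payload(line: str, delimiter: str = " ") -> Tuple[str, str]:
    """Split a plaintext line at the FIRST delimiter into (key, payload).

    Assumption: this mirrors how the key delimiter is applied to each line.
    """
    key, payload = line.rstrip("\n").split(delimiter, 1)
    return key, payload


def parse_metric(key: str, payload: str) -> Metric:
    """Turn a message key and its 'value timestamp' payload into a Metric."""
    value, timestamp = payload.split()
    return Metric(key, float(value), int(timestamp))


if __name__ == "__main__":
    # Sample line in Graphite plaintext format: <metric name> <value> <timestamp>
    key, payload = split_key_payload("host1.cpu.idle 97.5 1456789012\n")
    print(parse_metric(key, payload))
```

Splitting only at the first space keeps the value and the timestamp together in the payload, so a consumer can parse them without looking at the key.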
Saving data from Kafka into Parquet files
sbt -mem <jvmMemorySize> "run <topic> <partition> <offset> <fetchSize> <targetFolder>"
- topic - name of the topic containing messages with metric_name as the key and a value timestamp payload from the Graphite plaintext protocol
- partition - partition number
- offset - number of messages to skip from the beginning
- fetchSize - maximum size of data to fetch from Kafka, in bytes
- targetFolder - path where to save Parquet files
- jvmMemorySize - maximum memory for JVM (-Xmx argument), must be at least 3 times larger than fetchSize.
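The memory rule above can be checked with a little arithmetic. The Python sketch below computes the smallest jvmMemorySize that satisfies the 3x rule for a given fetchSize. It assumes that the value passed to -mem is in megabytes (1 MB taken as 1024 * 1024 bytes) while fetchSize is in bytes. The function name min_jvm_memory_mb is hypothetical.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: -mem is interpreted in megabytes


def min_jvm_memory_mb(fetch_size_bytes: int, factor: int = 3) -> int:
    """Smallest whole number of MB that is at least factor * fetch_size_bytes."""
    if fetch_size_bytes <= 0:
        raise ValueError("fetch_size_bytes must be positive")
    needed_bytes = fetch_size_bytes * factor
    # Integer ceiling division, so the result is never rounded down.
    return -(-needed_bytes // BYTES_PER_MB)


if __name__ == "__main__":
    fetch_size = 500_000_000  # hypothetical fetchSize in bytes
    print("minimum -mem value (MB):", min_jvm_memory_mb(fetch_size))
```

Under these assumptions, a fetchSize of 500000000 bytes needs at least 1431 MB, so a setting such as -mem 2048 leaves comfortable headroom.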
Parquet files will have name
The nextOffset value in the file name is intended to be used as the offset for the subsequent invocation, which saves the next batch of data.
sbt -mem 2048 "run metrics 0 0 500000000 /tmp"