Blockchain data pipeline using Airflow, Kubernetes, Redshift, and Grafana



This project uses blockchain data to provide a platform for financial analysis of crypto assets. In particular, we are researching emerging standards for security tokens and potential methods of fundamental analysis.

Our system will have the following capabilities:

  1. Scalability to handle exponential growth in blockchain data size.
  2. Fault-tolerance to maintain data recency.
  3. Security features suitable for protecting proprietary data.

It will serve as an example of how to build a data pipeline for analyzing public blockchains.


Table of Contents

  1. Getting Started
  2. Terraform Configs
  3. Setting up Kubernetes / EKS
  4. Airflow DAGs
  5. Tech Stack
  6. Engineering Challenges

High-level Architecture

[Figure: high-level architecture diagram]

Example Dashboard

[Figure: example dashboard]