
Create a Storage Plugin

Paul Rogers edited this page Dec 6, 2019 · 22 revisions

Overview

One of Drill's strengths is the ability to add custom storage plugins that interact with Drill at both plan time and execution time. Storage plugins are one of two types of data-related plugins; the other is the "format plugin", which is limited to DFS-based file systems. Storage plugins can work with any kind of storage system: a database (such as HBase or Cassandra), a streaming system (such as Kafka), a REST-based API and more. The only restriction is that the system must be able to return row-oriented data with a consistent schema. (Drill would not be well suited to, say, a text or audio stream.)

This tutorial starts by focusing on the basic storage plugin structure. Once you are familiar with that, it dives into the details of planning, filter push-down, parallelization and more.

Each storage plugin has a number of components:

  • Storage plugin
  • Storage plugin config
  • Default and optional nested schema definitions
  • Table definitions
  • Scan definitions, including one passed from the planner to the execution engine
  • Run-time scan operator
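To make the division of labor concrete, here is a minimal, self-contained sketch of how these components might hand information to one another. Every type in it (`DemoConfig`, `DemoPlugin`, `DemoSchema`, `DemoTable`, `DemoScanSpec`, `DemoScanOperator`) is hypothetical and stands in for Drill's real classes. It models a single default schema with no nested schemas and uses Java records, so it assumes Java 16 or later.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the storage plugin components. These are NOT Drill's
// classes; they only show which piece hands what to the next.
public class PluginComponentsSketch {

  // Storage plugin config: the user-editable settings.
  record DemoConfig(String endpoint) {}

  // Table definition: a named table with a fixed, row-oriented schema.
  record DemoTable(String name, List<String> columns) {}

  // Scan definition passed from the planner to the execution engine.
  record DemoScanSpec(String endpoint, DemoTable table) {}

  // Default schema: resolves table names to table definitions.
  record DemoSchema(Map<String, DemoTable> tables) {
    DemoTable getTable(String name) {
      DemoTable table = tables.get(name);
      if (table == null) {
        throw new IllegalArgumentException("No such table: " + name);
      }
      return table;
    }
  }

  // Storage plugin: built from its config, exposes the schema at plan time
  // and produces the scan spec for a resolved table.
  record DemoPlugin(DemoConfig config, DemoSchema schema) {
    DemoScanSpec planScan(String tableName) {
      return new DemoScanSpec(config.endpoint(), schema.getTable(tableName));
    }
  }

  // Run-time scan operator: works only from what the scan spec carries.
  record DemoScanOperator(DemoScanSpec spec) {
    List<List<Object>> readRows() {
      // Stand-in for a call to the external system at spec.endpoint().
      return List.of(List.<Object>of(1, "first"), List.<Object>of(2, "second"));
    }
  }

  public static void main(String[] args) {
    DemoTable table = new DemoTable("events", List.of("id", "label"));
    DemoPlugin plugin = new DemoPlugin(
        new DemoConfig("http://example.invalid/api"),
        new DemoSchema(Map.of("events", table)));
    DemoScanSpec spec = plugin.planScan("events");                    // plan time
    List<List<Object>> rows = new DemoScanOperator(spec).readRows();  // execution time
    System.out.println(spec.table().columns() + " " + rows.size() + " rows");
  }
}
```

In this sketch the scan spec is the only object that crosses from plan time to execution time, so it has to carry everything the run-time operator needs (here, the endpoint and the table definition).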

The example here builds the simplest possible storage plugin, using the "base" storage plugin framework and, at run time, the EVF framework.
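At run time, an EVF-based reader roughly follows a fixed lifecycle: it declares its schema when opened, fills one size-limited batch per `next()` call, reports when the source is exhausted, and is then closed. The sketch below mimics only that lifecycle; `BatchSink` and `DemoBatchReader` are hypothetical stand-ins, not the EVF API, and the in-memory row list stands in for the external system. The schema check in `writeRow` mirrors the requirement that rows share a consistent schema.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins shaped like an EVF-style reader. Not Drill's API.
public class BatchReaderSketch {

  // Stand-in for the row writer: holds the declared schema and caps batch size.
  static class BatchSink {
    private final int maxRows;
    private final List<String> columns = new ArrayList<>();
    private final List<List<List<Object>>> batches = new ArrayList<>();
    private List<List<Object>> current;

    BatchSink(int maxRows) { this.maxRows = maxRows; }

    void declareSchema(List<String> names) { columns.addAll(names); }

    void startBatch() {
      current = new ArrayList<>();
      batches.add(current);
    }

    boolean isFull() { return current.size() >= maxRows; }

    // Rejects rows that do not match the declared schema.
    void writeRow(List<Object> row) {
      if (row.size() != columns.size()) {
        throw new IllegalStateException("Row does not match declared schema");
      }
      current.add(row);
    }

    List<List<List<Object>>> batches() { return batches; }
  }

  // Reader over an in-memory source standing in for the external system.
  static class DemoBatchReader {
    private final Iterator<List<Object>> source;
    private BatchSink sink;

    DemoBatchReader(List<List<Object>> rows) { this.source = rows.iterator(); }

    void open(BatchSink sink) {
      this.sink = sink;
      sink.declareSchema(List.of("id", "label"));
    }

    // Fills at most one batch; returns false when there is nothing left to read.
    boolean next() {
      if (!source.hasNext()) {
        return false;
      }
      sink.startBatch();
      while (!sink.isFull() && source.hasNext()) {
        sink.writeRow(source.next());
      }
      return true;
    }

    void close() { /* release connections, files, etc. */ }
  }

  public static void main(String[] args) {
    List<List<Object>> rows = new ArrayList<>();
    for (int i = 1; i <= 5; i++) {
      rows.add(List.<Object>of(i, "row" + i));
    }
    BatchSink sink = new BatchSink(2);
    DemoBatchReader reader = new DemoBatchReader(rows);
    reader.open(sink);
    while (reader.next()) {
      // The engine would send each completed batch downstream here.
    }
    reader.close();
    System.out.println(sink.batches().size() + " batches"); // prints "3 batches"
  }
}
```

Bounding each batch keeps memory use predictable no matter how much data the source returns, which is why the reader yields after every batch instead of reading everything at once.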

Basic Steps

The steps here go from a standing start to a very simple, working storage plugin that uses the simplest possible defaults.

Advanced Topics
