Create a Storage Plugin

Overview

One of Drill's strengths is the ability to add custom storage plugins, which interact with Drill at both plan time and execution time. Storage plugins are one of two types of data-related plugins. The other is the "format plugin", which is limited to DFS-based file systems. Storage plugins allow working with any kind of storage system: a database (such as HBase or Cassandra), a streaming system (such as Kafka), a REST-based API, and more. The only restriction is that the system must be able to return row-oriented data with a consistent schema. (Drill would not be well suited to, say, a text or audio stream.)

This tutorial starts with a strong emphasis on creating the basic storage plugin structure. Then, once you are familiar with that aspect, we dive into details of planning, filter push-down, parallelization and more.

Each storage plugin has a number of components:

Storage plugin
Storage plugin config
Default and optional nested schema definitions
Table definitions
Scan definitions, including one passed from the planner to the execution engine
Run-time scan operator
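
To make the relationship between these components concrete, here is a minimal sketch in plain Java. Every class and method name below (ExamplePluginConfig, ExampleTable, ExampleSchema, ExampleScanSpec, ExampleScanOperator, ExampleStoragePlugin, planScan, createScan, readRows) is a hypothetical stand-in invented for illustration. None of them are Drill classes, and the real Drill interfaces differ in detail. The sketch only shows how the pieces might hand off to one another: the plugin holds the config and schema, the schema resolves tables, plan time produces a scan definition, and execution time turns that definition into rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Drill classes.

/** Storage plugin config: user-supplied settings for one plugin instance. */
final class ExamplePluginConfig {
  final String endpoint;
  ExamplePluginConfig(String endpoint) { this.endpoint = endpoint; }
}

/** Table definition: a name plus a fixed, consistent set of columns. */
final class ExampleTable {
  final String name;
  final List<String> columns;
  ExampleTable(String name, List<String> columns) {
    this.name = name;
    this.columns = columns;
  }
}

/** Default schema: resolves table names to table definitions. */
final class ExampleSchema {
  private final Map<String, ExampleTable> tables = new LinkedHashMap<>();
  void add(ExampleTable table) { tables.put(table.name, table); }
  ExampleTable find(String name) { return tables.get(name); }
}

/** Scan definition: what the planner passes to the execution engine. */
final class ExampleScanSpec {
  final String tableName;
  final List<String> columns;
  ExampleScanSpec(String tableName, List<String> columns) {
    this.tableName = tableName;
    this.columns = columns;
  }
}

/** Run-time scan operator: turns a scan definition into rows. */
final class ExampleScanOperator {
  private final ExampleScanSpec spec;
  ExampleScanOperator(ExampleScanSpec spec) { this.spec = spec; }

  // Produces placeholder rows; a real operator would read the external system.
  List<Map<String, String>> readRows(int rowCount) {
    List<Map<String, String>> rows = new ArrayList<>();
    for (int i = 0; i < rowCount; i++) {
      Map<String, String> row = new LinkedHashMap<>();
      for (String column : spec.columns) {
        row.put(column, spec.tableName + "." + column + "#" + i);
      }
      rows.add(row);
    }
    return rows;
  }
}

/** Storage plugin: owns the config and schema, and creates scans. */
final class ExampleStoragePlugin {
  final ExamplePluginConfig config;
  final ExampleSchema schema = new ExampleSchema();
  ExampleStoragePlugin(ExamplePluginConfig config) { this.config = config; }

  // Plan time: resolve the table and describe the scan to perform.
  ExampleScanSpec planScan(String tableName) {
    ExampleTable table = schema.find(tableName);
    if (table == null) {
      throw new IllegalArgumentException("Unknown table: " + tableName);
    }
    return new ExampleScanSpec(table.name, table.columns);
  }

  // Execution time: build the operator that carries out the scan.
  ExampleScanOperator createScan(ExampleScanSpec spec) {
    return new ExampleScanOperator(spec);
  }
}
```

The split between planScan and createScan mirrors the point made in the list above: the scan definition is the one object that crosses from the planner to the execution engine, so it carries only what the run-time operator needs.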

The example here is the simplest possible storage plugin. Its run-time portion is implemented using the Enhanced Vector Framework (EVF), and the rest builds on the "base" storage plugin framework.

Basic Steps

The steps here go from a standing start to having a very simple working storage plugin with the simplest possible defaults.
