
Learn Druid

The "Learn Druid" repository contains all manner of resources to help you learn and apply Apache Druid.

It contains:

  • Jupyter Notebooks that guide you through query, ingestion, and data management with Apache Druid.
  • A Docker Compose file to get you up and running with a learning lab.

Prerequisites

To use the "Learn Druid" Docker Compose, you need:

  • Git or GitHub Desktop

  • Docker Desktop with Docker Compose

  • A machine with at least 6 GiB of RAM.

    More is better, of course. The notebooks have been tested with the following resources allocated to Docker: 6 CPUs, 8 GiB of RAM, and 1 GiB of swap.

Quickstart

To get started quickly:

  1. Clone this repository locally, if you have not already done so:

     git clone https://github.com/implydata/learn-druid

  2. Navigate to the directory:

     cd learn-druid

     To refresh an existing local copy with the latest notebooks:

     git restore .
     git pull

  3. Launch the "Learn Druid" Docker environment:

     docker compose --profile druid-jupyter up -d

     The first time you launch the environment, it can take a while to start all the services.

  4. Navigate to Jupyter Lab in your browser:

     http://localhost:8889/lab

From there, you can read the introduction or use Jupyter Lab to browse the notebooks folder.
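Once the containers are up, you can sanity-check that the services are reachable before opening the notebooks. The sketch below assumes the quickstart port for Jupyter Lab (8889, as above) and a Druid router on port 8888; the router port is an assumption, so check the port mappings in the Compose file for your setup.

```python
from urllib.request import urlopen
from urllib.error import URLError


def service_up(url, timeout=5):
    """Return True if an HTTP GET to the URL returns 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False


# Jupyter Lab port is from the quickstart above; the Druid router port
# (8888) is an assumption -- adjust to the ports mapped in the Compose file.
for name, url in [
    ("Jupyter Lab", "http://localhost:8889/lab"),
    ("Druid router", "http://localhost:8888/status"),
]:
    print(f"{name}: {'up' if service_up(url) else 'not reachable'}")
```

If a service reports "not reachable", give the containers more time to start or check `docker compose ps`.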

Components

The Learn Druid environment Docker Compose file includes the following services:

Jupyter Lab: An interactive environment to run Jupyter Notebooks. The image for Jupyter used in the environment contains Python along with all the supporting libraries you need to run the notebooks.

Apache Kafka: Streaming service as a data source for Druid.

Imply Data Generator: A tool to generate sample data for Druid. It can produce either batch or streaming data.

Apache Druid: The latest released version of Apache Druid, by default.

You can use the web console to monitor ingestion tasks, compare query results, and more. To learn about the Druid web console, see Web console.
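Beyond the web console, the notebooks talk to Druid largely through its SQL API, which accepts a POST to /druid/v2/sql with a JSON body. The sketch below builds such a request body; the endpoint URL is an assumption based on the default quickstart ports, so adjust it to your environment.

```python
import json

# Druid's SQL API endpoint on the router; the host and port here are
# assumptions -- match them to the ports mapped in the Compose file.
SQL_ENDPOINT = "http://localhost:8888/druid/v2/sql"


def sql_request_body(query, result_format="objectLines"):
    """Build the JSON body for a POST to Druid's SQL API."""
    return json.dumps({"query": query, "resultFormat": result_format})


# Example: list the servers in the cluster via Druid's sys tables.
body = sql_request_body("SELECT server, server_type FROM sys.servers")
print(body)
```

Sending `body` as the POST payload (with a `Content-Type: application/json` header) returns one JSON object per result row when `resultFormat` is `objectLines`.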

Profiles

You can use the following Docker Compose profiles to start various combinations of the components based upon your specific needs.

Individual notebooks may prescribe a specific profile that you need to use.

Jupyter only

Use this profile to run the notebooks against an existing Apache Druid deployment. Set the DRUID_HOST environment variable to the Apache Druid host address.

To start Jupyter only:

DRUID_HOST=[host address] docker compose --profile jupyter up -d

For example, if Druid is running on the local machine:

DRUID_HOST=host.docker.internal docker compose --profile jupyter up -d

To stop Jupyter:

docker compose --profile jupyter down
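As a sketch of what DRUID_HOST controls, a notebook-style client might resolve the Druid address as shown below. The "router" fallback and the port are hypothetical values for illustration; the actual defaults are defined in the environment's Compose file.

```python
import os

# Resolve the Druid host from the DRUID_HOST environment variable.
# The fallback name "router" and port 8888 are illustrative assumptions,
# not values taken from this repository's configuration.
druid_host = os.environ.get("DRUID_HOST", "router")
druid_url = f"http://{druid_host}:8888"
print(druid_url)
```

When you start the profile with `DRUID_HOST=host.docker.internal`, the resulting URL points at a Druid instance running on the Docker host machine.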

Jupyter and Druid

Use this profile when you only need to query data and run batch ingestion; it does not include the streaming components.

To start Jupyter and Druid:

docker compose --profile druid-jupyter up -d

To stop Jupyter and Druid:

docker compose --profile druid-jupyter down

All services

To start all services:

docker compose --profile all-services up -d

To stop all services:

docker compose --profile all-services down

Feedback and help

For feedback and help, start a discussion on the Discussions board or reach out in the docs and training channel of the Apache Druid Slack.

