mapr/drill-perf-test-framework

Performance Test Framework for Apache Drill

A performance test framework for SQL-on-Hadoop technologies. It currently supports Apache Drill, a schema-free SQL query engine for Hadoop, NoSQL, and cloud storage.

The framework is built for regression testing with a focus on query performance. Test cases include customized versions of industry-standard benchmarks such as TPC-H and TPC-DS. A subset of these tests is used by the Apache Drill community as pre-commit and pre-release criteria.

Overview

  1. Clone the repository
  2. Configure test environment
  3. Review tests
  4. Build test framework
  5. Execute tests

Clone the repository


 git clone git@github.com:mapr/drill-perf-test-framework.git
 

Refer to the GitHub documentation for details on how to clone a repository.

Configure test environment

  1. The test framework requires a distributed file system such as HDFS or MapR-FS to be configured. It also requires Drill services to be set up in a clustered environment. Refer to the Drill documentation for details on how to set up Drill.
  2. Ensure passwordless SSH is enabled among the server nodes in the cluster.
  3. Ensure the following are installed:
    • clush

          yum --enablerepo=epel install clustershell

      Also ensure that /etc/clustershell/group contains the appropriate groups, such as:
        -- "all": all the nodes that will run a drillbit
        -- "remoteDrillbits": all the remote nodes that are running drillbits

    • dstat

          yum install -y dstat
 
  4. Edit PerfTestEnv.conf to set the required environment variables.
  5. Edit drillbits.lst to list the IPs of all the drillbit nodes.
  6. Build the databases.
    • The kit currently includes data generation scripts for the TPC-H and TPC-DS databases, along with some queries for those benchmarks. See the READMEs in TPCH/datagen and TPCDS/datagen for how to generate the data and build the databases for those tests (only Parquet files are implemented so far).
    • If the database is already built, ensure that the connection string and workspaces are defined in the storage plugin as specified in utils/dfs.json_Template.
  7. Copy the stats collection scripts to the remote drillbit nodes:

   ./CopyScriptsToRemote.sh
 
  8. Build the driver:

   cd driver
   ./buildDriver.sh
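
The prerequisites listed above (clush, dstat, passwordless SSH) can be checked before a run. The following is a minimal sketch and is not part of the framework: the function names are hypothetical, and it assumes drillbits.lst holds one host or IP per line, which this README does not confirm.

```shell
#!/bin/sh
# Print each named command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report whether the tools this README lists are present on the local node.
check_prereqs() {
    missing=$(missing_tools clush dstat)
    if [ -n "$missing" ]; then
        echo "$missing" | sed 's/^/missing: /' >&2
        return 1
    fi
    printf 'all prerequisite tools found\n'
}

# Try a non-interactive SSH login to every host listed in the given file.
# ASSUMPTION: the file holds one host or IP per line.
check_ssh() {
    failed=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # -n keeps ssh from consuming the host list on stdin.
        if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            printf 'no passwordless SSH to %s\n' "$host" >&2
            failed=1
        fi
    done < "$1"
    return "$failed"
}
```

Running `check_prereqs && check_ssh drillbits.lst` on the node that launches the tests would then report any missing tool or unreachable node before a run starts.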
 

Review tests

Each test case is specified in a directory structure:


   benchmark_name (e.g., TPCH, TPCDS)
      |_ datagen
      |_ Queries
 

datagen contains the resources needed to build the database.
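
As a quick way to review which benchmarks are present, the layout above can be scanned for directories that have both subdirectories. This is only a sketch under that assumption; list_benchmarks is a hypothetical helper, not a script shipped with the framework.

```shell
#!/bin/sh
# Print the name of every directory under $1 (default: current directory)
# that contains both a datagen and a Queries subdirectory.
list_benchmarks() {
    root=${1:-.}
    for dir in "$root"/*/; do
        # Skip anything that does not follow the benchmark layout.
        [ -d "${dir}datagen" ] && [ -d "${dir}Queries" ] || continue
        basename "$dir"
    done
}
```

Called from the repository root as `list_benchmarks .`, it prints one benchmark name per line.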

Execute tests

  1. Edit params.conf to specify what should be run.
  2. Run ./run.sh

Logs and results

Results are written to log/<runid>_<gitCommitId>_<benchmark>_<timestamp>/. For each query, metrics such as the following are collected:


[STAT] Rows Fetched : 21842
[STAT] Time to load queries : 3 msec
[STAT] Time to register Driver : 632 msec
[STAT] Time to connect : 1045 msec
[STAT] Time to alter session : 0 msec
[STAT] Time to prep Statement  : 3 msec
[STAT] Time to execute query : 24818 msec
[STAT] Time to get query ID : 0 msec
[STAT] Time to fetch 1st Row : 36858 msec
[STAT] Time to fetch All Rows : 37180 msec
[STAT] Time to disconnect : 3 msec
[STAT] TOTAL TIME : 61998 msec
 

Output from iostat, vmstat, mpstat, and dstat is collected as well, along with jstack output for the Drillbit.
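
The [STAT] lines shown above are regular enough to post-process. The following is a minimal sketch, not part of the framework, that turns them into CSV rows. The function name stat_to_csv is hypothetical, and since this README does not say which file in the results directory holds these lines, the sketch simply reads from standard input.

```shell
#!/bin/sh
# Convert "[STAT] <metric> : <value> [msec]" lines into "metric,value" rows.
# Reads log text on stdin; lines without a [STAT] tag are ignored.
stat_to_csv() {
    awk '
        /^[ \t]*\[STAT\]/ {
            line = $0
            sub(/^[ \t]*\[STAT\][ \t]*/, "", line)   # drop the tag
            colon = index(line, ":")
            if (colon == 0) next                      # not a "name : value" line
            name  = substr(line, 1, colon - 1)
            value = substr(line, colon + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)         # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            sub(/[ \t]*msec$/, "", value)             # keep the bare number
            print name "," value
        }
    '
}
```

Fed the sample above, it prints rows such as `Rows Fetched,21842`. In that sample, TOTAL TIME (61998 msec) equals the time to execute the query plus the time to fetch all rows (24818 + 37180); whether that relationship holds for every run is not stated here.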
