Kaustav Chakravorty edited this page Sep 21, 2022 · 13 revisions

Welcome to the Diligent wiki!

What is Diligent?

Diligent is a tool we created at Flipkart for running performance experiments on our SQL databases.

Capabilities

Diligent provides the following key capabilities:

  • Predefined SQL Workloads: Diligent provides a set of predefined workloads. There are workloads for each of the four main SQL DML operations: insert, select, update, delete. For each DML operation Diligent provides workloads that operate with and without transactions. For select there are variants for lookup by primary key and lookup by secondary key.
  • Study the Impact of Indexes: Diligent can run workloads on different tables. We can create similar tables that differ in the index definitions. By running insert workloads on these tables we can study the impact of adding / removing indexes on writes. By running select workloads on these tables by primary key and by secondary key we can study the impact of indexes on reads.
  • Study the Impact of Transactions: Diligent has workloads that issue individual statements, and those that issue multiple statements in a transaction. The number of statements in a transaction can be specified at runtime. This enables us to study the impact of executing statements with and without transactions, and also the impact of the number of statements in a transaction.
  • Study the Impact of Row Size: Diligent allows us to specify the number of rows in the dataset and the size of each row. We can set the row size to simulate the workload of a particular application we have in mind. We can run multiple experiments with different row sizes to study the impact of row size.
  • Horizontal Scalability: Diligent can use multiple nodes to run a workload, so it can be scaled horizontally to generate more load. This is especially useful for distributed / sharded databases that can handle more scale. It also helps validate how the database balances load, where that applies.
  • Visualisation Of Performance Characteristics: Diligent exposes the observed performance characteristics of the database as Prometheus metrics. Using Prometheus + Grafana along with Diligent allows us to visualise the performance characteristics over one or more runs. This provides richer information compared to just printing some summarised stats.
  • Report Generation: Diligent can produce an HTML report which has charts of the observed performance. This allows results to be captured and stored easily for future reference.
  • Support For Large Datasets: Diligent can generate workloads for multi-TB datasets.
  • Repeatability: Diligent captures the essential information about a dataset in a Dataspec file. By using the same Dataspec file we can repeatedly populate tables with exactly the same primary and secondary keys. This repeatability is useful in a variety of benchmarking scenarios, such as comparing the performance of the same database under different tuning, or comparing different releases of the same database.
  • Automation: Diligent allows us to represent commonly run experiments as parameterized scripts. We can then execute these scripts as needed and get a report of the observed metrics.
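
To illustrate the idea behind the Repeatability capability above, here is a minimal Python sketch of seed-driven key generation. It is not Diligent's implementation and does not reflect the real Dataspec format: the function `generate_keys`, its parameters, and the key layout are all hypothetical, chosen only to show how a small, fixed description of a dataset can reproduce exactly the same keys on every run.

```python
import random
import string


def generate_keys(seed, num_rows, key_len=12):
    """Return num_rows (primary_key, secondary_key) pairs determined by seed.

    Hypothetical helper for illustration only; not part of Diligent.
    """
    # A private generator keeps the output independent of global random state.
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    rows = []
    for i in range(num_rows):
        # Prefixing the row index guarantees that primary keys are unique.
        pk = f"{i:08d}-" + "".join(rng.choices(alphabet, k=key_len))
        sk = "".join(rng.choices(alphabet, k=key_len))
        rows.append((pk, sk))
    return rows


# Two runs with the same seed and row count yield identical keys.
run_a = generate_keys(seed=42, num_rows=1000)
run_b = generate_keys(seed=42, num_rows=1000)
assert run_a == run_b
```

In this sketch the pair (seed, num_rows) plays the role that a dataset description would play: storing those two values is enough to regenerate the dataset. Building the whole list in memory is only reasonable for small examples; a tool targeting very large datasets would need a streaming or on-demand approach.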

Next Steps

To familiarize yourself with the key concepts related to Diligent, read the Concepts Page.

To see how to put Diligent into action, read the Quick Start Example Page.