deequ.NET

⚠️Warning: The library is still in alpha, and it is not fully tested.

deequ.NET is a port of the awslabs/deequ library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. deequ.NET runs on dotnet/spark.

Requirements and Installation

deequ.NET runs on Apache Spark and depends on dotnet/spark. Therefore it is required to install the following dependencies locally:

It is also necessary to install Microsoft.Spark.Worker on your local machine and add its location to the PATH environment variable. For detailed instructions, see dotnet/spark - Getting started.
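
Before building, it can help to confirm that the required tools are reachable from the shell. The following is a minimal sketch and not part of deequ.NET: `require_tool` and `check_prerequisites` are hypothetical helpers, and the tool names assume the default executable names (`dotnet`, `spark-submit`, `Microsoft.Spark.Worker`).

```shell
#!/bin/sh
# Hypothetical helper: report whether an executable can be found on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (not on PATH)"
        return 1
    fi
}

# Check every prerequisite without stopping early, so that all
# missing tools are reported in a single run.
# The executable names below are assumptions; adjust them to your setup.
check_prerequisites() {
    status=0
    for tool in dotnet spark-submit Microsoft.Spark.Worker; do
        require_tool "$tool" || status=1
    done
    return "$status"
}
```

Calling `check_prerequisites` prints one line per tool and returns a non-zero status if any of them is missing.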


The following example defines a set of checks over a few records and runs them with the spark-submit command.

  • Use the dotnet CLI to create a console application:

    dotnet new console -o DeequExample
  • Install the Microsoft.Spark and deequ NuGet packages into the project:

    cd DeequExample
    dotnet add package Microsoft.Spark
    dotnet add package deequ
  • Replace the contents of the Program.cs file with the following code:

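    using deequ;
    using deequ.Checks;
    using deequ.Extensions;
    using Microsoft.Spark.Sql;

    namespace DeequExample
    {
        class Program
        {
            static void Main(string[] args)
            {
                SparkSession spark = SparkSession.Builder().GetOrCreate();
                DataFrame data = spark.Read().Json("inventory.json");

                // OnData, AddCheck and Run are assumed to mirror the upstream deequ
                // fluent API; check them against the installed deequ package.
                VerificationResult verificationResult = new VerificationSuite()
                    .OnData(data)
                    .AddCheck(
                        // Error level: the dataset has 5 rows and a known priority.
                        new Check(CheckLevel.Error, "integrity checks")
                            .HasSize(value => value == 5)
                            .IsContainedIn("priority", new[] { "high", "low" }))
                    .AddCheck(
                        // Warning level: at least half of the descriptions contain a URL.
                        new Check(CheckLevel.Warning, "distribution checks")
                            .ContainsURL("description", value => value >= .5))
                    .Run();
            }
        }
    }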
  • Use the dotnet CLI to build the application:

    dotnet build

Running the example

  • Open your terminal and navigate into your app folder.

    cd <your-app-output-directory>
  • Create inventory.json with the following content:

    {"id":1, "productName":"Thingy A", "description":"awesome thing.", "priority":"high", "numViews":0}
    {"id":2, "productName":"Thingy B", "description":"available at","priority":null, "numViews":0}
    {"id":3, "productName":"Thingy C", "description": null, "priority":"low", "numViews":5}
    {"id":4, "productName":"Thingy D", "description": "checkout", "priority":"low","numViews": 10}
    {"id":5, "productName":"Thingy E", "description":null, "priority":"high","numViews": 12}
  • Run your app.

    spark-submit \
        --class org.apache.spark.deploy.dotnet.DotnetRunner \
        --master local \
        microsoft-spark-2.4.x-<version>.jar \
    dotnet DeequExample.dll

    Note: spark-submit is only available if Apache Spark is on your PATH environment variable. For detailed instructions, see Building .NET for Apache Spark from Source on Ubuntu.

  • The output of the application should look similar to the output below:

         _                         _   _ ______ _______
        | |                       | \ | |  ____|__   __|
      __| | ___  ___  __ _ _   _  |  \| | |__     | |
     / _` |/ _ \/ _ \/ _` | | | | | . ` |  __|    | |
    | (_| |  __/  __/ (_| | |_| |_| |\  | |____   | |
     \__,_|\___|\___|\__, |\__,_(_)_| \_|______|  |_|
                        | |
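
The jar name in the run step depends on the installed Spark and Microsoft.Spark versions, so it can be convenient to look it up rather than type it. The following is a minimal sketch under stated assumptions: `find_spark_jar` and `print_submit_command` are hypothetical helpers, the jar is assumed to match `microsoft-spark-*.jar` in the given directory, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the first microsoft-spark jar found in a directory.
find_spark_jar() {
    for jar in "$1"/microsoft-spark-*.jar; do
        # An unmatched glob stays literal, so confirm the file really exists.
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "no microsoft-spark jar found in $1" >&2
    return 1
}

# Print (rather than run) the spark-submit command for the example app,
# using the same class, master and application arguments as the run step.
print_submit_command() {
    jar=$(find_spark_jar "$1") || return 1
    echo "spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local $jar dotnet DeequExample.dll"
}
```

Running `print_submit_command .` from the application output directory prints the full command, which can be reviewed before it is executed.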

More examples

The following list shows more examples/showcases of the deequ.NET API:



Sebastian Schelter, Dustin Lange, Philipp Schmidt, Meltem Celikel, Felix Biessmann, and Andreas Grafberger. 2018. Automating large-scale data quality verification. Proc. VLDB Endow. 11, 12 (August 2018), 1781-1794.

