FuzzBerg is a database fuzzer for Iceberg and other file-format readers. It holistically fuzzes query-driven ingestion interfaces such as COPY FROM and table-valued functions (TVFs).

firebolt-db/FuzzBerg


FuzzBerg

A database fuzzer (structured + black-box) for Apache Iceberg and other file-format readers


Description

FuzzBerg was built to secure the launch of Firebolt Core and READ_ICEBERG. It helped us overcome the challenges of fuzzing complex database interfaces such as table-valued functions (TVFs) and COPY FROM.

It quickly proved its worth by discovering 5 critical bugs across all our TVF formats, including READ_ICEBERG.


Features


  • Fuzz data ingestion interfaces (e.g., COPY FROM, TVFs: read_iceberg(), read_csv(), read_parquet())
  • No need to write/maintain unit-level harnesses
  • Currently supported formats: Iceberg, CSV, Parquet
  • Easily extensible for new targets and file-formats

Note: Iceberg fuzzing is currently supported for S3-based readers only. To fuzz on Linux platforms, use an S3-compatible interface such as MinIO.

Mutations are both structure-aware and randomised. Randomised mutations use libRadamsa (no coverage guidance) and are seeded by a Mersenne Twister PRNG.
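
To illustrate why a seeded PRNG matters, here is a minimal sketch of a reproducible random mutation. It is not FuzzBerg's mutator and does not call libRadamsa: the function name FlipRandomBits and the bit-flip strategy are assumptions chosen for illustration. The only idea taken from the text above is that a Mersenne Twister seed drives the mutation, so the same seed reproduces the same mutated input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not part of FuzzBerg): flips one random bit in each of
// `flips` randomly chosen bytes. The same (data, seed, flips) triple always
// yields the same output, which makes a crashing input reproducible.
std::vector<std::uint8_t> FlipRandomBits(std::vector<std::uint8_t> data,
                                         std::uint32_t seed,
                                         std::size_t flips) {
  assert(!data.empty() && "nothing to mutate");
  std::mt19937 rng(seed);  // Mersenne Twister engine from <random>
  std::uniform_int_distribution<std::size_t> pick_byte(0, data.size() - 1);
  std::uniform_int_distribution<int> pick_bit(0, 7);
  for (std::size_t i = 0; i < flips; ++i) {
    const std::size_t index = pick_byte(rng);
    const int bit = pick_bit(rng);
    data[index] ^= static_cast<std::uint8_t>(1u << bit);
  }
  return data;
}
```

Recording the seed next to each saved crash would let the exact mutated file be regenerated from the original corpus entry.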


Fuzz Your Database


  • Place target code under src/Databases/<database>.{cpp,h}
  • Add <database>.cpp to CMakeLists.txt
  • Implement a target DB class, and override the following base interfaces:
    • DatabaseHandler::ForkTarget() : to launch target as a child of the fuzzer
    • DatabaseHandler::fuzz() : call the relevant file-format fuzzer
  • Create a JSON file under queries/<database>/*.json listing relevant queries for your target.
    • Only add queries for file-formats currently supported by the fuzzer (CSV, Parquet, Iceberg).
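
The steps above can be sketched roughly as follows. The DatabaseHandler class shown here is a hypothetical stand-in written for this example: the real base class in the repository may declare ForkTarget() and fuzz() with different parameters and return types, so check the project headers before copying anything. ExampleDbHandler and last_format() are likewise invented names. The sketch assumes a POSIX system (fork and execv).

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for the project's base class; real signatures may differ.
class DatabaseHandler {
 public:
  virtual ~DatabaseHandler() = default;
  virtual pid_t ForkTarget(const std::string& bin,
                           const std::vector<std::string>& args) = 0;
  virtual bool fuzz(const std::string& format) = 0;
};

class ExampleDbHandler : public DatabaseHandler {
 public:
  // Launches the target binary as a child process of the fuzzer.
  // Returns the child's pid in the parent, or -1 if fork() failed.
  pid_t ForkTarget(const std::string& bin,
                   const std::vector<std::string>& args) override {
    assert(!bin.empty());
    std::vector<char*> argv;
    argv.push_back(const_cast<char*>(bin.c_str()));
    for (const std::string& arg : args) {
      argv.push_back(const_cast<char*>(arg.c_str()));
    }
    argv.push_back(nullptr);  // execv expects a null-terminated array
    const pid_t pid = fork();
    if (pid == 0) {
      execv(bin.c_str(), argv.data());
      _exit(127);  // only reached if execv failed
    }
    return pid;
  }

  // Dispatches to a file-format fuzzer. This sketch only records the choice;
  // a real handler would call the matching fuzzer here.
  bool fuzz(const std::string& format) override {
    if (format == "csv" || format == "parquet" || format == "iceberg") {
      last_format_ = format;
      return true;
    }
    return false;  // format not supported by the fuzzer
  }

  const std::string& last_format() const { return last_format_; }

 private:
  std::string last_format_;
};
```

Keeping the target as a direct child lets the fuzzer observe its exit status, for example with waitpid(), when a mutated file brings the server down.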

Build Instructions


  1. Install libcurl4-openssl-dev (Ubuntu/Debian). See details.
  2. Build with CMake & Ninja:
    mkdir build && cd build
    cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -G Ninja ../
    ninja -j<N> fuzzberg

Note: For efficient fuzzing, compile your target with AddressSanitizer. Fuzzing a Release build is also recommended, since invariant checks such as DCHECK are compiled out.


License

FuzzBerg is released under the Apache License 2.0. See the LICENSE file for details.


Usage


Usage: ./fuzzberg [OPTIONS]

Required:
  -d, --database NAME         Database name (e.g., duckdb, firebolt)
  -f, --format FORMAT         File format (csv, parquet, iceberg)
  -u, --url URL               Database server URL
  -i, --input DIR             Input corpus directory
  -o, --output DIR            Output (crash) directory
  -b, --bin PATH              Path to the target binary
  -m, --mutate FILE           Mutation payload file
  -q, --queries FILE          JSON file containing queries (see queries/<database>/*.json)
Optional:
  -t, --auth TOKEN            Authentication token (JWT)
  -B, --bucket BUCKET_NAME    S3 bucket name for Iceberg (required if --format=iceberg)
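
The dependency between --format and --bucket in the usage text can be expressed as a small check. This is a sketch of the rule only, not FuzzBerg's argument parser: the Options struct, the ValidateOptions function, and the error strings are all hypothetical.

```cpp
#include <optional>
#include <string>

// Hypothetical container for the two options this rule involves.
struct Options {
  std::string format;                 // value of -f / --format
  std::optional<std::string> bucket;  // value of -B / --bucket, if given
};

// Returns an empty string when the options are consistent,
// otherwise a human-readable error message.
std::string ValidateOptions(const Options& opts) {
  if (opts.format != "csv" && opts.format != "parquet" &&
      opts.format != "iceberg") {
    return "unsupported --format: " + opts.format;
  }
  if (opts.format == "iceberg" && (!opts.bucket || opts.bucket->empty())) {
    return "--bucket is required when --format=iceberg";
  }
  return "";
}
```

Under these assumptions, Options{"iceberg", std::nullopt} is rejected, while Options{"csv", std::nullopt} passes because CSV fuzzing needs no bucket.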

Fuzzing Examples


Firebolt Core READ_ICEBERG()

./fuzzberg \
  -i ./corpus_iceberg \
  -o ./crash \
  --database=firebolt \
  --bucket iceberg-fuzzing \
  --format=iceberg \
  --url=http://localhost:3473 \
  -m /data/minio/iceberg-fuzzing/metadata \
  -q fb_core_iceberg.json \
  -b ./firebolt-core

Sample output:

Adding query: SELECT * FROM READ_ICEBERG(url => 's3://iceberg-fuzzing/metadata/v3.metadata.json');
Loaded 1 queries from ./fb_core_iceberg.json
Checking connection to server...
starting up
...

******** Starting structured metadata fuzzing *********

Key: "current-snapshot-id", Original Value: 4676137652994606811, Mutated Value: 170141183460469231731687303715884105727


Query :  SELECT * FROM READ_ICEBERG(url => 's3://iceberg-fuzzing/metadata/v3.metadata.json');

Response: {
  "errors": [
    {
      "description": "Exception: Value too large."
    }
  ],
  "query": {
    "query_id": "c1c6a6c5-c612-438d-a574-ecc563303247",
    "query_label": null,
    "request_id": "54bd0463-c45b-448d-82ea-efd487c95e6e"
  },
  "statistics": {
    "elapsed": 0.0
  }
}

Key: "current-schema-id", Original Value: 0, Mutated Value: 128


Query :  SELECT * FROM READ_ICEBERG(url => 's3://iceberg-fuzzing/metadata/v3.metadata.json');

Response: {
  "errors": [
    {
      "description": "There is no schema with \"schema-id\" that matches \"current-schema-id\" in metadata"
    }
  ],
  "query": {
    "query_id": "28a2b98f-e599-4310-953e-372f00732aa0",
    "query_label": null,
    "request_id": "2ce997c2-5a20-43cc-88bf-7b78f5cae5a7"
  },
  "statistics": {
    "elapsed": 0.0
  }
}
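
The sample output above shows integer metadata fields being replaced with boundary values (128, and 170141183460469231731687303715884105727, which is 2^127 - 1). A minimal sketch of that style of structure-aware mutation follows. It is an illustration under stated assumptions, not FuzzBerg's code: the function names, the boundary list, and the plain text scan (used here instead of a JSON parser) are all hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: replaces the integer stored under `key` in JSON text.
// Returns false, leaving `json` untouched, if the key or an integer value is
// not found. Uses a simple text scan, so it only suits illustrative inputs.
bool ReplaceIntegerValue(std::string& json, const std::string& key,
                         const std::string& replacement) {
  const std::string needle = "\"" + key + "\"";
  std::size_t pos = json.find(needle);
  if (pos == std::string::npos) return false;
  pos = json.find(':', pos + needle.size());
  if (pos == std::string::npos) return false;
  ++pos;
  while (pos < json.size() &&
         std::isspace(static_cast<unsigned char>(json[pos]))) {
    ++pos;
  }
  std::size_t end = pos;
  if (end < json.size() && json[end] == '-') ++end;
  while (end < json.size() &&
         std::isdigit(static_cast<unsigned char>(json[end]))) {
    ++end;
  }
  // Reject an empty span or a lone minus sign.
  if (end == pos || (json[pos] == '-' && end == pos + 1)) return false;
  json.replace(pos, end - pos, replacement);
  return true;
}

// Picks one boundary value using a Mersenne Twister engine.
// The list is an assumption; values are strings so they can exceed 64 bits.
std::string PickBoundaryValue(std::mt19937& rng) {
  static const std::vector<std::string> values = {
      "-1", "0", "128", "2147483648", "9223372036854775808",
      "170141183460469231731687303715884105727"};
  assert(!values.empty());
  std::uniform_int_distribution<std::size_t> dist(0, values.size() - 1);
  return values[dist(rng)];
}
```

Keeping the rest of the metadata file valid lets the mutated value reach the reader's deeper validation logic, which is where the "Value too large" and schema-id errors shown above originate.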

Firebolt Core READ_PARQUET()

./fuzzberg \
  -i ./corpus_parquet \
  -o ./crash \
  --database=firebolt \
  --format=parquet \
  --url=http://localhost:3473 \
  -m /data/minio/black-box-fuzzer/ \
  -q fb_core_parquet.json \
  -b ./firebolt-core

Sample output:

Query : SELECT * FROM READ_PARQUET(url => 's3://black-box-fuzzer/fuzz.parquet');

Response: {
  "errors": [
    {
      "description": "Error reading column 'l_partkey' in row group 0 of 's3://black-box-fuzzer/fuzz.parquet': IOError: Corrupt snappy compressed data."
    }
  ],
  "query": {
    "query_id": "63be960d-b218-41b6-afa1-dd5590d2d781",
    "query_label": null,
    "request_id": "b0404488-579f-41d3-b8cd-6e6f30fe2689"
  },
  "statistics": {
    "elapsed": 0.016309347
  }
}

DuckDB read_csv() (with HTTP Server Extension)

./fuzzberg \
  -i ./corpus/csv \
  -o ./crash \
  --database=duckdb \
  --format=csv \
  --url=http://localhost:9999 \
  -m /tmp \
  -q duckdb_csv.json \
  -b ./duckdb-extension-httpserver/build/release/duckdb \
  -- \
  --ascii \
  --init /home/ubuntu/ddb/duckdb/init.sql \
  --batch

Sample output:

Adding query: SELECT * FROM read_csv('/tmp/fuzz.csv');

Adding query: SELECT * FROM read_csv('/tmp/fuzz.csv',header = true,delim = '|',allow_quoted_nulls = false, ignore_errors=false);

Loaded 2 queries from queries/duckdb_csv.json

Checking connection to server...

┌──────────────────────────────────────┐
│ httpserve_start('0.0.0.0', 9999, '') │
│               varchar                │
├──────────────────────────────────────┤
│ HTTP server started on 0.0.0.0:9999  │
└──────────────────────────────────────┘


Query : SELECT * FROM read_csv('/tmp/fuzz.csv',header = true,delim = '|',allow_quoted_nulls = false, ignore_errors=false);

Response: {"c9223372036854775809,c2,c3,c5,c5,c6,c7,c128,c9,c10,c11,c12,c13,c14,c15":"t,2�,,,,,,�,,,,,I ,,,c4294967296,c6,c212,c8,c9,c10,c�,c12,c13"}
{"c9223372036854775809,c2,c3,c5,c5,c6,c7,c128,c9,c10,c11,c12,c13,c14,c15":"e,QrUe,10,100,-32642,-263749625369741"}


Query : SELECT * FROM read_csv('/tmp/fuzz.csv');

Response: Invalid Input Error: CSV Error on Line: 1
Invalid unicode (byte sequence mismatch) detected. This file is not utf-8 encoded.

Possible Solution: Set the correct encoding, if available, to read this CSV File (e.g., encoding='UTF-16')
....

Reporting Bugs


If you discover a bug, please report it via GitHub Issues or contact the maintainers directly.
