A database fuzzer (structured + black-box) for Apache Iceberg and other file-format readers
FuzzBerg was built to secure the launch of Firebolt Core and `READ_ICEBERG`, and helped us overcome the challenges of fuzzing complex database interfaces such as Table-Valued Functions (TVFs) and `COPY FROM`.
It quickly proved its worth by discovering 5 critical bugs across all our TVF formats, including `READ_ICEBERG`.
- Fuzz data ingestion interfaces (e.g., `COPY FROM` and TVFs: `read_iceberg()`, `read_csv()`, `read_parquet()`)
- No need to write/maintain unit-level harnesses
- Currently supported formats: `Iceberg`, `CSV`, `Parquet`
- Easily extensible for new targets and file-formats
Note: Iceberg fuzzing is currently supported for S3-based readers only. Use a compatible S3 interface such as Minio to fuzz on Linux platforms.
Mutations are both structure-aware and randomised with libRadamsa (no coverage guidance), seeded by a Mersenne Twister PRNG.
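To make the structure-aware side of this concrete, below is a minimal sketch of keyed value mutation driven by a Mersenne Twister PRNG, in the spirit of the metadata mutations shown in the example output further down. The function name, boundary-value set, and seed are illustrative assumptions, not FuzzBerg's actual code.

```cpp
// Hypothetical sketch: pick a replacement value for a known metadata key
// (e.g. "current-snapshot-id") using a Mersenne Twister PRNG. The names and
// the boundary-value set below are illustrative, not FuzzBerg's real code.
#include <cstddef>
#include <random>
#include <string>
#include <vector>

std::string MutateMetadataValue(std::mt19937_64& rng) {
    // Boundary values that commonly expose overflow and validation bugs.
    static const std::vector<std::string> kBoundaryValues = {
        "-1",
        "0",
        "9223372036854775807",                       // INT64_MAX
        "170141183460469231731687303715884105727",   // 2^127 - 1
    };
    std::uniform_int_distribution<std::size_t> pick(0, kBoundaryValues.size() - 1);
    return kBoundaryValues[pick(rng)];
}

int main() {
    std::mt19937_64 rng(0xF00D);   // fixed seed keeps a run reproducible
    std::string mutated = MutateMetadataValue(rng);
    (void)mutated;                 // splice into the target JSON key here
    return 0;
}
```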
To add a new fuzzing target:

- Place target code under `src/Databases/<database>.{cpp,h}`
- Add `<database>.cpp` to `CMakeLists.txt`
- Implement a target DB class and override the following base interfaces (see the sketch after this list):
  - `DatabaseHandler::ForkTarget()`: launch the target as a child of the fuzzer
  - `DatabaseHandler::fuzz()`: call the relevant file-format fuzzer
- Create a JSON file under `queries/<database>/*.json` listing relevant queries for your target.
  - Only add queries for file-formats currently supported by the fuzzer (`CSV`, `Parquet`, `Iceberg`).
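As an illustration, here is a minimal sketch of such a target class, assuming a virtual `DatabaseHandler` base that exposes the two hooks above; the header path, signatures, and helper comments are assumptions, so mirror an existing handler under `src/Databases/` for the real interface.

```cpp
// src/Databases/mydb.h -- illustrative sketch only; actual base-class
// signatures may differ, so follow an existing handler in src/Databases/.
#pragma once
#include "DatabaseHandler.h"

class MyDbHandler : public DatabaseHandler {
public:
    // Launch the target server as a child process of the fuzzer, e.g. by
    // fork()/exec()-ing the binary passed via --bin and waiting until the
    // endpoint given by --url accepts connections.
    void ForkTarget() override {
        // fork() + exec() the target, then poll the server URL until ready
    }

    // Dispatch to the relevant file-format fuzzer (CSV, Parquet, or Iceberg,
    // selected via --format), replaying the queries loaded from --queries.
    void fuzz() override {
        // hand off to the CSV/Parquet/Iceberg fuzzing loop
    }
};
```

The matching `mydb.cpp` would then be added to `CMakeLists.txt` as described above.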
- Install `libcurl4-openssl-dev` (Ubuntu/Debian). See details.
- Build with CMake & Ninja:

```bash
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -G Ninja ../
ninja -j<N> fuzzberg
```
Note: For efficient fuzzing, compile your target with AddressSanitizer. Fuzzing a `Release` build is also recommended (where invariants like `DCHECK` are compiled out).
FuzzBerg is released under the Apache License 2.0. See the LICENSE file for details.
```
Usage: ./fuzzberg [OPTIONS]

Required:
  -d, --database NAME       Database name (e.g., duckdb, firebolt)
  -f, --format FORMAT       File format (csv, parquet, iceberg)
  -u, --url URL             Database server URL
  -i, --input DIR           Input corpus directory
  -o, --output DIR          Output (crash) directory
  -b, --bin PATH            Path to the target binary
  -m, --mutate FILE         Mutation payload file
  -q, --queries FILE        JSON file containing queries (see queries/<database>/*.json)

Optional:
  -t, --auth TOKEN          Authentication token (JWT)
  -B, --bucket BUCKET_NAME  S3 bucket name for Iceberg (required if --format=iceberg)
```
Firebolt Core `READ_ICEBERG()`

```bash
./fuzzberg \
  -i ./corpus_iceberg \
  -o ./crash \
  --database=firebolt \
  --bucket iceberg-fuzzing \
  --format=iceberg \
  --url=http://localhost:3473 \
  -m /data/minio/iceberg-fuzzing/metadata \
  -q fb_core_iceberg.json \
  -b ./firebolt-core
```
```
Adding query: SELECT * FROM READ_ICEBERG(url => 's3://iceberg-fuzzing/metadata/v3.metadata.json');
Loaded 1 queries from ./fb_core_iceberg.json
Checking connection to server...
starting up
...
******** Starting structured metadata fuzzing *********
Key: "current-snapshot-id", Original Value: 4676137652994606811, Mutated Value: 170141183460469231731687303715884105727
Query : SELECT * FROM READ_ICEBERG(url => 's3://iceberg-fuzzing/metadata/v3.metadata.json');
Response: {
"errors": [
{
"description": "Exception: Value too large."
}
],
"query": {
"query_id": "c1c6a6c5-c612-438d-a574-ecc563303247",
"query_label": null,
"request_id": "54bd0463-c45b-448d-82ea-efd487c95e6e"
},
"statistics": {
"elapsed": 0.0
}
}
Key: "current-schema-id", Original Value: 0, Mutated Value: 128
Query : SELECT * FROM READ_ICEBERG(url => 's3://iceberg-fuzzing/metadata/v3.metadata.json');
Response: {
"errors": [
{
"description": "There is no schema with \"schema-id\" that matches \"current-schema-id\" in metadata"
}
],
"query": {
"query_id": "28a2b98f-e599-4310-953e-372f00732aa0",
"query_label": null,
"request_id": "2ce997c2-5a20-43cc-88bf-7b78f5cae5a7"
},
"statistics": {
"elapsed": 0.0
}
}
```
Firebolt Core `READ_PARQUET()`

```bash
./fuzzberg \
  -i ./corpus_parquet \
  -o ./crash \
  --database=firebolt \
  --format=parquet \
  --url=http://localhost:3473 \
  -m /data/minio/black-box-fuzzer/ \
  -q fb_core_parquet.json \
  -b ./firebolt-core
```
```
Query : SELECT * FROM READ_PARQUET(url => 's3://black-box-fuzzer/fuzz.parquet');
Response: {
"errors": [
{
"description": "Error reading column 'l_partkey' in row group 0 of 's3://black-box-fuzzer/fuzz.parquet': IOError: Corrupt snappy compressed data."
}
],
"query": {
"query_id": "63be960d-b218-41b6-afa1-dd5590d2d781",
"query_label": null,
"request_id": "b0404488-579f-41d3-b8cd-6e6f30fe2689"
},
"statistics": {
"elapsed": 0.016309347
}
}
```
DuckDB `read_csv()` (with HTTP Server Extension)
```bash
./fuzzberg \
  -i ./corpus/csv \
  -o ./crash \
  --database=duckdb \
  --format=csv \
  --url=http://localhost:9999 \
  -m /tmp \
  -q duckdb_csv.json \
  -b ./duckdb-extension-httpserver/build/release/duckdb \
  -- \
  --ascii \
  --init /home/ubuntu/ddb/duckdb/init.sql \
  --batch
```
```
Adding query: SELECT * FROM read_csv('/tmp/fuzz.csv');
Adding query: SELECT * FROM read_csv('/tmp/fuzz.csv',header = true,delim = '|',allow_quoted_nulls = false, ignore_errors=false);
Loaded 2 queries from queries/duckdb_csv.json
Checking connection to server...
┌──────────────────────────────────────┐
│ httpserve_start('0.0.0.0', 9999, '') │
│ varchar │
├──────────────────────────────────────┤
│ HTTP server started on 0.0.0.0:9999 │
└──────────────────────────────────────┘
Query : SELECT * FROM read_csv('/tmp/fuzz.csv',header = true,delim = '|',allow_quoted_nulls = false, ignore_errors=false);
Response: {"c9223372036854775809,c2,c3,c5,c5,c6,c7,c128,c9,c10,c11,c12,c13,c14,c15":"t,2�,,,,,,�,,,,,I ,,,c4294967296,c6,c212,c8,c9,c10,c�,c12,c13"}
{"c9223372036854775809,c2,c3,c5,c5,c6,c7,c128,c9,c10,c11,c12,c13,c14,c15":"e,QrUe,10,100,-32642,-263749625369741"}
Query : SELECT * FROM read_csv('/tmp/fuzz.csv');
Response: Invalid Input Error: CSV Error on Line: 1
Invalid unicode (byte sequence mismatch) detected. This file is not utf-8 encoded.
Possible Solution: Set the correct encoding, if available, to read this CSV File (e.g., encoding='UTF-16')
....
```
If you discover a bug, please report it via GitHub Issues or contact the maintainers directly.