size_t underflow in COVER_ctx_init when ZDICT_optimizeTrainFromBuffer_cover invoked with params.d=0 leads to multi-EB malloc #4682

@OwenSanzas

Summary

COVER_ctx_init() in lib/dictBuilder/cover.c computes a partial-suffix-array
size as trainingSamplesSize - MAX(d, sizeof(U64)) + 1 (a size_t subtraction)
without first checking that trainingSamplesSize is large enough. The function
does guard against an undersized corpus, but it guards on totalSamplesSize,
while it uses trainingSamplesSize (a fraction controlled by params.splitPoint)
for the subtraction. With splitPoint < 1.0 and a small corpus the two
quantities diverge and the subtraction wraps around; in the PoC below it
yields (size_t)-2 (= 0xfffffffffffffffe). Multiplied by sizeof(U32), the next
malloc request becomes 0xfffffffffffffff8 bytes, roughly 16 EB, which ASan
rejects as allocation-size-too-big and which in production builds returns
NULL (followed by a SEGV at the next dereference).

The path is most easily reached from ZDICT_optimizeTrainFromBuffer_cover,
because that entry point documents params.d == 0 (and params.k == 0) as
"let optimize pick" and then iterates internally over candidate d values.
A library consumer that forwards zero-initialized ZDICT_cover_params_t to
optimize — exactly what the documentation invites — triggers the bug as soon
as the corpus is small enough.

The zstd CLI is not affected: --train-cover validates and clamps cover
parameters before calling the API.

Root Cause

lib/dictBuilder/cover.c, COVER_ctx_init() (function defined at line 628).

The sanity check on the corpus uses totalSamplesSize:

// lib/dictBuilder/cover.c:641
  if (totalSamplesSize < MAX(d, sizeof(U64)) ||
      totalSamplesSize >= (size_t)COVER_MAX_SAMPLES_SIZE) {
    ...
    return ERROR(srcSize_wrong);
  }

But the subtraction immediately below operates on trainingSamplesSize, which
is a fraction (splitPoint) of totalSamplesSize:

// lib/dictBuilder/cover.c:669-670
  ctx->suffixSize = trainingSamplesSize - MAX(d, sizeof(U64)) + 1;
  ctx->suffix = (U32 *)malloc(ctx->suffixSize * sizeof(U32));

When splitPoint is below 1.0 (0.5 in the PoC; optimize only substitutes
COVER_DEFAULT_SPLITPOINT when the caller passes splitPoint <= 0.0) and the
corpus is small, trainingSamplesSize can fall below MAX(d, sizeof(U64)) even
though totalSamplesSize passes the check. The subtraction wraps to
(size_t)-N, and malloc() is called with (size_t)-N * sizeof(U32) ≈ 16 EB.

How params.d == 0 makes this trivial to hit:
ZDICT_optimizeTrainFromBuffer_cover (cover.c:1197) maps a zero params.d
to a search range:

// lib/dictBuilder/cover.c:1206-1207
  const unsigned kMinD = parameters->d == 0 ? 6 : parameters->d;
  const unsigned kMaxD = parameters->d == 0 ? 8 : parameters->d;

and then iterates for (d = kMinD; d <= kMaxD; d += 2) (line 1254), calling
COVER_ctx_init(..., d, splitPoint, ...) (line 1261) for each candidate. There
is no precondition on the corpus size before the loop, so a caller that hands
optimize a tiny sample set — perfectly valid as far as the documented "let
optimize pick" contract is concerned — drives the inner function into the
underflowing branch.

PoC

Trigger file

crash_input is a 24-byte blob consumed by the AGF harness. The harness
parses the bytes as [8-byte header][16 bytes of sample data], then derives
nbSamples (≥5), per-sample sizes (summing to 16), and the cover parameters
from the header bits. The specific decoded shape that crashes:

  • nbSamples = 10, short samples whose sizes sum to 16; the five training
    samples total fewer than 8 bytes
  • params.d = 0 (zero-initialized — selects the optimize search)
  • params.k = 0 (likewise)
  • params.splitPoint = 0.5
  • dictBufferCapacity = 1360

The header bit pattern leaves d and k at zero so the harness enters the
optimize code path with a very small corpus.

How to generate

Any input that ends up calling ZDICT_optimizeTrainFromBuffer_cover with

  • nbSamples ≥ 5
  • per-sample sizes such that the total passes the existing check (at least
    8 bytes) while COVER_sum(first nbTrainSamples) < 8 (e.g. ten 1-byte
    samples with splitPoint = 0.5)
  • zero-initialized params.d and params.k

will reproduce. Minimal recipe: feed N=10 one-byte samples to optimize with
zeroed params (see app_realworld.c below).


Trigger Method 1: Direct libzstd API

app_realworld.c — a minimal libzstd consumer that calls the documented
optimize entry point with zero-initialized parameters and ten one-byte samples
(a realistic edge case: training on very short log lines).

/* app_realworld.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ZDICT_STATIC_LINKING_ONLY
#include <zdict.h>

int main(void)
{
    const size_t N = 10;
    unsigned char samples[N];
    size_t sizes[N];
    for (size_t i = 0; i < N; ++i) {
        samples[i] = (unsigned char)('A' + i);
        sizes[i]   = 1;
    }

    const size_t dictCap = 1360;
    void *dict = malloc(dictCap);
    if (dict == NULL) return 1;   /* bail out if the dictionary buffer itself fails */

    ZDICT_cover_params_t params;
    memset(&params, 0, sizeof(params));
    /* Documented as "let optimize pick" — but underflows downstream. */
    params.d          = 0;
    params.k          = 0;
    params.steps      = 0;
    params.nbThreads  = 1;
    params.splitPoint = 0.50;
    params.zParams.compressionLevel = 0;

    size_t r = ZDICT_optimizeTrainFromBuffer_cover(
        dict, dictCap, samples, sizes, (unsigned)N, &params);
    fprintf(stderr, "ZDICT_optimizeTrainFromBuffer_cover returned: %zu (isError=%d)\n",
            r, ZDICT_isError(r));
    free(dict);
    return 0;
}

Build:

ZSTD_SRC=/path/to/zstd
clang -O1 -g -fsanitize=address \
    -I"$ZSTD_SRC/lib" -I"$ZSTD_SRC/lib/dictBuilder" \
    -DZDICT_STATIC_LINKING_ONLY \
    app_realworld.c "$ZSTD_SRC/lib/libzstd.a" -pthread -o app_realworld

Run:

ASAN_OPTIONS=detect_leaks=0 ./app_realworld

Output:

==ERROR: AddressSanitizer: requested allocation size 0xfffffffffffffff8
   exceeds maximum supported size of 0x10000000000
    #0 0x...... in malloc
    #1 0x...... in COVER_ctx_init lib/dictBuilder/cover.c:670:24
    #2 0x...... in ZDICT_optimizeTrainFromBuffer_cover lib/dictBuilder/cover.c:1261
    #3 0x...... in main app_realworld.c:...
SUMMARY: AddressSanitizer: allocation-size-too-big in malloc

Without ASan, malloc() returns NULL and the next access in COVER_ctx_init
SEGVs.


Trigger Method 2: Fuzzer (AGF harness)

simple_compress.c — the libFuzzer harness used by AGF. It deterministically
parses the input into samples and cover parameters and calls
ZDICT_optimizeTrainFromBuffer_cover.

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ZDICT_STATIC_LINKING_ONLY
#include "zdict.h"

#define MIN_DICT_CAPACITY 256U
#define MAX_DICT_CAPACITY 2048U
#define MAX_SAMPLE_BYTES 4096U
#define MAX_NB_SAMPLES 32U
#define HEADER_SIZE 8U

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

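#ifdef __cplusplus
extern "C"   /* the build command below uses clang++; libFuzzer needs C linkage */
#endif
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)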
{
    if (size < HEADER_SIZE + 8) return 0;

    size_t sampleBytes = min_size(size - HEADER_SIZE, MAX_SAMPLE_BYTES);
    if (sampleBytes < 8) return 0;
    const uint8_t *samplesBuffer = data + HEADER_SIZE;

    unsigned nbSamples = 5U + (unsigned)(data[0] % (MAX_NB_SAMPLES - 4U));
    if ((size_t)nbSamples > sampleBytes) nbSamples = (unsigned)sampleBytes;
    if (nbSamples < 5U) return 0;

    size_t dictBufferCapacity = MIN_DICT_CAPACITY
        + ((size_t)data[1] | ((size_t)data[2] << 8))
          % (MAX_DICT_CAPACITY - MIN_DICT_CAPACITY + 1U);

    void *dictBuffer = malloc(dictBufferCapacity);
    size_t *samplesSizes = (size_t *)malloc((size_t)nbSamples * sizeof(samplesSizes[0]));
    if (!dictBuffer || !samplesSizes) { free(dictBuffer); free(samplesSizes); return 0; }

    /* Build nbSamples non-empty samples whose sizes sum to sampleBytes. */
    size_t remaining = sampleBytes - nbSamples;
    for (unsigned i = 0; i + 1U < nbSamples; ++i) {
        size_t add = remaining
            ? (size_t)data[HEADER_SIZE + (i % sampleBytes)] % (remaining + 1U)
            : 0;
        samplesSizes[i] = 1U + add;
        remaining -= add;
    }
    samplesSizes[nbSamples - 1U] = 1U + remaining;

    ZDICT_cover_params_t params;
    memset(&params, 0, sizeof(params));
    if ((data[3] & 1U) == 0)
        params.d = 6U + 2U * (unsigned)((data[4] >> 4) % 3U);   /* otherwise 0 */
    if ((data[3] & 2U) == 0) {
        unsigned minK = params.d == 0 ? 8U : params.d;
        unsigned range = (unsigned)(dictBufferCapacity - minK + 1U);
        params.k = minK + (unsigned)(data[4] % range);
    }
    if ((data[3] & 4U) == 0) params.steps = 1U + (unsigned)(data[5] % 4U);
    params.nbThreads = 1U;
    if ((data[3] & 8U) != 0 && nbSamples >= 10U)
        params.splitPoint = 0.50 + ((double)(data[6] % 51U) / 100.0);
    else if ((data[3] & 16U) != 0)
        params.splitPoint = 1.0;
    params.zParams.compressionLevel = (int)(data[7] % 10U);

    (void)ZDICT_optimizeTrainFromBuffer_cover(dictBuffer, dictBufferCapacity,
                                              samplesBuffer, samplesSizes,
                                              nbSamples, &params);
    free(samplesSizes);
    free(dictBuffer);
    return 0;
}

Build:

clang++ -fsanitize=fuzzer,address -g -O1 \
    -I"$ZSTD_SRC/lib" -I"$ZSTD_SRC/lib/dictBuilder" \
    -DZDICT_STATIC_LINKING_ONLY \
    simple_compress.c "$ZSTD_SRC/lib/libzstd.a" -pthread -o cover_fuzzer

Run on crash_input:

ASAN_OPTIONS=detect_leaks=0 ./cover_fuzzer crash_input

Same allocation-size-too-big at lib/dictBuilder/cover.c:670 in
COVER_ctx_init.


Impact

| Aspect | Details |
| --- | --- |
| Type | Denial of Service (~16 EB malloc; abort under ASan, NULL-deref / SEGV otherwise) |
| Severity | Low |
| Attack Vector | Local; library consumer asks for dictionary optimization via the documented "auto" mode (params.d == 0 / params.k == 0) on a small corpus |
| Affected Components | libzstd dictBuilder (lib/dictBuilder/cover.c, ZDICT_optimizeTrainFromBuffer_cover → COVER_ctx_init). Not the zstd CLI: --train-cover validates parameters before the API call |
| Reachability | Programmatic only: applications that pass zero-initialized ZDICT_cover_params_t (or any small d paired with a small corpus and splitPoint < 1.0) into the optimize entry point |
| CWE | CWE-190 (Integer Overflow/Underflow) and CWE-789 (Memory Allocation with Excessive Size Value) |

Suggested Fix

The cleanest local fix is in COVER_ctx_init itself: gate the subtraction on
the same quantity that participates in it. In lib/dictBuilder/cover.c around
line 641, change the precondition from totalSamplesSize to
trainingSamplesSize (or add a second check), so the function bails out with
ERROR(srcSize_wrong) instead of underflowing at line 669:

/* lib/dictBuilder/cover.c, COVER_ctx_init */
  if (trainingSamplesSize < MAX(d, sizeof(U64)) ||
      totalSamplesSize    < MAX(d, sizeof(U64)) ||
      totalSamplesSize    >= (size_t)COVER_MAX_SAMPLES_SIZE) {
    DISPLAYLEVEL(1, "Total samples size is too small/large (...)");
    return ERROR(srcSize_wrong);
  }
  ...
  ctx->suffixSize = trainingSamplesSize - MAX(d, sizeof(U64)) + 1;

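Equivalently, guard the subtraction directly. The comparison must not add 1
to the threshold, otherwise the valid boundary case
trainingSamplesSize == MAX(d, sizeof(U64)) (suffixSize == 1) is rejected:

  if (trainingSamplesSize < MAX(d, sizeof(U64)))
    return ERROR(srcSize_wrong);
  ctx->suffixSize = trainingSamplesSize - MAX(d, sizeof(U64)) + 1;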

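A complementary hardening at the API boundary, in
ZDICT_optimizeTrainFromBuffer_cover (cover.c:1197), would be to require
that the training portion (the sum of the first nbTrainSamples sample sizes)
is at least MAX(kMaxD, sizeof(U64)) before entering the
for (d = kMinD; d <= kMaxD; d += 2) loop, so the "auto" mode is rejected up
front when the corpus is too small to support any candidate d. Comparing
COVER_sum(samplesSizes, nbSamples) * splitPoint against kMaxD alone is not
enough: the split is taken by sample count rather than by bytes, and the
subtraction uses sizeof(U64) as a floor.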

Both changes are small and local; either alone closes the underflow.


Credit

Aisle Research (Ze Sheng, Dmitrijs Trizna, Luigino Camastra, Guido Vranken)
