[c++] avrogencpp: emit deterministic include guard #399

Merged
travisdowns merged 2 commits into master from td-stable-avrogen-include-guard
May 15, 2026

@travisdowns
Member

Summary

CodeGen::guard() (lang/c++/impl/avrogencpp.cc:770) was suffixing the generated header's include guard with the output of boost::mt19937 seeded from ::time(nullptr) — producing a different guard on every invocation:

#ifndef FOO_AVROGEN_H_3350718792_H
#ifndef FOO_AVROGEN_H_2362587291_H

This makes generated headers non-deterministic by design, which is a problem for build systems that key their cache on input-content digests (Bazel remote cache, Nix store paths, etc.). Even when the schema is byte-identical, every consumer of the generated header gets a new action key on each invocation, forcing the entire downstream chain to rebuild.
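
To make the cache effect concrete, here is a minimal sketch of how a one-token difference in the guard changes a content digest. It is not how Bazel or Nix compute their keys (they use cryptographic hashes); FNV-1a stands in for the digest, and `renderHeader` and `contentDigest` are hypothetical names invented for this illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a generated header: only the guard varies.
std::string renderHeader(const std::string &guard) {
    return "#ifndef " + guard + "\n#define " + guard +
           "\n\nstruct Foo {};\n\n#endif\n";
}

// 64-bit FNV-1a over the file bytes. An assumption for illustration only:
// real content-addressed caches use a cryptographic hash such as SHA-256.
std::uint64_t contentDigest(const std::string &bytes) {
    std::uint64_t h = 14695981039346656037ULL; // FNV offset basis
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL; // FNV prime
    }
    return h;
}
```

Feeding the two guards shown above through `contentDigest(renderHeader(...))` gives two different digests for what is logically the same header, so anything keyed on that digest is treated as new input.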

Discovered while doing hermeticity work on Redpanda — manifest_file.avrogen.h showed up as a root non-hermetic action in a two-output-base bazel build comparison, then cascaded through manifest_list_avro.cc.o → iceberg lib → the //src/v/redpanda:redpanda binary, blowing the PGO-instrument cache on the overnight CI builds.

Why this is safe

headerFile_ is already a unique-per-output path (the path of the file being written). makeCanonical(h, true) turns that path into a valid C identifier, which by itself is a fine guard name. The RNG-derived suffix only added entropy, not uniqueness — there's no scenario where two avrogen runs producing headers at the same output path are supposed to coexist with conflicting guards.

After this change the guard is <canonicalised-path>_H, deterministic across builds.
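
A rough sketch of the resulting guard derivation follows. It is not the code from avrogencpp.cc: `canonicalise` is a hypothetical re-implementation that assumes letters are upper-cased and every other non-alphanumeric character becomes an underscore, and `deterministicGuard` is an illustrative name.

```cpp
#include <cctype>
#include <string>

// Hypothetical canonicalisation (assumed behaviour, not copied from
// avrogencpp.cc): upper-case letters, keep digits, and map every other
// character, including '.' and '/', to '_'.
std::string canonicalise(std::string s) {
    for (char &c : s) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalnum(u)) {
            c = static_cast<char>(std::toupper(u));
        } else {
            c = '_';
        }
    }
    return s;
}

// The guard depends only on the output path, so repeated runs with the
// same path yield the same bytes.
std::string deterministicGuard(const std::string &headerPath) {
    return canonicalise(headerPath) + "_H";
}
```

Under these assumptions, `deterministicGuard("foo.avrogen.h")` yields `FOO_AVROGEN_H_H` on every run.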

Upstream

Same change being proposed against apache/avro so users on a stock upstream cppgen pick it up too. Sister PR in redpanda-data/redpanda: #30487 (consumes the fix via a bazel patch on this branch as a stopgap until this lands).

CodeGen::guard() in avrogencpp.cc was suffixing the generated header's
include guard with the output of boost::mt19937 seeded from
::time(nullptr). That produced a different guard on every avrogen
invocation, e.g.:

  #ifndef FOO_AVROGEN_H_3350718792_H
  #ifndef FOO_AVROGEN_H_2362587291_H

Two consequences:

1. Generated headers were non-deterministic. Repeated runs on the same
   schema produced different bytes.

2. Build systems that key their cache on input-content digests
   (e.g. Bazel's remote cache, the Nix store) saw every consumer of the
   generated header miss the cache on every build, even when the schema
   was byte-identical. In a hermetic two-output-base Bazel build of
   Redpanda this surfaced as a chain of cascade rebuilds starting at
   manifest_file.avrogen.h and propagating through every .cc that
   included it.

headerFile_ is already guaranteed-unique per output. The random suffix
doesn't add uniqueness, only entropy. Mirrors the same change being
proposed upstream at apache/avro.
@travisdowns
Member Author

also upstream: apache#3778

@travisdowns
Member Author

@copilot review


Copilot AI commented May 14, 2026

> @copilot review

The change is straightforward and correct. One thing the original commit left behind: after removing the RNG suffix from guard(), the boost::mt19937 random_ member was declared and initialized but never used. I cleaned that up in 96c2f51:

  • Removed boost::mt19937 random_; member field
  • Removed random_(static_cast<uint32_t>(::time(nullptr))) from the constructor initializer list
  • Removed #include <boost/random/mersenne_twister.hpp>
  • Removed #include <ctime> (and its now-empty #ifndef _WIN32 guard) — only needed to provide ::time() for the RNG seed

Member

@dotnwat dotnwat left a comment

any clue why the random value existed? i hit a brick wall git blaming back to the 15 year old commit that added it.

@travisdowns
Member Author

> any clue why the random value existed? i hit a brick wall git blaming back to the 15 year old commit that added it.

I guess it's because you can pass any short relative path to the generator, depending on how the build works. You might have a/foo.h and b/foo.h and pass -o foo.h in both cases, with a and b being the CWD respectively, in which case the guards would clash.

In our build we pass full paths (relative to the repo root) so that shouldn't be a problem, but I guess it does mean the change may not be accepted upstream.

Another way to solve this would be to have an option to use #pragma once.
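
The clash and the #pragma once alternative can be sketched as follows. This is illustration only: `guardFor` and `headerPrologue` are hypothetical helpers, the canonicalisation rule is an assumption, and no `usePragmaOnce` option exists in avrogencpp as far as this thread shows. #pragma once is widely supported by compilers but is not part of the C++ standard.

```cpp
#include <cctype>
#include <string>

// Hypothetical guard derivation: canonicalised path plus "_H".
std::string guardFor(const std::string &outputPath) {
    std::string g = outputPath;
    for (char &c : g) {
        unsigned char u = static_cast<unsigned char>(c);
        c = std::isalnum(u) ? static_cast<char>(std::toupper(u)) : '_';
    }
    return g + "_H";
}

// Hypothetical prologue emitter with an opt-in #pragma once mode.
std::string headerPrologue(const std::string &outputPath, bool usePragmaOnce) {
    if (usePragmaOnce) {
        return "#pragma once\n";
    }
    const std::string g = guardFor(outputPath);
    return "#ifndef " + g + "\n#define " + g + "\n";
}
```

Running the generator with `-o foo.h` from both `a/` and `b/` corresponds to calling `guardFor("foo.h")` twice, which gives the same guard, so whichever header is included second would be skipped. Passing repo-relative paths (`a/foo.h`, `b/foo.h`) yields distinct guards, and the #pragma once mode sidesteps guard naming entirely.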

@travisdowns travisdowns merged commit bdeab3a into master May 15, 2026
1 check passed
@travisdowns travisdowns deleted the td-stable-avrogen-include-guard branch May 15, 2026 17:00