Skip to content

pr-1632/listx/trailer-api-refactor-part-1-v4

This patch series is the first 10 patches of a larger cleanup/bugfix series
(henceforth "larger series") I've been working on. The main goal of this
series is to begin the process of "libifying" the trailer API. By "API" I
mean the interface exposed in trailer.h. The larger series brings a number
of additional cleanups (exposing and fixing some bugs along the way), and
builds on top of this series.

When the larger series is merged, we will be in a good state to additionally
pursue the following goals:

 1. "API reuse inside Git": make the API expressive enough to eliminate any
    need by other parts of Git to use the interpret-trailers builtin as a
    subprocess (instead they could just use the API directly);
 2. "API stability": add unit tests to codify the expected behavior of API
    functions; and
 3. "API documentation": create developer-focused documentation to explain
    how to use the API effectively, noting any API limitations or
    anti-patterns.

The reason why the larger series itself doesn't tackle these goals directly
is because I believe that API code should be thought from the ground up with
a libification-focused perspective. Some structs and functions exposed in
the API today should probably not be libified (read: kept in trailer.h) as
is. For example, the "trailer_iterator" struct has a "private" member and it
feels wrong to allow API users to peek inside here (and take at face value
our future API users' pinky promise that they won't depend on those private
internals not meant for public consumption).

One pattern we could use here to cleanly separate "what is the API"
(publicly exposed) and "what is the implementation" (private) is the
pointer-to-implementation ("pimpl") idiom. There may be other appropriate
patterns, but I've chosen this one because it's a simple, low-level concept
(put structs in foo.c instead of foo.h), which has far-reaching high-level
consequences (API users must rely exclusively on the API to make use of such
private structs, via opaque pointers). The pimpl idiom for C comes from the
book "C Interfaces and Implementations" (see patch "trailer: make
trailer_info struct private").

The idea of turning a public struct into a private one is a fundamental
question of libification because it forces us to reconsider all of the data
structures we have and how they're actually used by already existing users.
For the trailer API, those existing users are the "interpret-trailers"
builtin command, and anything else that includes the "trailer.h" header file
(e.g., sequencer.c). One advantage of this idiom is that even the compiler
understands it --- the compiler will loudly complain if you try to access
the innards of a private struct through an opaque pointer.

Another advantage of this idiom is that it helps to reduce the probability
of breaking changes in the API. Because a private struct's members are out
of view from our users (they only know about opaque pointers to the private
struct, not its members), we are free to modify the members of the struct at
any time, as much as we like, as long as we don't break the semantics of the
exposed API functions (which is why unit-testing these API functions will be
crucial long-term).

If this pimpl idiom turns out to be a mistake, undoing it is easy --- just
move the relevant struct definition from foo.c to the header file. So it's a
great way to try things out without digging ourselves into a pit of despair
that will be difficult to get out of.

With the libification-focused goals out of the way, let's turn to this patch
series in more detail.

Currently, we have "process_trailers()" in trailer.h which does many
different things (parse command-line arguments, create temporary files, etc)
that are independent of the concept of "trailers". Keeping this function as
an API function would make unit-testing it difficult. While there is no
technical reason why we couldn't write unit tests for the smaller functions
that are called within process_trailers(), doing so would involve testing
private ("static" in trailer.c) functions instead of API functions, which
defeats the goal of "API stability" mentioned earlier above.

As an alternative to how things are done in this patch series, we could keep
trailer.h intact and decide to unit-test the existing "trailer_info_get()"
function which does most of the trailer parsing work (and is used by
sequencer.c). However this function wouldn't be easy to test either, because
the resulting "trailer_info" struct merely contains the unparsed "trailers"
lines. So the unit test (if it wants to inspect the result of parsing these
lines) would have to invoke additional parsing functions itself. And at that
point it would no longer be a unit test in the traditional sense, because it
would be invoking multiple functions at once.

In summary this series breaks up "process_trailers()" into smaller pieces,
exposing many of the parts relevant to trailer-related processing in
trailer.h. This will force us to eventually introduce unit tests for these
API functions, but that is a good thing for API stability.

In the future after libification is "complete", users external to Git will
be able to use the same trailer processing API used by the
interpret-trailers builtin. For example, a web server may want to parse
trailers the same way that Git would parse them, without having to call
interpret-trailers as a subprocess. This use case was the original
motivation behind my work in this area.

Thanks to the aggressive refactoring in this series, I've been able to
identify and fix several bugs in our existing implementation. Those fixes
build on top of this series but were not included here, in order to keep
this series small. Below is a "shortlog" of those fixes I have locally:

 * "trailer: trailer replacement should not change its position" (If we
   found a trailer we'd like to replace, preserve its position relative to
   the other trailers found in the trailer block, instead of always moving
   it to the beginning or end of the entire trailer block.)
 * "interpret-trailers: preserve trailers coming from the input" (Sometimes,
   the parsed trailers from the input will be formatted differently
   depending on whether we provide --only-trailers or not. Make the trailers
   that were not modified and which are coming directly from the input get
   formatted the same way, regardless of this flag.)
 * "interpret-trailers: do not modify the input if NOP" (Refrain from
   subtracting or adding a newline around the patch divider "---" if we are
   not adding new trailers.)
 * "trailer formatter: split up format_trailer() monolith" (Fix a bug in
   git-log where we still printed a blank newline even if we didn't want to
   format anything.)
 * "interpret-trailers: fail if given unrecognized arguments" (E.g., for
   "--where", only accept recognized WHERE_* enum values. If we get
   something unrecognized, fail with an error instead of silently doing
   nothing. Ditto for "--if-exists" and "--if-missing".)

Notable changes in v4
=====================

 * Patches 3, 4, 5, and 8 have been broken up into smaller steps. There are
   28 instead of 10 patches now, but these 28 should be much easier to
   review than the (previously condensed) 10.
 * NEW Patch 1: "trailer: free trailer_info after all related usage" fixes
   awkward use-after-free coding style
 * NEW Patch 2: "shortlog: add test for de-duplicating folded trailers"
   increases test coverage related to trailer iterators and "unfold_value()"
 * NEW Patch 27: "trailer_set_*(): put out parameter at the end" is a small
   refactor to reorder parameters.
 * Patches 5-16: These smaller patches make up Patch 3 from v3.
 * Patches 17-18: These smaller patches make up Patch 4 from v3.
 * Patches 19-20: These smaller patches make up Patch 5 from v3.
 * Patches 23-26: These smaller patches make up Patch 8 from v3.
 * Anonymize unambiguous parameters in <trailer.h>.

Notable changes in v3
=====================

 * Squashed Patch 4 into Patch 3 ("trailer: unify trailer formatting
   machinery"), to avoid breaking the build ("-Werror=unused-function"
   violations)
 * NEW (Patch 10): Introduce "trailer template" terminology for readability
   (no behavioral change)
 * (API function) Rename default_separators() to
   trailer_default_separators()
 * (API function) Rename new_trailers_clear() to free_trailer_templates()
 * trailer.h: for single-parameter functions, anonymize the parameter name
   to reduce verbosity

Notable changes in v2
=====================

 * (cover letter) Discuss goals of the larger series in more detail,
   especially the pimpl idiom
 * (cover letter) List bug fixes pending in the larger series that depend on
   this series
 * Reorder function parameters to have trailer options at the beginning (and
   out parameters toward the end)
 * "sequencer: use the trailer iterator": prefer C string instead of strbuf
   for new "raw" field
 * Patch 1 (was Patch 2) also renames ensure_configured() to
   trailer_config_init() (forgot to rename this one previously)

Linus Arver (28):
  trailer: free trailer_info _after_ all related usage
  shortlog: add test for de-duplicating folded trailers
  trailer: prepare to expose functions as part of API
  trailer: move interpret_trailers() to interpret-trailers.c
  trailer: start preparing for formatting unification
  trailer_info_get(): reorder parameters
  format_trailers(): use strbuf instead of FILE
  format_trailer_info(): move "fast path" to caller
  format_trailers_from_commit(): indirectly call trailer_info_get()
  format_trailer_info(): use trailer_item objects
  format_trailer_info(): drop redundant unfold_value()
  format_trailer_info(): append newline for non-trailer lines
  trailer: begin formatting unification
  format_trailer_info(): teach it about opts->trim_empty
  format_trailer_info(): avoid double-printing the separator
  trailer: finish formatting unification
  trailer: teach iterator about non-trailer lines
  sequencer: use the trailer iterator
  trailer: make trailer_info struct private
  trailer: retire trailer_info_get() from API
  trailer: spread usage of "trailer_block" language
  trailer: prepare to delete "parse_trailers_from_command_line_args()"
  trailer: add new helper functions to API
  trailer_add_arg_item(): drop new_trailer_item usage
  trailer: deprecate "new_trailer_item" struct from API
  trailer: unify "--trailer ..." arg handling
  trailer_set_*(): put out parameter at the end
  trailer: introduce "template" term for readability

 builtin/interpret-trailers.c | 189 ++++++--
 pretty.c                     |   2 +-
 ref-filter.c                 |   2 +-
 sequencer.c                  |  27 +-
 t/t4201-shortlog.sh          |  32 ++
 trailer.c                    | 811 +++++++++++++++++------------------
 trailer.h                    | 109 ++---
 7 files changed, 642 insertions(+), 530 deletions(-)

base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b

Submitted-As: https://lore.kernel.org/git/pull.1632.v4.git.1707196348.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1632.git.1704869487.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1632.v2.git.1706308737.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1632.v3.git.1706664144.gitgitgadget@gmail.com
Assets 2