The purpose of the global sink is to work around compiler optimizations that would otherwise get rid of the entire append. The benchmark still demonstrates my point: when the slice already has sufficient capacity, appending each string individually is dramatically faster than a single append of the concatenated string.
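For illustration, here is a minimal sketch of the two forms being compared, assuming three non-constant input strings and a reused buffer with spare capacity. The names `appendConcat`, `appendEach`, `BenchmarkSingleAppend`, and `BenchmarkSeparateAppends` are hypothetical and are not the benchmarks quoted in this thread; timings will vary by Go version and machine, so none are shown.

```go
package main

import "testing"

// sink is a package-level variable. Storing every result here keeps the
// compiler from treating the append as dead code and removing it.
var sink []byte

// Package-level variables (not constants) so the concatenation cannot be
// folded at compile time. The values are arbitrary.
var s1, s2, s3 = "hello, ", "append ", "world"

// appendConcat builds the concatenated string first, then appends it once.
func appendConcat(b []byte, a, c, d string) []byte {
	return append(b, a+c+d...)
}

// appendEach appends each string separately, so no intermediate
// concatenated string is needed.
func appendEach(b []byte, a, c, d string) []byte {
	b = append(b, a...)
	b = append(b, c...)
	return append(b, d...)
}

// BenchmarkSingleAppend (hypothetical name) measures the single-append form
// on a buffer that already has enough capacity.
func BenchmarkSingleAppend(b *testing.B) {
	buf := make([]byte, 0, 64)
	for i := 0; i < b.N; i++ {
		sink = appendConcat(buf[:0], s1, s2, s3)
	}
}

// BenchmarkSeparateAppends (hypothetical name) measures the individual
// appends on the same kind of buffer.
func BenchmarkSeparateAppends(b *testing.B) {
	buf := make([]byte, 0, 64)
	for i := 0; i < b.N; i++ {
		sink = appendEach(buf[:0], s1, s2, s3)
	}
}
```

Placed in a `_test.go` file, the two benchmark functions can be run with `go test -bench=.`; both helper functions produce the same bytes, so only the cost differs.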
Also, whether multiple small appends are faster for empty slices is an implementation detail. The transformation can compute the total size of all the appended strings and call runtime.growslice once before performing the series of appends. The first-time allocation for append is then identical to the one in BenchmarkNaive, while the allocations that come from constructing the concatenated string are avoided.
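As a rough sketch of what the transformed code could look like if written by hand: `runtime.growslice` is internal to the runtime and cannot be called from user code, so the explicit `make` and `copy` below are only a stand-in for it (and grow to the exact size rather than using the runtime's growth policy). The function name `appendAll` is hypothetical.

```go
package main

// appendAll is a hand-written approximation of what the compiler could emit
// for append(b, s1+s2+s3...). It is an illustration, not actual compiler output.
func appendAll(b []byte, s1, s2, s3 string) []byte {
	// Compute the total number of bytes to be appended up front.
	n := len(s1) + len(s2) + len(s3)

	// Grow at most once. This stands in for the single runtime.growslice
	// call; the real runtime would choose the new capacity itself.
	if cap(b)-len(b) < n {
		grown := make([]byte, len(b), len(b)+n)
		copy(grown, b)
		b = grown
	}

	// With enough capacity guaranteed, none of these appends reallocates,
	// and the concatenated string is never constructed.
	b = append(b, s1...)
	b = append(b, s2...)
	return append(b, s3...)
}
```

When `b` already has room for all three strings, the growth branch is skipped and the result shares the caller's backing array, which matches the sufficient-capacity case discussed above.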