[ELF] Implement --build-id={md5,sha1} with truncated BLAKE3
--build-id was introduced as "approximation of true uniqueness across all
binaries that might be used by overlapping sets of people". It does not require
the resistance properties mentioned below. In practice, people just use
--build-id=md5 for a 16-byte build ID and --build-id=sha1 for a 20-byte build ID.
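
The two lengths are visible in the note header that the updated tests dump. The following is a minimal sketch, not code from lld: it decodes the first 16 bytes of `.note.gnu.build-id` as they appear in lld/test/ELF/build-id.s (`04000000 10000000 03000000 474e5500` for md5, with `14000000` as the second word for sha1). The names `NoteHeader`, `readLE32`, `parseNoteHeader`, `buildIdLength`, `md5Note`, and `sha1Note` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// An ELF note starts with three little-endian 32-bit words, followed by the
// owner name and then the descriptor (for this note, the build ID bytes).
struct NoteHeader {
  uint32_t namesz; // size of the owner name, including the trailing NUL
  uint32_t descsz; // size of the descriptor, i.e. the build ID length
  uint32_t type;   // 3 is NT_GNU_BUILD_ID
};

// Read one little-endian 32-bit word starting at p.
constexpr uint32_t readLE32(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

// Hypothetical helper: decode the 12-byte header at the start of a note.
constexpr NoteHeader parseNoteHeader(const uint8_t *p) {
  return NoteHeader{readLE32(p), readLE32(p + 4), readLE32(p + 8)};
}

// First dump line of the note for --build-id=md5 and --build-id=sha1,
// transcribed from the test expectations (assumption: bytes in file order).
constexpr uint8_t md5Note[16] = {0x04, 0, 0, 0, 0x10, 0,   0,   0,
                                 0x03, 0, 0, 0, 'G',  'N', 'U', 0};
constexpr uint8_t sha1Note[16] = {0x04, 0, 0, 0, 0x14, 0,   0,   0,
                                  0x03, 0, 0, 0, 'G',  'N', 'U', 0};

// Compile-time checks: only the descriptor size differs between the two.
static_assert(parseNoteHeader(md5Note).descsz == 16, "md5 build ID length");
static_assert(parseNoteHeader(sha1Note).descsz == 20, "sha1 build ID length");

// Return the build ID length of a note, checking that it is a build ID note.
inline uint32_t buildIdLength(const uint8_t *note) {
  NoteHeader h = parseNoteHeader(note);
  assert(h.type == 3 && "expected NT_GNU_BUILD_ID");
  return h.descsz;
}
```

Here `buildIdLength(md5Note)` yields 16 and `buildIdLength(sha1Note)` yields 20. The header is the same before and after this commit; only the descriptor bytes that follow it change.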

BLAKE3 has a 256-bit key length, which provides 128-bit security against
(second-)preimage, collision, and differentiability attacks. Its portable
implementation is fast, and it additionally provides Arm Neon/AVX2/AVX-512
implementations. Just implement --build-id={md5,sha1} with truncated BLAKE3.

Linking clang 14 RelWithDebInfo with --threads=8 on a Skylake CPU:

* 1.13x as fast with --build-id=md5
* 1.15x as fast with --build-id=sha1

--threads=4 on Apple M1:

* 1.25x as fast with --build-id=md5
* 1.17x as fast with --build-id=sha1

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D121531
MaskRay committed Mar 24, 2022
1 parent 418ecab commit d3e5b6f
Showing 3 changed files with 14 additions and 9 deletions.
13 changes: 9 additions & 4 deletions lld/ELF/Writer.cpp
@@ -25,10 +25,9 @@
 #include "lld/Common/Filesystem.h"
 #include "lld/Common/Strings.h"
 #include "llvm/ADT/StringMap.h"
-#include "llvm/Support/MD5.h"
+#include "llvm/Support/BLAKE3.h"
 #include "llvm/Support/Parallel.h"
 #include "llvm/Support/RandomNumberGenerator.h"
-#include "llvm/Support/SHA1.h"
 #include "llvm/Support/TimeProfiler.h"
 #include "llvm/Support/xxhash.h"
 #include <climits>
@@ -2925,6 +2924,12 @@ template <class ELFT> void Writer<ELFT>::writeBuildId() {
   MutableArrayRef<uint8_t> output(buildId.get(), hashSize);
   llvm::ArrayRef<uint8_t> input{Out::bufferStart, size_t(fileSize)};

+  // Fedora introduced build ID as "approximation of true uniqueness across all
+  // binaries that might be used by overlapping sets of people". It does not
+  // need some security goals that some hash algorithms strive to provide, e.g.
+  // (second-)preimage and collision resistance. In practice people use 'md5'
+  // and 'sha1' just for different lengths. Implement them with the more
+  // efficient BLAKE3.
   switch (config->buildId) {
   case BuildIdKind::Fast:
     computeHash(output, input, [](uint8_t *dest, ArrayRef<uint8_t> arr) {
@@ -2933,12 +2938,12 @@ template <class ELFT> void Writer<ELFT>::writeBuildId() {
     break;
   case BuildIdKind::Md5:
     computeHash(output, input, [&](uint8_t *dest, ArrayRef<uint8_t> arr) {
-      memcpy(dest, MD5::hash(arr).data(), hashSize);
+      memcpy(dest, BLAKE3::hash<16>(arr).data(), hashSize);
     });
     break;
   case BuildIdKind::Sha1:
     computeHash(output, input, [&](uint8_t *dest, ArrayRef<uint8_t> arr) {
-      memcpy(dest, SHA1::hash(arr).data(), hashSize);
+      memcpy(dest, BLAKE3::hash<20>(arr).data(), hashSize);
     });
     break;
   case BuildIdKind::Uuid:
8 changes: 4 additions & 4 deletions lld/test/ELF/build-id.s
@@ -69,11 +69,11 @@ _start:

 # MD5: Contents of section .note.gnu.build-id:
 # MD5-NEXT: 04000000 10000000 03000000 474e5500 ............GNU.
-# MD5-NEXT: 7b00fd9e 054ceb4b 06f64d0e 482cb476
+# MD5-NEXT: dbf0bc13 b3ff11e9 fde6e17c 0304983c

 # SHA1: Contents of section .note.gnu.build-id:
 # SHA1-NEXT: 04000000 14000000 03000000 474e5500 ............GNU.
-# SHA1-NEXT: 221a99da dd1d2bf3 05e48a91 dde8a0cb
+# SHA1-NEXT: 1215775f d3b60050 70afd970 e8a10972

 # UUID: Contents of section .note.gnu.build-id:
 # UUID-NEXT: 04000000 10000000 03000000 474e5500 ............GNU.
@@ -89,11 +89,11 @@ _start:

 # SEPARATE: Hex dump of section '.note.gnu.build-id':
 # SEPARATE-NEXT: 0x00200198 04000000 14000000 03000000 474e5500
-# SEPARATE-NEXT: 0x002001a8 96820adf d90d5470 0a0c32ff a88c4017
+# SEPARATE-NEXT: 0x002001a8 5cd067a4 2631c0fd 42029037 4b8e0938

 # RUN: ld.lld --build-id=sha1 --no-rosegment %t -o %t2
 # RUN: llvm-readelf -x .note.gnu.build-id %t2 | FileCheck --check-prefix=NORO %s

 # NORO: Hex dump of section '.note.gnu.build-id':
 # NORO-NEXT: 0x00200160 04000000 14000000 03000000 474e5500
-# NORO-NEXT: 0x00200170 cf6d7b3a 0b3297c3 5b47c079 ce048349
+# NORO-NEXT: 0x00200170 a328cc99 45bfc3fc a9fc8615 37102f9d
2 changes: 1 addition & 1 deletion lld/test/ELF/partition-notes.s
@@ -37,7 +37,7 @@
 // CHECK-NEXT: Owner: GNU
 // CHECK-NEXT: Data size:
 // CHECK-NEXT: Type: NT_GNU_BUILD_ID (unique build ID bitstring)
-// CHECK-NEXT: Build ID: bb5542bd74252653e286044980d602874d237ae0
+// CHECK-NEXT: Build ID: ab81108a3d85b729980356331fddc2bfc4c10177{{$}}
 // CHECK-NEXT: }
 // CHECK-NEXT: }
 // CHECK-NEXT: ]
