[NFC][Bitstream] Improve the dumpability of bitstream/bitcode headers
The `LLVMBitCodes.h` header contains various enums that are updated whenever LLVM's bitcode fundamentally changes. It would be nice to track these changes in a semi-automated way, so that external tools that attempt to parse LLVM's bitstream and bitcode can remain in sync.

Before this change, `LLVMBitCodes.h` had a single dependency -- it needed the `FIRST_APPLICATION_BLOCKID` enum value from `BitCodes.h`. `BitCodes.h`, in turn, had a whole tree of include dependencies that boiled down to `llvm-config.h`, meaning that it was impossible to dump the AST of either file without having a partial or full LLVM build tree already present.

To eliminate that requirement, this patch introduces a new leaf-only header, `BitCodeEnums.h`, which contains the "core" enums originally defined in `BitCodes.h`. `LLVMBitCodes.h` and `BitCodes.h` both include this new header, preserving the current header relationships while allowing `LLVMBitCodes.h` to be dumped fully independently with a command like this (run from the repository root):

```
clang -fsyntax-only -x c++ -Illvm/include -Xclang -ast-dump=json -Xclang -ast-dump-filter -Xclang llvm::bitc::BlockIDs llvm/include/llvm/Bitcode/LLVMBitCodes.h
```
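
For context on the dependency this patch preserves: application-level block ID enums are expected to start numbering at `FIRST_APPLICATION_BLOCKID`, which is why `LLVMBitCodes.h` needs that one value. Below is a minimal sketch of that relationship. The `sketch` namespace, the `ExampleBlockIDs` enum, and the `classifyBlockID` helper are hypothetical names used only for illustration and are not part of LLVM; only the `StandardBlockIDs` values are copied from this patch.

```cpp
#include <cassert>

namespace sketch {
// Values copied from BitCodeEnums.h as added by this patch.
enum StandardBlockIDs {
  BLOCKINFO_BLOCK_ID = 0,
  // Block IDs 1-7 are reserved for future expansion.
  FIRST_APPLICATION_BLOCKID = 8
};

// Hypothetical application-level enum (an assumption, not LLVM's real list):
// it only needs the single FIRST_APPLICATION_BLOCKID value to start numbering.
enum ExampleBlockIDs {
  EXAMPLE_FIRST_BLOCK_ID = FIRST_APPLICATION_BLOCKID,
  EXAMPLE_SECOND_BLOCK_ID
};

enum class BlockKind { BlockInfo, Reserved, Application };

// Hypothetical helper: classify a raw block ID using only the core enum.
constexpr BlockKind classifyBlockID(unsigned ID) {
  return ID == BLOCKINFO_BLOCK_ID          ? BlockKind::BlockInfo
         : ID < FIRST_APPLICATION_BLOCKID  ? BlockKind::Reserved
                                           : BlockKind::Application;
}

static_assert(EXAMPLE_FIRST_BLOCK_ID == 8,
              "application block IDs start at FIRST_APPLICATION_BLOCKID");
} // namespace sketch
```

Because nothing here depends on `llvm-config.h` or any ADT header, a translation unit shaped like this can be parsed without an LLVM build tree, which is the property the new leaf header is meant to provide.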

I recognize that this is a pretty unusual change and perhaps not a guarantee that the LLVM authors would like to make in the general case (i.e., that individual files within LLVM can have their AST dumped with minimal dependencies). However, I believe the criticality/limited scope of the file(s) in this patch warrants an exception. Please let me know if there's any other information I can provide, or anything else I can do to improve this patch!

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D108438
woodruffw authored and teresajohnson committed Apr 5, 2022
1 parent 96e9b6c commit d81b014
Showing 3 changed files with 96 additions and 66 deletions.
5 changes: 4 additions & 1 deletion llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -17,7 +17,10 @@
#ifndef LLVM_BITCODE_LLVMBITCODES_H
#define LLVM_BITCODE_LLVMBITCODES_H

-#include "llvm/Bitstream/BitCodes.h"
+// This is the only file included, and it, in turn, is a leaf header.
+// This allows external tools to dump the AST of this file and analyze it for
+// changes without needing to fully or partially build LLVM itself.
+#include "llvm/Bitstream/BitCodeEnums.h"

namespace llvm {
namespace bitc {
90 changes: 90 additions & 0 deletions llvm/include/llvm/Bitstream/BitCodeEnums.h
@@ -0,0 +1,90 @@
//===- BitCodeEnums.h - Core enums for the bitstream format -----*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This header defines "core" bitstream enum values.
// It has been separated from the other header that defines bitstream enum
// values, BitCodes.h, to allow tools to track changes to the various
// bitstream and bitcode enums without needing to fully or partially build
// LLVM itself.
//
// The enum values defined in this file should be considered permanent. If
// new features are added, they should have values added at the end of the
// respective lists.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_BITSTREAM_BITCODEENUMS_H
#define LLVM_BITSTREAM_BITCODEENUMS_H

namespace llvm {
/// Offsets of the 32-bit fields of bitstream wrapper header.
enum BitstreamWrapperHeader : unsigned {
BWH_MagicField = 0 * 4,
BWH_VersionField = 1 * 4,
BWH_OffsetField = 2 * 4,
BWH_SizeField = 3 * 4,
BWH_CPUTypeField = 4 * 4,
BWH_HeaderSize = 5 * 4
};

namespace bitc {
enum StandardWidths {
BlockIDWidth = 8, // We use VBR-8 for block IDs.
CodeLenWidth = 4, // Codelen are VBR-4.
BlockSizeWidth = 32 // BlockSize up to 2^32 32-bit words = 16GB per block.
};

// The standard abbrev namespace always has a way to exit a block, enter a
// nested block, define abbrevs, and define an unabbreviated record.
enum FixedAbbrevIDs {
END_BLOCK = 0, // Must be zero to guarantee termination for broken bitcode.
ENTER_SUBBLOCK = 1,

/// DEFINE_ABBREV - Defines an abbrev for the current block. It consists
/// of a vbr5 for # operand infos. Each operand info is emitted with a
/// single bit to indicate if it is a literal encoding. If so, the value is
/// emitted with a vbr8. If not, the encoding is emitted as 3 bits followed
/// by the info value as a vbr5 if needed.
DEFINE_ABBREV = 2,

// UNABBREV_RECORDs are emitted with a vbr6 for the record code, followed by
// a vbr6 for the # operands, followed by vbr6's for each operand.
UNABBREV_RECORD = 3,

// This is not a code, this is a marker for the first abbrev assignment.
FIRST_APPLICATION_ABBREV = 4
};

/// StandardBlockIDs - All bitcode files can optionally include a BLOCKINFO
/// block, which contains metadata about other blocks in the file.
enum StandardBlockIDs {
/// BLOCKINFO_BLOCK is used to define metadata about blocks, for example,
/// standard abbrevs that should be available to all blocks of a specified
/// ID.
BLOCKINFO_BLOCK_ID = 0,

// Block IDs 1-7 are reserved for future expansion.
FIRST_APPLICATION_BLOCKID = 8
};

/// BlockInfoCodes - The blockinfo block contains metadata about user-defined
/// blocks.
enum BlockInfoCodes {
// DEFINE_ABBREV has magic semantics here, applying to the current SETBID'd
// block, instead of the BlockInfo block.

BLOCKINFO_CODE_SETBID = 1, // SETBID: [blockid#]
BLOCKINFO_CODE_BLOCKNAME = 2, // BLOCKNAME: [name]
BLOCKINFO_CODE_SETRECORDNAME = 3 // BLOCKINFO_CODE_SETRECORDNAME:
// [id, name]
};

} // namespace bitc
} // namespace llvm

#endif
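
As an illustration of how the `BitstreamWrapperHeader` offsets above might be consumed, here is a minimal sketch that reads the five 32-bit fields from a byte buffer. The `sketch` namespace, the `WrapperFields` struct, and the `readLE32` and `parseWrapperHeader` helpers are hypothetical names and not LLVM APIs. The sketch assumes the fields are stored little-endian and deliberately does not validate the magic value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {
// Offsets copied from BitCodeEnums.h as added by this patch.
enum BitstreamWrapperHeader : unsigned {
  BWH_MagicField = 0 * 4,
  BWH_VersionField = 1 * 4,
  BWH_OffsetField = 2 * 4,
  BWH_SizeField = 3 * 4,
  BWH_CPUTypeField = 4 * 4,
  BWH_HeaderSize = 5 * 4
};

// Hypothetical container for the decoded fields.
struct WrapperFields {
  std::uint32_t Magic, Version, Offset, Size, CPUType;
};

// Assumption: each field is a little-endian 32-bit integer.
inline std::uint32_t readLE32(const unsigned char *P) {
  return std::uint32_t(P[0]) | (std::uint32_t(P[1]) << 8) |
         (std::uint32_t(P[2]) << 16) | (std::uint32_t(P[3]) << 24);
}

// Returns false if the buffer is too short to hold a wrapper header.
inline bool parseWrapperHeader(const unsigned char *Buf, std::size_t Len,
                               WrapperFields &Out) {
  if (Len < BWH_HeaderSize)
    return false;
  Out.Magic = readLE32(Buf + BWH_MagicField);
  Out.Version = readLE32(Buf + BWH_VersionField);
  Out.Offset = readLE32(Buf + BWH_OffsetField);
  Out.Size = readLE32(Buf + BWH_SizeField);
  Out.CPUType = readLE32(Buf + BWH_CPUTypeField);
  return true;
}
} // namespace sketch
```

A caller would pass the first bytes of a candidate file and check the return value before trusting any field.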
67 changes: 2 additions & 65 deletions llvm/include/llvm/Bitstream/BitCodes.h
@@ -19,75 +19,12 @@

#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"
+#include "llvm/Bitstream/BitCodeEnums.h"
#include "llvm/Support/DataTypes.h"
#include "llvm/Support/ErrorHandling.h"
#include <cassert>

namespace llvm {
-/// Offsets of the 32-bit fields of bitstream wrapper header.
-enum BitstreamWrapperHeader : unsigned {
-BWH_MagicField = 0 * 4,
-BWH_VersionField = 1 * 4,
-BWH_OffsetField = 2 * 4,
-BWH_SizeField = 3 * 4,
-BWH_CPUTypeField = 4 * 4,
-BWH_HeaderSize = 5 * 4
-};
-
-namespace bitc {
-enum StandardWidths {
-BlockIDWidth = 8, // We use VBR-8 for block IDs.
-CodeLenWidth = 4, // Codelen are VBR-4.
-BlockSizeWidth = 32 // BlockSize up to 2^32 32-bit words = 16GB per block.
-};
-
-// The standard abbrev namespace always has a way to exit a block, enter a
-// nested block, define abbrevs, and define an unabbreviated record.
-enum FixedAbbrevIDs {
-END_BLOCK = 0, // Must be zero to guarantee termination for broken bitcode.
-ENTER_SUBBLOCK = 1,
-
-/// DEFINE_ABBREV - Defines an abbrev for the current block. It consists
-/// of a vbr5 for # operand infos. Each operand info is emitted with a
-/// single bit to indicate if it is a literal encoding. If so, the value is
-/// emitted with a vbr8. If not, the encoding is emitted as 3 bits followed
-/// by the info value as a vbr5 if needed.
-DEFINE_ABBREV = 2,
-
-// UNABBREV_RECORDs are emitted with a vbr6 for the record code, followed by
-// a vbr6 for the # operands, followed by vbr6's for each operand.
-UNABBREV_RECORD = 3,
-
-// This is not a code, this is a marker for the first abbrev assignment.
-FIRST_APPLICATION_ABBREV = 4
-};
-
-/// StandardBlockIDs - All bitcode files can optionally include a BLOCKINFO
-/// block, which contains metadata about other blocks in the file.
-enum StandardBlockIDs {
-/// BLOCKINFO_BLOCK is used to define metadata about blocks, for example,
-/// standard abbrevs that should be available to all blocks of a specified
-/// ID.
-BLOCKINFO_BLOCK_ID = 0,
-
-// Block IDs 1-7 are reserved for future expansion.
-FIRST_APPLICATION_BLOCKID = 8
-};
-
-/// BlockInfoCodes - The blockinfo block contains metadata about user-defined
-/// blocks.
-enum BlockInfoCodes {
-// DEFINE_ABBREV has magic semantics here, applying to the current SETBID'd
-// block, instead of the BlockInfo block.
-
-BLOCKINFO_CODE_SETBID = 1, // SETBID: [blockid#]
-BLOCKINFO_CODE_BLOCKNAME = 2, // BLOCKNAME: [name]
-BLOCKINFO_CODE_SETRECORDNAME = 3 // BLOCKINFO_CODE_SETRECORDNAME:
-// [id, name]
-};
-
-} // End bitc namespace
-
/// BitCodeAbbrevOp - This describes one or more operands in an abbreviation.
/// This is actually a union of two different things:
/// 1. It could be a literal integer value ("the operand is always 17").
@@ -183,6 +120,6 @@ class BitCodeAbbrev {
OperandList.push_back(OpInfo);
}
};
-} // End llvm namespace
+} // namespace llvm

#endif
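
The `FixedAbbrevIDs` comments moved by this patch describe fields as vbr5, vbr6, and vbr8. As a rough sketch of what such a variable-bit-rate field means, the code below encodes a value into fixed-width chunks whose top bit signals "more chunks follow", least-significant chunk first. The `sketch` namespace and the `encodeVBR` and `decodeVBR` helpers are hypothetical and not LLVM APIs. The sketch works on a vector of chunk values rather than a packed bit stream, and assumes a chunk width between 2 and 32.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {
// Split Value into chunks of Width bits: the low Width-1 bits carry data and
// the high bit is set on every chunk except the last.
inline std::vector<std::uint32_t> encodeVBR(std::uint64_t Value,
                                            unsigned Width) {
  const std::uint64_t HiBit = std::uint64_t(1) << (Width - 1);
  std::vector<std::uint32_t> Chunks;
  while (Value >= HiBit) {
    Chunks.push_back(std::uint32_t((Value & (HiBit - 1)) | HiBit));
    Value >>= (Width - 1);
  }
  Chunks.push_back(std::uint32_t(Value));
  return Chunks;
}

// Reassemble a value from its chunks, stopping at the first chunk whose
// continuation bit is clear (or when the shift would exceed 64 bits).
inline std::uint64_t decodeVBR(const std::vector<std::uint32_t> &Chunks,
                               unsigned Width) {
  const std::uint64_t HiBit = std::uint64_t(1) << (Width - 1);
  std::uint64_t Value = 0;
  unsigned Shift = 0;
  for (std::uint32_t C : Chunks) {
    if (Shift >= 64)
      break;
    Value |= (std::uint64_t(C) & (HiBit - 1)) << Shift;
    if (!(C & HiBit))
      break;
    Shift += Width - 1;
  }
  return Value;
}
} // namespace sketch
```

For example, under this scheme the value 27 with a 4-bit width needs two chunks, while any value below 32 fits in a single 6-bit chunk.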
