Bitcode: Add a string table to the bitcode format.
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.

This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.

On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html

Differential Revision: https://reviews.llvm.org/D31838

llvm-svn: 300464
pcc committed Apr 17, 2017
1 parent dc77b2e commit a0f371a
Showing 21 changed files with 673 additions and 438 deletions.
42 changes: 38 additions & 4 deletions llvm/docs/BitCodeFormat.rst
@@ -550,6 +550,8 @@ LLVM IR is defined with the following blocks:

* 17 --- `TYPE_BLOCK`_ --- This describes all of the types in the module.

* 23 --- `STRTAB_BLOCK`_ --- The bitcode file's string table.

.. _MODULE_BLOCK:

MODULE_BLOCK Contents
@@ -577,7 +579,7 @@ MODULE_CODE_VERSION Record
``[VERSION, version#]``

The ``VERSION`` record (code 1) contains a single value indicating the format
-version. Versions 0 and 1 are supported at this time. The difference between
+version. Versions 0, 1 and 2 are supported at this time. The difference between
version 0 and 1 is in the encoding of instruction operands in
each `FUNCTION_BLOCK`_.

@@ -620,6 +622,12 @@ as unsigned VBRs. However, forward references are rare, except in the
case of phi instructions. For phi instructions, operands are encoded as
`Signed VBRs`_ to deal with forward references.

In version 2, the meaning of module records ``FUNCTION``, ``GLOBALVAR``,
``ALIAS``, ``IFUNC`` and ``COMDAT`` change such that the first two operands
specify an offset and size of a string in a string table (see `STRTAB_BLOCK
Contents`_), the function name is removed from the ``FNENTRY`` record in the
value symbol table, and the top-level ``VALUE_SYMTAB_BLOCK`` may only contain
``FNENTRY`` records.
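
The operand layout described above can be sketched as follows. This is a minimal illustration, not the reader's actual code: ``NameOperands`` and ``splitNameOperands`` are hypothetical names, and the record is assumed to be a vector of operands with the record code already stripped.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of LLVM): where a record's name lives
// and where the remaining operands start.
struct NameOperands {
  uint64_t StrtabOffset;           // byte offset of the name in the string table
  uint64_t StrtabSize;             // length of the name in bytes
  std::size_t FirstPayloadOperand; // index of the first operand after the name
};

// In version 2, FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT records begin
// with [strtab offset, strtab size]. Earlier versions have no such prefix.
// Returns false when the module version has no name operands or the record
// is too short to hold them; Out is left untouched in that case.
bool splitNameOperands(const std::vector<uint64_t> &Record,
                       unsigned ModuleVersion, NameOperands &Out) {
  if (ModuleVersion < 2 || Record.size() < 2)
    return false;
  Out.StrtabOffset = Record[0];
  Out.StrtabSize = Record[1];
  Out.FirstPayloadOperand = 2;
  return true;
}
```

With this shape, the code that decodes the remaining operands can stay the same across versions and simply start at ``FirstPayloadOperand`` (or at 0 when the helper returns false).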

MODULE_CODE_TRIPLE Record
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -673,11 +681,14 @@ for each library name referenced.
MODULE_CODE_GLOBALVAR Record
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-``[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr, externally_initialized, dllstorageclass, comdat]``
+``[GLOBALVAR, strtab offset, strtab size, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr, externally_initialized, dllstorageclass, comdat]``

The ``GLOBALVAR`` record (code 7) marks the declaration or definition of a
global variable. The operand fields are:

* *strtab offset*, *strtab size*: Specifies the name of the global variable.
See `STRTAB_BLOCK Contents`_.

* *pointer type*: The type index of the pointer type used to point to this
global variable

@@ -755,11 +766,14 @@ global variable. The operand fields are:
MODULE_CODE_FUNCTION Record
^^^^^^^^^^^^^^^^^^^^^^^^^^^

-``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prologuedata, dllstorageclass, comdat, prefixdata, personalityfn]``
+``[FUNCTION, strtab offset, strtab size, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prologuedata, dllstorageclass, comdat, prefixdata, personalityfn]``

The ``FUNCTION`` record (code 8) marks the declaration or definition of a
function. The operand fields are:

* *strtab offset*, *strtab size*: Specifies the name of the function.
See `STRTAB_BLOCK Contents`_.

* *type*: The type index of the function type describing this function

* *callingconv*: The calling convention number:
@@ -817,11 +831,14 @@ function. The operand fields are:
MODULE_CODE_ALIAS Record
^^^^^^^^^^^^^^^^^^^^^^^^

-``[ALIAS, alias type, aliasee val#, linkage, visibility, dllstorageclass, threadlocal, unnamed_addr]``
+``[ALIAS, strtab offset, strtab size, alias type, aliasee val#, linkage, visibility, dllstorageclass, threadlocal, unnamed_addr]``

The ``ALIAS`` record (code 9) marks the definition of an alias. The operand
fields are

* *strtab offset*, *strtab size*: Specifies the name of the alias.
See `STRTAB_BLOCK Contents`_.

* *alias type*: The type index of the alias

* *aliasee val#*: The value index of the aliased value
@@ -1300,3 +1317,20 @@ METADATA_ATTACHMENT Contents
----------------------------

The ``METADATA_ATTACHMENT`` block (id 16) ...

.. _STRTAB_BLOCK:

STRTAB_BLOCK Contents
---------------------

The ``STRTAB`` block (id 23) contains a single record (``STRTAB_BLOB``, id 1)
with a single blob operand containing the bitcode file's string table.

Strings in the string table are not null terminated. A record's *strtab
offset* and *strtab size* operands specify the byte offset and size of a
string within the string table.
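
Because the strings are not null terminated, a reader has to slice the blob by offset and size and should check the range first. The following is a minimal sketch under that reading; ``lookupStrtabName`` is a hypothetical name, and the string table is modelled as a ``std::string`` purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not an LLVM API): copies the bytes in
// [Offset, Offset + Size) of Strtab into Name.
// Returns false, leaving Name untouched, if the range is out of bounds.
bool lookupStrtabName(const std::string &Strtab, uint64_t Offset,
                      uint64_t Size, std::string &Name) {
  // Compare Size against the remaining length rather than computing
  // Offset + Size, which could overflow for hostile inputs.
  if (Offset > Strtab.size() || Size > Strtab.size() - Offset)
    return false;
  Name.assign(Strtab, static_cast<std::size_t>(Offset),
              static_cast<std::size_t>(Size));
  return true;
}
```

For a table holding the bytes ``mainfoobar``, offset 4 with size 3 selects ``foo``; nothing in the blob itself marks where one name ends and the next begins.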

The string table is used by all preceding blocks in the bitcode file that are
not succeeded by another intervening ``STRTAB`` block. Normally a bitcode
file will have a single string table, but it may have more than one if it
was created by binary concatenation of multiple bitcode files.
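
The association rule above (each block uses the nearest ``STRTAB`` block that follows it) can be sketched as a single backward pass over the top-level blocks. This is an illustrative model only: ``TopLevelBlock`` and ``assignStrtabs`` are hypothetical names, and blocks are reduced to their kind.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the top-level blocks in file order.
enum class TopLevelBlock { Module, Identification, Strtab };

// For each block, returns the index of the STRTAB block that governs it,
// or -1 if no STRTAB block follows it. STRTAB blocks themselves get -1.
std::vector<int> assignStrtabs(const std::vector<TopLevelBlock> &Blocks) {
  std::vector<int> Governing(Blocks.size(), -1);
  int NextStrtab = -1; // nearest STRTAB seen so far when walking backwards
  for (std::size_t I = Blocks.size(); I-- > 0;) {
    if (Blocks[I] == TopLevelBlock::Strtab)
      NextStrtab = static_cast<int>(I);
    else
      Governing[I] = NextStrtab;
  }
  return Governing;
}
```

For two concatenated files laid out as identification, module, string table, identification, module, string table, the first pair of blocks maps to index 2 and the second pair to index 5.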
4 changes: 4 additions & 0 deletions llvm/include/llvm/Bitcode/BitcodeReader.h
@@ -46,6 +46,9 @@ namespace llvm {
ArrayRef<uint8_t> Buffer;
StringRef ModuleIdentifier;

// The string table used to interpret this module.
StringRef Strtab;

// The bitstream location of the IDENTIFICATION_BLOCK.
uint64_t IdentificationBit;

@@ -70,6 +73,7 @@ namespace llvm {
StringRef getBuffer() const {
return StringRef((const char *)Buffer.begin(), Buffer.size());
}
StringRef getStrtab() const { return Strtab; }

StringRef getModuleIdentifier() const { return ModuleIdentifier; }

14 changes: 14 additions & 0 deletions llvm/include/llvm/Bitcode/BitcodeWriter.h
@@ -15,6 +15,7 @@
#define LLVM_BITCODE_BITCODEWRITER_H

#include "llvm/IR/ModuleSummaryIndex.h"
#include "llvm/MC/StringTableBuilder.h"
#include <string>

namespace llvm {
@@ -26,12 +27,25 @@ namespace llvm {
SmallVectorImpl<char> &Buffer;
std::unique_ptr<BitstreamWriter> Stream;

StringTableBuilder StrtabBuilder{StringTableBuilder::RAW};
bool WroteStrtab = false;

void writeBlob(unsigned Block, unsigned Record, StringRef Blob);

public:
/// Create a BitcodeWriter that writes to Buffer.
BitcodeWriter(SmallVectorImpl<char> &Buffer);

~BitcodeWriter();

/// Write the bitcode file's string table. This must be called exactly once
/// after all modules have been written.
void writeStrtab();

/// Copy the string table for another module into this bitcode file. This
/// should be called after copying the module itself into the bitcode file.
void copyStrtab(StringRef Strtab);

/// Write the specified module to the buffer specified at construction time.
///
/// If \c ShouldPreserveUseListOrder, encode the use-list order for each \a
14 changes: 12 additions & 2 deletions llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -22,7 +22,7 @@

namespace llvm {
namespace bitc {
-// The only top-level block type defined is for a module.
+// The only top-level block types are MODULE, IDENTIFICATION and STRTAB.
enum BlockIDs {
// Blocks
MODULE_BLOCK_ID = FIRST_APPLICATION_BLOCKID,
@@ -52,7 +52,9 @@ enum BlockIDs {

OPERAND_BUNDLE_TAGS_BLOCK_ID,

-METADATA_KIND_BLOCK_ID
+METADATA_KIND_BLOCK_ID,

STRTAB_BLOCK_ID,
};

/// Identification block contains a string that describes the producer details,
@@ -232,6 +234,10 @@ enum GlobalValueSummarySymtabCodes {
// llvm.type.checked.load intrinsic with all constant integer arguments.
// [typeid, offset, n x arg]
FS_TYPE_CHECKED_LOAD_CONST_VCALL = 15,
// Assigns a GUID to a value ID. This normally appears only in combined
// summaries, but it can also appear in per-module summaries for PGO data.
// [valueid, guid]
FS_VALUE_GUID = 16,
};

enum MetadataCodes {
@@ -550,6 +556,10 @@ enum ComdatSelectionKindCodes {
COMDAT_SELECTION_KIND_SAME_SIZE = 5,
};

enum StrtabCodes {
STRTAB_BLOB = 1,
};

} // End bitc namespace
} // End llvm namespace
