[CGData][MachineOutliner] Global Outlining #90074

Draft
wants to merge 7 commits into
base: main

kyulee-com commented Apr 25, 2024

This commit introduces support for outlining functions across modules using codegen data generated by a previous codegen run. The codegen data currently manages the outlined hash tree, which records outlining instances that occurred locally in the past.

The machine outliner now operates in one of three modes:

  1. CGDataMode::None: This is the default outliner mode that uses the suffix tree to identify (local) outlining candidates within a module. This mode is also used by (full)LTO to maintain optimal behavior with the combined module.
  2. CGDataMode::Write (-codegen-data-generate): This mode is identical to the default mode, but it also publishes the stable hash sequences of instructions in the outlined functions into a local outlined hash tree. It then encodes this into the __llvm_outline section, which will be dead-stripped at link time.
  3. CGDataMode::Read (-codegen-data-use-path={.cgdata}): This mode reads a codegen data file (.cgdata) and initializes a global outlined hash tree, which is then used to generate global outlining candidates. Note that the codegen data file is produced by post-processing the raw __llvm_outline sections from all native objects with the llvm-cgdata tool (or, later, with a linker such as LLD or a new ThinLTO pipeline).
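
As a rough illustration of how the three modes relate to the two flags, here is a minimal sketch. Only the `CGDataMode` enumerator names and the flag names come from the description above; `OutlinerOptionsSketch`, `selectMode`, and the rule that a use-path takes precedence over generation are assumptions made for this example, not the patch's actual logic.

```cpp
#include <cassert>
#include <string>

// Enumerator names follow the PR description.
enum class CGDataMode { None, Write, Read };

// Hypothetical option bundle; the real patch uses LLVM command-line options.
struct OutlinerOptionsSketch {
  bool CodegenDataGenerate = false; // models -codegen-data-generate
  std::string CodegenDataUsePath;   // models -codegen-data-use-path
  bool IsFullLTO = false;           // full LTO keeps the default mode
};

// Pick the outliner mode. Preferring Read over Write when both flags are
// set is an assumption of this sketch.
CGDataMode selectMode(const OutlinerOptionsSketch &Opts) {
  if (Opts.IsFullLTO)
    return CGDataMode::None;
  if (!Opts.CodegenDataUsePath.empty())
    return CGDataMode::Read;
  if (Opts.CodegenDataGenerate)
    return CGDataMode::Write;
  return CGDataMode::None;
}
```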

This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.

xuanzh-meta and others added 7 commits April 26, 2024 12:58
This defines the OutlinedHashTree class.
It contains sequences of stable hash values of instructions that have been outlined.
This OutlinedHashTree can be used to track the outlined instruction sequences across modules.
A trie structure is used in its implementation, allowing for a compact sharing of common prefixes.
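
To make the prefix sharing concrete, here is a minimal sketch of a trie keyed by stable hash values. It is not the `OutlinedHashTree` class from this patch: `HashTrieSketch`, `HashNode`, the `Terminals` counter, and the `insert`/`find`/`size` signatures are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <vector>

using StableHash = std::uint64_t;

// One trie node per hash value along an outlined instruction sequence.
struct HashNode {
  StableHash Hash = 0;
  // Set only on nodes where an outlined sequence ends (assumed: a count).
  std::optional<unsigned> Terminals;
  std::map<StableHash, std::unique_ptr<HashNode>> Successors;
};

class HashTrieSketch {
  HashNode Root;
  std::size_t NodeCount = 1; // includes the root

public:
  // Record that Seq was outlined Count times; shared prefixes reuse nodes.
  void insert(const std::vector<StableHash> &Seq, unsigned Count) {
    HashNode *Cur = &Root;
    for (StableHash H : Seq) {
      std::unique_ptr<HashNode> &Next = Cur->Successors[H];
      if (!Next) {
        Next = std::make_unique<HashNode>();
        Next->Hash = H;
        ++NodeCount;
      }
      Cur = Next.get();
    }
    Cur->Terminals = Cur->Terminals.value_or(0) + Count;
  }

  // Return the recorded count if Seq ends exactly at a terminal node.
  std::optional<unsigned> find(const std::vector<StableHash> &Seq) const {
    const HashNode *Cur = &Root;
    for (StableHash H : Seq) {
      auto It = Cur->Successors.find(H);
      if (It == Cur->Successors.end())
        return std::nullopt;
      Cur = It->second.get();
    }
    return Cur->Terminals;
  }

  std::size_t size() const { return NodeCount; }
};
```

In this sketch, inserting the sequences {1, 2, 3} and {1, 2, 4} creates five nodes including the root, because the prefix {1, 2} is stored once.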
The llvm-cgdata tool has been introduced to handle reading and writing of codegen data. This data includes an optimistic codegen summary that can be utilized to enhance subsequent codegen. Currently, the tool supports saving and restoring the outlined hash tree, facilitating machine function outlining across modules. Additional codegen summaries can be incorporated into separate sections as required. This patch primarily establishes basic support for the reader and writer, similar to llvm-profdata.

The high-level operations of llvm-cgdata are as follows:
1. It reads local raw codegen data from a custom section (for example, __llvm_outline) embedded in native binary files.
2. It merges the local raw codegen data into indexed codegen data, complete with a suitable header.
3. It handles reading and writing of the indexed codegen data into a standalone file.
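
The merge in step 2 can be pictured with the following sketch, which models only steps 1 and 2 and does no file I/O. It does not use the real `llvm-cgdata` code or file format: `LocalRecords`, `IndexedDataSketch`, its `Version` header field, and `mergeLocalRecords` are hypothetical, a flat map stands in for the hash tree, and summing counts for identical sequences is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using StableHash = std::uint64_t;
using HashSequence = std::vector<StableHash>;

// Raw data from one object file: outlined hash sequence -> local count.
using LocalRecords = std::map<HashSequence, unsigned>;

// Hypothetical stand-in for the indexed form: a header plus merged records.
struct IndexedDataSketch {
  std::uint32_t Version = 1; // placeholder header field, not the real layout
  std::map<HashSequence, unsigned> Merged;
};

// Fold the per-object records into one global table, summing the counts
// of sequences that appear in more than one object (an assumption).
IndexedDataSketch mergeLocalRecords(const std::vector<LocalRecords> &Objects) {
  IndexedDataSketch Out;
  for (const LocalRecords &Obj : Objects)
    for (const auto &[Seq, Count] : Obj)
      Out.Merged[Seq] += Count;
  return Out;
}
```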
kyulee-com changed the title from "[MachineOutliner][CGData] Global Outlining" to "[CGData][MachineOutliner] Global Outlining" on May 3, 2024