A new basic block ordering that improves on the existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG that optimizes jump locality and thus processor I-cache utilization. This is achieved by increasing the number of fall-through jumps and co-locating frequently executed nodes. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of the classical (maximum) Traveling Salesman Problem.

The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially, every chain is a single basic block. On every iteration, we pick the pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how I-cache "friendly" a specific chain is. The pair giving the maximum gain is merged into a new chain. The procedure stops when only one chain is left, or when no merge increases ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order.

An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., those based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three: X1, X2, and Y. Then we consider all possible ways of gluing the three chains (X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast.

Differential Revision: https://reviews.llvm.org/D113424
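The merge step described in the commit message can be sketched in a few dozen lines. The code below is a toy illustration under loudly stated assumptions, not the code added by this commit: `chainScore` is a simplified stand-in for the ExtTSP value that counts only fall-through edges, `std::map` replaces `DenseMap`, and the names `Chain`, `EdgeMap`, `bestMerge`, and `greedyLayout` are hypothetical. It also skips the final density sort and any special handling of the entry block.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Chain = std::vector<uint64_t>;
using EdgeMap = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;

// Simplified stand-in for the ExtTSP value (assumption): the total count of
// edges that become fall-throughs inside the chain.
uint64_t chainScore(const Chain &C, const EdgeMap &Edges) {
  uint64_t Score = 0;
  for (size_t I = 0; I + 1 < C.size(); ++I) {
    auto It = Edges.find(std::make_pair(C[I], C[I + 1]));
    if (It != Edges.end())
      Score += It->second;
  }
  return Score;
}

// Glue three chains together in the given order.
Chain concat(const Chain &A, const Chain &B, const Chain &C) {
  Chain R(A);
  R.insert(R.end(), B.begin(), B.end());
  R.insert(R.end(), C.begin(), C.end());
  return R;
}

// Split X into X1 and X2 at every position, try the six gluings of
// X1, X2, and Y, and keep the highest-scoring result.
Chain bestMerge(const Chain &X, const Chain &Y, const EdgeMap &Edges) {
  Chain Best = concat(X, Y, Chain());
  for (size_t Split = 0; Split <= X.size(); ++Split) {
    Chain X1(X.begin(), X.begin() + Split);
    Chain X2(X.begin() + Split, X.end());
    const Chain Candidates[] = {concat(X1, Y, X2), concat(X1, X2, Y),
                                concat(X2, X1, Y), concat(X2, Y, X1),
                                concat(Y, X1, X2), concat(Y, X2, X1)};
    for (const Chain &C : Candidates)
      if (chainScore(C, Edges) > chainScore(Best, Edges))
        Best = C;
  }
  return Best;
}

// Greedy loop: repeatedly merge the pair of chains with the largest gain,
// stopping when one chain is left or no merge improves the score.
Chain greedyLayout(size_t NumNodes, const EdgeMap &Edges) {
  std::vector<Chain> Chains;
  for (uint64_t N = 0; N < NumNodes; ++N)
    Chains.push_back(Chain{N});
  while (Chains.size() > 1) {
    uint64_t BestGain = 0;
    size_t BestI = 0, BestJ = 0;
    Chain BestChain;
    for (size_t I = 0; I < Chains.size(); ++I) {
      for (size_t J = 0; J < Chains.size(); ++J) {
        if (I == J)
          continue;
        Chain Merged = bestMerge(Chains[I], Chains[J], Edges);
        // Never negative: plain concatenation keeps both chains' scores.
        uint64_t Gain = chainScore(Merged, Edges) -
                        chainScore(Chains[I], Edges) -
                        chainScore(Chains[J], Edges);
        if (Gain > BestGain) {
          BestGain = Gain;
          BestI = I;
          BestJ = J;
          BestChain = Merged;
        }
      }
    }
    if (BestGain == 0)
      break; // No merge increases the score.
    Chains[BestI] = BestChain;
    Chains.erase(Chains.begin() + BestJ);
  }
  // Simplification: concatenate leftovers as-is instead of sorting by density.
  Chain Result;
  for (const Chain &C : Chains)
    Result.insert(Result.end(), C.begin(), C.end());
  return Result;
}
```

For a three-node CFG with edge counts 0->1 = 10, 0->2 = 100, and 2->1 = 100, `greedyLayout(3, Edges)` returns the order 0, 2, 1. Merging the chain (0, 1) with (2) through `bestMerge` gives the same order, because splitting X between its two blocks lets Y land in the middle, which plain concatenation cannot do.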
Showing 6 changed files with 1,867 additions and 1 deletion.
@@ -0,0 +1,58 @@
//===- CodeLayout.h - Code layout/placement algorithms ---------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// Declares methods and data structures for code layout algorithms.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_TRANSFORMS_UTILS_CODELAYOUT_H
#define LLVM_TRANSFORMS_UTILS_CODELAYOUT_H

#include "llvm/ADT/DenseMap.h"

#include <vector>

namespace llvm {

class MachineBasicBlock;

/// Find a layout of nodes (basic blocks) of a given CFG optimizing jump
/// locality and thus processor I-cache utilization. This is achieved via
/// increasing the number of fall-through jumps and co-locating frequently
/// executed nodes together.
/// The nodes are assumed to be indexed by integers from [0, |V|) so that the
/// current order is the identity permutation.
/// \p NodeSizes: The sizes of the nodes (in bytes).
/// \p NodeCounts: The execution counts of the nodes in the profile.
/// \p EdgeCounts: The execution counts of every edge (jump) in the profile. The
/// map also defines the edges in CFG and should include 0-count edges.
/// \returns The best block order found.
std::vector<uint64_t> applyExtTspLayout(
    const std::vector<uint64_t> &NodeSizes,
    const std::vector<uint64_t> &NodeCounts,
    const DenseMap<std::pair<uint64_t, uint64_t>, uint64_t> &EdgeCounts);

/// Estimate the "quality" of a given node order in CFG. The higher the score,
/// the better the order is. The score is designed to reflect the locality of
/// the given order, which is anti-correlated with the number of I-cache misses
/// in a typical execution of the function.
double calcExtTspScore(
    const std::vector<uint64_t> &Order, const std::vector<uint64_t> &NodeSizes,
    const std::vector<uint64_t> &NodeCounts,
    const DenseMap<std::pair<uint64_t, uint64_t>, uint64_t> &EdgeCounts);

/// Estimate the "quality" of the current node order in CFG.
double calcExtTspScore(
    const std::vector<uint64_t> &NodeSizes,
    const std::vector<uint64_t> &NodeCounts,
    const DenseMap<std::pair<uint64_t, uint64_t>, uint64_t> &EdgeCounts);

} // end namespace llvm

#endif // LLVM_TRANSFORMS_UTILS_CODELAYOUT_H