Skip to content

Commit

Permalink
[llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Browse files Browse the repository at this point in the history
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.

We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.

I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.

The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)

Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

           6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
               172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                 0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
             31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
       26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
        2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
       13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
       19770706799      instructions              #    0.74  insn per cycle
                                                  #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
         121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)

            6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )
```

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):

           6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
               213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                 1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
             31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
       25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
        1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
       13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
       19752995402      instructions              #    0.76  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
         121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)

            6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):

           6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
               133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                 0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
             31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
       25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
        1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
       13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
       19767554347      instructions              #    0.77  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
        4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
         118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)

            6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.

patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

           6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
               194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                 0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
             31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
       24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
        1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
       13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
       18260877653      instructions              #    0.74  insn per cycle
                                                  #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
        4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
         114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)

            6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58355

llvm-svn: 354441
  • Loading branch information
LebedevRI committed Feb 20, 2019
1 parent 2d6bb13 commit 6971639
Show file tree
Hide file tree
Showing 8 changed files with 241 additions and 16 deletions.
7 changes: 7 additions & 0 deletions llvm/docs/CommandGuide/llvm-exegesis.rst
Expand Up @@ -224,6 +224,13 @@ OPTIONS
Specify the numPoints parameters to be used for DBSCAN clustering
(`analysis` mode).

.. option:: -analysis-display-unstable-clusters

If there is more than one benchmark for an opcode, said benchmarks may end up
not being clustered into the same cluster if the measured performance
characteristics are different. by default all such opcodes are filtered out.
This flag will instead show only such unstable opcodes.

.. option:: -ignore-invalid-sched-class=false

If set, ignore instructions that do not have a sched class (class idx = 0).
Expand Down
@@ -0,0 +1,82 @@
# RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS %s
# RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-epsilon=0.5 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-STABLE %s
# RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-epsilon=0.5 -analysis-display-unstable-clusters -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-UNSTABLE %s

# We have one ADD32rr measurement, and two measurements for SQRTSSr.
# The ADD32rr measurement and one of the SQRTSSr measurements are identical,
# and thus will be be in the same cluster. But the second SQRTSSr measurement
# is different from the first SQRTSSr measurement, and thus it will be in it's
# own cluster. We do reclusterization, and thus since there is more than one
# measurement from SQRTSSr, and they are not in the same cluster, we move
# all two SQRTSSr measurements into their own cluster, and mark it as unstable.
# By default, we do not show such unstable clusters.
# If told to show, we *only* show such unstable clusters.

# CHECK-CLUSTERS: {{^}}cluster_id,opcode_name,config,sched_class,latency{{$}}
# CHECK-CLUSTERS-NEXT: {{^}}0,
# CHECK-CLUSTERS-SAME: ,90.00{{$}}
# CHECK-CLUSTERS: {{^}}3,
# CHECK-CLUSTERS-SAME: ,90.11{{$}}
# CHECK-CLUSTERS-NEXT: {{^}}3,
# CHECK-CLUSTERS-SAME: ,100.00{{$}}

# CHECK-INCONSISTENCIES-STABLE: ADD32rr
# CHECK-INCONSISTENCIES-STABLE-NOT: ADD32rr
# CHECK-INCONSISTENCIES-STABLE-NOT: SQRTSSr

# CHECK-INCONSISTENCIES-UNSTABLE: SQRTSSr
# CHECK-INCONSISTENCIES-UNSTABLE: SQRTSSr
# CHECK-INCONSISTENCIES-UNSTABLE-NOT: SQRTSSr
# CHECK-INCONSISTENCIES-UNSTABLE-NOT: ADD32rr

---
mode: latency
key:
instructions:
- 'ADD32rr EDX EDX EAX'
config: ''
register_initial_values:
- 'EDX=0x0'
- 'EAX=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: latency, value: 90.0000, per_snippet_value: 90.0000 }
error: ''
info: Repeating a single implicitly serial instruction
assembled_snippet: BA00000000B80000000001C201C201C201C201C201C201C201C201C201C201C201C201C201C201C201C2C3
---
mode: latency
key:
instructions:
- 'SQRTSSr XMM11 XMM11'
config: ''
register_initial_values:
- 'XMM11=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: latency, value: 90.1111, per_snippet_value: 90.1111 }
error: ''
info: Repeating a single explicitly serial instruction
assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
...
---
mode: latency
key:
instructions:
- 'SQRTSSr XMM11 XMM11'
config: ''
register_initial_values:
- 'XMM11=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: latency, value: 100, per_snippet_value: 100 }
error: ''
info: Repeating a single explicitly serial instruction
assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
...
14 changes: 9 additions & 5 deletions llvm/tools/llvm-exegesis/lib/Analysis.cpp
Expand Up @@ -149,7 +149,7 @@ void Analysis::printInstructionRowCsv(const size_t PointId,
writeEscaped<kEscapeCsv>(OS, Point.Key.Config);
OS << kCsvSep;
assert(!Point.Key.Instructions.empty());
const llvm::MCInst &MCI = Point.Key.Instructions[0];
const llvm::MCInst &MCI = Point.keyInstruction();
const unsigned SchedClassId = resolveSchedClassId(
*SubtargetInfo_, InstrInfo_->get(MCI.getOpcode()).getSchedClass(), MCI);

Expand All @@ -168,13 +168,15 @@ void Analysis::printInstructionRowCsv(const size_t PointId,
}

Analysis::Analysis(const llvm::Target &Target,
const InstructionBenchmarkClustering &Clustering)
: Clustering_(Clustering) {
std::unique_ptr<llvm::MCInstrInfo> InstrInfo,
const InstructionBenchmarkClustering &Clustering,
bool AnalysisDisplayUnstableOpcodes)
: Clustering_(Clustering), InstrInfo_(std::move(InstrInfo)),
AnalysisDisplayUnstableOpcodes_(AnalysisDisplayUnstableOpcodes) {
if (Clustering.getPoints().empty())
return;

const InstructionBenchmark &FirstPoint = Clustering.getPoints().front();
InstrInfo_.reset(Target.createMCInstrInfo());
RegInfo_.reset(Target.createMCRegInfo(FirstPoint.LLVMTriple));
AsmInfo_.reset(Target.createMCAsmInfo(*RegInfo_, FirstPoint.LLVMTriple));
SubtargetInfo_.reset(Target.createMCSubtargetInfo(FirstPoint.LLVMTriple,
Expand Down Expand Up @@ -233,7 +235,7 @@ Analysis::makePointsPerSchedClass() const {
assert(!Point.Key.Instructions.empty());
// FIXME: we should be using the tuple of classes for instructions in the
// snippet as key.
const llvm::MCInst &MCI = Point.Key.Instructions[0];
const llvm::MCInst &MCI = Point.keyInstruction();
unsigned SchedClassId = InstrInfo_->get(MCI.getOpcode()).getSchedClass();
const bool WasVariant = SchedClassId && SubtargetInfo_->getSchedModel()
.getSchedClassDesc(SchedClassId)
Expand Down Expand Up @@ -668,6 +670,8 @@ llvm::Error Analysis::run<Analysis::PrintSchedClassInconsistencies>(
const auto &ClusterId = Clustering_.getClusterIdForPoint(PointId);
if (!ClusterId.isValid())
continue; // Ignore noise and errors. FIXME: take noise into account ?
if (ClusterId.isUnstable() ^ AnalysisDisplayUnstableOpcodes_)
continue; // Either display stable or unstable clusters only.
auto SchedClassClusterIt =
std::find_if(SchedClassClusters.begin(), SchedClassClusters.end(),
[ClusterId](const SchedClassCluster &C) {
Expand Down
5 changes: 4 additions & 1 deletion llvm/tools/llvm-exegesis/lib/Analysis.h
Expand Up @@ -36,7 +36,9 @@ namespace exegesis {
class Analysis {
public:
Analysis(const llvm::Target &Target,
const InstructionBenchmarkClustering &Clustering);
std::unique_ptr<llvm::MCInstrInfo> InstrInfo,
const InstructionBenchmarkClustering &Clustering,
bool AnalysisDisplayUnstableOpcodes);

// Prints a csv of instructions for each cluster.
struct PrintClusters {};
Expand Down Expand Up @@ -125,6 +127,7 @@ class Analysis {
std::unique_ptr<llvm::MCAsmInfo> AsmInfo_;
std::unique_ptr<llvm::MCInstPrinter> InstPrinter_;
std::unique_ptr<llvm::MCDisassembler> Disasm_;
const bool AnalysisDisplayUnstableOpcodes_;
};

// Computes the idealized ProcRes Unit pressure. This is the expected
Expand Down
2 changes: 2 additions & 0 deletions llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
Expand Up @@ -61,6 +61,8 @@ struct InstructionBenchmark {
ModeE Mode;
std::string CpuName;
std::string LLVMTriple;
// Which instruction is being benchmarked here?
const llvm::MCInst &keyInstruction() const { return Key.Instructions[0]; }
// The number of instructions inside the repeated snippet. For example, if a
// snippet of 3 instructions is repeated 4 times, this is 12.
int NumRepetitions = 0;
Expand Down
101 changes: 100 additions & 1 deletion llvm/tools/llvm-exegesis/lib/Clustering.cpp
Expand Up @@ -8,8 +8,11 @@

#include "Clustering.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include <algorithm>
#include <string>
#include <vector>

namespace llvm {
namespace exegesis {
Expand Down Expand Up @@ -147,10 +150,102 @@ void InstructionBenchmarkClustering::dbScan(const size_t MinPts) {
}
}

// Given an instruction Opcode, we can make benchmarks (measurements) of the
// instruction characteristics/performance. Then, to facilitate further analysis
// we group the benchmarks with *similar* characteristics into clusters.
// Now, this is all not entirely deterministic. Some instructions have variable
// characteristics, depending on their arguments. And thus, if we do several
// benchmarks of the same instruction Opcode, we may end up with *different*
// performance characteristics measurements. And when we then do clustering,
// these several benchmarks of the same instruction Opcode may end up being
// clustered into *different* clusters. This is not great for further analysis.
// We shall find every opcode with benchmarks not in just one cluster, and move
// *all* the benchmarks of said Opcode into one new unstable cluster per Opcode.
void InstructionBenchmarkClustering::stabilize(unsigned NumOpcodes) {
// Given an instruction Opcode, in which clusters do benchmarks of this
// instruction lie? Normally, they all should be in the same cluster.
std::vector<llvm::SmallSet<ClusterId, 1>> OpcodeToClusterIDs;
OpcodeToClusterIDs.resize(NumOpcodes);
// The list of opcodes that have more than one cluster.
llvm::SetVector<size_t> UnstableOpcodes;
// Populate OpcodeToClusterIDs and UnstableOpcodes data structures.
assert(ClusterIdForPoint_.size() == Points_.size() && "size mismatch");
for (const auto &Point : zip(Points_, ClusterIdForPoint_)) {
const ClusterId &ClusterIdOfPoint = std::get<1>(Point);
if (!ClusterIdOfPoint.isValid())
continue; // Only process fully valid clusters.
const unsigned Opcode = std::get<0>(Point).keyInstruction().getOpcode();
assert(Opcode < NumOpcodes && "NumOpcodes is incorrect (too small)");
llvm::SmallSet<ClusterId, 1> &ClusterIDsOfOpcode =
OpcodeToClusterIDs[Opcode];
ClusterIDsOfOpcode.insert(ClusterIdOfPoint);
// Is there more than one ClusterID for this opcode?.
if (ClusterIDsOfOpcode.size() < 2)
continue; // If not, then at this moment this Opcode is stable.
// Else let's record this unstable opcode for future use.
UnstableOpcodes.insert(Opcode);
}
assert(OpcodeToClusterIDs.size() == NumOpcodes && "sanity check");

// We know with how many [new] clusters we will end up with.
const auto NewTotalClusterCount = Clusters_.size() + UnstableOpcodes.size();
Clusters_.reserve(NewTotalClusterCount);
for (const size_t UnstableOpcode : UnstableOpcodes.getArrayRef()) {
const llvm::SmallSet<ClusterId, 1> &ClusterIDs =
OpcodeToClusterIDs[UnstableOpcode];
assert(ClusterIDs.size() > 1 &&
"Should only have Opcodes with more than one cluster.");

// Create a new unstable cluster, one per Opcode.
Clusters_.emplace_back(ClusterId::makeValidUnstable(Clusters_.size()));
Cluster &UnstableCluster = Clusters_.back();
// We will find *at least* one point in each of these clusters.
UnstableCluster.PointIndices.reserve(ClusterIDs.size());

// Go through every cluster which we recorded as containing benchmarks
// of this UnstableOpcode. NOTE: we only recorded valid clusters.
for (const ClusterId &CID : ClusterIDs) {
assert(CID.isValid() &&
"We only recorded valid clusters, not noise/error clusters.");
Cluster &OldCluster = Clusters_[CID.getId()]; // Valid clusters storage.
// Within each cluster, go through each point, and either move it to the
// new unstable cluster, or 'keep' it.
// In this case, we'll reshuffle OldCluster.PointIndices vector
// so that all the points that are *not* for UnstableOpcode are first,
// and the rest of the points is for the UnstableOpcode.
const auto it = std::stable_partition(
OldCluster.PointIndices.begin(), OldCluster.PointIndices.end(),
[this, UnstableOpcode](size_t P) {
return Points_[P].keyInstruction().getOpcode() != UnstableOpcode;
});
assert(std::distance(it, OldCluster.PointIndices.end()) > 0 &&
"Should have found at least one bad point");
// Mark to-be-moved points as belonging to the new cluster.
std::for_each(it, OldCluster.PointIndices.end(),
[this, &UnstableCluster](size_t P) {
ClusterIdForPoint_[P] = UnstableCluster.Id;
});
// Actually append to-be-moved points to the new cluster.
UnstableCluster.PointIndices.insert(UnstableCluster.PointIndices.cend(),
it, OldCluster.PointIndices.end());
// And finally, remove "to-be-moved" points form the old cluster.
OldCluster.PointIndices.erase(it, OldCluster.PointIndices.cend());
// Now, the old cluster may end up being empty, but let's just keep it
// in whatever state it ended up. Purging empty clusters isn't worth it.
};
assert(UnstableCluster.PointIndices.size() > 1 &&
"New unstable cluster should end up with more than one point.");
assert(UnstableCluster.PointIndices.size() >= ClusterIDs.size() &&
"New unstable cluster should end up with no less points than there "
"was clusters");
}
assert(Clusters_.size() == NewTotalClusterCount && "sanity check");
}

llvm::Expected<InstructionBenchmarkClustering>
InstructionBenchmarkClustering::create(
const std::vector<InstructionBenchmark> &Points, const size_t MinPts,
const double Epsilon) {
const double Epsilon, llvm::Optional<unsigned> NumOpcodes) {
InstructionBenchmarkClustering Clustering(Points, Epsilon * Epsilon);
if (auto Error = Clustering.validateAndSetup()) {
return std::move(Error);
Expand All @@ -160,6 +255,10 @@ InstructionBenchmarkClustering::create(
}

Clustering.dbScan(MinPts);

if (NumOpcodes.hasValue())
Clustering.stabilize(NumOpcodes.getValue());

return Clustering;
}

Expand Down
28 changes: 22 additions & 6 deletions llvm/tools/llvm-exegesis/lib/Clustering.h
Expand Up @@ -15,7 +15,9 @@
#define LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H

#include "BenchmarkResult.h"
#include "llvm/ADT/Optional.h"
#include "llvm/Support/Error.h"
#include <limits>
#include <vector>

namespace llvm {
Expand All @@ -27,21 +29,28 @@ class InstructionBenchmarkClustering {
// for more explanations on the algorithm.
static llvm::Expected<InstructionBenchmarkClustering>
create(const std::vector<InstructionBenchmark> &Points, size_t MinPts,
double Epsilon);
double Epsilon, llvm::Optional<unsigned> NumOpcodes = llvm::None);

class ClusterId {
public:
static ClusterId noise() { return ClusterId(kNoise); }
static ClusterId error() { return ClusterId(kError); }
static ClusterId makeValid(size_t Id) { return ClusterId(Id); }
ClusterId() : Id_(kUndef) {}
static ClusterId makeValidUnstable(size_t Id) {
return ClusterId(Id, /*IsUnstable=*/true);
}

ClusterId() : Id_(kUndef), IsUnstable_(false) {}

// Compare id's, ignoring the 'unstability' bit.
bool operator==(const ClusterId &O) const { return Id_ == O.Id_; }
bool operator<(const ClusterId &O) const { return Id_ < O.Id_; }

bool isValid() const { return Id_ <= kMaxValid; }
bool isUndef() const { return Id_ == kUndef; }
bool isUnstable() const { return IsUnstable_; }
bool isNoise() const { return Id_ == kNoise; }
bool isError() const { return Id_ == kError; }
bool isUndef() const { return Id_ == kUndef; }

// Precondition: isValid().
size_t getId() const {
Expand All @@ -50,14 +59,19 @@ class InstructionBenchmarkClustering {
}

private:
explicit ClusterId(size_t Id) : Id_(Id) {}
ClusterId(size_t Id, bool IsUnstable = false)
: Id_(Id), IsUnstable_(IsUnstable) {}

static constexpr const size_t kMaxValid =
std::numeric_limits<size_t>::max() - 4;
(std::numeric_limits<size_t>::max() >> 1) - 4;
static constexpr const size_t kNoise = kMaxValid + 1;
static constexpr const size_t kError = kMaxValid + 2;
static constexpr const size_t kUndef = kMaxValid + 3;
size_t Id_;

size_t Id_ : (std::numeric_limits<size_t>::digits - 1);
size_t IsUnstable_ : 1;
};
static_assert(sizeof(ClusterId) == sizeof(size_t), "should be a bit field.");

struct Cluster {
Cluster() = delete;
Expand Down Expand Up @@ -101,8 +115,10 @@ class InstructionBenchmarkClustering {
private:
InstructionBenchmarkClustering(
const std::vector<InstructionBenchmark> &Points, double EpsilonSquared);

llvm::Error validateAndSetup();
void dbScan(size_t MinPts);
void stabilize(unsigned NumOpcodes);
void rangeQuery(size_t Q, std::vector<size_t> &Scratchpad) const;

const std::vector<InstructionBenchmark> &Points_;
Expand Down

0 comments on commit 6971639

Please sign in to comment.