Conversation

@svkeerthy
No description provided.

svkeerthy commented Oct 17, 2025

@svkeerthy svkeerthy changed the title from "Update MLGO Doc" to "[MLGO] Add MIR2Vec embedding framework documentation" Oct 17, 2025
@svkeerthy svkeerthy marked this pull request as ready for review October 17, 2025 23:43
@llvmbot llvmbot added the mlgo label Oct 17, 2025
@llvmbot
llvmbot commented Oct 17, 2025

@llvm/pr-subscribers-mlgo

Author: S. VenkataKeerthy (svkeerthy)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/164033.diff

1 Files Affected:

  • (modified) llvm/docs/MLGO.rst (+140-4)
diff --git a/llvm/docs/MLGO.rst b/llvm/docs/MLGO.rst
index bf3de11a2640e..2443835ea2fff 100644
--- a/llvm/docs/MLGO.rst
+++ b/llvm/docs/MLGO.rst
@@ -434,8 +434,27 @@ The latter is also used in tests.
 There is no C++ implementation of a log reader. We do not have a scenario
 motivating one.
 
-IR2Vec Embeddings
-=================
+Embeddings
+==========
+
+LLVM provides embedding frameworks to generate vector representations of code
+at different abstraction levels. These embeddings capture syntactic, semantic,
+and structural properties of the code and can be used as features for machine
+learning models in various compiler optimization tasks.
+
+Two embedding frameworks are available:
+
+- **IR2Vec**: Generates embeddings for LLVM IR
+- **MIR2Vec**: Generates embeddings for Machine IR
+
+Both frameworks follow a similar architecture with vocabulary-based embedding
+generation, where a vocabulary maps code entities to n-dimensional
+floating-point vectors. These embeddings can be computed at multiple
+granularity levels (instruction, basic block, and function) and used for
+ML-guided compiler optimizations.
+
+IR2Vec
+------
 
 IR2Vec is a program embedding approach designed specifically for LLVM IR. It
 is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
@@ -466,7 +485,7 @@ The core components are:
     compute embeddings for instructions, basic blocks, and functions.
 
 Using IR2Vec
-------------
+^^^^^^^^^^^^
 
 .. note::
 
@@ -526,7 +545,7 @@ embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance.
    between different code snippets, or perform other analyses as needed.
 
 Further Details
----------------
+^^^^^^^^^^^^^^^
 
 For more detailed information about the IR2Vec algorithm, its parameters, and
 advanced usage, please refer to the original paper:
@@ -538,6 +557,123 @@ triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`.
 The LLVM source code for ``IR2Vec`` can also be explored to understand the 
 implementation details.
 
+MIR2Vec
+-------
+
+MIR2Vec is an extension of IR2Vec designed specifically for LLVM Machine IR
+(MIR). It generates embeddings for machine-level instructions, basic blocks,
+and functions. MIR2Vec operates on the target-specific machine representation
+and captures machine instruction semantics, including opcodes, operands, and
+register information.
+
+MIR2Vec extends the vocabulary to include:
+
+- **Machine Opcodes**: Target-specific instruction opcodes derived from the
+  TargetInstrInfo, grouped by instruction semantics.
+
+- **Common Operands**: All common operand types (excluding register operands),
+  defined by the ``MachineOperand::MachineOperandType`` enum.
+
+- **Physical Register Classes**: Register classes defined by the target,
+  specialized for physical registers.
+
+- **Virtual Register Classes**: Register classes defined by the target,
+  specialized for virtual registers.
+
+The core components are:
+
+- **Vocabulary**: A mapping from machine IR entities (opcodes, operands, register
+  classes) to their vector representations. This is managed by 
+  ``MIR2VecVocabLegacyAnalysis`` for the legacy pass manager, with a 
+  ``MIR2VecVocabProvider`` that can be used standalone or wrapped by pass 
+  managers. The vocabulary (.json file) contains sections for opcodes, common 
+  operands, physical register classes, and virtual register classes.
+
+  .. note::
+    
+    The vocabulary file should contain these sections for it to be valid.
+
+- **Embedder**: A class (``mir2vec::MIREmbedder``) that uses the vocabulary to
+  compute embeddings for machine instructions, machine basic blocks, and 
+  machine functions. Currently, ``SymbolicMIREmbedder`` is the available 
+  implementation.
+
+Using MIR2Vec
+^^^^^^^^^^^^^
+
+.. note::
+
+   This section describes how to use MIR2Vec within LLVM passes. The
+   ``llvm-ir2vec`` tool (:doc:`CommandGuide/llvm-ir2vec`) can be used to
+   generate MIR2Vec embeddings from Machine IR files (.mir), which is useful
+   for generating embeddings outside of compiler passes.
+
+To generate MIR2Vec embeddings in a compiler pass, first obtain the vocabulary,
+then create an embedder instance to compute and access embeddings.
+
+1. **Get the Vocabulary**:
+   In a MachineFunctionPass, get the vocabulary from the analysis:
+
+   .. code-block:: c++
+
+      auto &VocabAnalysis = getAnalysis<MIR2VecVocabLegacyAnalysis>();
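+      auto VocabOrErr = VocabAnalysis.getMIR2VecVocabulary(
+          *MF.getFunction().getParent());
+      if (!VocabOrErr) {
+        // Handle error: vocabulary is not available or invalid.
+        // If VocabOrErr is an llvm::Expected, consume or report the error
+        // (VocabOrErr.takeError()) before returning.
+        return;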
+      }
+      const mir2vec::MIRVocabulary &Vocabulary = *VocabOrErr;
+
+   Note that ``MIR2VecVocabLegacyAnalysis`` is an immutable pass.
+
+2. **Create Embedder instance**:
+   With the vocabulary, create an embedder for a specific machine function:
+
+   .. code-block:: c++
+
+      // Assuming MF is a MachineFunction&
+      // For example, using MIR2VecKind::Symbolic:
+      std::unique_ptr<mir2vec::MIREmbedder> Emb =
+          mir2vec::MIREmbedder::create(MIR2VecKind::Symbolic, MF, Vocabulary);
+
+
+3. **Compute and Access Embeddings**:
+   Call ``getMFunctionVector()`` to get the embedding for the machine function.
+
+   .. code-block:: c++
+
+      mir2vec::Embedding FuncVector = Emb->getMFunctionVector();
+
+   Currently, ``MIREmbedder`` can generate embeddings at three levels: Machine
+   Instructions, Machine Basic Blocks, and Machine Functions. Appropriate 
+   getters are provided to access the embeddings at these levels.
+
+   .. note::
+
+    The validity of the ``MIREmbedder`` instance (and the embeddings it 
+    generates) is tied to the machine function it is associated with. If the 
+    machine function is modified, the embeddings may become stale and should 
+    be recomputed accordingly.
+
+4. **Working with Embeddings**:
+   Embeddings are represented as ``std::vector<double>``. These vectors can be
+   used as features for machine learning models, to compute similarity scores
+   between different code snippets, or to perform other analyses as needed.
+
+Further Details
+^^^^^^^^^^^^^^^
+
+For more detailed information about the MIR2Vec algorithm, its parameters, and
+advanced usage, please refer to the original paper:
+`RL4ReAl: Reinforcement Learning for Register Allocation <https://doi.org/10.1145/3578360.3580273>`_.
+
+For information about using the ``llvm-ir2vec`` tool to generate embeddings
+from Machine IR, see :doc:`CommandGuide/llvm-ir2vec`.
+
+The LLVM source code for ``MIR2Vec`` can be explored to understand the 
+implementation details. See ``llvm/include/llvm/CodeGen/MIR2Vec.h`` and 
+``llvm/lib/CodeGen/MIR2Vec.cpp``.
+
 Building with ML support
 ========================
 

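Step 4 of the documentation added in the diff above says that embeddings can be used to compute similarity scores between code snippets. As a minimal sketch, assuming two embeddings are already available as plain ``std::vector<double>`` values of the same dimension, cosine similarity could be computed roughly as follows. ``EmbeddingVec`` and ``cosineSimilarity`` are hypothetical names introduced here for illustration; they are not part of the LLVM API, and a real ``mir2vec::Embedding`` may need to be converted first.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical alias for illustration only; not an LLVM type.
using EmbeddingVec = std::vector<double>;

// Cosine similarity: dot(A, B) / (|A| * |B|).
// Returns 0.0 if either vector has zero norm, to avoid dividing by zero.
double cosineSimilarity(const EmbeddingVec &A, const EmbeddingVec &B) {
  assert(A.size() == B.size() && "embeddings must have the same dimension");
  double Dot = 0.0, NormA = 0.0, NormB = 0.0;
  for (std::size_t I = 0; I < A.size(); ++I) {
    Dot += A[I] * B[I];
    NormA += A[I] * A[I];
    NormB += B[I] * B[I];
  }
  if (NormA == 0.0 || NormB == 0.0)
    return 0.0;
  return Dot / (std::sqrt(NormA) * std::sqrt(NormB));
}
```

Values near 1 indicate embeddings pointing in nearly the same direction, values near 0 indicate orthogonal embeddings, and values near -1 indicate opposite directions. Cosine similarity is only one possible choice; a distance such as Euclidean distance may fit some models better.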
@svkeerthy svkeerthy changed the title from "[MLGO] Add MIR2Vec embedding framework documentation" to "[MLGO] Add MIR2Vec embedding documentation" Oct 20, 2025
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from af3e3dd to fd9e92d Compare October 20, 2025 22:35
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 65fe880 to 2c5f2d3 Compare October 20, 2025 22:35
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from fd9e92d to fc95c26 Compare October 20, 2025 23:24
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 2c5f2d3 to 869c0a3 Compare October 20, 2025 23:24
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from fc95c26 to a73d282 Compare October 20, 2025 23:48
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 869c0a3 to 1cd5b76 Compare October 20, 2025 23:48
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from a73d282 to 52ca99b Compare October 21, 2025 00:23
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 1cd5b76 to 3d9c8cd Compare October 21, 2025 00:23
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from 52ca99b to a3b210f Compare October 21, 2025 17:20
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch 2 times, most recently from 4edfbb7 to d3ac741 Compare October 21, 2025 18:16
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from a3b210f to ae927bf Compare October 21, 2025 18:16
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from d3ac741 to 9b54ed5 Compare October 21, 2025 21:47
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from ae927bf to 5239e0b Compare October 21, 2025 21:47
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 9b54ed5 to dcf282b Compare October 21, 2025 22:58
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch 2 times, most recently from 0ebad73 to d059588 Compare October 22, 2025 00:13
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from dcf282b to 455f3f6 Compare October 22, 2025 00:13
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from d059588 to f2335a0 Compare October 22, 2025 18:01
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 455f3f6 to 8fc9227 Compare October 22, 2025 18:01
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from f2335a0 to 85bc95c Compare October 22, 2025 18:01
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 8fc9227 to dc8a7f5 Compare October 22, 2025 21:11
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch 2 times, most recently from b61393a to bfad173 Compare October 22, 2025 21:53
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from dc8a7f5 to 4a866bc Compare October 22, 2025 21:53
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from bfad173 to 7cc2ec5 Compare October 22, 2025 22:27
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 4a866bc to 71e0e55 Compare October 22, 2025 22:28
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-use_colored_error_messages branch from 7cc2ec5 to ad0555a Compare October 22, 2025 22:51
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 71e0e55 to 101743e Compare October 22, 2025 22:51
Base automatically changed from users/svkeerthy/10-17-use_colored_error_messages to main October 22, 2025 23:24
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-17-update_mlgo_doc branch from 101743e to e6a125d Compare October 22, 2025 23:26
svkeerthy commented Oct 22, 2025

Merge activity

  • Oct 22, 11:31 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Oct 22, 11:33 PM UTC: @svkeerthy merged this pull request with Graphite.

@svkeerthy svkeerthy merged commit 8a968d9 into main Oct 22, 2025
11 checks passed
@svkeerthy svkeerthy deleted the users/svkeerthy/10-17-update_mlgo_doc branch October 22, 2025 23:33
mikolaj-pirog pushed a commit to mikolaj-pirog/llvm-project that referenced this pull request Oct 23, 2025