This repository contains the sources of Gematria, a framework for machine learning on machine code. It includes implementations of the GRANITE model and the Ithemal hierarchical LSTM model for learning the inverse throughput of basic blocks.
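As a concrete illustration (not Gematria's actual measurement pipeline), the inverse throughput of a basic block is the average number of cycles one execution of the block costs when the block runs back-to-back in a steady state; this is the quantity the models learn to predict. A minimal sketch with made-up numbers:

```python
def inverse_throughput(total_cycles: float, executions: int) -> float:
    """Average cycles per execution of a block run back-to-back in steady state."""
    return total_cycles / executions

# Hypothetical measurement: 100 back-to-back executions of a block take
# 250 cycles in total, so its inverse throughput is 2.5 cycles.
print(inverse_throughput(250.0, 100))  # 2.5
```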
Our models are built on top of TensorFlow 2.x (using the TensorFlow 1.x compatibility layer) in a mix of C++ and Python. Most of the training code is written in Python; we use C++ for the more demanding parts of the code, such as graph construction. We use pybind11 to make the C++ APIs available from Python.
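A toy sketch of the pybind11 pattern this implies; the module name and function below are hypothetical, not part of Gematria's actual API:

```cpp
// Toy pybind11 module illustrating the C++/Python split: demanding work is
// implemented in C++ and exposed to Python. Requires pybind11 to build.
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

#include <string>
#include <vector>

// A (toy) stand-in for a performance-sensitive C++ routine, e.g. part of
// graph construction: split a basic block's text into whitespace tokens.
std::vector<std::string> TokenizeBlock(const std::string& block) {
  std::vector<std::string> tokens;
  std::string current;
  for (char c : block) {
    if (c == ' ' || c == '\n') {
      if (!current.empty()) tokens.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}

// Expose the function to Python as the module `toy_gematria`.
PYBIND11_MODULE(toy_gematria, m) {
  m.doc() = "Toy module illustrating the C++/Python split.";
  m.def("tokenize_block", &TokenizeBlock,
        "Split a basic block's text into whitespace-separated tokens.");
}
```

Once compiled, Python code can `import toy_gematria` and call `toy_gematria.tokenize_block("mov eax, 1")` like any other Python function.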
Basic requirements that need to be installed before starting:
- Bazel 6.0 or newer.
- A C++ compiler supported by Bazel that supports C++17. Recent versions of GCC and Clang on Linux both fit the bill.
- Python 3.10 or newer.
- Git.
- PIP.
Additional dependencies, including TensorFlow, Protocol Buffers, and various Python libraries, are installed through PIP and through Bazel's WORKSPACE file.
We strongly recommend using virtualenv to install Python packages, to avoid dependency version conflicts with other libraries.
```shell
# Get the source code.
$ git clone https://github.com/google/gematria.git
$ cd gematria

# Set up virtualenv.
$ pip install virtualenv
$ virtualenv env
$ . env/bin/activate

# Install Python dependencies.
$ pip install -r requirements.in

# Build the project, run tests, ...
$ bazel build ...
$ bazel test ...
```
A subset of the project, consisting of tools and libraries we eventually plan to merge into the LLVM monorepo, is built with CMake. The requirements are inherited from LLVM, as we use LLVM's "external project" mechanism to build.
First, build TFLite. In addition to the requirements above, see also these prerequisites, noting the reference to the buildbot script, which lists additional packages.
Then:
```shell
mkdir /tmp/tflite && cd /tmp/tflite
curl https://raw.githubusercontent.com/google/ml-compiler-opt/main/buildbot/build_tflite.sh | bash
```
This should produce a `/tmp/tflite/tflite.cmake`.
```shell
cd ${GEMATRIA_SRC}
mkdir cmake-build && cd cmake-build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release \
  -C /tmp/tflite/tflite.cmake \
  ${LLVM_PROJECT_SRC}/llvm \
  -DLLVM_EXTERNAL_PROJECTS=gematria \
  -DLLVM_EXTERNAL_GEMATRIA_SOURCE_DIR=${GEMATRIA_SRC}
ninja llvm-granite llvm-cm
```
Where `LLVM_PROJECT_SRC` is the absolute path to your local LLVM repository, and `GEMATRIA_SRC` is the path to this (the Gematria) repository.
To run the `llvm-cm` tests, you can run the following target:

```shell
ninja check-llvm-tools-llvm-cm
```
We develop and test our code on Linux and x86-64, and we test it on macOS and ARM. While we have not tested other configurations, we expect the code to work with minimal changes also on other architectures and platforms that run TensorFlow.
See the training guide and guides for Python inference and C++ inference.
See the separate document.
- Issue tracker: https://github.com/google/gematria/issues
We welcome patches -- see CONTRIBUTING for more information on how to submit a patch.
```bibtex
@inproceedings{granite:iiswc:2022,
  author = {O. Sykora and P. Phothilimthana and C. Mendis and A. Yazdanbakhsh},
  booktitle = {2022 IEEE International Symposium on Workload Characterization (IISWC)},
  title = {{GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation}},
  year = {2022},
}
```