This code package contains algorithms (proof-of-concept implementation) and input files (profiled DNN models / workloads) from the paper "Piper: Multidimensional Planner for DNN Parallelization" published at NeurIPS 2021. It allows one to reproduce the results in the paper, as well as run the partitioning algorithms on other workloads.
All our algorithms take as input a JSON file with the following format (all fields are mandatory unless indicated otherwise). This format closely follows our model (see Section 3 "Problem Setup" in the paper):
- `maxMemoryPerDevice` (floating-point): a memory size limit of a single accelerator, in bytes,
- `maxDevices` (integer): number of accelerators (`k` from the paper),
- `maxBatchSize` (integer): maximum number of microbatches in a batch (`N` from the paper),
- `bandwidth` (floating-point): bandwidth (from each device to the outside),
- `nodes` (array): for each node (layer):
  - `id` (integer): unique ID of node,
  - `TMPCs` (dictionary): mapping from tensor-parallelism degree (`t`) to an array of TMPCs, each having:
    - `id` (string): name,
    - `timePerSample` (floating-point): compute latency (backward+forward, quantity `p` from the paper),
    - `parameterSize` (floating-point): size of weights (to be used in computing data-parallel resync costs, quantity `w` from the paper),
    - `memoryUsageA`, `memoryUsageB` (floating-point): memory usage coefficients `a` and `b` (see paper),
    - `syncTimeFw` (dictionary): mapping from heads of outgoing edges to their parameters `c^fw` (see paper),
    - `syncTimeBw` (dictionary): mapping from tails of incoming edges to their parameters `c^bw` (see paper),
- `edges` (array): for each edge:
  - `sourceId` (integer): the ID of the tail of the edge (edge from `sourceId` to `destId`),
  - `destId` (integer): the ID of the head of the edge,
  - `communicationCost` (floating-point): cost of transfer over this edge (in bytes).
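
To make the nesting of the format above concrete, here is a minimal sketch that builds a two-node input and serializes it using only the standard library. The struct names, the helper functions `dictToJson`, `toJson` and `makeExampleInput`, and all numeric values are made up for illustration and are not taken from `algo.cpp`. JSON object keys are always strings, so the integer keys of `TMPCs`, `syncTimeFw` and `syncTimeBw` are written in quotes; that `algo.cpp` expects exactly this spelling is an assumption.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical C++ mirrors of the JSON fields; not taken from algo.cpp.
struct TMPC {
    std::string id;                     // name
    double timePerSample;               // p: backward+forward latency
    double parameterSize;               // w: size of weights
    double memoryUsageA, memoryUsageB;  // a, b
    std::map<int, double> syncTimeFw;   // head of outgoing edge -> c^fw
    std::map<int, double> syncTimeBw;   // tail of incoming edge -> c^bw
};
struct Node { int id; std::map<int, std::vector<TMPC> > TMPCs; };  // key: degree t
struct Edge { int sourceId, destId; double communicationCost; };
struct Input {
    double maxMemoryPerDevice; int maxDevices, maxBatchSize; double bandwidth;
    std::vector<Node> nodes; std::vector<Edge> edges;
};

// JSON object keys must be strings, so integer keys are quoted.
std::string dictToJson(const std::map<int, double>& m) {
    std::ostringstream out;
    const char* sep = "";
    out << "{";
    for (const auto& kv : m) { out << sep << "\"" << kv.first << "\": " << kv.second; sep = ", "; }
    out << "}";
    return out.str();
}

// No string escaping is done: ids are assumed to be plain ASCII without quotes.
std::string toJson(const TMPC& c) {
    std::ostringstream out;
    out << "{\"id\": \"" << c.id << "\", \"timePerSample\": " << c.timePerSample
        << ", \"parameterSize\": " << c.parameterSize
        << ", \"memoryUsageA\": " << c.memoryUsageA
        << ", \"memoryUsageB\": " << c.memoryUsageB
        << ", \"syncTimeFw\": " << dictToJson(c.syncTimeFw)
        << ", \"syncTimeBw\": " << dictToJson(c.syncTimeBw) << "}";
    return out.str();
}

std::string toJson(const Node& n) {
    std::ostringstream out;
    const char* sep = "";
    out << "{\"id\": " << n.id << ", \"TMPCs\": {";
    for (const auto& entry : n.TMPCs) {
        out << sep << "\"" << entry.first << "\": [";
        const char* inner = "";
        for (const TMPC& c : entry.second) { out << inner << toJson(c); inner = ", "; }
        out << "]";
        sep = ", ";
    }
    out << "}}";
    return out.str();
}

std::string toJson(const Input& in) {
    std::ostringstream out;
    out << "{\"maxMemoryPerDevice\": " << in.maxMemoryPerDevice
        << ", \"maxDevices\": " << in.maxDevices
        << ", \"maxBatchSize\": " << in.maxBatchSize
        << ", \"bandwidth\": " << in.bandwidth << ", \"nodes\": [";
    const char* sep = "";
    for (const Node& n : in.nodes) { out << sep << toJson(n); sep = ", "; }
    out << "], \"edges\": [";
    sep = "";
    for (const Edge& e : in.edges) {
        out << sep << "{\"sourceId\": " << e.sourceId << ", \"destId\": " << e.destId
            << ", \"communicationCost\": " << e.communicationCost << "}";
        sep = ", ";
    }
    out << "]}";
    return out.str();
}

// Two layers joined by one edge; every number here is illustrative only.
Input makeExampleInput() {
    TMPC c0;
    c0.id = "node0_t1";
    c0.timePerSample = 0.01; c0.parameterSize = 1000.0;
    c0.memoryUsageA = 2000.0; c0.memoryUsageB = 3000.0;
    c0.syncTimeFw[1] = 0.001;  // node 0 has an outgoing edge to node 1
    TMPC c1 = c0;
    c1.id = "node1_t1";
    c1.syncTimeFw.clear();
    c1.syncTimeBw[0] = 0.001;  // node 1 has an incoming edge from node 0
    Node n0; n0.id = 0; n0.TMPCs[1].push_back(c0);  // degree t = 1
    Node n1; n1.id = 1; n1.TMPCs[1].push_back(c1);
    Input in;
    in.maxMemoryPerDevice = 16e9; in.maxDevices = 2;
    in.maxBatchSize = 8; in.bandwidth = 1e9;
    in.nodes.push_back(n0); in.nodes.push_back(n1);
    Edge e = {0, 1, 4096.0};
    in.edges.push_back(e);
    return in;
}
```

Writing the result of `toJson(makeExampleInput())` to a file should give an input with the shape described above. Whether `algo.cpp` accepts such a file unchanged has not been checked here; the provided files under `inputs/` remain the authoritative examples.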
Other debug information may be present in the input files, such as `name` fields on nodes.
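
Since `syncTimeFw` is keyed by the heads of a node's outgoing edges and `syncTimeBw` by the tails of its incoming edges, the `edges` array and these dictionaries have to agree with each other. The following is a minimal sketch of such a consistency check for one TMPC per node. The types `EdgeRef` and `SyncKeys` and the function `checkEdges` are hypothetical and not part of `algo.cpp`, and the check assumes that every edge has an entry in both dictionaries and that dictionary keys have already been parsed into integers.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stripped-down types; not taken from algo.cpp.
struct EdgeRef { int sourceId, destId; };
struct SyncKeys {
    std::set<int> fw;  // keys present in syncTimeFw of one TMPC
    std::set<int> bw;  // keys present in syncTimeBw of the same TMPC
};

// Returns one human-readable message per inconsistency found.
std::vector<std::string> checkEdges(const std::map<int, SyncKeys>& keysByNode,
                                    const std::vector<EdgeRef>& edges) {
    std::vector<std::string> problems;
    for (const EdgeRef& e : edges) {
        const std::string name =
            std::to_string(e.sourceId) + "->" + std::to_string(e.destId);
        std::map<int, SyncKeys>::const_iterator src = keysByNode.find(e.sourceId);
        std::map<int, SyncKeys>::const_iterator dst = keysByNode.find(e.destId);
        if (src == keysByNode.end() || dst == keysByNode.end()) {
            problems.push_back("edge " + name + " refers to an unknown node id");
            continue;
        }
        // Assumption: the tail node lists the head in syncTimeFw ...
        if (src->second.fw.count(e.destId) == 0)
            problems.push_back("edge " + name + " has no syncTimeFw entry");
        // ... and the head node lists the tail in syncTimeBw.
        if (dst->second.bw.count(e.sourceId) == 0)
            problems.push_back("edge " + name + " has no syncTimeBw entry");
    }
    return problems;
}
```

An empty result means that, under the assumptions above, every edge refers to existing nodes and has matching entries on both endpoints.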
The solution is implemented in `algo.cpp`. It is a single C++ file (using one header-only library for JSON parsing) and can be compiled with a recent version of `gcc` by running e.g. `g++ -O3 algo.cpp -o algo.exe`.
The compiled program runs the experiments from the paper (see `main()` at the end of `algo.cpp`).
To run only a subset of the evaluations, comment out the corresponding lines in `main()`.
The simplest mode of usage is shown in `single()`.
The main example input file is `inputs/bert32a100.json`.
Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
We use the JSON for Modern C++ library, copyright (c) 2013-2020 Niels Lohmann, licensed under the MIT license.