Release v2.0.
gaomy3832 committed Mar 1, 2019
2 parents 5ac1bdc + fd4ce1a commit 5ba0fb1
Showing 49 changed files with 8,498 additions and 138 deletions.
66 changes: 65 additions & 1 deletion CHANGELOG.md
@@ -1,7 +1,71 @@
List of major changes and improvements
======================================

## [Unreleased]
## [v1.6 -- v2.0] -- 2019-03-01

### Added

- Hardware models.

- Access forwarding.

- Buffer sharing scheme.
- Use `BufShrScheme` class to represent and calculate NoC transfers.

- Software models.

- Add `SchedulingConstraint` class to specify loop blocking and partitioning
constraints.
- Add lazily updated rules that allow refining constraints with previous
scheduling results at runtime (see the constraint sketch after this list).
- Add subclass `SchedulingConstraintLayerPipeline` for layer pipelining
constraints.

- Add `InterLayerPipeline`.
- Layers are organized into `PipelineSegment` instances, which are
simultaneously mapped onto the resources both spatially and temporally.
- Each layer in the segment has a 3-tuple scheduling index consisting of the
segment index, spatial index, and temporal index (see the index sketch after
this list).
- Each layer in the segment has its resource allocation and scheduling
constraint.
- Use `PipelineSegmentTiming` to capture the timing relation of layers in
the segment.
- Specify maximum allowed execution time overhead due to layer pipelining
in `Option`.
- Specify maximum pipelining degree for layer pipelining in `Option`.

- Add layer pipelining optimizations.
- Ofmap forwarding: alternate layer loop ordering.
- Ifmap forwarding: sharing the same inputs fetched from memory among
multiple regions.
- Support model weight pinning when there is no resource time-multiplexing.
- Allow disabling optimizations for layer pipelining to fall back to basic
pipelining techniques.
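
A minimal sketch of the lazily updated rule idea (illustrative only; the
class, field, and method names below are assumptions, not the actual
`SchedulingConstraint` API):

```python
# Illustrative sketch of a lazily updated scheduling constraint; the real
# SchedulingConstraint class may use different names and fields.
class LazyConstraint(object):
    def __init__(self, topbat=0, update_rule=None):
        self.topbat = topbat          # required top-level batch blocking factor (0 = free)
        self._update_rule = update_rule

    def update_by_prev(self, prev_results):
        """Lazily refine the constraint from previously scheduled layers."""
        if self._update_rule is not None:
            self._update_rule(self, prev_results)

    def is_valid_topbat(self, topbat):
        return self.topbat == 0 or topbat == self.topbat

# Example rule: later layers in a segment reuse the first layer's top-level
# batch blocking factor.
def match_first_topbat(cstr, prev_results):
    if prev_results:
        cstr.topbat = prev_results[0].topbat

cstr = LazyConstraint(update_rule=match_first_topbat)
```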

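The per-layer 3-tuple scheduling index can be pictured as follows (an
illustrative sketch, not the actual `PipelineSegment` data structure; the
segment contents and layer names are made up):

```python
from collections import namedtuple

# Illustrative 3-tuple scheduling index: which segment the layer belongs to,
# which spatial subset of the segment it maps to, and its temporal order
# within that spatial subset.
SchedIndex = namedtuple('SchedIndex', ['seg_idx', 'sp_idx', 'tm_idx'])

seg_idx = 3  # example position of this segment in the overall schedule
# A segment is a tuple of spatial subsets; layers in one spatial subset share
# the same resource allocation over time.
segment = (('conv3a',), ('conv3b', 'pool3'))

sched_indices = {}
for sp_idx, spatial_subset in enumerate(segment):
    for tm_idx, layer in enumerate(spatial_subset):
        sched_indices[layer] = SchedIndex(seg_idx, sp_idx, tm_idx)

# conv3a -> (3, 0, 0); conv3b -> (3, 1, 0); pool3 -> (3, 1, 1)
```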

### Changed

- Hardware models.

- Allow data source/destination regions in `Resource` to be non-DATA type.

- Allow `NodeRegion` to be folded along the w dimension in a zig-zag manner.

- Software models.

- `LoopBlockingScheme` supports access forwarding and buffer sharing.

- `LoopBlockingScheme` supports remote node buffers as data regions (non-data
type data regions).

- The unit number-of-hops calculation in `partition` supports access
forwarding and buffer sharing.

- `DataLayout` supports closest-first forwarding data transfers for access
forwarding and buffer sharing (see the sketch below).

- Refactor `NNDataflow` and `NNDataflowScheme` to incorporate inter-layer
pipelining.
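
A rough illustration of the closest-first idea (a simplified sketch that
assumes Manhattan hop distance on a 2D mesh; not the actual `DataLayout`
logic):

```python
# Simplified closest-first forwarding: fetch data from the nearest node that
# already holds a copy, measured in Manhattan hops on a 2D mesh.
def hops(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def closest_source(dst, sources):
    """Pick the closest holder of the data as the forwarding source for dst."""
    return min(sources, key=lambda src: hops(dst, src))

# Nodes (1, 0) and (3, 2) hold copies; node (2, 2) fetches from (3, 2), which
# is 1 hop away, rather than from (1, 0), which is 3 hops away.
assert closest_source((2, 2), [(1, 0), (3, 2)]) == (3, 2)
```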


## [v1.5 -- v1.6] -- 2018-01-31
48 changes: 39 additions & 9 deletions README.rst
@@ -9,7 +9,7 @@ Neural Network Dataflow Scheduling

This Python tool allows you to explore the energy-efficient dataflow scheduling
for neural networks (NNs), including array mapping, loop blocking and
reordering, and parallel partitioning.
reordering, and (coarse-grained) parallel processing within and across layers.

For hardware, we assume an Eyeriss-style NN accelerator [Chen16]_, i.e., a 2D
array of processing elements (PEs) with a local register file in each PE, and a
@@ -26,18 +26,27 @@ In software, we decouple the dataflow scheduling into three subproblems:
convolutions by blocking and reordering the nested loops. We support
exhaustive search over all blocking and reordering schemes [Yang16]_, and
analytical bypass solvers [Gao17]_.
- Partitioning, which partitions the NN computations for parallel processing.
We support batch partitioning, fmap partitioning, output partitioning, input
partitioning, and the combination between them (hybrid) [Gao17]_. We use
layer-wise greedy beam search.

See the details in our ASPLOS'17 paper [Gao17]_.
- Parallel processing, which partitions the NN computations across multiple
tiled engines. We support both intra-layer and inter-layer parallelism. For
intra-layer parallelism, we support batch partitioning, fmap partitioning,
output partitioning, input partitioning, and their combinations (hybrid)
[Gao17]_. We also explore various dataflow optimizations including access
forwarding and buffer sharing [Gao19]_. We use exhaustive search within each
layer. For inter-layer parallelism, we support spatial pipelining (inter-layer
pipelining) and temporal pipelining (time multiplexing without writing back
intermediate data), as well as their optimized scheduling [Gao19]_. We use
layer-wise greedy beam search across layers (sketched below).

See the details in our ASPLOS'17 [Gao17]_ and ASPLOS'19 [Gao19]_ papers.

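For intuition, the layer-wise greedy beam search across layers can be pictured
as the following simplified sketch (illustrative only; the actual search in
``nn_dataflow`` differs in how per-layer candidates are generated and costed)::

    def beam_search(layers, schedule_layer, beam_width=4):
        """Keep only the cheapest few partial schedules while adding layers."""
        beam = [(0.0, [])]  # each candidate: (total cost, per-layer schedules)
        for layer in layers:
            expanded = []
            for cost, partial in beam:
                # schedule_layer yields (incremental cost, layer schedule)
                # options for this layer given the partial schedule so far.
                for inc_cost, sched in schedule_layer(layer, partial):
                    expanded.append((cost + inc_cost, partial + [sched]))
            beam = sorted(expanded, key=lambda c: c[0])[:beam_width]
        return beam[0] if beam else None
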
If you use this tool in your work, we kindly request that you reference our
paper(s) below, and send us a citation of your work.

- Gao et al., "TETRIS: Scalable and Efficient Neural Network Acceleration with
3D Memory", in ASPLOS, April 2017 [Gao17]_.
3D Memory", in ASPLOS, April 2017.

- Gao et al., "TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN
Accelerators", in ASPLOS. April 2019.


Install
@@ -102,6 +111,20 @@ Other options include:
layers, and output partitioning for FC layers.
- ``--batch-partitioning`` and ``--ifmap-partitioning``: whether the hybrid
partitioning also explores batch and input partitioning.
- ``--enable-access-forwarding``: access forwarding, where the nodes fetch
disjoint subsets of data and forward them to other nodes. See [Gao19]_.
- ``--enable-gbuf-sharing``: buffer sharing, where the global buffer capacity is
shared across nodes through the NoC. See [Gao19]_.
- ``--enable-save-writeback``: allow eliding the intermediate data writeback to
memory when switching between layers if it is possible to store the entire
data set in on-chip buffers.
- ``--interlayer-partition``: whether to use inter-layer pipelining to
partition resources across multiple layers and process them simultaneously.
- ``--layer-pipeline-time-overhead``, ``--layer-pipeline-max-degree``:
constrain the configuration space of inter-layer pipelining by specifying
the maximum execution time overhead or the maximum pipelining degree.
- ``--disable-interlayer-opt``: disable optimizations and only allow basic
inter-layer pipelining.
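
As a rough illustration, the pipelining-related options above might be combined
as follows (a sketch only: the numeric values are placeholders, and the entry
script and its required positional arguments are omitted)::

    # Illustrative combination of the inter-layer pipelining options; the
    # values are examples, not recommendations.
    pipeline_flags = [
        "--interlayer-partition",                 # pipeline layers on partitioned resources
        "--layer-pipeline-time-overhead", "0.2",  # example maximum execution time overhead
        "--layer-pipeline-max-degree", "8",       # example maximum pipelining degree
        "--enable-access-forwarding",             # fetch disjoint data and forward over the NoC
        "--enable-gbuf-sharing",                  # share global buffer capacity over the NoC
        "--enable-save-writeback",                # elide intermediate writeback when data fit on-chip
    ]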


Code Structure
@@ -115,7 +138,10 @@ Code Structure
- Array mapping: ``map_strategy``.
- Loop blocking and reordering: ``loop_blocking``,
``loop_blocking_scheme``, ``loop_blocking_solver``.
- Partitioning: ``partition``, ``partition_scheme``.
- Intra-layer partitioning: ``partition``, ``partition_scheme``,
``buf_shr_scheme``.
- Inter-layer pipelining: ``inter_layer_pipeline``,
``pipeline_segment``.
- Network and layer: ``network``, ``layer``.
- ``nns``: example NN definitions.
- ``tests``: unit tests.
@@ -156,6 +182,10 @@ with the Board of Trustees of Leland Stanford Junior University.
References
----------

.. [Gao19] Gao, Yang, Pu, Horowitz, and Kozyrakis, `TANGRAM: Optimized
Coarse-Grained Dataflow for Scalable NN Accelerators
<//dl.acm.org/citation.cfm?id=3297858.3304014>`__, in ASPLOS, April 2019.
.. [Gao17] Gao, Pu, Yang, Horowitz, and Kozyrakis, `TETRIS: Scalable and
Efficient Neural Network Acceleration with 3D Memory
<//dl.acm.org/citation.cfm?id=3037697.3037702>`__, in ASPLOS, April 2017.
2 changes: 1 addition & 1 deletion nn_dataflow/__init__.py
@@ -13,5 +13,5 @@
program. If not, see <https://opensource.org/licenses/BSD-3-Clause>.
"""

__version__ = '1.6'
__version__ = '2.0'

6 changes: 6 additions & 0 deletions nn_dataflow/core/__init__.py
@@ -20,11 +20,13 @@
from . import loop_enum as LoopEnum
from . import mem_hier_enum as MemHierEnum
from . import parallel_enum as ParallelEnum
from .buf_shr_scheme import BufShrScheme
from .cost import Cost
from .data_dim_loops import DataDimLoops
from .data_layout import DataLayout
from .fmap_range import FmapPosition, FmapRange, FmapRangeMap
from .int_range import IntRange
from .inter_layer_pipeline import InterLayerPipeline
from .layer import Layer, InputLayer, ConvLayer, FCLayer, \
LocalRegionLayer, PoolingLayer, EltwiseLayer
from .loop_blocking_scheme import LoopBlockingScheme
@@ -36,8 +38,12 @@
from .option import Option
from .partition_scheme import PartitionScheme
from .phy_dim2 import PhyDim2
from .pipeline_segment import PipelineSegment
from .pipeline_segment_timing import PipelineSegmentTiming
from .resource import Resource
from .scheduling import SchedulingCondition, SchedulingResult, Scheduling
from .scheduling_constraint import SchedulingConstraint, \
SchedulingConstraintLayerPipeline

from .nn_dataflow import NNDataflow
