Commit: Merge pull request #338 from es-ude/develop (Develop)

Showing 45 changed files with 3,221 additions and 64 deletions.
@@ -0,0 +1,73 @@

# Contribution Guide

## Concepts

The `elasticai.creator` aims to support

1. the design and training of hardware-optimization-aware neural networks
2. the translation of designs from 1. to a neural network accelerator in a hardware definition language

The first point means that the network architecture and the algorithms used during forward as well as backward propagation strongly depend on the targeted hardware implementation.
Since the tool is aimed at researchers, we want the translation process to be straightforward and easy to reason about.
Unlike other tools (Apache TVM, FINN, etc.), we prefer flexible prototyping and handwritten hardware definitions over a wide range of supported architectures and platforms or highly scalable solutions.

The code base is composed of the following packages:

- `file_generation`:
  - write files to paths on hard disk or to virtual paths (e.g., for testing purposes)
  - simple template definition
  - template writer/expander
- `vhdl`:
  - helper functions to generate frequently used VHDL constructs
  - the `Design` interface to facilitate composition of hardware designs
  - basic VHDL designs without a machine learning layer counterpart, to be used as dependencies in other designs (e.g., ROM modules)
  - additional VHDL designs to make the neural network accelerator accessible via the elasticai.runtime; also see [skeleton](./elasticai/creator/vhdl/system_integrations/README.md)
- `base_modules`:
  - basic machine learning modules that are used as dependencies by translatable layers
- `nn`:
  - package for the public layer API, hosting translatable layers of different categories
  - layers within a subpackage of `nn`, e.g. `nn.fixed_point`, are supposed to be compatible with each other

## Adding a new translatable layer

Adding a new layer involves three main tasks:

1. Define the new ML framework module; typically you want to inherit from `pytorch.nn.Module` and optionally use one of our layers from `base_modules`.
   - This specifies the forward and backward pass behavior of your layer.
2. Define a corresponding `Design` class.
   - This specifies
     - the hardware implementation (i.e., which files are written to where and what their content is)
     - the interface (`Port`) of the design, so we can automatically combine it with other designs
   - To help with the implementation, you can use the template system as well as the `elasticai.creator.vhdl.code_generation` modules.
3. Define a trainable `DesignCreator`, typically inheriting from the class defined in 1., and implement the `create_design` method, which
   a. extracts information from the module defined in 1.
   b. converts that information to native Python types
   c. instantiates the corresponding design from 2., providing the necessary data from a.
   - This step might involve calling `create_design` on submodules and injecting the results into the design from 2.
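The three steps above can be sketched as follows. This is a hypothetical, self-contained toy example: `Scaling` and `ScalingDesign` are invented stand-ins for illustration, not part of `elasticai.creator`, and the real classes build on `torch.nn.Module` and the `Design`/`DesignCreator` interfaces.

```python
from dataclasses import dataclass


@dataclass
class ScalingDesign:  # step 2: hardware description (hypothetical example)
    name: str
    factor_as_int: int

    def save_to(self, destination: str) -> None:
        # a real Design would write its VHDL files here
        print(f"writing {self.name}.vhd to {destination}")


class Scaling:  # step 1: ML framework module (hypothetical example)
    def __init__(self, factor: float, frac_bits: int) -> None:
        self.factor = factor
        self.frac_bits = frac_bits

    def forward(self, x: float) -> float:
        return x * self.factor

    # step 3: a. extract info, b. convert to native types, c. build the design
    def create_design(self, name: str) -> ScalingDesign:
        factor_as_int = int(round(self.factor * 2**self.frac_bits))  # b.
        return ScalingDesign(name=name, factor_as_int=factor_as_int)  # c.


layer = Scaling(factor=0.5, frac_bits=8)
design = layer.create_design("scale")
```

The key point is the separation of concerns: the module knows the ML semantics, the design knows the files and the interface, and `create_design` is the only bridge between them.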
### Ports and automatically combining layers

The algorithm for combining layers lives in `elasticai.creator.vhdl.auto_wire_protocols`.
Currently, we support two types of interfaces:

a) a bufferless design that processes data as a stream; it is assumed to be fast enough that a buffered design can fetch its input data through it

b) a buffered design that features its own buffer to store computation results and fetches its input data from a previous buffer

The *autowiring algorithm* will take care of generating VHDL code to correctly connect a graph of buffered and bufferless designs.

A bufferless design features the following signals:

| name  | direction | type             | meaning                                         |
|-------|-----------|------------------|-------------------------------------------------|
| x     | in        | std_logic_vector | input data for this layer                       |
| y     | out       | std_logic_vector | output data of this layer                       |
| clock | in        | std_logic        | clock signal, possibly shared with other layers |

For a buffered design we define the following signals:

| name      | direction | type             | meaning                                         |
|-----------|-----------|------------------|-------------------------------------------------|
| x         | in        | std_logic_vector | input data for this layer                       |
| x_address | out       | std_logic_vector | used by this layer to address the previous buffer and fetch data; we address per input data point (this typically corresponds to the number of input features) |
| y         | out       | std_logic_vector | output data of this layer                       |
| y_address | in        | std_logic_vector | used by the following buffered layer to address this layer's output buffer (connected to the following layer's x_address) |
| clock     | in        | std_logic        | clock signal, possibly shared with other layers |
| done      | out       | std_logic        | set to "1" when computation is finished         |
| enable    | in        | std_logic        | compute while set to "1"                        |
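As a minimal sketch, the buffered interface from the table above can be written down as data. The `Signal`/`Port` classes here are simplified stand-ins for the ones in `elasticai.creator.vhdl.design`, assumed for illustration only (width 0 models a plain `std_logic`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    width: int  # 0 => std_logic, >0 => std_logic_vector of that width


@dataclass(frozen=True)
class Port:
    incoming: tuple[Signal, ...]
    outgoing: tuple[Signal, ...]


def buffered_port(data_width: int, x_addr_width: int, y_addr_width: int) -> Port:
    """Build the buffered-design interface described in the table above."""
    return Port(
        incoming=(
            Signal("x", data_width),
            Signal("y_address", y_addr_width),
            Signal("clock", 0),
            Signal("enable", 0),
        ),
        outgoing=(
            Signal("y", data_width),
            Signal("x_address", x_addr_width),
            Signal("done", 0),
        ),
    )


port = buffered_port(data_width=16, x_addr_width=4, y_addr_width=3)
```

Note the mirrored address signals: `x_address` is an output (this layer drives the previous buffer's read address), while `y_address` is an input (the next layer drives ours), which is exactly what the autowiring algorithm exploits.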
@@ -1,6 +1,7 @@

module.exports = {
  extends: ['@commitlint/config-conventional'],
  rules: {
-    'type-enum': [2, "always", ['feat', 'fix', 'docs', 'style', 'refactor', 'revert', 'chore', 'wip', 'perf']]
+    'type-enum': [2, "always", ['feat', 'fix', 'docs', 'style', 'refactor', 'revert', 'chore', 'wip', 'perf']],
+    'header-max-length': [1, 'always', 100]
  }
}
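The diff adds a header length limit next to the existing type whitelist. As a rough illustration of what these two rules enforce (commitlint itself runs in Node; this Python re-implementation is an assumption-laden sketch, not the real rule engine):

```python
# Allowed commit types, copied from the config above.
ALLOWED_TYPES = ['feat', 'fix', 'docs', 'style', 'refactor',
                 'revert', 'chore', 'wip', 'perf']


def check_header(header: str, max_length: int = 100) -> list[str]:
    """Return a list of rule violations for a commit header, empty if OK."""
    problems = []
    # The type is everything before ':', '(' or '!' (e.g. "feat(nn): ...").
    type_part = header.split(":", 1)[0].split("(", 1)[0].split("!", 1)[0]
    if type_part not in ALLOWED_TYPES:
        problems.append(f"type-enum: {type_part!r} not allowed")
    if len(header) > max_length:
        problems.append(f"header-max-length: {len(header)} > {max_length}")
    return problems
```

Note that `header-max-length` is configured with severity 1 (warning) while `type-enum` uses 2 (error); the sketch above does not distinguish the two.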
Empty file.
Empty file.
19 additions: elasticai/creator/nn/fixed_point/lstm/design/_common_imports.py

@@ -0,0 +1,19 @@
from functools import partial

from elasticai.creator.file_generation.savable import Path
from elasticai.creator.file_generation.template import (
    InProjectTemplate,
    module_to_package,
)
from elasticai.creator.nn.fixed_point._two_complement_fixed_point_config import (
    FixedPointConfig,
)
from elasticai.creator.nn.fixed_point.hard_sigmoid import HardSigmoid
from elasticai.creator.nn.fixed_point.hard_tanh.design import HardTanh
from elasticai.creator.nn.fixed_point.linear.design import Linear as FPLinear1d
from elasticai.creator.vhdl.code_generation.addressable import calculate_address_width
from elasticai.creator.vhdl.design import std_signals
from elasticai.creator.vhdl.design.design import Design
from elasticai.creator.vhdl.design.ports import Port
from elasticai.creator.vhdl.design.signal import Signal
from elasticai.creator.vhdl.shared_designs.rom import Rom
112 additions: elasticai/creator/nn/fixed_point/lstm/design/dual_port_2_clock_ram.tpl.vhd

@@ -0,0 +1,112 @@
-- based on xilinx_simple_dual_port_2_clock_ram
-- but we did some custom modifications (chao)
-- Xilinx Simple Dual Port 2 Clock RAM
-- This code implements a parameterizable SDP dual clock memory.
-- If a reset or enable is not necessary, it may be tied off or removed from the code.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

use std.textio.all;

entity dual_port_2_clock_ram_${name} is
generic (
    RAM_WIDTH : integer := 64;                 -- Specify RAM data width
    RAM_DEPTH_WIDTH : integer := 8;            -- Specify RAM address width (number of entries = 2**RAM_DEPTH_WIDTH)
    RAM_PERFORMANCE : string := "LOW_LATENCY"; -- Select "HIGH_PERFORMANCE" or "LOW_LATENCY"
    INIT_FILE : string := ""                   -- Specify name/location of RAM initialization file if using one (leave blank if not)
);

port (
    addra : in std_logic_vector((RAM_DEPTH_WIDTH-1) downto 0); -- Write address bus, width determined from RAM_DEPTH
    addrb : in std_logic_vector((RAM_DEPTH_WIDTH-1) downto 0); -- Read address bus, width determined from RAM_DEPTH
    dina : in std_logic_vector(RAM_WIDTH-1 downto 0);          -- RAM input data
    clka : in std_logic;                                       -- Write clock
    clkb : in std_logic;                                       -- Read clock
    wea : in std_logic;                                        -- Write enable
    enb : in std_logic;                                        -- RAM enable; for additional power savings, disable port when not in use
    rstb : in std_logic;                                       -- Output reset (does not affect memory contents)
    regceb: in std_logic;                                      -- Output register enable
    doutb : out std_logic_vector(RAM_WIDTH-1 downto 0)         -- RAM output data
);

end dual_port_2_clock_ram_${name};

architecture rtl of dual_port_2_clock_ram_${name} is

constant C_RAM_WIDTH : integer := RAM_WIDTH;
constant C_RAM_DEPTH : integer := 2**RAM_DEPTH_WIDTH;
constant C_RAM_PERFORMANCE : string := RAM_PERFORMANCE;
constant C_INIT_FILE : string := INIT_FILE;

signal doutb_reg : std_logic_vector(C_RAM_WIDTH-1 downto 0) := (others => '0');

type ram_type is array (0 to C_RAM_DEPTH-1) of std_logic_vector (C_RAM_WIDTH-1 downto 0); -- 2D array declaration for RAM signal

signal ram_data : std_logic_vector(C_RAM_WIDTH-1 downto 0);

function init_from_file_or_zeroes(ramfile : string) return ram_type is
begin
    -- if ramfile = "" then -- if the file name is empty then init ram with 0
    return (others => (others => '0'));
    -- else
    --     return InitRamFromFile(ramfile);
    -- end if;
end;

-- Following code defines RAM

signal ram_name : ram_type := init_from_file_or_zeroes(C_INIT_FILE);

begin

-- Write port, clocked by clka
process(clka)
begin
    if(clka'event and clka = '1') then
        if(wea = '1') then
            ram_name(to_integer(unsigned(addra))) <= dina;
        end if;
    end if;
end process;

-- Read port, clocked by clkb
process(clkb)
begin
    if(clkb'event and clkb = '1') then
        if(enb = '1') then
            ram_data <= ram_name(to_integer(unsigned(addrb)));
        end if;
    end if;
end process;

-- Following code generates LOW_LATENCY (no output register)
-- 1 clock cycle read latency at the cost of a longer clock-to-out timing

no_output_register : if C_RAM_PERFORMANCE = "LOW_LATENCY" generate
    doutb <= ram_data;
end generate;

-- Following code generates HIGH_PERFORMANCE (use output register)
-- 2 clock cycle read latency with improved clock-to-out timing

output_register : if C_RAM_PERFORMANCE = "HIGH_PERFORMANCE" generate
    process(clkb)
    begin
        if(clkb'event and clkb = '1') then
            if(rstb = '1') then
                doutb_reg <= (others => '0');
            elsif(regceb = '1') then
                doutb_reg <= ram_data;
            end if;
        end if;
    end process;
    doutb <= doutb_reg;
end generate;

end rtl;
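The one-cycle versus two-cycle read latency of the two `generate` branches can be illustrated with a small behavioral model. This Python class is a simplified illustration written for this guide, not generated from the VHDL; it abstracts away the clocks and only models the register stages on the read path:

```python
class DualPortRamModel:
    """Behavioral sketch of the RAM template's read path."""

    def __init__(self, depth_width: int = 8, high_performance: bool = False) -> None:
        self.mem = [0] * (2 ** depth_width)  # C_RAM_DEPTH = 2**RAM_DEPTH_WIDTH
        self.high_performance = high_performance
        self.ram_data = 0   # first register stage (read process)
        self.doutb_reg = 0  # optional output register (HIGH_PERFORMANCE only)

    def write(self, addr: int, data: int) -> None:
        # write port: one rising edge of clka with wea='1'
        self.mem[addr] = data

    def read_cycle(self, addr: int) -> int:
        # read port: one rising edge of clkb with enb='1' (and regceb='1')
        if self.high_performance:
            self.doutb_reg = self.ram_data  # second register stage
        self.ram_data = self.mem[addr]
        return self.doutb_reg if self.high_performance else self.ram_data


# LOW_LATENCY: the value appears after one read cycle.
fast = DualPortRamModel()
fast.write(1, 7)

# HIGH_PERFORMANCE: the value needs two read cycles to reach doutb.
slow = DualPortRamModel(high_performance=True)
slow.write(3, 42)
slow.read_cycle(3)        # cycle 1: data enters ram_data
out = slow.read_cycle(3)  # cycle 2: data reaches doutb_reg
```

This mirrors the trade-off stated in the comments: the extra output register costs a cycle of latency but shortens the clock-to-out path.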
166 additions: elasticai/creator/nn/fixed_point/lstm/design/fp_lstm_cell.py

@@ -0,0 +1,166 @@
from collections.abc import Iterable
from functools import partial
from typing import Any, cast

import numpy as np

from ._common_imports import (
    Design,
    FixedPointConfig,
    InProjectTemplate,
    Path,
    Port,
    Rom,
    Signal,
    calculate_address_width,
    module_to_package,
    std_signals,
)


class FPLSTMCell(Design):
    def __init__(
        self,
        *,
        name: str,
        hardtanh: Design,
        hardsigmoid: Design,
        total_bits: int,
        frac_bits: int,
        w_ih: list[list[list[int]]],
        w_hh: list[list[list[int]]],
        b_ih: list[list[int]],
        b_hh: list[list[int]],
    ) -> None:
        super().__init__(name=name)
        work_library_name: str = "work"

        self.input_size = len(w_ih[0])
        self.hidden_size = len(w_ih) // 4
        self.weights_ih = w_ih
        self.weights_hh = w_hh
        self.biases_ih = b_ih
        self.biases_hh = b_hh
        self._config = FixedPointConfig(total_bits=total_bits, frac_bits=frac_bits)
        self._htanh = hardtanh
        self._hsigmoid = hardsigmoid
        self._rom_base_names = ("wi", "wf", "wg", "wo", "bi", "bf", "bg", "bo")
        self._ram_base_name = f"dual_port_2_clock_ram_{self.name}"
        self._template = InProjectTemplate(
            package=module_to_package(self.__module__),
            file_name=f"{self.name}.tpl.vhd",
            parameters=dict(
                name=self.name,
                library=work_library_name,
                tanh_name=self._htanh.name,
                sigmoid_name=self._hsigmoid.name,
                data_width=str(total_bits),
                frac_width=str(frac_bits),
                input_size=str(self.input_size),
                hidden_size=str(self.hidden_size),
                x_h_addr_width=str(
                    calculate_address_width(self.input_size + self.hidden_size)
                ),
                hidden_addr_width=str(calculate_address_width(self.hidden_size)),
                w_addr_width=str(
                    calculate_address_width(
                        (self.input_size + self.hidden_size) * self.hidden_size
                    )
                ),
            ),
        )

    @property
    def total_bits(self) -> int:
        return int(cast(str, self._template.parameters["data_width"]))

    @property
    def frac_bits(self) -> int:
        return int(cast(str, self._template.parameters["frac_width"]))

    @property
    def _hidden_addr_width(self) -> int:
        return int(cast(str, self._template.parameters["hidden_addr_width"]))

    @property
    def _weight_address_width(self) -> int:
        return int(cast(str, self._template.parameters["w_addr_width"]))

    @property
    def port(self) -> Port:
        ctrl_signal = partial(Signal, width=0)
        return Port(
            incoming=[
                std_signals.clock(),
                # ctrl_signal("clk_hadamard"),
                ctrl_signal("reset"),
                std_signals.enable(),
                ctrl_signal("zero_state"),
                Signal("x_data", width=self.total_bits),
                ctrl_signal("h_out_en"),
                Signal("h_out_addr", width=self._hidden_addr_width),
            ],
            outgoing=[
                std_signals.done(),
                Signal("h_out_data", self.total_bits),
            ],
        )

    def get_file_load_order(self) -> list[str]:
        return [f"{file}.vhd" for file in self._get_qualified_rom_names()] + [
            f"{self._ram_base_name}.vhd",
            f"{self._htanh.name}.vhd",
            f"{self._hsigmoid.name}.vhd",
        ]

    def save_to(self, destination: Path) -> None:
        destination = destination.create_subpath(self.name)
        weights, biases = self._build_weights()

        self._save_roms(
            destination=destination,
            parameters=[*weights, *biases],
        )
        self._save_dual_port_double_clock_ram(destination)
        self._save_hardtanh(destination)
        self._save_sigmoid(destination)

        destination.create_subpath("lstm_cell").as_file(".vhd").write(self._template)

    def _build_weights(self) -> tuple[list[list], list[list]]:
        weights = np.concatenate(
            (np.array(self.weights_ih), np.array(self.weights_hh)), axis=1
        )
        w_i, w_f, w_g, w_o = weights.reshape(4, -1).tolist()

        bias = np.add(self.biases_ih, self.biases_hh)
        b_i, b_f, b_g, b_o = bias.reshape(4, -1).tolist()

        return [w_i, w_f, w_g, w_o], [b_i, b_f, b_g, b_o]

    def _get_qualified_rom_names(self) -> list[str]:
        suffix = f"_rom_{self.name}"
        return [name + suffix for name in self._rom_base_names]

    def _save_roms(self, destination: Path, parameters: Iterable[Any]) -> None:
        for name, values in zip(self._get_qualified_rom_names(), parameters):
            rom = Rom(
                name=name,
                data_width=self.total_bits,
                values_as_integers=values,
            )
            rom.save_to(destination.create_subpath(name))

    def _save_hardtanh(self, destination: Path) -> None:
        self._htanh.save_to(destination)

    def _save_sigmoid(self, destination: Path) -> None:
        self._hsigmoid.save_to(destination)

    def _save_dual_port_double_clock_ram(self, destination: Path) -> None:
        template = InProjectTemplate(
            file_name="dual_port_2_clock_ram.tpl.vhd",
            package=module_to_package(self.__module__),
            parameters=dict(name=self.name),
        )
        destination.create_subpath(self._ram_base_name).as_file(".vhd").write(template)
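The reshape in `_build_weights` is worth a shape check: with `4 * hidden_size` rows in `w_ih`/`w_hh`, concatenating along `axis=1` and calling `reshape(4, -1)` splits the parameters into the four LSTM gates (i, f, g, o), and the two bias vectors are summed rather than kept separate. A standalone sketch with toy sizes (`input_size=2`, `hidden_size=1`; the values are made up for illustration):

```python
import numpy as np

w_ih = np.arange(8).reshape(4, 2)  # (4*hidden_size, input_size)
w_hh = np.arange(4).reshape(4, 1)  # (4*hidden_size, hidden_size)
b_ih = np.ones(4)
b_hh = np.ones(4)

# Each row now holds one gate's input and hidden weights side by side.
weights = np.concatenate((w_ih, w_hh), axis=1)  # (4, input_size + hidden_size)
w_i, w_f, w_g, w_o = weights.reshape(4, -1).tolist()

# The two bias vectors are summed, matching how the hardware applies them once.
bias = np.add(b_ih, b_hh)
b_i, b_f, b_g, b_o = bias.reshape(4, -1).tolist()
```

Note this only demonstrates the gate-splitting arithmetic; the real method additionally feeds each resulting list into a `Rom` design.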