Commit

Merge pull request #338 from es-ude/develop
Develop
glencoe committed Jan 19, 2024
2 parents a8aaa00 + b577aab commit 9035010
Showing 45 changed files with 3,221 additions and 64 deletions.
73 changes: 73 additions & 0 deletions CONTRIBUTION.md
@@ -0,0 +1,73 @@
# Contribution Guide
## Concepts
The `elasticai.creator` aims to support

1. the design and training of hardware-optimization-aware neural networks
2. the translation of designs from 1. to a neural network accelerator in a hardware definition language

The first point means that the network architecture and the algorithms used during forward as well as backward
propagation strongly depend on the targeted hardware implementation.
Since the tool is aimed at researchers, we want the translation process to be straightforward and easy to reason about.
In contrast to other tools (Apache TVM, FINN, etc.), we prefer flexible prototyping and handwritten
hardware definitions over a wide range of supported architectures and platforms or highly scalable solutions.

The code base is composed of the following packages:
- `file_generation`:
  - write files to paths on hard disk or to virtual paths (e.g., for testing purposes)
  - simple template definition
  - template writer/expander (see the usage sketch after this list)
- `vhdl`:
  - helper functions to generate frequently used vhdl constructs
  - the `Design` interface to facilitate composition of hardware designs
  - basic vhdl designs without a machine learning layer counterpart, to be used as dependencies in other designs (e.g., rom modules)
  - additional vhdl designs to make the neural network accelerator accessible via the elasticai.runtime, also see [skeleton](./elasticai/creator/vhdl/system_integrations/README.md)
- `base_modules`:
  - basic machine learning modules that are used as dependencies by translatable layers
- `nn`:
  - package for the public layer api, hosting translatable layers of different categories
  - layers within a subpackage of `nn`, e.g. `nn.fixed_point`, are supposed to be compatible with each other
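
As a rough usage sketch of the `file_generation` package, the snippet below expands an in-project template and writes it to a destination path. The template and parameter names (`my_layer.tpl.vhd`, `data_width`) are hypothetical; the calls mirror those made by `fp_lstm_cell.py` elsewhere in this commit.

```python
from elasticai.creator.file_generation.savable import Path
from elasticai.creator.file_generation.template import (
    InProjectTemplate,
    module_to_package,
)


def save_my_layer(destination: Path) -> None:
    # Expand the (hypothetical) template `my_layer.tpl.vhd` that ships in the
    # same package as this module and write it as `my_layer.vhd`.
    template = InProjectTemplate(
        package=module_to_package(__name__),
        file_name="my_layer.tpl.vhd",
        parameters=dict(name="my_layer", data_width="16"),
    )
    destination.create_subpath("my_layer").as_file(".vhd").write(template)
```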

## Adding a new translatable layer
Adding a new layer involves three main tasks:
1. define the new ml framework module, typically you want to inherit from `torch.nn.Module` and optionally use one
   of our layers from `base_modules`
   - this specifies the forward and backward pass behavior of your layer
2. define a corresponding `Design` class
   - this specifies
     - the hardware implementation (i.e., which files are written to where and what their content is)
     - the interface (`Port`) of the design, so we can automatically combine it with other designs
   - to help with the implementation, you can use the template system as well as the `elasticai.creator.vhdl.code_generation` modules
3. define a trainable `DesignCreator`, typically inheriting from the class defined in 1., and implement the `create_design` method (see the sketch below), which
   a. extracts information from the module defined in 1.
   b. converts that information to native python types
   c. instantiates the corresponding design from 2., providing the necessary data from a.
   - this step might involve calling `create_design` on submodules and injecting the results into the design from 2.
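
The following is a minimal sketch of such a layer under simplified assumptions: `MyLayer` and `MyDesign` are hypothetical names, and a real layer would inherit from the project's `DesignCreator` base class and return one of the actual `Design` implementations from step 2.

```python
import torch


class MyDesign:
    """Hypothetical stand-in for the `Design` subclass defined in step 2."""

    def __init__(self, *, name: str, weights: list[list[float]]) -> None:
        self.name = name
        self.weights = weights


class MyLayer(torch.nn.Linear):
    """Hypothetical translatable layer: a torch module that can create its design."""

    def create_design(self, name: str) -> MyDesign:
        # a. extract information from the trained module
        weight_tensor = self.weight.detach()
        # b. convert that information to native python types
        weights: list[list[float]] = weight_tensor.tolist()
        # c. instantiate the corresponding design, handing over the extracted data
        return MyDesign(name=name, weights=weights)
```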


### Ports and automatically combining layers
The algorithm for combining layers lives in `elasticai.creator.vhdl.auto_wire_protocols`.
Currently, we support two types of interfaces: a) bufferless designs and b) buffered designs.

a) a bufferless design has no buffer of its own and processes data as a stream; it is assumed to be fast enough that a buffered design can fetch its input data through it
b) a buffered design features its own buffer to store computation results and fetches its input data from a previous buffer

The *autowiring algorithm* will take care of generating vhdl code to correctly connect a graph of buffered and bufferless designs.

A bufferless design features the following signals:

| name |direction | type | meaning |
|------|----------|----------------|-------------------------------------------------|
| x | in |std_logic_vector| input data for this layer |
| y | out |std_logic_vector| output data of this layer |
| clock| in |std_logic | clock signal, possibly shared with other layers |


For a buffered design we define the following signals (a code sketch follows the table):

| name |direction | type | meaning |
|------|----------|----------------|-------------------------------------------------|
| x | in |std_logic_vector| input data for this layer |
| x_address | out | std_logic_vector | used by this layer to address the previous buffer and fetch data; we address per input data point (this typically corresponds to the number of input features) |
| y | out |std_logic_vector| output data of this layer |
| y_address | in | std_logic_vector | used by the following buffered layer to address this layer's output buffer (connected to the following layer's x_address) |
| clock| in |std_logic | clock signal, possibly shared with other layers |
| done | out | std_logic | set to "1" when computation is finished |
| enable | in | std_logic | compute while set to "1" |
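
As an illustration, a buffered design could describe this interface with the `Port` and `Signal` classes used in this repository. This is a minimal sketch with hypothetical signal widths (16-bit data, 4-bit addresses); compare the `port` property of `FPLSTMCell` added elsewhere in this commit.

```python
from elasticai.creator.vhdl.design import std_signals
from elasticai.creator.vhdl.design.ports import Port
from elasticai.creator.vhdl.design.signal import Signal

buffered_port = Port(
    incoming=[
        std_signals.clock(),
        std_signals.enable(),
        Signal("x", width=16),
        Signal("y_address", width=4),
    ],
    outgoing=[
        std_signals.done(),
        Signal("y", width=16),
        Signal("x_address", width=4),
    ],
)
```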
3 changes: 2 additions & 1 deletion commitlint.config.js
@@ -1,6 +1,7 @@
module.exports = {
extends: ['@commitlint/config-conventional'],
rules: {
'type-enum': [2, "always", ['feat', 'fix', 'docs', 'style', 'refactor', 'revert', 'chore', 'wip', 'perf']]
'type-enum': [2, "always", ['feat', 'fix', 'docs', 'style', 'refactor', 'revert', 'chore', 'wip', 'perf']],
'header-max-length': [1, 'always', 100]
}
}
Empty file.
Empty file.
19 changes: 19 additions & 0 deletions elasticai/creator/nn/fixed_point/lstm/design/_common_imports.py
@@ -0,0 +1,19 @@
from functools import partial

from elasticai.creator.file_generation.savable import Path
from elasticai.creator.file_generation.template import (
InProjectTemplate,
module_to_package,
)
from elasticai.creator.nn.fixed_point._two_complement_fixed_point_config import (
FixedPointConfig,
)
from elasticai.creator.nn.fixed_point.hard_sigmoid import HardSigmoid
from elasticai.creator.nn.fixed_point.hard_tanh.design import HardTanh
from elasticai.creator.nn.fixed_point.linear.design import Linear as FPLinear1d
from elasticai.creator.vhdl.code_generation.addressable import calculate_address_width
from elasticai.creator.vhdl.design import std_signals
from elasticai.creator.vhdl.design.design import Design
from elasticai.creator.vhdl.design.ports import Port
from elasticai.creator.vhdl.design.signal import Signal
from elasticai.creator.vhdl.shared_designs.rom import Rom
112 changes: 112 additions & 0 deletions elasticai/creator/nn/fixed_point/lstm/design/dual_port_2_clock_ram.tpl.vhd
@@ -0,0 +1,112 @@
-- based on xilinx_simple_dual_port_2_clock_ram
-- but we did some custom modifications(chao)
-- Xilinx Simple Dual Port 2 Clock RAM
-- This code implements a parameterizable SDP dual clock memory.
-- If a reset or enable is not necessary, it may be tied off or removed from the code.

library ieee;
use ieee.std_logic_1164.all;


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

USE std.textio.all;

entity dual_port_2_clock_ram_${name} is
generic (
RAM_WIDTH : integer := 64; -- Specify RAM data width
RAM_DEPTH_WIDTH : integer := 8; -- Specify RAM depth (number of entries)
RAM_PERFORMANCE : string := "LOW_LATENCY"; -- Select "HIGH_PERFORMANCE" or "LOW_LATENCY"
INIT_FILE : string := "" -- Specify name/location of RAM initialization file if using one (leave blank if not)
);

port (
addra : in std_logic_vector((RAM_DEPTH_WIDTH-1) downto 0); -- Write address bus, width determined from RAM_DEPTH
addrb : in std_logic_vector((RAM_DEPTH_WIDTH-1) downto 0); -- Read address bus, width determined from RAM_DEPTH
dina : in std_logic_vector(RAM_WIDTH-1 downto 0); -- RAM input data
clka : in std_logic; -- Write Clock
clkb : in std_logic; -- Read Clock
wea : in std_logic; -- Write enable
enb : in std_logic; -- RAM Enable, for additional power savings, disable port when not in use
rstb : in std_logic; -- Output reset (does not affect memory contents)
regceb: in std_logic; -- Output register enable
doutb : out std_logic_vector(RAM_WIDTH-1 downto 0) -- RAM output data
);

end dual_port_2_clock_ram_${name};

architecture rtl of dual_port_2_clock_ram_${name} is

constant C_RAM_WIDTH : integer := RAM_WIDTH;
constant C_RAM_DEPTH : integer := 2**RAM_DEPTH_WIDTH;
constant C_RAM_PERFORMANCE : string := RAM_PERFORMANCE;
constant C_INIT_FILE : string := INIT_FILE;


signal doutb_reg : std_logic_vector(C_RAM_WIDTH-1 downto 0) := (others => '0');

type ram_type is array (0 to C_RAM_DEPTH-1) of std_logic_vector (C_RAM_WIDTH-1 downto 0); -- 2D Array Declaration for RAM signal

signal ram_data : std_logic_vector(C_RAM_WIDTH-1 downto 0) ;


function init_from_file_or_zeroes(ramfile : string) return ram_type is
begin
-- if ramfile = "" then --if the file name is empty then init ram with 0
return (others => (others => '0'));
-- else
-- return InitRamFromFile(ramfile) ;
-- end if;
end;
-- Following code defines RAM

signal ram_name : ram_type := init_from_file_or_zeroes(C_INIT_FILE);

begin

process(clka)
begin
if(clka'event and clka = '1') then
if(wea = '1') then
ram_name(to_integer(unsigned(addra))) <= dina;
end if;
end if;
end process;

process(clkb)
begin
if(clkb'event and clkb = '1') then
if(enb = '1') then
ram_data <= ram_name(to_integer(unsigned(addrb)));
end if;
end if;
end process;


-- Following code generates LOW_LATENCY (no output register)
-- Following is a 1 clock cycle read latency at the cost of a longer clock-to-out timing

no_output_register : if C_RAM_PERFORMANCE = "LOW_LATENCY" generate
doutb <= ram_data;
end generate;

-- Following code generates HIGH_PERFORMANCE (use output register)
-- Following is a 2 clock cycle read latency with improved clock-to-out timing

output_register : if C_RAM_PERFORMANCE = "HIGH_PERFORMANCE" generate
process(clkb)
begin
if(clkb'event and clkb = '1') then
if(rstb = '1') then
doutb_reg <= (others => '0');
elsif(regceb = '1') then
doutb_reg <= ram_data;
end if;
end if;
end process;
doutb <= doutb_reg;
end generate;

end rtl;
166 changes: 166 additions & 0 deletions elasticai/creator/nn/fixed_point/lstm/design/fp_lstm_cell.py
@@ -0,0 +1,166 @@
from collections.abc import Iterable
from functools import partial
from typing import Any, cast

import numpy as np

from ._common_imports import (
Design,
FixedPointConfig,
InProjectTemplate,
Path,
Port,
Rom,
Signal,
calculate_address_width,
module_to_package,
std_signals,
)


class FPLSTMCell(Design):
def __init__(
self,
*,
name: str,
hardtanh: Design,
hardsigmoid: Design,
total_bits: int,
frac_bits: int,
w_ih: list[list[list[int]]],
w_hh: list[list[list[int]]],
b_ih: list[list[int]],
b_hh: list[list[int]],
) -> None:
super().__init__(name=name)
work_library_name: str = "work"

self.input_size = len(w_ih[0])
self.hidden_size = len(w_ih) // 4
self.weights_ih = w_ih
self.weights_hh = w_hh
self.biases_ih = b_ih
self.biases_hh = b_hh
self._config = FixedPointConfig(total_bits=total_bits, frac_bits=frac_bits)
self._htanh = hardtanh
self._hsigmoid = hardsigmoid
self._rom_base_names = ("wi", "wf", "wg", "wo", "bi", "bf", "bg", "bo")
self._ram_base_name = f"dual_port_2_clock_ram_{self.name}"
self._template = InProjectTemplate(
package=module_to_package(self.__module__),
file_name=f"{self.name}.tpl.vhd",
parameters=dict(
name=self.name,
library=work_library_name,
tanh_name=self._htanh.name,
sigmoid_name=self._hsigmoid.name,
data_width=str(total_bits),
frac_width=str(frac_bits),
input_size=str(self.input_size),
hidden_size=str(self.hidden_size),
x_h_addr_width=str(
calculate_address_width(self.input_size + self.hidden_size)
),
hidden_addr_width=str(calculate_address_width(self.hidden_size)),
w_addr_width=str(
calculate_address_width(
(self.input_size + self.hidden_size) * self.hidden_size
)
),
),
)

@property
def total_bits(self) -> int:
return int(cast(str, self._template.parameters["data_width"]))

@property
def frac_bits(self) -> int:
return int(cast(str, self._template.parameters["frac_width"]))

@property
def _hidden_addr_width(self) -> int:
return int(cast(str, self._template.parameters["hidden_addr_width"]))

@property
def _weight_address_width(self) -> int:
return int(cast(str, self._template.parameters["w_addr_width"]))

@property
def port(self) -> Port:
ctrl_signal = partial(Signal, width=0)
return Port(
incoming=[
std_signals.clock(),
# ctrl_signal("clk_hadamard"),
ctrl_signal("reset"),
std_signals.enable(),
ctrl_signal("zero_state"),
Signal("x_data", width=self.total_bits),
ctrl_signal("h_out_en"),
Signal("h_out_addr", width=self._hidden_addr_width),
],
outgoing=[
std_signals.done(),
Signal("h_out_data", self.total_bits),
],
)

def get_file_load_order(self) -> list[str]:
return [f"{file}.vhd" for file in self._get_qualified_rom_names()] + [
f"{self._ram_base_name}.vhd",
f"{self._htanh.name}.vhd",
f"{self._hsigmoid.name}.vhd",
]

def save_to(self, destination: Path) -> None:
destination = destination.create_subpath(self.name)
weights, biases = self._build_weights()

self._save_roms(
destination=destination,
parameters=[*weights, *biases],
)
self._save_dual_port_double_clock_ram(destination)
self._save_hardtanh(destination)
self._save_sigmoid(destination)

destination.create_subpath("lstm_cell").as_file(".vhd").write(self._template)

def _build_weights(self) -> tuple[list[list], list[list]]:
weights = np.concatenate(
(np.array(self.weights_ih), np.array(self.weights_hh)), axis=1
)
w_i, w_f, w_g, w_o = weights.reshape(4, -1).tolist()

bias = np.add(self.biases_ih, self.biases_hh)
b_i, b_f, b_g, b_o = bias.reshape(4, -1).tolist()

return [w_i, w_f, w_g, w_o], [b_i, b_f, b_g, b_o]

def _get_qualified_rom_names(self) -> list[str]:
suffix = f"_rom_{self.name}"
return [name + suffix for name in self._rom_base_names]

def _save_roms(self, destination: Path, parameters: Iterable[Any]) -> None:
for name, values in zip(self._get_qualified_rom_names(), parameters):
rom = Rom(
name=name,
data_width=self.total_bits,
values_as_integers=values,
)
rom.save_to(destination.create_subpath(name))

def _save_hardtanh(self, destination: Path) -> None:
self._htanh.save_to(destination)

def _save_sigmoid(self, destination: Path) -> None:
self._hsigmoid.save_to(destination)

def _save_dual_port_double_clock_ram(self, destination: Path) -> None:
template = InProjectTemplate(
file_name="dual_port_2_clock_ram.tpl.vhd",
package=module_to_package(self.__module__),
parameters=dict(name=self.name),
)
destination.create_subpath(self._ram_base_name).as_file(".vhd").write(template)