Develop #338

Merged
merged 68 commits on Jan 19, 2024
Commits (68)
0ffa9b0
feat: lstm reintegration
julianhoever May 8, 2023
c23aa53
feat: lstm reintegration
Oct 4, 2023
67117a4
feat: added skeleton, middleware, top module and Vivado constraints f…
Oct 4, 2023
b480800
Merge branch 'main' into reintegrate-lstm
glencoe Oct 4, 2023
9440fbb
feat: reintegrate lstm implementation
glencoe Oct 4, 2023
1fda0ce
Merge remote-tracking branch 'origin/integrate_middleware_and_skeleto…
glencoe Oct 5, 2023
2a272ea
fix: turn `vhdl.top` into a package
glencoe Oct 5, 2023
41aa4f1
fix: add saving constraints and sources to plug&play ENV5
glencoe Oct 5, 2023
38fe8a7
fix: added missing file for network skeleton tpl
Oct 5, 2023
3473ef2
Merge branch 'integrate_middleware_and_skeletons' into develop
glencoe Oct 5, 2023
c13576d
fix: parametrize names
glencoe Oct 5, 2023
37d7921
feat: add lstm_network testbench
glencoe Oct 5, 2023
da279f7
fix: fix lstm names
glencoe Oct 5, 2023
f879b4d
fix: fix lstm test bench file name
glencoe Oct 5, 2023
e82af52
fix: correct `create_testbench` for lstm
glencoe Oct 5, 2023
53bc568
fix: move `create_testbench` to correct class
glencoe Oct 5, 2023
7db8974
docs: explain relationship between LSTM, LSTMNetwork and their sw/hw …
glencoe Oct 5, 2023
3ad358c
fix: names and templates for lstm
glencoe Oct 5, 2023
397499e
wip: refactor lstm
glencoe Oct 6, 2023
bf2c53f
feat: inject network to FirmwareENv5
glencoe Oct 6, 2023
7f09a2a
fix: don't save uut in testbench
glencoe Oct 6, 2023
2bbd588
fix: add skeleton, etc. to generated files
glencoe Oct 6, 2023
baad73b
fix: fix fxp mac test
glencoe Oct 7, 2023
12b7c27
fix: fix skeleton test
glencoe Oct 7, 2023
7da4b5a
fix: remove unnecessary instance name templ variables
glencoe Oct 7, 2023
73bd008
Merge branch 'develop' into refactor_lstm_impl
glencoe Oct 7, 2023
9d48f09
fix: use linear layer name
glencoe Oct 9, 2023
21d057d
fix(lstm): skeleton naming
glencoe Oct 9, 2023
59f7ed4
docs(middleware): add register documentation
DavidFederl Oct 9, 2023
3678cf8
Merge branch 'main' into develop
glencoe Oct 9, 2023
9937431
fix(firmwareEnv5): save testbench to separate folder
glencoe Oct 9, 2023
eb8c835
feat: added a bash script to automatically build the vivado file with…
Oct 9, 2023
adba799
wip: add skeleton+middleware spec
glencoe Oct 9, 2023
7a95f7c
wip: update skeleton specification
glencoe Oct 9, 2023
005ed36
fix(lstm_skeleton): xil to work lib
glencoe Oct 9, 2023
35f2de2
wip: contribution docs
glencoe Oct 10, 2023
c56a903
Merge remote-tracking branch 'origin/develop' into develop
glencoe Oct 10, 2023
56b0eae
Merge branch 'definition' into develop
glencoe Oct 10, 2023
63a1b9d
fix(contribution,docs): fix tables and language
glencoe Oct 10, 2023
b62d982
docs: add more middleware/skeleton specification
glencoe Oct 10, 2023
f2bd5af
fix(MiddlewareSpec): transmit high byte first instead of low
Oct 10, 2023
1860657
fix(MiddlewareSpec): correct counter in example code
Oct 10, 2023
6828739
refactor(lstm): rename _integration_test to example
julianhoever Oct 11, 2023
7e3f54b
feat(lstm): set more specific return type for create_testbench function
julianhoever Oct 11, 2023
b4ffacb
feat(skeleton): add general skeleton class
julianhoever Oct 11, 2023
8ad3272
feat(firmware): new firmware that does not save testbenches
julianhoever Oct 11, 2023
74ebc32
fix(skeleton): fix wrong signal name in integration test
julianhoever Oct 11, 2023
3a18656
feat(firmware): test that firmware generates skeleton correctly
julianhoever Oct 11, 2023
7234680
Merge pull request #331 from es-ude/Fix-middlewware-spec
telefonjoker100 Oct 11, 2023
17a274c
feat(firmware): create separate LSTMFirmwareENv5
julianhoever Oct 11, 2023
3539c9f
docs: fix hw function id length
glencoe Oct 12, 2023
24683c4
Merge pull request #333 from es-ude/glencoe-patch-1
glencoe Oct 12, 2023
632bf89
fix: added skeleton_1.vhd needs to be changed
Oct 12, 2023
34e8202
feat: add skeleton for sequential layer
glencoe Oct 12, 2023
26b8afa
refactor: remove unnecessary files
glencoe Oct 12, 2023
dcb20b7
fix(test): add expected newline to end of skeleton
glencoe Oct 12, 2023
0f51559
wip: start timing diagram for middleware/skeleton spec
glencoe Oct 13, 2023
574116b
docs(skeleton): add timing diagram to skeleton/middleware spec
glencoe Oct 13, 2023
231f0ca
feat: add support for less than 8 bit in skeleton
Oct 13, 2023
96572fb
docs(skeleton): explain we need to read each result byte two times
glencoe Oct 17, 2023
258a18c
Merge remote-tracking branch 'origin/support-mlp-use-case' into suppo…
glencoe Oct 17, 2023
e4b67cc
fix(skeleton): fix skeleton for mlp use case
glencoe Oct 17, 2023
374d966
wip: prepare convolution hw impl
glencoe Oct 19, 2023
2ac8941
Merge branch 'develop' into support-mlp-use-case
glencoe Oct 19, 2023
5a6baf4
Merge pull request #330 from es-ude/support-mlp-use-case
glencoe Oct 19, 2023
c94dc3b
feat: convert negative numbers to bit patterns using two's complement
Oct 31, 2023
3e1d509
chore: only throw a warning if commit message exceeds char limit
julianhoever Dec 21, 2023
b577aab
Merge pull request #340 from es-ude/fix-failing-commitlint
glencoe Jan 19, 2024
73 changes: 73 additions & 0 deletions CONTRIBUTION.md
@@ -0,0 +1,73 @@
# Contribution Guide
## Concepts
The `elasticai.creator` aims to support
1. the design and training of hardware optimization aware neural networks
2. the translation of designs from 1. to a neural network accelerator in a hardware definition language
The first point means that the network architecture and the algorithms used during forward as well as backward
propagation strongly depend on the targeted hardware implementation.
Since the tool is aimed at researchers, we want the translation process to be straightforward and easy to reason about.
As opposed to other tools (Apache TVM, FINN, etc.), we prefer flexible prototyping and handwritten
hardware definitions over a wide range of supported architectures and platforms or highly scalable solutions.

The code-base is composed of the following packages:
- `file_generation` (see the sketch after this list):
  - write files to paths on hard disk or to virtual paths (e.g., for testing purposes)
  - simple template definition
  - template writer/expander
- `vhdl`:
  - helper functions to generate frequently used vhdl constructs
  - the `Design` interface to facilitate composition of hardware designs
  - basic vhdl designs without a machine learning layer counterpart, to be used as dependencies in other designs (e.g., rom modules)
  - additional vhdl designs to make the neural network accelerator accessible via the elasticai.runtime, also see [skeleton](./elasticai/creator/vhdl/system_integrations/README.md)
- `base_modules`:
  - basic machine learning modules that are used as dependencies by translatable layers
- `nn`:
  - package for the public layer api, hosting translatable layers of different categories
  - layers within a subpackage of `nn`, e.g. `nn.fixed_point`, are supposed to be compatible with each other
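
As a quick illustration of the template system provided by `file_generation`, here is a minimal sketch. The file name `my_design.tpl.vhd` and its parameters are hypothetical; the calls mirror those used in `fp_lstm_cell.py` further down in this PR:

```python
from elasticai.creator.file_generation.savable import Path
from elasticai.creator.file_generation.template import (
    InProjectTemplate,
    module_to_package,
)


def save_my_design(destination: Path) -> None:
    # "my_design.tpl.vhd" is a hypothetical template shipped alongside this
    # module; each ${...} placeholder in it is filled from `parameters`.
    template = InProjectTemplate(
        package=module_to_package(__name__),
        file_name="my_design.tpl.vhd",
        parameters=dict(name="my_design", data_width="16"),
    )
    destination.create_subpath("my_design").as_file(".vhd").write(template)
```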

## Adding a new translatable layer
Adding a new layer involves three main tasks:
1. define the new ml framework module; typically you want to inherit from `torch.nn.Module` and optionally use one
   of our layers from `base_modules`
   - this specifies the forward and backward pass behavior of your layer
2. define a corresponding `Design` class
   - this specifies
     - the hardware implementation (i.e., which files are written to where and what their content is)
     - the interface (`Port`) of the design, so we can automatically combine it with other designs
   - to help with the implementation, you can use the template system as well as the `elasticai.creator.vhdl.code_generation` modules
3. define a trainable `DesignCreator`, typically inheriting from the class defined in 1., and implement the `create_design` method, which
   a. extracts information from the module defined in 1.
   b. converts that information to native python types
   c. instantiates the corresponding design from 2., providing the necessary data from a.
   - this step might involve calling `create_design` on submodules and injecting them into the design from 2. (see the sketch below)
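
A minimal sketch of these three steps. `MyLayer`, `MyLayerDesign`, and the weight handling are hypothetical; only `Design`, `Port`, and the `create_design` convention come from the code-base:

```python
import torch

from elasticai.creator.file_generation.savable import Path
from elasticai.creator.vhdl.design.design import Design
from elasticai.creator.vhdl.design.ports import Port


class MyLayerDesign(Design):  # step 2: the hardware implementation
    def __init__(self, *, name: str, weights: list[float]) -> None:
        super().__init__(name=name)
        self._weights = weights

    @property
    def port(self) -> Port:
        ...  # build the interface from the signal tables below

    def save_to(self, destination: Path) -> None:
        ...  # write templates, roms, etc. via the file_generation package


class MyLayer(torch.nn.Module):  # steps 1 and 3: ml module + DesignCreator
    def __init__(self) -> None:
        super().__init__()
        self.weight = torch.nn.Parameter(torch.zeros(4))

    def create_design(self, name: str) -> MyLayerDesign:
        # a. extract, b. convert to native python types, c. instantiate
        return MyLayerDesign(name=name, weights=self.weight.tolist())
```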


### Ports and automatically combining layers
The algorithm for combining layers lives in `elasticai.creator.vhdl.auto_wire_protocols`.
Currently, we support two types of interfaces: a) bufferless designs and b) buffered designs.

a) a design without a buffer processes data as a stream; it is assumed to be fast enough that a buffered design can fetch its input data through a bufferless design
b) a design that features its own buffer stores computation results and fetches its input data from a previous buffer

The *autowiring algorithm* will take care of generating vhdl code to correctly connect a graph of buffered and bufferless designs.

A bufferless design features the following signals:

| name |direction | type | meaning |
|------|----------|----------------|-------------------------------------------------|
| x | in |std_logic_vector| input data for this layer |
| y | out |std_logic_vector| output data of this layer |
| clock| in |std_logic | clock signal, possibly shared with other layers |


For a buffered design we define the following signals:

| name      | direction | type             | meaning                                          |
|-----------|-----------|------------------|--------------------------------------------------|
| x         | in        | std_logic_vector | input data for this layer                        |
| x_address | out       | std_logic_vector | used by this layer to address the previous buffer and fetch data; we address per input data point (this typically corresponds to the number of input features) |
| y         | out       | std_logic_vector | output data of this layer                        |
| y_address | in        | std_logic_vector | used by the following buffered layer to address this layer's output buffer (connected to the following layer's `x_address`) |
| clock     | in        | std_logic        | clock signal, possibly shared with other layers  |
| done      | out       | std_logic        | set to "1" when computation is finished          |
| enable    | in        | std_logic        | compute while set to "1"                         |
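
Expressed with the `Port` and `Signal` classes used elsewhere in this PR (see `fp_lstm_cell.py` below), a buffered interface could be declared as in the following sketch; the signal widths are example values:

```python
from elasticai.creator.vhdl.design import std_signals
from elasticai.creator.vhdl.design.ports import Port
from elasticai.creator.vhdl.design.signal import Signal

buffered_port = Port(
    incoming=[
        std_signals.clock(),
        std_signals.enable(),
        Signal("x", width=16),         # input data for this layer
        Signal("y_address", width=4),  # driven by the following layer
    ],
    outgoing=[
        std_signals.done(),
        Signal("y", width=16),         # output data of this layer
        Signal("x_address", width=4),  # addresses the previous buffer
    ],
)
# A bufferless design would only declare clock, x, and y.
```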
3 changes: 2 additions & 1 deletion commitlint.config.js
@@ -1,6 +1,7 @@
 module.exports = {
   extends: ['@commitlint/config-conventional'],
   rules: {
-    'type-enum': [2, "always", ['feat', 'fix', 'docs', 'style', 'refactor', 'revert', 'chore', 'wip', 'perf']]
+    'type-enum': [2, "always", ['feat', 'fix', 'docs', 'style', 'refactor', 'revert', 'chore', 'wip', 'perf']],
+    'header-max-length': [1, 'always', 100]
   }
 }
Empty file.
Empty file.
19 changes: 19 additions & 0 deletions elasticai/creator/nn/fixed_point/lstm/design/_common_imports.py
@@ -0,0 +1,19 @@
from functools import partial

from elasticai.creator.file_generation.savable import Path
from elasticai.creator.file_generation.template import (
    InProjectTemplate,
    module_to_package,
)
from elasticai.creator.nn.fixed_point._two_complement_fixed_point_config import (
    FixedPointConfig,
)
from elasticai.creator.nn.fixed_point.hard_sigmoid import HardSigmoid
from elasticai.creator.nn.fixed_point.hard_tanh.design import HardTanh
from elasticai.creator.nn.fixed_point.linear.design import Linear as FPLinear1d
from elasticai.creator.vhdl.code_generation.addressable import calculate_address_width
from elasticai.creator.vhdl.design import std_signals
from elasticai.creator.vhdl.design.design import Design
from elasticai.creator.vhdl.design.ports import Port
from elasticai.creator.vhdl.design.signal import Signal
from elasticai.creator.vhdl.shared_designs.rom import Rom
112 changes: 112 additions & 0 deletions elasticai/creator/nn/fixed_point/lstm/design/dual_port_2_clock_ram.tpl.vhd
@@ -0,0 +1,112 @@
-- based on xilinx_simple_dual_port_2_clock_ram
-- but with some custom modifications (chao)
-- Xilinx Simple Dual Port 2 Clock RAM
-- This code implements a parameterizable SDP dual clock memory.
-- If a reset or enable is not necessary, it may be tied off or removed from the code.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

use std.textio.all;

entity dual_port_2_clock_ram_${name} is
  generic (
    RAM_WIDTH : integer := 64;                 -- Specify RAM data width
    RAM_DEPTH_WIDTH : integer := 8;            -- Specify RAM address width (depth = 2**RAM_DEPTH_WIDTH entries)
    RAM_PERFORMANCE : string := "LOW_LATENCY"; -- Select "HIGH_PERFORMANCE" or "LOW_LATENCY"
    INIT_FILE : string := ""                   -- Specify name/location of RAM initialization file if using one (leave blank if not)
  );

  port (
    addra  : in std_logic_vector((RAM_DEPTH_WIDTH-1) downto 0); -- Write address bus, width determined from RAM_DEPTH
    addrb  : in std_logic_vector((RAM_DEPTH_WIDTH-1) downto 0); -- Read address bus, width determined from RAM_DEPTH
    dina   : in std_logic_vector(RAM_WIDTH-1 downto 0);         -- RAM input data
    clka   : in std_logic;                                      -- Write Clock
    clkb   : in std_logic;                                      -- Read Clock
    wea    : in std_logic;                                      -- Write enable
    enb    : in std_logic;                                      -- RAM Enable, for additional power savings, disable port when not in use
    rstb   : in std_logic;                                      -- Output reset (does not affect memory contents)
    regceb : in std_logic;                                      -- Output register enable
    doutb  : out std_logic_vector(RAM_WIDTH-1 downto 0)         -- RAM output data
  );

end dual_port_2_clock_ram_${name};

architecture rtl of dual_port_2_clock_ram_${name} is

  constant C_RAM_WIDTH : integer := RAM_WIDTH;
  constant C_RAM_DEPTH : integer := 2**RAM_DEPTH_WIDTH;
  constant C_RAM_PERFORMANCE : string := RAM_PERFORMANCE;
  constant C_INIT_FILE : string := INIT_FILE;

  signal doutb_reg : std_logic_vector(C_RAM_WIDTH-1 downto 0) := (others => '0');

  type ram_type is array (0 to C_RAM_DEPTH-1) of std_logic_vector(C_RAM_WIDTH-1 downto 0); -- 2D Array Declaration for RAM signal

  signal ram_data : std_logic_vector(C_RAM_WIDTH-1 downto 0);

  -- File-based initialization is currently disabled; the RAM always starts zeroed.
  function init_from_file_or_zeroes(ramfile : string) return ram_type is
  begin
    -- if ramfile = "" then -- if the file name is empty then init ram with 0
    return (others => (others => '0'));
    -- else
    --   return InitRamFromFile(ramfile);
    -- end if;
  end;

  -- Following code defines RAM
  signal ram_name : ram_type := init_from_file_or_zeroes(C_INIT_FILE);

begin

  process(clka)
  begin
    if (clka'event and clka = '1') then
      if (wea = '1') then
        ram_name(to_integer(unsigned(addra))) <= dina;
      end if;
    end if;
  end process;

  process(clkb)
  begin
    if (clkb'event and clkb = '1') then
      if (enb = '1') then
        ram_data <= ram_name(to_integer(unsigned(addrb)));
      end if;
    end if;
  end process;

  -- Following code generates LOW_LATENCY (no output register)
  -- Following is a 1 clock cycle read latency at the cost of a longer clock-to-out timing

  no_output_register : if C_RAM_PERFORMANCE = "LOW_LATENCY" generate
    doutb <= ram_data;
  end generate;

  -- Following code generates HIGH_PERFORMANCE (use output register)
  -- Following is a 2 clock cycle read latency with improved clock-to-out timing

  output_register : if C_RAM_PERFORMANCE = "HIGH_PERFORMANCE" generate
    process(clkb)
    begin
      if (clkb'event and clkb = '1') then
        if (rstb = '1') then
          doutb_reg <= (others => '0');
        elsif (regceb = '1') then
          doutb_reg <= ram_data;
        end if;
      end if;
    end process;
    doutb <= doutb_reg;
  end generate;

end rtl;
166 changes: 166 additions & 0 deletions elasticai/creator/nn/fixed_point/lstm/design/fp_lstm_cell.py
@@ -0,0 +1,166 @@
from collections.abc import Iterable
from functools import partial
from typing import Any, cast

import numpy as np

from ._common_imports import (
    Design,
    FixedPointConfig,
    InProjectTemplate,
    Path,
    Port,
    Rom,
    Signal,
    calculate_address_width,
    module_to_package,
    std_signals,
)


class FPLSTMCell(Design):
    def __init__(
        self,
        *,
        name: str,
        hardtanh: Design,
        hardsigmoid: Design,
        total_bits: int,
        frac_bits: int,
        w_ih: list[list[list[int]]],
        w_hh: list[list[list[int]]],
        b_ih: list[list[int]],
        b_hh: list[list[int]],
    ) -> None:
        super().__init__(name=name)
        work_library_name: str = "work"

        self.input_size = len(w_ih[0])
        self.hidden_size = len(w_ih) // 4
        self.weights_ih = w_ih
        self.weights_hh = w_hh
        self.biases_ih = b_ih
        self.biases_hh = b_hh
        self._config = FixedPointConfig(total_bits=total_bits, frac_bits=frac_bits)
        self._htanh = hardtanh
        self._hsigmoid = hardsigmoid
        self._rom_base_names = ("wi", "wf", "wg", "wo", "bi", "bf", "bg", "bo")
        self._ram_base_name = f"dual_port_2_clock_ram_{self.name}"
        self._template = InProjectTemplate(
            package=module_to_package(self.__module__),
            file_name=f"{self.name}.tpl.vhd",
            parameters=dict(
                name=self.name,
                library=work_library_name,
                tanh_name=self._htanh.name,
                sigmoid_name=self._hsigmoid.name,
                data_width=str(total_bits),
                frac_width=str(frac_bits),
                input_size=str(self.input_size),
                hidden_size=str(self.hidden_size),
                x_h_addr_width=str(
                    calculate_address_width(self.input_size + self.hidden_size)
                ),
                hidden_addr_width=str(calculate_address_width(self.hidden_size)),
                w_addr_width=str(
                    calculate_address_width(
                        (self.input_size + self.hidden_size) * self.hidden_size
                    )
                ),
            ),
        )

    @property
    def total_bits(self) -> int:
        return int(cast(str, self._template.parameters["data_width"]))

    @property
    def frac_bits(self) -> int:
        return int(cast(str, self._template.parameters["frac_width"]))

    @property
    def _hidden_addr_width(self) -> int:
        return int(cast(str, self._template.parameters["hidden_addr_width"]))

    @property
    def _weight_address_width(self) -> int:
        return int(cast(str, self._template.parameters["w_addr_width"]))

    @property
    def port(self) -> Port:
        ctrl_signal = partial(Signal, width=0)
        return Port(
            incoming=[
                std_signals.clock(),
                # ctrl_signal("clk_hadamard"),
                ctrl_signal("reset"),
                std_signals.enable(),
                ctrl_signal("zero_state"),
                Signal("x_data", width=self.total_bits),
                ctrl_signal("h_out_en"),
                Signal("h_out_addr", width=self._hidden_addr_width),
            ],
            outgoing=[
                std_signals.done(),
                Signal("h_out_data", self.total_bits),
            ],
        )

    def get_file_load_order(self) -> list[str]:
        return [f"{file}.vhd" for file in self._get_qualified_rom_names()] + [
            f"{self._ram_base_name}.vhd",
            f"{self._htanh.name}.vhd",
            f"{self._hsigmoid.name}.vhd",
        ]

    def save_to(self, destination: Path) -> None:
        destination = destination.create_subpath(self.name)
        weights, biases = self._build_weights()

        self._save_roms(
            destination=destination,
            parameters=[*weights, *biases],
        )
        self._save_dual_port_double_clock_ram(destination)
        self._save_hardtanh(destination)
        self._save_sigmoid(destination)

        destination.create_subpath("lstm_cell").as_file(".vhd").write(self._template)

    def _build_weights(self) -> tuple[list[list], list[list]]:
        # Concatenate input-hidden and hidden-hidden weights along the input
        # dimension and split them into the four gate matrices (i, f, g, o).
        weights = np.concatenate(
            (np.array(self.weights_ih), np.array(self.weights_hh)), axis=1
        )
        w_i, w_f, w_g, w_o = weights.reshape(4, -1).tolist()

        # The two bias vectors are added elementwise and split into the same four gates.
        bias = np.add(self.biases_ih, self.biases_hh)
        b_i, b_f, b_g, b_o = bias.reshape(4, -1).tolist()

        return [w_i, w_f, w_g, w_o], [b_i, b_f, b_g, b_o]

    def _get_qualified_rom_names(self) -> list[str]:
        suffix = f"_rom_{self.name}"
        return [name + suffix for name in self._rom_base_names]

    def _save_roms(self, destination: Path, parameters: Iterable[Any]) -> None:
        for name, values in zip(self._get_qualified_rom_names(), parameters):
            rom = Rom(
                name=name,
                data_width=self.total_bits,
                values_as_integers=values,
            )
            rom.save_to(destination.create_subpath(name))

    def _save_hardtanh(self, destination: Path) -> None:
        self._htanh.save_to(destination)

    def _save_sigmoid(self, destination: Path) -> None:
        self._hsigmoid.save_to(destination)

    def _save_dual_port_double_clock_ram(self, destination: Path) -> None:
        template = InProjectTemplate(
            file_name="dual_port_2_clock_ram.tpl.vhd",
            package=module_to_package(self.__module__),
            parameters=dict(name=self.name),
        )
        destination.create_subpath(self._ram_base_name).as_file(".vhd").write(template)