Commit: Merge pull request #338 from es-ude/develop (Develop)

Showing 45 changed files with 3,221 additions and 64 deletions.
@@ -0,0 +1,73 @@

# Contribution Guide

## Concepts

The `elasticai.creator` aims to support

1. the design and training of hardware-optimization-aware neural networks
2. the translation of designs from 1. to a neural network accelerator in a hardware definition language

The first point means that the network architecture and the algorithms used during forward as well as backward propagation strongly depend on the targeted hardware implementation.
Since the tool is aimed at researchers, we want the translation process to be straightforward and easy to reason about.
Unlike other tools (Apache TVM, FINN, etc.), we prefer flexible prototyping and handwritten hardware definitions over a wide range of supported architectures and platforms or highly scalable solutions.

The code base is composed of the following packages:

- `file_generation`:
  - write files to paths on hard disk or to virtual paths (e.g., for testing purposes)
  - simple template definition
  - template writer/expander
- `vhdl`:
  - helper functions to generate frequently used VHDL constructs
  - the `Design` interface to facilitate composition of hardware designs
  - basic VHDL designs without a machine learning layer counterpart, to be used as dependencies in other designs (e.g., ROM modules)
  - additional VHDL designs to make the neural network accelerator accessible via the elasticai.runtime; also see [skeleton](./elasticai/creator/vhdl/system_integrations/README.md)
- `base_modules`:
  - basic machine learning modules that are used as dependencies by translatable layers
- `nn`:
  - package for the public layer API, hosting translatable layers of different categories
  - layers within a subpackage of `nn`, e.g. `nn.fixed_point`, are supposed to be compatible with each other

## Adding a new translatable layer

Adding a new layer involves three main tasks:

1. Define the new ML framework module; typically you want to inherit from `pytorch.nn.Module` and optionally use one of our layers from `base_modules`.
   - This specifies the forward and backward pass behavior of your layer.
2. Define a corresponding `Design` class.
   - This specifies
     - the hardware implementation (i.e., which files are written to where and what their content is)
     - the interface (`Port`) of the design, so we can automatically combine it with other designs
   - To help with the implementation, you can use the template system as well as the `elasticai.creator.vhdl.code_generation` modules.
3. Define a trainable `DesignCreator`, typically inheriting from the class defined in 1., and implement the `create_design` method, which
   a. extracts information from the module defined in 1.
   b. converts that information to native Python types
   c. instantiates the corresponding design from 2., providing the necessary data from a.
   - This step might involve calling `create_design` on submodules and injecting the results into the design from 2.
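The three steps above can be sketched as follows. This is a hypothetical, self-contained toy example: `Scaling` and `ScalingDesign` are invented stand-ins for illustration, not part of `elasticai.creator`, and the real classes build on `torch.nn.Module` and the `Design`/`DesignCreator` interfaces.

```python
from dataclasses import dataclass


@dataclass
class ScalingDesign:  # step 2: hardware description (hypothetical example)
    name: str
    factor_as_int: int

    def save_to(self, destination: str) -> None:
        # a real Design would write its VHDL files here
        print(f"writing {self.name}.vhd to {destination}")


class Scaling:  # step 1: ML framework module (hypothetical example)
    def __init__(self, factor: float, frac_bits: int) -> None:
        self.factor = factor
        self.frac_bits = frac_bits

    def forward(self, x: float) -> float:
        return x * self.factor

    # step 3: a. extract info, b. convert to native types, c. build the design
    def create_design(self, name: str) -> ScalingDesign:
        factor_as_int = int(round(self.factor * 2**self.frac_bits))  # b.
        return ScalingDesign(name=name, factor_as_int=factor_as_int)  # c.


layer = Scaling(factor=0.5, frac_bits=8)
design = layer.create_design("scale")
```

The key point is the separation of concerns: the module knows the ML semantics, the design knows the files and the interface, and `create_design` is the only bridge between them.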
### Ports and automatically combining layers

The algorithm for combining layers lives in `elasticai.creator.vhdl.auto_wire_protocols`.
Currently, we support two types of interfaces:

a) a bufferless design that processes data as a stream; it is assumed to be fast enough that a buffered design can fetch its input data through it

b) a buffered design that features its own buffer to store computation results and fetches its input data from a previous buffer

The *autowiring algorithm* will take care of generating VHDL code to correctly connect a graph of buffered and bufferless designs.

A bufferless design features the following signals:

| name  | direction | type             | meaning                                         |
|-------|-----------|------------------|-------------------------------------------------|
| x     | in        | std_logic_vector | input data for this layer                       |
| y     | out       | std_logic_vector | output data of this layer                       |
| clock | in        | std_logic        | clock signal, possibly shared with other layers |

For a buffered design we define the following signals:

| name      | direction | type             | meaning                                         |
|-----------|-----------|------------------|-------------------------------------------------|
| x         | in        | std_logic_vector | input data for this layer                       |
| x_address | out       | std_logic_vector | used by this layer to address the previous buffer and fetch data; we address per input data point (this typically corresponds to the number of input features) |
| y         | out       | std_logic_vector | output data of this layer                       |
| y_address | in        | std_logic_vector | used by the following buffered layer to address this layer's output buffer (connected to the following layer's x_address) |
| clock     | in        | std_logic        | clock signal, possibly shared with other layers |
| done      | out       | std_logic        | set to "1" when computation is finished         |
| enable    | in        | std_logic        | compute while set to "1"                        |
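As a minimal sketch, the buffered interface from the table above can be written down as data. The `Signal`/`Port` classes here are simplified stand-ins for the ones in `elasticai.creator.vhdl.design`, assumed for illustration only (width 0 models a plain `std_logic`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    width: int  # 0 => std_logic, >0 => std_logic_vector of that width


@dataclass(frozen=True)
class Port:
    incoming: tuple[Signal, ...]
    outgoing: tuple[Signal, ...]


def buffered_port(data_width: int, x_addr_width: int, y_addr_width: int) -> Port:
    """Build the buffered-design interface described in the table above."""
    return Port(
        incoming=(
            Signal("x", data_width),
            Signal("y_address", y_addr_width),
            Signal("clock", 0),
            Signal("enable", 0),
        ),
        outgoing=(
            Signal("y", data_width),
            Signal("x_address", x_addr_width),
            Signal("done", 0),
        ),
    )


port = buffered_port(data_width=16, x_addr_width=4, y_addr_width=3)
```

Note the mirrored address signals: `x_address` is an output (this layer drives the previous buffer's read address), while `y_address` is an input (the next layer drives ours), which is exactly what the autowiring algorithm exploits.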
@@ -1,6 +1,7 @@

module.exports = {
  extends: ['@commitlint/config-conventional'],
  rules: {
-    'type-enum': [2, "always", ['feat', 'fix', 'docs', 'style', 'refactor', 'revert', 'chore', 'wip', 'perf']]
+    'type-enum': [2, "always", ['feat', 'fix', 'docs', 'style', 'refactor', 'revert', 'chore', 'wip', 'perf']],
+    'header-max-length': [1, 'always', 100]
  }
}
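The diff adds a header length limit next to the existing type whitelist. As a rough illustration of what these two rules enforce (commitlint itself runs in Node; this Python re-implementation is an assumption-laden sketch, not the real rule engine):

```python
# Allowed commit types, copied from the config above.
ALLOWED_TYPES = ['feat', 'fix', 'docs', 'style', 'refactor',
                 'revert', 'chore', 'wip', 'perf']


def check_header(header: str, max_length: int = 100) -> list[str]:
    """Return a list of rule violations for a commit header, empty if OK."""
    problems = []
    # The type is everything before ':', '(' or '!' (e.g. "feat(nn): ...").
    type_part = header.split(":", 1)[0].split("(", 1)[0].split("!", 1)[0]
    if type_part not in ALLOWED_TYPES:
        problems.append(f"type-enum: {type_part!r} not allowed")
    if len(header) > max_length:
        problems.append(f"header-max-length: {len(header)} > {max_length}")
    return problems
```

Note that `header-max-length` is configured with severity 1 (warning) while `type-enum` uses 2 (error); the sketch above does not distinguish the two.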
Empty file.
Empty file.
19 additions: elasticai/creator/nn/fixed_point/lstm/design/_common_imports.py

@@ -0,0 +1,19 @@
from functools import partial

from elasticai.creator.file_generation.savable import Path
from elasticai.creator.file_generation.template import (
    InProjectTemplate,
    module_to_package,
)
from elasticai.creator.nn.fixed_point._two_complement_fixed_point_config import (
    FixedPointConfig,
)
from elasticai.creator.nn.fixed_point.hard_sigmoid import HardSigmoid
from elasticai.creator.nn.fixed_point.hard_tanh.design import HardTanh
from elasticai.creator.nn.fixed_point.linear.design import Linear as FPLinear1d
from elasticai.creator.vhdl.code_generation.addressable import calculate_address_width
from elasticai.creator.vhdl.design import std_signals
from elasticai.creator.vhdl.design.design import Design
from elasticai.creator.vhdl.design.ports import Port
from elasticai.creator.vhdl.design.signal import Signal
from elasticai.creator.vhdl.shared_designs.rom import Rom
112 additions: elasticai/creator/nn/fixed_point/lstm/design/dual_port_2_clock_ram.tpl.vhd

@@ -0,0 +1,112 @@
-- based on xilinx_simple_dual_port_2_clock_ram
-- but we did some custom modifications (chao)
-- Xilinx Simple Dual Port 2 Clock RAM
-- This code implements a parameterizable SDP dual clock memory.
-- If a reset or enable is not necessary, it may be tied off or removed from the code.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

use std.textio.all;

entity dual_port_2_clock_ram_${name} is
generic (
    RAM_WIDTH : integer := 64;                 -- Specify RAM data width
    RAM_DEPTH_WIDTH : integer := 8;            -- Specify RAM address width (number of entries = 2**RAM_DEPTH_WIDTH)
    RAM_PERFORMANCE : string := "LOW_LATENCY"; -- Select "HIGH_PERFORMANCE" or "LOW_LATENCY"
    INIT_FILE : string := ""                   -- Specify name/location of RAM initialization file if using one (leave blank if not)
);

port (
    addra : in std_logic_vector((RAM_DEPTH_WIDTH-1) downto 0); -- Write address bus, width determined from RAM_DEPTH
    addrb : in std_logic_vector((RAM_DEPTH_WIDTH-1) downto 0); -- Read address bus, width determined from RAM_DEPTH
    dina : in std_logic_vector(RAM_WIDTH-1 downto 0);          -- RAM input data
    clka : in std_logic;                                       -- Write clock
    clkb : in std_logic;                                       -- Read clock
    wea : in std_logic;                                        -- Write enable
    enb : in std_logic;                                        -- RAM enable; for additional power savings, disable port when not in use
    rstb : in std_logic;                                       -- Output reset (does not affect memory contents)
    regceb: in std_logic;                                      -- Output register enable
    doutb : out std_logic_vector(RAM_WIDTH-1 downto 0)         -- RAM output data
);

end dual_port_2_clock_ram_${name};

architecture rtl of dual_port_2_clock_ram_${name} is

constant C_RAM_WIDTH : integer := RAM_WIDTH;
constant C_RAM_DEPTH : integer := 2**RAM_DEPTH_WIDTH;
constant C_RAM_PERFORMANCE : string := RAM_PERFORMANCE;
constant C_INIT_FILE : string := INIT_FILE;

signal doutb_reg : std_logic_vector(C_RAM_WIDTH-1 downto 0) := (others => '0');

type ram_type is array (0 to C_RAM_DEPTH-1) of std_logic_vector (C_RAM_WIDTH-1 downto 0); -- 2D array declaration for RAM signal

signal ram_data : std_logic_vector(C_RAM_WIDTH-1 downto 0);

function init_from_file_or_zeroes(ramfile : string) return ram_type is
begin
    -- if ramfile = "" then -- if the file name is empty then init ram with 0
    return (others => (others => '0'));
    -- else
    --     return InitRamFromFile(ramfile);
    -- end if;
end;

-- Following code defines RAM

signal ram_name : ram_type := init_from_file_or_zeroes(C_INIT_FILE);

begin

-- Write port, clocked by clka
process(clka)
begin
    if(clka'event and clka = '1') then
        if(wea = '1') then
            ram_name(to_integer(unsigned(addra))) <= dina;
        end if;
    end if;
end process;

-- Read port, clocked by clkb
process(clkb)
begin
    if(clkb'event and clkb = '1') then
        if(enb = '1') then
            ram_data <= ram_name(to_integer(unsigned(addrb)));
        end if;
    end if;
end process;

-- Following code generates LOW_LATENCY (no output register)
-- 1 clock cycle read latency at the cost of a longer clock-to-out timing

no_output_register : if C_RAM_PERFORMANCE = "LOW_LATENCY" generate
    doutb <= ram_data;
end generate;

-- Following code generates HIGH_PERFORMANCE (use output register)
-- 2 clock cycle read latency with improved clock-to-out timing

output_register : if C_RAM_PERFORMANCE = "HIGH_PERFORMANCE" generate
    process(clkb)
    begin
        if(clkb'event and clkb = '1') then
            if(rstb = '1') then
                doutb_reg <= (others => '0');
            elsif(regceb = '1') then
                doutb_reg <= ram_data;
            end if;
        end if;
    end process;
    doutb <= doutb_reg;
end generate;

end rtl;
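The one-cycle versus two-cycle read latency of the two `generate` branches can be illustrated with a small behavioral model. This Python class is a simplified illustration written for this guide, not generated from the VHDL; it abstracts away the clocks and only models the register stages on the read path:

```python
class DualPortRamModel:
    """Behavioral sketch of the RAM template's read path."""

    def __init__(self, depth_width: int = 8, high_performance: bool = False) -> None:
        self.mem = [0] * (2 ** depth_width)  # C_RAM_DEPTH = 2**RAM_DEPTH_WIDTH
        self.high_performance = high_performance
        self.ram_data = 0   # first register stage (read process)
        self.doutb_reg = 0  # optional output register (HIGH_PERFORMANCE only)

    def write(self, addr: int, data: int) -> None:
        # write port: one rising edge of clka with wea='1'
        self.mem[addr] = data

    def read_cycle(self, addr: int) -> int:
        # read port: one rising edge of clkb with enb='1' (and regceb='1')
        if self.high_performance:
            self.doutb_reg = self.ram_data  # second register stage
        self.ram_data = self.mem[addr]
        return self.doutb_reg if self.high_performance else self.ram_data


# LOW_LATENCY: the value appears after one read cycle.
fast = DualPortRamModel()
fast.write(1, 7)

# HIGH_PERFORMANCE: the value needs two read cycles to reach doutb.
slow = DualPortRamModel(high_performance=True)
slow.write(3, 42)
slow.read_cycle(3)        # cycle 1: data enters ram_data
out = slow.read_cycle(3)  # cycle 2: data reaches doutb_reg
```

This mirrors the trade-off stated in the comments: the extra output register costs a cycle of latency but shortens the clock-to-out path.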
166 additions: elasticai/creator/nn/fixed_point/lstm/design/fp_lstm_cell.py

@@ -0,0 +1,166 @@
from collections.abc import Iterable
from functools import partial
from typing import Any, cast

import numpy as np

from ._common_imports import (
    Design,
    FixedPointConfig,
    InProjectTemplate,
    Path,
    Port,
    Rom,
    Signal,
    calculate_address_width,
    module_to_package,
    std_signals,
)


class FPLSTMCell(Design):
    def __init__(
        self,
        *,
        name: str,
        hardtanh: Design,
        hardsigmoid: Design,
        total_bits: int,
        frac_bits: int,
        w_ih: list[list[list[int]]],
        w_hh: list[list[list[int]]],
        b_ih: list[list[int]],
        b_hh: list[list[int]],
    ) -> None:
        super().__init__(name=name)
        work_library_name: str = "work"

        self.input_size = len(w_ih[0])
        self.hidden_size = len(w_ih) // 4
        self.weights_ih = w_ih
        self.weights_hh = w_hh
        self.biases_ih = b_ih
        self.biases_hh = b_hh
        self._config = FixedPointConfig(total_bits=total_bits, frac_bits=frac_bits)
        self._htanh = hardtanh
        self._hsigmoid = hardsigmoid
        self._rom_base_names = ("wi", "wf", "wg", "wo", "bi", "bf", "bg", "bo")
        self._ram_base_name = f"dual_port_2_clock_ram_{self.name}"
        self._template = InProjectTemplate(
            package=module_to_package(self.__module__),
            file_name=f"{self.name}.tpl.vhd",
            parameters=dict(
                name=self.name,
                library=work_library_name,
                tanh_name=self._htanh.name,
                sigmoid_name=self._hsigmoid.name,
                data_width=str(total_bits),
                frac_width=str(frac_bits),
                input_size=str(self.input_size),
                hidden_size=str(self.hidden_size),
                x_h_addr_width=str(
                    calculate_address_width(self.input_size + self.hidden_size)
                ),
                hidden_addr_width=str(calculate_address_width(self.hidden_size)),
                w_addr_width=str(
                    calculate_address_width(
                        (self.input_size + self.hidden_size) * self.hidden_size
                    )
                ),
            ),
        )

    @property
    def total_bits(self) -> int:
        return int(cast(str, self._template.parameters["data_width"]))

    @property
    def frac_bits(self) -> int:
        return int(cast(str, self._template.parameters["frac_width"]))

    @property
    def _hidden_addr_width(self) -> int:
        return int(cast(str, self._template.parameters["hidden_addr_width"]))

    @property
    def _weight_address_width(self) -> int:
        return int(cast(str, self._template.parameters["w_addr_width"]))

    @property
    def port(self) -> Port:
        ctrl_signal = partial(Signal, width=0)
        return Port(
            incoming=[
                std_signals.clock(),
                # ctrl_signal("clk_hadamard"),
                ctrl_signal("reset"),
                std_signals.enable(),
                ctrl_signal("zero_state"),
                Signal("x_data", width=self.total_bits),
                ctrl_signal("h_out_en"),
                Signal("h_out_addr", width=self._hidden_addr_width),
            ],
            outgoing=[
                std_signals.done(),
                Signal("h_out_data", self.total_bits),
            ],
        )

    def get_file_load_order(self) -> list[str]:
        return [f"{file}.vhd" for file in self._get_qualified_rom_names()] + [
            f"{self._ram_base_name}.vhd",
            f"{self._htanh.name}.vhd",
            f"{self._hsigmoid.name}.vhd",
        ]

    def save_to(self, destination: Path) -> None:
        destination = destination.create_subpath(self.name)
        weights, biases = self._build_weights()

        self._save_roms(
            destination=destination,
            parameters=[*weights, *biases],
        )
        self._save_dual_port_double_clock_ram(destination)
        self._save_hardtanh(destination)
        self._save_sigmoid(destination)

        destination.create_subpath("lstm_cell").as_file(".vhd").write(self._template)

    def _build_weights(self) -> tuple[list[list], list[list]]:
        weights = np.concatenate(
            (np.array(self.weights_ih), np.array(self.weights_hh)), axis=1
        )
        w_i, w_f, w_g, w_o = weights.reshape(4, -1).tolist()

        bias = np.add(self.biases_ih, self.biases_hh)
        b_i, b_f, b_g, b_o = bias.reshape(4, -1).tolist()

        return [w_i, w_f, w_g, w_o], [b_i, b_f, b_g, b_o]

    def _get_qualified_rom_names(self) -> list[str]:
        suffix = f"_rom_{self.name}"
        return [name + suffix for name in self._rom_base_names]

    def _save_roms(self, destination: Path, parameters: Iterable[Any]) -> None:
        for name, values in zip(self._get_qualified_rom_names(), parameters):
            rom = Rom(
                name=name,
                data_width=self.total_bits,
                values_as_integers=values,
            )
            rom.save_to(destination.create_subpath(name))

    def _save_hardtanh(self, destination: Path) -> None:
        self._htanh.save_to(destination)

    def _save_sigmoid(self, destination: Path) -> None:
        self._hsigmoid.save_to(destination)

    def _save_dual_port_double_clock_ram(self, destination: Path) -> None:
        template = InProjectTemplate(
            file_name="dual_port_2_clock_ram.tpl.vhd",
            package=module_to_package(self.__module__),
            parameters=dict(name=self.name),
        )
        destination.create_subpath(self._ram_base_name).as_file(".vhd").write(template)
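The reshape in `_build_weights` is worth a shape check: with `4 * hidden_size` rows in `w_ih`/`w_hh`, concatenating along `axis=1` and calling `reshape(4, -1)` splits the parameters into the four LSTM gates (i, f, g, o), and the two bias vectors are summed rather than kept separate. A standalone sketch with toy sizes (`input_size=2`, `hidden_size=1`; the values are made up for illustration):

```python
import numpy as np

w_ih = np.arange(8).reshape(4, 2)  # (4*hidden_size, input_size)
w_hh = np.arange(4).reshape(4, 1)  # (4*hidden_size, hidden_size)
b_ih = np.ones(4)
b_hh = np.ones(4)

# Each row now holds one gate's input and hidden weights side by side.
weights = np.concatenate((w_ih, w_hh), axis=1)  # (4, input_size + hidden_size)
w_i, w_f, w_g, w_o = weights.reshape(4, -1).tolist()

# The two bias vectors are summed, matching how the hardware applies them once.
bias = np.add(b_ih, b_hh)
b_i, b_f, b_g, b_o = bias.reshape(4, -1).tolist()
```

Note this only demonstrates the gate-splitting arithmetic; the real method additionally feeds each resulting list into a `Rom` design.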