## Defining a New DNN Model Architecture with NeuSim

While `neusim/npusim/frontend` already includes several popular DNN model architectures (LLMs, DLRM, DiT-XL, GLIGEN), this section provides a brief guide on how to implement new model architectures in NeuSim.

To add support for a new DNN model architecture, you need to create at least two classes: a new ModelConfig class and a new OpsGenerator class. These files typically live under the `neusim/configs/models` and `neusim/npusim/frontend` directories, respectively; for this tutorial, however, we create them directly in this notebook for demonstration purposes.

In this tutorial, we will create a simple model made of repeated linear layers. Each layer consists of two linear projections, one from the input dimension to a shared hidden dimension and one back, so every layer's input and output dimensions are the same.
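As a quick sanity check of this layer structure, the sketch below propagates tensor shapes through one layer using the same `;`-separated einsum expressions that appear later in this tutorial. The helper `einsum_output_shape` is hypothetical and written only for illustration; it is not part of NeuSim.

```python
def einsum_output_shape(
    expr: str, a_shape: tuple[int, ...], b_shape: tuple[int, ...]
) -> tuple[int, ...]:
    """Infer the output shape of a two-operand einsum written as 'ab;bc->ac'."""
    inputs, out_labels = expr.split("->")
    a_labels, b_labels = inputs.split(";")
    if len(a_labels) != len(a_shape) or len(b_labels) != len(b_shape):
        raise ValueError(f"Rank mismatch between {expr!r} and the given shapes")

    # Map every dimension label to its size and check that shared labels agree.
    sizes: dict[str, int] = {}
    for labels, shape in ((a_labels, a_shape), (b_labels, b_shape)):
        for label, size in zip(labels, shape):
            if sizes.setdefault(label, size) != size:
                raise ValueError(f"Inconsistent size for dimension {label!r}")
    return tuple(sizes[label] for label in out_labels)


batch, input_dim, hidden_dim = 16, 1024, 8192

# First projection: (batch, input_dim) x (input_dim, hidden_dim)
hidden_shape = einsum_output_shape(
    "bi;ih->bh", (batch, input_dim), (input_dim, hidden_dim)
)
# Second projection: (batch, hidden_dim) x (hidden_dim, input_dim)
output_shape = einsum_output_shape(
    "bh;ho->bo", hidden_shape, (hidden_dim, input_dim)
)

print(hidden_shape)  # (16, 8192)
print(output_shape)  # (16, 1024)
```

Because the output shape equals the input shape, the layers can be stacked any number of times.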

In [8]:
# common imports
from typing import Any
from absl import logging

### Creating the new ModelConfig Class

The new ModelConfig class defines the configuration parameters specific to the new model architecture. It should inherit from the base `ModelConfig` class and define any additional parameters needed for the new model.

The `ModelConfig` base class is a pydantic model, so all configuration parameters in `MyModelConfig` should be defined as class attributes with type annotations (and preferably default values).
The base class already extends `ChipConfig` and `SystemConfig`, so all chip and system configuration parameters are also included in `MyModelConfig`.

In [5]:
from neusim.configs.models.ModelConfig import ModelConfig

class MyModelConfig(ModelConfig):
    """
    Configuration class for MyModel architecture.
    """
    model_name: str = "MyModel"
    model_type: str = "my_model"
    num_layers: int = 12
    hidden_dim: int = 8192
    input_dim: int = 1024

### Creating the OpsGenerator Class

Next, we create the MyModelOpsGenerator class, which defines all tensor operators in MyModel.
There is no strict requirement on how to implement the OpsGenerator class, but it should implement at least the following two methods:
- `generate()`: This method generates all tensor operators in the model and returns them as a list of `Operator` objects.
- `compute_memory_footprint_bytes()`: This method computes the memory footprint of all operators in bytes. It is used to estimate the memory requirements of the model during simulation and filter out invalid configurations during design space exploration.

In addition, we recommend implementing a `dump_to_file()` method that writes all generated operators to a CSV file for easier debugging and analysis.
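One way to make this informal contract explicit is a structural type. The sketch below uses `typing.Protocol` to describe the two required methods; the name `OpsGeneratorLike` is hypothetical and is not defined by NeuSim, which (as far as this tutorial states) does not enforce any base class.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class OpsGeneratorLike(Protocol):
    """Hypothetical description of the minimal OpsGenerator interface."""

    def generate(self, **kwargs: Any) -> list[Any]:
        """Return all tensor operators of the model."""
        ...

    def compute_memory_footprint_bytes(self) -> int:
        """Return the estimated memory footprint in bytes."""
        ...


class DummyOpsGenerator:
    """Toy generator used only to demonstrate the structural check."""

    def generate(self, **kwargs: Any) -> list[Any]:
        return []

    def compute_memory_footprint_bytes(self) -> int:
        return 0


# isinstance() on a runtime-checkable Protocol only verifies that the
# methods exist; it does not check their signatures or return types.
print(isinstance(DummyOpsGenerator(), OpsGeneratorLike))  # True
print(isinstance(object(), OpsGeneratorLike))  # False
```

A check like this can catch a missing method early, before a long design space exploration run starts.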

In [24]:
from neusim.npusim.frontend import Operator
import neusim.npusim.frontend.llm_ops_lib as ops_lib
import neusim.npusim.frontend.op_analysis_lib as analysis_lib
import csv
import os

class MyModelOpsGenerator:
    def __init__(self, config: dict[str, Any] | MyModelConfig):
        if isinstance(config, dict):
            self.config: MyModelConfig = MyModelConfig.model_validate(config)
        else:
            self.config = config
        # Reject configurations that were built for a different model architecture.
        assert self.config.model_type == "my_model", f"Invalid config: {self.config}"

    def generate(
        self, fusion_id_start: int = 2, dump_to_file: bool = True, **kwargs
    ) -> list[Operator.Operator]:
        """
        Generate the MyModel ops and return them as a list of Operator objects.
        This function generates a simple model with repeated linear layers.
        """
        ops: list[Operator.Operator] = []
        fusion_id = fusion_id_start

        for layer_id in range(self.config.num_layers):
            # Each layer is two linear projections: input_dim -> hidden_dim -> output_dim
            batch_dim = self.config.global_batch_size
            input_dim = self.config.input_dim
            output_dim = self.config.input_dim
            ops.append(
                ops_lib.create_einsum_op(
                    input_a_shape=(batch_dim, input_dim),
                    input_b_shape=(input_dim, self.config.hidden_dim),
                    einsum_expr="bi;ih->bh",
                    name=f"MyModel_Layer{layer_id}_Linear1",
                    fusion_id=fusion_id,
                )
            )
            fusion_id += 1
            ops.append(
                ops_lib.create_einsum_op(
                    input_a_shape=(batch_dim, self.config.hidden_dim),
                    input_b_shape=(self.config.hidden_dim, output_dim),
                    einsum_expr="bh;ho->bo",
                    name=f"MyModel_Layer{layer_id}_Linear2",
                    fusion_id=fusion_id,
                )
            )
            fusion_id += 1

        # invoke NeuSim backend to fill out the performance stats for each operator
        ops = analysis_lib.fill_operators_execution_info(ops, self.config)

        if dump_to_file:
            self.dump_to_file(ops)

        return ops

    def compute_memory_footprint_bytes(self) -> int:
        """
        Compute the memory footprint of the MyModel ops in bytes.
        """
        return 1024  # just a placeholder value

    def dump_to_file(self, ops: list[Operator.Operator]):
        logging.info(
            "Generating MyModel ops and dumping to %s.",
            os.path.abspath(self.config.output_file_path),
        )
        ops_dict = [Operator.to_csv_dict(op) for op in ops]
        # newline="" lets the csv module control line endings (recommended by the csv docs)
        with open(self.config.output_file_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=ops_dict[0].keys())
            writer.writeheader()
            writer.writerows(ops_dict)
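The `compute_memory_footprint_bytes()` method above returns a placeholder. A more realistic estimate for this particular model could look like the sketch below. It is written as a standalone function so it can run without NeuSim. It assumes 2-byte elements (for example bfloat16), a single chip with no model sharding, and that only one layer's activations are live at a time. All of these are assumptions made for illustration, not NeuSim behavior.

```python
def estimate_footprint_bytes(
    num_layers: int,
    input_dim: int,
    hidden_dim: int,
    batch_size: int,
    bytes_per_element: int = 2,  # assumption: 16-bit weights and activations
) -> int:
    """Rough memory footprint of the repeated-linear-layer model in bytes."""
    # Each layer holds two weight matrices:
    # (input_dim, hidden_dim) and (hidden_dim, input_dim).
    weight_elements = num_layers * 2 * input_dim * hidden_dim

    # Assumption: at most one (batch, input_dim) buffer and one
    # (batch, hidden_dim) buffer are live at the same time.
    activation_elements = batch_size * (input_dim + hidden_dim)

    return (weight_elements + activation_elements) * bytes_per_element


footprint = estimate_footprint_bytes(
    num_layers=8, input_dim=1024, hidden_dim=8192, batch_size=16
)
print(f"{footprint} bytes ({footprint / 2**20:.1f} MiB)")
```

Inside `MyModelOpsGenerator`, the same arithmetic could read its inputs from `self.config` instead of function arguments.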

### Running the Simulation with MyModel

Now, we can launch a simulation with the newly defined MyModel architecture.
First, we create a simulation configuration by reading predefined ChipConfigs and SystemConfigs.
Then, we create a `model_cfg` dictionary to define a MyModel DNN model.
We merge them into a single configuration dictionary and create a `MyModelConfig` object from it.
Finally, we create a `MyModelOpsGenerator` object with the configuration and generate all tensor operators in the model.

In [25]:
import json
from pathlib import Path

neusim_root = Path("./").resolve().parent.parent
print(f"NeuSim root directory: {neusim_root}")

# Load NPU configuration from JSON file
with open(neusim_root / "configs/chips/tpuv5p.json") as f:
    npu_cfg = json.load(f)

# load system configuration from JSON file
with open(neusim_root / "configs/systems/system_config.json") as f:
    system_cfg = json.load(f)

model_cfg = {
    "num_layers": 8,
    "hidden_dim": 8192,
    "input_dim": 1024,
    "global_batch_size": 16,
}

# Merge all configurations into a single dictionary
config_dict = { **system_cfg, **npu_cfg, **model_cfg }

# If you want to override some parameters, you can do so by directly modifying the dictionary.
# For example, to specify the output file path:
config_dict["output_file_path"] = str(neusim_root / "results/single_model_run/my_model-inference-v5p.csv")

# create output directory if it does not exist
output_dir = Path(config_dict["output_file_path"]).parent
output_dir.mkdir(parents=True, exist_ok=True)

# Create a MyModelConfig object from the dictionary
# This step is optional as our ops generator class can accept a Python dictionary directly
# and automatically convert it to a `MyModelConfig` object internally.
config: MyModelConfig = MyModelConfig.model_validate(config_dict)

NeuSim root directory: /mnt/nvme0n1p1/yuqixue2/neusim/NeuSim
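The merge in the cell above relies on Python's dictionary unpacking, where a key that appears in more than one dictionary takes the value from the last one. The sketch below illustrates that precedence with toy dictionaries (the keys and values are illustrative, not the contents of the real JSON files) and adds a hypothetical helper, `merge_configs`, that reports which keys were overridden.

```python
from typing import Any


def merge_configs(
    *cfgs: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, tuple[Any, Any]]]:
    """Merge dicts left to right; later dicts win. Also report overridden keys."""
    merged: dict[str, Any] = {}
    overridden: dict[str, tuple[Any, Any]] = {}
    for cfg in cfgs:
        for key, value in cfg.items():
            if key in merged and merged[key] != value:
                # Record (old value, new value) for every conflicting key.
                overridden[key] = (merged[key], value)
            merged[key] = value
    return merged, overridden


# Toy stand-ins for the system, chip, and model configuration dictionaries.
system_cfg = {"PUE": 1.1, "name": "default_system"}
npu_cfg = {"name": "5p", "num_sa": 8}
model_cfg = {"num_layers": 8, "global_batch_size": 16}

merged, overridden = merge_configs(system_cfg, npu_cfg, model_cfg)

# Same result as plain unpacking: the chip's "name" replaces the system's.
assert merged == {**system_cfg, **npu_cfg, **model_cfg}
print(merged["name"])  # 5p
print(overridden)  # {'name': ('default_system', '5p')}
```

Listing `model_cfg` last therefore gives model-level settings the highest priority when keys overlap.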


In [26]:
config

MyModelConfig(PUE=1.1, carbon_intensity_kgCO2_per_kWh=0.5, name='5p', num_sa=8, num_vu=6, num_vu_ports=6, hbm_bw_GBps=2765.0, hbm_latency_ns=500, vmem_size_MB=128, freq_GHz=1.7, sa_dim=128, hbm_size_GB=95, ici_bw_GBps=200.0, dcn_bw_GBps=25.0, pcie_bw_GBps=32, ici_latency_ns=3330, dcn_latency_ns=3700, pcie_latency_ns=400, TDP_W=350.0, min_power_W=1.0, avg_power_W=1.0, max_power_W=331.0, HBM_GBps_per_W=123.5, ICI_GBps_per_W=56.583, ICI_topology='TORUS_3D', embodied_carbon_kgCO2=585.0, use_vu_for_small_matmul=True, static_power_W_per_sa=1.35868996, static_power_W_per_vu=0.475076728, static_power_vmem_W=24.21353615, static_power_ici_W=6.114104803, static_power_hbm_mc_W=10.264041296, static_power_hbm_phy_W=15.396061944, static_power_other_W=44.82811018, dynamic_power_W_per_SA=28.19413333, dynamic_power_W_per_VU=2.65216, dynamic_power_vmem_W=50.18368, dynamic_power_ici_W_per_GBps=0.01767315271, dynamic_power_hbm_W_per_GBps=0.01261538462, dynamic_power_other_W=0.0, pg_config='NoPG', enable_dv

In [27]:
# Create an instance of the ops generator
ops_generator = MyModelOpsGenerator(config)

# Run the simulation
ops = ops_generator.generate(dump_to_file=True)

Calculated vmem_time_ns: 1186.9417461320468 for op: einsumMyModelLayer0Linear1
Op: einsumMyModelLayer0Linear1, mxu_time: 4838, vpu_time: 803, compute_time: 4838, memory_time: 5751, ici_time: 0, vmem_time: 1186.9417461320468, exe_time: 5751
Calculated vmem_time_ns: 1186.9417461320468 for op: einsumMyModelLayer0Linear2
Op: einsumMyModelLayer0Linear2, mxu_time: 4838, vpu_time: 803, compute_time: 4838, memory_time: 5751, ici_time: 0, vmem_time: 1186.9417461320468, exe_time: 5751
Calculated vmem_time_ns: 1186.9417461320468 for op: einsumMyModelLayer1Linear1
Op: einsumMyModelLayer1Linear1, mxu_time: 4838, vpu_time: 803, compute_time: 4838, memory_time: 5751, ici_time: 0, vmem_time: 1186.9417461320468, exe_time: 5751
Calculated vmem_time_ns: 1186.9417461320468 for op: einsumMyModelLayer1Linear2
Op: einsumMyModelLayer1Linear2, mxu_time: 4838, vpu_time: 803, compute_time: 4838, memory_time: 5751, ici_time: 0, vmem_time: 1186.9417461320468, exe_time: 5751
Calculated vmem_time_ns: 1186.9417461320

In [28]:
ops

[EinsumOperator(stats=EinsumStatistics(count=1, bounded_by='Memory', execution_time_ns=5751, memory_time_ns=5751, ici_time_ns=0, sa_time_ns=4838, vu_time_ns=803, vmem_time_ns=1186, transpose_time_ns=0, permute_time_ns=0, memory_traffic_bytes=17072128, ici_traffic_outbound_bytes=0, ici_traffic_inbound_bytes=0, dcn_time_ns=0, pcie_time_ns=0, flop_count=268435456, weight_size_bytes=16777216, parsed_op_type='Einsum', static_energy_sa_J=0.0005814940249272558, static_energy_vu_J=0.00011436975053280003, static_energy_sram_J=0.0001609850247383237, static_energy_hbm_J=0.0001639680597036, static_energy_ici_J=5.248092048067613e-05, static_energy_other_J=0.00025780646164518, dynamic_energy_sa_J=0.0012638377030881533, dynamic_energy_vu_J=8.914958288372094e-05, dynamic_energy_sram_J=6.880675662427746e-05, dynamic_energy_hbm_J=0.00022289303085077698, dynamic_energy_ici_J=0.0, dynamic_energy_other_J=0.0, num_setpm_sa=0, num_setpm_vu=0, num_setpm_sram=0, num_setpm_hbm=0, num_setpm_ici=0, activity_facto

Because `dump_to_file=True`, the simulation results are also written to `results/single_model_run/my_model-inference-v5p.csv`.
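Several of the reported statistics can be cross-checked by hand. The sketch below recomputes `flop_count`, `weight_size_bytes`, and `memory_traffic_bytes` for one einsum in this model, assuming 2-byte elements and that each operand and the output cross HBM exactly once. Both are assumptions for illustration rather than a description of NeuSim's internal model, although they reproduce the values shown in the output above.

```python
def einsum_cost(
    batch: int, in_dim: int, out_dim: int, bytes_per_element: int = 2
) -> dict[str, int]:
    """Back-of-the-envelope cost of a (batch, in_dim) x (in_dim, out_dim) matmul."""
    # One multiply and one add per (batch, in_dim, out_dim) triple.
    flop_count = 2 * batch * in_dim * out_dim
    weight_size_bytes = in_dim * out_dim * bytes_per_element
    # Input and output activations, each read or written once (assumption).
    activation_bytes = (batch * in_dim + batch * out_dim) * bytes_per_element
    return {
        "flop_count": flop_count,
        "weight_size_bytes": weight_size_bytes,
        "memory_traffic_bytes": weight_size_bytes + activation_bytes,
    }


# First linear projection of a layer: batch 16, 1024 -> 8192.
cost = einsum_cost(batch=16, in_dim=1024, out_dim=8192)
print(cost)
# {'flop_count': 268435456, 'weight_size_bytes': 16777216,
#  'memory_traffic_bytes': 17072128}

# The log above is consistent with the execution time being the larger of
# the compute and memory times, which is why the op is memory bound.
compute_time_ns, memory_time_ns = 4838, 5751
print(max(compute_time_ns, memory_time_ns))  # 5751
```

With a batch size of only 16, the weight matrix dominates the traffic, so the operator is reported as `bounded_by='Memory'`.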

### Power Simulation

In [None]:
import csv
from neusim.npusim.frontend import power_analysis_lib as power_lib

# This can be either a power_lib.PowerGatingConfig or a string representing a pre-defined power gating strategy. See the get_power_gating_config() function in power_analysis_lib.py for more details.
power_gating_strategy = "NoPG"  # NoPG stands for no power gating.

# For LLM workloads, this can also be done separately for prefill_ops and decode_ops.
for op in ops:
    power_lib.analyze_operator_energy(
        op, config, pg_config=power_gating_strategy
    )

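# Convert the Operator objects to dictionaries for CSV writing.
# Operator.to_csv_dict() is the same module-level helper used in dump_to_file() above;
# a separate variable keeps the original `ops` list intact.
ops_dict = [Operator.to_csv_dict(op) for op in ops]

# Dump the operators to the CSV file.
with open(config.output_file_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=ops_dict[0].keys())
    writer.writeheader()
    writer.writerows(ops_dict)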
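Once the per-operator energy fields are available, they can be aggregated with ordinary Python. The sketch below sums the static and dynamic energy components of a single row. The field names and values are copied from the `EinsumStatistics` output of the first operator shown earlier; whether the CSV columns use exactly these names is an assumption, and `total_energy_joules` is a hypothetical helper, not part of `power_analysis_lib`.

```python
def total_energy_joules(row: dict[str, object]) -> float:
    """Sum all static_energy_*_J and dynamic_energy_*_J fields of one operator."""
    prefixes = ("static_energy_", "dynamic_energy_")
    return sum(
        float(value)  # float() also handles strings read back via csv.DictReader
        for key, value in row.items()
        if key.startswith(prefixes) and key.endswith("_J")
    )


# Energy fields of the first operator, copied from the output above.
first_op = {
    "execution_time_ns": 5751,  # ignored by the helper: not an energy field
    "static_energy_sa_J": 0.0005814940249272558,
    "static_energy_vu_J": 0.00011436975053280003,
    "static_energy_sram_J": 0.0001609850247383237,
    "static_energy_hbm_J": 0.0001639680597036,
    "static_energy_ici_J": 5.248092048067613e-05,
    "static_energy_other_J": 0.00025780646164518,
    "dynamic_energy_sa_J": 0.0012638377030881533,
    "dynamic_energy_vu_J": 8.914958288372094e-05,
    "dynamic_energy_sram_J": 6.880675662427746e-05,
    "dynamic_energy_hbm_J": 0.00022289303085077698,
    "dynamic_energy_ici_J": 0.0,
    "dynamic_energy_other_J": 0.0,
}

total = total_energy_joules(first_op)
print(f"Total energy of the first operator: {total:.4e} J")
```

Applying the same helper to every row and summing the results gives a whole-model energy estimate, which makes it easy to compare power gating strategies.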