This repository contains the complete FPGA prototype implementation and analytical models for the paper:
"The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference"
The Immutable Tensor Architecture (ITA) treats neural network weights as physical circuit topology rather than data, eliminating the memory hierarchy entirely. This prototype demonstrates the concept on a Xilinx Zynq-7020 FPGA.
-
FPGA Board: Digilent Zybo Z7-20 (xc7z020clg400-1)
- Available for ~$200 USD
- Xilinx Zynq-7020 SoC
- 53,200 LUTs, 13,300 CARRY4 cells
-
Development Tools:
- Vivado Design Suite 2020.2 or later (WebPACK edition is free)
- Python 3.7+ with numpy, matplotlib
public_release/
├── rtl/ # SystemVerilog RTL files
│ ├── ita_top.sv # Top-level module
│ ├── expand_engine.sv # 64→128 layer (baseline)
│ ├── contract_engine.sv # 128→64 layer (hardwired)
│ └── uart_tx.sv # UART transmitter for output
├── scripts/ # Build and verification scripts
│ ├── build_vivado.tcl # Baseline build script
│ ├── build_hardwired.tcl # Hardwired build script
│ ├── gen_weights.py # Weight generation
│ └── verify_output.py # Output verification
├── sim/ # Analytical simulation models
│ ├── simulation.py # Gate count analysis
│ ├── energy.py # Energy efficiency model
│ └── economic.py # Security cost analysis
├── constraints/ # FPGA constraints
│ └── zybo_z7_20.xdc # Pin assignments and timing
└── docs/ # Documentation
├── instructions.md # Build instructions
└── evaluation_results.md # Detailed results
cd public_release
vivado -mode batch -source scripts/build_vivado.tclThis builds the conventional BRAM-based implementation (fits on device).
vivado -mode batch -source scripts/build_hardwired.tclThis builds the constant-coefficient version (exceeds device capacity, synthesis-only).
# Gate count analysis
python3 sim/simulation.py
# Energy efficiency model
python3 sim/energy.py
# Economic security analysis
python3 sim/economic.py- Baseline: 11,309 LUTs (21% utilization) - Successfully implemented
- Hardwired: 170,502 LUTs (321% utilization) - Synthesis-only proof-of-concept
- LUT Reduction: 1.81× per MAC unit (measured)
- Gate Count Reduction: 4.85× (theoretical) / 1.62× (system-level conservative)
- Energy Efficiency: 50× (device) / 10-15× (system-level including host CPU)
- Die Area (28nm): 520 mm² (TinyLlama-1.1B), 3,680 mm² (Llama-2-7B)
- Manufacturing Cost: $52-165 at 100K+ volume (higher at low volume due to NRE)
All results in the paper can be reproduced with:
- A $200 FPGA development board
- Free Vivado WebPACK software
- Python with standard scientific libraries
Total cost to reproduce: ~$200
If you use this work, please cite:
@inproceedings{ita2025,
title={The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference},
author={[Author Name]},
booktitle={[Conference Name]},
year={2025}
}This work is released under the MIT License. See LICENSE file for details.
This work was conducted without external funding at a teaching-focused institution. The FPGA prototype was implemented on a development board purchased by the author. We are grateful to the open-source community for tools that enable hardware architecture research without institutional resources.
For questions or collaboration opportunities, please contact: [your email]
We welcome contributions in the following areas:
- ASIC tape-out on 28nm/40nm nodes
- On-device attention acceleration
- Hybrid architectures (partial hardwiring)
- Alternative quantization schemes
- Security countermeasures for side-channel attacks
Note: This is a research prototype demonstrating architectural concepts. The hardwired version exceeds the FPGA capacity by design, validating that custom ASICs are required for practical deployment.