mmt-at/LEGO

LEGO: An LLM-Enabled Hierarchical Optimizer for Tensor Computation Graphs

Artifact for reproducing Table 1 of the paper.

Structure

workloads/
  resnet/          ResNet-18 (CNN)
  qwen/            Qwen3-8B (LLM Decoder)
  sd3_mmdit/       SD3-MMDiT (Diffusion Transformer)
  mamba/           Mamba-2 SSD (State Space Model)
  ds_mhc_moe/      mHC-MoE (Mixture of Experts)

Each workload contains:
  model_ref.py     PyTorch Eager baseline
  model_new.py     LEGO-optimized (Triton kernels + system-level opts)

ab.py              A/B benchmark tool (eager + torch.compile modes)
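
Before benchmarking, it can help to confirm that a checkout matches the layout above. The following is a minimal sketch, not part of the artifact: the helper name `missing_files` is hypothetical, and the only assumption is that `ab.py` and `workloads/` sit at the repository root, as the listing suggests.

```python
from pathlib import Path

# Workload directory names and file names are taken from the listing above.
WORKLOADS = ["resnet", "qwen", "sd3_mmdit", "mamba", "ds_mhc_moe"]
REQUIRED_FILES = ["model_ref.py", "model_new.py"]


def missing_files(repo_root="."):
    """Return the expected paths that do not exist under repo_root.

    Hypothetical helper for illustration; it is not shipped with the artifact.
    """
    root = Path(repo_root)
    expected = [root / "ab.py"]
    for name in WORKLOADS:
        for filename in REQUIRED_FILES:
            expected.append(root / "workloads" / name / filename)
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("layout looks complete")
```

Run it from the repository root; any path it prints is a file the benchmark commands would fail to find.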

Quick Start

Requirements

  • GPU: NVIDIA RTX 6000 Ada (48GB) or comparable
  • CUDA 12.9+, PyTorch 2.9+, Triton (bundled with PyTorch)
  • einops (install with: pip install einops)
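
To see which of the Python packages above are present in the active environment, a small standard-library check along these lines may be useful. It is a sketch under assumptions: the helper `installed_versions` is hypothetical, and the distribution names `torch`, `triton`, and `einops` are the usual PyPI names. Some PyTorch builds ship Triton under a different distribution name, in which case it would be reported as absent here. The script only reports versions; it does not compare them against the minimums listed above, and it does not inspect the GPU or CUDA toolkit.

```python
from importlib import metadata

# Assumed PyPI distribution names; "torch" is the PyTorch package.
PACKAGES = ["torch", "triton", "einops"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version string, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```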

Benchmark a single workload

python ab.py \
  --ref workloads/qwen/model_ref.py \
  --test workloads/qwen/model_new.py \
  --bench-runs 30 --compile-run-times 1

Benchmark all workloads

for w in resnet qwen sd3_mmdit mamba ds_mhc_moe; do
  echo "=== $w ==="
  python ab.py \
    --ref workloads/$w/model_ref.py \
    --test workloads/$w/model_new.py \
    --bench-runs 30 --compile-run-times 1
done
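
The same loop can be driven from Python, which is convenient on systems without a POSIX shell or when exit codes need to be collected. This is a minimal sketch rather than part of the artifact: `build_command` and `run_all` are hypothetical helpers, and the flags are copied verbatim from the commands above. By default it only prints the commands; pass `--run` to execute them.

```python
import shlex
import subprocess
import sys

# Workload names as used in the shell loop above.
WORKLOADS = ["resnet", "qwen", "sd3_mmdit", "mamba", "ds_mhc_moe"]


def build_command(workload, bench_runs=30, compile_run_times=1):
    """Build the ab.py argument list for one workload, mirroring the shell loop."""
    return [
        sys.executable, "ab.py",
        "--ref", f"workloads/{workload}/model_ref.py",
        "--test", f"workloads/{workload}/model_new.py",
        "--bench-runs", str(bench_runs),
        "--compile-run-times", str(compile_run_times),
    ]


def run_all(workloads=WORKLOADS):
    """Run every workload in turn and return {workload: exit code}.

    Like the shell loop, this continues after a failing workload.
    """
    results = {}
    for name in workloads:
        print(f"=== {name} ===", flush=True)
        results[name] = subprocess.run(build_command(name)).returncode
    return results


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        # Exit non-zero if any workload returned a non-zero code.
        sys.exit(1 if any(run_all().values()) else 0)
    # Dry run: print the commands that would be executed.
    for name in WORKLOADS:
        print(shlex.join(build_command(name)))
```

Passing the arguments as a list avoids shell quoting issues, and `sys.executable` keeps the benchmark in the same interpreter (and therefore the same PyTorch install) as the driver.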

License

Apache 2.0
