# DaCe IR Deep Dive: LibraryNodes and Operator Abstractions

This notebook demonstrates how DaCe's Intermediate Representation (IR) handles operator abstractions through LibraryNodes. We'll explore:

1. **LibraryNodes**: High-level operator abstractions in the IR
2. **Semantic Contracts**: How `@dace.library.node` defines compile-time "facts"
3. **Implementation Registry**: How concrete implementations are attached to abstract operators
4. **Expansion System**: How abstract operators become concrete code

## What are LibraryNodes?

LibraryNodes are DaCe's way of representing high-level operations (like GEMM, FFT, reductions) as single nodes in the SDFG. They serve as:
- **Abstractions**: Hide implementation complexity behind semantic interfaces
- **Contracts**: Define what the operation does, not how it's implemented
- **Extension Points**: Allow multiple implementations for different targets (CPU, GPU, FPGA)
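
The three roles above can be illustrated with a toy model in plain Python. This is only a sketch of the idea: `ToyLibraryNode`, `ToyMatMul`, `register`, and `expand` are hypothetical names invented for this illustration and are not DaCe's API. The point is that one abstract operator carries a registry of interchangeable implementations, and "expansion" picks one of them.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


class ToyLibraryNode:
    """Hypothetical stand-in for an abstract operator (not DaCe's LibraryNode class)."""

    # Each subclass keeps its own registry: implementation name -> callable
    implementations: Dict[str, Callable] = {}
    default_implementation = "pure"

    def __init__(self, name, implementation=None):
        self.name = name
        self.implementation = implementation or self.default_implementation

    @classmethod
    def register(cls, key):
        """Decorator that attaches a concrete implementation under `key`."""
        def decorator(fn):
            cls.implementations[key] = fn
            return fn
        return decorator

    def expand(self, *args):
        """Run the selected concrete implementation in place of the abstract operation."""
        if self.implementation not in self.implementations:
            raise ValueError(f"{self.name}: no implementation named {self.implementation!r}")
        return self.implementations[self.implementation](*args)


class ToyMatMul(ToyLibraryNode):
    implementations = {}  # separate registry for this operator


@ToyMatMul.register("pure")
def matmul_loops(a: Matrix, b: Matrix) -> Matrix:
    # Reference triple loop: the contract spelled out directly
    out = [[0] * len(b[0]) for _ in a]
    for i, row in enumerate(a):
        for j in range(len(b[0])):
            for k, value in enumerate(row):
                out[i][j] += value * b[k][j]
    return out


@ToyMatMul.register("zipped")
def matmul_zipped(a: Matrix, b: Matrix) -> Matrix:
    # Same contract, different implementation: iterate over the columns of b via zip
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a, b = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
print(ToyMatMul("mm").expand(a, b))
print(ToyMatMul("mm", implementation="zipped").expand(a, b))
```

Both calls compute the same product; only the registered implementation behind the node differs, which is the property the rest of this notebook relies on.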

In [14]:
import ctypes
import ctypes.util
import os
import sys

import numpy as np

import dace
import dace.library
import dace.properties
import dace.sdfg.nodes
from dace import SDFG, SDFGState, symbolic, dtypes
from dace.config import Config
from dace.transformation.transformation import ExpandTransformation

print('Python:', sys.version.split()[0])
cp = os.environ.get('CONDA_PREFIX')
print(' CONDA prefix:', cp)
# CONDA_PREFIX is unset outside a conda environment; fall back to sys.prefix
lib = os.path.join(cp or sys.prefix, 'lib')

print('\nctypes:', ctypes.__version__)
print('ctypes.find_library...')
for name in ['blas','cblas','lapacke','lapack','openblas','mkl_rt']:
    print(f'  {name}:', ctypes.util.find_library(name))



print(f"\nDaCe version: {dace.__version__}")

current_conf = Config._config



Python: 3.12.11
 CONDA prefix: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev

ctypes: 1.1.0
ctypes.find_library...
  blas: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/../lib/libblas.dylib
  cblas: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/../lib/libcblas.dylib
  lapacke: None
  lapack: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/../lib/liblapack.dylib
  openblas: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/../lib/libopenblas.dylib
  mkl_rt: None

DaCe version: 1.0.0
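
The environment probe above can be wrapped into two small helpers so that it also works outside a conda environment. This is a sketch using only the standard library; `env_lib_dir` and `probe_libraries` are hypothetical helper names, not part of DaCe.

```python
import ctypes.util
import os
import sys


def env_lib_dir():
    """Return '<prefix>/lib', using CONDA_PREFIX when set and sys.prefix otherwise."""
    prefix = os.environ.get("CONDA_PREFIX") or sys.prefix
    return os.path.join(prefix, "lib")


def probe_libraries(names):
    """Map each library name to whatever find_library reports, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in names}


found = probe_libraries(["blas", "cblas", "lapack", "openblas"])
print("lib dir:", env_lib_dir())
for name, path in found.items():
    print(f"  {name}: {path if path else 'not found'}")
```

What `find_library` returns is platform dependent (a full path on some systems, a bare file name on others), so the result is best treated as a hint about what the linker may find.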


In [17]:
current_conf

{'debugprint': 'verbose',
 'frontend': {'cache_size': 0,
  'implicit_recursion_depth': 64,
  'raise_nested_parsing_errors': False,
  'verbose_errors': False,
  'preprocessing_passes': 5,
  'dont_fuse_callbacks': False,
  'typed_callbacks_only': False,
  'unroll_threshold': -1,
  'check_args': False,
  'avoid_wcr': False},
 'compiler': {'build_type': 'Debug',
  'codegen_lineinfo': True,
  'extra_cmake_args': '-DBLA_VENDOR=OpenBLAS -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_INCLUDE_PATH=${CONDA_PREFIX}/include -DCMAKE_LIBRARY_PATH=${CONDA_PREFIX}/lib',
  'cpu': {'executable': '',
   'args': '-std=c++14 -fPIC -Wall -Wextra -O3 -march=native -fno-strict-aliasing -Wno-unused-parameter -Wno-unused-label -I/opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/include',
   'openmp_sections': False,
   'libs': ''},
  'linker': {'args': '', 'executable': ''},
  'cuda': {'default_block_size': '64,8,1',
   'backend': 'auto',
   'path': '',
   'args': '-Xcompiler -march=native --use_fast_math -Xcompiler -Wn

In [16]:
Config

dace.config.Config

In [15]:
print('Listing known BLAS/LAPACK libs under', lib)
import glob
for pat in ['libblas*','libcblas*','liblapack*','liblapacke*','libopenblas*','libmkl*']:
    matches = sorted(glob.glob(os.path.join(lib, pat)))
    print(f'  {pat}:', len(matches))
    for m in matches[:5]:
        print('   -', os.path.basename(m))
    if len(matches) > 5:
        print('   ...')

Listing known BLAS/LAPACK libs under /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/lib
  libblas*: 2
   - libblas.3.dylib
   - libblas.dylib
  libcblas*: 2
   - libcblas.3.dylib
   - libcblas.dylib
  liblapack*: 2
   - liblapack.3.dylib
   - liblapack.dylib
  liblapacke*: 0
  libopenblas*: 7
   - libopenblas.0.dylib
   - libopenblas.a
   - libopenblas.dylib
   - libopenblas_armv8p-r0.3.30.dylib
   - libopenblas_vortexp-r0.3.30.a
   ...
  libmkl*: 0



In [None]:

print(f"LibraryNode base class: {dace.sdfg.nodes.LibraryNode}")

In [2]:
current_conf.keys() 


dict_keys(['compiler', 'library', 'debugprint', 'optimizer', 'instrumentation', 'frontend', 'progress', 'cache', 'store_history', 'default_build_folder', 'profiling', 'profiling_status', 'treps', 'call_hooks', 'compiled_sdfg_call_hooks', 'external_transformations_path', 'experimental', 'testing'])
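
The top-level keys listed above are the entry points of a nested dictionary, and calls such as `Config.get('compiler', 'cuda', 'backend')` name a path through it. The following is a toy model of that lookup on a plain dict, assuming a simple key-by-key walk; `get_path` and `sample` are hypothetical and this is not DaCe's implementation.

```python
def get_path(config, *path):
    """Walk a nested dict along `path`, raising KeyError with the failing prefix."""
    node = config
    for depth, key in enumerate(path, start=1):
        if not isinstance(node, dict) or key not in node:
            raise KeyError("/".join(path[:depth]))
        node = node[key]
    return node


# Small excerpt shaped like the configuration printed earlier (values copied from it)
sample = {
    "compiler": {
        "build_type": "Debug",
        "cuda": {"default_block_size": "64,8,1", "backend": "auto"},
    },
    "frontend": {"verbose_errors": False},
}

print(get_path(sample, "compiler", "cuda", "backend"))
print(get_path(sample, "frontend", "verbose_errors"))
```

Reporting the failing prefix in the `KeyError` makes a mistyped path easier to spot than a bare missing-key message.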

In [3]:
Config._config_metadata

{'type': 'dict',
 'title': 'General',
 'description': 'DaCe Preferences',
 'required': {'optimizer': {'type': 'dict',
   'title': 'Optimizer',
   'description': 'Preferences of the SDFG Optimizer',
   'required': {'autospecialize': {'type': 'bool',
     'default': False,
     'title': 'Auto-specialize symbols',
     'description': 'Automatically specialize every SDFG to the symbol values at call-time. Requires all symbols to be set.\n'},
    'autooptimize': {'type': 'bool',
     'default': False,
     'title': 'Run auto-optimization heuristics',
     'description': 'Automatically runs the set of optimizing transformation heuristics on any program called via the Python frontend.\n'},
    'autotile_size': {'type': 'int',
     'default': 128,
     'title': 'Default tile size in auto-optimization',
     'description': 'Sets the default tile size for the optimization heuristics.\n'},
    'autotile_partial_parallelism': {'type': 'bool',
     'default': True,
     'title': 'Prefer partial par

In [4]:

# Print compiler configuration
print("\n=== Compiler Configuration ===")
print(f"Build type: {Config.get('compiler', 'build_type')}")
print(f"CPU compiler: {Config.get('compiler', 'cpu', 'executable')}")
print(f"Default compiler flags: {Config.get('compiler', 'cpu', 'args')}")

# Print CUDA/GPU configuration if available
if Config.get('compiler', 'cuda', 'backend') is not None:
    print("\n=== GPU Configuration ===")
    print(f"GPU backend: {Config.get('compiler', 'cuda', 'backend')}")
    # print(f"CUDA compiler: {Config.get('compiler', 'cuda', 'executable')}")
    # print(f"Default CUDA flags: {Config.get('compiler', 'cuda', 'args')}")
    default_block_sizes = [
        int(b) for b in Config.get('compiler', 'cuda', 'default_block_size').split(',')
    ]

    default_block_ndim = max(1, sum(1 if b != 1 else 0 for b in default_block_sizes))
    print(f"Default block sizes: {default_block_sizes} -> block_ndim = {default_block_ndim}")
    max_streams = int(Config.get('compiler', 'cuda', 'max_concurrent_streams'))
    print(f"Max concurrent streams: {max_streams}")


=== Compiler Configuration ===
Build type: RelWithDebInfo
CPU compiler: /opt/homebrew/opt/llvm/bin/clang++
Default compiler flags: -std=c++14 -fPIC -Wall -Wextra -O3 -march=native -ffast-math -Wno-unused-parameter -Wno-unused-label

=== GPU Configuration ===
GPU backend: auto
Default block sizes: [64, 8, 1] -> block_ndim = 2
Max concurrent streams: 0
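
The block-dimension count printed above comes from a small piece of arithmetic: count the entries that differ from 1, but report at least one dimension. A standalone sketch of that rule follows; the helper names `parse_block_size` and `block_ndim` are made up for this example.

```python
def parse_block_size(text):
    """Turn a comma-separated string such as '64,8,1' into a list of ints."""
    return [int(part) for part in text.split(",")]


def block_ndim(sizes):
    """Number of dimensions whose extent is not 1, but never less than 1."""
    return max(1, sum(1 for s in sizes if s != 1))


for text in ("64,8,1", "128,1,1", "1,1,1"):
    sizes = parse_block_size(text)
    print(f"{text!r:>10} -> sizes={sizes}, block_ndim={block_ndim(sizes)}")
```

The `max(1, ...)` guard matters for the degenerate `1,1,1` case, which would otherwise report zero dimensions.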


In [5]:
# Parse the comma-separated block size used for dynamically scheduled GPU maps
block_size = [int(b) for b in Config.get('compiler', 'cuda', 'dynamic_map_block_size').split(',')]
print(f"block_size: {block_size}")
# Count the dimensions whose extent differs from 1 (always at least one)
block_ndim = max(1, sum(1 if b != 1 else 0 for b in block_size))
print(f"block_ndim: {block_ndim}")

block_size: [128, 1, 1]
block_ndim: 1


In [6]:

print('Synchronous debugging enabled:', Config.get_bool('compiler', 'cuda', 'syncdebug'))

Config.set('frontend', 'unroll_threshold', value=11)
Config.set('debugprint', value=True)

Synchronous debugging enabled: False


In [7]:
# Print debugging configuration
print("\n=== Debug Configuration ===")
print(f"Debug print enabled: {dace.config.Config.get_bool('debugprint')}")
print(f"Cache enabled: {dace.config.Config.get_bool('compiler', 'use_cache')}")
print(f"Frontend verbose errors: {dace.config.Config.get_bool('frontend', 'verbose_errors')}")



=== Debug Configuration ===
Debug print enabled: True
Cache enabled: False
Frontend verbose errors: False


In [8]:
config_data_types = Config.get('compiler', 'default_data_types')
print(f" {config_data_types.lower()}")

 python


In [9]:
# Examples: canonicalization via create_datadescriptor
import dace
import numpy as np
from dace import data, dtypes

def h(title):
    print("\n" + "=" * 72)
    print(f" {title}")
    print("=" * 72)

def kv(name, val):
    print(f" - {name:<22}: {val}")

def show_typeclass(tc, label=None):
    label = label or str(tc)
    try:
        npdtype = tc.as_numpy_dtype()
    except Exception:
        npdtype = "-"
    try:
        ctypes_cls = tc.as_ctypes()  # named to avoid shadowing the ctypes module
    except Exception:
        ctypes_cls = "-"
    print(f"{label:<22}  numpy: {npdtype!s:<12}  ctypes: {ctypes_cls!s}")

def show_interfaces(x):
    print(f"\nPackage: {type(x).__module__}.{type(x).__name__}")
    print(f"dace is_array: {dace.dtypes.is_array(x)}")
    print("has __array_interface__:", hasattr(x, "__array_interface__"))
    print("has __cuda_array_interface__:", hasattr(x, "__cuda_array_interface__"))
    print("has __dlpack__:", hasattr(x, "__dlpack__") or hasattr(x, "__torch_dlpack__"))
def show_desc(obj):
    print(f"\nPackage: {type(obj).__module__}.{type(obj).__name__}, is_array={dace.dtypes.is_array(obj)}")
    try:
        desc = dace.data.create_datadescriptor(obj)
    except Exception as e:
        print(f"Could not create descriptor for {type(obj).__name__}: {e}")
        return
    
    # print(f"Object: {type(obj).__name__}, is_array={dace.dtypes.is_array(obj)}")
    print(f"  Descriptor type: {type(desc).__name__}, dtype={desc.dtype}, shape={getattr(desc,'shape',None)}")


def has_cpu_array_iface(x):  # zero-copy on CPU?
    return hasattr(x, "__array_interface__")

def has_cuda_array_iface(x):  # zero-copy on GPU?
    return hasattr(x, "__cuda_array_interface__")
def dtype_info(obj):
 
    print(f"\n{obj.to_string()}")
    print(f" - type: {obj.type}")
    print(f" - dtype: {obj.dtype}")
    print(f" - ctype: {obj.ctype}")
    print(f" - ctype unaligned: {obj.ctype_unaligned}")
    print(f" - ocl type: {obj.ocltype}")
    # print(f" - numpy: {obj.as_numpy_dtype()}")
    print(f" - veclen: {obj.veclen}")
    print(f" - bytes: {obj.bytes}")
    # print(f".- as_arg: {obj.as_arg()}")

In [10]:
class TypeclassAttributeError(AttributeError):
    def __init__(self, obj, attr, original_exc):
        public_attrs = [a for a in dir(obj) if not a.startswith('_')]
        msg = (
            f"Failed to access attribute '{attr}' on object {obj!r}: {original_exc}\n"
            f"Available public attributes ({len(public_attrs)}): {', '.join(public_attrs)}"
        )
        super().__init__(msg)
        self.obj = obj
        self.attr = attr
        self.available = public_attrs
        self.original_exc = original_exc



def collect_typeclass_rows(typeclasses):
    """
    Return list of dicts describing each DaCe typeclass.
    If accessing any requested attribute fails, raise TypeclassAttributeError
    that lists the object's available public attributes.
    """
    rows = []
    for dt in typeclasses:
        # Access each field via a guarded helper so any failure produces rich debug info
        def _safe_get(attr, default=None):
            try:
                return getattr(dt, attr)
            except Exception as e:
                if default is not None:
                    return default
                raise TypeclassAttributeError(dt, attr, e) from e

        row = {'obj': dt}
        # Required attributes: a missing one raises TypeclassAttributeError
        for name in ["type", "dtype", "base_type", "veclen", "bytes",
                     "ctype", "ctype_unaligned"]:
            row[name] = _safe_get(name)
        # Optional attributes fall back to '-'
        row['ocltype'] = _safe_get('ocltype', '-')
        row['vtype'] = _safe_get('vtype', '-')

        try:
            row['ctypes'] = dt.as_ctypes()
        except Exception:
            row['ctypes'] = '-'
        # Derived / method-based
        try:
            row["numpy"] = dt.as_numpy_dtype()
        except Exception:
            row["numpy"] = '-'

        try:
            row["json"] = dt.to_json()
        except Exception as e:
            raise TypeclassAttributeError(dt, "to_json()", e) from e
        rows.append(row)
    return rows

def print_typeclass_rows(rows, *columns):
    """
    Print a table of selected columns (in given order).
    Args:
        rows: list[dict] from collect_typeclass_rows
        *columns: column names; if empty, all columns in row order.
    """
    if not rows:
        print("(no rows)")
        return
    if not columns:
        columns = tuple(rows[0].keys())
    missing = [c for c in columns if c not in rows[0]]
    if missing:
        raise ValueError(f"Unknown column(s): {missing}")
    widths = {c: max(len(c), *(len(str(r[c])) for r in rows)) for c in columns}
    fmt = "  ".join(f"{{:{widths[c]}s}}" for c in columns)
    print(fmt.format(*columns))
    print("  ".join("-" * widths[c] for c in columns))
    for r in rows:
        print(fmt.format(*(str(r[c]) for c in columns)))

In [11]:
def get_typeclass_info_row(tc, print_info=False, columns=("obj", "base_type", "json", "ctypes", "ctype", "numpy", "bytes")):
    """
    Return a dict describing a single DaCe typeclass `tc`.

    Args:
        tc: a DaCe typeclass instance
        print_info (bool): if True, print the row nicely
        columns (list[str] | None): list of column names to print (when print_info=True).
                                    If None, print all available columns in row order.

    Returns:
        dict: mapping column name -> value
    """
    def _safe(attr, default='-'):
        try:
            return getattr(tc, attr)
        except Exception:
            return default

    row = {'obj': tc}
    for name in ["type", "dtype", "base_type", "veclen", "bytes", "ctype", "ctype_unaligned"]:
        row[name] = _safe(name)

    # optional / method-based attributes
    row['ocltype'] = _safe('ocltype', '-')
    row['vtype'] = _safe('vtype', '-')
    
    try:
        row['ctypes'] = tc.as_ctypes()
    except Exception:
        row['ctypes'] = '-'
    try:
        row['numpy'] = tc.as_numpy_dtype()
    except Exception:
        row['numpy'] = '-'
    
    try:
        row['json'] = tc.to_json()
    except Exception:
        row['json'] = '-'
  
    if print_info:
        if columns is None:
            columns_to_print = list(row.keys())
        else:
            # validate requested columns
            missing = [c for c in columns if c not in row]
            if missing:
                raise ValueError(f"Unknown column(s): {missing}")
            columns_to_print = list(columns)

        widths = {c: max(len(str(c)), len(str(row[c]))) for c in columns_to_print}
        fmt = "  ".join(f"{{:{widths[c]}s}}" for c in columns_to_print)
        print(fmt.format(*columns_to_print))
        print("  ".join("-" * widths[c] for c in columns_to_print))
        print(fmt.format(*(str(row[c]) for c in columns_to_print)))

    return row

# Part 1: Primitive types and platform enums (`dace.dtypes`)


### Practical tips

- Use `dace.dtypes` when declaring element types in annotations or when you need a type-level object you can serialize or inspect. Use `dace.data` descriptors when you need to describe memory layout or add arrays to an SDFG.

- To integrate a new external array-like type with DaCe, implement `__descriptor__()` on the wrapper type and return a `dace.data.Array` or `dace.data.Scalar` as appropriate. This is the most robust integration point.

- For runtime testing, `make_array_from_descriptor()` creates a NumPy or CuPy buffer matching a descriptor, which is useful for unit tests that validate the I/O shapes of generated code.


In [12]:
registered_typeclass_objs = dtypes.dtype_to_typeclass()
dtype_keys = list(registered_typeclass_objs.keys())
registered_dace_dtypes = list(registered_typeclass_objs.values())
# print(registered_dace_dtypes)
print(f"default accepted dtype: {dtype_keys}")
dtype_keys

default accepted dtype: [<class 'bool'>, <class 'int'>, <class 'float'>, <class 'complex'>, <class 'numpy.bool_'>, <class 'numpy.int8'>, <class 'numpy.int16'>, <class 'numpy.int32'>, <class 'numpy.int64'>, <class 'numpy.uint8'>, <class 'numpy.uint16'>, <class 'numpy.uint32'>, <class 'numpy.uint64'>, <class 'numpy.float16'>, <class 'numpy.float32'>, <class 'numpy.float64'>, <class 'numpy.complex64'>, <class 'numpy.complex128'>, <class 'numpy.longlong'>, <class 'numpy.ulonglong'>]


[bool,
 int,
 float,
 complex,
 numpy.bool_,
 numpy.int8,
 numpy.int16,
 numpy.int32,
 numpy.int64,
 numpy.uint8,
 numpy.uint16,
 numpy.uint32,
 numpy.uint64,
 numpy.float16,
 numpy.float32,
 numpy.float64,
 numpy.complex64,
 numpy.complex128,
 numpy.longlong,
 numpy.ulonglong]
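
The accepted list above mixes Python built-ins with NumPy scalar types, and several of them describe the same storage format. The sketch below uses NumPy alone to show which entries NumPy itself treats as the same dtype and how wide the 64-bit aliases are. It says nothing about DaCe's lookup; it only gives one plausible reason why different keys could map to the same typeclass.

```python
import numpy as np

# Python built-ins and NumPy scalar types that NumPy normalizes to the same dtype
pairs = [(float, np.float64), (complex, np.complex128), (bool, np.bool_)]
for py_type, np_type in pairs:
    same = np.dtype(py_type) == np.dtype(np_type)
    print(f"{py_type.__name__:>8} vs {np_type.__name__:<12} same dtype: {same}")

# Element width in bytes of the fixed-size and C-named 64-bit integer types
for np_type in (np.int64, np.longlong, np.uint64, np.ulonglong):
    print(f"{np_type.__name__:>10}: {np.dtype(np_type).itemsize} bytes")
```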

In [13]:

for key,val in registered_typeclass_objs.items():
    print(f"{key.__module__}.{key.__name__} ->  {val} (src={val.__module__}.{val})")
    print(f"{key}")

    tc = dace.dtypes.dtype_to_typeclass(key)
    # get_typeclass_info_row(tc, print_info=True)

print_typeclass_rows(collect_typeclass_rows(registered_dace_dtypes), "obj","base_type", "json","ocltype", "ctype","ctypes", "numpy","bytes",)

builtins.bool ->  bool (src=dace.dtypes.bool)
<class 'bool'>
builtins.int ->  int64_t (src=dace.dtypes.int64_t)
<class 'int'>
builtins.float ->  double (src=dace.dtypes.double)
<class 'float'>
builtins.complex ->  dace::complex128 (src=dace.dtypes.dace::complex128)
<class 'complex'>
numpy.bool_ ->  bool (src=dace.dtypes.bool)
<class 'numpy.bool_'>
numpy.int8 ->  char (src=dace.dtypes.char)
<class 'numpy.int8'>
numpy.int16 ->  short (src=dace.dtypes.short)
<class 'numpy.int16'>
numpy.int32 ->  int (src=dace.dtypes.int)
<class 'numpy.int32'>
numpy.int64 ->  int64_t (src=dace.dtypes.int64_t)
<class 'numpy.int64'>
numpy.uint8 ->  uint8_t (src=dace.dtypes.uint8_t)
<class 'numpy.uint8'>
numpy.uint16 ->  uint16_t (src=dace.dtypes.uint16_t)
<class 'numpy.uint16'>
numpy.uint32 ->  dace::uint (src=dace.dtypes.dace::uint)
<class 'numpy.uint32'>
numpy.uint64 ->  uint64_t (src=dace.dtypes.uint64_t)
<class 'numpy.uint64'>
numpy.float16 ->  dace::float16 (src=dace.dtypes.dace::float16)
<class 'numpy.

Python types → DaCe types






In [70]:
for k,v in registered_typeclass_objs.items():
    print(f"{k}: {v}")

<class 'bool'>: bool
<class 'int'>: int64_t
<class 'float'>: double
<class 'complex'>: dace::complex128
<class 'numpy.bool'>: bool
<class 'numpy.int8'>: char
<class 'numpy.int16'>: short
<class 'numpy.int32'>: int
<class 'numpy.int64'>: int64_t
<class 'numpy.uint8'>: uint8_t
<class 'numpy.uint16'>: uint16_t
<class 'numpy.uint32'>: dace::uint
<class 'numpy.uint64'>: uint64_t
<class 'numpy.float16'>: dace::float16
<class 'numpy.float32'>: float
<class 'numpy.float64'>: double
<class 'numpy.complex64'>: dace::complex64
<class 'numpy.complex128'>: dace::complex128
<class 'numpy.longlong'>: int64_t
<class 'numpy.ulonglong'>: uint64_t


In [None]:
def debug_builtin_dtype_info(dt):
    print(f"\n {dt.__module__}.{dt}: {dt} ({type(dt).__name__})")
    try:
        row = get_typeclass_info_row(dt, print_info=True)
    except TypeclassAttributeError as e:
        print(f"Error accessing typeclass attributes: {e}")
        return
    except Exception as e:
        print(f"Unexpected error: {e}")
        return
    return row
builtin_python_scalar_dtypes = [int, float, complex, bool, str, bytes]



h("Built-in types")



 Built-in types


TypeclassAttributeError: Failed to access attribute 'type' on object <class 'int'>: type object 'int' has no attribute 'type'
Available public attributes (11): as_integer_ratio, bit_count, bit_length, conjugate, denominator, from_bytes, imag, is_integer, numerator, real, to_bytes

In [None]:

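# Built-in Python types are not typeclasses themselves, so map the registered ones first;
# types without an entry in dtype_to_typeclass() (here str and bytes) are skipped
_pytypes = collect_typeclass_rows(
    [registered_typeclass_objs[t] for t in builtin_python_scalar_dtypes if t in registered_typeclass_objs])

print_typeclass_rows(_pytypes, "obj", "json", "ocltype", "ctype", "ctypes")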


default register dtype keys:


[bool,
 int,
 float,
 complex,
 numpy.bool,
 numpy.int8,
 numpy.int16,
 numpy.int32,
 numpy.int64,
 numpy.uint8,
 numpy.uint16,
 numpy.uint32,
 numpy.uint64,
 numpy.float16,
 numpy.float32,
 numpy.float64,
 numpy.complex64,
 numpy.complex128,
 numpy.longlong,
 numpy.ulonglong]

Python types → DaCe types

In [61]:
for key,val in registered_typeclass_objs.items():
    print(f"{key.__module__}.{key.__name__} ->  {val} ({val.__module__}.{val})")

    tc = dace.dtypes.dtype_to_typeclass(key)
    # get_typeclass_info_row(tc, print_info=True)

print_typeclass_rows(collect_typeclass_rows(registered_dace_dtypes), "obj","base_type", "json","ocltype", "ctype", "numpy","bytes",)


builtins.bool ->  bool (dace.dtypes.bool)
builtins.int ->  int64_t (dace.dtypes.int64_t)
builtins.float ->  double (dace.dtypes.double)
builtins.complex ->  dace::complex128 (dace.dtypes.dace::complex128)
numpy.bool ->  bool (dace.dtypes.bool)
numpy.int8 ->  char (dace.dtypes.char)
numpy.int16 ->  short (dace.dtypes.short)
numpy.int32 ->  int (dace.dtypes.int)
numpy.int64 ->  int64_t (dace.dtypes.int64_t)
numpy.uint8 ->  uint8_t (dace.dtypes.uint8_t)
numpy.uint16 ->  uint16_t (dace.dtypes.uint16_t)
numpy.uint32 ->  dace::uint (dace.dtypes.dace::uint)
numpy.uint64 ->  uint64_t (dace.dtypes.uint64_t)
numpy.float16 ->  dace::float16 (dace.dtypes.dace::float16)
numpy.float32 ->  float (dace.dtypes.float)
numpy.float64 ->  double (dace.dtypes.double)
numpy.complex64 ->  dace::complex64 (dace.dtypes.dace::complex64)
numpy.complex128 ->  dace::complex128 (dace.dtypes.dace::complex128)
numpy.longlong ->  int64_t (dace.dtypes.int64_t)
numpy.ulonglong ->  uint64_t (dace.dtypes.uint64_t)
obj     

In [31]:

print_typeclass_rows(collect_typeclass_rows(registered_dace_dtypes), "obj","base_type", "json","ocltype", "ctype", "numpy","bytes",)


obj               base_type         json        ocltype         ctype             numpy       bytes
----------------  ----------------  ----------  --------------  ----------------  ----------  -----
bool              bool              bool        bool            bool              bool        1    
int64_t           int64_t           int64       long            int64_t           int64       8    
double            double            float64     double          double            float64     8    
dace::complex128  dace::complex128  complex128  complex double  dace::complex128  complex128  16   
bool              bool              bool        bool            bool              bool        1    
char              char              int8        char            char              int8        1    
short             short             int16       short           short             int16       2    
int               int               int32       int             int               int32       4    


In [71]:

        
# Primitive typeclasses table with selectable columns helper

primitive_types = [
    dtypes.int8,dtypes.int16,dtypes.int32,dtypes.int64, 
    dtypes.uintp, dtypes.uint32,
    dtypes.float16,dtypes.float32, dtypes.float64,
    dtypes.complex64, dtypes.complex128, dtypes.string
]



h("Primitive dtypes")

_all_rows = collect_typeclass_rows(primitive_types)

# Example usages:
# print("Full set (default):")
# print_typeclass_rows(_all_rows)  # all columns

# print("\nSelected columns (label, bytes, veclen, numpy, ctype):")
# print_typeclass_rows(_all_rows, "label", "bytes", "veclen", "numpy", "ctype")

print_typeclass_rows(_all_rows, "obj","base_type", "json","ocltype", "ctype", "ctypes","bytes",)




 Primitive dtypes


ValueError: Unknown column(s): ['ctypes']

In [None]:
h("Primitive dtypes")


Python scalars and numpy scalars are automatically converted to DaCe Scalar descriptors 


In [1]:

print('\nScalar canonicalization:')
print('  create_datadescriptor(7) ->', type(data.create_datadescriptor(7)).__name__)
print('  create_datadescriptor(np.int32(5)) ->', type(data.create_datadescriptor(np.int32(5))).__name__)



Scalar canonicalization:


NameError: name 'data' is not defined



In [63]:
struc =  dtypes.struct('Pair', a=dtypes.int32, b=dtypes.float64)
struc.as_numpy_dtype()

dtype([('a', '<i4'), ('b', '<f8')], align=True)
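
The `align=True` in the dtype above means the fields are padded the way a C compiler would lay out the struct. A NumPy-only sketch of the difference between the packed and the aligned layout for the same two fields:

```python
import numpy as np

fields = [("a", "<i4"), ("b", "<f8")]
packed = np.dtype(fields)               # fields stored back to back
aligned = np.dtype(fields, align=True)  # C-struct-like padding between fields

for label, dt in (("packed", packed), ("aligned", aligned)):
    # dt.fields maps each name to (dtype, byte offset)
    offsets = {name: dt.fields[name][1] for name in dt.names}
    print(f"{label:>8}: itemsize={dt.itemsize}, offsets={offsets}")
```

The packed layout places `b` directly after the 4-byte `a`, while the aligned layout moves `b` to an offset that is a multiple of its alignment, which is what generated C++ code would expect from a matching `struct`.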

In [147]:
vector_types = [
    dace.vector(dace.int32, 4),
    dace.vector(dace.float64, 3),
    
   ]
struc_types = [dtypes.struct('Pair', a=dtypes.int32, b=dtypes.float64),
               ]
pointer_types = [dtypes.pointer(dace.int32),
                 dtypes.pointer(dace.float64),
                ]
h("vector typeclasses")
vector_rows = collect_typeclass_rows(vector_types)
print_typeclass_rows(vector_rows, "obj", "type", "json", "ocltype", "veclen","bytes")


 vector typeclasses
obj                   type                     json                                                     ocltype  veclen  bytes
--------------------  -----------------------  -------------------------------------------------------  -------  ------  -----
dace::vec<int, 4>     <class 'numpy.int32'>    {'type': 'vector', 'dtype': 'int32', 'elements': '4'}    int4     4       16   
dace::vec<double, 3>  <class 'numpy.float64'>  {'type': 'vector', 'dtype': 'float64', 'elements': '3'}  double3  3       24   


In [124]:
vec = dace.vector(dace.float64, 3)
print(vec.as_numpy_dtype())
print(vec.ocltype)
vec.to_json()

('<f8', (3,))
double3


{'type': 'vector', 'dtype': 'float64', 'elements': '3'}
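
The NumPy form printed above, `('<f8', (3,))`, is a subarray dtype: one element holds three `float64` values. The sketch below reproduces that dtype with NumPy alone, without going through DaCe, to show where the 24 bytes come from.

```python
import numpy as np

vec3 = np.dtype(("<f8", (3,)))  # subarray dtype: three float64 values per element
print("base:", vec3.base, "shape:", vec3.shape, "bytes:", vec3.itemsize)

# Used as a field, every record carries one length-3 vector
records = np.zeros(2, dtype=[("v", "<f8", (3,))])
records["v"][0] = (1.0, 2.0, 3.0)
print(records["v"].shape)
print(records["v"][0])
```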

In [None]:
import torch
x = torch.tensor([1])
assert x.byte().dtype ==  torch.uint8

assert x.bool().dtype == torch.bool
assert x.char().dtype == torch.int8
assert x.double().dtype == torch.float64
assert x.float().dtype == torch.float32
assert x.half().dtype == torch.float16
assert x.int().dtype == torch.int32
assert x.bfloat16().dtype == torch.bfloat16

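cfloat = x.cfloat()
assert cfloat.dtype == torch.complex64
# torch.equal compares whole tensors, so the checks also hold for more than one element
assert torch.equal(cfloat.real, x.float())
assert torch.equal(cfloat.imag, torch.zeros_like(cfloat.imag))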

In [201]:
dir(cfloat)

['H',
 'T',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__dlpack__',
 '__dlpack_device__',
 '__doc__',
 '__eq__',
 '__firstlineno__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__idiv__',
 '__ifloordiv__',
 '__ilshift__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '

In [202]:
print(f" cfloat: {cfloat}")
print(f" cfloat.where: {cfloat.where()}")
print(f"type(cfloat.imag): {type(cfloat.imag)}")

 cfloat: tensor([1.+0.j])


TypeError: where() received an invalid combination of arguments - got (), but expected one of:
 * (Tensor condition, Tensor other)
 * (Tensor condition, Number other)


In [175]:
print(f" cfloat.imag.var: {cfloat.imag.var()}")

 cfloat.imag.var: nan



In [180]:


print(f" cfloat.imag: {cfloat.imag.untyped_storage()}")
storage = cfloat.imag.untyped_storage()
print(dir(storage))

 cfloat.imag:  0
 0
 128
 63
 0
 0
 0
 0
[torch.storage.UntypedStorage(device=cpu) of size 8]
['__annotations__', '__class__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', '_byteswap', '_cdata', '_checkpoint_offset', '_expired', '_fake_device', '_fix_weakref', '_free_weak_ref', '_get_filename', '_get_shared_fd', '_new_shared', '_new_shared_cuda', '_new_shared_fd_cpu', '_new_shared_filename_cpu', '_new_using_fd_cpu', '_new_using_filename_cpu', '_new_with_file', '_new_with_weak_ptr', '_release_ipc_counter', '_release_ipc_counter_cuda', '_set_cdata', '_se

In [191]:
mps_device = torch.device("mps")
cpu_device = torch.device("cpu")
print(f"MPS device: {mps_device}, CPU device: {cpu_device}")

MPS device: mps, CPU device: cpu


In [205]:
torch.backends


<module 'torch.backends' from '/Users/sophieblock/miniforge3/envs/dace/lib/python3.13/site-packages/torch/backends/__init__.py'>

In [203]:
_MPS = torch.backends.mps.is_available()
_MKL = torch.backends.mkl.is_available()
_ACL = torch.backends.mkldnn.is_available() and torch.ops.mkldnn._is_mkldnn_acl_supported()
_XPU = torch.xpu.is_available()

_HPU = True if (hasattr(torch, "hpu") and torch.hpu.is_available()) else False
_CUDA = torch.cuda.is_available()
print(f"_MPS: {_MPS}, \n_MKL: {_MKL}, \n_ACL: {_ACL}, \n_XPU: {_XPU}, \n_HPU: {_HPU}, \n_CUDA: {_CUDA}")

_MPS: True, 
_MKL: False, 
_ACL: False, 
_XPU: False, 
_HPU: False, 
_CUDA: False


In [None]:
bs = (1,)
Mdim = (192,)
Kdim = (196,)
Bdim = (84,)


In [186]:
storage.cuda

cuda = torch.device('cuda')     # Default CUDA device
cuda0 = torch.device('cuda:0')
cuda2 = torch.device('cuda:2')  # GPU 2 (these are 0-indexed)

x = torch.tensor([1., 2.], device=cuda0)
# x.device is device(type='cuda', index=0)
y = torch.tensor([1., 2.]).cuda()
# y.device is device(type='cuda', index=0)

AssertionError: Torch not compiled with CUDA enabled

In [184]:
storage.device

device(type='cpu')

In [173]:
dir(cfloat.imag)

['H',
 'T',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__dlpack__',
 '__dlpack_device__',
 '__doc__',
 '__eq__',
 '__firstlineno__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__idiv__',
 '__ifloordiv__',
 '__ilshift__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '

In [None]:
        
builtin_dtypes = [int, float, complex, bool, str, bytes, list, tuple, set, dict]



h("Built-in types")

_pytypes = collect_typeclass_rows(builtin_dtypes)

print_typeclass_rows(_pytypes, "type", "label", "to_string", "ocltype", "ctype", "bytes")

In [135]:
struc =  dtypes.struct('Pair', a=dtypes.int32, b=dtypes.float64)
struc.as_numpy_dtype()

dtype([('a', '<i4'), ('b', '<f8')], align=True)

In [161]:
p = dtypes.pointer(dtypes.float32)
print(p.as_ctypes())
# p.as_numpy_dtype()

<class 'dace.dtypes.LP_c_float'>


In [160]:
isinstance(p.as_ctypes(), dace.dtypes.LP_c_float)

AttributeError: module 'dace.dtypes' has no attribute 'LP_c_float'
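
The `AttributeError` above is raised because `LP_c_float` is only the name `ctypes` gives to the generated pointer class; it is not an attribute of `dace.dtypes`. Since `as_ctypes()` returned a class here, `isinstance` would also be the wrong check for it. Below is a minimal standard-library sketch of how pointer types can be compared. It assumes that `p.as_ctypes()` is the same class as `ctypes.POINTER(ctypes.c_float)`, which the printed name suggests but which this notebook does not confirm.

```python
import ctypes

# ctypes builds (and caches) one pointer class per pointee type
ptr_type = ctypes.POINTER(ctypes.c_float)

# The class is named "LP_<pointee>", matching the repr printed above
type_name = ptr_type.__name__

# Compare pointer *types* by identity or with issubclass, not isinstance
same_type = ptr_type is ctypes.POINTER(ctypes.c_float)
is_pointer_type = issubclass(ptr_type, ctypes._Pointer)
pointee = ptr_type._type_

# isinstance applies to pointer *instances*
value = ctypes.c_float(1.5)
ptr = ctypes.pointer(value)
is_instance = isinstance(ptr, ptr_type)

print(type_name, same_type, is_pointer_type, pointee, is_instance)
```

Under that assumption, `p.as_ctypes() is ctypes.POINTER(ctypes.c_float)` would be the equivalent check in the cell above.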

In [136]:
struc.to_json()

{'type': 'struct',
 'name': 'Pair',
 'data': [('a', 'int32'), ('b', 'float64')],
 'length': [],
 'bytes': 12}
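
The reported `'bytes': 12` equals 4 + 8, so this size appears to be the plain sum of the field sizes, whereas the NumPy dtype printed earlier was built with `align=True` and may therefore include padding. The sketch below illustrates the difference between packed and aligned layouts with the standard-library `struct` module (`i` is a 32-bit integer, `d` a 64-bit float). The native size depends on the platform, so no value is claimed for it.

```python
import struct

# "=" selects standard sizes with no alignment padding: 4 (int32) + 8 (float64)
packed_size = struct.calcsize("=id")

# "@" selects native sizes and alignment; on many 64-bit platforms the double
# is aligned to 8 bytes, which inserts padding bytes after the int
native_size = struct.calcsize("@id")

print("packed:", packed_size, "native:", native_size)
```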

In [165]:
import numba.cuda
import numba
print(numba.cuda.gpus)
a = np.arange(8, dtype=np.float32)
da = numba.cuda.to_device(a) 


CudaSupportError: Error at driver init: 

CUDA driver library cannot be found.
If you are sure that a CUDA driver is installed,
try setting environment variable NUMBA_CUDA_DRIVER
with the file path of the CUDA driver shared library.
:

In [14]:

h("create_datadescriptor (canonicalization examples)")
examples = [
    ("Python int 7", 7),
    ("numpy int32(5)", np.int32(5)),
    ("dtypes.int32", dtypes.int32),
    ("None", None),
]
for name, ex in examples:
    try:
        desc = data.create_datadescriptor(ex)
        kv(name, f"{type(desc).__name__} (dtype={getattr(desc,'dtype',None)}, shape={getattr(desc,'shape',None)})")
    except Exception as e:
        kv(name, f"ERROR: {e}")
        
        
h("Declarative type-spec syntax")
spec = dace.float32[4, 8]
kv("spec expression", "dace.float32[4,8]")
kv("result repr", spec)

print("\nNote: composite types and descriptors are different concepts — typeclasses describe element semantics; descriptors describe containers (shape, strides, storage).")


 create_datadescriptor (canonicalization examples)
 - Python int 7          : Scalar (dtype=int64_t, shape=(1,))
 - numpy int32(5)        : Scalar (dtype=int, shape=(1,))
 - dtypes.int32          : Scalar (dtype=int, shape=(1,))
 - None                  : Scalar (dtype=void, shape=(1,))

 Declarative type-spec syntax
 - spec expression       : dace.float32[4,8]
 - result repr           : Array (dtype=float, shape=(4, 8))

Note: composite types and descriptors are different concepts — typeclasses describe element semantics; descriptors describe containers (shape, strides, storage).


In [31]:
# Live example: typeclass -> descriptor -> runtime array
import dace
import numpy as np
from dace import dtypes, data

print('Typeclass example and bridges:')
print('  dace.float32 ->', dtypes.float32)
print('  as_numpy_dtype:', dtypes.float32.as_numpy_dtype())
print('  as_ctypes:', dtypes.float32.as_ctypes())

# Create a container descriptor explicitly
desc = data.Array(dtype=dtypes.float32, shape=[2, 3], strides=None)
print('\nCreated descriptor:', desc)

# Make a runtime NumPy array from the descriptor (helper)
arr = data.make_array_from_descriptor(desc)
print('Runtime array shape/dtype:', arr.shape, arr.dtype)
arr[:] = np.arange(arr.size).reshape(arr.shape)
print('Runtime array contents:\n', arr)

# Small SDFG example that uses a descriptor via sdfg.add_array
sdfg = dace.SDFG('example_part1')
sdfg.add_array('A', [2,3], dtypes.float32)
print('\nSDFG arrays:')
for name, a in sdfg.arrays.items():
    print(' ', name, '->', a)

# Show that the SDFG array descriptor references a typeclass
print('\nArray dtype in SDFG:', sdfg.arrays['A'].dtype)


Typeclass example and bridges:
  dace.float32 -> float
  as_numpy_dtype: float32
  as_ctypes: <class 'ctypes.c_float'>

Created descriptor: Array (dtype=float, shape=(2, 3))
Runtime array shape/dtype: (2, 3) float32
Runtime array contents:
 [[0. 1. 2.]
 [3. 4. 5.]]

SDFG arrays:
  A -> Array (dtype=float, shape=(2, 3))

Array dtype in SDFG: float


In [None]:
import tensorflow as tf, dace, numpy as np

# From tf.DType:
tc = dace.dtypes.typeclass(tf.float32.as_numpy_dtype)

# From a tensor:
np_dtype = tf.constant(0, dtype=tf.float32).dtype.as_numpy_dtype
tc = dace.dtypes.typeclass(np_dtype)

### Part 1 — Types deep dive (expanded)

This section expands the quick overview with concrete responsibilities, small mental models, and pointers to the code that implements the behavior.

Key responsibilities (short):

- Element types (`dace.dtypes.typeclass`) wrap NumPy dtypes so they can be used declaratively and mapped to backends. They expose helpers such as `as_numpy_dtype()` and `as_ctypes()` for interop.
- Composite types (vectors, pointers, structs) are built on the same `typeclass` API but are *not* NumPy dtypes; they let DaCe express vector types and pointers at compile time.
- Container descriptors (`dace.data.Array`, `dace.data.Scalar`, `dace.data.Structure`, ...) describe memory containers and are used by the SDFG IR. They carry shape, strides, storage, lifetime, and dtype (a `typeclass`).

Where to look in the codebase:

- `dace/dace/dtypes.py` — the `typeclass`, `vector`, `pointer`, `struct`, `dtype_to_typeclass`, `is_array` helpers, and platform enums like `StorageType`/`ScheduleType`.
- `dace/dace/data.py` — `create_datadescriptor()`, data descriptor classes (`Array`, `Scalar`), and `make_array_from_descriptor()` / `make_reference_from_descriptor()` helpers.

Why this separation matters:

- Typeclasses define the element semantics and codegen mapping (how to spell the type in C/CUDA/OpenCL), while descriptors tell the compiler where the memory lives and how it is laid out. This separation enables aggressive data-centric optimizations without conflating runtime buffers with IR descriptors.


In [33]:
# Typeclass examples and composite types
import dace
import numpy as np
from dace import dtypes, data

print('Primitive typeclasses:')
print('  dace.float32 ->', dace.float32)
print('  numpy bridge:', dace.float32.as_numpy_dtype())
print('  ctypes bridge:', dace.float32.as_ctypes())


print("7:", data.create_datadescriptor(7))                        # Scalar with a 64-bit integer dtype
print("np.int32(5):", data.create_datadescriptor(np.int32(5)))    # Scalar(dtype=dace.int32)
print("dtypes.int32:", data.create_datadescriptor(dtypes.int32))  # Scalar(dtype=dace.int32)  (typeclass -> Scalar)
print("None:", data.create_datadescriptor(None))                  # Scalar with a void dtype
# Declaration-style usage
spec = dace.float32[4, 8]
print('\nDeclarative spec example: dace.float32[4,8] ->', spec)


Primitive typeclasses:
  dace.float32 -> float
  numpy bridge: float32
  ctypes bridge: <class 'ctypes.c_float'>
7: Scalar (dtype=int64_t)
np.int32(5): Scalar (dtype=int)
dtypes.int32: Scalar (dtype=int)
None: Scalar (dtype=void)

Declarative spec example: dace.float32[4,8] -> Array (dtype=float, shape=(4, 8))


In [None]:

# Composite types
vec4 = dace.vector(dace.float32, 4)
ptr_to_float = dace.pointer(dace.float32)
mystruct = dace.struct(x=dace.int32, y=dace.float64)
print('\nComposite types:')
print('  vector4:', vec4)
print('  pointer:', ptr_to_float)
print('  struct:', mystruct)

# Typeclass to numpy dtype mapping
print('\nTypeclass -> numpy dtype')
print('  dtype of vec4 base:', vec4.dtype if hasattr(vec4, 'dtype') else getattr(vec4, 'type', None))
print('  as_numpy_dtype for float32:', dace.float32.as_numpy_dtype())



- `dace.dtypes` is DaCe's primitive/typeclass layer: `dace.float32`, `dace.int32`, plus composite helpers like `dace.vector`, `dace.pointer`, and `dace.struct`. These objects wrap NumPy dtypes and expose helpers such as `as_numpy_dtype()` and `as_ctypes()`.

- `dace.data` contains the *container descriptors* used by the IR: `Array`, `Scalar`, `Stream`, `Structure`, `Tensor` (sparse). These describe memory (shape, layout, storage) and are *not* runtime arrays.

This part shows the key concepts and small examples so you can see how DaCe represents element types and how containers are canonicalized from common Python objects.

### Behavior by input kind
1. Objects that are already DaCe descriptors, or that provide a custom adapter
    - If `obj` is an instance of `dace.data.Data`, `create_datadescriptor` returns it unchanged.
    - If `obj` has a `__descriptor__()` method or a `.descriptor` attribute, that descriptor is returned. This is the recommended extension point for custom types.

2. Torch tensors
    - Special-cased: detected by module/class name checks (type module `"torch"` and class name `"Tensor"`). The function maps torch dtypes to DaCe typeclasses and sets `storage` to GPU when `obj.device.type == 'cuda'`.

3. Objects with array interfaces (NumPy/CuPy/array protocol)
    - If `dtypes.is_array(obj)` is `True` and the object exposes `__array_interface__` or `__cuda_array_interface__`, the code reads `typestr`, `shape`, and `strides` from the interface and constructs a `dace.data.Array`. If the interface shape is empty (0-d), it may be treated as a `Scalar`.

4. Python lists / tuples
    - `create_datadescriptor` casts them to a NumPy array first (`obj = numpy.array(obj)`) and then follows the array path above, so lists and tuples canonicalize to a `dace.data.Array` with NumPy-inferred dtype, shape, and strides.

5. CuPy (`ndarray`) special case
    - If the object originates from `cupy` (module/class check), it is handled like a NumPy array, with storage set to GPU.

6. Scalars / numeric types
    - `symbolic.issymbolic(obj)` → `Scalar(symbolic.symtype(obj))`.
    - `obj` is an instance of `dace.dtypes.typeclass` → `Scalar(obj)`.
    - `obj` is one of `int`, `float`, `complex`, `bool`, or `None`, or a numeric instance (`Number`, `numpy.number`, `numpy.bool_`) → a `Scalar` whose dtype is the `dtypes.typeclass` built from that Python/NumPy type.
    - `obj` is a type that subclasses `numpy.number` → `Scalar(dtypes.typeclass(obj))`.

7. Strings and callables
    - `str` → `Scalar(dace.string)`.
    - `callable(obj)` → `Scalar(dtypes.callback(None))` (a callback descriptor).
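
The precedence in the list above can be pictured as a chain of checks where the first match wins. The following is a toy model in plain Python, not DaCe's code: `classify`, `Celsius`, and `FakeArray` are hypothetical names, the function returns `(kind, detail)` tuples rather than real descriptors, and the Torch and CuPy branches are left out.

```python
from numbers import Number


def classify(obj):
    """Toy model of the precedence listed above (not DaCe's implementation)."""
    if hasattr(obj, "__descriptor__"):       # 1. custom adapter hook wins first
        return obj.__descriptor__()
    if hasattr(obj, "__array_interface__"):  # 3. array protocol
        shape = tuple(obj.__array_interface__["shape"])
        return ("Scalar", ()) if shape == () else ("Array", shape)
    if isinstance(obj, (list, tuple)):       # 4. sequences (1-D only in this toy)
        return ("Array", (len(obj),))
    if isinstance(obj, Number):              # 6. numeric instances, including bool
        return ("Scalar", type(obj).__name__)
    if isinstance(obj, str):                 # 7. strings
        return ("Scalar", "string")
    if callable(obj):                        # 7. callables
        return ("Scalar", "callback")
    raise TypeError(f"no descriptor for {type(obj).__name__}")


class Celsius:
    """Hypothetical user type that opts in through the adapter hook."""

    def __descriptor__(self):
        return ("Scalar", "float")


class FakeArray:
    """Hypothetical stand-in that exposes only the array-interface shape."""

    def __init__(self, shape):
        self.__array_interface__ = {"shape": shape}


for sample in (Celsius(), FakeArray(()), FakeArray((3, 4)), [1, 2, 3], 7, "x", len):
    print(type(sample).__name__, "->", classify(sample))
```

Putting the adapter check first is what makes `__descriptor__` a usable extension point: a custom type can override whatever the later, more generic checks would have inferred.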



In [28]:
@dace.program
def simple_constant_conversion():
    return dace.float64(0)

A = simple_constant_conversion()
print(f"A.dtype: {A.dtype}, type: {type(A.dtype)}")
print(f"A[0]: {A[0]}")



A.dtype: float64, type: <class 'numpy.dtypes.Float64DType'>
A[0]: 0.0




In [None]:


a = np.arange(12, dtype=np.int32).reshape(3, 4)




*Python builtin sequence types*

In [15]:


lst = [1,2,3]
show_desc(lst)



Package: builtins.list, is_array=False
  Descriptor type: Array, dtype=int64_t, shape=(3,)


#### Torch tensors
Special-cased: detected by module/class name checks (type module "torch" and class name "Tensor"). The function maps torch dtypes -> DaCe typeclasses and sets `storage` to GPU when `obj.device.type == 'cuda'`.

In [None]:


t = torch.zeros((2,3))
show_desc(t)



In [14]:
# ctypes array
import ctypes
carr = (ctypes.c_float*6)(*range(6))
show_desc(carr)


Package: __main__.c_float_Array_6, is_array=True
Could not create descriptor for c_float_Array_6: Could not create a DaCe data descriptor from object <__main__.c_float_Array_6 object at 0x12160eed0>. If this is a custom object, consider creating a `__descriptor__` adaptor method to the type hint or object itself.


In [7]:

spec = dace.float32[2, 3]
print(spec)
print(type(spec))

Array (dtype=float, shape=(2, 3))
<class 'dace.data.Array'>


## Part 2: Exploring Existing LibraryNodes

Let's start by examining what LibraryNodes are already available in DaCe and what information they expose.

In [31]:
# Let's look at the BLAS MatMul library node as an example
from dace.libraries.blas.nodes.matmul import MatMul

print("=== MatMul LibraryNode Analysis ===")
print(f"Class: {MatMul}")
print(f"Base classes: {MatMul.__bases__}")
print(f"Is a LibraryNode: {issubclass(MatMul, dace.sdfg.nodes.LibraryNode)}")

# Create an instance to inspect
mm = MatMul("example_matmul")
print(f"\nInstance name: {mm.name}")
print(f"Instance label: {mm.label}")
print(f"Alpha: {mm.alpha}")
print(f"Beta: {mm.beta}")
print(f"Implementation: {mm.implementation}")
print(f"Schedule: {mm.schedule}")

# Check input/output interface
print(f"\nInput connectors: {mm.in_connectors}")
print(f"Output connectors: {mm.out_connectors}")

=== MatMul LibraryNode Analysis ===
Class: <class 'dace.libraries.blas.nodes.matmul.MatMul'>
Base classes: (<class 'dace.sdfg.nodes.LibraryNode'>,)
Is a LibraryNode: True

Instance name: example_matmul
Instance label: example_matmul
Alpha: 1
Beta: 0
Implementation: None
Schedule: ScheduleType.Default

Input connectors: {'_a': void, '_b': void}
Output connectors: {'_c': void}


In [32]:
# Check what "facts" are established at compile-time
print(f"\n=== Compile-time Facts (Properties) ===")

# Inspect MatMul properties by examining the instance
mm_instance = MatMul("test_matmul")
print(f"MatMul instance properties:")
for prop_name in dir(mm_instance):
    if not prop_name.startswith('_') and isinstance(getattr(MatMul, prop_name, None), dace.properties.Property):
        prop_val = getattr(mm_instance, prop_name)
        print(f"  {prop_name}: {prop_val}")

# Let's also check class attributes
print(f"\nMatMul class attributes:")
for attr_name in dir(MatMul):
    if not attr_name.startswith('_') and not callable(getattr(MatMul, attr_name)):
        attr_val = getattr(MatMul, attr_name)
        if isinstance(attr_val, dace.properties.Property):
            print(f"  {attr_name}: {attr_val}")

# Check available implementations
print(f"\n=== Available Implementations ===")
for impl_name, impl_class in MatMul.implementations.items():
    print(f"  {impl_name}: {impl_class}")

print(f"\nDefault implementation: {MatMul.default_implementation}")

# Check library registration
print(f"\n=== Library Registration ===")
print(f"Library name: {getattr(MatMul, '_dace_library_name', 'Not registered')}")
print(f"Is library node: {getattr(MatMul, '_dace_library_node', False)}")


=== Compile-time Facts (Properties) ===
MatMul instance properties:
  alpha: 1
  beta: 0
  debuginfo: <dace.dtypes.DebugInfo object at 0x125a80d10>
  environments: frozenset()
  guid: f3f8ff1b-a526-46f8-9ef1-5a01893a4b41
  implementation: None
  in_connectors: {'_a': void, '_b': void}
  label: test_matmul
  location: {}
  name: test_matmul
  out_connectors: {'_c': void}
  schedule: ScheduleType.Default

MatMul class attributes:
  alpha: <dace.properties.Property object at 0x125e96890>
  beta: <dace.properties.Property object at 0x125e966d0>
  debuginfo: <dace.properties.DebugInfoProperty object at 0x125aa4c50>
  environments: <dace.properties.SetProperty object at 0x125a0a3c0>
  guid: <dace.properties.Property object at 0x125a2eb30>
  implementation: <dace.properties.LibraryImplementationProperty object at 0x125a0aa50>
  in_connectors: <dace.properties.DictProperty object at 0x125853390>
  label: <dace.properties.Property object at 0x125a2f310>
  location: <dace.properties.DictPropert

In [33]:
# Let's examine the GEMM (GEneral Matrix Multiplication) library node
from dace.libraries.blas.nodes.gemm import Gemm

# Create an instance with specific compile-time facts
gemm_node = Gemm("example_gemm", 
                  alpha=2.0,  # Scalar coefficient for AB
                  beta=0,   # Scalar coefficient for C
                  transA=True,  # Whether to transpose A
                  transB=False)  # Whether to transpose B

print("=== LibraryNode Instance Analysis ===")
print(f"Node name: {gemm_node.name}")
print(f"Node label: {gemm_node.label}")
print(f"Current implementation: {gemm_node.implementation}")

print(f"\n=== Semantic Properties (Compile-time Facts) ===")
print(f"Alpha coefficient: {gemm_node.alpha}")
print(f"Beta coefficient: {gemm_node.beta}")
print(f"Transpose A: {gemm_node.transA}")
print(f"Transpose B: {gemm_node.transB}")

print(f"\n=== Connector Information ===")
print(f"Input connectors: {gemm_node.in_connectors}")
print(f"Output connectors: {gemm_node.out_connectors}")

print(f"\n=== What the node 'knows' about itself ===")
print(f"Has side effects: {gemm_node.has_side_effects}")
print(f"Free symbols: {gemm_node.free_symbols}")  # Symbolic dependencies

# Create a simple SDFG using this node
print(f"\n=== Using Gemm in an SDFG ===")
sdfg = SDFG("gemm_example")
state = sdfg.add_state()

# Add arrays with specific sizes
M, N, K = 3, 4, 5
sdfg.add_array("A", [K, M], dace.float32)  # stored as K x M because transA=True
sdfg.add_array("B", [K, N], dace.float32)
sdfg.add_array("C", [M, N], dace.float32)

# Add access nodes
a_access = state.add_read("A")
b_access = state.add_read("B")
c_access = state.add_write("C")  # beta=0, so C is only written

# Connect to the GEMM node (C = alpha*op(A)*op(B) + beta*C).
# With beta=0 the node has no `_c` input connector; its output connector is `_c`.
state.add_edge(a_access, None, gemm_node, "_a", dace.Memlet(f"A[0:{K}, 0:{M}]"))
state.add_edge(b_access, None, gemm_node, "_b", dace.Memlet(f"B[0:{K}, 0:{N}]"))
state.add_edge(gemm_node, "_c", c_access, None, dace.Memlet(f"C[0:{M}, 0:{N}]"))

print(f"SDFG with GEMM node created: {len(state.nodes())} nodes, {len(state.edges())} edges")
print(f"GEMM performs: C = {gemm_node.alpha}*A{'ᵀ' if gemm_node.transA else ''}*B{'ᵀ' if gemm_node.transB else ''} + {gemm_node.beta}*C")

=== LibraryNode Instance Analysis ===
Node name: example_gemm
Node label: example_gemm
Current implementation: None

=== Semantic Properties (Compile-time Facts) ===
Alpha coefficient: 2.0
Beta coefficient: 0
Transpose A: True
Transpose B: False

=== Connector Information ===
Input connectors: {'_a': void, '_b': void}
Output connectors: {'_c': void}

=== What the node 'knows' about itself ===
Has side effects: False
Free symbols: set()

=== Using Gemm in an SDFG ===
SDFG with GEMM node created: 4 nodes, 3 edges
GEMM performs: C = 2.0*Aᵀ*B + 0*C


In [34]:

# Code to inspect Gemm connectors
gemm = Gemm("example_gemm", beta=0)
print("=== With beta=0 ===")
print("Input connectors:", gemm.in_connectors)  # Should be {'_a', '_b'}
print("Output connectors:", gemm.out_connectors)  # Should be {'_c'}

gemm_with_c = Gemm("example_gemm_with_c", beta=1)
print("\n=== With beta=1 ===")
print("Input connectors:", gemm_with_c.in_connectors)  # Should be {'_a', '_b', '_c'}
print("Output connectors:", gemm_with_c.out_connectors)  # Should be {'_c'}

=== With beta=0 ===
Input connectors: {'_a': void, '_b': void}
Output connectors: {'_c': void}

=== With beta=1 ===
Input connectors: {'_a': void, '_c': void, '_b': void}
Output connectors: {'_c': void}


## Part 2: Creating a Custom LibraryNode

Now let's create our own LibraryNode to understand how the `@dace.library.node` decorator establishes semantic contracts. We'll define a custom Vector Addition node as an example.
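
Before the real class, here is a toy illustration of the kind of contract a class decorator can enforce. It is a hypothetical sketch, not DaCe's implementation: `library_node` and `_is_library_node` are made-up names. It mirrors only the two behaviours visible in this notebook, namely that `implementations` and `default_implementation` are required class attributes and that a marker attribute is set on the decorated class.

```python
def library_node(cls):
    """Toy class decorator (hypothetical, not DaCe's implementation)."""
    for required in ("implementations", "default_implementation"):
        if not hasattr(cls, required):
            raise ValueError(f"{cls.__name__} must define '{required}'")
    cls._is_library_node = True  # marker, similar in spirit to _dace_library_node
    return cls


@library_node
class GoodNode:
    implementations = {}
    default_implementation = None


try:
    @library_node
    class BadNode:  # missing default_implementation, so decoration fails
        implementations = {}
    error = None
except ValueError as exc:
    error = str(exc)

print(GoodNode._is_library_node, "|", error)
```

Because the check runs at class-definition time, a node that breaks the contract fails when it is declared rather than later during expansion or compilation.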

In [35]:
# Let's create a simple vector addition LibraryNode
# This will demonstrate the semantic contract system

# Define a custom Library Node class
@dace.library.node
class VectorAdd(dace.sdfg.nodes.LibraryNode):
    """
    A LibraryNode that represents scaled vector addition: C = scale_factor * (A + B)
    
    This demonstrates how LibraryNodes establish semantic contracts:
    - What the operation does (vector addition)
    - What parameters it needs (vector size, scalar coefficient)
    - What data it consumes/produces (two input vectors, one output)
    """
    
    # Required: Define possible implementations (initially empty)
    implementations = {}
    
    # Required: Define default implementation (can be None initially)
    default_implementation = None
    
    # Semantic properties (compile-time "facts")
    n = dace.properties.SymbolicProperty(
        desc="Vector size",
        default=symbolic.symbol("n")
    )
    
    scale_factor = dace.properties.Property(
        dtype=float,
        default=1.0,
        desc="Scalar multiplier for the result"
    )
    
    def __init__(self, name, n=None, scale_factor=1.0, **kwargs):
        # Define the semantic interface: 2 inputs, 1 output
        super().__init__(name, 
                         inputs={"_a", "_b"},  # Input connectors
                         outputs={"_c"},       # Output connectors
                         **kwargs)
        
        # Set semantic properties
        self.n = n or symbolic.symbol("n")
        self.scale_factor = scale_factor
    
    def validate(self, sdfg, state):
        """
        Semantic validation - ensure the node usage is correct
        This is where compile-time "facts" are checked
        """
        # Check that we have the right connectors
        assert "_a" in self.in_connectors, "VectorAdd requires _a input connector"
        assert "_b" in self.in_connectors, "VectorAdd requires _b input connector"
        assert "_c" in self.out_connectors, "VectorAdd requires _c output connector"
        
        # Additional semantic checks could go here
        # (e.g., vector sizes match, dtypes are compatible, etc.)
        pass

# After decoration, let's see what the decorator added
print("=== After @dace.library.node decoration ===")
print(f"_dace_library_node: {getattr(VectorAdd, '_dace_library_node', 'Missing')}")
print(f"Implementations dictionary: {VectorAdd.implementations}")
print(f"Default implementation: {VectorAdd.default_implementation}")

# Inspect the properties
print(f"\nDeclared properties:")
for attr_name in dir(VectorAdd):
    attr_val = getattr(VectorAdd, attr_name)
    if isinstance(attr_val, dace.properties.Property):
        print(f"  {attr_name}: {attr_val}")

# Create an instance and check its properties
vector_add = VectorAdd("my_vector_add", n=1024, scale_factor=2.5)
print(f"\nInstance properties:")
print(f"  n = {vector_add.n}")
print(f"  scale_factor = {vector_add.scale_factor}")
print(f"  inputs = {vector_add.in_connectors}")
print(f"  outputs = {vector_add.out_connectors}")

=== After @dace.library.node decoration ===
_dace_library_node: True
Implementations dictionary: {}
Default implementation: None

Declared properties:
  debuginfo: <dace.properties.DebugInfoProperty object at 0x125aa4c50>
  environments: <dace.properties.SetProperty object at 0x125a0a3c0>
  guid: <dace.properties.Property object at 0x125a2eb30>
  implementation: <dace.properties.LibraryImplementationProperty object at 0x125a0aa50>
  in_connectors: <dace.properties.DictProperty object at 0x125853390>
  label: <dace.properties.Property object at 0x125a2f310>
  location: <dace.properties.DictProperty object at 0x125a25350>
  n: <dace.properties.SymbolicProperty object at 0x125e2b4d0>
  name: <dace.properties.Property object at 0x125ada5f0>
  out_connectors: <dace.properties.DictProperty object at 0x125853b10>
  scale_factor: <dace.properties.Property object at 0x127346190>
  schedule: <dace.properties.EnumProperty object at 0x125aa4b50>

Instance properties:
  n = 1024
  scale_factor = 2.

In [None]:
# Create an instance and inspect the compile-time facts
vector_add_node = VectorAdd("my_vector_add", n=1024, scale_factor=2.5)

print("=== Compile-time Facts Available to DaCe ===")
print(f"Operation: Vector addition with scaling")
print(f"Semantic interface: {len(vector_add_node.in_connectors)} inputs → {len(vector_add_node.out_connectors)} outputs")
print(f"Vector size: {vector_add_node.n}")
print(f"Scale factor: {vector_add_node.scale_factor}")
print(f"Input connectors: {vector_add_node.in_connectors}")
print(f"Output connectors: {vector_add_node.out_connectors}")

print(f"\n=== What DaCe Can Reason About ===")
print(f"Free symbols (symbolic dependencies): {vector_add_node.free_symbols}")

# Try to use it in an SDFG (this will fail because no implementation exists yet)
print(f"\n=== Trying to Use Without Implementation ===")
try:
    sdfg = SDFG("test_vector_add")
    state = sdfg.add_state()
    
    # Add arrays
    sdfg.add_array("A", [1024], dace.float32)
    sdfg.add_array("B", [1024], dace.float32) 
    sdfg.add_array("C", [1024], dace.float32)
    
    # Add nodes (add_node's return value is not used; keep referring to vector_add_node)
    a_read = state.add_read("A")
    b_read = state.add_read("B")
    c_write = state.add_write("C")
    state.add_node(vector_add_node)
    
    # Add edges
    state.add_edge(a_read, None, vector_add_node, "_a", dace.Memlet("A[0:1024]"))
    state.add_edge(b_read, None, vector_add_node, "_b", dace.Memlet("B[0:1024]"))
    state.add_edge(vector_add_node, "_c", c_write, None, dace.Memlet("C[0:1024]"))
    
    print("SDFG created successfully!")
    print(f"Trying to compile...")
    # This will fail because we have no implementation
    sdfg.validate()
    compiled = sdfg.compile()
    
except Exception as e:
    print(f"Expected error: {type(e).__name__}: {e}")
    print("This is expected - we need to define an implementation!")

=== Compile-time Facts Available to DaCe ===
Operation: Vector addition with scaling
Semantic interface: 2 inputs → 1 outputs
Vector size: 1024
Scale factor: 2.5
Input connectors: {'_b': void, '_a': void}
Output connectors: {'_c': void}

=== What DaCe Can Reason About ===
Free symbols (symbolic dependencies): set()


AttributeError: 'VectorAdd' object has no attribute '_dace_properties'

## Part 3: Registering Implementations (Expansions)

Now we'll create concrete implementations for our LibraryNode. This is where abstract operations become executable code. Each implementation defines how the operation will be performed.
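
As a mental model for the registry, the toy below maps implementation names to callables and falls back to a default when a node does not select one. It is a plain-Python sketch with hypothetical names (`ToyNode`, `add_loop`, `add_comprehension`). Real DaCe expansions produce SDFG subgraphs rather than computing results directly, so only the name-to-implementation lookup carries over.

```python
class ToyNode:
    """Hypothetical stand-in for a LibraryNode with an implementation registry."""

    implementations = {}
    default_implementation = None

    def __init__(self, implementation=None):
        self.implementation = implementation

    @classmethod
    def register_implementation(cls, name, expansion):
        cls.implementations[name] = expansion

    def expand(self, *operands):
        # A per-node choice overrides the class-wide default
        name = self.implementation or self.default_implementation
        if name is None:
            raise ValueError("no implementation selected and no default set")
        return self.implementations[name](*operands)


def add_loop(a, b):
    out = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out


def add_comprehension(a, b):
    return [x + y for x, y in zip(a, b)]


ToyNode.register_implementation("loop", add_loop)
ToyNode.register_implementation("comprehension", add_comprehension)
ToyNode.default_implementation = "loop"

default_result = ToyNode().expand([1, 2], [3, 4])
explicit_result = ToyNode("comprehension").expand([1, 2], [3, 4])
print(default_result, explicit_result)
```

Both strategies give the same result for the same operands, which is the point of the abstraction: the node fixes what is computed and the registry decides how.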

In [None]:
# Create a simple CPU implementation using @dace.library.expansion
@dace.library.expansion
class ExpandVectorAddCPU(ExpandTransformation):
    """
    A concrete implementation of VectorAdd for CPU execution.
    
    This demonstrates how abstract LibraryNodes get translated into
    executable SDFG subgraphs.
    """
    
    environments = []  # No special environment requirements for CPU
    
    @staticmethod
    def expansion(node: VectorAdd, parent_state: SDFGState, parent_sdfg: SDFG, **kwargs):
        """
        This method gets called when DaCe needs to expand the LibraryNode
        into concrete operations.
        
        Here's what information is available during expansion:
        - node: The LibraryNode instance with all its compile-time facts
        - parent_state: The SDFG state containing the node
        - parent_sdfg: The parent SDFG for context
        """
        
        print(f"=== Expansion Context ===")
        print(f"Expanding node: {node.name}")
        print(f"Node properties available:")
        print(f"  - n (vector size): {node.n}")
        print(f"  - scale_factor: {node.scale_factor}")
        print(f"  - input connectors: {node.in_connectors}")
        print(f"  - output connectors: {node.out_connectors}")
        
        # Access the memlets/data arrays connected to this node
        input_edges = parent_state.in_edges(node)
        output_edges = parent_state.out_edges(node)
        
        print(f"Input edges: {len(input_edges)}")
        print(f"Output edges: {len(output_edges)}")
        
        # Build the expansion as a standalone SDFG instead of editing
        # parent_state. DaCe nests the returned SDFG in place of the
        # LibraryNode, so its arrays are named after the node's connectors.
        dtype = None
        for e in input_edges:
            if e.dst_conn == '_a':
                # Reuse the element type of the array connected to _a
                dtype = parent_sdfg.arrays[e.data.data].dtype
        
        expansion_sdfg = dace.SDFG(f'{node.label}_cpu')
        for conn in ('_a', '_b', '_c'):
            expansion_sdfg.add_array(conn, [node.n], dtype)
        expansion_state = expansion_sdfg.add_state()
        
        # A map over the vector elements wrapping a tasklet that does the
        # actual computation; external_edges=True also creates the access
        # nodes and connects them through the map
        expansion_state.add_mapped_tasklet(
            'vector_add_map',
            dict(i=f'0:{node.n}'),
            inputs=dict(a_in=dace.Memlet('_a[i]'), b_in=dace.Memlet('_b[i]')),
            code=f'c_out = {node.scale_factor} * (a_in + b_in)',
            outputs=dict(c_out=dace.Memlet('_c[i]')),
            external_edges=True
        )
        
        print("Expansion created: Map + Tasklet for CPU execution")
        return expansion_sdfg  # DaCe replaces the LibraryNode with this SDFG

# Register this implementation with our LibraryNode
VectorAdd.register_implementation('cpu', ExpandVectorAddCPU)
VectorAdd.default_implementation = 'cpu'

print(f"Registered 'cpu' implementation for VectorAdd")
print(f"Available implementations: {list(VectorAdd.implementations.keys())}")
print(f"Default implementation: {VectorAdd.default_implementation}")

In [None]:
# Alternative way: Using the @dace.library.register_expansion decorator
# This is more convenient for adding implementations after node definition

@dace.library.register_expansion(VectorAdd, 'vectorized')
class ExpandVectorAddVectorized(ExpandTransformation):
    """
    An alternative implementation that demonstrates vectorized operations.
    
    This shows how multiple implementations can target different execution
    strategies for the same semantic operation.
    """
    
    environments = []
    
    @staticmethod
    def expansion(node: VectorAdd, parent_state: SDFGState, parent_sdfg: SDFG, **kwargs):
        """
        This expansion demonstrates accessing more compile-time information
        and creating a different implementation strategy.
        """
        
        print(f"=== Vectorized Expansion ===")
        print(f"Available compile-time facts during expansion:")
        
        # All the semantic properties are available
        print(f"  - Vector size (n): {node.n}")
        print(f"  - Scale factor: {node.scale_factor}")
        print(f"  - Node schedule: {node.schedule}")
        print(f"  - Implementation being used: {node.implementation}")
        
        # We can also access the parent SDFG context
        print(f"  - Parent SDFG name: {parent_sdfg.name}")
        print(f"  - Parent state name: {parent_state.label}")
        
        # Find connected arrays to get more type information
        in_edges = parent_state.in_edges(node)
        out_edges = parent_state.out_edges(node)
        
        # Get array names and descriptors
        a_array = None
        b_array = None
        c_array = None
        
        for e in in_edges:
            if e.dst_conn == '_a':
                a_array = e.data.data
                array_desc = parent_sdfg.arrays[a_array]
                print(f"  - Input array '{a_array}': shape={array_desc.shape}, dtype={array_desc.dtype}")
            elif e.dst_conn == '_b':
                b_array = e.data.data
                array_desc = parent_sdfg.arrays[b_array]
                print(f"  - Input array '{b_array}': shape={array_desc.shape}, dtype={array_desc.dtype}")
        
        for e in out_edges:
            if e.src_conn == '_c':
                c_array = e.data.data
                array_desc = parent_sdfg.arrays[c_array]
                print(f"  - Output array '{c_array}': shape={array_desc.shape}, dtype={array_desc.dtype}")
        
        # Create a different expansion - single tasklet (vectorized)
        # This shows that different implementations can have different internal structures.
        # The expansion is returned as a nested SDFG whose arrays are named after the
        # LibraryNode connectors; the expansion framework then rewires the original
        # edges to it and removes the LibraryNode, so we do not touch parent_state here.
        nsdfg = SDFG(f"{node.name}_vectorized")
        for conn, outer in (('_a', a_array), ('_b', b_array), ('_c', c_array)):
            nsdfg.add_array(conn, [node.n], parent_sdfg.arrays[outer].dtype)
        nstate = nsdfg.add_state("vectorized_add")
        
        # Create a single tasklet that loops over the whole vector
        tasklet = nstate.add_tasklet(
            'vectorized_add',
            inputs={'a_vec', 'b_vec'},
            outputs={'c_vec'},
            code=f'''
            for (int i = 0; i < {node.n}; ++i) {{
                c_vec[i] = {node.scale_factor} * (a_vec[i] + b_vec[i]);
            }}
            ''',
            language=dace.dtypes.Language.CPP
        )
        
        # Create read/write nodes inside the nested SDFG
        a_read = nstate.add_read('_a')
        b_read = nstate.add_read('_b')
        c_write = nstate.add_write('_c')
        
        # Connect everything with array memlets instead of scalar ones
        nstate.add_edge(a_read, None, tasklet, 'a_vec', dace.Memlet(f"_a[0:{node.n}]"))
        nstate.add_edge(b_read, None, tasklet, 'b_vec', dace.Memlet(f"_b[0:{node.n}]"))
        nstate.add_edge(tasklet, 'c_vec', c_write, None, dace.Memlet(f"_c[0:{node.n}]"))
        
        print("Created vectorized expansion: Single C++ tasklet with array accesses")
        return nsdfg

print(f"Registered 'vectorized' implementation for VectorAdd")
print(f"Available implementations: {list(VectorAdd.implementations.keys())}")

# Show the implementation registry
print(f"\n=== Implementation Registry ===")
for name, impl_class in VectorAdd.implementations.items():
    print(f"  {name}: {impl_class}")
    print(f"    - Environments required: {impl_class.environments}")
    print(f"    - Associated node: {getattr(impl_class, '_dace_library_node', 'Not set')}")
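To see how compile-time facts end up baked into generated code, it can help to render the tasklet body from the 'vectorized' expansion outside of DaCe. The sketch below only reproduces the string formatting; `render_vector_add_body` is a hypothetical helper introduced for illustration and is not part of the DaCe API.

```python
# Render the C++ loop body used by the 'vectorized' expansion for given
# compile-time values. This is plain string formatting; no DaCe is needed.
# `render_vector_add_body` is a hypothetical helper name.
def render_vector_add_body(n, scale_factor):
    """Return the C++ loop with n and scale_factor substituted as literals."""
    return (
        f"for (int i = 0; i < {n}; ++i) {{\n"
        f"    c_vec[i] = {scale_factor} * (a_vec[i] + b_vec[i]);\n"
        f"}}"
    )

body = render_vector_add_body(8, 3.0)
print(body)
# for (int i = 0; i < 8; ++i) {
#     c_vec[i] = 3.0 * (a_vec[i] + b_vec[i]);
# }
```

Because `n` and `scale_factor` appear as literals, the C++ compiler sees a fixed trip count and a constant multiplier, which is the kind of specialization that the properties on a LibraryNode make possible.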

## Part 4: Testing the Implementation System

Now let's test our LibraryNode with its registered implementations to see the expansion in action. We'll create two instances - one using the default CPU implementation and another explicitly using the vectorized implementation.

In [None]:
# Create a test SDFG using our LibraryNode
print("=== Creating Test SDFG ===")

sdfg = SDFG("test_vector_add_library")

# Add arrays
N = 8  # Small size for demonstration
sdfg.add_array("A", [N], dace.float32)
sdfg.add_array("B", [N], dace.float32)
sdfg.add_array("C", [N], dace.float32)
sdfg.add_array("D", [N], dace.float32)  # For second implementation

# Create first state with default implementation
state1 = sdfg.add_state("cpu_implementation")

# Add nodes for first implementation
a_read1 = state1.add_read("A")
b_read1 = state1.add_read("B")
c_write1 = state1.add_write("C")

# Create our LibraryNode instance with specific parameters
vector_add1 = VectorAdd("test_add", n=N, scale_factor=2.0)
state1.add_node(vector_add1)
vector_add_node1 = vector_add1  # keep our own reference instead of relying on add_node()'s return value

# Connect the LibraryNode
state1.add_edge(a_read1, None, vector_add_node1, "_a", dace.Memlet(f"A[0:{N}]"))
state1.add_edge(b_read1, None, vector_add_node1, "_b", dace.Memlet(f"B[0:{N}]"))
state1.add_edge(vector_add_node1, "_c", c_write1, None, dace.Memlet(f"C[0:{N}]"))

print(f"State 1 created with LibraryNode: {vector_add_node1.name}")
print(f"Using implementation: {vector_add_node1.implementation or 'default'}")

# Create second state with vectorized implementation
state2 = sdfg.add_state("vectorized_implementation")
a_read2 = state2.add_read("A")
b_read2 = state2.add_read("B") 
d_write2 = state2.add_write("D")

# Create another instance with explicit implementation
vector_add2 = VectorAdd("test_add_vec", n=N, scale_factor=3.0)
vector_add2.implementation = 'vectorized'  # Explicitly set implementation
state2.add_node(vector_add2)
vector_add_node2 = vector_add2  # keep our own reference instead of relying on add_node()'s return value

# Connect the second LibraryNode
state2.add_edge(a_read2, None, vector_add_node2, "_a", dace.Memlet(f"A[0:{N}]"))
state2.add_edge(b_read2, None, vector_add_node2, "_b", dace.Memlet(f"B[0:{N}]"))
state2.add_edge(vector_add_node2, "_c", d_write2, None, dace.Memlet(f"D[0:{N}]"))

print(f"State 2 created with LibraryNode: {vector_add_node2.name}")
print(f"Using implementation: {vector_add_node2.implementation}")

# Add an edge between states (sequential execution)
sdfg.add_edge(state1, state2, dace.sdfg.InterstateEdge())

# Now expand the LibraryNodes
print(f"\n=== Expanding LibraryNodes ===")
sdfg.expand_library_nodes()

print(f"\n=== SDFG After Expansion ===")
print(f"Number of states: {len(sdfg.states())}")
print(f"Nodes in state 1: {len(state1.nodes())}")
print(f"Node types in state 1: {[type(node).__name__ for node in state1.nodes()]}")
print(f"Nodes in state 2: {len(state2.nodes())}")
print(f"Node types in state 2: {[type(node).__name__ for node in state2.nodes()]}")

# Run the SDFG with test data
print("\n=== Executing SDFG with Test Data ===")
A_data = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float32)
B_data = np.array([8, 7, 6, 5, 4, 3, 2, 1], dtype=np.float32)
# C and D are non-transient arrays: the caller allocates them and the SDFG writes into them
C_data = np.zeros(N, dtype=np.float32)
D_data = np.zeros(N, dtype=np.float32)

try:
    # Compile and run the SDFG
    print("Compiling SDFG...")
    compiled_sdfg = sdfg.compile()
    
    print("Running SDFG...")
    compiled_sdfg(A=A_data, B=B_data, C=C_data, D=D_data)
    
    print("\nResults:")
    print("Input A:", A_data)
    print("Input B:", B_data)
    print("Output C (CPU impl, scale=2.0):", C_data)
    print("Output D (Vectorized impl, scale=3.0):", D_data)
    
    # Verify results
    expected_C = 2.0 * (A_data + B_data)
    expected_D = 3.0 * (A_data + B_data)
    print("\nVerification:")
    print("C matches expected:", np.allclose(C_data, expected_C))
    print("D matches expected:", np.allclose(D_data, expected_D))
    
except Exception as e:
    print(f"Error running SDFG: {type(e).__name__}: {e}")
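As a sanity check on the values the cell above is expected to produce, the same arithmetic can be written in plain Python, independent of DaCe and NumPy. Every pair `A[i] + B[i]` sums to 9 for these inputs, so both outputs are constant vectors. `scaled_vector_add` is a hypothetical helper used only for this illustration.

```python
# Reference computation for the test inputs, written without DaCe or NumPy.
# `scaled_vector_add` is a hypothetical helper, not part of the DaCe API.
def scaled_vector_add(a, b, scale):
    """Return [scale * (x + y)] for each element pair of a and b."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [scale * (x + y) for x, y in zip(a, b)]

A = [1, 2, 3, 4, 5, 6, 7, 8]
B = [8, 7, 6, 5, 4, 3, 2, 1]

expected_C = scaled_vector_add(A, B, 2.0)  # scale_factor of the node in state 1
expected_D = scaled_vector_add(A, B, 3.0)  # scale_factor of the node in state 2

print(expected_C)  # [18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0]
print(expected_D)  # [27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0]
```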

## Part 5: Summary - Compile-time "Facts" in DaCe LibraryNodes

Let's summarize what compile-time information is available and how the abstraction system works. The key insight is the separation between "what" (abstract operation semantics) and "how" (concrete implementation).

In [None]:
print("=== DaCe LibraryNode System Summary ===")
print()

print("1. SEMANTIC CONTRACTS (@dace.library.node)")
print("   - Defines what an operation does, not how")
print("   - Establishes compile-time 'facts' via Properties")
print("   - Specifies input/output interface (connectors)")
print("   - Enables multiple implementations for same abstraction")
print()

print("2. COMPILE-TIME FACTS AVAILABLE:")
print("   a) Semantic Properties:")
print("      - Symbolic parameters (e.g., vector/matrix dimensions)")
print("      - Algorithm parameters (e.g., alpha/beta coefficients)")
print("      - Configuration flags (e.g., transpose flags)")
print()
print("   b) Interface Information:")  
print("      - Input/output connectors")
print("      - Data dependencies") 
print("      - Memory access patterns")
print()
print("   c) Context Information (during expansion):")
print("      - Connected array shapes and types")
print("      - Parent SDFG structure")
print("      - Target execution environment")
print()

print("3. IMPLEMENTATION REGISTRY:")
print("   - Multiple concrete implementations per abstract operation")
print("   - Implementation selection based on:")
print("     * Explicit specification (node.implementation)")
print("     * Default implementation")
print("     * Configuration overrides") 
print("     * Library defaults")
print()

print("4. EXPANSION PROCESS:")
print("   - LibraryNode → Concrete SDFG subgraph")
print("   - Access to all compile-time facts")
print("   - Can generate optimized code based on parameters")
print("   - Environment-specific implementations")
print()

print("5. BENEFITS OF THIS SYSTEM:")
print("   - Clear separation of 'what' vs 'how'")
print("   - Compile-time optimization opportunities")
print("   - Portable algorithms across targets")
print("   - Extensible library ecosystem")
print("   - Type safety and validation")

# Demonstrate the key facts available at different stages
print("\n=== Example: Facts Available for Our VectorAdd ===")
example_node = VectorAdd("demo", n=1024, scale_factor=1.5)

print("Semantic facts:")
print("  - Operation: Vector addition with scaling")
print(f"  - Parameters: n={example_node.n}, scale={example_node.scale_factor}")
print(f"  - Interface: {len(example_node.in_connectors)} inputs, {len(example_node.out_connectors)} outputs")
print(f"  - Available implementations: {list(VectorAdd.implementations.keys())}")
print(f"  - Default implementation: {VectorAdd.default_implementation}")

print("\nThese facts enable:")
print("  - Compile-time validation")
print("  - Implementation selection")
print("  - Code generation optimization")
print("  - Cross-platform portability")
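The registry and selection ideas summarized above can be mimicked with a toy model in plain Python. This is only a sketch: `ToyLibraryNode`, `select_implementation`, and `config_override` are hypothetical names, and the precedence shown (explicit per-node choice, then an override, then the class default) is an assumption made for illustration; DaCe's actual resolution order may differ.

```python
# Toy model of an implementation registry. This is NOT DaCe's code; every
# name here is hypothetical and the precedence order is an assumption.
class ToyLibraryNode:
    implementations = {}            # implementation name -> expansion callable
    default_implementation = None   # class-wide fallback

    def __init__(self, name, implementation=None):
        self.name = name
        self.implementation = implementation  # explicit per-node choice

    @classmethod
    def register_implementation(cls, name, expansion):
        cls.implementations[name] = expansion

    def select_implementation(self, config_override=None):
        # Assumed precedence: node setting, then override, then class default
        for candidate in (self.implementation, config_override,
                          self.default_implementation):
            if candidate is not None:
                if candidate not in self.implementations:
                    raise KeyError(f"unknown implementation: {candidate}")
                return candidate
        raise ValueError(f"no implementation available for {self.name}")

    def expand(self, config_override=None):
        chosen = self.select_implementation(config_override)
        return self.implementations[chosen](self)


ToyLibraryNode.register_implementation(
    'cpu', lambda node: f"map + tasklet for {node.name}")
ToyLibraryNode.register_implementation(
    'vectorized', lambda node: f"single tasklet for {node.name}")
ToyLibraryNode.default_implementation = 'cpu'

default_node = ToyLibraryNode('add1')
explicit_node = ToyLibraryNode('add2', implementation='vectorized')

print(default_node.expand())    # map + tasklet for add1
print(explicit_node.expand())   # single tasklet for add2
```

The node itself stores only the "what" (its name and parameters); the "how" lives in the registry and is looked up at expansion time.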

## Next Steps: Advanced Topics in DaCe Library Nodes

Now that we understand the basic LibraryNode system, we can explore more advanced topics:

1. **Advanced Library Nodes**: Create complex operations like matrix inversion, FFT, or convolutions
2. **Environment-specific Implementations**: Optimize for different hardware (CPU, GPU, FPGA)
3. **Library Integration**: Connect to external libraries like MKL, cuBLAS, or TensorFlow
4. **Optimization Transformations**: Apply high-level transformations using the semantic information
5. **Front-end Integration**: Connect Python functions to automatically generate library nodes

The key insight is that LibraryNodes establish a **semantic contract** at the IR level, capturing "what" an operation does through compile-time facts, while the implementation system handles "how" it gets executed on specific targets. This separation enables both performance portability and optimization opportunities.

## Using this wheel in another environment

The checks and overrides below are useful when installing the wheel into a fresh environment that may already ship its own wheels or libraries:



1) Install the wheel and BLAS/LAPACK runtime

- If using conda:

  - `conda install libblas libcblas liblapack openblas` (or `mkl` on x86 if you prefer MKL)

- Then install the built wheel:

  - `pip install /path/to/dist/dace-1.0.0-py3-none-any.whl`
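After installing, one way to confirm which distribution the interpreter actually sees is to query the package metadata. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and it reports `None` rather than failing when a package is absent.

```python
from importlib import metadata

# `installed_version` is a hypothetical helper for this walkthrough.
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Report what the current interpreter can see
for dist in ("dace", "numpy"):
    print(f"{dist}:", installed_version(dist))
```

If the reported version does not match the wheel you just installed, another copy earlier on `sys.path` is probably shadowing it.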



2) Verify BLAS/LAPACK are discoverable (like cell 14)

- This lists key libraries and shows what `ctypes.util.find_library` resolves:


In [18]:
# Verify BLAS/LAPACK availability and ctypes resolution

import ctypes.util, os, sys, glob

print('Python:', sys.version.split()[0])
cp = os.environ.get('CONDA_PREFIX', '(no conda)')
print('CONDA prefix:', cp)
lib = os.path.join(cp, 'lib') if cp != '(no conda)' else None

if lib and os.path.isdir(lib):
    print('Listing known BLAS/LAPACK libs under', lib)
    for pat in ['libblas*','libcblas*','liblapack*','liblapacke*','libopenblas*','libmkl*']:
        matches = sorted(glob.glob(os.path.join(lib, pat)))
        print(f'  {pat}:', len(matches))
        for m in matches[:5]:
            print('   -', os.path.basename(m))
        if len(matches) > 5:
            print('   ...')

print('\nctypes.find_library...')
for name in ['blas','cblas','lapacke','lapack','openblas','mkl_rt']:
    print(f'  {name}:', ctypes.util.find_library(name))


Python: 3.12.11
CONDA prefix: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev
Listing known BLAS/LAPACK libs under /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/lib
  libblas*: 2
   - libblas.3.dylib
   - libblas.dylib
  libcblas*: 2
   - libcblas.3.dylib
   - libcblas.dylib
  liblapack*: 2
   - liblapack.3.dylib
   - liblapack.dylib
  liblapacke*: 0
  libopenblas*: 7
   - libopenblas.0.dylib
   - libopenblas.a
   - libopenblas.dylib
   - libopenblas_armv8p-r0.3.30.dylib
   - libopenblas_vortexp-r0.3.30.a
   ...
  libmkl*: 0

ctypes.find_library...
  blas: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/../lib/libblas.dylib
  cblas: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/../lib/libcblas.dylib
  lapacke: None
  lapack: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/../lib/liblapack.dylib
  openblas: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/../lib/libopenblas.dylib
  mkl_rt: None


3) Pin compilers and CMake args (like the config in cell 17)

- To make CMake use the compilers from your environment, set `CC`/`CXX` or pass extra CMake arguments.

- The snippet below mirrors the configuration captured earlier (OpenBLAS vendor plus conda include/lib paths).


### Option A: set environment variables (before starting Python)

`export CC="$CONDA_PREFIX/bin/clang"`

`export CXX="$CONDA_PREFIX/bin/clang++"`



### Option B: set DaCe extra CMake args programmatically

from dace.config import Config

extra = "-DBLA_VENDOR=OpenBLAS -DCMAKE_FIND_FRAMEWORK=LAST " \
        "-DCMAKE_INCLUDE_PATH=${CONDA_PREFIX}/include " \
        "-DCMAKE_LIBRARY_PATH=${CONDA_PREFIX}/lib"
Config.set('compiler', 'extra_cmake_args', value=extra)



In [22]:

print('extra_cmake_args:', Config.get('compiler','extra_cmake_args'))

extra_cmake_args: -DBLA_VENDOR=OpenBLAS -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_INCLUDE_PATH=${CONDA_PREFIX}/include -DCMAKE_LIBRARY_PATH=${CONDA_PREFIX}/lib
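The stored value above still contains the literal `${CONDA_PREFIX}`, so it depends on that placeholder being resolved later. If you would rather resolve the prefix up front, one option is to build the string in Python. This is a sketch under that assumption; `build_extra_cmake_args` is a hypothetical helper, and the flags are simply the ones already shown in this section.

```python
import os

# `build_extra_cmake_args` is a hypothetical helper; the flags are the ones
# used in Option B, with the prefix substituted in Python.
def build_extra_cmake_args(prefix, vendor="OpenBLAS"):
    """Assemble the CMake flags from this section for an already-resolved prefix."""
    return " ".join([
        f"-DBLA_VENDOR={vendor}",
        "-DCMAKE_FIND_FRAMEWORK=LAST",
        f"-DCMAKE_INCLUDE_PATH={prefix}/include",
        f"-DCMAKE_LIBRARY_PATH={prefix}/lib",
    ])

prefix = os.environ.get("CONDA_PREFIX")
if prefix:
    extra = build_extra_cmake_args(prefix)
    print(extra)
    # Then, as in Option B:
    # Config.set('compiler', 'extra_cmake_args', value=extra)
else:
    print("CONDA_PREFIX is not set; leaving extra_cmake_args unchanged")
```

Note that this simple join does not quote paths, so a prefix containing spaces would need extra handling.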


4) Confirm which tools are on PATH

- Resolve the LLVM/Clang tool locations from Python, which avoids invoking a login shell and its noisy initialization output.


In [20]:
# Resolve tool locations from Python

import shutil, os

for tool in ['llvm-config','clang','clang++','mlir-opt','mlir-translate']:
    print(f'{tool}:', shutil.which(tool))

print('CONDA_PREFIX:', os.environ.get('CONDA_PREFIX'))


llvm-config: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/llvm-config
clang: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/clang
clang++: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/clang++
mlir-opt: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/mlir-opt
mlir-translate: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev/bin/mlir-translate
CONDA_PREFIX: /opt/homebrew/Caskroom/miniforge/base/envs/dace-dev
