# Tutorial: Working with `data_types.py`

This Jupyter notebook is an **annotated walkthrough** of the _data‑types_ layer in your
workflow / resource‑estimation code‑base.  
The goal is to help new contributors **understand**, **experiment** with, and **extend**
the fundamental data‑type building blocks that underpin:

* **Resource estimation** — e.g. _how many (qu)bits does a register need?_
* **Shape & type checking** — reasoning about symbolic tensor shapes and composite
  data structures.
* **Trace construction** — assembling DAG nodes that manipulate these data objects.

> **Prerequisites**  
> The notebook assumes the repository is on your Python path (or that the notebook
> lives in the repo root).  Adjust `sys.path` below if needed.

In [1]:
sep_str = '-'*60

In [2]:
# -- Python path -----------------------------------------------------------------
import sys, os, pathlib
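repo_root = pathlib.Path.cwd()  # adapt if the notebook sits elsewhere
# if str(repo_root) not in sys.path:
#     sys.path.insert(0, str(repo_root))
current_dir = os.getcwd()
# Drop the last 9 characters of the working directory (assumed to be the name of
# the notebook's sub-folder) so that the repository root becomes importable.
sys.path.append(current_dir[:-9])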

# -- Core data‑type imports ------------------------------------------------------
from qrew.simulation.data_types import (
    BitNumbering, BitStringView,
    Dyn,                   # symbolic "dynamic" dimension marker
    CBit, CUInt, CInt, CFloat,
    CFxp, String, Struct,
    QBit, QAny, QInt, QUInt, BQUInt, QFxp,
    TensorType, MatrixType,
    is_classical, is_quantum,
)


| Section | Outcome |
|---------|---------|
| Classical scalars | Encode bits, signed/unsigned integers, fixed-point, IEEE-754 floats, strings |
| Quantum scalars   | Model qubits, *n*-qubit integers, and fixed-point registers à la Qualtran |
| Dyn & symbolics   | Represent unknown or symbolic widths $n, m, \ldots$ and the sentinel **Dyn** |
| Composites        | Build `TensorType` ($\mathbb R^{s_1\times\cdots\times s_r}$) and `Struct` (records) |
| Consistency engine| Use `check_dtypes_consistent` to check whether two dtypes are compatible under the ⟨global, C, Q⟩ ladders |
| Resource formulas | Compute qubit/bit counts and memory footprints |


# 1. `DataType` — the universal interface for register element types
_In-house data types <code class="filepath">./data_types.py</code>._

Every concrete dtype (classical *or* quantum) derives from `DataType`. Think of it as the “adapter” that lets high-level operations treat qubits or bits like well-typed scalars.


- **Abstract properties/methods**
  
  - `data_width` → how many fundamental units (bits or qubits) per element
  - `to_bits(x)` → `[int, …]` for a single value
  - `from_bits(bits)` → reconstruct a single value
  - `get_classical_domain()` → iterable of representable classical values (if enumerable)
  - `assert_valid_classical_val(val)` → raise if `val` is out of domain
- **Provided helpers**
  
  - `to_bits_array(x_array)` and `from_bits_array(bits_array)` via NumPy vectorization
  - `assert_valid_classical_val_array(...)`
  - `is_symbolic()` / `iteration_length_or_zero()`
  - `__str__` and `__format__` delegate to the class name + width
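To make the contract concrete, here is a minimal, self-contained sketch of the interface using only the standard library. `SketchDataType` and `SketchUInt` are hypothetical names, not the classes in `data_types.py`; the sketch assumes bits are emitted most-significant first and replaces the NumPy-vectorized helpers with plain list comprehensions.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Sequence


class SketchDataType(ABC):
    """Hypothetical stand-in for the abstract `DataType` interface."""

    @property
    @abstractmethod
    def data_width(self) -> int:
        """Number of fundamental units (bits or qubits) per element."""

    @abstractmethod
    def to_bits(self, x) -> List[int]:
        """Encode a single value as a list of bits."""

    @abstractmethod
    def from_bits(self, bits: Sequence[int]):
        """Decode a single value from a list of bits."""

    @abstractmethod
    def get_classical_domain(self) -> Iterable:
        """Enumerate every representable classical value."""

    @abstractmethod
    def assert_valid_classical_val(self, val) -> None:
        """Raise if `val` lies outside the classical domain."""

    # Provided helpers (plain lists here instead of NumPy vectorization).
    def to_bits_array(self, xs) -> List[List[int]]:
        return [self.to_bits(x) for x in xs]

    def from_bits_array(self, bits_array) -> list:
        return [self.from_bits(b) for b in bits_array]

    def __str__(self) -> str:
        return f"{type(self).__name__}({self.data_width})"


class SketchUInt(SketchDataType):
    """Unsigned n-bit integer, most significant bit first."""

    def __init__(self, n: int):
        self.n = n

    @property
    def data_width(self) -> int:
        return self.n

    def to_bits(self, x: int) -> List[int]:
        self.assert_valid_classical_val(x)
        return [(x >> (self.n - 1 - i)) & 1 for i in range(self.n)]

    def from_bits(self, bits: Sequence[int]) -> int:
        value = 0
        for b in bits:
            value = (value << 1) | b
        return value

    def get_classical_domain(self) -> Iterable[int]:
        return range(2 ** self.n)

    def assert_valid_classical_val(self, val) -> None:
        if not (0 <= val < 2 ** self.n):
            raise ValueError(f"{val} is outside [0, 2**{self.n})")
```

A concrete dtype only has to implement the five abstract members; the array helpers and `__str__` are inherited, mirroring the split between abstract and provided members listed above.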




## 1.1. Quantum Data Types (`QType`)

_All subclasses inherit from `QType` → `DataType` and expose a common bit-level API (`to_bits`, `from_bits`, `assert_valid_classical_val`, …)._

| QType&nbsp;(constructor)      | `data_width` / `num_qubits` | Classical domain<sup>†</sup>                               | Purpose & notes                                                                                                   |
|-------------------------------|-----------------------------|------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| `QBit()`                      | `1`                         | `{0, 1}`                                                   | Single qubit viewed in the computational basis. Simplest building block for larger registers.                      |
| `QInt(n)`                     | `n`                         | $[-2^{\,n-1},\,2^{\,n-1})$                                 | *Signed* two’s-complement integer. The most significant bit (bit 0 in MSB-first order) is the sign bit. Arithmetic wraps mod $2^n$. |
| `QUInt(n)`                    | `n`                         | $[0,\,2^{\,n})$                                            | *Unsigned* integer. Developer manages wrap-around semantics on overflow (C-style).                                 |
| `BQUInt(bitsize, L)`          | `bitsize`                   | $[0,\,L)$ with $L\le 2^{\texttt{bitsize}}$                 | Bounded unsigned integer; ideal for coherent for-loop indices. `iteration_length = L` may be symbolic.             |
| `QAny(n)`                     | `n`                         | *Ambiguous* — delegates to `QUInt(n)` when coerced        | Opaque register of *n* qubits used when the specific dtype is unknown/irrelevant. Avoid when a precise domain matters. |
| `QFxp(w, f, signed=False)`    | `w`                         | Unsigned → $[0,\,2^{w-f})$  <br>Signed → $[-2^{w-f-1},\,2^{w-f-1})$  <br>**Step** $2^{-f}$ | Fixed-point real number with `f` fractional bits, backed by `QUInt` (unsigned) or `QInt` (signed). Represented value $=\text{raw integer}\cdot 2^{-f}$. |

<sup>†</sup> “Classical domain” lists the set of classically representable values (i.e. basis states, ignoring superposition).  


In [3]:
# TODO
# Make examples of 
# 1. `QBit()`  one logical qubit  
# 2. `QInt(w)`  ($w≥1$) signed integer in two’s-complement  
# 3. `QUInt(w)` unsigned integer in $[0,2^w\!-\!1]$

qbit = QBit()
print("data_width (qubits):", qbit.data_width)
print("is_quantum:", is_quantum(qbit), "| is_classical:", is_classical(qbit))

data_width (qubits): 1
is_quantum: True | is_classical: False
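The TODO above asks for `QInt(w)` and `QUInt(w)` examples. Independently of the `qrew` API, the classical domains in the table can be checked with a few plain-Python helpers. The function names are hypothetical, and the sketch assumes the sign bit is emitted first (most-significant-first order).

```python
from typing import List


def uint_domain(n: int) -> range:
    """Classical domain of an unsigned n-bit register: [0, 2**n)."""
    return range(0, 2 ** n)


def int_domain(n: int) -> range:
    """Classical domain of a signed two's-complement n-bit register."""
    return range(-(2 ** (n - 1)), 2 ** (n - 1))


def int_to_bits(x: int, n: int) -> List[int]:
    """Two's-complement encoding, most significant (sign) bit first."""
    if x not in int_domain(n):
        raise ValueError(f"{x} does not fit in a signed {n}-bit register")
    raw = x % (2 ** n)  # wrap negatives into [0, 2**n)
    return [(raw >> (n - 1 - i)) & 1 for i in range(n)]


def bits_to_int(bits: List[int]) -> int:
    """Inverse of `int_to_bits`: subtract 2**n when the sign bit is set."""
    n = len(bits)
    raw = int("".join(map(str, bits)), 2)
    return raw - 2 ** n if bits[0] == 1 else raw
```

For a 3-bit register the signed domain is $\{-4,\ldots,3\}$ and $-1$ encodes as `[1, 1, 1]`, while the unsigned domain is $\{0,\ldots,7\}$.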


## 1.2 · Classical Data Types (`CType`)

_Concrete classical dtypes all derive from `CType` → `NumericType` → `DataType` and therefore share the universal bit-level API (`to_bits`, `from_bits`, `assert_valid_classical_val`, …)._

- **Primitive bits and integers**: `CBit`, `CInt`, `CUInt`

- **Fixed-point, IEEE float, strings**: `CFxp`, `CFloat`, `String`

### a. Scalar / “Atomic” Classical Dtypes

| CType&nbsp;(constructor)                   | `data_width` / `bit_width` | Classical domain<sup>†</sup>                             | Purpose & notes                                                                                     |
|-------------------------------------------|---------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------|
| `CBit()`                                  | `1`                       | `{0, 1}`                                                 | Single classical bit.                                                                               |
| `CInt(n)`                                 | `n`                       | $[-2^{\,n-1},\,2^{\,n-1})$                               | Signed two’s-complement integer with $n$ bits. Sign bit at the MSB; wraps mod $2^n$.                |
| `CUInt(n)`                                | `n`                       | $[0,\,2^{\,n})$                                          | Unsigned integer with $n$ bits. The developer handles overflow (wrap or error).                     |
| `CFxp(total, frac, signed=False)`         | `total`                   | Unsigned → $[0,\,2^{\,\texttt{total-frac}})$<br>Signed → $[-2^{\,\texttt{total-frac-1}},\,2^{\,\texttt{total-frac-1}})$<br>**Step** $2^{-\texttt{frac}}$ | Fixed-point real number (`frac` fractional bits). Re-uses `CInt`/`CUInt` for raw bit access.        |
| `CFloat(n)`                               | `n`, with $n\in\{8,16,32,64\}$ | IEEE-754 range for the chosen width                 | Classical floating-point; (de)serialises via `struct.pack` / `struct.unpack`.                       |
| `String(max_len)`                         | `8 × max_len`             | All ASCII strings of length ≤ `max_len`                  | Null-padded byte string; each character occupies one byte.                                          |

<sup>†</sup> *“Classical domain” = set of basis-state values that can be represented exactly; superpositions are a quantum artefact and therefore not listed here.*
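The fixed-point rows in both tables share one rule: the represented value is the stored raw integer times $2^{-\text{frac}}$. A hypothetical sketch of that quantization (not the `CFxp`/`QFxp` implementation) shows how the raw-integer range maps onto the value ranges listed above:

```python
def fxp_to_raw(x: float, total: int, frac: int, signed: bool = False) -> int:
    """Quantize x to the nearest multiple of 2**-frac and return the raw integer."""
    raw = round(x * 2 ** frac)  # Python's round() rounds ties to even
    # Raw-integer range of the underlying CInt/CUInt of width `total`.
    lo, hi = (-(2 ** (total - 1)), 2 ** (total - 1)) if signed else (0, 2 ** total)
    if not (lo <= raw < hi):
        raise ValueError(f"{x} is outside the representable range")
    return raw


def raw_to_fxp(raw: int, frac: int) -> float:
    """Represented value = raw * 2**-frac."""
    return raw * 2 ** -frac
```

With `total=16, frac=8` (the `CFxp(16, 8)` used below), `1.5` is stored as the raw integer `384`, and the largest unsigned value is $65535 \cdot 2^{-8} = 255.99609375$, just below $2^{16-8}$. The real dtypes may use a different rounding mode.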


In [4]:
cbit    = CBit()
cint8   = CInt(8)
cint16  = CInt(16)
cuint32 = CUInt(32)
cuint64 = CUInt(64)
print(f"{sep_str}\nPrimitive bit and integers:")
for dt in (cbit, cint8, cint16, cuint32, cuint64):
    print(f"    {dt:<10}  width = {dt.data_width} bits, nbytes={dt.nbytes}")

print(f"{sep_str}\nFixed-point, IEEE float, string ...")

fxp16  = CFxp(16, 8)     # 16-bit fixed point, 8 fractional bits
flt32  = CFloat(32)
string = String(4)       # max_len=4 chars -> 8 × 4 = 32 bits (printed with its width)

for c in (fxp16, flt32, string):
    print(f"    {c:15}  width={c.data_width} bits  nbytes={c.nbytes}")


------------------------------------------------------------
Primitive bit and integers:
    CBit()      width = 1 bits, nbytes=1
    CInt(8)     width = 8 bits, nbytes=1
    CInt(16)    width = 16 bits, nbytes=2
    CUInt(32)   width = 32 bits, nbytes=4
    CUInt(64)   width = 64 bits, nbytes=8
------------------------------------------------------------
Fixed-point, IEEE float, string ...
    CFxp(16, 8)      width=16 bits  nbytes=2
    CFloat(32)       width=32 bits  nbytes=4
    String(32)       width=32 bits  nbytes=4
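The `CFloat` and `String` rows mention `struct.pack` / `struct.unpack` and null padding. A standalone sketch of both encodings follows; the helper names are hypothetical, and it covers only the 16/32/64-bit widths that `struct` has native format codes for (`e`, `f`, `d`). `String(4)` needs 8 × 4 = 32 bits, matching the width printed above.

```python
import struct

# struct format codes for IEEE-754 half, single, and double precision (big-endian).
_FLOAT_FORMATS = {16: ">e", 32: ">f", 64: ">d"}


def float_to_int_bits(x: float, width: int) -> int:
    """Reinterpret the IEEE-754 encoding of x as an unsigned integer."""
    packed = struct.pack(_FLOAT_FORMATS[width], x)
    return int.from_bytes(packed, "big")


def int_bits_to_float(raw: int, width: int) -> float:
    """Inverse of float_to_int_bits."""
    packed = raw.to_bytes(width // 8, "big")
    return struct.unpack(_FLOAT_FORMATS[width], packed)[0]


def string_to_bytes(s: str, max_len: int) -> bytes:
    """Null-pad an ASCII string to exactly max_len bytes (8 * max_len bits)."""
    encoded = s.encode("ascii")
    if len(encoded) > max_len:
        raise ValueError(f"{s!r} is longer than {max_len} characters")
    return encoded.ljust(max_len, b"\x00")
```

For example, `1.0` as a 32-bit float is the bit pattern `0x3F800000`, and `"ab"` in a 4-character string becomes `b"ab\x00\x00"`.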


## 1.3 · Composite / Container Classical Dtypes
### `Struct(fields: dict[str, DataType])`
A packed (padding-free) **record / struct**, akin to a C `struct` or Rust `struct`.

#### Main properties
* `fields` – ordered mapping `name → dtype` (order preserved)  
* `data_width` – sum of field widths  
* `nbytes`, `total_bits` – byte / bit totals (recursive if a field is itself composite)
* `field_order` – list of keys in declaration order

#### Core methods
* `to_bits(value_dict)` – concatenates each field’s bitstring
* `from_bits(bits)` – slices the concatenation back into a `dict`
* `assert_valid_classical_val(value_dict)` – field-wise validation
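The `to_bits` / `from_bits` pair described above amounts to concatenation and slicing in declaration order. A minimal sketch, assuming every field is an unsigned integer of a given width (the real `Struct` delegates to each field's own dtype; the helper names are hypothetical):

```python
from typing import Dict, List


def struct_to_bits(widths: Dict[str, int], value: Dict[str, int]) -> List[int]:
    """Concatenate each field's unsigned, MSB-first bitstring in declaration order."""
    bits: List[int] = []
    for name, width in widths.items():
        v = value[name]
        if not (0 <= v < 2 ** width):  # field-wise validation
            raise ValueError(f"field {name!r}: {v} does not fit in {width} bits")
        bits.extend((v >> (width - 1 - i)) & 1 for i in range(width))
    return bits


def struct_from_bits(widths: Dict[str, int], bits: List[int]) -> Dict[str, int]:
    """Slice the concatenation back into a dict, field by field."""
    out: Dict[str, int] = {}
    offset = 0
    for name, width in widths.items():
        chunk = bits[offset:offset + width]
        out[name] = int("".join(map(str, chunk)), 2)
        offset += width
    return out
```

Because dicts preserve insertion order, `widths` doubles as the `field_order`, and the total width is simply `sum(widths.values())`.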



In [5]:

# A composite structure {id: uint16, mass: float32, label: 8-char string}
particle = Struct(
    fields={
        "id":    CUInt(16),
        "mass":  CFloat(32),
        "label": String(8),
    }
)

print(f"\n{particle}  ––  is_classical: {is_classical(particle)}")
print(f"    - data_width (bits): {particle.data_width}")
print(f"    - nbytes:            {particle.nbytes}")
print(f"    - total_bits:        {particle.total_bits}")


Struct(112)  ––  is_classical: True
    - data_width (bits): 112
    - nbytes:            14
    - total_bits:        112


---
#### <b> N-Dim Data Types </b>
### `TensorType(shape: Tuple[int, …], element_type: DataType | type)`
Represents an **N-dimensional** array whose **elements** are themselves scalar dtypes
(e.g. `CFloat(32)`, `CUInt(8)`). Conceptually similar to a NumPy ndarray that carries a bit-level contract.

##### Properties
* `shape`, `element_type`, optional **`val: np.ndarray`**
* `data_width` – bits **per element**  
* `rank` – `len(shape)`
* `nelement()` – total number of elements  
* `element_size()` / **`bytes_per_element`** – bytes per element (`data_width // 8`)
* **`nbytes`** – total bytes (`nelement × bytes_per_element`)
* **`total_bits`** – `nelement × data_width` (alias `memory_in_bytes × 8`)
##### Methods
* `to_bits(x)` / `from_bits(bits)` – naïve flatten ↔ reconstruct
* `multiply(other)` – broadcast-style shape multiplication
* `assert_valid_classical_val(val)` – validates `val.shape` & `val.dtype`
* Standard `DataType` helpers (`to_bits_array`, `__str__`, …) come for free.


In [6]:
# A 3×4×4 tensor of 32-bit floats
tensor = TensorType(shape=(3, 4, 4), element_type=CFloat(32))

print(f"{tensor}  ––    rank: {tensor.rank} | is_classical: {is_classical(tensor)} ")
print(f"    - nelements: {tensor.nelement()}")
print(f"    - bytes per element: {tensor.bytes_per_element}")
print(f"    - total bytes = {tensor.nbytes}")
print(f"    - total bits = {tensor.total_bits}")


TensorType((3, 4, 4))  ––    rank: 3 | is_classical: True 
    - nelements: 48
    - bytes per element: 4
    - total bytes = 192
    - total bits = 1536
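These numbers follow from the shape and the element width alone. The hypothetical helper below reproduces them with the formulas from the property bullets (`nelement = prod(shape)`, `bytes_per_element = data_width // 8`):

```python
from math import prod


def tensor_sizes(shape, element_bits):
    """Return (nelement, bytes_per_element, nbytes, total_bits) for a dense tensor."""
    nelement = prod(shape)
    bytes_per_element = element_bits // 8
    return nelement, bytes_per_element, nelement * bytes_per_element, nelement * element_bits


print(tensor_sizes((3, 4, 4), 32))  # (48, 4, 192, 1536)
```

The same call with shape `(3, 4)` gives 12 elements, 48 bytes, and 384 bits, which agrees with the `MatrixType` example below.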


---
### `MatrixType(rows, cols, element_type=float)`
A **rank-2 convenience subclass** of `TensorType` (will eventually be folded into it).

#### Extras on top of `TensorType`
* Properties: `rows`, `cols`
* `multiply(other)` – checks classic matrix-multiply compatibility (`self.cols == other.rows`)
* Inherits all size helpers (`rank == 2`, `nbytes`, …)


In [7]:

# A 3×4 matrix of 32-bit floats
mat = MatrixType(3, 4, element_type=CFloat(32))

print(f"\n{mat}  ––    rank: {mat.rank} | is_classical: {is_classical(mat)} ")
print(f"    - nelements: {mat.nelement()}")
print(f"    - bytes per element: {mat.bytes_per_element}")
print(f"    - total bytes = {mat.nbytes}")
print(f"    - total bits = {mat.total_bits}")



Matrix(3, 4)  ––    rank: 2 | is_classical: True 
    - nelements: 12
    - bytes per element: 4
    - total bytes = 48
    - total bits = 384


In [8]:
A = MatrixType(2, 3, element_type=CFloat(32))
B = MatrixType(3, 1, element_type=CFloat(32))

C = A.multiply(B)
print("A:", A, "| B:", B, "| A×B:", C)

A: Matrix(2, 3) | B: Matrix(3, 1) | A×B: Matrix(2, 1)
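The compatibility rule checked by `multiply` is the usual one: an $(r_A \times c_A)$ matrix times an $(r_B \times c_B)$ matrix requires $c_A = r_B$ and yields an $(r_A \times c_B)$ result. A shape-only sketch (hypothetical helper, no `qrew` imports):

```python
from typing import Tuple


def matmul_shape(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Shape of A @ B, enforcing the classic compatibility rule a.cols == b.rows."""
    (a_rows, a_cols), (b_rows, b_cols) = a, b
    if a_cols != b_rows:
        raise ValueError(f"cannot multiply {a} by {b}: {a_cols} != {b_rows}")
    return (a_rows, b_cols)


print(matmul_shape((2, 3), (3, 1)))  # (2, 1)
```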


Recall the `Struct` example above: its `data_width` is the sum of its fields, 16 (id) + 32 (mass) + 8×8 (label) = 112 bits = 14 bytes.

## 1.4 · Symbolic & Dynamic Shapes

A `TensorType` wraps an **element type** (another `DataType`) and a **shape**.

* Use concrete integers for fixed shapes, or  
* `Dyn` / SymPy symbols for symbolic dimensions.

In [9]:
# A concrete 3×4 tensor of 8-bit signed integers
tensor_int8 = TensorType(shape=(3, 4), element_type=CInt(8))

print(f"{tensor_int8}  ––    rank: {tensor_int8.rank} | is_symbolic: {tensor_int8.is_symbolic()} ")
print(f"    - nelements: {tensor_int8.nelement()}")
print(f"    - bytes per element: {tensor_int8.bytes_per_element}")
print(f"    - total bytes = {tensor_int8.nbytes}")
print(f"    - total bits = {tensor_int8.total_bits}")

# A *symbolic* M×N tensor of ints
from sympy import symbols
M, N = symbols('m n', positive=True, integer=True)
sym_tensor = TensorType(shape=(M, N), element_type=CInt(8))

print(f"{sym_tensor}  ––    rank: {sym_tensor.rank} | is_symbolic: {sym_tensor.is_symbolic()}")
print(f"    - nelements: {sym_tensor.nelement()}")
print(f"    - bytes per element: {sym_tensor.bytes_per_element}")
print(f"    - total bytes = {sym_tensor.nbytes}")
print(f"    - total bits = {sym_tensor.total_bits}")

TensorType((3, 4))  ––    rank: 2 | is_symbolic: False 
    - nelements: 12
    - bytes per element: 1
    - total bytes = 12
    - total bits = 96
TensorType((m, n))  ––    rank: 2 | is_symbolic: True
    - nelements: m*n
    - bytes per element: 1
    - total bytes = m*n
    - total bits = 8*m*n


In [10]:
sym_mat = MatrixType(M, N, element_type=CInt(8))

print(f"{sym_mat}  ––    rank: {sym_mat.rank} | is_symbolic: {sym_mat.is_symbolic()}")
print(f"    - nelements: {sym_mat.nelement()}")
print(f"    - bytes per element: {sym_mat.bytes_per_element}")
print(f"    - total bytes = {sym_mat.nbytes}")
print(f"    - total bits = {sym_mat.total_bits}")

Matrix(m, n)  ––    rank: 2 | is_symbolic: True
    - nelements: m*n
    - bytes per element: 1
    - total bytes = m*n
    - total bits = 8*m*n


--- 
## Bit-level Utilities

These tiny helpers sit **orthogonally** to the `DataType` hierarchy: they let
you view any integer (or list of bits) as a *bit-string* with an explicit width
and ordering, and they give every classical `DataType` a convenient
`to_bitstring` adapter.

### `BitNumbering` &nbsp;–&nbsp; bit-ordering enum
| Numbering flag | int (example) | `binary()` string (`nbits = 4`) | `bits()` → `List[int]` |
| -------------- | ------------- | --------------------------------- | ----------------------- |
| `MSB` (big-endian) | **13** | `1101` | `[1, 1, 0, 1]` |
| `LSB` (little-endian) | **13** | `1011` | `[1, 0, 1, 1]` |

*With `MSB` the left-most character is the **most** significant bit;  
with `LSB` the left-most character is the **least** significant bit, so the string is the reverse of the `MSB` one.*
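Both orderings can be produced with shifts and masks. A plain-Python sketch (hypothetical helper names) that reproduces the table row for 13 with `nbits = 4`:

```python
from typing import List


def bits_msb(value: int, nbits: int) -> List[int]:
    """Most significant bit first (big-endian display order)."""
    return [(value >> (nbits - 1 - i)) & 1 for i in range(nbits)]


def bits_lsb(value: int, nbits: int) -> List[int]:
    """Least significant bit first: the MSB list reversed."""
    return [(value >> i) & 1 for i in range(nbits)]


print(bits_msb(13, 4), bits_lsb(13, 4))  # [1, 1, 0, 1] [1, 0, 1, 1]
```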


### class `BitStringView`

A tiny, immutable wrapper around:
- `integer: int` — underlying Python int
- `nbits: int` — declared bit-width (grows automatically to fit a larger integer or the width of `dtype`)
- `numbering: BitNumbering` — ordering of bits (`MSB` or `LSB`)
- `dtype: Optional[DataType]` — if set, validates that the integer fits in that dtype

#### **Class constructors**  — alternate entry points
  - `from_int(int, nbits=?, numbering=?, dtype=?)`: quick literal → view (optionally widened). Accepts a Python `int` or another `BitStringView`.
  - `from_binary(str, nbits=?, numbering=?)`: for user input from CLIs or config files; parses a binary string (`0b` prefix optional).
  - `from_array(Sequence[int], nbits=?, numbering=?, dtype=?)`: interop with bit lists from other libraries.
  - `from_bitstring(other, nbits=?, numbering=?, dtype=?)`: clone an existing `BitStringView` while tweaking its metadata.
  - `msb(…) / lsb(…)`: convenience aliases for `from_int` with a fixed ordering.

#### **Inspectors**
  - `binary()` → Zero-padded, raw bit‐string (ordering respected)
  - `bits()` / `array()` → `List[int]` in display order

#### **Converters**
  - `with_numbering(numbering: BitNumbering)`: a *new* `BitStringView` with the same integer value, `nbits`, and `dtype`, but a different bit ordering
  - `widen_to_dtype(dtype: DataType)`:  Mutates `nbits` to at least `dtype.data_width`, sets new dtype, validates value

**Magic methods**: `__int__`, `__len__`, `__add__`, `__eq__`, `__hash__`, `__repr__` make it behave like an `int` that still “remembers” its width and ordering.
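A frozen dataclass is one natural way to get the int-like, hashable, width-aware behaviour described above. The sketch below is hypothetical (`Numbering` and `BitView` stand in for `BitNumbering` and `BitStringView`), assumes non-negative integers, and takes the immutable route: converters return new objects via `dataclasses.replace`.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List, Optional


class Numbering(Enum):  # hypothetical stand-in for BitNumbering
    MSB = 0
    LSB = 1


@dataclass(frozen=True)
class BitView:  # hypothetical stand-in for BitStringView
    integer: int
    nbits: int
    numbering: Numbering = Numbering.MSB

    @classmethod
    def from_int(cls, integer: int, nbits: Optional[int] = None,
                 numbering: Numbering = Numbering.MSB) -> "BitView":
        # Grow nbits so the integer always fits (at least 1 bit).
        needed = max(integer.bit_length(), 1)
        return cls(integer, max(nbits or 0, needed), numbering)

    @classmethod
    def from_binary(cls, binary: str, numbering: Numbering = Numbering.MSB) -> "BitView":
        digits = binary[2:] if binary.startswith("0b") else binary
        if numbering is Numbering.LSB:
            digits = digits[::-1]  # display order -> MSB-first before parsing
        return cls(int(digits, 2), len(digits), numbering)

    def bits(self) -> List[int]:
        msb_first = [(self.integer >> (self.nbits - 1 - i)) & 1 for i in range(self.nbits)]
        return msb_first if self.numbering is Numbering.MSB else msb_first[::-1]

    def binary(self) -> str:
        return "".join(map(str, self.bits()))

    def with_numbering(self, numbering: Numbering) -> "BitView":
        return replace(self, numbering=numbering)  # new object; self is unchanged

    def __int__(self) -> int:
        return self.integer

    def __len__(self) -> int:
        return self.nbits
```

`frozen=True` together with the default `eq=True` makes the dataclass generate `__hash__`, so views can be used as dict keys or set members.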


In [None]:
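i = 1
bita     = BitStringView.from_int(integer=i)                              # default numbering
bita_lsb = BitStringView.from_int(integer=i, numbering=BitNumbering.LSB)
# `from_int` also accepts another BitStringView
bitb = BitStringView.from_int(integer=bita)
bitc = BitStringView.from_int(integer=bita_lsb)
# `from_array` expects a list of bits and `from_binary` a binary string,
# so pass the bits/binary view (with the matching numbering), not the object
bitd = BitStringView.from_array(array=bita.bits())
bite = BitStringView.from_array(array=bita_lsb.bits(), numbering=BitNumbering.LSB)
bitf = BitStringView.from_binary(binary=bita.binary())
bitg = BitStringView.from_binary(binary=bita_lsb.binary(), numbering=BitNumbering.LSB)

# Every construction route must recover the same integer value
for view in (bita, bita_lsb, bitb, bitc, bitd, bite, bitf, bitg):
    assert int(view) == i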

In [11]:
from qrew.simulation.data_types import BitStringView, BitNumbering, CUInt
b_msb = BitStringView.from_int(13, nbits=4, numbering=BitNumbering.MSB)
print(b_msb)
print(f"bit array: {b_msb.bits()}")
print(f"binary(): {b_msb.binary()}")
print(f"int(b_msb): {int(b_msb)}")
print(f"len(b_msb) = nbits = {len(b_msb)}")
b_lsb = b_msb.with_numbering(BitNumbering.LSB)
print(b_lsb)
print(f"bit array: {b_lsb.bits()}")
print(f"binary(): {b_lsb.binary()}")
print(f"int(b_lsb): {int(b_lsb)}")
print(f"len(b_lsb) = nbits = {len(b_lsb)}")


BitStringView(integer=13, nbits=4, numbering=<BitNumbering.MSB: 0>, dtype=None)
bit array: [1, 1, 0, 1]
binary(): 1101
int(b_msb): 13
len(b_msb) = nbits = 4
BitStringView(integer=13, nbits=4, numbering=<BitNumbering.LSB: 1>, dtype=None)
bit array: [1, 0, 1, 1]
binary(): 1011
int(b_lsb): 13
len(b_lsb) = nbits = 4


In [12]:
BitStringView.lsb(0xA5, nbits=8)

BitStringView(integer=165, nbits=8, numbering=<BitNumbering.LSB: 1>, dtype=None)


`BitStringView` is a **presentation/transport** layer:  
it can wrap *any* integer—typed or not—without pulling in the whole
`DataType` hierarchy.  That makes it ideal for:

* logging & debugging (`logger.debug("%s", bs.binary())`)
* CLI/GUI fields that accept `0b…` / `0x…` literals
* serialisers/deserialisers that need explicit endianness
* teaching examples where you want to show the raw bits

*Adopting `BitStringView` consistently throughout the codebase eliminates
a whole class of endianness and width-mismatch bugs—treat it as the
binary analogue of `str`.*  

#### Classical ↔ Quantum helpers

`is_classical(dtype)` / `is_quantum(dtype)` make it trivial to branch logic
based on the data‑type _domain_.

In [14]:
for dtype in [CBit(), QBit(), tensor_int8, sym_tensor]:
    print(f"{dtype:20} | classical={is_classical(dtype):5} | quantum={is_quantum(dtype):5}")

CBit()               | classical=    1 | quantum=    0
QBit()               | classical=    0 | quantum=    1
TensorType((3, 4))   | classical=    1 | quantum=    0
TensorType((m, n))   | classical=    1 | quantum=    0



#### Utility Functions & Constants

- `is_symbolic(x)`
  Detects SymPy or user-defined symbolic widths.
- `prod(iterable)`
  Product, supports symbolic multiplication.
- `Dyn`
  A singleton placeholder for “dynamic” (unknown) sizes.
- `_bits_for_dtype(dt)`, `_element_type_converter(et)`, `_to_symbolic_int(v)`
  Internal converters to unify Python/NumPy/Torch types, default element types, and string → SymPy symbol.

In [15]:
from qrew.simulation.schema import RegisterSpec, Flow, Signature


# 2. Compilation Layer <code class="filepath">./schema.py</code> — *From Data Types to Wires*
**Goal**<br>
Translate *data-type* contracts into concrete **registers** and **signatures** that the workflow-graph compiler can reason about: resource counts, type checking, fan-out, and symbolic shapes.

---




## `RegisterSpec`
<code class="signature">class RegisterSpec(name: str, dtype: DataType | type | alias, _shape: tuple = (), flow: Flow = Flow.THRU, variadic: bool = False)</code>


Describes a single **logical wire** (or bundle of wires) in a workflow graph. It stores *all* compile-time information needed for:

* **type checking** (matching `Data` at runtime),  
* **resource estimation** (qubit / bit counts), and  
* **symbolic-shape reasoning** when some dimensions are dynamic.
#### Properties

| property          | description                                                                  |
|-------------------|------------------------------------------------------------------------------|
| `shape`           | Public accessor for `_shape`: a tuple of wire dimensions, each an `int`, `Dyn`, or SymPy expression. An empty `()` means a scalar wire/register. |
| `symbolic_shape`  | Same as `shape`, but with placeholders (`SymInt` for torch-symbolic sizes).  |
| `bitsize`         | Bits *or qubits* **per single payload** (`dtype.data_width`).                |
| `domain`          | `"Q"` if the register’s dtype (or its element_type) is quantum; else `"C"`.  |
| `is_symbolic`     | `True` if any dimension or dtype size is symbolic.                           |
| `total_bits()`    | `payload_bits × fan_out` — total logical bits conveyed by the wire‑bundle.   |
| `all_idxs()`      | Generator over every index tuple given the rectangular `shape`.              |

---

`Flow` is a tiny `enum.Flag` used **only** to tag a register as input / output:

| Enum `Flow` | Meaning | Bitwise Behaviour |
|-------------|---------|-------------------|
| `Flow.LEFT`   | Register is **input-only** to a process. | `Flow.LEFT` |
| `Flow.RIGHT`     | Register is **output-only** from a process. | `Flow.RIGHT` |
| `Flow.THRU`      | Register is both input **and** output (pass-through). | `Flow.LEFT \| Flow.RIGHT` |


Internally these flags let the `Signature` split registers into **lefts** (inputs) and **rights** (outputs).
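Because `THRU` is the bitwise union of `LEFT` and `RIGHT`, "is this an input?" and "is this an output?" reduce to a bitwise AND. A minimal `enum.Flag` sketch of that behaviour (a local stand-in, not an import of `qrew`'s `Flow`; the helper names are hypothetical):

```python
from enum import Flag


class Flow(Flag):  # local stand-in mirroring the table above
    LEFT = 1
    RIGHT = 2
    THRU = LEFT | RIGHT  # == 3


def is_input(flow: Flow) -> bool:
    return bool(flow & Flow.LEFT)


def is_output(flow: Flow) -> bool:
    return bool(flow & Flow.RIGHT)
```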

#### Behaviour Notes


* **Post-init coercion** – if `dtype` is a *class* and `_shape` is non-empty, the class is **instantiated** with that shape so later code always sees a concrete dtype instance.
* **Equality** (`__eq__`) – compares name, flow, shape and *compatible* dtypes (with `Dyn` wildcards considered equal for `TensorType`).
* **Data matching** – `matches_data()` / `matches_data_list()` validate real `Data` objects against the spec; if `variadic=True` the spec may absorb an arbitrary number of sequential `Data` arguments.



### A. Domain & resource cost
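$$
\text{total\_bits} \;=\;
\underbrace{\bigl(\texttt{dtype.total\_bits} \text{ or } \texttt{dtype.data\_width}\bigr)}_{\text{payload bits}}
\times
\underbrace{\prod_{i} \texttt{shape}_i}_{\text{fan-out}}
$$

For a scalar wire (`shape == ()`) the empty product is 1, so the register carries exactly one payload.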
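The formula is a product of payload size and fan-out. A one-line sketch (hypothetical helper) whose values can be compared with the registers used in this section:

```python
from math import prod


def register_total_bits(payload_bits: int, shape: tuple = ()) -> int:
    """payload_bits × fan-out; prod(()) == 1, so a scalar wire carries one payload."""
    return payload_bits * prod(shape)


print(register_total_bits(1))            # QBit on a scalar wire -> 1
print(register_total_bits(2 * 3 * 8))    # TensorType((2,3), CInt(8)) -> 48
print(register_total_bits(3 * 8, (2,)))  # 2 copies of TensorType((3,), CInt(8)) -> 48
```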


In [16]:
qbit_reg  = RegisterSpec("qb", QBit(), flow=Flow.THRU)
tensor_c  = RegisterSpec("tc",
               TensorType((2,3), element_type=CInt(8)),   # 6 × 8 bits
               flow=Flow.THRU)

assert qbit_reg.domain  == "Q"      # quantum
assert tensor_c.domain == "C"       # classical
assert qbit_reg.total_bits()  == 1  # 1 qubit
assert tensor_c.total_bits() == 6*8 # 48 bits


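### B. Wire-shape ≠ Data-shape

A register's `shape` is its **fan-out** (how many payloads travel on the wire bundle), whereas `dtype.shape` is the shape of **each** payload. In the cell below, two copies of a 3-element `CInt(8)` tensor give `total_bits() = 2 × 3 × 8 = 48`.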

In [17]:
reg = RegisterSpec(
    "scalar8x4",
    dtype=TensorType((3,), element_type=CInt(8)),
    shape=(2,),                 # fan-out
    flow=Flow.LEFT
)

print(f"Spec: {repr(reg)}")
print(f"__str__: {str(reg)}")
print(f"   – bitsize:            {reg.bitsize}")
print(f"   – total_bits():       {reg.total_bits()}")
print(f"   – shape:              {reg.shape}")
print(f"   – dtype:              {reg.dtype}")
print(f"   – dtype.shape:        {reg.dtype.shape}")


Spec: scalar8x4: TensorType((3,)) (shape=(2,)) Flow.LEFT 
__str__: InSpec(name=scalar8x4, dtype=TensorType((3,)), shape=(2,))
   – bitsize:            8
   – total_bits():       48
   – shape:              (2,)
   – dtype:              TensorType((3,))
   – dtype.shape:        (3,)


> **To Do:** add an example showing that when `shape == ()`, a dtype *class* is **not** auto-instantiated, i.e. scalars stay scalar.


---

## class `Signature` 

An **ordered collection** of `RegisterSpec`s that partitions a process interface
into **inputs** (`Flow.LEFT`), **outputs** (`Flow.RIGHT`), and **through‑wires**
(`Flow.THRU`). In other words, a `Signature` defines the inputs and outputs of an operational schema. It is purely functional: it holds no state such as parameters or raw `Data` payloads. Every `Process` subclass and `CompositeMod` instance should expose one as a property.

#### Construction Helpers

| classmethod                               | purpose                                                 |
|-------------------------------------------|---------------------------------------------------------|
| `build(**kwargs)`                         | Quick ad-hoc signature from keyword args or specs.      |
| `build_from_dtypes(**types)`              | Like `build`, but forces each value to a concrete dtype. |
| `build_from_properties(inp_props, out_props)` | Convert `Process` property dicts to a signature.    |
| `build_from_data(inputs, out_props)`      | Infer from runtime `Data` + output metadata.            |


#### Core API

| member                              | description                                   |
|-------------------------------------|-----------------------------------------------|
| lefts() / rights()                  | Iterate over input- or output-flow specs.     |
| get_left(name) / get_right(name)    | Dict-style access by register name.           |
| groups()                            | Yield `(name, [spec…])` grouped by identifier |
| validate_data_with_register_specs() | Runtime check of `Data` against left specs.   |
| Sequence helpers                    | `sig[i]`, `len(sig)`, iteration, membership.  |

A `Signature` is pure **metadata**; it never contains actual payloads,
only the rules that payloads must satisfy.
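The left/right split falls out of the `Flow` flags. A self-contained sketch with hypothetical stand-ins `Spec` and `Sig` (not the real `RegisterSpec` / `Signature`; dtypes are plain strings here) that reproduces the partition checked in the next cell:

```python
from dataclasses import dataclass
from enum import Flag
from typing import Iterator, List


class Flow(Flag):
    LEFT = 1
    RIGHT = 2
    THRU = LEFT | RIGHT


@dataclass(frozen=True)
class Spec:  # hypothetical stand-in for RegisterSpec
    name: str
    dtype: str
    flow: Flow = Flow.THRU


class Sig:  # hypothetical stand-in for Signature
    def __init__(self, specs: List[Spec]):
        self._specs = tuple(specs)  # ordered and immutable

    def lefts(self) -> Iterator[Spec]:
        return (s for s in self._specs if s.flow & Flow.LEFT)

    def rights(self) -> Iterator[Spec]:
        return (s for s in self._specs if s.flow & Flow.RIGHT)

    def __len__(self) -> int:
        return len(self._specs)

    def __getitem__(self, i: int) -> Spec:
        return self._specs[i]
```

Storing the specs as a tuple keeps the declaration order, which is what makes `sig[i]` and ordered iteration meaningful.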


In [22]:
# Two classical pass-through wires
a_reg = RegisterSpec("a", CBit())          # THRU (default flow)
b_reg = RegisterSpec("b", CUInt(8))        # THRU

# One output-only quantum wire
c_reg = RegisterSpec("c", QBit(), flow=Flow.RIGHT)

print(a_reg)   # ThruSpec(name=a, dtype=CBit(), shape=())
print(b_reg)   # ThruSpec(name=b, dtype=CUInt(8), shape=())
print(c_reg)   # OutSpec(name=c, dtype=QBit(), shape=())

sig = Signature([a_reg, b_reg, c_reg])

assert [r.name for r in sig.lefts()]  == ["a", "b"]     # inputs
assert [r.name for r in sig.rights()] == ["a", "b", "c"]  # outputs

ThruSpec(name=a, dtype=CBit(), shape=())
ThruSpec(name=b, dtype=CUInt(8), shape=())
OutSpec(name=c, dtype=QBit(), shape=())


*Key point*: registers with `Flow.THRU` appear on **both** sides.

### C. Variadic registers


In [24]:
var_reg = RegisterSpec("args", CBit(), variadic=True)
ret_reg = RegisterSpec("ret",  CBit(), flow=Flow.RIGHT)

sig = Signature([var_reg, ret_reg])
sig

Signature((args: CBit()  Flow.THRU  (variadic), ret: CBit()  Flow.RIGHT ))
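One plausible rule for how a variadic input absorbs "an arbitrary number of sequential `Data` arguments": it takes every argument not needed by the fixed inputs that follow it. The sketch below is hypothetical (the real `matches_data_list()` may bind differently), represents each input spec as a `(name, variadic)` pair, and supports at most one variadic spec:

```python
from typing import Dict, Sequence, Tuple


def bind_inputs(specs: Sequence[Tuple[str, bool]], args: Sequence) -> Dict[str, object]:
    """Bind positional args to (name, variadic) input specs."""
    if sum(v for _, v in specs) > 1:
        raise ValueError("at most one variadic spec is supported in this sketch")
    bound: Dict[str, object] = {}
    i = 0
    for k, (name, variadic) in enumerate(specs):
        if variadic:
            n_after = len(specs) - k - 1  # fixed specs still to be filled
            take = len(args) - i - n_after
            if take < 0:
                raise ValueError("not enough arguments")
            bound[name] = list(args[i:i + take])
            i += take
        else:
            if i >= len(args):
                raise ValueError(f"missing argument for {name!r}")
            bound[name] = args[i]
            i += 1
    if i != len(args):
        raise ValueError("too many arguments")
    return bound
```

With only the variadic input `args` (the output `ret` does not take part in input binding), any number of arguments is accepted, including none.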

---

### Helper Utilities

* `_sanitize_name(name)` — strips spaces/illegal chars, ensures no leading digit.  
* `canonicalize_dtype(value)` — normalises builtin / NumPy / torch dtypes
  to in‑house `DataType` objects (`TensorType`, etc.).  
* `get_shape(v)` — converts any “shape‑like” value into a canonical `tuple`.  
* `qubit_count_for(reg)` — returns `int(reg.total_bits())` for quantum regs,
  else `0`.
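Hypothetical re-implementations of two of these helpers, to illustrate the described behaviour. The exact rules are assumptions: here `sanitize_name` drops illegal characters (rather than replacing them) and prefixes `_` when the name starts with a digit, and `get_shape` treats `None` as a scalar.

```python
import re
from typing import Any, Tuple


def sanitize_name(name: str) -> str:
    """Drop characters outside [A-Za-z0-9_]; prefix '_' if the result starts with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", name)
    return f"_{cleaned}" if cleaned[:1].isdigit() else cleaned


def get_shape(v: Any) -> Tuple:
    """Normalize a shape-like value: None -> (), int -> (int,), iterable -> tuple."""
    if v is None:
        return ()
    if isinstance(v, int):
        return (v,)
    return tuple(v)
```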

### D. Register equivalence (advanced)

<pre style="background:#272822;color:#f8f8f2;padding:0.8em;">
import numpy as np, torch
from qrew.simulation.data_types import TensorType
from qrew.simulation.schema import RegisterSpec
reg_np  = RegisterSpec("x", np.ndarray, shape=(2,2))
reg_tt  = RegisterSpec("x", TensorType, shape=(2,2))
reg_pt  = RegisterSpec("x", torch.Tensor, shape=(2,2))

assert reg_np == reg_tt == reg_pt
</pre>

`__eq__` honours `Dyn` wildcards, dtype subclasses, and falls back to
`check_dtypes_consistent`.

---

### E. Compiler smoke test (adapted from <code class="filepath">test_schema.py</code>)

<pre style="background:#272822;color:#f8f8f2;padding:0.8em;">
import torch, sympy, numpy as np
from qrew.simulation.schema import *
from qrew.simulation.data_types import QBit, CInt, TensorType

# Domain tagging
qb  = RegisterSpec("qb", QBit())
tc  = RegisterSpec("tc", TensorType((2,3), element_type=CInt(8)))
assert qb.domain == "Q" and tc.domain == "C"

# Resource cross-check with a real torch tensor
t   = torch.ones((2,2,2), dtype=torch.int8)
dt  = TensorType(t.shape, element_type=torch.int8)
reg = RegisterSpec("torch", dt)
assert reg.total_bits() == t.nbytes * 8
</pre>

In [70]:
import numpy as np, torch

reg_np  = RegisterSpec("x", np.ndarray, shape=(2,2))
reg_tt  = RegisterSpec("x", TensorType, shape=(2,2))
reg_pt  = RegisterSpec("x", torch.Tensor, shape=(2,2))

assert reg_np == reg_tt == reg_pt

---

### F. Helper shortcuts vs. explicit construction

| use-case                            | idiomatic call                           |
|------------------------------------|------------------------------------------|
| quick literals → dtypes            | `Signature.build(x=1, y=8)` → `CBit`, `CUInt(8)` |
| need custom `flow`, `variadic`     | build `RegisterSpec`s, then `Signature([...])` |
| copy registers from another sig    | `Signature([*other_sig])`                |

> **Rule of thumb**: **`Signature.build` is for simple literals.**  
> For anything with shape, flow, or symbolic dims, create `RegisterSpec`s
> explicitly.

---

### G. Common gotchas

| pattern                                     | what happens / fix                                               |
|---------------------------------------------|------------------------------------------------------------------|
| `Signature.build(c=RegisterSpec(...))`      | ❌ `convert_value` treats the whole object as a literal → error.  |
| symbolic **and** `Dyn` mixed in `shape`     | ❌ The converter rejects SymPy symbols in `shape`; use a symbolic *dtype* instead. |
| arithmetic with raw `Dyn`                   | ❌ `TypeError`. Keep `Dyn` unevaluated; combine only at runtime.  |

---

### H. End-to-end sanity check

<pre style="background:#272822;color:#f8f8f2;padding:0.8em;">
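from qrew.simulation.data import Data
from qrew.simulation.data_types import TensorType, CUInt
from qrew.simulation.schema import Signature
import numpy as np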

sig = Signature.build(img = TensorType((32, 32, 3), element_type=CUInt(8)))  # THRU
img_reg = sig[0]

# a concrete NumPy value
img_val = np.zeros((32, 32, 3), dtype=np.uint8)
datum   = Data(img_val, {"Usage": "Image"})

# check the resource count, then validate the raw value against the register's dtype
assert img_reg.total_bits() == 32*32*3*8
img_reg.dtype.assert_valid_classical_val(img_val)
</pre>

With these building blocks you can define process signatures, perform static
resource estimation, and rely on the compiler layer to catch mismatches long
before execution.
