# Tutorial: Working with `data_types.py`

This Jupyter notebook is an **annotated walkthrough** of the _data‑types_ layer in your
workflow / resource‑estimation code‑base.  
The goal is to help new contributors **understand**, **experiment** with, and **extend**
the fundamental data‑type building blocks that underpin:

* **Resource estimation** — e.g. _how many (qu)bits does a register need?_
* **Shape & type checking** — reasoning about symbolic tensor shapes and composite
  data structures.
* **Trace construction** — assembling DAG nodes that manipulate these data objects.

> **Prerequisites**  
> The notebook assumes the repository is on your Python path (or that the notebook
> lives in the repo root).  Adjust `sys.path` below if needed.

In [8]:
sep_str = '-'*60

In [35]:
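# -- Python path -----------------------------------------------------------------
import sys, os, pathlib
repo_root = pathlib.Path.cwd()  # adapt if the notebook sits elsewhere
# if str(repo_root) not in sys.path:
#     sys.path.insert(0, str(repo_root))
current_dir = os.getcwd()
# NOTE: this drops the last 9 characters of the working directory, i.e. it assumes
# the notebook lives in a sub-folder whose name plus the path separator is exactly
# 9 characters long. Adjust the slice (or use the commented block above) otherwise.
sys.path.append(current_dir[:-9])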

# -- Core data‑type imports ------------------------------------------------------
from qrew.simulation.data_types import (
    BitNumbering,BitStringView,
    Dyn,                   # symbolic "dynamic" dimension marker
    CBit, CUInt, CInt, CFloat,
    CFxp, String,Struct,
    QBit, QAny, QInt, QUInt, BQUInt, QFxp,
    TensorType, MatrixType,
    is_classical, is_quantum,
)


| Section | Outcome |
|---------|---------|
| Classical scalars | Encode bits, signed/unsigned integers, fixed-point, IEEE-754 floats, strings |
| Quantum scalars   | Model qubits, *n*-qubit integers, and fixed-point registers à la Qualtran |
| Dyn & symbolics   | Represent unknown or symbolic widths $n, m, \ldots$ and the sentinel **Dyn** |
| Composites        | Build `TensorType` ($\mathbb R^{s_1\times\cdots\times s_r}$) and `Struct` (records) |
| Consistency engine| Use `check_dtypes_consistent` to check whether two dtypes are compatible under the ⟨global, C, Q⟩ ladders |
| Resource formulas | Compute qubit/bit counts and memory footprints |


# 1. `DataType` — the universal interface for register element types
Every concrete dtype (classical *or* quantum) derives from `DataType`. Think of it as the “adapter” that lets high-level operations treat qubits or bits like well-typed scalars.


#### 🔑 **What you *must* implement**

| Method / Property | Description |
|-------------------|---------|
| `data_width` | **int** — number of fundamental units (**bits or qubits**) needed to store one value. |
| `to_bits(x)` | Convert a single value `x` → list of `int` bits **[MSB … LSB]**. |
| `from_bits(bits)` | Inverse of `to_bits`: reconstruct the value encoded by `bits`. |
| `get_classical_domain()` | Iterable of all representable classical values (raise if the domain is huge or ambiguous). |
| `assert_valid_classical_val(val, debug_str='val')` | Raise a clear `ValueError` if `val` is outside the domain. |

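The five required members can be sketched with a toy unsigned-integer type. This is only an illustration under stated assumptions: `ToyUInt` and its `bitsize` attribute are hypothetical names, and the class deliberately does not subclass the real `DataType`, so it runs without the repository. Only the method names are taken from the table above.

```python
from typing import Iterable, List


class ToyUInt:
    """Hypothetical stand-in for an unsigned-integer dtype (not the real class)."""

    def __init__(self, bitsize: int):
        self.bitsize = bitsize

    @property
    def data_width(self) -> int:
        # One fundamental unit (bit or qubit) per binary digit.
        return self.bitsize

    def to_bits(self, x: int) -> List[int]:
        # MSB first, matching the [MSB ... LSB] convention in the table.
        self.assert_valid_classical_val(x)
        return [(x >> i) & 1 for i in reversed(range(self.bitsize))]

    def from_bits(self, bits: List[int]) -> int:
        # Fold the bits back into an integer, most significant bit first.
        value = 0
        for b in bits:
            value = (value << 1) | b
        return value

    def get_classical_domain(self) -> Iterable[int]:
        return range(2 ** self.bitsize)

    def assert_valid_classical_val(self, val: int, debug_str: str = "val") -> None:
        if not isinstance(val, int) or not 0 <= val < 2 ** self.bitsize:
            raise ValueError(f"{debug_str}={val!r} is outside [0, 2**{self.bitsize})")


u4 = ToyUInt(4)
bits = u4.to_bits(5)
print(bits, u4.from_bits(bits))  # [0, 1, 0, 1] 5
```

Validating inside `to_bits` is a design choice of this sketch; the real classes may validate elsewhere.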

#### 🛠 **Convenience utilities included for free**

| Helper | What it does |
|--------|--------------|
| `to_bits_array(x_array)` / `from_bits_array(bits_array)` | NumPy-vectorized versions of `to_bits` / `from_bits`. |
| `assert_valid_classical_val_array(...)` | Batch validation analogue of `assert_valid_classical_val`. |
| `is_symbolic()` | Returns **True** when the dtype’s width contains SymPy symbols (handy for circuit templates). |
| `iteration_length_or_zero()` | Safe helper for bounded types (`BQUInt` etc.); returns the concrete iteration length or `0` if symbolic. |


> **Tip:**  When implementing a new dtype, nail down `to_bits`, `from_bits`, and `assert_valid_classical_val` first — the array helpers will then “just work.”
---


## 1.1. Quantum Data Types (`QType`)

_All subclasses inherit from `QType` → `DataType` and expose a common bit-level API (`to_bits`, `from_bits`, `assert_valid_classical_val`, …)._

| QType&nbsp;(constructor)      | `data_width` / `num_qubits` | Classical domain<sup>†</sup>                               | Purpose & notes                                                                                                   |
|-------------------------------|-----------------------------|------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| `QBit()`                      | `1`                         | `{0, 1}`                                                   | Single qubit viewed in the computational basis. Simplest building block for larger registers.                      |
| `QInt(n)`                     | `n`                         | $[-2^{\,n-1},\,2^{\,n-1})$                                 | *Signed* two’s-complement integer. The most significant bit (index 0 of the `[MSB … LSB]` bit list) is the sign bit. Arithmetic wraps mod $2^n$. |
| `QUInt(n)`                    | `n`                         | $[0,\,2^{\,n})$                                            | *Unsigned* integer. Developer manages wrap-around semantics on overflow (C-style).                                 |
| `BQUInt(bitsize, L)`          | `bitsize`                   | $[0,\,L)$ with $L\le 2^{\texttt{bitsize}}$                 | Bounded unsigned integer; ideal for coherent for-loop indices. `iteration_length = L` may be symbolic.             |
| `QAny(n)`                     | `n`                         | *Ambiguous* — delegates to `QUInt(n)` when coerced        | Opaque register of *n* qubits used when the specific dtype is unknown/irrelevant. Avoid when a precise domain matters. |
| `QFxp(w, f, signed=False)`    | `w`                         | Unsigned → $[0,\,2^{w-f})$  <br>Signed → $[-2^{w-f-1},\,2^{w-f-1})$  <br>**Step** $2^{-f}$ | Fixed-point real number with `f` fractional bits. Backed by `QUInt` (unsigned) or `QInt` (signed).  `float = int · 2^{-f}`. |

<sup>†</sup> “Classical domain” lists the set of classically representable values (i.e. basis states, ignoring superposition).  

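The signed and fixed-point domains in the table follow from two small rules: two’s-complement wrapping mod $2^n$, and the scale factor $2^{-f}$. The helpers below are a plain-Python sketch of those rules. The function names are hypothetical and are not part of `data_types.py`.

```python
from typing import List


def signed_to_bits(x: int, n: int) -> List[int]:
    """Two's-complement encoding of x in n bits, MSB (sign bit) first."""
    if not -(2 ** (n - 1)) <= x < 2 ** (n - 1):
        raise ValueError(f"{x} does not fit in {n} signed bits")
    raw = x % (2 ** n)  # wrap negative values into [0, 2**n)
    return [(raw >> i) & 1 for i in reversed(range(n))]


def bits_to_signed(bits: List[int]) -> int:
    """Inverse of signed_to_bits."""
    n = len(bits)
    raw = int("".join(map(str, bits)), 2)
    return raw - 2 ** n if bits[0] else raw


def fxp_value(raw_int: int, num_frac: int) -> float:
    """Fixed-point interpretation: float = int * 2**(-num_frac)."""
    return raw_int * 2.0 ** (-num_frac)


print(signed_to_bits(-3, 4))          # [1, 1, 0, 1]
print(bits_to_signed([1, 1, 0, 1]))   # -3
print(fxp_value(0b0110, num_frac=2))  # 1.5
```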

In [None]:
# TODO
# Make examples of 
# 1. `QBit()`  one logical qubit  
# 2. `QInt(w)`  ($w≥1$) signed integer in two’s-complement  
# 3. `QUInt(w)` unsigned integer in $[0,2^w\!-\!1]$

qbit = QBit()
print("data_width (qubits):", qbit.data_width)
print("is_quantum:", is_quantum(qbit), "| is_classical:", is_classical(qbit))

data_width (qubits): 1
is_quantum: True | is_classical: False


## 1.2. Classical Data Types (`CType`)

_Concrete classical dtypes all derive from `CType` → `NumericType` → `DataType` and therefore share the universal bit-level API (`to_bits`, `from_bits`, `assert_valid_classical_val`, …)._

- **Primitive bit and integers**: `CBit`, `CInt`, `CUInt`

- **Fixed-point, IEEE float, strings**: `CFxp`, `CFloat`, `String`

### a. Scalar / “Atomic” Classical Dtypes

| CType&nbsp;(constructor)                   | `data_width` / `bit_width` | Classical domain<sup>†</sup>                             | Purpose & notes                                                                                     |
|-------------------------------------------|---------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------|
| `CBit()`                                  | `1`                       | `{0, 1}`                                                 | Single classical bit.                                                                               |
| `CInt(n)`                                 | $n$                       | $[-2^{\,n-1},\,2^{\,n-1})$                               | Signed two’s-complement integer with $n$ bits. Sign-bit at MSB; wraps mod $2^n$.                                  |
| `CUInt(n)`                                | $n$                    | $[0,\,2^{\,n})$                                          | Unsigned integer with $n$ bits. Developer handles overflow (wrap or error).                                       |
| `CFxp(total, frac, signed=False)`         | `w=total`                   | Unsigned → $[0,\,2^{\,\texttt{total-frac}})$<br>Signed → $[-2^{\,\texttt{total-frac-1}},\,2^{\,\texttt{total-frac-1}})$<br>**Step** $2^{-\texttt{frac}}$ | Fixed-point real number (`frac` fractional bits). Re-uses `CInt`/`CUInt` for raw bit access.        |
| `CFloat(n)`                               | $n\in\{8,16,32,64\}$      | IEEE-754 range for the chosen width                      | Classical floating-point; (de)serialises via `struct.pack` / `struct.unpack`.                       |
| `String(max_len)`                         | `w = 8 × max_len`             | All ASCII strings of length ≤ `max_len`                  | Null-padded byte string. Each char occupies one byte.                                               |

<sup>†</sup> *“Classical domain” = set of basis-state values that can be represented exactly; superpositions are a quantum artefact and therefore not listed here.*

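The `CFloat` and `String` rows can be illustrated with the standard library alone. This is a sketch, not the library’s implementation: the helper names are hypothetical, and the big-endian byte order and MSB-first bit order are assumptions made here for consistency with the `[MSB … LSB]` convention.

```python
import struct
from typing import List


def float32_to_bits(x: float) -> List[int]:
    """IEEE-754 single precision -> 32 bits, MSB first (big-endian assumed)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return [(raw >> i) & 1 for i in reversed(range(32))]


def bits_to_float32(bits: List[int]) -> float:
    """Inverse of float32_to_bits."""
    raw = int("".join(map(str, bits)), 2)
    (x,) = struct.unpack(">f", struct.pack(">I", raw))
    return x


def string_to_bits(s: str, max_len: int) -> List[int]:
    """ASCII string, null-padded to max_len bytes -> 8 * max_len bits."""
    data = s.encode("ascii")
    if len(data) > max_len:
        raise ValueError(f"{s!r} is longer than {max_len} characters")
    data = data.ljust(max_len, b"\x00")
    return [(byte >> i) & 1 for byte in data for i in reversed(range(8))]


def bits_to_string(bits: List[int]) -> str:
    """Inverse of string_to_bits; strips the null padding."""
    chunks = (bits[i:i + 8] for i in range(0, len(bits), 8))
    data = bytes(int("".join(map(str, c)), 2) for c in chunks)
    return data.rstrip(b"\x00").decode("ascii")


print(bits_to_float32(float32_to_bits(-2.5)))  # -2.5
print(len(string_to_bits("hi", 4)))            # 32
print(bits_to_string(string_to_bits("hi", 4))) # hi
```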

In [33]:
cbit  = CBit()
cint8 = CInt(8)
cint16 = CInt(16)
cuint32 = CUInt(32)
cuint64 = CUInt(64)
print(f"{sep_str}\nPrimitive bit and integers:")
for dt in (cbit, cint8, cint16, cuint32, cuint64):
    print(f"    {dt:<10}  width = {dt.data_width} bits, nbytes={dt.nbytes}")

print(f"{sep_str}\nFixed-point, IEEE float, string ...")

fxp16  = CFxp(16, 8)     # 16-bit fixed point, 8 fractional bits
flt32  = CFloat(32)
string = String(4)

for c in (fxp16, flt32, string):
    print(f"    {c:15}  width={c.data_width} bits  nbytes={c.nbytes}")


------------------------------------------------------------
Primitive bit and integers:
    CBit()      width = 1 bits, nbytes=1
    CInt(8)     width = 8 bits, nbytes=1
    CInt(16)    width = 16 bits, nbytes=2
    CUInt(32)   width = 32 bits, nbytes=4
    CUInt(64)   width = 64 bits, nbytes=8
------------------------------------------------------------
Fixed-point, IEEE float, string ...
    CFxp(16, 8)      width=16 bits  nbytes=2
    CFloat(32)       width=32 bits  nbytes=4
    String(32)       width=32 bits  nbytes=4


# 3. Composite / Container Classical Dtypes
### `Struct(fields: dict[str, DataType])`
Packed **record / struct** akin to a C `struct` or Rust `struct`.

#### Main properties
* `fields` – ordered mapping `name → dtype` (order preserved)  
* `data_width` – sum of field widths  
* `nbytes`, `total_bits` – byte / bit totals (recursive if a field is itself composite)
* `field_order` – list of keys in declaration order

#### Core methods
* `to_bits(value_dict)` – concatenates each field’s bitstring
* `from_bits(bits)` – slices the concatenation back into a `dict`
* `assert_valid_classical_val(value_dict)` – field-wise validation

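The concatenate-and-slice behaviour of `to_bits` / `from_bits` can be sketched as follows. To stay self-contained, the sketch replaces real field dtypes with plain unsigned widths. The `LAYOUT` dictionary and the `pack_record` / `unpack_record` helpers are hypothetical names, not part of the library.

```python
from typing import Dict, List

# Hypothetical layout: field name -> width in bits (dicts keep declaration order).
LAYOUT: Dict[str, int] = {"id": 16, "flags": 4, "count": 12}


def uint_to_bits(x: int, width: int) -> List[int]:
    """Unsigned integer -> bit list, MSB first."""
    if not 0 <= x < 2 ** width:
        raise ValueError(f"{x} does not fit in {width} bits")
    return [(x >> i) & 1 for i in reversed(range(width))]


def pack_record(values: Dict[str, int], layout: Dict[str, int]) -> List[int]:
    """Concatenate each field's bits in declaration order."""
    bits: List[int] = []
    for name, width in layout.items():
        bits.extend(uint_to_bits(values[name], width))
    return bits


def unpack_record(bits: List[int], layout: Dict[str, int]) -> Dict[str, int]:
    """Slice the concatenation back into a dict of field values."""
    out: Dict[str, int] = {}
    offset = 0
    for name, width in layout.items():
        chunk = bits[offset:offset + width]
        out[name] = int("".join(map(str, chunk)), 2)
        offset += width
    return out


record = {"id": 513, "flags": 0b1010, "count": 7}
packed = pack_record(record, LAYOUT)
print(len(packed))                    # 32 = 16 + 4 + 12
print(unpack_record(packed, LAYOUT))  # {'id': 513, 'flags': 10, 'count': 7}
```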


In [48]:

# A composite structure {id: uint16, mass: float32, label: 8-char string}
particle = Struct(
    fields={
        "id":    CUInt(16),
        "mass":  CFloat(32),
        "label": String(8),
    }
)

print(f"\n{particle}  ––  is_classical: {is_classical(particle)}")
print(f"    - data_width (bits): {particle.data_width}")
print(f"    - nbytes:            {particle.nbytes}")
print(f"    - total_bits:        {particle.total_bits}")


Struct(112)  ––  is_classical: True
    - data_width (bits): 112
    - nbytes:            14
    - total_bits:        112


---
#### <b> N-Dim Data Types </b>
### `TensorType(shape: Tuple[int, …], element_type: DataType | type)`
Represents an **N-dimensional** array whose **elements** are themselves scalar dtypes
(e.g. `CFloat(32)`, `CUInt(8)`). Conceptually similar to a NumPy ndarray that carries a bit-level contract.

##### Properties
* `shape`, `element_type`, optional **`val: np.ndarray`**
* `data_width` – bits **per element**  
* `rank` – `len(shape)`
* `nelement()` – total number of elements  
* `element_size()` / **`bytes_per_element`** – bytes per element (`data_width // 8`)
* **`nbytes`** – total bytes (`nelement × bytes_per_element`)
* **`total_bits`** – `nelement × data_width` (alias `memory_in_bytes × 8`)
##### Methods
* `to_bits(x)` / `from_bits(bits)` – naïve flatten ↔ reconstruct
* `multiply(other)` – broadcast-style shape multiplication
* `assert_valid_classical_val(val)` – validates `val.shape` & `val.dtype`
* Standard `DataType` helpers (`to_bits_array`, `__str__`, …) come for free.

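The size properties above reduce to a few lines of arithmetic. The sketch below uses a hypothetical `tensor_sizes` helper (not a library function) and assumes a concrete shape and a `data_width` that is a multiple of 8.

```python
import math
from typing import Dict, Tuple


def tensor_sizes(shape: Tuple[int, ...], data_width: int) -> Dict[str, int]:
    """Size bookkeeping for a tensor of `data_width`-bit elements."""
    nelement = math.prod(shape)           # empty shape () -> 1 (a scalar)
    bytes_per_element = data_width // 8   # assumes whole-byte elements
    return {
        "rank": len(shape),
        "nelement": nelement,
        "bytes_per_element": bytes_per_element,
        "nbytes": nelement * bytes_per_element,
        "total_bits": nelement * data_width,
    }


sizes = tensor_sizes((3, 4, 4), data_width=32)
print(sizes)
# {'rank': 3, 'nelement': 48, 'bytes_per_element': 4, 'nbytes': 192, 'total_bits': 1536}
```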

In [42]:
# A 3×4×4 tensor of 32‑bit floats
tensor = TensorType(shape=(3, 4, 4), element_type=CFloat(32))

print(f"{tensor}  ––    rank: {tensor.rank} | is_classical: {is_classical(tensor)} ")
print(f"    - nelements: {tensor.nelement()}")
print(f"    - bytes per element: {tensor.bytes_per_element}")
print(f"    - total bytes = {tensor.nbytes}")
print(f"    - total bits = {tensor.total_bits}")


TensorType((3, 4, 4))  ––    rank: 3 | is_classical: True 
    - nelements: 48
    - bytes per element: 4
    - total bytes = 192
    - total bits = 1536


---
### `MatrixType(rows, cols, element_type=float)`
A **rank-2 convenience subclass** of `TensorType` (will eventually be folded into it).

#### Extras on top of `TensorType`
* Properties: `rows`, `cols`
* `multiply(other)` – checks classic matrix-multiply compatibility (`self.cols == other.rows`)
* Inherits all size helpers (`rank == 2`, `nbytes`, …)

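The compatibility rule `self.cols == other.rows` can be sketched on bare shape tuples. `matmul_shape` is a hypothetical helper for illustration; the real `multiply` returns a `MatrixType` and may also check element types.

```python
from typing import Tuple


def matmul_shape(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Result shape of A @ B, or ValueError if the inner dimensions differ."""
    a_rows, a_cols = a
    b_rows, b_cols = b
    if a_cols != b_rows:
        raise ValueError(f"cannot multiply {a} by {b}: {a_cols} != {b_rows}")
    return (a_rows, b_cols)


print(matmul_shape((2, 3), (3, 1)))  # (2, 1)
```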

In [43]:

# A 3×4 matrix of 32-bit floats
mat = MatrixType(3, 4, element_type=CFloat(32))

print(f"\n{mat}  ––    rank: {mat.rank} | is_classical: {is_classical(mat)} ")
print(f"    - nelements: {mat.nelement()}")
print(f"    - bytes per element: {mat.bytes_per_element}")
print(f"    - total bytes = {mat.nbytes}")
print(f"    - total bits = {mat.total_bits}")



Matrix(3, 4)  ––    rank: 2 | is_classical: True 
    - nelements: 12
    - bytes per element: 4
    - total bytes = 48
    - total bits = 384


In [49]:
A = MatrixType(2, 3, element_type=CFloat(32))
B = MatrixType(3, 1, element_type=CFloat(32))

C = A.multiply(B)
print("A:", A, "| B:", B, "| A×B:", C)

A: Matrix(2, 3) | B: Matrix(3, 1) | A×B: Matrix(2, 1)


For the `particle` struct defined earlier, `data_width` is the sum of its field widths: 16 (`id`) + 32 (`mass`) + 8×8 (`label`) = 112 bits = 14 bytes.

## 4. Symbolic & Dynamic shapes

A `TensorType` wraps an **element type** (another `DataType`) and a **shape**.

* Use concrete integers for fixed shapes, or  
* `Dyn` / SymPy symbols for symbolic dimensions.

In [47]:
# A 3×4 tensor of 8-bit signed ints
tensor_int8 = TensorType(shape=(3, 4), element_type=CInt(8))

print(f"{tensor_int8}  ––    rank: {tensor_int8.rank} | is_symbolic: {tensor_int8.is_symbolic()} ")
print(f"    - nelements: {tensor_int8.nelement()}")
print(f"    - bytes per element: {tensor_int8.bytes_per_element}")
print(f"    - total bytes = {tensor_int8.nbytes}")
print(f"    - total bits = {tensor_int8.total_bits}")

# A *symbolic* M×N tensor of ints
from sympy import symbols
M, N = symbols('m n', positive=True, integer=True)
sym_tensor = TensorType(shape=(M, N), element_type=CInt(8))

print(f"{sym_tensor}  ––    rank: {sym_tensor.rank} | is_symbolic: {sym_tensor.is_symbolic()}")
print(f"    - nelements: {sym_tensor.nelement()}")
print(f"    - bytes per element: {sym_tensor.bytes_per_element}")
print(f"    - total bytes = {sym_tensor.nbytes}")
print(f"    - total bits = {sym_tensor.total_bits}")

TensorType((3, 4))  ––    rank: 2 | is_symbolic: False 
    - nelements: 12
    - bytes per element: 1
    - total bytes = 12
    - total bits = 96
TensorType((m, n))  ––    rank: 2 | is_symbolic: False
    - nelements: m*n
    - bytes per element: 1
    - total bytes = m*n
    - total bits = 8*m*n


## 4. `MatrixType` – matrix‑specific helpers

`MatrixType(r, c)` specialises `TensorType` and provides helpers like
`.rows`, `.cols`, and a `.multiply()` method that enforces shape rules.


## 5. Classical ↔ Quantum helpers

`is_classical(dtype)` / `is_quantum(dtype)` make it trivial to branch logic
based on the data‑type _domain_.

In [None]:
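# `tensor` is the 3×4×4 float tensor defined earlier; `!s` prints True/False rather than 1/0
for dtype in [CBit(), QBit(), tensor, sym_tensor]:
    print(f"{dtype:20} | classical={is_classical(dtype)!s:5} | quantum={is_quantum(dtype)!s:5}")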

---
# Compilation Layer <code class="filepath">./schema.py</code>

## `RegisterSpec`

```python
class RegisterSpec(
    name: str,
    dtype: DataType | type | GenericAlias,
    _shape: Tuple[int | Dyn, …] = (),
    flow: Flow = Flow.THRU,
    variadic: bool = False
)
```

Represents **one logical wire** (or bundle of wires) in a workflow graph.  
It stores *all* compile-time information needed for:

* **type checking** (matching `Data` at runtime),  
* **resource estimation** (qubit / bit counts), and  
* **symbolic-shape reasoning** when some dimensions are dynamic.

### Key properties

| Property           | Meaning |
|--------------------|---------|
| `shape`            | Tuple of concrete wire dimensions (empty `()` ⇢ scalar wire). |
| `symbolic_shape`   | Same as `shape` but may contain `SymInt` placeholders. |
| `bitsize`          | Bits / qubits **per payload** (`dtype.data_width`). |
| `total_bits()`     | `payload_bits × prod(shape)` – total logical bits on the bundle. |
| `domain`           | `"Q"` if the (possibly nested) dtype is quantum, else `"C"`. |
| `is_symbolic`      | `True` if any dimension or dtype size is symbolic. |
| `all_idxs()`       | Generator over every index tuple within `shape`. |

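Two of these properties are simple enough to sketch directly: `total_bits()` is a product, and `all_idxs()` is a Cartesian product over the dimension ranges. The free functions below are hypothetical stand-ins that take `bitsize` and `shape` as plain arguments and assume a concrete (non-symbolic) shape.

```python
import itertools
import math
from typing import Iterator, Tuple


def total_bits(bitsize: int, shape: Tuple[int, ...]) -> int:
    """Payload bits times the number of wires in the bundle."""
    return bitsize * math.prod(shape)


def all_idxs(shape: Tuple[int, ...]) -> Iterator[Tuple[int, ...]]:
    """Yield every index tuple within `shape` in row-major order."""
    yield from itertools.product(*(range(dim) for dim in shape))


print(total_bits(8, (2, 3)))   # 48
print(list(all_idxs((2, 2))))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(list(all_idxs(())))      # [()]  (a scalar wire has one empty index)
```
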
### Flow flags
`Flow` is a tiny `enum.Flag` used **only** to tag a register as input / output:

| Flag            | Typical direction |
|-----------------|-------------------|
| `Flow.LEFT`     | input-only |
| `Flow.RIGHT`    | output-only |
| `Flow.THRU`     | input **and** output (`LEFT \| RIGHT`) |

Internally these flags let the `Signature` split registers into **lefts** (inputs) and **rights** (outputs).

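The left/right split can be sketched with a stand-alone `enum.Flag`. `ToyFlow` re-creates only the three flags from the table, with explicit integer values chosen for this sketch. The register list and the way it is split are illustrative assumptions, not the `Signature` implementation.

```python
import enum


class ToyFlow(enum.Flag):
    """Stand-in for the Flow flags described above (values chosen for this sketch)."""
    LEFT = 1
    RIGHT = 2
    THRU = LEFT | RIGHT  # input and output


# Hypothetical registers as (name, flow) pairs.
registers = [("x", ToyFlow.THRU), ("ctrl", ToyFlow.LEFT), ("result", ToyFlow.RIGHT)]

# A register counts as an input if its LEFT bit is set, and as an output if RIGHT is set.
lefts = [name for name, flow in registers if flow & ToyFlow.LEFT]
rights = [name for name, flow in registers if flow & ToyFlow.RIGHT]

print(lefts)   # ['x', 'ctrl']
print(rights)  # ['x', 'result']
```
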
### Behaviour notes

* **Post-init coercion** – if `dtype` is a *class* and `_shape` is non-empty, the class is **instantiated** with that shape so later code always sees a concrete dtype instance.
* **Equality** (`__eq__`) – compares name, flow, shape and *compatible* dtypes (with `Dyn` wildcards considered equal for `TensorType`).
* **Data matching** – `matches_data()` / `matches_data_list()` validate real `Data` objects against the spec; if `variadic=True` the spec may absorb an arbitrary number of sequential `Data` arguments.

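The wildcard treatment of `Dyn` in shape comparison can be illustrated with a sentinel object. This is a sketch under assumptions: `DYN` is a hypothetical stand-in for the real `Dyn`, and `shapes_compatible` shows one plausible rule (equal rank, each dimension equal or dynamic). The actual `__eq__` logic may differ.

```python
from typing import Tuple


class _DynSentinel:
    """Hypothetical marker for an unknown dimension (stand-in for the real Dyn)."""

    def __repr__(self) -> str:
        return "Dyn"


DYN = _DynSentinel()


def shapes_compatible(a: Tuple[object, ...], b: Tuple[object, ...]) -> bool:
    """True if the ranks match and every dimension pair is equal or dynamic."""
    if len(a) != len(b):
        return False
    return all(x is DYN or y is DYN or x == y for x, y in zip(a, b))


print(shapes_compatible((3, DYN), (3, 4)))  # True
print(shapes_compatible((3, 4), (4, 4)))    # False
print(shapes_compatible((3,), (3, 4)))      # False
```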

---

## `Signature`
An **ordered collection** of `RegisterSpec`s that defines the public interface of a `Process` or `CompositeMod`.

```python
sig = Signature.build(
    x       = QBit(),                 # flow not given; RegisterSpec defaults to Flow.THRU
    y       = QBit(),
    result  = RegisterSpec("result", QBit(), flow=Flow.RIGHT)
)
```