# Neural Networks tutorial 01
## **Architecture**

This notebook showcases some of the neural network architectures available in quantEM, how to initialize them, and how their inputs and outputs are shaped.

Demos of how to train/use the networks are in subsequent notebooks. 

Arthur McCray  
Jan 6, 2026

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import torch
from quantem.core.ml import DenseNN, CNN2d, CNN3d, CNNDense, HSiren, ConvAutoencoder2d
from torchinfo import summary

from importlib.metadata import version # Temporary fix for version check
print(f"quantEM version: {version('quantem')}")  # single quotes inside the f-string also work before Python 3.12

quantEM version: 0.1.7


# Dense NN

Fully-connected, i.e. dense NN
- 1D input data to 1D output data
    - `(BATCH_SIZE, input_dim)` -> `(BATCH_SIZE, output_dim)`
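
As a rough picture of what such a network looks like, the sketch below builds a comparable stack from plain `torch.nn` layers. It is not the quantEM implementation: the helper name `make_dense` is hypothetical, and it uses real-valued weights only, whereas `DenseNN` also accepts complex dtypes.

```python
import torch
from torch import nn


def make_dense(input_dim, output_dim, hidden_dims):
    """Hypothetical helper: Linear + ReLU per hidden layer, then a final Linear."""
    layers = []
    prev = input_dim
    for width in hidden_dims:
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, output_dim))
    return nn.Sequential(*layers)


# Three hidden layers of width 128, i.e. the same layout as
# num_layers=3 with hidden_size=128
sketch = make_dense(input_dim=10, output_dim=4, hidden_dims=[128] * 3)

x = torch.randn(5, 10)  # (BATCH_SIZE, input_dim)
y = sketch(x)           # (BATCH_SIZE, output_dim)
n_params = sum(p.numel() for p in sketch.parameters())
print(tuple(y.shape), n_params)
# (5, 4) 34948
```

The parameter count is the sum of the per-layer counts reported by `summary` for the quantEM model: 1,408 + 16,512 + 16,512 + 516.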


In [3]:
input_dim = 10
output_dim = 4
activation = "relu" # Activation function for hidden layers, default "relu"
dtype = torch.complex64 # default torch.float32

## model architecture can be specified in 2 ways, by declaring the size of each layer individually:
hidden_dims = [128, 128, 128] 
## or by declaring the number of layers and the size of each layer:
num_layers = 3
hidden_size = 128 

dropout = 0 # Dropout probability, default 0
use_batchnorm = False # Whether to use batch normalization, default False
final_activation = "identity" # Activation function for output layer, default nn.Identity()

model = DenseNN(
    input_dim=input_dim,
    output_dim=output_dim,
    hidden_dims=hidden_dims, ## if hidden_dims is specified, num_layers and hidden_size are ignored
    # num_layers=num_layers,
    # hidden_size=hidden_size,
    activation=activation,
    dtype=dtype,
    dropout=dropout,
    use_batchnorm=use_batchnorm,
    final_activation=final_activation,
)

batch_size = 5
input = torch.randn(batch_size, input_dim, dtype=dtype)
output = model(input)

print(f"Model type: {type(model)}")
print(f"Input shape (batch_size, input_dim): {input.shape} -> Output shape (batch_size, output_dim): {output.shape}")
summary(model, input_data=input)

Model type: <class 'quantem.core.ml.dense_nn.DenseNN'>
Input shape (batch_size, input_dim): torch.Size([5, 10]) -> Output shape (batch_size, output_dim): torch.Size([5, 4])


Layer (type:depth-idx)                   Output Shape              Param #
DenseNN                                  [5, 4]                    --
├─ModuleList: 1-1                        --                        --
│    └─Sequential: 2-1                   [5, 128]                  --
│    │    └─Linear: 3-1                  [5, 128]                  1,408
│    │    └─Complex_ReLU: 3-2            [5, 128]                  --
│    └─Sequential: 2-2                   [5, 128]                  --
│    │    └─Linear: 3-3                  [5, 128]                  16,512
│    │    └─Complex_ReLU: 3-4            [5, 128]                  --
│    └─Sequential: 2-3                   [5, 128]                  --
│    │    └─Linear: 3-5                  [5, 128]                  16,512
│    │    └─Complex_ReLU: 3-6            [5, 128]                  --
│    └─Linear: 2-4                       [5, 4]                    516
├─Identity: 1-2                          [5, 4]                    --
Tot

# CNN

## CNN2d

This model is a fully convolutional CNN similar to a U-net architecture. It is called `CNN2d` because it is intended for working with 2D images, but each input is actually 3D because a channel dimension is assumed, so the data shape is `(channels, height, width)`. For greyscale/single-channel images, the shape of a single input will be `(1, height, width)`.

- 3D input data -> 3D output data
    - `(BATCH_SIZE, channels, height, width)` -> `(BATCH_SIZE, channels, height, width)`
    
One limitation of this architecture is that, along each convolutional dimension, the input size must be divisible by `2**num_layers`. For example, with `num_layers=3` the factor is 8, so an input size of `(204, 230)` would fail, but `(200, 232)` would work.
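
The divisibility rule can be checked before building a model. The helpers below are a minimal sketch in plain Python; `is_valid_size` and `next_valid_size` are hypothetical names and are not part of quantEM.

```python
import math


def is_valid_size(shape, num_layers):
    """True if every convolutional dimension is divisible by 2**num_layers."""
    factor = 2 ** num_layers
    return all(n % factor == 0 for n in shape)


def next_valid_size(shape, num_layers):
    """Round each dimension up to the nearest multiple of 2**num_layers."""
    factor = 2 ** num_layers
    return tuple(math.ceil(n / factor) * factor for n in shape)


print(is_valid_size((204, 230), num_layers=3))    # False
print(is_valid_size((200, 232), num_layers=3))    # True
print(next_valid_size((204, 230), num_layers=3))  # (208, 232)
```

One option for an image with an unsupported size is to pad it up to the size returned by `next_valid_size` (for example with `torch.nn.functional.pad`), or to crop it down to a valid size.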

For this and the other models in this notebook, we specify all of the arguments for demonstration, but in most cases everything beyond the input/output dimensionality can be left at its default value.


In [4]:
in_channels = 1 # 1 for greyscale, 3 for RGB
out_channels = 2 # None for same number of output channels as input
start_filters = 16 # Number of filters in first layer, doubles for each layer
num_layers = 3 
num_per_layer = 2 # Number of convolutional blocks per layer
use_skip_connections = True # Whether to use skip connections (True by default)
dtype = torch.float32 # summary looks a little odd for complex dtypes, but the model works fine
dropout = 0 # Dropout probability, default 0
use_batchnorm = False # Whether to use batch normalization, default False
final_activation = "identity" # Activation function for output layer, default nn.Identity()

model = CNN2d(
    in_channels=in_channels,
    out_channels=out_channels,
    start_filters=start_filters,
    num_layers=num_layers,
    num_per_layer=num_per_layer,
    use_skip_connections=use_skip_connections,
    dtype=dtype,
    dropout=dropout,
    use_batchnorm=use_batchnorm,
    final_activation=final_activation,
)

batch_size = 5
image_shape = (256, 256)
input = torch.randn(batch_size, in_channels, *image_shape, dtype=dtype)
output = model(input)

print(f"Model type: {type(model)}")
print(f"Input shape (batch_size, channels, height, width): {input.shape}")
print(f"-> Output shape (batch_size, channels, height, width): {output.shape}")

summary(model, input_data=input)

Model type: <class 'quantem.core.ml.cnn.CNN2d'>
Input shape (batch_size, channels, height, width): torch.Size([5, 1, 256, 256])
-> Output shape (batch_size, channels, height, width): torch.Size([5, 2, 256, 256])


Layer (type:depth-idx)                   Output Shape              Param #
CNN2d                                    [5, 2, 256, 256]          --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv2dBlock: 2-1                  [5, 16, 256, 256]         --
│    │    └─Sequential: 3-1              [5, 16, 256, 256]         2,480
├─MaxPool2d: 1-2                         [5, 16, 128, 128]         --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv2dBlock: 2-2                  [5, 32, 128, 128]         --
│    │    └─Sequential: 3-2              [5, 32, 128, 128]         13,888
├─MaxPool2d: 1-4                         [5, 32, 64, 64]           --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv2dBlock: 2-3                  [5, 64, 64, 64]           --
│    │    └─Sequential: 3-3              [5, 64, 64, 64]           55,424
├─MaxPool2d: 1-6                         [5, 64

## CNN3d

Essentially the same as `CNN2d`, but with an additional spatial (depth) dimension. The input/output data now has shape `(channels, depth, height, width)`, and the depth dimension must also be divisible by `2**num_layers`.

- 4D input data -> 4D output data
    - `(BATCH_SIZE, channels, depth, height, width)` -> `(BATCH_SIZE, channels, depth, height, width)`
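
The sketch below shows where the divisibility requirement comes from, using only `torch.nn.MaxPool3d` rather than the quantEM model. It assumes one pooling step with kernel size 2 per layer, so every spatial dimension is halved `num_layers` times.

```python
import torch
from torch import nn

pool = nn.MaxPool3d(kernel_size=2)  # assumed: halves depth, height and width
x = torch.randn(1, 1, 8, 256, 256)  # (batch, channels, depth, height, width)

shapes = [tuple(x.shape[2:])]
for _ in range(3):  # num_layers = 3
    x = pool(x)
    shapes.append(tuple(x.shape[2:]))

print(shapes)
# [(8, 256, 256), (4, 128, 128), (2, 64, 64), (1, 32, 32)]
```

With `num_layers=3`, a depth of 8 is the smallest value that still leaves one voxel along that axis after the last pooling step.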

In [5]:
in_channels = 1 # 1 for greyscale, 3 for RGB
out_channels = 2 # None for same number of output channels as input
start_filters = 16 # Number of filters in first layer, doubles for each layer
num_layers = 3 
num_per_layer = 2 # Number of convolutional blocks per layer
use_skip_connections = True # Whether to use skip connections (True by default)
dtype = torch.float32 # summary looks a little odd for complex dtypes, but the model works fine
dropout = 0 # Dropout probability, default 0
use_batchnorm = False # Whether to use batch normalization, default False
final_activation = "identity" # Activation function for output layer, default nn.Identity()

model = CNN3d(
    in_channels=in_channels,
    out_channels=out_channels,
    start_filters=start_filters,
    num_layers=num_layers,
    num_per_layer=num_per_layer,
    use_skip_connections=use_skip_connections,
    dtype=dtype,
    dropout=dropout,
    use_batchnorm=use_batchnorm,
    final_activation=final_activation,
)

batch_size = 5
image_shape = (8, 256, 256)
input = torch.randn(batch_size, in_channels, *image_shape, dtype=dtype)
output = model(input)

print(f"Model type: {type(model)}")
print(f"Input shape (batch_size, channels, depth, height, width): {input.shape}")
print(f"-> Output shape (batch_size, channels, depth, height, width): {output.shape}")

summary(model, input_data=input)

Model type: <class 'quantem.core.ml.cnn.CNN3d'>
Input shape (batch_size, channels, depth, height, width): torch.Size([5, 1, 8, 256, 256])
-> Output shape (batch_size, channels, depth, height, width): torch.Size([5, 2, 8, 256, 256])


Layer (type:depth-idx)                   Output Shape              Param #
CNN3d                                    [5, 2, 8, 256, 256]       --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv3dBlock: 2-1                  [5, 16, 8, 256, 256]      --
│    │    └─Sequential: 3-1              [5, 16, 8, 256, 256]      7,376
├─MaxPool3d: 1-2                         [5, 16, 4, 128, 128]      --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv3dBlock: 2-2                  [5, 32, 4, 128, 128]      --
│    │    └─Sequential: 3-2              [5, 32, 4, 128, 128]      41,536
├─MaxPool3d: 1-4                         [5, 32, 2, 64, 64]        --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv3dBlock: 2-3                  [5, 64, 2, 64, 64]        --
│    │    └─Sequential: 3-3              [5, 64, 2, 64, 64]        166,016
├─MaxPool3d: 1-6                         [5, 6

## CNN -> Dense

3D input `(BATCH_SIZE, channels, height, width)` -> 1D output `(BATCH_SIZE, output_dim)`

Unlike a fully convolutional CNN (as demonstrated above), this model only works for a single image size, which must be set at initialization.
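
The reason is generic to any CNN followed by dense layers: the first dense layer has a fixed input width, which depends on the image size. The toy model below illustrates this with plain `torch.nn` layers. It is a stand-in for illustration, not the layers `CNNDense` actually uses.

```python
import torch
from torch import nn

image_shape = (128, 128)

# Toy stand-in for a CNN -> dense model (not the quantEM CNNDense layers)
features = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
)
# The dense layer's input width is fixed by the image size chosen here
flat_dim = 4 * (image_shape[0] // 2) * (image_shape[1] // 2)
toy = nn.Sequential(features, nn.Linear(flat_dim, 10))

out = toy(torch.randn(5, 1, *image_shape))
print(tuple(out.shape))  # (5, 10)

try:
    toy(torch.randn(5, 1, 64, 64))  # a different image size
    size_mismatch = False
except RuntimeError:
    # the flattened features no longer match the Linear layer's input width
    size_mismatch = True
print(size_mismatch)  # True
```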

In [6]:
in_channels = 1 # 1 for greyscale, 3 for RGB
output_dim = 10
image_shape = (128, 128) 
start_filters = 16 # Number of filters in first layer, doubles for each layer
cnn_num_layers = 4
cnn_num_per_layer = 2 # Number of convolutional blocks per layer
dense_num_layers = 2
dense_hidden_size = 128 # could instead specify dense_hidden_dims, a list of layer sizes
dtype = torch.float32 
dropout = 0 # Dropout probability, default 0
activation = 'relu'
final_activation = "identity" # Activation function for output layer, default nn.Identity()
use_batchnorm = False # Whether to use batch normalization, default False

model = CNNDense(
    in_channels=in_channels,
    output_dim=output_dim,
    image_shape=image_shape,
    start_filters=start_filters,
    cnn_num_layers=cnn_num_layers,
    cnn_num_per_layer=cnn_num_per_layer,
    dense_num_layers=dense_num_layers,
    dense_hidden_size=dense_hidden_size,
    dtype=dtype,
    dropout=dropout,
    activation=activation,
    final_activation=final_activation,
    use_batchnorm=use_batchnorm,
)

batch_size = 5
input = torch.randn(batch_size, in_channels, *image_shape, dtype=dtype)
output = model(input)

print(f"Model type: {type(model)}")
print(f"Input shape (batch_size, channels, height, width): {input.shape}")
print(f"-> Output shape (batch_size, output_dim): {output.shape}")

summary(model, input_data=input)

Model type: <class 'quantem.core.ml.cnn_dense.CNNDense'>
Input shape (batch_size, channels, height, width): torch.Size([5, 1, 128, 128])
-> Output shape (batch_size, output_dim): torch.Size([5, 10])


Layer (type:depth-idx)                   Output Shape              Param #
CNNDense                                 [5, 10]                   --
├─ModuleList: 1-7                        --                        (recursive)
│    └─Conv2dBlock: 2-1                  [5, 16, 128, 128]         --
│    │    └─Sequential: 3-1              [5, 16, 128, 128]         2,480
├─MaxPool2d: 1-2                         [5, 16, 64, 64]           --
├─ModuleList: 1-7                        --                        (recursive)
│    └─Conv2dBlock: 2-2                  [5, 32, 64, 64]           --
│    │    └─Sequential: 3-2              [5, 32, 64, 64]           13,888
├─MaxPool2d: 1-4                         [5, 32, 32, 32]           --
├─ModuleList: 1-7                        --                        (recursive)
│    └─Conv2dBlock: 2-3                  [5, 64, 32, 32]           --
│    │    └─Sequential: 3-3              [5, 64, 32, 32]           55,424
├─MaxPool2d: 1-6                         [5, 64

# Implicit Neural Representation networks (INR)

A neural network that learns to represent a continuous function by mapping coordinates to values.

- Input: Coordinates `(N, in_features)` → Output: Values at those coordinates `(N, out_features)`

- We primarily use `HSiren` networks, but there are many types of INRs. 
- INRs are basically dense networks with special activation functions
    - `Siren` type networks use sine activations throughout
    - `HSiren` uses hyperbolic sine (`sinh`) in the first layer - better for complex-valued data and certain function classes

- `HSiren` has a simple architecture, with a few additional parameters that rarely need to be changed but can be tuned for optimal performance:
    - `first_omega_0`, `hidden_omega_0`: Control frequency of sine activations (higher = more high-frequency details)
    - `alpha`: Weight initialization scaling factor (affects training dynamics)
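
To make the role of the frequency parameters concrete, here is a minimal sketch of a generic SIREN-style layer, `sin(omega_0 * (W x + b))`. The class name `ToySineLayer` is hypothetical. This is not quantEM's `SineLayer`: the weight initialization (including `alpha`) and the hyperbolic first layer of `HSiren` are omitted.

```python
import torch
from torch import nn


class ToySineLayer(nn.Module):
    """Hypothetical minimal sine layer: y = sin(omega_0 * (W x + b))."""

    def __init__(self, in_features, out_features, omega_0=30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # A larger omega_0 makes the output oscillate faster with the input
        return torch.sin(self.omega_0 * self.linear(x))


coords = torch.rand(100, 3)  # (N, in_features)
layer = ToySineLayer(in_features=3, out_features=256, omega_0=30.0)
values = layer(coords)       # (N, 256), bounded in [-1, 1]
print(tuple(values.shape))   # (100, 256)
```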


In [7]:
in_features = 3  # Dimensionality of input coordinates (3 for 3D: z, y, x; 2 for 2D: y, x)
out_features = 1  # Dimensionality of output (1 for scalar field, 3 for vector field, etc.)
hidden_layers = 3  # Number of hidden layers
hidden_features = 256  # Number of features in each hidden layer
first_omega_0 = 30.0  # Frequency scaling for first layer (default: 30)
hidden_omega_0 = 30.0  # Frequency scaling for hidden layers (default: 30)
alpha = 1.0  # Weight initialization scaling factor (default: 1.0)
dtype = torch.complex64  # Can be torch.float32 or torch.complex64
final_activation = "identity"  # Final activation function

model = HSiren(
    in_features=in_features,
    out_features=out_features,
    hidden_layers=hidden_layers,
    hidden_features=hidden_features,
    first_omega_0=first_omega_0,
    hidden_omega_0=hidden_omega_0,
    alpha=alpha,
    dtype=dtype,
    final_activation=final_activation,
)

# Helper method for creating an equispaced grid of coordinates
# The grid dimensionality must match in_features
bounds = ((0, 1), (0, 1), (0, 1))  # Volume bounds (min, max) for each dimension
sampling = (0.1, 0.1, 0.1)  # Sampling interval for each dimension
input = model.make_equispaced_grid(bounds, sampling)  # Returns shape (N, in_features)

# The model can be evaluated at any arbitrary coordinates
output = model(input)

print(f"Model type: {type(model)}")
print(f"Input shape (N, in_features): {input.shape}")
print(f"-> Output shape (N, out_features): {output.shape}")

summary(model, input_data=input)

Model type: <class 'quantem.core.ml.inr.HSiren'>
Input shape (N, in_features): torch.Size([1331, 3])
-> Output shape (N, out_features): torch.Size([1331, 1])


Layer (type:depth-idx)                   Output Shape              Param #
HSiren                                   [1331, 1]                 --
├─Sequential: 1-1                        [1331, 1]                 --
│    └─SineLayer: 2-1                    [1331, 256]               --
│    │    └─Linear: 3-1                  [1331, 256]               1,024
│    └─SineLayer: 2-2                    [1331, 256]               --
│    │    └─Linear: 3-2                  [1331, 256]               65,792
│    └─SineLayer: 2-3                    [1331, 256]               --
│    │    └─Linear: 3-3                  [1331, 256]               65,792
│    └─SineLayer: 2-4                    [1331, 256]               --
│    │    └─Linear: 3-4                  [1331, 256]               65,792
│    └─Linear: 2-5                       [1331, 1]                 257
│    └─Identity: 2-6                     [1331, 1]                 --
Total params: 198,657
Trainable params: 198,657
Non-trainable params:
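
The numbers in the output above can be reproduced by hand. The sketch below assumes the equispaced grid includes both ends of each interval, which is consistent with `N = 1331 = 11**3`, and that each layer is a standard linear layer with a bias.

```python
import math

bounds = ((0, 1), (0, 1), (0, 1))
sampling = (0.1, 0.1, 0.1)

# Points per axis, assuming both ends of each interval are included
points_per_axis = [
    round((hi - lo) / step) + 1 for (lo, hi), step in zip(bounds, sampling)
]
n_coords = math.prod(points_per_axis)


def linear_params(n_in, n_out):
    """Parameters of a linear layer: weights (n_in * n_out) plus biases (n_out)."""
    return n_in * n_out + n_out


in_features, hidden_features, hidden_layers, out_features = 3, 256, 3, 1
total = (
    linear_params(in_features, hidden_features)                        # first layer
    + hidden_layers * linear_params(hidden_features, hidden_features)  # hidden layers
    + linear_params(hidden_features, out_features)                     # output layer
)
print(points_per_axis, n_coords, total)
# [11, 11, 11] 1331 198657
```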

# Convolutional Autoencoder

Convolutional autoencoder for dimensionality reduction and feature extraction from images. The model encodes an image into a lower-dimensional latent representation, then decodes it back to reconstruct the original image.

- 3D input `(BATCH_SIZE, channels, height, width)` -> 1D latent `(BATCH_SIZE, latent_dim)` -> 3D output `(BATCH_SIZE, channels, height, width)`

- Latent normalization options: Control the structure of the latent space for different clustering algorithms
  - `"none"`: Raw latent vectors
  - `"l2"`: Unit hypersphere (good for DBSCAN/K-means)
  - `"layer"`: Gaussian-like distribution (good for GMM)
  - `"tanh"`: Bounded in [-1, 1]
  
- Forward pass returns both the output and the latent space representation: `output, latent = model(input)`
- Can encode/decode separately: 
    - `latent = model.encode(input)` 
    - `output = model.decode(latent)`
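
For intuition, the sketch below applies one plausible version of each normalization to a random latent batch using standard PyTorch functions. These are assumed stand-ins for illustration. The quantEM implementation may differ in detail.

```python
import torch
import torch.nn.functional as F

latent = torch.randn(5, 64)  # (BATCH_SIZE, latent_dim)

# Assumed stand-ins for each option, not the quantEM code
normalized = {
    "none": latent,                                        # raw latent vectors
    "l2": F.normalize(latent, p=2, dim=1),                 # unit-length rows
    "layer": F.layer_norm(latent, (latent.shape[1],)),     # zero mean, unit variance per row
    "tanh": torch.tanh(latent),                            # values bounded by 1 in magnitude
}

print(normalized["l2"].norm(dim=1))     # all close to 1
print(normalized["layer"].mean(dim=1))  # all close to 0
print(float(normalized["tanh"].abs().max()) <= 1.0)
```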

In [None]:
input_channels = 1  # 1 for a greyscale image, 3 for RGB, 4 for RGBA, etc.
image_shape = (256, 256)  # shape of the image input (same as output) (height, width)
shape = (input_channels, *image_shape)
start_filters = 16
latent_dim = 64
latent_normalization = "l2" # "none", "l2", "layer", "tanh"
num_layers = 3
num_per_layer = 2
dtype = torch.float32
activation = "relu"
final_activation = "relu"


model = ConvAutoencoder2d(
    input_size=image_shape,
    start_filters=start_filters,
    latent_dim=latent_dim,
    latent_normalization=latent_normalization,
    num_layers=num_layers,
    num_per_layer=num_per_layer,
    dtype=dtype,
    activation=activation,
    final_activation=final_activation,
)

batch_size = 5
input_shape = (batch_size, *shape)
input = torch.rand(*input_shape, dtype=dtype)

output, latent_repr = model(input)
## the forward pass is equivalent to:
## latent = model.encode(input)
## output = model.decode(latent)
## return output, latent

print(f"Model type: {type(model)}")
print(f"Input shape (batch_size, channels, height, width): {input.shape}")
print(f"-> latent shape (batch_size, latent_dim): {latent_repr.shape}")
print(f"-> Output shape (batch_size, channels, height, width): {output.shape}")

summary(model, input_data=input)

Model type: <class 'quantem.core.ml.conv_autoencoder.ConvAutoencoder2d'>
Input shape (batch_size, channels, height, width): torch.Size([5, 1, 256, 256])
-> latent shape (batch_size, latent_dim): torch.Size([5, 64])
-> Output shape (batch_size, channels, height, width): torch.Size([5, 1, 256, 256])


Layer (type:depth-idx)                   Output Shape              Param #
ConvAutoencoder2d                        [5, 1, 256, 256]          --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv2dBlock: 2-1                  [5, 16, 256, 256]         --
│    │    └─Sequential: 3-1              [5, 16, 256, 256]         2,544
├─MaxPool2d: 1-2                         [5, 16, 128, 128]         --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv2dBlock: 2-2                  [5, 32, 128, 128]         --
│    │    └─Sequential: 3-2              [5, 32, 128, 128]         14,016
├─MaxPool2d: 1-4                         [5, 32, 64, 64]           --
├─ModuleList: 1-5                        --                        (recursive)
│    └─Conv2dBlock: 2-3                  [5, 64, 64, 64]           --
│    │    └─Sequential: 3-3              [5, 64, 64, 64]           55,680
├─MaxPool2d: 1-6                         [5, 64