# TCN model implemented in PyTorch

[Wiese et al., Quant GANs: Deep Generation of Financial Time Series, 2019](https://arxiv.org/abs/1907.06673)

For both the generator and the discriminator we used TCNs with skip connections. Inside the TCN architecture temporal blocks were used as block modules. A temporal block consists of two dilated causal convolutions and two PReLUs (He et al., 2015) as activation functions. The primary benefit of using temporal blocks is to make the TCN more expressive by increasing the number of non-linear operations in each block module. A complete definition is given below.

**Definition B.1 (Temporal block)**. Let $N_I, N_H, N_O \in \mathbb{N}$ denote the input, hidden and output dimensions and let $D, K \in \mathbb{N}$ denote the dilation and the kernel size. Furthermore, let $w_1, w_2$ be two dilated causal convolutional layers with arguments $(N_I, N_H, K, D)$ and $(N_H, N_O, K, D)$ respectively, and let $\phi_1, \phi_2 : \mathbb{R} \to \mathbb{R}$ be two PReLUs. The function $f : \mathbb{R}^{N_I \times (2D(K-1)+1)} \to \mathbb{R}^{N_O}$ defined by
$$f(X) = \phi_2 \circ w_2 \circ \phi_1 \circ w_1(X)$$
is called a temporal block with arguments $(N_I, N_H, N_O, K, D)$.
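The input width $2D(K-1)+1$ in the definition follows from the fact that each unpadded dilated causal convolution shortens the sequence by $D(K-1)$ steps, so two of them need $2D(K-1)+1$ inputs to produce a single output. The following is a minimal single-channel sketch in plain Python, not the implementation used in this document; the helper names `dilated_causal_conv` and `prelu`, the example weights, and the fixed slope of 0.25 (in a real PReLU the slope is learned) are assumptions chosen for illustration.

```python
def dilated_causal_conv(x, weights, dilation):
    """Unpadded dilated causal convolution on a 1-D list.

    The output at time t combines x[t], x[t - dilation], ...,
    x[t - (K - 1) * dilation], so it never reads future values.
    """
    k = len(weights)
    span = dilation * (k - 1)  # number of past steps each output needs
    return [
        sum(weights[j] * x[t - j * dilation] for j in range(k))
        for t in range(span, len(x))
    ]


def prelu(x, slope=0.25):
    """PReLU with a fixed (not learned) negative slope, for illustration."""
    return [v if v >= 0 else slope * v for v in x]


K, D = 2, 4
width = 2 * D * (K - 1) + 1          # input width from Definition B.1
x = [float(i) for i in range(width)]

h = prelu(dilated_causal_conv(x, [0.5, -1.0], D))  # phi_1 after w_1
y = prelu(dilated_causal_conv(h, [1.0, 1.0], D))   # phi_2 after w_2

print(width, len(h), len(y))
```

With $K = 2$ and $D = 4$ the input has nine time steps, the hidden sequence five, and the block output exactly one.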

The TCN architecture used for the generator and the discriminator in the pure TCN and C-SVNN models is illustrated in Table 3. Table 4 shows the input, hidden and output dimensions of the different models. Here, G abbreviates the generator and D the discriminator. Note that for all models except the generator of the C-SVNN, the hidden dimension was set to 80. The kernel size of each temporal block, except the first one, was two. Each TCN modeled a receptive field size (RFS) of 127.
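The receptive field size of 127 can be checked from the kernel sizes and dilations listed in Table 3: each temporal block contains two convolutions, and each of those extends the receptive field by $D(K-1)$ steps. A short sketch of that arithmetic follows; the function name `receptive_field_size` is hypothetical.

```python
# (kernel_size, dilation) for temporal blocks 1-7, as listed in Table 3
blocks = [(1, 1), (2, 1), (2, 2), (2, 4), (2, 8), (2, 16), (2, 32)]


def receptive_field_size(blocks):
    """Receptive field of stacked temporal blocks with two convolutions each."""
    rfs = 1  # a single output step sees at least its own time step
    for kernel_size, dilation in blocks:
        rfs += 2 * dilation * (kernel_size - 1)
    return rfs


rfs = receptive_field_size(blocks)
print(rfs)  # 127
```

The final 1 x 1 convolution has kernel size one and therefore does not widen the receptive field.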

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-ik58{background-color:#2f2f2f;border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<h3>Table 3</h3>
<table class="tg">
<thead>
  <tr>
    <th class="tg-ik58">Module Name</th>
    <th class="tg-ik58">Arguments</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0pky">Temporal block 1</td>
    <td class="tg-0pky">(N<sub>I</sub>, N<sub>H</sub>, N<sub>H</sub>, 1, 1)</td>
  </tr>
  <tr>
    <td class="tg-0pky">Temporal block 2</td>
    <td class="tg-0pky">(N<sub>H</sub>, N<sub>H</sub>, N<sub>H</sub>, 2, 1)</td>
  </tr>
  <tr>
    <td class="tg-0pky">Temporal block 3</td>
    <td class="tg-0pky">(N<sub>H</sub>, N<sub>H</sub>, N<sub>H</sub>, 2, 2)</td>
  </tr>
  <tr>
    <td class="tg-0pky">Temporal block 4</td>
    <td class="tg-0pky">(N<sub>H</sub>, N<sub>H</sub>, N<sub>H</sub>, 2, 4)</td>
  </tr>
  <tr>
    <td class="tg-0pky">Temporal block 5</td>
    <td class="tg-0pky">(N<sub>H</sub>, N<sub>H</sub>, N<sub>H</sub>, 2, 8)</td>
  </tr>
  <tr>
    <td class="tg-0pky">Temporal block 6</td>
    <td class="tg-0pky">(N<sub>H</sub>, N<sub>H</sub>, N<sub>H</sub>, 2, 16)</td>
  </tr>
  <tr>
    <td class="tg-0pky">Temporal block 7</td>
    <td class="tg-0pky">(N<sub>H</sub>, N<sub>H</sub>, N<sub>H</sub>, 2, 32)</td>
  </tr>
  <tr>
    <td class="tg-0pky">1 x 1 Convolution</td>
    <td class="tg-0pky">(N<sub>H</sub>, N<sub>O</sub>, 1, 1)</td>
  </tr>
</tbody>
</table>

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;background-color:#2f2f2f;}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<h3>Table 4</h3>
<table class="tg">
<thead>
  <tr>
    <th class="tg-0lax">Models</th>
    <th class="tg-0lax">Pure TCN-G</th>
    <th class="tg-0lax">Pure TCN-D</th>
    <th class="tg-0lax">C-SVNN-G</th>
    <th class="tg-0lax">C-SVNN-D</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0lax">N<sub>I</sub></td>
    <td class="tg-0lax">3</td>
    <td class="tg-0lax">1</td>
    <td class="tg-0lax">3</td>
    <td class="tg-0lax">1</td>
  </tr>
  <tr>
    <td class="tg-0lax">N<sub>H</sub></td>
    <td class="tg-0lax">80</td>
    <td class="tg-0lax">80</td>
    <td class="tg-0lax">50</td>
    <td class="tg-0lax">80</td>
  </tr>
  <tr>
    <td class="tg-0lax">N<sub>O</sub></td>
    <td class="tg-0lax">1</td>
    <td class="tg-0lax">1</td>
    <td class="tg-0lax">2</td>
    <td class="tg-0lax">1</td>
  </tr>
</tbody>
</table>
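
As a rough illustration of how Table 4 maps onto the `TCN(input_size, output_size, n_hidden)` constructor defined in the code that follows, the dimensions can be collected in a small lookup table. The names `TABLE_4` and `tcn_kwargs` are hypothetical and not part of the original code.

```python
# Dimensions from Table 4 as (N_I, N_H, N_O) per model; names are illustrative
TABLE_4 = {
    "Pure TCN-G": (3, 80, 1),
    "Pure TCN-D": (1, 80, 1),
    "C-SVNN-G": (3, 50, 2),
    "C-SVNN-D": (1, 80, 1),
}


def tcn_kwargs(model_name):
    """Translate one column of Table 4 into keyword arguments for TCN."""
    n_in, n_hidden, n_out = TABLE_4[model_name]
    return {"input_size": n_in, "output_size": n_out, "n_hidden": n_hidden}


kwargs = tcn_kwargs("C-SVNN-G")
print(kwargs)
```

Note that the `Generator` wrapper below passes only `input_size` and `output_size` to `TCN`, so it always uses the default `n_hidden=80`; the C-SVNN generator's hidden dimension of 50 would have to be passed to `TCN` directly.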

In [1]:
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm


class TemporalBlock(nn.Module):
    """A Temporal Block module for Temporal Convolutional Networks (TCNs).

    This block consists of two 1D convolutional layers with optional downsampling
    and residual connections. It is designed to capture temporal dependencies in
    sequential data.

    Args:
        n_inputs (int): Number of input channels (features) to the block.
        n_hidden (int): Number of hidden units (channels) in the intermediate layer.
        n_outputs (int): Number of output channels (features) from the block.
        kernel_size (int): Size of the convolutional kernel along the temporal axis.
        dilation (int): Dilation factor for the convolutional layers. Controls the
                       spacing between kernel elements to capture long-range dependencies.

    Returns:
        torch.Tensor: Output tensor of shape (batch_size, n_outputs, sequence_length).
    """
    def __init__(self, n_inputs, n_hidden, n_outputs, kernel_size, dilation):
        super(TemporalBlock, self).__init__()

        # Left-only zero padding keeps each convolution causal (no access to
        # future time steps) while preserving the sequence length;
        # padding='same' would also pad on the right and leak future values
        self.pad = nn.ConstantPad1d((dilation * (kernel_size - 1), 0), 0.0)

        # First dilated convolutional layer
        self.conv1 = nn.Conv1d(
            in_channels=n_inputs,
            out_channels=n_hidden,
            kernel_size=kernel_size,
            stride=1,
            dilation=dilation
        )

        # Activation function after the first convolution
        self.relu1 = nn.PReLU()

        # Second dilated convolutional layer
        self.conv2 = nn.Conv1d(
            in_channels=n_hidden,
            out_channels=n_outputs,
            kernel_size=kernel_size,
            stride=1,
            dilation=dilation
        )

        # Activation function after the second convolution
        self.relu2 = nn.PReLU()

        # Main network: pad -> conv -> PReLU, applied twice
        self.net = nn.Sequential(
            self.pad, self.conv1, self.relu1,
            self.pad, self.conv2, self.relu2
        )

        # Downsample layer (used if input and output channels differ)
        self.downsample = nn.Conv1d(n_inputs, n_outputs, 1) if n_inputs != n_outputs else None

        # Initialize weights
        self.init_weights()

    def init_weights(self):
        """Initialize weights for convolutional layers.

        Weights are initialized using a normal distribution with mean 0 and standard
        deviation 0.01. This helps stabilize training and avoid vanishing/exploding gradients.
        """
        self.conv1.weight.data.normal_(0, 0.01)
        self.conv2.weight.data.normal_(0, 0.01)
        if self.downsample is not None:
            self.downsample.weight.data.normal_(0, 0.01)

    def forward(self, x):
        out = self.net(x)
        # Apply downsampling if necessary (to match input and output dimensions)
        res = x if self.downsample is None else self.downsample(x)

        return out + res


class TCN(nn.Module):
    """
    A Temporal Convolutional Network (TCN) for sequence modeling.

    This implementation stacks multiple causal convolutional blocks 
    (TemporalBlock) in a sequential manner. Each TemporalBlock can include 
    convolutions with increasing dilation factors to capture a broader 
    context in the time dimension.

    Args
    ----------
    input_size : int
        Number of channels (features) in the input sequence.
    output_size : int
        Number of output channels (features) for the final layer.
    n_hidden : int, optional
        Number of hidden channels used in each TemporalBlock. Default is 80.
    """
    def __init__(self, input_size, output_size, n_hidden=80):
        super(TCN, self).__init__()
        layers = []
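        for i in range(7):
            # Block 1 maps input_size -> n_hidden with kernel size 1;
            # blocks 2-7 use kernel size 2 and dilations 1, 2, 4, 8, 16, 32 (Table 3)
            num_inputs = input_size if i == 0 else n_hidden
            kernel_size = 1 if i == 0 else 2
            dilation = 1 if i < 2 else 2 ** (i - 1)
            layers.append(TemporalBlock(num_inputs, n_hidden, n_hidden, kernel_size, dilation))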
        self.conv = nn.Conv1d(n_hidden, output_size, 1)
        self.net = nn.Sequential(*layers)
        self.init_weights()

    def init_weights(self):
        self.conv.weight.data.normal_(0, 0.01)
    
    def forward(self, x):
        # x shape: (batch_size, seq_len, input_channels)
        y1 = self.net(x.transpose(1, 2))  # Now shape: (batch_size, n_hidden, seq_len)
        return self.conv(y1).transpose(1, 2)  # Final shape: (batch_size, seq_len, output_channels)


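class Generator(nn.Module):
    """Generator: causal temporal convolutional network with skip connections.

    Maps an input sequence of shape (batch_size, seq_len, input_size) to a
    sequence of shape (batch_size, seq_len, output_size) with values in (-1, 1).
    The pure TCN generator of Table 4 uses input_size=3 and output_size=1.
    """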
    def __init__(self, input_size, output_size):
        super(Generator, self).__init__()
        self.net = TCN(input_size, output_size)

    def forward(self, x):
        return torch.tanh(self.net(x))


class Discriminator(nn.Module):