The Processor is the core that learns temporal dynamics and produces a coarse forecast for up to 72 hours.

For simplicity, we assume the model is autoregressive: it takes the current state (at time t0) and iteratively predicts the next 72 hourly states. The encoder (ViT) has already given us an encoding of the current state.

We feed this encoding into the ConvLSTM as the initial input. The ConvLSTM outputs a prediction for the next time step, which we loop back as input for the following step, and so on, generating forecasts up to +72h. This approach mimics how NWP uses the current analysis to step the model forward in time.

 ------------------------------------------------------------------------------------------------------------------------------------
ConvLSTM Implementation: PyTorch does not have a built-in ConvLSTM in the nn module, so we implement it ourselves. A single ConvLSTM cell can be coded as:

In [None]:
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    def __init__(self, input_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.hidden_channels = hidden_channels
        # One convolution over [input, hidden] produces all four gates at once.
        # padding=kernel_size // 2 preserves H and W for odd kernel sizes.
        self.conv = nn.Conv2d(input_channels + hidden_channels, 4 * hidden_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x, h_prev=None, c_prev=None):
        # x: input tensor (B, input_ch, H, W)
        # h_prev, c_prev: previous hidden and cell state (B, hidden_ch, H, W)
        B, _, H, W = x.shape
        # initialize with zeros if not provided
        if h_prev is None:
            h_prev = x.new_zeros(B, self.hidden_channels, H, W)
        if c_prev is None:
            c_prev = x.new_zeros(B, self.hidden_channels, H, W)
        combined = torch.cat([x, h_prev], dim=1)  # concatenate along channel
        gates = self.conv(combined)               # shape (B, 4*hidden_ch, H, W)
        # Split gates along the channel dimension
        hc = self.hidden_channels
        input_gate = torch.sigmoid(gates[:, :hc])
        forget_gate = torch.sigmoid(gates[:, hc:2 * hc])
        output_gate = torch.sigmoid(gates[:, 2 * hc:3 * hc])
        candidate = torch.tanh(gates[:, 3 * hc:4 * hc])
        # Standard LSTM state update, applied per pixel
        c_new = forget_gate * c_prev + input_gate * candidate
        h_new = output_gate * torch.tanh(c_new)
        return h_new, c_new
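
Written out, the gate computations in the cell above take the following form. The symbols are our own notation rather than anything fixed by the code: $x_t$ is the input, $h_{t-1}$ and $c_{t-1}$ are the previous hidden and cell states, $[\cdot,\cdot]$ is channel-wise concatenation, $*$ is a 2D convolution, $\odot$ is the element-wise product, and $W_i, W_f, W_o, W_g$ (with biases $b_i, b_f, b_o, b_g$) are the four channel slices of the single `self.conv` kernel. Note that this variant has no peephole terms on the cell state.

$$
\begin{aligned}
i_t &= \sigma\left(W_i * [x_t, h_{t-1}] + b_i\right) \\
f_t &= \sigma\left(W_f * [x_t, h_{t-1}] + b_f\right) \\
o_t &= \sigma\left(W_o * [x_t, h_{t-1}] + b_o\right) \\
g_t &= \tanh\left(W_g * [x_t, h_{t-1}] + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$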

Using this cell, we can build a multi-layer ConvLSTM (stacking cells, where each layer's hidden state feeds into the next). For forecasting, a decoder loop runs the ConvLSTM step by step; the version below uses a single cell.

In [None]:
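class ConvLSTMForecast(nn.Module):
    def __init__(self, input_ch, hidden_ch):
        super().__init__()
        # The hidden state is fed back as the next input, so the channel counts must match
        if input_ch != hidden_ch:
            raise ValueError("input_ch must equal hidden_ch for the autoregressive loop")
        self.cell = ConvLSTMCell(input_ch, hidden_ch)

    def forward(self, init_input, steps=72):
        # init_input: (B, input_ch, H, W) at t0
        # steps: number of hourly steps to roll out (72 covers the +72h horizon)
        h, c = None, None  # will be initialized in cell
        x = init_input
        outputs = []
        for t in range(steps):
            h, c = self.cell(x, h, c)
            outputs.append(h)  # use hidden state as output prediction
            x = h              # feed output as next input (autoregressive)
        # Stack once, after the loop has finished
        outputs = torch.stack(outputs, dim=1)  # (B, steps, hidden_ch, H, W)
        return outputs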
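
The multi-layer stacking mentioned above is not implemented in this section, so here is a minimal sketch of one way it could look. Everything new in it is an assumption for illustration: the class name `StackedConvLSTMForecast`, the `num_layers` argument, and the 1x1 convolution `head` that maps the top layer's hidden state back to the input channel count so the loop can feed it in again. The cell is repeated in compact form only so the block runs on its own.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Compact copy of the cell defined earlier, repeated for self-containment."""

    def __init__(self, input_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.hidden_channels = hidden_channels
        self.conv = nn.Conv2d(input_channels + hidden_channels, 4 * hidden_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x, h_prev=None, c_prev=None):
        B, _, H, W = x.shape
        if h_prev is None:
            h_prev = x.new_zeros(B, self.hidden_channels, H, W)
        if c_prev is None:
            c_prev = x.new_zeros(B, self.hidden_channels, H, W)
        gates = self.conv(torch.cat([x, h_prev], dim=1))
        # Same gate order as the original cell: input, forget, output, candidate
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        c_new = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, c_new


class StackedConvLSTMForecast(nn.Module):  # hypothetical name
    def __init__(self, input_ch, hidden_ch, num_layers=2):
        super().__init__()
        # Layer 0 sees the input; deeper layers see the hidden state of the layer below
        self.cells = nn.ModuleList(
            [ConvLSTMCell(input_ch if i == 0 else hidden_ch, hidden_ch)
             for i in range(num_layers)]
        )
        # Assumption: a 1x1 conv projects the top hidden state back to input_ch channels
        self.head = nn.Conv2d(hidden_ch, input_ch, kernel_size=1)

    def forward(self, init_input, steps=72):
        states = [(None, None)] * len(self.cells)  # one (h, c) pair per layer
        x = init_input
        outputs = []
        for _ in range(steps):
            layer_in = x
            for i, cell in enumerate(self.cells):
                h, c = cell(layer_in, *states[i])
                states[i] = (h, c)
                layer_in = h          # hidden state feeds the next layer
            x = self.head(layer_in)   # prediction, fed back as the next input
            outputs.append(x)
        return torch.stack(outputs, dim=1)  # (B, steps, input_ch, H, W)


# Shape check on random data (4 steps instead of 72 to keep it quick)
model = StackedConvLSTMForecast(input_ch=8, hidden_ch=16, num_layers=2)
z0 = torch.randn(2, 8, 12, 12)
with torch.no_grad():
    forecast = model(z0, steps=4)
print(forecast.shape)  # torch.Size([2, 4, 8, 12, 12])
```

With the projection head, `hidden_ch` no longer has to equal `input_ch`, which is the main practical difference from feeding the hidden state back directly.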