# Echo state networks

https://www.geeksforgeeks.org/echo-state-network-an-overview/

https://link.springer.com/chapter/10.1007/11840817_86

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import numpy as np

In [2]:

# Define ESN class
class ESN(nn.Module):
    def __init__(self, input_size, reservoir_size, output_size, spectral_radius=0.9):
        super(ESN, self).__init__()
        self.input_size = input_size
        self.reservoir_size = reservoir_size
        self.output_size = output_size

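        # Initialize input and reservoir weights from a standard normal distribution.
        # As nn.Parameter they are returned by parameters(), so the optimizer
        # defined later updates them along with the readout.
        self.Win = nn.Parameter(torch.randn(reservoir_size, input_size))
        self.W = nn.Parameter(torch.randn(reservoir_size, reservoir_size))

        # Scale W so that its spectral radius (largest absolute eigenvalue)
        # equals spectral_radius
        self.W.data *= spectral_radius / torch.max(torch.abs(torch.linalg.eigvals(self.W)))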

        # Output layer
        self.Wout = nn.Linear(reservoir_size, output_size)

    def forward(self, input_data, initial_state=None):
        if initial_state is None:
            state = torch.zeros((input_data.size(0), self.reservoir_size)).to(input_data.device)
        else:
            state = initial_state

        # Single state update; with the default zero state the recurrent term is zero
        state = torch.tanh(torch.matmul(input_data, self.Win.t()) + torch.matmul(state, self.W.t()))

        output = self.Wout(state)
        return output

In [3]:
# Load MNIST data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
trainset = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

In [4]:
# Instantiate ESN
input_size = 28 * 28  # MNIST image size
reservoir_size = 1000  # Size of reservoir
output_size = 10  # Number of classes
esn = ESN(input_size, reservoir_size, output_size)

In [5]:
# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(esn.parameters(), lr=0.001)

In [6]:
# Training loop
for epoch in range(3):  # note: the log shown after this cell reaches epoch 5, so it comes from a longer run
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs = inputs.view(-1, 28 * 28)  # Flatten images

        optimizer.zero_grad()

        # Forward pass
        outputs = esn(inputs)

        # Calculate loss
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 100 == 99:  # Print every 100 mini-batches
            print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0

print("Finished Training")

[1,   100] loss: 0.796
[1,   200] loss: 0.404
[1,   300] loss: 0.351
[1,   400] loss: 0.303
[1,   500] loss: 0.281
[1,   600] loss: 0.275
[1,   700] loss: 0.272
[1,   800] loss: 0.257
[1,   900] loss: 0.247
[2,   100] loss: 0.207
[2,   200] loss: 0.221
[2,   300] loss: 0.212
[2,   400] loss: 0.221
[2,   500] loss: 0.203
[2,   600] loss: 0.211
[2,   700] loss: 0.206
[2,   800] loss: 0.193
[2,   900] loss: 0.200
[3,   100] loss: 0.179
[3,   200] loss: 0.164
[3,   300] loss: 0.183
[3,   400] loss: 0.172
[3,   500] loss: 0.180
[3,   600] loss: 0.168
[3,   700] loss: 0.168
[3,   800] loss: 0.187
[3,   900] loss: 0.169
[4,   100] loss: 0.140
[4,   200] loss: 0.155
[4,   300] loss: 0.135
[4,   400] loss: 0.145
[4,   500] loss: 0.151
[4,   600] loss: 0.162
[4,   700] loss: 0.141
[4,   800] loss: 0.149
[4,   900] loss: 0.147
[5,   100] loss: 0.118
[5,   200] loss: 0.137
[5,   300] loss: 0.122
[5,   400] loss: 0.128
[5,   500] loss: 0.132
[5,   600] loss: 0.125
[5,   700] loss: 0.132
[5,   800] 

In [7]:
# Test the network
testset = torchvision.datasets.MNIST(root="./data", train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False)

In [8]:

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.view(-1, 28 * 28)
        outputs = esn(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

In [9]:
print("Accuracy of the network on the 10000 test images: %d %%" % (100 * correct / total))

Accuracy of the network on the 10000 test images: 94 %


Let's break down how the Echo State Network (ESN) works line by line:

1. **Class Definition**: 
    - `class ESN(nn.Module):`: This line defines a Python class named `ESN`, which is a subclass of `nn.Module` from PyTorch. This means that `ESN` is a neural network module that can be trained using PyTorch's functionalities.

2. **Initialization Method (`__init__`)**:
    - `def __init__(self, input_size, reservoir_size, output_size, spectral_radius=0.9):`: This is the initialization method of the class `ESN`. It takes input parameters such as `input_size`, `reservoir_size`, `output_size`, and `spectral_radius` (with a default value of 0.9). These parameters define the architecture and properties of the ESN.

3. **Initialization of Parameters**:
    - `super(ESN, self).__init__()`: This line calls the constructor of the superclass `nn.Module`.
    - `self.input_size = input_size`: Stores the input size for later use.
    - `self.reservoir_size = reservoir_size`: Stores the reservoir size (number of reservoir neurons).
    - `self.Win = nn.Parameter(torch.randn(reservoir_size, input_size))`: Initializes the input weights (`Win`) from a standard Gaussian distribution. Wrapping them in `nn.Parameter` makes them trainable, so the optimizer updates them; a classic ESN would keep them fixed.
    - `self.W = nn.Parameter(torch.randn(reservoir_size, reservoir_size))`: Initializes the reservoir weights (`W`) in the same way, with the same consequence: they are trained here, whereas a classic ESN leaves the reservoir untouched.
    
4. **Scaling the Reservoir Weights**:
    - `self.W.data *= spectral_radius / torch.max(torch.abs(torch.linalg.eigvals(self.W)))`: This line scales the reservoir weights (`W`) so that the spectral radius (the largest absolute eigenvalue) equals the `spectral_radius` parameter. `torch.linalg.eigvals` returns the (complex) eigenvalues, `torch.abs` takes their magnitudes, and the resulting ratio rescales every entry of `W` by the same factor. This controls how strongly past states influence the next state.

5. **Output Layer**:
    - `self.Wout = nn.Linear(reservoir_size, output_size)`: Defines a linear transformation (`nn.Linear`) to map the reservoir state to the output space. This will be used for prediction.

6. **Forward Method**:
    - `def forward(self, input_data, initial_state=None):`: Defines the forward pass method of the ESN, which computes the output of the network given an input.
    - `if initial_state is None:`: Checks if an initial state is provided. If not, initializes the state to zeros.
    - `else:`: Handles the case where an initial state is provided.
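    - `state = torch.tanh(torch.matmul(input_data, self.Win.t()) + torch.matmul(state, self.W.t()))`: This line computes the new state of the reservoir neurons using the input data, the input weights (`Win`), the current state, and the reservoir weights (`W`). It applies the hyperbolic tangent activation function (`torch.tanh`) to the sum of the input and recurrent activations. The update is applied once per call, and the training and test loops never pass `initial_state`, so in this notebook the state starts at zero and the recurrent term contributes nothing to the result.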
    - `output = self.Wout(state)`: Computes the output by passing the reservoir state through the output layer (`Wout`), which is a linear transformation.

This breakdown provides an overview of how an Echo State Network (ESN) is implemented and how it processes input data to produce an output.
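
The class above registers `Win` and `W` as parameters, so they are trained together with the readout. For comparison, here is a minimal sketch of a variant that keeps the reservoir fixed, as in a classic ESN. The class name `FrozenESN` and the use of `register_buffer` are assumptions made for illustration; they are not part of the original notebook, and the accuracy reported above was not obtained with this variant.

```python
import torch
import torch.nn as nn


class FrozenESN(nn.Module):
    """Hypothetical variant of the ESN class with a fixed (untrained) reservoir."""

    def __init__(self, input_size, reservoir_size, output_size, spectral_radius=0.9):
        super().__init__()
        self.reservoir_size = reservoir_size

        Win = torch.randn(reservoir_size, input_size)
        W = torch.randn(reservoir_size, reservoir_size)
        # Rescale W so that its largest absolute eigenvalue equals spectral_radius.
        W = W * (spectral_radius / torch.linalg.eigvals(W).abs().max())

        # Buffers follow .to(device) and are stored in state_dict, but they are
        # not returned by parameters(), so an optimizer never updates them.
        self.register_buffer("Win", Win)
        self.register_buffer("W", W)

        # Only the linear readout is trainable.
        self.Wout = nn.Linear(reservoir_size, output_size)

    def forward(self, input_data, initial_state=None):
        if initial_state is None:
            state = torch.zeros(
                input_data.size(0), self.reservoir_size, device=input_data.device
            )
        else:
            state = initial_state
        state = torch.tanh(input_data @ self.Win.t() + state @ self.W.t())
        return self.Wout(state)


model = FrozenESN(input_size=784, reservoir_size=100, output_size=10)
trainable = [name for name, _ in model.named_parameters()]
print(trainable)  # ['Wout.weight', 'Wout.bias']
```

With this variant, `optim.Adam(model.parameters(), ...)` would only see the readout weights and bias.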

### Scaling the reservoir weights 
Scaling the reservoir weights controls the network's dynamics and keeps them stable. The main reasons are:

1. **Spectral Radius Control**: The spectral radius of the reservoir weight matrix sets how quickly the influence of past states grows or decays under the recurrent update. If it is too large, perturbations are amplified and the reservoir can become unstable or chaotic. If it is too small, the state forgets earlier inputs within a few steps, so the reservoir has little memory.

2. **Echo State Property**: The Echo State Property (ESP) is a key characteristic of ESNs. It states that the effect of the initial state on the reservoir state fades over time, so that the state eventually depends only on the input history. Keeping the spectral radius below 1 is the usual rule of thumb for obtaining this property; it is a heuristic rather than a guarantee.

3. **Avoiding Saturation**: Large reservoir weights push the pre-activations far from zero, where `tanh` saturates near ±1. Saturated units barely respond to changes in the input, and their gradients are close to zero, which slows any gradient-based training.

4. **Stability**: With the spectral radius in a moderate range, the recurrent term tends not to grow from one step to the next, so units stay in the responsive part of `tanh` rather than being driven to its limits.

Overall, scaling the reservoir weights is the main handle on an ESN's dynamics: it trades off memory against stability and helps maintain the Echo State Property.
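
The rescaling discussed in this section can be checked numerically. The sketch below uses a hypothetical helper, `scale_to_spectral_radius`, that is not part of the notebook; it applies the same ratio as the `ESN` constructor and then recomputes the spectral radius to confirm that it matches the target.

```python
import torch


def scale_to_spectral_radius(W, target):
    """Return a copy of W rescaled so that its largest |eigenvalue| equals target."""
    radius = torch.linalg.eigvals(W).abs().max()
    return W * (target / radius)


torch.manual_seed(0)
W = torch.randn(200, 200, dtype=torch.float64)
W_scaled = scale_to_spectral_radius(W, 0.9)

radius_before = torch.linalg.eigvals(W).abs().max().item()
radius_after = torch.linalg.eigvals(W_scaled).abs().max().item()
print(f"before: {radius_before:.3f}, after: {radius_after:.3f}")
```

Multiplying a matrix by a scalar multiplies every eigenvalue by that scalar, which is why a single division is enough. An unscaled Gaussian matrix of this size has a spectral radius far above 1, so the rescaling step matters in practice.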

### The calculation of the state in the forward method

```python
state = torch.tanh(torch.matmul(input_data, self.Win.t()) + torch.matmul(state, self.W.t()))
```

1. **Input Data**: `input_data` represents the input to the ESN. In the context of image classification, `input_data` typically consists of flattened images, where each row corresponds to a single image.

2. **Input Weights (Win)**: `self.Win` represents the input weights of the ESN. It defines the connections from the input to the reservoir neurons and has shape `(reservoir_size, input_size)`. Since `input_data` has shape `(batch, input_size)`, multiplying by the transpose `self.Win.t()`, of shape `(input_size, reservoir_size)`, yields a `(batch, reservoir_size)` result: one row of reservoir inputs per sample.

3. **Reservoir Weights (W)**: `self.W` represents the reservoir weights of the ESN. These weights define the recurrent connections between reservoir neurons and form a square `(reservoir_size, reservoir_size)` matrix. The current state has shape `(batch, reservoir_size)`, so it is multiplied by `self.W.t()` for the same reason: samples are stored as rows, and the transpose applies `W` to each row.

4. **Activation Function (Tanh)**: The hyperbolic tangent function (`torch.tanh`) is applied element-wise to the sum of the input activations (`torch.matmul(input_data, self.Win.t())`) and the recurrent activations (`torch.matmul(state, self.W.t())`). This sum represents the total input to each reservoir neuron.

5. **State Update**: The result of applying the hyperbolic tangent function is the new state of the reservoir neurons. This updated state (`state`) is computed for each sample in the input batch. It represents the activation levels of the reservoir neurons after processing the input data.

In summary, the calculation of the state in the forward method involves computing the total input to each reservoir neuron by combining the input data with the current state through the input and reservoir weights. This total input is then passed through the hyperbolic tangent activation function to produce the updated state of the reservoir neurons. This process is performed for each sample in the input batch.
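
The notebook applies this update once per image. To illustrate how the same formula behaves as a recurrence, the sketch below feeds a sequence to the reservoir one step at a time. The function `run_reservoir` and the idea of treating each of the 28 image rows as a time step are assumptions made for illustration; they are not what the notebook's training loop does.

```python
import torch


def run_reservoir(sequence, Win, W, initial_state=None):
    """Apply the state update once per time step.

    sequence: tensor of shape (batch, time, input_size).
    Returns the final state, shape (batch, reservoir_size).
    """
    batch, steps, _ = sequence.shape
    if initial_state is None:
        state = torch.zeros(batch, W.shape[0], dtype=sequence.dtype)
    else:
        state = initial_state
    for t in range(steps):
        # Same formula as in ESN.forward, applied to the input at step t.
        state = torch.tanh(sequence[:, t] @ Win.t() + state @ W.t())
    return state


torch.manual_seed(0)
Win = torch.randn(50, 28)
W = torch.randn(50, 50)
W = W * (0.9 / torch.linalg.eigvals(W).abs().max())

images = torch.randn(4, 28, 28)  # random stand-in for a batch of MNIST images
final_state = run_reservoir(images, Win, W)  # each image row is one time step
print(final_state.shape)  # torch.Size([4, 50])
```

With a sequence of length 1 and no initial state, this reduces to `tanh(x @ Win.t())`, which is exactly what the notebook's single-step `forward` computes.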

### The recurrent connections and the Echo State Property

The recurrent connections and the Echo State Property are fundamental aspects of the network's architecture and behavior. Let's break down where these occur and how the Echo State Property is established:

1. **Recurrent Connections**:
   - Recurrent connections are established through the reservoir weights (`self.W`). These weights define the connections between reservoir neurons, allowing information to propagate and persist over time within the network.
   - In the forward method of the ESN (`forward(self, input_data, initial_state=None)`), the recurrent connections are applied when computing the new state of the reservoir neurons:
     ```python
     state = torch.tanh(torch.matmul(input_data, self.Win.t()) + torch.matmul(state, self.W.t()))
     ```
   - Here, `torch.matmul(state, self.W.t())` represents the recurrent connections. It computes the contribution of the current state to the next state by multiplying the current state (`state`) by the transpose of the reservoir weights (`self.W.t()`).

2. **Echo State Property**:
   - The Echo State Property (ESP) is a key characteristic of ESNs, ensuring that the effect of the initial state on the network's output diminishes over time, and only the input history matters.
   - The ESP is a property of the reservoir dynamics, that is, of the recurrent weights and the inputs that drive them. It does not come from training.
   - Random initialization of the reservoir weights gives each neuron a different mix of input and recurrent signals, which contributes to the network's rich dynamics.
   - In a classic ESN the recurrent connections (`self.W`) stay fixed after initialization, creating a dynamic reservoir that retains and processes information over time. In this notebook `self.W` is an `nn.Parameter`, so the optimizer does update it.
   - During training, a classic ESN fits only the linear readout layer (`self.Wout`) to map reservoir states to the desired outputs, while the recurrent dynamics of the reservoir provide temporal representations of the input history.
   - When the ESP holds, the reservoir state is determined by the input sequence alone, which is what lets the readout use it for prediction or classification.

In summary, recurrent connections are established through the reservoir weights, and the Echo State Property arises from suitably scaled recurrent weights driven by the input, which lets the reservoir process temporal information with stable yet rich dynamics.
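
The fading influence of the initial state can be observed directly. The sketch below drives two copies of a small reservoir with the same input sequence from two different initial states and measures how far apart the states are before and after. One assumption differs from the notebook: `W` is scaled by its largest singular value instead of its spectral radius. Because `tanh` is 1-Lipschitz, a largest singular value below 1 makes every update a contraction, which is a sufficient condition for the states to converge. All sizes and names here are illustrative.

```python
import torch

torch.manual_seed(0)
N, T = 50, 200  # reservoir size and number of time steps (illustrative values)

Win = torch.randn(N, 3, dtype=torch.float64)
W = torch.randn(N, N, dtype=torch.float64)
# Assumption: scale by the spectral norm (largest singular value), not the
# spectral radius, so that each update provably shrinks the gap by at least 0.9.
W = W * (0.9 / torch.linalg.matrix_norm(W, ord=2))

inputs = torch.randn(T, 3, dtype=torch.float64)  # one shared input sequence
state_a = torch.randn(N, dtype=torch.float64)    # two different initial states
state_b = torch.randn(N, dtype=torch.float64)

initial_gap = torch.linalg.vector_norm(state_a - state_b).item()
for u in inputs:
    state_a = torch.tanh(Win @ u + W @ state_a)
    state_b = torch.tanh(Win @ u + W @ state_b)
final_gap = torch.linalg.vector_norm(state_a - state_b).item()

print(f"gap before: {initial_gap:.3e}, after {T} steps: {final_gap:.3e}")
```

After enough steps the two trajectories are numerically indistinguishable: the reservoir state reflects the inputs it has seen, not where it started.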

### Features extracted by an Echo State Network (ESN)

When an Echo State Network (ESN) is used as a feature extractor, the features are the activations of the reservoir neurons in response to the input. These features are not explicitly defined by humans: in a classic ESN they come from fixed random weights, while in this notebook the weights are also adjusted by backpropagation. Let's discuss the nature of these features and how they can be interpreted by humans and by downstream models:

1. **Nature of Features**:
   - The features extracted by the ESN are abstract representations of the input data that capture relevant information for the task at hand (e.g., classification, prediction).
   - Since the reservoir neurons have nonlinear activation functions and recurrent connections, the features produced by the ESN can be complex and nonlinear transformations of the input data.
   - These features are typically distributed representations, meaning that each feature (neuron activation) may encode information from multiple input dimensions.
   - In a classic ESN the reservoir features are not learned at all: they are fixed random nonlinear projections of the input and its history, and labels are used only to fit the readout.

2. **Interpretation by Humans**:
   - Humans may find it challenging to interpret the features learned by the ESN directly, especially in high-dimensional spaces.
   - However, visualization techniques such as dimensionality reduction (e.g., t-SNE, PCA) can be used to project the high-dimensional feature space into a lower-dimensional space for visualization and interpretation.
   - Interpretation of the features often relies on understanding which input patterns or characteristics are encoded by specific features. This can be inferred by analyzing the patterns of activation across different input samples.

3. **Interpretation by Downstream Models**:
   - Downstream models (e.g., feedforward neural networks, support vector machines) can use the features extracted by the ESN for tasks such as classification or regression.
   - These models treat the features extracted by the ESN as input features and learn to map them to the target outputs through supervised learning.
   - They can learn complex decision boundaries or relationships between the extracted features and the target outputs, building on the nonlinear expansion the reservoir provides.

In summary, when using an ESN as a feature extractor, the features are abstract nonlinear representations of the input data. They may be difficult for humans to interpret directly, but downstream models can use them for various tasks, and they can perform better than raw inputs when the nonlinear expansion makes the targets easier to separate.
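
As a concrete illustration of the feature-extractor view, the sketch below computes fixed random `tanh` features and fits a linear readout on top of them with ridge regression in closed form, which is a common way to train ESN readouts. It is a sketch under stated assumptions, not the notebook's method: the notebook trains with Adam and cross-entropy, and the synthetic data, the labeling rule, the `1/sqrt(input_size)` scaling of `Win`, and the regularization strength `lam` are all chosen here for illustration.

```python
import torch

torch.manual_seed(0)
n_samples, input_size, reservoir_size, n_classes = 256, 20, 100, 2

# Synthetic stand-in data (hypothetical; not MNIST): the label depends on the
# sign of the first input dimension.
X = torch.randn(n_samples, input_size, dtype=torch.float64)
y = (X[:, 0] > 0).long()
Y = torch.nn.functional.one_hot(y, n_classes).to(torch.float64)

# Fixed random projection: these weights are never updated.
Win = torch.randn(reservoir_size, input_size, dtype=torch.float64) / input_size ** 0.5
H = torch.tanh(X @ Win.t())  # features, shape (n_samples, reservoir_size)

# Ridge regression readout: solve (H^T H + lam * I) Wout = H^T Y.
lam = 1e-2
A = H.t() @ H + lam * torch.eye(reservoir_size, dtype=torch.float64)
Wout = torch.linalg.solve(A, H.t() @ Y)  # shape (reservoir_size, n_classes)

predicted = (H @ Wout).argmax(dim=1)
train_accuracy = (predicted == y).double().mean().item()
print(f"training accuracy: {train_accuracy:.3f}")
```

Only `Wout` is fitted here, and fitting it requires a single linear solve instead of an iterative optimizer. The features in `H` could equally be passed to any other downstream model.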