In [1]:
import torch
import torchvision
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.functional import relu, softmax
from scipy.signal import convolve2d

# Data loading and Transforms

## Transforms:
* ToTensor(): converts a PIL image with pixel values in [0, 255] to a float `torch.Tensor` of shape (C, H, W) with values in [0, 1].
* Normalize(): normalizes each channel as `(x - mean) / std` with the specified mean and standard deviation (here 0.1 and 0.1).
* Lambda(): applies an arbitrary function to the images; here `torch.flatten` turns each 1×28×28 image into a 784-element vector.
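The three transforms can be mimicked with plain tensor operations to see what each one does to a single image. This is a sketch, not torchvision itself: `emulate_transform` is a hypothetical helper that assumes a 28×28 `uint8` input and the same mean and std (0.1, 0.1) used in `data()`.

```python
import torch

def emulate_transform(img_uint8: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper that mimics the Compose pipeline in data()
    # ToTensor: uint8 (H, W) in [0, 255] -> float (1, H, W) in [0.0, 1.0]
    x = img_uint8.float().div(255.0).unsqueeze(0)
    # Normalize((0.1,), (0.1,)): (x - mean) / std for the single channel
    x = (x - 0.1) / 0.1
    # Lambda(torch.flatten): (1, 28, 28) -> (784,)
    return torch.flatten(x)

img = torch.zeros(28, 28, dtype=torch.uint8)  # synthetic black image
img[0, 0] = 255                                # one white pixel
out = emulate_transform(img)
print(out.shape)                  # torch.Size([784])
print(out[0].item(), out[1].item())  # white pixel maps to about 9, black to -1
```

With mean and std both set to 0.1, black pixels become -1 and white pixels become about 9, so the inputs are not centred around zero; the later L2 normalization in each layer removes the overall scale anyway.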

## Data Loading:

In this section, the training and test sets of the MNIST dataset are loaded by the **data**() function, which wraps each of them in a DataLoader with the given batch size.

In [2]:
torch.manual_seed(1)
def data(batch_size):
  transform = torchvision.transforms.Compose([transforms.ToTensor(),
                                              transforms.Normalize((0.1,),(0.1,)),
                                              transforms.Lambda(lambda x: torch.flatten(x))])
  train_data = MNIST(root = "./", train= True, transform = transform ,download=True)
  test_data = MNIST(root = "./", train= False, transform = transform ,download=False)

  train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
  test_loader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=True)
  return train_loader, test_loader


device = "cuda" if torch.cuda.is_available() else "cpu"


# Mask
The Mask function generates a binary mask for an image: it creates a random bit image, blurs it repeatedly (5 to 9 times) with a 3×3 filter (based on FFA13 Paper), and then thresholds the result at 0.5. The function returns the mask flattened to a row vector of shape (1, 784).
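The same procedure can also be written with PyTorch alone, avoiding the round trip through SciPy. This is a sketch under stated assumptions: `mask_torch` is a hypothetical name, the kernel is built as the outer product of [1/4, 1/2, 1/4] with itself (which gives exactly the `blur_filter` values), and one pixel of replicate padding is used because, for a 3×3 kernel, it reproduces SciPy's `boundary='symm'`.

```python
import torch
import torch.nn.functional as F

def mask_torch(size=28, generator=None):
    # Hypothetical torch-only variant of Mask(): random bit image of shape (1, 1, size, size)
    img = torch.randint(0, 2, (1, 1, size, size), generator=generator).float()
    # 3x3 blur kernel: outer product of [1/4, 1/2, 1/4] with itself (same values as blur_filter)
    k1d = torch.tensor([0.25, 0.5, 0.25])
    kernel = torch.outer(k1d, k1d).view(1, 1, 3, 3)
    # Same number of blurring passes as Mask(): 5 to 9
    n_iter = int(torch.randint(5, 10, (1,), generator=generator))
    for _ in range(n_iter):
        # Replicate-padding by one pixel matches SciPy's boundary='symm' for a 3x3 kernel
        img = F.conv2d(F.pad(img, (1, 1, 1, 1), mode="replicate"), kernel)
    # Threshold at 0.5 and flatten to shape (1, size * size)
    return (img > 0.5).float().view(1, -1)

g = torch.Generator().manual_seed(0)
m = mask_torch(generator=g)
print(m.shape, m.mean().item())  # mask shape and fraction of ones
```

Because the kernel is symmetric, `F.conv2d` (a cross-correlation) and `convolve2d` (a true convolution) give the same result up to floating-point rounding.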

In [3]:
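def Mask():
    # Number of blurring passes: a random integer in [5, 10)
    random_iter = int(torch.randint(5, 10, size=(1,)))
    # Random bit image of shape (28, 28), as a NumPy array for SciPy
    random_image = torch.randint(2, size=(1, 28, 28)).squeeze().float().numpy()
    # 3x3 blur kernel: outer product of [1/4, 1/2, 1/4] with itself
    blur_filter = (torch.tensor([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16).numpy()
    for _ in range(random_iter):
        random_image = convolve2d(random_image, blur_filter, mode='same', boundary='symm')
    # Threshold at 0.5 to get a binary mask, flattened to shape (1, 784)
    mask = torch.from_numpy(random_image > 0.50).float()
    return mask.view(1, -1)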

# Learning Class

The Learning class is a custom neural network layer that implements the forward-forward algorithm. The class has two main methods: forward() and learn().

## Forward Method

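The forward() method computes the output of the layer for a given input. Each input vector is first normalized to unit L2 length, then multiplied by the weight matrix and shifted by the bias (`x_ @ self.w.T + self.b`), and finally passed through a ReLU activation function.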




```
def forward(self, x):
  x_norm = torch.norm(x, 2, 1, keepdim=True)
  x_ = x/(x_norm + self.eps)
  return relu((x_ @ self.w.T) + self.b.unsqueeze(dim = 0))
```



`x_norm` is the L2 norm of each row of `x` (one value per sample), and `self.eps` (epsilon) is a small constant that avoids division by zero.
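The effect of the normalization can be checked directly: after dividing by `x_norm + eps`, every input row has length close to 1, so the layer sees only the direction of its input, not its magnitude. The forward-forward paper motivates this step as removing the goodness information of the previous layer, so that the next layer has to rely on the relative activities. The sketch below uses toy random tensors, not trained weights.

```python
import torch
from torch.nn.functional import relu

torch.manual_seed(0)
eps = 1e-4
x = 5.0 * torch.randn(4, 784)                  # 4 toy inputs with an arbitrary scale
x_norm = torch.norm(x, 2, 1, keepdim=True)     # shape (4, 1): L2 norm of each row
x_ = x / (x_norm + eps)                        # every row now has length ~1
print(x_.norm(dim=1))

w = torch.randn(1000, 784)                     # same (out_features, in_features) layout as Learning.w
b = torch.randn(1000)
y = relu(x_ @ w.T + b.unsqueeze(dim=0))        # shape (4, 1000), all entries >= 0
print(y.shape)
```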

## Learn Method

The learn() method trains the layer's weights and biases using the forward-forward algorithm. The method takes two input tensors: x_pos and x_neg, representing positive and negative examples, respectively. The method performs the following steps for each epoch:

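*   Compute the goodness of the positive and negative examples, defined as the mean of the squared layer outputs, $g = \frac{1}{d}\sum_{j=1}^{d} y_j^2$, where $y$ is the output of the layer and $d$ its number of units.
*   Concatenate the terms $(\text{threshold} - g_{pos})$ and $(g_{neg} - \text{threshold})$ and apply the softplus loss to each element:
$$ \log\left(1 + e^{\,\text{threshold} - g_{pos}}\right), \qquad \log\left(1 + e^{\,g_{neg} - \text{threshold}}\right) $$
*   Average these per-example losses over all positive and negative examples to obtain a scalar loss (this is a mean of softplus terms, not a mean squared error).

*   Zero the gradients of the layer's parameters and run the backward pass to compute the derivatives of the loss.

*   Update the layer's parameters using the optimizer.

*   Print the current epoch and loss value.

After the last epoch, the method returns the layer's outputs for the positive and negative examples, detached from the computation graph, so that they can serve as inputs to the next layer.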
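The goodness and loss computations in these steps can be reproduced on small toy tensors (values chosen only for illustration). The sketch also compares the literal form `log(1 + exp(z))` with `torch.nn.functional.softplus`, which computes the same function but stays finite for large inputs.

```python
import torch
import torch.nn.functional as F

threshold = 2.0
# Toy layer outputs: 2 positive and 2 negative samples, 3 units each
h_pos = torch.tensor([[2.0, 1.0, 0.0], [3.0, 0.0, 1.0]])
h_neg = torch.tensor([[0.5, 0.0, 0.0], [1.0, 1.0, 0.0]])

g_pos = h_pos.pow(2).mean(dim=1)   # goodness = mean of squared activities
g_neg = h_neg.pow(2).mean(dim=1)

z = torch.cat([threshold - g_pos, g_neg - threshold])
loss_naive = torch.log(1 + torch.exp(z)).mean()   # log(1 + e^z), as in the loss formula
loss_stable = F.softplus(z).mean()                # same function, numerically stable
print(g_pos, g_neg, loss_naive.item(), loss_stable.item())

# For large z the naive form overflows in float32, softplus does not
big = torch.tensor(100.0)
print(torch.log(1 + torch.exp(big)).item(), F.softplus(big).item())  # inf 100.0
```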



---
# Answer the question
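The necessary conditions on goodness for positive and negative data are as follows (goodness is a mean of squared ReLU outputs, so it is never negative; the reference point is the threshold, not zero):

*   For positive data, goodness must be above the threshold ($g_{pos} > \text{threshold}$).

*   For negative data, goodness must be below the threshold ($g_{neg} < \text{threshold}$).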


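The loss function defined above satisfies these conditions through the softplus function $\log(1 + e^{z})$, which is close to 0 for large negative $z$ and grows approximately like $z$ for large positive $z$. For positive data, $z = \text{threshold} - g_{pos}$, so the loss is small when $g_{pos}$ is well above the threshold and large when it is below. For negative data, $z = g_{neg} - \text{threshold}$, so the loss is small only when $g_{neg}$ is well below the threshold. Minimizing the loss therefore pushes positive goodness above the threshold and negative goodness below it.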
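The same argument can be written as a short derivation. Here $\sigma(z) = 1/(1+e^{-z})$ is the logistic sigmoid and $p(g)$ is the probability that a sample with goodness $g$ is positive; both are notation introduced for this sketch, mirroring the logistic formulation of the forward-forward paper (which uses a sum of squares instead of the mean used here). Each loss term turns out to be a binary cross-entropy with logit $g - \text{threshold}$:

$$
\begin{aligned}
p(g) &= \sigma(g - \text{threshold}) \\
-\log p(g_{pos}) &= \log\left(1 + e^{\,\text{threshold} - g_{pos}}\right) \\
-\log\left(1 - p(g_{neg})\right) &= \log\left(1 + e^{\,g_{neg} - \text{threshold}}\right) \\
\frac{\partial}{\partial g_{pos}} \log\left(1 + e^{\,\text{threshold} - g_{pos}}\right) &= -\sigma(\text{threshold} - g_{pos}) \\
\frac{\partial}{\partial g_{neg}} \log\left(1 + e^{\,g_{neg} - \text{threshold}}\right) &= \sigma(g_{neg} - \text{threshold})
\end{aligned}
$$

Gradient descent therefore raises $g_{pos}$ with a gradient of magnitude $\sigma(\text{threshold} - g_{pos})$, close to 1 while $g_{pos}$ is far below the threshold and close to 0 once it is well above it; symmetrically, it lowers $g_{neg}$ until it falls below the threshold.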

In [4]:
class Learning(nn.Module):
    def __init__(self, in_features, out_features, num_epochs = 10 ,threshold = 2.0, lr =0.01):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.b = torch.nn.Parameter(torch.randn(out_features))
        self.lr = lr
        self.optimizer = optim.Adam(self.parameters(), lr = self.lr)
        self.threshold = threshold
        self.num_epochs = num_epochs
        self.eps = 1e-4


    def forward(self, x):
        x_norm = torch.norm(x, 2, 1, keepdim=True)
        x_ = x/(x_norm + self.eps)
        return relu((x_ @ self.w.T) + self.b.unsqueeze(dim = 0))

    def learn(self, x_pos, x_neg):
        for epoch in range(self.num_epochs):
            g_pos = torch.pow(self.forward(x_pos), 2).mean(dim = 1)
            g_neg = torch.pow(self.forward(x_neg),2).mean(dim = 1)

            # Softplus loss log(1 + exp(z)); F.softplus computes it without overflow for large z
            loss = F.softplus(torch.cat([self.threshold - g_pos,
                                         g_neg - self.threshold])).mean()
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

            print(f"{epoch+1}/{self.num_epochs} --- Loss: {loss.item():.4f}")

        return  (
            self.forward(x_pos).detach(),
              self.forward(x_neg).detach()
              )


# FF class

The FF class defines the forward-forward model for the problem. It stacks layers of the **Learning** class described above, with the number of neurons in each layer given by `hidden_dims`.

It implements the following methods:

* Generate_data( ): generates hybrid (negative) data by combining the input batch with a shuffled copy of itself. A binary mask from the **Mask** function keeps the pixels of the original images where the mask is 1, and its complement (`1 - mask`) keeps the pixels of the shuffled images where the mask is 0.
* predict( ): passes the input through every layer and concatenates the activations of all layers into a single representation vector per sample. These representations are the inputs of the linear classifier.
* learn( ): takes positive and negative data and trains the layers greedily, one after another, using each layer's own learn() method; the detached outputs of each layer become the inputs of the next one.
* inference( ): trains a **LinearClassifier** on the representations of the training data and returns its train and test accuracy.
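The hybrid construction behind Generate_data can be illustrated on a random stand-in batch. In this sketch the mask is a simple random binary vector rather than the blurred mask from **Mask**, purely to keep the example self-contained; the checks confirm that every pixel of a hybrid comes either from the original image (mask = 1) or from its shuffled partner (mask = 0).

```python
import torch

torch.manual_seed(0)
batch = torch.rand(8, 784)                      # stand-in for a flattened MNIST batch
mask = (torch.rand(1, 784) > 0.5).float()       # simple random mask; the notebook uses Mask()
perm = torch.randperm(batch.shape[0])           # partner index for each image (may occasionally be itself)
hybrid = batch * mask + batch[perm] * (1 - mask)

keep = mask[0] == 1
# Pixels where mask == 1 come from the original image, the rest from the partner
assert torch.equal(hybrid[:, keep], batch[:, keep])
assert torch.equal(hybrid[:, ~keep], batch[perm][:, ~keep])
print(hybrid.shape)  # torch.Size([8, 784])
```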

In [5]:
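class FF(nn.Module):
    def __init__(self, hidden_dims, device = 'cpu', num_epochs = 10, lr = 0.01, threshold = 2.0):
        super().__init__()
        self.layers = nn.ModuleList()
        self.hidden_dims = hidden_dims  # kept so that inference() can size the linear classifier
        self.num_epochs = num_epochs
        self.device = device
        self.lr = lr
        self.num_class = 10
        for i in range(len(hidden_dims)-1):
            self.layers.append(Learning(hidden_dims[i], hidden_dims[i+1], num_epochs = num_epochs, lr = self.lr, threshold = threshold).to(self.device))
        print(f"---- Device: {device} ----")

    def Generate_data(self, x):
        # Accept either a batch tensor or an (images, labels) pair from the DataLoader
        if isinstance(x, (tuple, list)):
            x = x[0]
        # Pair every image with another image of the same batch
        indexes = torch.randperm(x.shape[0])
        x1 = x
        x2 = x[indexes]
        # One random binary mask of shape (1, 784), shared by the whole batch
        mask = Mask()
        x1 = x1*mask
        x2 = x2*(1-mask)
        # Hybrid image: pixels from x1 where mask == 1, from x2 where mask == 0
        hybrid_x = x1+x2
        return hybrid_x

    @torch.no_grad()
    def predict(self, x):
        # Concatenate the activations of all layers into one representation vector per sample
        representations = []
        a = x
        for layer in self.layers:
            a = layer(a)
            representations.append(a)
        return torch.cat(representations, dim = 1)

    def learn(self, x_pos, x_neg):
        # Greedy layer-wise training: the outputs of each layer feed the next layer
        a_pos, a_neg = x_pos, x_neg
        for inx in range(len(self.layers)):
            print(f" Layer {inx +1 } : ")
            a_pos, a_neg = self.layers[inx].learn(a_pos, a_neg)

    def inference(self, train_data, test_data):
        # Train a linear classifier on the FF representations, then report train/test accuracy
        Linear_Classifier = LinearClassifier(self, self.hidden_dims, train_data.batch_size,
                                             device=self.device, num_epochs=10, lr=self.lr)
        Linear_Classifier.learn(train_data)

        return (
            Linear_Classifier.accuracy(train_data),
            Linear_Classifier.accuracy(test_data)
        )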



# LinearClassifier class

forward( ): standardizes the input representations (subtracting the mean and dividing by the standard deviation computed over the whole batch), passes them through the linear layer, and returns the resulting class scores (logits).

predict( ): flattens the input images, computes their representation vectors with the FF model, applies the forward pass, and returns the predicted class labels (the argmax of the softmax output).

learn( ): trains the classifier for a specified number of epochs. For each batch it computes the representations, calculates the cross-entropy loss against one-hot targets, updates the parameters with the Adam optimizer, and records the average loss of each epoch.

accuracy( ): evaluates the classifier on the given data: it predicts a label for every sample, counts the correct predictions, and returns the fraction of correct predictions.
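A linear probe is usually trained with the feature extractor frozen, so that the accuracy reflects the quality of the features rather than further training of the extractor. The sketch below shows that pattern on synthetic data: only the linear layer's parameters are passed to Adam, and the features are computed under `torch.no_grad()`. The small `features` network is a hypothetical stand-in for the FF layers, and integer class labels are used, which `nn.CrossEntropyLoss` accepts as well as the one-hot targets used above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
features = nn.Sequential(nn.Linear(20, 50), nn.ReLU())   # hypothetical frozen extractor (stands in for FF)
probe = nn.Linear(50, 3)                                  # linear classifier on top
opt = torch.optim.Adam(probe.parameters(), lr=0.01)       # only the probe is optimized
criterion = nn.CrossEntropyLoss()

x = torch.randn(64, 20)
y = torch.randint(0, 3, (64,))                            # integer class labels
before = [p.clone() for p in features.parameters()]

for _ in range(20):
    with torch.no_grad():                                 # no graph through the extractor
        reps = features(x)
    reps = (reps - reps.mean()) / reps.std()              # batch standardization, as in LinearClassifier.forward
    loss = criterion(probe(reps), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The extractor is untouched; only the probe changed
print(all(torch.equal(a, b) for a, b in zip(before, features.parameters())), loss.item())
```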

In [6]:
class LinearClassifier(nn.Module):
    def __init__(self, model, hidden_dims, batch_size, device = 'cpu', num_epochs = 10, lr = 0.01):
        super().__init__()
        self.num_class = 10
        self.num_epochs = num_epochs
        self.device = device
        self.model = model
        # Input size = total number of units over all FF layers (length of the representation vector)
        self.layer = nn.Linear(sum(hidden_dims[1:]), self.num_class)
        # Optimize only the linear layer so that the FF layers stay frozen
        self.opt = optim.Adam(self.layer.parameters(), lr=lr)
        self.criterion = nn.CrossEntropyLoss()
        self.to(device)

    def forward(self, x):
        # Standardize with the mean and std of the whole batch
        x = (x - x.mean()) / x.std()
        x = self.layer(x).float()
        return x

    def predict(self, x):
        x = x.view(-1, 784).to(self.device)
        representations = self.model.predict(x)
        y_h = self.forward(representations)
        soft_out = softmax(y_h, dim=1)
        return soft_out.argmax(dim=1)

    def learn(self, data):
        print()
        print("Linear Classifier:")
        losses = []
        for epoch in range(self.num_epochs):
            epoch_losses = []
            for x, y in data:
                x = x.view(-1, 784).to(self.device)
                y = F.one_hot(y, 10).float().to(self.device)
                representations = self.model.predict(x)
                y_h = self.forward(representations)
                loss = self.criterion(y_h, y)
                self.opt.zero_grad()
                loss.backward()
                self.opt.step()
                # Store a Python float so the computation graph is not kept alive
                epoch_losses.append(loss.item())
            losses.append(sum(epoch_losses) / len(epoch_losses))
            print(f"{epoch+1}/{self.num_epochs} ---- loss: {losses[epoch]:.4f}")

    @torch.no_grad()
    def accuracy(self, data):
        n_correct = 0
        n_total = 0
        for inputs, targets in data:
            inputs, targets = inputs.to(self.device), targets.to(self.device)
            predictions = self.predict(inputs)
            n_correct += (predictions == targets).sum().item()
            n_total += targets.shape[0]
        return n_correct / n_total


# Main


*   Set the hyperparameters
*   Train the model
*   Evaluate the model



In [7]:
num_epochs = 30
lr = 0.005
threshold = 2.5
input_shape = 1*28*28
hidden_dims = [input_shape, 1000, 1000]
batch_size = 1024
train_loader, test_loader = data(batch_size)
model = FF(hidden_dims ,device=device,num_epochs=num_epochs, lr = lr, threshold=threshold)

c = 0
for x,_ in train_loader:
    c+=1
    print(f"batch [{c}]")
    x_pos = x
    x_neg = model.Generate_data(x)
    x_pos, x_neg = x_pos.view(-1,input_shape).to(device), x_neg.view(-1,input_shape).to(device)
    model.learn(x_pos,x_neg)

train_acc, test_acc = model.inference(train_loader, test_loader)

print("--------------------------------------------------")
print()

print(f"Train Accuracy: {100*train_acc:.2f} %, Test Accuracy: {100*test_acc:.2f} %")



---- Device: cuda ----
batch [1]
 Layer 1 : 
1/30 --- Loss: 1.6661
2/30 --- Loss: 1.6018
3/30 --- Loss: 1.5348
4/30 --- Loss: 1.4652
5/30 --- Loss: 1.3931
6/30 --- Loss: 1.3188
7/30 --- Loss: 1.2425
8/30 --- Loss: 1.1646
9/30 --- Loss: 1.0854
10/30 --- Loss: 1.0056
11/30 --- Loss: 0.9256
12/30 --- Loss: 0.8462
13/30 --- Loss: 0.7681
14/30 --- Loss: 0.6921
15/30 --- Loss: 0.6190
16/30 --- Loss: 0.5495
17/30 --- Loss: 0.4842
18/30 --- Loss: 0.4238
19/30 --- Loss: 0.3685
20/30 --- Loss: 0.3187
21/30 --- Loss: 0.2742
22/30 --- Loss: 0.2351
23/30 --- Loss: 0.2010
24/30 --- Loss: 0.1716
25/30 --- Loss: 0.1464
26/30 --- Loss: 0.1250
27/30 --- Loss: 0.1068
28/30 --- Loss: 0.0916
29/30 --- Loss: 0.0787
30/30 --- Loss: 0.0679
 Layer 2 : 
1/30 --- Loss: 1.7854
2/30 --- Loss: 1.6591
3/30 --- Loss: 1.5210
4/30 --- Loss: 1.3724
5/30 --- Loss: 1.2153
6/30 --- Loss: 1.0528
7/30 --- Loss: 0.8889
8/30 --- Loss: 0.7289
9/30 --- Loss: 0.5785

### Inspired by GitHub: **IsmailKonak**

### c-
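The representation vector of a sample is the concatenation of the activations of all FF layers for that sample (computed by the FF model's predict method); with `hidden_dims = [784, 1000, 1000]` it has 1000 + 1000 = 2000 entries. It is not calculated from the loss value: the loss is only used to train the layers. Because the layers are trained to give high goodness to real digits and low goodness to hybrid images, their activities encode features of the digits, and these features also carry information about the labels of the data, which is why the representation vector can be used to distinguish between classes.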
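How the representation vector is assembled, and why the linear classifier's input size is `sum(hidden_dims[1:])`, can be seen in a small sketch. It uses hypothetical untrained weights with the same shapes as the Learning layers and repeats their forward computation (length normalization, linear map, ReLU); it is an illustration, not the trained model.

```python
import torch
from torch.nn.functional import relu

torch.manual_seed(0)
hidden_dims = [784, 1000, 1000]
eps = 1e-4
# Hypothetical untrained weights with the same shapes as the Learning layers
weights = [torch.randn(hidden_dims[i + 1], hidden_dims[i]) for i in range(len(hidden_dims) - 1)]
biases = [torch.randn(hidden_dims[i + 1]) for i in range(len(hidden_dims) - 1)]

def layer_forward(x, w, b):
    x_ = x / (x.norm(2, 1, keepdim=True) + eps)   # length normalization, as in Learning.forward
    return relu(x_ @ w.T + b.unsqueeze(0))

x = torch.randn(5, 784)                           # 5 toy input vectors
acts = []
a = x
for w, b in zip(weights, biases):
    a = layer_forward(a, w, b)
    acts.append(a)
representation = torch.cat(acts, dim=1)           # one row per sample
print(representation.shape)                       # torch.Size([5, 2000])
```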

### d-
In the linear classifier, the representation vector is the input feature vector. The classifier standardizes it and maps it through a single linear layer to a vector of 10 scores (logits), one per class; the predicted class is the one with the highest score. Using the representation vector as the features of the linear classifier helps it capture the relationship between the input data and its class, because the FF layers have already transformed the raw pixels into more informative features.