## Gram matrix

------

In simple words, a Gram matrix (often referred to as a Gramian matrix) is a matrix created by multiplying a matrix with its own transpose.
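Here is a small sketch of the definition. It uses NumPy, which the original text does not mention. If the rows of a matrix $A$ are vectors $a_1, \dots, a_k$, the Gram matrix is $A A^{T}$, and entry $(i, j)$ is the dot product $a_i \cdot a_j$. The values in `A` are made up for illustration.

```python
import numpy as np

# Hypothetical example data: three vectors stored as the rows of A
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [2.0, 0.0, 1.0]])

# Gram matrix: multiply A by its own transpose
G = A @ A.T

# Entry (i, j) equals the dot product of row i and row j
for i in range(A.shape[0]):
    for j in range(A.shape[0]):
        assert np.isclose(G[i, j], np.dot(A[i], A[j]))

print(G)
```

The result is symmetric, because $a_i \cdot a_j = a_j \cdot a_i$. The diagonal holds each vector's squared length. Using $A^{T} A$ instead gives the dot products between the columns.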

![Imgur](https://imgur.com/I6JDFEM.png)

------------------------------------------------------------------------

### To understand the Gram matrix and how it captures the style of an image, we first need to revisit the dot product of two vectors

The dot product of two vectors $a$ and $b$ can be written as $a \cdot b = |a|\,|b|\cos\theta$, where $\theta$ is the angle between them:


![Imgur](https://imgur.com/QDiFTT3.png)


In the above figure, a and b are represented in a plane.

Now, the more correlated $a$ and $b$ are, the closer the vectors are, i.e. the more similar they are.

This also means that the closer they are, the smaller the angle $\theta$ between them.

By basic trigonometry, for $\theta$ between 0° and 180°, the smaller $\theta$ is, the larger $\cos\theta$ is. So for vectors of fixed length, the value of $|a|\,|b|\cos\theta$ is larger.

So, ultimately, the dot product between them is larger.

The overall rule we get is: when the dot product of two vectors is larger, we can infer that the two vectors are more similar (assuming their lengths are comparable).
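This small NumPy sketch makes the rule concrete. For vectors of fixed length, the dot product $|a|\,|b|\cos\theta$ gets smaller as the angle $\theta$ gets larger. The unit vectors and angles below are arbitrary choices for illustration.

```python
import numpy as np

def unit_vector(angle_deg):
    """Return a 2-D unit vector at the given angle (degrees) from the x-axis."""
    theta = np.deg2rad(angle_deg)
    return np.array([np.cos(theta), np.sin(theta)])

a = unit_vector(0.0)  # reference vector along the x-axis

# Compare a with unit vectors at increasing angles from it
angles = [0, 30, 60, 90, 120, 180]
dots = [float(np.dot(a, unit_vector(t))) for t in angles]

for t, d in zip(angles, dots):
    # With |a| = |b| = 1, the dot product equals cos(theta)
    print(f"theta = {t:3d} deg -> a . b = {d:+.3f}")
```

When the lengths are not fixed, a larger dot product can also come from larger magnitudes. This is why cosine similarity divides the dot product by $|a|\,|b|$.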

------------------------------------------------------------------------


### But what does this theory of vector dot products have to do with an image-based neural network?

This is because, when an image moves through the various layers of a neural network, its content is represented by the values of the intermediate feature maps. These feature maps are all tensors, and it turns out that the style of an image can be described by the means and correlations across the different feature maps.

Now, to extract the style from your input images, we need to find the correlations between the features in each layer.


The question is: how do I find the correlation between these features?

Gram Matrix to the rescue!

So here, the Gram matrix is used to determine whether two feature maps (the outputs of two filters) are correlated. This is done by flattening each feature map into a vector and calculating the dot product of the two vectors. The matrix of all these pairwise dot products is called the Gram matrix.

If the dot product of two flattened feature maps is large, the two features are said to be correlated. If it is small, they are uncorrelated.

**********

Consider two vectors (more specifically, two flattened feature vectors from a convolutional feature map of depth C) representing features of the input space:

![Imgur](https://imgur.com/gOaKrvo.png)


Now take all C flattened feature vectors from this convolutional feature map of depth C, and compute the dot product of each one with every other one (including with itself). The result is the Gram matrix, of size C × C.
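The procedure just described can be sketched in NumPy. The sketch assumes a single feature map stored as an array of shape `(C, H, W)`. The random feature map and the sizes `C`, `H` and `W` are made up for illustration. The explicit double loop follows the description, dotting every flattened vector with every other one, including itself. The matrix product $F F^{T}$ gives the same $C \times C$ result in one step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature map: C channels, each an H x W grid of activations
C, H, W = 4, 5, 6
feature_map = rng.random((C, H, W))

# Flatten each channel into a vector of length H*W -> shape (C, H*W)
F = feature_map.reshape(C, H * W)

# Gram matrix by explicit pairwise dot products (including i == j)
G_loop = np.zeros((C, C))
for i in range(C):
    for j in range(C):
        G_loop[i, j] = np.dot(F[i], F[j])

# The same matrix as a single matrix product F F^T
G = F @ F.T

print(G.shape)  # (4, 4)
```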


![Imgur](https://imgur.com/Fziq5By.png)


The dot product of two features tells us about the relationship between them. The smaller the dot product, the more different the learned features are. The greater the dot product, the more correlated the features are.

In other words, the smaller the dot product, the less often the two features co-occur, and the greater the dot product, the more often they occur together.

In a sense, this gives information about an image’s style (texture) and essentially no information about its spatial structure. The feature maps have already been flattened, and the dot product sums over all spatial positions.
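This quick NumPy check shows why the Gram matrix carries no spatial information. It uses a random, made-up feature map. Shuffling the spatial positions with the same shuffle for every channel leaves every pairwise dot product unchanged, so the Gram matrix is unchanged too.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical flattened feature map: C channels x N spatial positions
C, N = 3, 20
F = rng.random((C, N))

# Shuffle the spatial positions, applying the same permutation to all channels
perm = rng.permutation(N)
F_shuffled = F[:, perm]

G_original = F @ F.T
G_shuffled = F_shuffled @ F_shuffled.T

# The spatial arrangement changed, but the Gram matrix did not
print(np.allclose(G_original, G_shuffled))  # True
```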

------------------------------------------------------------------------

Let’s see an example. The figure below shows different channels of the feature maps in a particular layer. At this layer, each channel of the feature map represents a different feature present in the image. If we can find the correlations between these features, we get an idea of the style, since correlation here is essentially the co-occurrence of features.

![Imgur](https://imgur.com/sXFCUES.png)

If the red and yellow channels both fire with high activation values at the same spatial positions, they have a high correlation, meaning the features they detect occur together.

These two channels will then have a higher correlation than the red and green channels. This co-occurrence is exactly what the correlation measures.

The correlations of all these channels with respect to each other are given by the Gram matrix of the image.


Hence, we calculate the Gram matrix to measure the degree of correlation between channels, which later acts as a measure of the style itself.
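The red/yellow/green comparison can be reproduced with a toy NumPy example. The three 4×4 activation maps below are invented purely for illustration. "red" and "yellow" fire at roughly the same positions, while "green" fires elsewhere. As a result, the red-yellow Gram entry comes out large and the red-green entry comes out zero.

```python
import numpy as np

# Hypothetical 4x4 activation maps for three channels (values made up)
red = np.array([[0, 0, 0, 0],
                [0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 0]], dtype=float)
yellow = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 0.5, 0],
                   [0, 0, 0, 0]], dtype=float)
green = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 1]], dtype=float)

# Stack the flattened channels as rows: shape (3, 16)
F = np.stack([red, yellow, green]).reshape(3, -1)
G = F @ F.T

# Row i lists channel i's dot products with red, yellow and green
names = ["red", "yellow", "green"]
for name, row in zip(names, G):
    print(name, row)
```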

------------------------------------------------------------------------

### The process of Gram matrix computation.

![Imgur](https://imgur.com/TG3w0Wx.png)

![Imgur](https://imgur.com/EA2d41a.png)

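$$G = F F^{T}$$

Here $F$ is the $C \times (H \cdot W)$ matrix whose rows are the $C$ flattened feature maps, so $G$ is $C \times C$.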

![Imgur](https://imgur.com/chZBAZI.png)

In simpler terms, say you have a set of images and you want to calculate their Gram matrix.

To start, say the images each have shape (m × n).

First, reshape a single image into an (m·n × 1) vector, that is, flatten it.

Similarly, convert every image into a vector, and use these vectors as the columns of a matrix, say $M$.

The Gram matrix $G$ of this set of images is then

$$G = M^{T} M$$

Each element $G_{ij}$ is the dot product of images $i$ and $j$, which serves as an (unnormalized) similarity measure between them.
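The recipe above can be sketched in NumPy. The sketch assumes three small, made-up grayscale "images" of shape `(m, n)`. Each image is flattened into a column of `M`, so for `k` images `M` has shape `(m*n, k)` and $G = M^{T} M$ is $k \times k$.

```python
import numpy as np

# Hypothetical 2x3 "images" (m = 2, n = 3); values are made up
img0 = np.array([[1, 0, 0],
                 [0, 1, 0]], dtype=float)
img1 = np.array([[1, 0, 0],
                 [0, 1, 1]], dtype=float)
img2 = np.array([[0, 0, 1],
                 [1, 0, 0]], dtype=float)
images = [img0, img1, img2]

# Flatten each (m x n) image into an (m*n)-vector and use it as a column of M
M = np.stack([img.reshape(-1) for img in images], axis=1)  # shape (6, 3)

# Gram matrix of the set of images: G = M^T M, shape (3, 3)
G = M.T @ M

# G[i, j] is the dot product between flattened images i and j
print(G)
```

Images 0 and 1 share two lit pixels, so $G_{01} = 2$. Image 2 shares none with either of them, so $G_{02} = G_{12} = 0$.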

----------------------------------




```py
import torch

def gram_matrix(tensor):
    """
    Calculate the Gram matrix from an image (feature-map) tensor.

    Parameters:
    - tensor (torch.Tensor): Input tensor of shape
      (batch_size, channels, height, width); batch_size must be 1.

    Returns:
    - gram (torch.Tensor): Gram matrix of shape (channels, channels).
    """
    # Unpack the tensor dimensions:
    # batch size, depth (number of channels), height and width
    b, d, h, w = tensor.size()
    if b != 1:
        raise ValueError("gram_matrix expects a batch size of 1")

    # Flatten each channel into a row: a 2-D tensor of shape (d, h*w)
    features = tensor.reshape(d, h * w)

    # Multiply the feature matrix by its own transpose using torch.mm;
    # features.t() returns the transpose, so the result has shape (d, d)
    gram = torch.mm(features, features.t())

    # Entry (i, j) is the dot product of flattened channels i and j
    return gram
```

The Gram matrix computed this way contains the dot products between every pair of feature maps at a layer, which is an (unnormalized) correlation operation. Its entries encode which activations co-occur.

Texture (style) exhibits strong locality. So, when you capture which activations co-occur frequently, you capture this locality.