# Exploratory data analysis in retinal bipolar data with autoencoders

In this notebook, we will build a neural network that explores the retinal bipolar dataset from Shekhar et al., 2016 without using the manually annotated cell type labels.

## 1. Imports

In [None]:
!pip install scprep

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import scprep

## 2. Loading the retinal bipolar data

We'll use the same retinal bipolar data we used for the classifier.

Alternatively, you may load your own data by replacing the Google Drive file ids with your own file ids.

Note that if you do, you will likely not have annotated cell type labels yet. Replace all references to `metadata['CELLTYPE']` with another column from `metadata`, or with the expression of your favorite gene.

In [None]:
scprep.io.download.download_google_drive("1GYqmGgv-QY6mRTJhOCE1sHWszRGMFpnf", "data.pickle.gz")
scprep.io.download.download_google_drive("1q1N1s044FGWzYnQEoYMDJOjdWPm_Uone", "metadata.pickle.gz")

In [None]:
data_raw = pd.read_pickle("data.pickle.gz")
metadata = pd.read_pickle("metadata.pickle.gz")

In [None]:
data = scprep.reduce.pca(data_raw, n_components=100, method='dense').to_numpy()
labels, cluster_names = pd.factorize(metadata['CELLTYPE'])
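
If `pd.factorize` is unfamiliar, the short sketch below shows what it returns on a toy Series. The cell type names here are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for metadata['CELLTYPE'] (illustrative values only)
celltypes = pd.Series(["Rod BC", "Muller Glia", "Rod BC", "Amacrine"])

# labels: one integer code per cell
# cluster_names: the unique names, in order of first appearance
labels, cluster_names = pd.factorize(celltypes)

print(labels)
print(cluster_names)

# indexing the names with the codes recovers the original annotation,
# which is the same trick used later as `cluster_names[labels]` when plotting
print(list(cluster_names[labels]))
```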

## 3. Building an autoencoder

An **autoencoder** is a network that tries to reproduce its input. 

In this case, we will squeeze the data through a two-dimensional bottleneck (i.e. an extremely low-dimensional hidden layer), which we can use for visualization. Reducing the dimension from 100 down to 2 also forces the network to retain only the most important information, which acts as a form of denoising.

#### Create layers


In [None]:
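class layer(nn.Module):
    def __init__(self, input_size, output_size, activation=None):
        super().__init__()

        # wrap the tensors in nn.Parameter so they are registered with the module
        # (and returned by .parameters()); float32 matches `data_tensor`
        self.weight = nn.Parameter(torch.randn(input_size, output_size))
        self.bias = nn.Parameter(torch.randn(output_size))
        self.activation = activation

    def forward(self, x):
        output = torch.matmul(x, self.weight) + self.bias
        # skip the activation when none was given (e.g. the 2D middle layer)
        if self.activation is not None:
            output = self.activation(output)
        return output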

In [None]:
# move data to pytorch tensors
data_tensor = torch.Tensor(data)

### Method 1

In [None]:
# layers will be input -> 100 -> 2 -> 100 -> output
# first hidden layer of size 100
hidden_layer1 = layer(input_size=data_tensor.shape[1], 
                      output_size= 100, 
                      activation=nn.ReLU())

# we won't apply a nonlinear activation to the 2D middle layer
hidden_layer2 = layer(input_size=100, 
                      output_size=2,
                      activation=None)

# last hidden layer of size 100
hidden_layer3 = layer(input_size=2,
                         output_size=100, 
                         activation=nn.ReLU())

# the output should be the same size as the input
output = layer(input_size=100,
              output_size=data_tensor.shape[1], 
               activation=None)

### Method 2

PyTorch provides the linear layers we've been manually defining in its `nn` module (the same place we've been getting our activation functions) as [`nn.Linear()`](https://pytorch.org/docs/stable/nn.html#linear), so let's go ahead and repeat the layer creation step above using this new knowledge.


In [None]:
# layers will be input -> 100 -> 2 -> 100 -> output

# first hidden layer of size 100
hidden_layer1 = nn.Linear(in_features=data_tensor.shape[1], 
                      out_features= 100)

# 2D middle (bottleneck) layer
hidden_layer2 = nn.Linear(in_features=100, 
                      out_features=2)

# last hidden layer of size 100
hidden_layer3 = nn.Linear(in_features=2,
                        out_features=100)

# the output should be the same size as the input
output_layer4 = nn.Linear(in_features=100,
              out_features=data_tensor.shape[1])

As you may have noticed, we did not specify our activation functions this time. Since activations are separate from the `nn.Linear` class, we have to define them as their own modules outside our layers.

In [None]:
activation_1 = nn.ReLU()
activation_3 = nn.ReLU()

Now let's use some PyTorch magic and create a model using `nn.Sequential`, which we can just treat as a fancy list of PyTorch layers. One of the benefits of this is that we can use `model.parameters()` to pull out the list of network parameters, rather than having to list them ourselves.

In [None]:
autoencoder1 = nn.Sequential(hidden_layer1,
                            activation_1,
                            hidden_layer2,
                            hidden_layer3,
                            activation_3,
                            output_layer4
                            )

`nn.Sequential` ties together our layers and creates a model. The data passes through the model in the order we place the layers. We can print out the model to see the list of layers.

In [None]:
print(autoencoder1)
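
To see what `model.parameters()` gives us, here is a minimal sketch on a stand-in network with the same layout. The name `toy_autoencoder` and the input width of 100 (the number of PCA components used above) are assumptions made so the snippet runs on its own.

```python
import torch.nn as nn

n_features = 100  # assumed input width (100 PCA components)

# stand-in with the same input -> 100 -> 2 -> 100 -> output layout
toy_autoencoder = nn.Sequential(
    nn.Linear(n_features, 100), nn.ReLU(),
    nn.Linear(100, 2),
    nn.Linear(2, 100), nn.ReLU(),
    nn.Linear(100, n_features),
)

# each nn.Linear contributes a weight matrix and a bias vector;
# the activations have no parameters of their own
for name, param in toy_autoencoder.named_parameters():
    print(name, tuple(param.shape))

n_params = sum(p.numel() for p in toy_autoencoder.parameters())
print("trainable parameters:", n_params)
```

The optimizer receives exactly this collection of tensors, which is why we never have to list the weights and biases by hand.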

#### Define Optimizer

As in the classifier, we'll start with an SGD optimizer.

In [None]:
learning_rate = 0.001

optimizer = optim.SGD(autoencoder1.parameters(),
                       lr=learning_rate)


#### Loss function

Since this is an autoencoder, we don't have prior assumptions on the output (like it being a discrete probability distribution, as it was in classification) so we can't use fancy loss functions like the cross entropy. Instead, we'll just compute the mean squared error of the output compared to the input.

In [None]:
loss_fcn = nn.MSELoss()
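
As a quick sanity check, the sketch below compares `nn.MSELoss` (with its default mean reduction) to the mean squared error computed by hand. The tensors `x` and `x_hat` are toy values standing in for an input batch and its reconstruction.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])      # toy "input"
x_hat = torch.tensor([[1.5, 2.0], [2.0, 4.0]])  # toy "reconstruction"

# built-in loss: averages the squared error over every element
loss_builtin = nn.MSELoss()(x_hat, x)

# the same quantity written out by hand
loss_manual = ((x_hat - x) ** 2).mean()

print(loss_builtin.item(), loss_manual.item())
```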

#### Train the network

Let's wrap the training loop and its hyperparameters in a function that we can reuse to train other models.

In [None]:
def train_model(model, n_epochs=10):

    batch_size=100
    learning_rate = 0.001
    optimizer = optim.SGD(model.parameters(),
                        lr=learning_rate)
    loss_fcn = nn.MSELoss()

    # we'll train the network for `n_epochs` epochs
    step = 0
    for epoch in range(n_epochs):
        # randomize the order of the data each time through
        random_order = np.random.permutation(data_tensor.shape[0])
        data_randomized = data_tensor[random_order]

        # train the network on batches of size `batch_size` (the last batch may be smaller)
        for data_batch in torch.split(data_randomized, batch_size):
            step += 1

            # forward pass: reconstruct the batch
            output = model(data_batch)

            # get loss
            loss = loss_fcn(output, data_batch)

            # print the loss every 100 steps
            if step % 100 == 0:
                print("Step: {} Loss: {:.3f}".format(step, loss.item()))

            # backpropagate the loss
            loss.backward()

            # update parameters
            optimizer.step()

            # reset gradients
            optimizer.zero_grad()

    return model

In [None]:
autoencoder1 = train_model(autoencoder1)
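
The shuffle-and-batch step inside the training loop can be looked at in isolation. This is a minimal sketch on a toy tensor using `torch.randperm` and `torch.split`; the names `toy_data` and `batches` are illustrative only.

```python
import torch

torch.manual_seed(0)

# 25 toy "cells" with a single feature each
toy_data = torch.arange(25, dtype=torch.float32).reshape(-1, 1)
batch_size = 10

# shuffle the rows, then cut them into consecutive chunks of `batch_size`
shuffled = toy_data[torch.randperm(toy_data.shape[0])]
batches = torch.split(shuffled, batch_size)

# every cell appears in exactly one batch; the last batch holds the remainder
batch_sizes = [b.shape[0] for b in batches]
print(batch_sizes)
```

Reshuffling at the start of every epoch means each gradient step sees a different random subset of cells.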

#### Visualize the output

Rather than evaluating our model with our data, as we did with the classifier, we can now use our model to explore our data (i.e. exploratory data analysis)! The first three modules of `autoencoder1` map each cell to the 2D bottleneck, so we can slice them out and use them as an encoder.

In [None]:
print(autoencoder1[:3])
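
Slicing an `nn.Sequential` returns another `nn.Sequential` that shares the same layer objects, so the slice behaves as an encoder with the trained weights. Here is a small sketch on an untrained stand-in model; `toy_autoencoder` and `toy_input` are assumptions for illustration.

```python
import torch
import torch.nn as nn

toy_autoencoder = nn.Sequential(
    nn.Linear(100, 100), nn.ReLU(),  # indices 0 and 1
    nn.Linear(100, 2),               # index 2: the 2D bottleneck
    nn.Linear(2, 100), nn.ReLU(),    # indices 3 and 4
    nn.Linear(100, 100),             # index 5: output layer
)

# everything up to and including the bottleneck
encoder = toy_autoencoder[:3]

toy_input = torch.randn(5, 100)  # 5 toy cells, 100 features
with torch.no_grad():
    embedding = encoder(toy_input)
    reconstruction = toy_autoencoder(toy_input)

print(embedding.shape)       # one 2D coordinate per cell
print(reconstruction.shape)  # same shape as the input
```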

In [None]:
# let's get the 2D internal hidden layer and visualize it with a scatter plot

with torch.no_grad():
    ae_coordinates = autoencoder1[:3](data_tensor).numpy()


scprep.plot.scatter2d(ae_coordinates, c=cluster_names[labels],
                      label_prefix='AE Coordinate ', discrete=True,
                      legend_anchor=(1,1), figsize=(10,4))

### Exercise

Try retraining the network for more than just 10 epochs and plot it again. 

In [None]:
# =======
# Retrain the network
autoencoder1 = train_model(
    autoencoder1,
    n_epochs =
)
# =======

In [None]:
with torch.no_grad():
    ae_coordinates = autoencoder1[:3](data_tensor).numpy()

scprep.plot.scatter2d(ae_coordinates, c=cluster_names[labels],
                      label_prefix='AE Coordinate ', discrete=True,
                      legend_anchor=(1,1), figsize=(10,4))

### Discussion

1. What do you notice about the visualization? 
2. How does this compare to the visualizations you have seen with PCA, t-SNE, UMAP and PHATE?

## Exercise 4 - Activation functions on the visualization layer

Notice we did not use an activation function for the hidden layer we were going to visualize.

Repeat the process with other activation functions like `nn.ReLU`, `nn.Sigmoid`, `nn.Tanh`, etc. You can see more in the [PyTorch activation function documentation](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity).

Note how the visualization changes. Has the data changed at all?
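
Before training, it may help to see how each activation constrains the values a layer can output, since those values become the plot coordinates. The sketch below applies three activations to a toy range of inputs; `z` is a made-up stand-in for the bottleneck's pre-activation values.

```python
import torch
import torch.nn as nn

# toy pre-activation values spanning negative and positive inputs
z = torch.linspace(-5, 5, steps=11).reshape(-1, 1)

activations = {"ReLU": nn.ReLU(), "Sigmoid": nn.Sigmoid(), "Tanh": nn.Tanh()}

ranges = {}
for name, act in activations.items():
    out = act(z)
    ranges[name] = (out.min().item(), out.max().item())
    print(f"{name:8s} min={ranges[name][0]:.3f} max={ranges[name][1]:.3f}")
```

ReLU clips negative values to zero, Sigmoid squeezes everything into (0, 1) and Tanh into (-1, 1), so the scatter plot axes are bounded accordingly.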

In [None]:
# ======
# choose from nn.Sigmoid(), nn.Tanh() and others in the documentation
activation_2 = 
# ======

autoencoder2 = nn.Sequential(hidden_layer1,
                            activation_1,
                            hidden_layer2,
                            activation_2,
                            hidden_layer3,
                            activation_3,
                            output_layer4
                            )
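
Note that the cell above reuses the `hidden_layer*` objects from `autoencoder1`, so `autoencoder2` starts from whatever weights those layers currently hold rather than from a fresh initialization. If you would rather compare models trained from scratch, one option is a small factory function. `make_autoencoder` below is a hypothetical helper, not part of the original notebook.

```python
import torch.nn as nn

def make_autoencoder(n_features, bottleneck_activation=None):
    """Hypothetical helper: build a freshly initialized
    input -> 100 -> 2 -> 100 -> output autoencoder."""
    layers = [nn.Linear(n_features, 100), nn.ReLU(), nn.Linear(100, 2)]
    # optionally apply an activation to the 2D visualization layer
    if bottleneck_activation is not None:
        layers.append(bottleneck_activation)
    layers += [nn.Linear(2, 100), nn.ReLU(), nn.Linear(100, n_features)]
    return nn.Sequential(*layers)

# two independent models: no layer objects are shared between them
model_a = make_autoencoder(100)
model_b = make_autoencoder(100, bottleneck_activation=nn.Tanh())
print(model_b)
```

With an activation on the bottleneck, the 2D coordinates come from the first four modules (`model_b[:4]`) rather than the first three.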

In [None]:
autoencoder2 = train_model(autoencoder2)

In [None]:
# let's get the 2D internal hidden layer and visualize it with a scatter plot

with torch.no_grad():
    ae_coordinates2 = autoencoder2[:4](data_tensor).numpy()

scprep.plot.scatter2d(ae_coordinates2, c=cluster_names[labels],
                      label_prefix='AE Coordinate ', discrete=True,
                      legend_anchor=(1,1), figsize=(10,4))

## Exercise 5 - Activation functions on the wide hidden layers

Now remove the activation from the visualization layer again, but experiment with the activation function for the 100-dimensional layers.

Is there a change? Why?

In [None]:
# ======
# choose from nn.Sigmoid(), nn.Tanh() and others in the documentation
activation_1 = 
activation_3 = 
# ======

autoencoder3 = nn.Sequential(hidden_layer1,
                            activation_1,
                            hidden_layer2,
                            hidden_layer3,
                            activation_3,
                            output_layer4
                            )

In [None]:
autoencoder3 = train_model(autoencoder3)

In [None]:
# let's get the 2D internal hidden layer and visualize it with a scatter plot

with torch.no_grad():
    ae_coordinates3 = autoencoder3[:3](data_tensor).numpy()

scprep.plot.scatter2d(ae_coordinates3, c=cluster_names[labels],
                      label_prefix='AE Coordinate ', discrete=True,
                      legend_anchor=(1,1), figsize=(10,4))