# Basics of pytorch_lightning, dataloaders, and tensorboard

This is a basic walkthrough of building, training, and using a simple neural network in [PyTorch](https://pytorch.org/) using the [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/) package to let us make better-organized code, and displaying information about training in [TensorBoard](https://www.tensorflow.org/tensorboard). We'll load the classic [MNIST dataset](http://yann.lecun.com/exdb/mnist/) from the [torchvision.datasets](https://pytorch.org/docs/stable/torchvision/datasets.html) library of image datasets and feed the network data using a [dataloader](https://pytorch.org/docs/stable/data.html), which is a standard way of handling data for a PyTorch model.



## Install and import the needed packages

### Install the needed packages.

In [None]:
!pip install torch 
!pip install torchvision
!pip install pytorch_lightning


### Now that the packages are installed, import them to this project.

In [None]:
# -------------------------------------------------
#  Pytorch  
# -------------------------------------------------

# This is the main torch package
import torch 
#Computer vision specific package              
import torchvision
#There are a bunch of standard datasets in torchvision. 
import torchvision.datasets as datasets

# -------------------------------------------------
# Pytorch Lightning 
# -------------------------------------------------

import pytorch_lightning as pl
# this gives us the hooks to connect to TensorBoard
import pytorch_lightning.loggers as pl_loggers

# -------------------------------------------------
# Stuff to show the data using matplotlib
# -------------------------------------------------
#import random
#import numpy as np
#import matplotlib
#import matplotlib.pyplot as plt
# this magic command lets me show plots in the notebook
#%matplotlib inline

# -------------------------------------------------
# Stuff for timestamping 
# -------------------------------------------------
from time import process_time 
import datetime


## Define the [LightningDataModule](https://pytorch-lightning.readthedocs.io/en/latest/datamodules.html) that prepares the data for use by the network.

A datamodule is a shareable, reusable class that encapsulates all the steps needed to process data:

1. Download / tokenize / process.
1. Clean up data and (maybe) save to disk.
1. Load data into a [Dataset](https://pytorch.org/docs/stable/data.html).
1. Apply [Transforms](https://pytorch.org/docs/stable/torchvision/transforms.html) to the data (rotate, tokenize, etc…).
1. Wrap inside a [DataLoader](https://pytorch.org/docs/stable/data.html).
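
As a small worked example of the transform step in the list above: `Normalize((0.5,), (0.5,))` computes `(x - mean) / std` for each channel, so pixel values that `ToTensor()` has scaled into [0, 1] end up in [-1, 1]. The sketch below reproduces that arithmetic with plain torch on a made-up tensor. The `normalize` helper and the pixel values are assumptions for illustration, not part of torchvision.

```python
import torch

def normalize(x, mean=0.5, std=0.5):
    """Hypothetical helper mirroring the (x - mean) / std arithmetic."""
    return (x - mean) / std

# A fake single-channel 2x2 "image" with values already scaled to [0, 1]
pixels = torch.tensor([[[0.0, 0.25], [0.5, 1.0]]])
scaled = normalize(pixels)

print(scaled)  # the values now lie in [-1, 1]
```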

In [None]:
class MyDataModule(pl.LightningDataModule):

    def __init__(self, data_dir='./data/'):
        super().__init__()
        self.data_dir = data_dir

        # These will be applied to every element in the dataset, to ensure they're 
        # normalized and in the right format.
        self.transform = torchvision.transforms.Compose([
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.5,), (0.5,))])
        
    # Our dataset is in the torchvision library of datasets. Here is where you'd
    # change the code to process a different dataset
    def setup(self, stage=None):
        self.mnist_test = datasets.MNIST(self.data_dir,train=False,download=True, transform=self.transform)
        mnist_full = datasets.MNIST(self.data_dir, train=True, download=True,  transform=self.transform)
        self.mnist_train, self.mnist_val = torch.utils.data.random_split(mnist_full, [55000, 5000])

    # Dataloaders are the things that handle creating batches of data and handing them
    # to the model. You determine whether to randomize data order and the size of the batch
    # when you declare the data loader
    def train_dataloader(self):
        return torch.utils.data.DataLoader(self.mnist_train, batch_size=64, shuffle=True)

    def val_dataloader(self):
        # Validation data doesn't need shuffling; a fixed order keeps runs comparable
        return torch.utils.data.DataLoader(self.mnist_val, batch_size=64, shuffle=False)

    def test_dataloader(self):
        return torch.utils.data.DataLoader(self.mnist_test, batch_size=1)
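
To see what a dataloader hands to the model, here is a minimal sketch that follows the same split-then-batch pattern as `setup()` and `train_dataloader()`. It uses a small synthetic dataset instead of MNIST so it runs without a download. The sizes (100 images, 90/10 split, batch size 32) are made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST: 100 fake 1x28x28 images with labels 0-9
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
full = TensorDataset(images, labels)

# Same pattern as setup(): split into train and validation subsets
train_set, val_set = random_split(full, [90, 10])

# Same pattern as train_dataloader(): shuffled mini-batches
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

x, y = next(iter(train_loader))
print(x.shape, y.shape)   # one batch of images and its labels
print(len(train_loader))  # number of batches per epoch
```

With the sizes used in this notebook (55,000 training images, batch size 64), one epoch has 860 batches, and the last batch is a partial one with 24 images.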

## Define network architecture and train/validation/test actions in a [LightningModule](https://pytorch-lightning.readthedocs.io/en/latest/lightning_module.html).

PyTorch Lightning builds on top of standard PyTorch. It is a way of organizing code to make it more modular and easier to handle. A LightningModule organizes PyTorch code into these sections:

1. Network architecture (init)
1. Data-flow/computations (forward)
1. Train loop (training_step)
1. Validation loop (validation_step)
1. Test loop (test_step)
1. Optimizers (configure_optimizers)

The first two of these (init and forward) are what you'd find in a typical [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). The LightningModule extends the torch Module to add the train/test/validation and optimizer definitions into the module.

The code you'd normally write for a torch Module's training and testing loops is instead handled outside your code by a [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html). The Trainer handles the event loop for training and testing the model, calling the methods in your LightningModule when it's time to train or test on a batch of data from a dataloader.
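
For comparison, here is a minimal sketch of the kind of hand-written loop the Trainer replaces. It uses plain PyTorch on a toy regression problem. The data, model size, and learning rate are made up for illustration, and a real Trainer also handles validation, logging, and device placement.

```python
import torch

torch.manual_seed(0)

# Toy regression data: y = 3x plus a little noise (made up for this sketch)
x = torch.randn(64, 1)
y = 3 * x + 0.1 * torch.randn(64, 1)

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(20):
    optimizer.zero_grad()                             # clear old gradients
    loss = torch.nn.functional.mse_loss(model(x), y)  # forward pass and loss
    loss.backward()                                   # backpropagate
    optimizer.step()                                  # update the weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

In a LightningModule, the forward pass and loss go in `training_step`, the optimizer goes in `configure_optimizers`, and the loop itself disappears.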

In [None]:

class MyLightningModule(pl.LightningModule):

  # Define the model architecture
  def __init__(self):
    super(MyLightningModule, self).__init__()

    # mnist images are (1, 28, 28) (channels, height, width)
    self.layer_1 = torch.nn.Linear(28 * 28, 64)
    self.layer_2 = torch.nn.Linear(64, 256)
    self.layer_3 = torch.nn.Linear(256, 10)

  def forward(self, x):
      batch_size, channels, height, width = x.size()
      x = x.view(batch_size, -1)
      x = torch.relu(self.layer_1(x))
      x = torch.relu(self.layer_2(x))
      x = torch.log_softmax(self.layer_3(x), dim=1)
      return x

  def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    return optimizer

  def my_loss(self, y_hat,y):
      return torch.nn.functional.nll_loss(y_hat,y)

  def training_step(self, train_batch, batch_idx):
      x, y = train_batch  # Here x = data, y = labels
      output = self.forward(x)
      loss = self.my_loss(output, y)
      
      # Calculate the accuracy of the model on the batch of data
      y_hat =  output.argmax(dim=1)
      accuracy = y_hat.eq(y).sum().item()/len(y)

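      # these two lines write the accuracy and loss to TensorBoard; passing
      # self.global_step gives each point its own position on the x-axis
      self.logger.experiment.add_scalar("Accuracy/Train", accuracy, self.global_step)
      self.logger.experiment.add_scalar("Loss/Train", loss, self.global_step)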

      return {"loss": loss} 

  def validation_step(self, val_batch, batch_idx):
      x, y = val_batch
      output = self.forward(x)
      loss = self.my_loss(output, y)
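      # log against the current training step so validation points line up with training
      self.logger.experiment.add_scalar("Loss/Val", loss, self.global_step)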
      return {"loss":loss}

  def test_step(self, test_batch, batch_idx):
      x, y = test_batch
      output = self.forward(x)
      loss = self.my_loss(output, y)
      return {"loss":loss}
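
The module above applies `log_softmax` in `forward()` and `nll_loss` in `my_loss()`. As a quick check on that pairing, the sketch below shows that the two steps together give the same value as a single `cross_entropy` call on the raw outputs, and repeats the accuracy calculation from `training_step()`. The logits and labels are random, made-up values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # fake raw outputs for a batch of 4
targets = torch.tensor([3, 1, 7, 0])   # fake labels

# What the module does: log_softmax in forward(), nll_loss in my_loss()
log_probs = torch.log_softmax(logits, dim=1)
loss_a = F.nll_loss(log_probs, targets)

# Equivalent single call on the raw logits
loss_b = F.cross_entropy(logits, targets)
print(loss_a.item(), loss_b.item())

# Batch accuracy, computed as in training_step()
predictions = log_probs.argmax(dim=1)
accuracy = predictions.eq(targets).sum().item() / len(targets)
print(accuracy)
```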

 



## Declare a [TensorBoard](https://www.tensorflow.org/tensorboard) logger. 

Here, we're going to set up logging to [TensorBoard](https://www.tensorflow.org/tensorboard), a widely used tool for displaying data about training your deep net, so we can record information both before and after training. Here is a nice [tutorial on using TensorBoard with PyTorch Lightning](https://www.learnopencv.com/tensorboard-with-pytorch-lightning/).


In [None]:
# To clear out TensorBoard and start totally fresh, you need to
# remove old logs by deleting them from the directory
!rm -rf ./lightning_logs/

# This will help me time-stamp my logs 
mytime = datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y")

# define how to log information about training to tensorboard
tb_logger = pl_loggers.TensorBoardLogger('lightning_logs/','CPU',mytime)
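
The timestamp above becomes the version label of the run, which the logger uses when naming the run's log folder. The format used here is easy to read, but it contains spaces, a comma, and a colon. As a sketch of an alternative (an assumption on my part, not something this notebook requires), a year-first format sorts chronologically and avoids those characters. The fixed date below is made up so the output is reproducible.

```python
import datetime

# A fixed moment so the output is reproducible (made up for this sketch)
moment = datetime.datetime(2021, 3, 2, 15, 45, 0)

# The format used in this notebook: readable, with spaces, a comma, and a colon
readable = moment.strftime("%I:%M%p on %B %d, %Y")

# A hypothetical alternative that sorts chronologically and avoids those characters
sortable = moment.strftime("%Y-%m-%d_%H-%M-%S")

print(readable)
print(sortable)
```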




## Declare the model and the data module

In [None]:
# load and format our data
data_module = MyDataModule()
# define our model and how it will train
model  = MyLightningModule()

# saving the weights of the model for later comparison
untrained_model = MyLightningModule()
untrained_model.load_state_dict(model.state_dict())
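
The reason this works as a "before" snapshot is that `load_state_dict` copies parameter values into the second module rather than sharing them. A minimal sketch with two small linear layers (made up for illustration) shows that later changes to the original do not touch the copy.

```python
import torch

model = torch.nn.Linear(4, 2)
snapshot = torch.nn.Linear(4, 2)

# Copy the current parameter values into the second module
snapshot.load_state_dict(model.state_dict())
same_before = torch.equal(model.weight, snapshot.weight)

# Simulate a training update by changing the original's weights in place
with torch.no_grad():
    model.weight.add_(1.0)

same_after = torch.equal(model.weight, snapshot.weight)
print(same_before, same_after)  # True False
```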


## Actually train the model

PyTorch Lightning saves you from having to write a training loop because there is a [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html?highlight=Trainer) that handles calling the right DataLoader to hand a batch of training data to the Module defining the network model. The Trainer is also where you set options such as the number of epochs to train for.

Now that we've defined the network structure and dataflow in the LightningModule, and we've defined how to load and format data in the LightningDataModule, we're ready to run the main training loop. This is where we declare the [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html?highlight=Trainer), which calls these other modules at the appropriate time.

Note that logging to TensorBoard is happening due to some calls in the LightningModule that declared what happens in a train step and a validation step.

In [None]:

# declare the trainer, which runs the training and validation loops automatically
trainer = pl.Trainer(logger=tb_logger, max_epochs = 2)

# just to measure how long training takes
start_time = process_time()    

# OK. Run the training loop
trainer.fit(model, data_module)

print("Elapsed time in seconds:",  process_time() -start_time) 
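
One thing to keep in mind when reading this number: `process_time()` reports CPU time used by the process and does not include time spent sleeping, so it can differ from the time you'd measure on a wall clock. The sketch below contrasts it with `perf_counter()`, which measures wall-clock time. The 0.2 second sleep is just a stand-in for waiting.

```python
from time import perf_counter, process_time, sleep

wall_start = perf_counter()
cpu_start = process_time()

sleep(0.2)  # waiting uses wall-clock time but almost no CPU time

wall_elapsed = perf_counter() - wall_start
cpu_elapsed = process_time() - cpu_start

print(f"wall-clock: {wall_elapsed:.3f} s, CPU: {cpu_elapsed:.3f} s")
```

If the goal is to compare how long CPU and GPU training take from the user's point of view, wall-clock time is probably the more relevant measure.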





## Now put some more stuff on TensorBoard

We logged training and validation loss to TensorBoard through the calls made in the LightningModule that defined our model. There are other things you can put on TensorBoard, including dataset images, network structure, and histograms of model weights. We'll do that here.

In [None]:
# put a histogram of layer 1's after-training weights on TensorBoard
tb_logger.experiment.add_histogram('histogram of model.layer_1.weight after training', model.layer_1.weight)

# Let's also do the before-training weights
tb_logger.experiment.add_histogram('histogram of model.layer_1.weight before training', untrained_model.layer_1.weight)

# put an example batch of data on TensorBoard
# (the dataloaders are defined on the data module, so we ask it for one)
data, target = next(iter(data_module.val_dataloader()))
show_this = torchvision.utils.make_grid(data)
tb_logger.experiment.add_image('validation images', show_this)

#Putting the network structure on tensorboard
tb_logger.experiment.add_graph(model,data)

## View how training went using TensorBoard. 


Now we can start up [TensorBoard](https://www.tensorflow.org/tensorboard). TensorBoard provides visualization and tooling for machine learning experimentation. Let's view what we've logged about training accuracy/loss, our data, and the model structure.

Here is a good basic [tutorial on using TensorBoard with PyTorch Lightning](https://www.learnopencv.com/tensorboard-with-pytorch-lightning/).

In [None]:
%load_ext tensorboard
%tensorboard --logdir lightning_logs/

## Check that we can access the GPU

Starting here, we're going to care about the GPU. Let's check that things are OK on that front. If you want to do anything on the GPU, the code below should report that CUDA is available = True. This check doesn't affect CPU processing at all.

Note: if you're running Colab, then for this test to show that CUDA is available, you'll need to go to the menu bar above and select "Runtime", then "Change Runtime Type", then select "GPU". Once you've done that, this test may still fail. At that point, try "Runtime", then "Factory Reset Runtime". Once you've done that, execute this notebook again, starting from the first cell.


In [None]:
# This will show us details about the GPU
!nvidia-smi

# This will tell us the torch version, in case there's an issue there (the latest version
# as of this writing is 1.6)
print("My version of PyTorch is: ", torch.__version__)

# If this turns out to be "true", we're probably in good shape
print("CUDA is available = ", torch.cuda.is_available())
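
For context, this is roughly what device handling looks like in plain PyTorch, where you pick a device and move tensors and modules to it yourself. It is a minimal sketch with a made-up layer and input. It falls back to the CPU when no GPU is visible, so it runs either way. With Lightning, the Trainer takes care of these moves.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# In plain PyTorch, tensors and modules are moved explicitly with .to(device)
x = torch.ones(2, 3).to(device)
layer = torch.nn.Linear(3, 1).to(device)
out = layer(x)

print(device, out.shape)
```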


## Train the model again, but this time on the GPU instead of the CPU.

Now we train a new model from scratch, but this time with the Trainer set to gpus=1. That tells it there is 1 GPU to use, and it will use that GPU. The gpus argument is the ONLY difference between this Trainer declaration and the one in the previous block where we declared and ran a Trainer.

Previous:


```
trainer = pl.Trainer(logger=tb_logger, max_epochs = 2)
```


Runs on GPU:



```
trainer = pl.Trainer(logger=tb_logger, max_epochs = 2, gpus=1)
```




In [None]:
# This will help me time-stamp my new log and keep it separate from the old one 
mytime = datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y")
# define how to log information about training to tensorboard
tb_logger = pl_loggers.TensorBoardLogger('lightning_logs/','GPU',mytime)

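# start from a fresh, untrained model so the CPU and GPU runs are comparable
model = MyLightningModule()

# declare the trainer, which runs the training and validation loops automatically
# (gpus=1 matches the Lightning version used here; newer releases spell this
# accelerator="gpu", devices=1)
trainer = pl.Trainer(logger=tb_logger, max_epochs = 2, gpus=1)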

# OK. Run the training loop
start_time = process_time()    

trainer.fit(model, data_module)

print("Elapsed time in seconds:",  process_time() -start_time)  