# Requirements

Installation of required packages:
```
conda install -c pytorch torchvision
conda install -c anaconda pillow==6.2.1
conda install -c conda-forge tensorboard
```
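After installing, a quick check can confirm that the packages are importable. This is a minimal sketch using only the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical. Pillow is imported as `PIL`. `tensorboard_utils` (used below) is a local helper module rather than a conda package, so it is not checked here.

```python
import importlib.util

# Import names of the packages installed above (Pillow is imported as PIL)
REQUIRED = ['torch', 'torchvision', 'PIL', 'tensorboard']


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found on the Python path."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(REQUIRED)
    print('Missing:', missing if missing else 'none')
```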

```python
import itertools
import os
import signal
import subprocess

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from torch.optim import Adam
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from torchvision.datasets import MNIST

import tensorboard_utils  # local helper module providing generate_run_name(); must be on the Python path
```


# Tensorboard setup

Run data is stored in subdirectories of `TENSORBOARD_LOG_DIR`, each named with the string returned by `tensorboard_utils.generate_run_name()`.

```python
TENSORBOARD_LOG_DIR = os.path.join(os.getcwd(), 'runs')  # where logs are stored
TENSORBOARD_PORT = 6006  # TensorBoard will serve on localhost at this port
writer = SummaryWriter(log_dir=os.path.join(TENSORBOARD_LOG_DIR, tensorboard_utils.generate_run_name()))
```
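The `tensorboard_utils` module is not included in this notebook. If it is unavailable, a stand-in along the following lines could be used. This is a hypothetical sketch, not the module's actual implementation: it simply combines a timestamp with the hostname, similar in spirit to `SummaryWriter`'s default `runs/CURRENT_DATETIME_HOSTNAME` directory name.

```python
import socket
from datetime import datetime


def generate_run_name(now=None, host=None):
    """Hypothetical stand-in for tensorboard_utils.generate_run_name().

    Returns a name like '2020-01-02_03-04-05_myhost' built from a timestamp
    (defaults to the current time) and the hostname (defaults to this machine's).
    """
    now = now or datetime.now()
    host = host or socket.gethostname()
    return f"{now.strftime('%Y-%m-%d_%H-%M-%S')}_{host}"


print(generate_run_name(now=datetime(2020, 1, 2, 3, 4, 5), host='myhost'))
# prints 2020-01-02_03-04-05_myhost
```

Because the timestamp has one-second resolution, two runs started within the same second would share a directory; appending a random suffix would avoid that if it matters.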

# Run tensorboard

The following command runs TensorBoard **in the background** with the specified log directory and port. The `tensorboard` variable holds the `Popen` handle of the launched process, so we can shut it down once training is done (see the *Shutdown tensorboard* section).

```python
import shlex  # used to quote the log directory in case its path contains spaces

run_tensorboard = f'tensorboard --logdir {shlex.quote(TENSORBOARD_LOG_DIR)} --port {TENSORBOARD_PORT} &'
tensorboard = subprocess.Popen(run_tensorboard,
                               shell=True,  # see https://stackoverflow.com/a/19152273 and https://stackoverflow.com/a/9935511
                               preexec_fn=os.setsid)  # start in a new process group, see https://stackoverflow.com/a/4791612
```
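A variant that avoids the intermediate shell is to pass the command as an argument list and use `start_new_session=True`, which makes the child call `setsid()` itself. The Python `subprocess` documentation recommends this over `preexec_fn=os.setsid`, since `preexec_fn` is not safe in the presence of threads. The launched program then leads its own process group, so its PID is also its process-group ID. This is a sketch: the helper name `launch_in_new_session` is hypothetical, and `sleep` stands in for the `tensorboard` command.

```python
import os
import signal
import subprocess


def launch_in_new_session(cmd):
    """Start `cmd` (a list of arguments) in the background as the leader of a new session.

    start_new_session=True makes the child call setsid() before exec, so the
    child's PID doubles as its process-group ID.
    """
    return subprocess.Popen(cmd, start_new_session=True)


if __name__ == '__main__':
    # Stand-in for ['tensorboard', '--logdir', TENSORBOARD_LOG_DIR, '--port', str(TENSORBOARD_PORT)]
    proc = launch_in_new_session(['sleep', '30'])
    print('pid:', proc.pid, 'pgid:', os.getpgid(proc.pid))  # the two numbers match
    os.killpg(proc.pid, signal.SIGTERM)  # stop the whole group
    proc.wait()
```

Since no shell is involved, the returned handle's `pid` would be TensorBoard's own PID, and no quoting of the log directory is needed.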

TensorBoard should now be accessible at `http://localhost:6006`, i.e. on localhost at port `TENSORBOARD_PORT`.

# Example network training with outputs to writer

Based on the [PyTorch TensorBoard documentation](https://pytorch.org/docs/stable/tensorboard.html).

```python
# Dataset
transform = transforms.Compose([transforms.ToTensor()])  # see https://discuss.pytorch.org/t/image-file-reading-typeerror-batch-must-contain-tensors-numbers-dicts-or-lists-found-class-pil-image-image/9909/2
train_dataset = MNIST('./data/', transform=transform, train=True, download=True)
train_loader = DataLoader(train_dataset, batch_size=1)
```

```python
# Model - LeNet-5 from https://engmrk.com/lenet-5-a-classic-cnn-architecture/
class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(1, 6, 5, padding=2),
            nn.Tanh(),
            nn.AvgPool2d(2),
            nn.Conv2d(6, 16, 5),
            nn.Tanh(),
            nn.AvgPool2d(2),
            nn.Conv2d(16, 120, 5),
            nn.Tanh(),
            nn.Flatten(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, 10),
            nn.Softmax(dim=1)  # class probabilities, as expected by F.binary_cross_entropy below
        )

    def forward(self, x):
        return self.blocks(x)

model = LeNet()
```
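The layer sizes in `LeNet` only fit together for 28x28 inputs. The following standard-library sketch traces the feature-map side length through `self.blocks` using the usual output-size formula `floor((n + 2*padding - kernel) / stride) + 1`. The helper names are hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a Conv2d or AvgPool2d layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer, kernel, padding, stride) for the spatial layers of LeNet.blocks
LAYERS = [
    ('Conv2d(1, 6, 5, padding=2)', 5, 2, 1),
    ('AvgPool2d(2)', 2, 0, 2),  # pooling stride defaults to the kernel size
    ('Conv2d(6, 16, 5)', 5, 0, 1),
    ('AvgPool2d(2)', 2, 0, 2),
    ('Conv2d(16, 120, 5)', 5, 0, 1),
]


def trace(size=28):
    """Return the side length after each spatial layer for a size x size input."""
    sizes = []
    for _name, kernel, padding, stride in LAYERS:
        size = conv_out(size, kernel, padding, stride)
        sizes.append(size)
    return sizes


sizes = trace(28)
print(sizes)  # [28, 14, 10, 5, 1]
print('flattened features:', 120 * sizes[-1] ** 2)  # 120, matching nn.Linear(120, 84)
```

The `padding=2` on the first convolution keeps the 28x28 map, which is what the original LeNet-5's unpadded 5x5 convolution produced from its 32x32 input. Feeding a 32x32 image to this padded model would instead end at 2x2 and yield 480 features, which `nn.Linear(120, 84)` would reject.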

```python
# Optimizer: Adam with its default hyperparameters (lr=1e-3)
optimizer = Adam(model.parameters())
```

```python
for epoch in range(10):
    training_loss = 0.0
    # Subsample: only the first 1000 batches per epoch (no shuffling, so the same ones every epoch),
    # see https://stackoverflow.com/a/44982812
    for x, gt in itertools.islice(train_loader, 1000):
        # Prepare target variable: one-hot encoding of the ground-truth labels
        batch_size = gt.size(0)
        target = torch.zeros((batch_size, 10))
        target[torch.arange(batch_size), gt] = 1

        # Forward & backward pass
        optimizer.zero_grad()
        pred = model(x)
        loss = F.binary_cross_entropy(pred, target)
        loss.backward()
        optimizer.step()

        # Update metrics
        training_loss += loss.item()

    # Update tensorboard: log the summed training loss of this epoch
    writer.add_scalar('Loss/Train', training_loss, epoch)
    writer.flush()  # write pending events to disk now so TensorBoard picks them up
```
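To make the target and loss concrete, here is a framework-free sketch of what the loop computes for one batch: the one-hot `target` and `F.binary_cross_entropy` with its default `reduction='mean'`, which averages over all `batch_size * 10` entries. The helper names are hypothetical; in PyTorch itself, `F.one_hot(gt, num_classes=10).float()` builds the same target. Note that the value logged per epoch is the *sum* of 1000 such per-batch losses, not their mean.

```python
import math


def one_hot(labels, num_classes=10):
    """Row i is all zeros except for a 1.0 at column labels[i] (like `target` in the loop)."""
    target = [[0.0] * num_classes for _ in labels]
    for row, label in zip(target, labels):
        row[label] = 1.0
    return target


def clamped_log(x):
    """Natural log clamped at -100, the same guard PyTorch's BCE applies against log(0)."""
    return max(math.log(x), -100.0) if x > 0 else -100.0


def binary_cross_entropy(pred, target):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over every element (reduction='mean')."""
    terms = [
        -(t * clamped_log(p) + (1 - t) * clamped_log(1 - p))
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    ]
    return sum(terms) / len(terms)


target = one_hot([3])          # a single sample with label 3
uniform = [[0.1] * 10]         # a model that is maximally unsure
print(binary_cross_entropy(uniform, target))  # about 0.325
```

A perfect one-hot prediction gives a loss of 0, while the uniform 0.1 prediction above gives about 0.325.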

# Shutdown tensorboard

Once training is done, we can shut down the TensorBoard process. More precisely, we kill its whole process group, since `tensorboard.pid` holds the PID of the shell that spawned TensorBoard rather than TensorBoard itself (see [this Stack Overflow answer](https://stackoverflow.com/a/31048354) and the one it links to).

```python
writer.close()  # flush remaining events and close the log file
os.killpg(os.getpgid(tensorboard.pid), signal.SIGTERM)  # see https://stackoverflow.com/a/4791612
```
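A slightly more defensive shutdown could wait for the group leader to exit and escalate to `SIGKILL` if it ignores `SIGTERM`. This is a sketch with a hypothetical helper name. It assumes the process was started as a session leader (`preexec_fn=os.setsid` or `start_new_session=True`), so its PID is also the process-group ID. The timeout only watches the group leader, so it is most useful when the leader is TensorBoard itself (e.g. started from an argument list without `shell=True`); with a shell that exits immediately after backgrounding TensorBoard, the `SIGKILL` fallback would not be reached.

```python
import os
import signal
import subprocess


def stop_process_group(proc, timeout=5.0):
    """Send SIGTERM to the process group led by `proc`, escalating to SIGKILL after `timeout` seconds.

    Returns the leader's exit status as reported by Popen.wait().
    """
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:  # the group has already exited
        return proc.wait()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # the group ignored SIGTERM; force it
        return proc.wait()
```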

# Remote tensorboard

To run TensorBoard remotely (on iccluster) and monitor it from a local computer, we need to establish an SSH tunnel with the following port-forwarding configuration (see [this Stack Overflow answer](https://stackoverflow.com/a/42445070)):

```
ssh -N -f -L localhost:16006:localhost:6006 <gaspar_username>@iccluster135.iccluster.epfl.ch
```

Ports can be adjusted as needed. With the configuration in this notebook, TensorBoard runs on port 6006 on the cluster, and the command above forwards it to local port 16006, so it can be viewed at `http://localhost:16006`.
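To make the port mapping explicit, the tunnel command can also be assembled in Python, e.g. to pass it to `subprocess.run` from the notebook. This sketch only builds the argument list; the function name and its defaults are hypothetical, and `<gaspar_username>` must be replaced with your own username.

```python
def ssh_tunnel_command(user, host, local_port=16006, remote_port=6006, background=True):
    """Build the ssh arguments forwarding localhost:local_port to port remote_port on `host`.

    -L local:remote sets up the forwarding, -N runs no remote command, and -f
    sends ssh to the background once the connection is established.
    """
    cmd = ['ssh', '-N']
    if background:
        cmd.append('-f')
    cmd += ['-L', f'localhost:{local_port}:localhost:{remote_port}', f'{user}@{host}']
    return cmd


# Reproduces the command above
print(' '.join(ssh_tunnel_command('<gaspar_username>', 'iccluster135.iccluster.epfl.ch')))
```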