# Tutorial: PINA and PyTorch Lightning, training tips and visualizations


In this tutorial, we will delve deeper into the functionality of the `Trainer` class, which serves as the cornerstone for training **PINA** [Solvers](https://mathlab.github.io/PINA/_rst/_code.html#solvers). 

The `Trainer` class offers a plethora of features aimed at improving model accuracy, reducing training time and memory usage, facilitating logging visualization, and more.

Our leading example will revolve around solving the `SimpleODE` problem, as outlined in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb). If you haven't already explored it, we highly recommend doing so before diving into this tutorial.
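
For reference, the problem encoded by the `SimpleODE` class in the next cell can be written out explicitly. The formulation below is read off from that code (the residual `u_x - u`, the `FixedValue(1)` condition at `x = 0`, and `truth_solution`), not quoted from the PINA documentation:

$$
\begin{cases}
\dfrac{du}{dx}(x) = u(x), & x \in [0, 1], \\
u(0) = 1,
\end{cases}
\qquad \text{with exact solution} \qquad u(x) = e^{x}.
$$

Indeed, $\frac{d}{dx} e^{x} = e^{x}$ and $e^{0} = 1$, so both the equation and the initial condition are satisfied.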

Let's start by importing the useful modules and defining the `SimpleODE` problem and the `PINN` solver.

In [1]:
import torch

from pina import Condition, Trainer
from pina.solvers import PINN
from pina.model import FeedForward
from pina.problem import SpatialProblem
from pina.operators import grad
from pina.geometry import CartesianDomain
from pina.equation import Equation, FixedValue

class SimpleODE(SpatialProblem):

    output_variables = ['u']
    spatial_domain = CartesianDomain({'x': [0, 1]})

    # defining the ode equation
    def ode_equation(input_, output_):
        u_x = grad(output_, input_, components=['u'], d=['x'])
        u = output_.extract(['u'])
        return u_x - u

    # conditions to hold
    conditions = {
        'x0': Condition(location=CartesianDomain({'x': 0.}), equation=FixedValue(1)),             # We fix initial condition to value 1
        'D': Condition(location=CartesianDomain({'x': [0, 1]}), equation=Equation(ode_equation)), # We wrap the python equation using Equation
    }

    # defining the true solution
    def truth_solution(self, pts):
        return torch.exp(pts.extract(['x']))
    

# sampling for training
problem = SimpleODE()
problem.discretise_domain(1, 'random', locations=['x0'])
problem.discretise_domain(20, 'lh', locations=['D'])

# build the model
model = FeedForward(
    layers=[10, 10],
    func=torch.nn.Tanh,
    output_dimensions=len(problem.output_variables),
    input_dimensions=len(problem.input_variables)
)

# create the PINN object
pinn = PINN(problem, model)

So far we have followed exactly the steps of the previous tutorials. The `Trainer` object
can be initialized by simply passing the `PINN` solver:

In [3]:
trainer = Trainer(solver=pinn)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


## Trainer Accelerator

When creating the trainer, **by default** the `Trainer` chooses the best-performing `accelerator` available on your system, ranked as follows:
1. [TPU](https://cloud.google.com/tpu/docs/intro-to-tpu)
2. [IPU](https://www.graphcore.ai/products/ipu)
3. [HPU](https://habana.ai/)
4. [GPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#:~:text=What%20does%20GPU%20stand%20for,video%20editing%2C%20and%20gaming%20applications) or [MPS](https://developer.apple.com/metal/pytorch/)
5. CPU

To set the `accelerator` manually, pass one of the following values:

* `accelerator = {'gpu', 'cpu', 'hpu', 'mps', 'tpu', 'ipu'}` sets the accelerator to a specific one

In [5]:
trainer = Trainer(solver=pinn, accelerator='cpu')

GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


As you can see, even though a `GPU` is available on this system, it is not used because we set `accelerator='cpu'`.
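
The default ranking described in this section amounts to a "first available wins" rule. The snippet below is a minimal, framework-free sketch of that rule. It is not Lightning's actual implementation: `RANKING` and `pick_accelerator` are hypothetical names, and the set of available devices is passed in by hand rather than detected.

```python
# Illustrative only: this is NOT Lightning's implementation.
# The ranking mirrors the list in this section; placing "gpu" before "mps"
# within rank 4 is an arbitrary choice made for this sketch.
RANKING = ("tpu", "ipu", "hpu", "gpu", "mps", "cpu")


def pick_accelerator(available):
    """Return the highest-ranked accelerator found in `available`.

    `available` is a hypothetical set of accelerator names detected on the
    machine; the CPU is assumed to always be usable as a fallback.
    """
    for name in RANKING:
        if name in available:
            return name
    return "cpu"


# The machine used in this tutorial reports an MPS device and no TPU/IPU/HPU.
print(pick_accelerator({"mps", "cpu"}))  # mps
print(pick_accelerator({"cpu"}))         # cpu
```

Passing `accelerator='cpu'` to the `Trainer`, as done above, simply bypasses this kind of automatic choice.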

## Trainer Logging

In `PINA` you can log metrics in different ways. The simplest one is to use the `MetricTracker` class from `pina.callbacks`, as seen in the [*Introduction to PINA for Physics Informed Neural Networks training*](https://github.com/mathLab/PINA/blob/master/tutorials/tutorial1/tutorial.ipynb) tutorial.

However, especially when we need to train multiple times to average the loss across runs, `pytorch_lightning.loggers` might be useful. Here we will use `TensorBoardLogger` (see the Lightning documentation on [logging](https://lightning.ai/docs/pytorch/stable/extensions/logging.html)), but you can choose the one you prefer (or write your own) thanks to the amazing job done by the PyTorch Lightning team!

We will now import `TensorBoardLogger`, run the training three times and then visualize the results. Notice that we set `enable_model_summary=False` to suppress the model summary (e.g. the number of parameters); set it to `True` if needed.


In [7]:
from pytorch_lightning.loggers import TensorBoardLogger

# three training runs; by default each one trains for 1000 epochs
# we reinitialize the model each time, otherwise the same parameters would keep being optimized
for _ in range(3):
    model = FeedForward(
        layers=[10, 10],
        func=torch.nn.Tanh,
        output_dimensions=len(problem.output_variables),
        input_dimensions=len(problem.input_variables)
    )
    pinn = PINN(problem, model)
    trainer = Trainer(solver=pinn, accelerator='cpu', logger=TensorBoardLogger(save_dir='simpleode'), enable_model_summary=False)
    trainer.train()

GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 303.14it/s, v_num=3, x0_loss=1.09e-5, D_loss=0.000242, mean_loss=0.000127]

`Trainer.fit` stopped: `max_epochs=1000` reached.


Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 171.98it/s, v_num=3, x0_loss=1.09e-5, D_loss=0.000242, mean_loss=0.000127]


GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 269.11it/s, v_num=4, x0_loss=1.97e-5, D_loss=0.000483, mean_loss=0.000251]

`Trainer.fit` stopped: `max_epochs=1000` reached.


Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 156.86it/s, v_num=4, x0_loss=1.97e-5, D_loss=0.000483, mean_loss=0.000251]


GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 268.38it/s, v_num=5, x0_loss=2.44e-5, D_loss=0.000865, mean_loss=0.000444]

`Trainer.fit` stopped: `max_epochs=1000` reached.


Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 150.15it/s, v_num=5, x0_loss=2.44e-5, D_loss=0.000865, mean_loss=0.000444]


We can now visualize the logs by running `tensorboard --logdir=simpleode/` in a terminal. You should obtain a webpage like the one shown below:

<p align="center">
<img src="logging.png" alt="Logging API" width="400"/>
</p>

As you can see, by default **PINA** logs the losses shown in the progress bar, as well as the number of epochs. You can always add more logging either by defining a **callback** ([more on callbacks](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html)), or by inheriting from the solver and overriding the relevant **hooks** ([more on hooks](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks)).
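
Since the motivation for using a logger here is to compare or average the loss across several runs, the snippet below sketches that computation by hand. It only uses the final `mean_loss` values printed in the three progress bars above; it does not read the TensorBoard files.

```python
from statistics import mean, stdev

# Final `mean_loss` of each run, copied from the progress bars above
# (logger versions v_num=3, 4 and 5).
final_mean_loss = [0.000127, 0.000251, 0.000444]

avg = mean(final_mean_loss)
spread = stdev(final_mean_loss)  # sample standard deviation across runs

print(f"average final mean_loss: {avg:.3e}")
print(f"run-to-run std:          {spread:.3e}")
```

With only three runs the standard deviation is a rough indication at best, but it already shows that the final loss varies noticeably with the random initialization.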

## Trainer Callbacks

Whenever we need to access certain steps of the training for logging, make static modifications (i.e. without changing the solver), or update problem hyperparameters (static variables), we can use `Callbacks`. `Callbacks` allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It decouples functionality that does not need to be in **PINA** `Solver`s.
Lightning has a callback system to execute them when needed. Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.

The following are best practices when using/designing callbacks.

* Callbacks should be isolated in their functionality.
* Your callback should not rely on the behavior of other callbacks in order to work properly.
* Do not manually call methods from the callback.
* Directly calling methods (e.g. `on_validation_end`) is strongly discouraged.
* Whenever possible, your callbacks should not depend on the order in which they are executed.
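
The hook mechanism and the practices listed above can be illustrated with a toy, framework-free sketch. This is not how Lightning is implemented: `ToyTrainer`, `LossRecorder` and `EpochCounter` are hypothetical names, and the "training" is just a loop over made-up loss values. The point is only to show who calls the hook (the trainer, never the user) and that two isolated callbacks give the same result in either order.

```python
# Toy stand-in for the hook mechanism: NOT Lightning's implementation.


class LossRecorder:
    """Stores the loss seen at the end of every epoch."""

    def __init__(self):
        self.history = []

    def on_train_epoch_end(self, trainer):
        self.history.append(trainer.last_loss)


class EpochCounter:
    """Counts finished epochs; it does not rely on any other callback."""

    def __init__(self):
        self.count = 0

    def on_train_epoch_end(self, trainer):
        self.count += 1


class ToyTrainer:
    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.last_loss = None

    def fit(self, losses):
        for loss in losses:  # one fake "epoch" per value
            self.last_loss = loss
            # The trainer, not the user, invokes the hook on each callback.
            for callback in self.callbacks:
                callback.on_train_epoch_end(self)


recorder, counter = LossRecorder(), EpochCounter()
ToyTrainer(callbacks=[recorder, counter]).fit([0.9, 0.5, 0.2])
print(recorder.history)  # [0.9, 0.5, 0.2]
print(counter.count)     # 3
```

Swapping the two callbacks in the list leaves both results unchanged, which is what "isolated" and "order-independent" mean in practice.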

We will now implement a naive version of `MetricTracker` to show how callbacks work. This is a very simple application of callbacks; fortunately, **PINA** already provides more advanced ones (such as switching optimizer during training).


In [23]:
from pytorch_lightning.callbacks import Callback
import torch

# define a simple callback
class NaiveMetricTracker(Callback):
    def __init__(self):
        self.saved_metrics = []

    def on_train_epoch_end(self, trainer, pl_module):  # hook called at the end of each training epoch
        # store a shallow copy of the metrics logged so far
        self.saved_metrics.append(dict(trainer.logged_metrics))

Let's see the results when applied to the `SimpleODE` problem. You can pass callbacks when initializing the `Trainer` through the `callbacks` argument, which expects a list of callbacks.

In [25]:
model = FeedForward(
    layers=[10, 10],
    func=torch.nn.Tanh,
    output_dimensions=len(problem.output_variables),
    input_dimensions=len(problem.input_variables)
)
pinn = PINN(problem, model)
trainer = Trainer(solver=pinn,
                  accelerator='cpu',
                  logger=TensorBoardLogger(save_dir='simpleode'),
                  enable_model_summary=False,
                  callbacks=[NaiveMetricTracker()])  # adding a callback
trainer.train()

GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 251.26it/s, v_num=1, x0_loss=0.000669, D_loss=0.00561, mean_loss=0.00314]

`Trainer.fit` stopped: `max_epochs=1000` reached.


Epoch 999: 100%|██████████| 1/1 [00:00<00:00, 151.22it/s, v_num=1, x0_loss=0.000669, D_loss=0.00561, mean_loss=0.00314]


We can easily access the data through `trainer.callbacks[0].saved_metrics` (the index zero refers to the first callback in the list passed at initialization).

In [27]:
trainer.callbacks[0].saved_metrics[:3] # only the first three epochs

[{'x0_loss': tensor(1.6162),
  'D_loss': tensor(0.1902),
  'mean_loss': tensor(0.9032)},
 {'x0_loss': tensor(1.5920),
  'D_loss': tensor(0.1786),
  'mean_loss': tensor(0.8853)},
 {'x0_loss': tensor(1.5681),
  'D_loss': tensor(0.1675),
  'mean_loss': tensor(0.8678)}]
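
A list of per-epoch dictionaries is awkward to plot directly, so a common next step is to regroup it into one series per metric. The sketch below does this on a stand-in for the output above: the three dictionaries hold the same numbers as plain floats so that the snippet runs without `torch`. On the real `saved_metrics` the same comprehension should work, since `float()` also accepts zero-dimensional tensors.

```python
# Stand-in for `trainer.callbacks[0].saved_metrics[:3]`: the numbers are the
# ones printed above, stored as plain floats so the snippet runs without torch.
saved_metrics = [
    {"x0_loss": 1.6162, "D_loss": 0.1902, "mean_loss": 0.9032},
    {"x0_loss": 1.5920, "D_loss": 0.1786, "mean_loss": 0.8853},
    {"x0_loss": 1.5681, "D_loss": 0.1675, "mean_loss": 0.8678},
]

# Regroup the per-epoch dictionaries into one list per metric.
# float() also accepts zero-dimensional torch tensors.
history = {
    key: [float(epoch[key]) for epoch in saved_metrics]
    for key in saved_metrics[0]
}
print(history["mean_loss"])  # [0.9032, 0.8853, 0.8678]

# In this output, mean_loss matches the average of the two condition losses
# up to the four printed decimals.
for epoch in saved_metrics:
    average = (epoch["x0_loss"] + epoch["D_loss"]) / 2
    assert abs(average - epoch["mean_loss"]) < 1e-4
```

Each entry of `history` can then be handed to any plotting library as the y-values of a loss curve, with the epoch index on the x-axis.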