# Tutorial 5: Trace - training control and debugging

In this tutorial, we introduce another important concept in FastEstimator: __Trace__. It is mainly used to control training and to debug it.

`Trace` is a class that contains the 6 event functions listed below. When the user adds a `Trace` to the `Estimator`, each event function is executed at the corresponding event of the training loop. If you are a Keras user, you can think of `Trace` as a combination of callbacks and metrics.
* on_begin
* on_epoch_begin
* on_batch_begin
* on_batch_end
* on_epoch_end
* on_end
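
To make the order of these events concrete, here is a minimal, framework-free sketch. `RecordingTrace` and `run_toy_loop` are hypothetical names used only for illustration; they are not part of FastEstimator, and the real training loop does much more than fire hooks. The sketch only assumes that the six event functions are called in the nesting their names suggest.

```python
# Toy stand-in for a trace: NOT FastEstimator's Trace class,
# just an object exposing the same six event function names.
class RecordingTrace:
    def __init__(self):
        self.events = []

    def on_begin(self, state):
        self.events.append("on_begin")

    def on_epoch_begin(self, state):
        self.events.append("on_epoch_begin")

    def on_batch_begin(self, state):
        self.events.append("on_batch_begin")

    def on_batch_end(self, state):
        self.events.append("on_batch_end")

    def on_epoch_end(self, state):
        self.events.append("on_epoch_end")

    def on_end(self, state):
        self.events.append("on_end")


def run_toy_loop(trace, epochs, steps_per_epoch):
    """Hypothetical loop that only fires the hooks; no model is trained."""
    state = {"mode": "train"}
    trace.on_begin(state)
    for epoch in range(epochs):
        state["epoch"] = epoch
        trace.on_epoch_begin(state)
        for batch_idx in range(steps_per_epoch):
            state["batch_idx"] = batch_idx
            trace.on_batch_begin(state)
            # A real loop would run the network on the batch here.
            trace.on_batch_end(state)
        trace.on_epoch_end(state)
    trace.on_end(state)
    return trace.events


events = run_toy_loop(RecordingTrace(), epochs=1, steps_per_epoch=2)
print(events)
```

With one epoch of two steps, the batch events fire twice, nested inside a single pair of epoch events, which are in turn nested inside `on_begin` and `on_end`.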

`Trace` differs from Keras callbacks in the following ways:
1. Trace has full access to the preprocessed data and the prediction data
2. Different traces can pass data among each other
3. Trace is simpler and has fewer event functions than Keras callbacks
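
The second point can be pictured with a small framework-free sketch. It assumes, purely for illustration, that traces run in order and share one `state` dictionary; the class names and the keys `"y_pred_label"` and `"num_correct"` are hypothetical, and the mechanism FastEstimator actually uses may differ.

```python
class BatchCorrectCounter:
    """Hypothetical producer: computes a value and writes it into the shared state."""

    def on_batch_end(self, state):
        batch = state["batch"]
        state["num_correct"] = sum(
            int(pred == true) for pred, true in zip(batch["y_pred_label"], batch["y"])
        )


class CorrectLogger:
    """Hypothetical consumer: reads the value written by the previous trace."""

    def __init__(self):
        self.seen = []

    def on_batch_end(self, state):
        self.seen.append(state["num_correct"])


# Made-up batch: 4 labels and 4 predicted labels, 3 of which match.
state = {"batch": {"y": [1, 0, 2, 2], "y_pred_label": [1, 0, 1, 2]}}

traces = [BatchCorrectCounter(), CorrectLogger()]
for trace in traces:  # order matters: the producer must run before the consumer
    trace.on_batch_end(state)

print(traces[1].seen)
```

The design point is that the consumer never recomputes anything: it relies on an earlier trace having stored the result, so the order of the traces is part of the contract.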

`Trace` can be used for anything that involves the training loop, such as changing the learning rate, calculating metrics, or writing checkpoints.
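
As an illustration of the learning-rate use case, the following sketch halves a learning rate every two epochs from an `on_epoch_begin` event. `ToyOptimizer` and `StepDecay` are hypothetical stand-ins written in plain Python; they do not show how FastEstimator exposes the optimizer to a trace.

```python
class ToyOptimizer:
    """Hypothetical optimizer that only stores a learning rate."""

    def __init__(self, lr):
        self.lr = lr


class StepDecay:
    """Hypothetical trace-like object: multiplies lr by `factor` every `every` epochs."""

    def __init__(self, optimizer, factor=0.5, every=2):
        self.optimizer = optimizer
        self.factor = factor
        self.every = every

    def on_epoch_begin(self, state):
        epoch = state["epoch"]
        # Skip epoch 0 so that training starts at the initial learning rate.
        if epoch > 0 and epoch % self.every == 0:
            self.optimizer.lr *= self.factor


optimizer = ToyOptimizer(lr=0.001)
schedule = StepDecay(optimizer, factor=0.5, every=2)

history = []
for epoch in range(5):
    schedule.on_epoch_begin({"epoch": epoch})
    history.append(optimizer.lr)  # learning rate in effect during this epoch

print(history)
```

Over five epochs the learning rate stays at its initial value for epochs 0 and 1, is halved for epochs 2 and 3, and is halved again at epoch 4.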

In [None]:
# Import libraries
import tempfile
import numpy as np
import tensorflow as tf
import fastestimator as fe

## Using Trace to debug training loop

Since `Trace` has full access to the data used in the training loop, one natural use of `Trace` is debugging the training loop, for example by printing the network prediction for each batch.

Recall that in tutorial 3 we customized an operation that scales the prediction score by 10 and writes the result to a new key. Let's use `Trace` to check whether the operation works correctly.
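
Before looking at real output, it helps to know what a correct result should look like. The sketch below is a plain-Python sanity check, assuming the model outputs softmax probabilities over 10 classes (an assumption about the network, not something this section states). Under that assumption, each row of the scaled prediction should sum to 10, and an untrained model should give values near 1.0. The helper names `softmax` and `scale` are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scale(values, factor=10):
    """Plain-Python equivalent of multiplying the prediction by 10."""
    return [v * factor for v in values]


# Equal logits mimic a model with no preference among the 10 classes.
logits = [0.0] * 10
probs = softmax(logits)
scaled = scale(probs)

print(sum(probs))   # probabilities sum to 1 (up to floating-point error)
print(sum(scaled))  # scaled values sum to 10 (up to floating-point error)
```

A row of the scaled prediction that does not sum to roughly 10 would suggest that the scaling operation, or the assumption about the model output, is wrong.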

### 1) Define the operation to test, the pipeline and the network

In [2]:
from fastestimator.architecture import LeNet
from fastestimator.op.tensorop.model import ModelOp
from fastestimator.op.tensorop.loss import SparseCategoricalCrossentropy
from fastestimator.op.tensorop import Minmax
from fastestimator.op import TensorOp

# We define the scaling operation.
class Scale(TensorOp):
    def forward(self, data, state):
        data = data * 10
        return data

# We load the data, create the dictionaries and prepare the Pipeline.
(x_train, y_train), (x_eval, y_eval) = tf.keras.datasets.mnist.load_data()
train_data = {"x": np.expand_dims(x_train, -1), "y": y_train}
eval_data = {"x": np.expand_dims(x_eval, -1), "y": y_eval}
data = {"train": train_data, "eval": eval_data}
pipeline = fe.Pipeline(batch_size=32, data=data, ops=Minmax(inputs="x", outputs="x"))

# We prepare the model and network, which will use the scaling operation.
model = fe.build(model_def=LeNet, model_name="lenet", optimizer="adam", loss_name="loss")
network = fe.Network(
    ops=[ModelOp(inputs="x", model=model, outputs="y_pred"),
         SparseCategoricalCrossentropy(inputs=("y", "y_pred"), outputs="loss"),
         Scale(inputs="y_pred", outputs="y_pred_scaled")])

### 2) Define the trace
We want to display the scaled prediction at the end of each batch during training.

We can access the batch data with `state["batch"]` and then print the information we want to check:
- the step (`"batch_idx"`)
- the keys of the batch data (what each batch contains: y, y_pred, y_pred_scaled, loss...)
- the scaled prediction (`"y_pred_scaled"`)

In [3]:
from fastestimator.trace import Trace
from fastestimator.trace import Accuracy, ModelSaver

# We define a trace to show the predictions and test the scaling op.
class ShowPred(Trace):

    def on_batch_end(self, state): # We only want to show predictions at the end of the batch
        if state["mode"] == "train": # and only during training
            batch_data = state["batch"] 
            print("step: {}".format(state["batch_idx"]))
            print("batch data has following keys: {}".format(list(batch_data.keys())))
            print("scaled_prediction is:")
            print(batch_data["y_pred_scaled"])

# Finally, we define the estimator and pass our trace through the traces argument. For debugging, we train for only one epoch of one step.
estimator = fe.Estimator(network=network, pipeline=pipeline, epochs=1, traces=ShowPred(), steps_per_epoch=1)

In [4]:
# We launch the training and can see what the scaled prediction looks like.
estimator.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Warn: No ModelSaver Trace detected. Models will not be saved.
FastEstimator-Start: step: 0; total_train_steps: 1; lenet_lr: 0.001; 
step: 0
batch data has following keys: ['x', 'y_pred_scaled', 'y', 'y_pred', 'loss']
scaled_prediction is:
tf.Tensor(
[[0.96793294 0.9402648  0.93156326 0.9278286  0.9109819  0.9401245
  1.0638573  1.0010896  1.091351   1.2250059 ]
 [1.038612   0.9753925  0.9636226  0.9484253  0.89349127 0.95113695
  0.98204666 0.99905026 1.0780157  1.1702076 ]
 [1.029746   0.95729184 0.8991195  1.004261   0.90899205 0.96533054
