# Tutorial 7: Estimator

## Overview
In this tutorial, we will talk about:
* [Estimator API](#t07estimator)
    * [Reducing the number of training steps per epoch](#t07train)
    * [Reducing the number of evaluation steps per epoch](#t07eval)
    * [Changing logging behavior](#t07logging)
    * [Monitoring intermediate results during training](#t07intermediate)
* [Trace](#t07trace)
    * [Concept](#t07concept)
    * [Structure](#t07structure)
    * [Usage](#t07usage)
* [Model Testing](#t07testing)
* [Related Apphub Examples](#t07apphub)

`Estimator` is the API that manages everything related to the training loop. It combines `Pipeline` and `Network` and gives users fine-grained control over the training loop. Before we demonstrate different ways to control the training loop, let's define a template similar to [tutorial 1](./t01_getting_started.ipynb), but this time we will use a PyTorch model.

In [1]:
import tempfile

import fastestimator as fe
from fastestimator.architecture.pytorch import LeNet
from fastestimator.dataset.data import mnist
from fastestimator.op.numpyop.univariate import ExpandDims, Minmax
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy

def get_estimator(log_steps=100, monitor_names=None, use_trace=False, max_train_steps_per_epoch=None, epochs=2):
    # step 1
    train_data, eval_data = mnist.load_data()
    test_data = eval_data.split(0.5)
    pipeline = fe.Pipeline(train_data=train_data,
                           eval_data=eval_data,
                           test_data=test_data,
                           batch_size=32,
                           ops=[ExpandDims(inputs="x", outputs="x", axis=0), Minmax(inputs="x", outputs="x")])
    # step 2
    model = fe.build(model_fn=LeNet, optimizer_fn="adam", model_name="LeNet")
    network = fe.Network(ops=[
        ModelOp(model=model, inputs="x", outputs="y_pred"),
        CrossEntropy(inputs=("y_pred", "y"), outputs="ce"),
        CrossEntropy(inputs=("y_pred", "y"), outputs="ce1"),
        UpdateOp(model=model, loss_name="ce")
    ])
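    # step 3
    traces = None
    if use_trace:
        # Accuracy reports the "accuracy" metric; BestModelSaver keeps the weights with the highest value of it.
        traces = [Accuracy(true_key="y", pred_key="y_pred"),
                  BestModelSaver(model=model, save_dir=tempfile.mkdtemp(), metric="accuracy", save_best_mode="max")]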
    estimator = fe.Estimator(pipeline=pipeline,
                             network=network,
                             epochs=epochs,
                             traces=traces,
                             max_train_steps_per_epoch=max_train_steps_per_epoch,
                             log_steps=log_steps,
                             monitor_names=monitor_names)
    return estimator

Let's train our model using the default `Estimator` arguments:

In [2]:
est = get_estimator()
est.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Warn: No ModelSaver Trace detected. Models will not be saved.
FastEstimator-Start: step: 1; num_device: 0; logging_interval: 100; 
FastEstimator-Train: step: 1; ce: 2.2985432; 
FastEstimator-Train: step: 100; ce: 0.33677763; steps/sec: 43.98; 
FastEstimator-Train: step: 200; ce: 0.3549296; steps/sec: 45.97; 
FastEstimator-Train: step: 300; ce: 0.17926084; steps/sec: 46.62; 
FastEstimator-Train: step: 400; ce: 0.32462734; steps/sec: 46.91; 
FastEstimator-Train: step: 500; ce: 0.05164891; steps/sec: 47.18; 
FastEstimator-Train: step: 600; ce: 

<a id='t07estimator'></a>

## Estimator API

<a id='t07train'></a>

### Reduce the number of training steps per epoch
In general, one epoch of training means that every element in the training dataset is visited exactly once. If evaluation data is available, evaluation happens after every epoch by default. Consider the following two scenarios:

* The training dataset is so large that evaluation needs to happen multiple times during one epoch.
* Different training datasets are being used for different epochs, but the number of training steps should be consistent between each epoch.

One easy solution to the above scenarios is to limit the number of training steps per epoch. For example, if we want to train for only 300 steps per epoch, with training lasting for 4 epochs (1200 steps total), we would do the following:

In [3]:
est = get_estimator(max_train_steps_per_epoch=300, epochs=4)
est.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Warn: No ModelSaver Trace detected. Models will not be saved.
FastEstimator-Start: step: 1; num_device: 0; logging_interval: 100; 
FastEstimator-Train: step: 1; ce: 2.3073506; 
FastEstimator-Train: step: 100; ce: 0.5364497; steps/sec: 38.56; 
FastEstimator-Train: step: 200; ce: 0.17832895; steps/sec: 42.4; 
FastEstimator-Train: step: 300; ce: 0.2198829; steps/sec: 41.62; 
FastEstimator-Train: step: 300; epoch: 1; epoch_time: 7.42 sec; 
FastEstimator-Eval: step: 300; epoch: 1; ce: 0.15399536; 
FastEstimator-Train: step: 400; ce: 0.13039914; s
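
The step counts in these logs can be checked with a little arithmetic. The sketch below is plain Python, not FastEstimator code: `steps_per_epoch` is a hypothetical helper, and it assumes the standard MNIST training split of 60,000 images and that a final partial batch still counts as one step.

```python
import math


def steps_per_epoch(num_samples, batch_size, max_steps=None):
    """Hypothetical helper: number of training steps in one epoch.

    Assumes a final partial batch still counts as one step.
    """
    full_epoch = math.ceil(num_samples / batch_size)
    # In this sketch the cap, if given, only ever shortens the epoch.
    return full_epoch if max_steps is None else min(full_epoch, max_steps)


# Standard MNIST training split (60,000 images) with batch_size=32.
uncapped = steps_per_epoch(60000, 32)
capped = steps_per_epoch(60000, 32, max_steps=300)
print(uncapped)    # 1875
print(capped * 4)  # 1200
```

With the cap in place, evaluation runs every 300 steps instead of every 1875, which matches the `step: 300; epoch: 1` evaluation line in the log.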

<a id='t07eval'></a>

### Reduce the number of evaluation steps per epoch
You may want to reduce the number of evaluation steps for debugging purposes. This can be done by setting the `max_eval_steps_per_epoch` argument in `Estimator`.

<a id='t07logging'></a>

### Change logging behavior
When the number of training epochs is large, the log can become verbose. You can change the logging behavior by choosing one of the following options:
* set `log_steps` to `None` if you do not want to see any training logs printed.
* set `log_steps` to 0 if you only wish to see the evaluation logs.
* set `log_steps` to some integer 'x' if you want training logs to be printed every 'x' steps.

Let's set the `log_steps` to 0:

In [4]:
est = get_estimator(max_train_steps_per_epoch=300, epochs=4, log_steps=0)
est.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Warn: No ModelSaver Trace detected. Models will not be saved.
FastEstimator-Start: step: 1; num_device: 0; logging_interval: 0; 
FastEstimator-Eval: step: 300; epoch: 1; ce: 0.15603326; 
FastEstimator-Eval: step: 600; epoch: 2; ce: 0.09531953; 
FastEstimator-Eval: step: 900; epoch: 3; ce: 0.06877253; 
FastEstimator-Eval: step: 1200; epoch: 4; ce: 0.05356282; 
FastEstimator-Finish: step: 1200; total_time: 36.81 sec; LeNet_lr: 0.001; 
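
The three `log_steps` settings described above can be sketched as a single decision function. This is an illustration only, not the library's implementation: `should_log_train_step` is a hypothetical name, and the rule that step 1 is always printed is an assumption drawn from the logs shown in this tutorial.

```python
def should_log_train_step(step, log_steps):
    """Hypothetical helper mirroring the three `log_steps` settings described above."""
    if log_steps is None or log_steps == 0:
        # None: no logs at all; 0: evaluation logs only, so no per-step training logs.
        return False
    # Assumption drawn from the logs in this tutorial: step 1 is always printed,
    # followed by every multiple of `log_steps`.
    return step == 1 or step % log_steps == 0


logged = [s for s in range(1, 301) if should_log_train_step(s, 100)]
print(logged)  # [1, 100, 200, 300]
```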


<a id='t07intermediate'></a>

### Monitor intermediate results
You might have noticed that in our example `Network` there is an op: `CrossEntropy(inputs=("y_pred", "y"), outputs="ce1")`. However, `ce1` never shows up in the training logs above. This is because FastEstimator identifies and filters out unused variables to reduce unnecessary communication between the GPU and CPU. In contrast, `ce` shows up in the log because by default we log all loss values that are used to update models.

But what if we want to see the value of `ce1` throughout training?

Easy: just add `ce1` to `monitor_names` in `Estimator`.

In [5]:
est = get_estimator(max_train_steps_per_epoch=300, epochs=4, log_steps=150, monitor_names="ce1")
est.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Warn: No ModelSaver Trace detected. Models will not be saved.
FastEstimator-Start: step: 1; num_device: 0; logging_interval: 150; 
FastEstimator-Train: step: 1; ce1: 2.30421; ce: 2.30421; 
FastEstimator-Train: step: 150; ce1: 0.35948867; ce: 0.35948867; steps/sec: 38.23; 
FastEstimator-Train: step: 300; ce1: 0.16791707; ce: 0.16791707; steps/sec: 40.98; 
FastEstimator-Train: step: 300; epoch: 1; epoch_time: 7.64 sec; 
FastEstimator-Eval: step: 300; epoch: 1; ce1: 0.2302698; ce: 0.2302698; 
FastEstimator-Train: step: 450; ce1: 0.14853987; ce:

As we can see, both `ce` and `ce1` show up in the log above. Unsurprisingly, their values are identical because they have the same inputs and forward function.
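
The filtering rule can be pictured as a set union: loss keys used for model updates are always logged, and anything else has to be requested. The sketch below is a simplified mental model, not FastEstimator's actual logic, and `keys_to_log` is a hypothetical helper.

```python
def keys_to_log(loss_names, monitor_names=None):
    """Hypothetical helper: which Network outputs end up in the log.

    Loss keys used to update a model are always kept; anything else must be
    requested explicitly through `monitor_names` (a string or an iterable of strings).
    """
    if monitor_names is None:
        extra = set()
    elif isinstance(monitor_names, str):
        extra = {monitor_names}
    else:
        extra = set(monitor_names)
    return set(loss_names) | extra


print(sorted(keys_to_log(["ce"])))         # ['ce']
print(sorted(keys_to_log(["ce"], "ce1")))  # ['ce', 'ce1']
```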

<a id='t07trace'></a>

## Trace

<a id='t07concept'></a>

### Concept
Now you might be thinking: 'changing logging behavior and monitoring extra keys is cool, but where is the fine-grained access to the training loop?' 

The answer is `Trace`. `Trace` is a module that gives you access to different training stages and allows you to "do stuff" with them. Here are some examples of what a `Trace` can do:

* print any training data at any training step
* write results to a file during training
* change learning rate based on some loss conditions
* calculate any metrics 
* order you a pizza after training ends
* ...

So what are the different training stages? They are:

* Beginning of training
* Beginning of epoch
* Beginning of batch
* End of batch
* End of epoch
* End of training

<img src="../resources/t07_trace_concept.png" alt="drawing" width="500"/>

As we can see from the illustration above, the training process is essentially a nested combination of batch loops and epoch loops. Over the course of training, `Trace` places 6 different "road blocks" for you to leverage.
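
The nesting of the six stages can be made concrete with a toy loop. Everything below is a stand-in written for illustration: `RecordingTrace` and `run_loop` are hypothetical names, the method signatures are simplified, and no real training happens.

```python
class RecordingTrace:
    """Toy stand-in for a Trace (not FastEstimator's class): records each stage it sees."""

    def __init__(self):
        self.events = []

    def on_begin(self):
        self.events.append("begin")

    def on_epoch_begin(self, epoch):
        self.events.append(f"epoch_begin:{epoch}")

    def on_batch_begin(self, batch):
        self.events.append(f"batch_begin:{batch}")

    def on_batch_end(self, batch):
        self.events.append(f"batch_end:{batch}")

    def on_epoch_end(self, epoch):
        self.events.append(f"epoch_end:{epoch}")

    def on_end(self):
        self.events.append("end")


def run_loop(trace, epochs, batches_per_epoch):
    """Toy training loop: an epoch loop wrapping a batch loop, with six hook points."""
    trace.on_begin()
    for epoch in range(1, epochs + 1):
        trace.on_epoch_begin(epoch)
        for batch in range(1, batches_per_epoch + 1):
            trace.on_batch_begin(batch)
            # The forward and backward pass for one batch would run here.
            trace.on_batch_end(batch)
        trace.on_epoch_end(epoch)
    trace.on_end()


trace = RecordingTrace()
run_loop(trace, epochs=1, batches_per_epoch=2)
print(trace.events)
```

The recorded list starts with `begin`, ends with `end`, and shows each batch pair nested inside its epoch pair.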

<a id='t07structure'></a>

### Structure
If you are familiar with Keras, you will notice that the structure of `Trace` is very similar to the `Callback` in Keras. Despite the structural similarity, `Trace` gives you a lot more flexibility, which we will talk about in depth in [advanced tutorial 4](../advanced/t04_trace.ipynb). Implementation-wise, `Trace` is a Python class with the following structure:

In [6]:
class Trace:
    def __init__(self, inputs=None, outputs=None, mode=None):
        self.inputs = inputs
        self.outputs = outputs
        self.mode = mode

    def on_begin(self, data):
        """Runs once at the beginning of training"""

    def on_epoch_begin(self, data):
        """Runs at the beginning of each epoch"""

    def on_batch_begin(self, data):
        """Runs at the beginning of each batch"""

    def on_batch_end(self, data):
        """Runs at the end of each batch"""

    def on_epoch_end(self, data):
        """Runs at the end of each epoch"""

    def on_end(self, data):
        """Runs once at the end of training"""

Given the structure, users can customize their own functions at different stages and insert them into the training loop. We will leave the customization of `Traces` to the advanced tutorial. For now, let's use some pre-built `Traces` from FastEstimator.

During the training loop in our earlier example, we want 2 things to happen:
1. Save the model weights if the evaluation accuracy is the best we have seen so far
2. Calculate the model accuracy during evaluation
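
The "best so far" rule in the first item can be sketched independently of any framework. The class below is a toy written for illustration, not FastEstimator code: `BestMetricTracker` is a hypothetical name, its `mode` argument only loosely mirrors the `save_best_mode` argument used earlier, and the accuracy values are made up.

```python
class BestMetricTracker:
    """Toy sketch (not FastEstimator code) of 'save only when the metric improves'."""

    def __init__(self, mode="max"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.best = None

    def update(self, value):
        """Return True when `value` beats the best seen so far, i.e. when to save."""
        if self.best is None:
            improved = True
        elif self.mode == "max":
            improved = value > self.best
        else:
            improved = value < self.best
        if improved:
            self.best = value
        return improved


# Made-up per-epoch accuracies, for illustration only.
tracker = BestMetricTracker(mode="max")
decisions = [tracker.update(acc) for acc in [0.95, 0.97, 0.96, 0.98]]
print(decisions)  # [True, True, False, True]
```

Using `"max"` suits metrics such as accuracy, while `"min"` would suit a loss.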

<a id='t07usage'></a>

### Usage

In [7]:
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy

est = get_estimator(use_trace=True)
est.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Start: step: 1; num_device: 0; logging_interval: 100; 
FastEstimator-Train: step: 1; ce: 2.317368; 
FastEstimator-Train: step: 100; ce: 0.32270017; steps/sec: 38.37; 
FastEstimator-Train: step: 200; ce: 0.4691573; steps/sec: 41.07; 
FastEstimator-Train: step: 300; ce: 0.16797979; steps/sec: 41.48; 
FastEstimator-Train: step: 400; ce: 0.22231343; steps/sec: 40.29; 
FastEstimator-Train: step: 500; ce: 0.15864769; steps/sec: 40.23; 
FastEstimator-Train: step: 600; ce: 0.21094382; steps/sec: 40.3; 
FastEstimator-Train: step: 700; ce: 0.2174505; 

As we can see from the log, the model is saved in a predefined location and the accuracy is displayed during evaluation.

<a id='t07testing'></a>

## Model Testing

Sometimes you have a testing dataset that is separate from the training and evaluation data. If you want to evaluate the model metrics on the test data, you can simply call:

In [8]:
est.test()

FastEstimator-Test: step: 3750; epoch: 2; accuracy: 0.9844; 


This will feed your entire test dataset through the `Pipeline` and `Network`, and then execute the traces (in our case, computing accuracy) just as during training.
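
For reference, the accuracy reported in the test log is the fraction of samples whose predicted class matches the true label. The function below is a minimal plain-Python sketch of that definition, not the code behind the `Accuracy` trace, and the labels and scores are made up.

```python
def accuracy(y_true, y_pred_scores):
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(y_true) != len(y_pred_scores):
        raise ValueError("y_true and y_pred_scores must have the same length")
    correct = 0
    for label, scores in zip(y_true, y_pred_scores):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(y_true)


# Made-up labels and per-class scores, for illustration only.
labels = [0, 2, 1, 1]
scores = [
    [0.9, 0.05, 0.05],  # predicts class 0 (correct)
    [0.1, 0.2, 0.7],    # predicts class 2 (correct)
    [0.6, 0.3, 0.1],    # predicts class 0 (wrong, true class is 1)
    [0.2, 0.5, 0.3],    # predicts class 1 (correct)
]
print(accuracy(labels, scores))  # 0.75
```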

<a id='t07apphub'></a>

## Apphub Examples
You can find some practical examples of the concepts described here in the following FastEstimator Apphubs:

* [UNet](../../apphub/semantic_segmentation/unet/unet.ipynb)