[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/tensorflow/accelerate_tensorflow_training_multi_instance.ipynb)

# Accelerate TensorFlow Keras Training using Multiple Instances

BigDL-Nano provides `bigdl.nano.tf.keras.Model` and `bigdl.nano.tf.keras.Sequential`, which extend `tf.keras.Model` and `tf.keras.Sequential` respectively with various optimizations. To use multi-instance training on a server with multiple CPU cores or sockets, replace `tf.keras.Model`/`Sequential` in your code with `bigdl.nano.tf.keras.Model`/`Sequential`, and call `fit` with `num_processes` specified.

To use multiple instances for TensorFlow Keras training, you need to install BigDL-Nano for TensorFlow:

In [None]:
# install the nightly build of bigdl-nano for tensorflow;
# intel-tensorflow will be installed at the same time, with Intel's oneDNN optimizations enabled by default
!pip install --pre --upgrade bigdl-nano[tensorflow]
!source bigdl-nano-init  # set environment variables

> 📝 **Note**
>
> Before starting your TensorFlow Keras application, it is highly recommended to run `source bigdl-nano-init` to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance increase for most TensorFlow Keras training workloads.

> ⚠️ **Warning**
> 
> For Jupyter Notebook users, we recommend running the commands above, especially `source bigdl-nano-init`, before the Jupyter kernel is started; otherwise some of the optimizations may not take effect.

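As a quick sanity check, the sketch below reports whether a few environment variables are visible to the current Python process. The variable names in `ASSUMED_VARS` are assumptions made for illustration; the exact set exported by `bigdl-nano-init` depends on your hardware and BigDL-Nano version, so adjust the list to match what the script prints on your machine. `report_env` is a hypothetical helper, not part of BigDL-Nano.

```python
import os

# Assumed variable names, for illustration only; the variables actually
# exported by bigdl-nano-init may differ on your system.
ASSUMED_VARS = ("OMP_NUM_THREADS", "KMP_AFFINITY", "KMP_BLOCKTIME", "LD_PRELOAD")


def report_env(env=None, names=ASSUMED_VARS):
    """Return a dict mapping each name to its value, or None if it is unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in names}


# Print the status of each variable as seen by this process (e.g. the kernel).
for name, value in report_env().items():
    print(f"{name} = {value if value is not None else '<not set>'}")
```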
In [None]:
# install dependency for the dataset used in the following example
!pip install tensorflow-datasets

First, **import** `Model` **or** `Sequential` **from** `bigdl.nano.tf.keras` **instead of** `tf.keras`. Let’s take the `Model` class here as an example:

In [None]:
# from tf.keras import Model
from bigdl.nano.tf.keras import Model

Suppose we would like to train a [ResNet50 model](https://keras.io/api/applications/resnet/#resnet50-function) (pretrained on the ImageNet dataset) on the [imagenette](https://www.tensorflow.org/datasets/catalog/imagenette) dataset. We first need to create the corresponding train/test datasets and define the model:

In [None]:
# Define the train/test dataset creator and the model inputs/outputs
 
import tensorflow as tf
import tensorflow_datasets as tfds

def create_datasets(img_size, batch_size):
    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',
                                          data_dir='/tmp/data',
                                          split=['train', 'validation'],
                                          with_info=True,
                                          as_supervised=True)
    
    num_classes = info.features['label'].num_classes
    
    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), \
               tf.one_hot(label, num_classes)

    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)
    test_ds = test_ds.map(preprocessing).batch(batch_size)
    return train_ds, test_ds, info


from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50

def define_model_inputs_outputs(num_classes, img_size):
    inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3))
    x = tf.cast(inputs, tf.float32)
    x = tf.keras.applications.resnet50.preprocess_input(x)
    backbone = ResNet50(weights='imagenet')
    backbone.trainable = False
    x = backbone(x)
    x = layers.Dense(512, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return inputs, outputs

In [None]:
# create train/test datasets
train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)

# Model creation steps are the same as using tf.keras.Model
inputs, outputs = define_model_inputs_outputs(num_classes=ds_info.features['label'].num_classes, 
                                              img_size=224)

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])

_The definition of_ `create_datasets` _and_ `define_model_inputs_outputs` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/tensorflow/accelerate_tensorflow_training_multi_instance.ipynb).

You can then **call the** `fit` **method with** `num_processes` **set to an integer larger than 1** to launch the specified number of processes for data-parallel training:

In [None]:
model.fit(train_ds,
          epochs=10,
          steps_per_epoch=(ds_info.splits['train'].num_examples // 32),
          num_processes=2)

> 📝 **Note**
>
> By setting `num_processes`, CPU cores will be automatically and evenly distributed among processes to avoid conflicts and maximize training throughput.
> 
> During Nano TensorFlow Keras multi-instance training, the effective batch size is still the `batch_size` specified in the datasets (32 in this example). This is because we choose to match the semantics of TensorFlow distributed training (`MultiWorkerMirroredStrategy`), which splits each batch into multiple sub-batches for the different workers.

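The arithmetic behind the note above can be sketched in plain Python. This is only an illustration of "evenly distributed" and "split into sub-batches", not how BigDL-Nano or TensorFlow implement it internally. `split_evenly` is a hypothetical helper, and the core count (8) and training-set size (9469) are assumed values chosen for the example; read the real ones from your machine and from `ds_info.splits['train'].num_examples`.

```python
def split_evenly(total, parts):
    """Split `total` items into `parts` groups whose sizes differ by at most 1."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


batch_size = 32      # batch size used when building the datasets
num_processes = 2    # value passed to model.fit(..., num_processes=2)

# Each global batch of 32 samples is divided among the processes,
# so the effective batch size per optimizer step stays at 32.
print(split_evenly(batch_size, num_processes))   # [16, 16]

# Assumed machine with 8 CPU cores: each process would get 4 of them.
num_cores = 8        # hypothetical value
print(split_evenly(num_cores, num_processes))    # [4, 4]

# steps_per_epoch is computed from the global batch size, not the sub-batch size.
num_train_examples = 9469   # assumed value for illustration
steps_per_epoch = num_train_examples // batch_size
print(steps_per_epoch)      # 295
```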
> 📚 **Related Readings**
> 
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/install.html)
> - [How to choose the number of processes for multi-instance training](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Training/General/choose_num_processes_training.html)