## BigDL-Nano ResNet example on the CIFAR10 dataset
---
This example illustrates how to apply bigdl-nano optimizations to an image recognition task built on the TensorFlow Keras framework. The basic image recognition model is implemented with tf.keras and trained on the [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) image recognition benchmark dataset.

In [1]:
import os
from time import time

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ModelCheckpoint
from bigdl.nano.tf.keras import Model, Sequential

### CIFAR10 Data Module
---
Import the existing data module from keras.datasets and normalize the images.
See [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) for an overview of the whole dataset.

In [2]:
cifar10 = keras.datasets.cifar10
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Convert class vectors to binary class matrices.
train_labels = keras.utils.to_categorical(train_labels, 10)
test_labels = keras.utils.to_categorical(test_labels, 10)

from sklearn.model_selection import train_test_split
train_images, val_images, train_labels, val_labels = train_test_split(train_images, train_labels, test_size=0.2, shuffle=True)
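
To make the preprocessing concrete, here is a minimal pure-Python sketch of what the scaling, one-hot encoding, and 80/20 split produce. The helpers `scale_pixel` and `to_one_hot` are illustrative names only and are not part of Keras; the figure of 50,000 images is the size of the CIFAR10 training set.

```python
def scale_pixel(value: int) -> float:
    """Map an 8-bit pixel value (0..255) into the range [0.0, 1.0]."""
    return value / 255.0


def to_one_hot(label: int, num_classes: int = 10) -> list:
    """Return a one-hot vector for a single integer label, mirroring the
    row that keras.utils.to_categorical produces for that label."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# CIFAR10 ships 50,000 training images; test_size=0.2 holds out 20% of them.
num_train_total = 50_000
num_val = num_train_total * 20 // 100
num_train = num_train_total - num_val

print(scale_pixel(255))    # 1.0
print(to_one_hot(3))       # 1.0 at index 3, 0.0 everywhere else
print(num_train, num_val)  # 40000 10000
```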

### Implement ResNet-18 model
---
Implement the ResNet-18 model for the CIFAR10 dataset.

In [3]:
class BasicBlock(Model):
    """
        A standard resnet block
    """
    def __init__(self, channels: int, downsample: bool = False):
        super().__init__()
        # The first convolution halves the spatial size when downsampling.
        self.conv1 = layers.Conv2D(filters=channels, strides=2 if downsample else 1, kernel_size=(3, 3),
                                   padding="same", kernel_initializer="he_normal")
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.ReLU()
        self.conv2 = layers.Conv2D(filters=channels, strides=1, kernel_size=(3, 3),
                                   padding="same", kernel_initializer="he_normal")
        self.bn2 = layers.BatchNormalization()
        # Projection shortcut: a strided 1x1 convolution so that the identity
        # branch matches the main branch in spatial size and channel count.
        self.shortcut = None
        if downsample:
            self.shortcut = keras.Sequential([
                layers.Conv2D(filters=channels, strides=2, kernel_size=(1, 1),
                              padding="same", kernel_initializer="he_normal"),
                layers.BatchNormalization()
            ])

    def call(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.shortcut is not None:
            identity = self.shortcut(x)

        # Residual connection followed by the final activation.
        out += identity
        out = self.relu(out)

        return out
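
The identity branch has to produce a tensor of the same shape as the main branch, otherwise `out += identity` fails. The sketch below works through that shape arithmetic in plain Python, assuming the usual rule that a `padding="same"` convolution with stride `s` maps a spatial size `n` to `ceil(n / s)`. The helper names are hypothetical and not part of Keras or bigdl-nano.

```python
import math


def same_conv_size(size: int, stride: int) -> int:
    """Spatial output size of a convolution with padding="same"."""
    return math.ceil(size / stride)


def main_branch_shape(height, width, channels, downsample=False):
    """Shape (H, W, C) after conv1 (optionally strided) and conv2 (stride 1)."""
    stride = 2 if downsample else 1
    h = same_conv_size(same_conv_size(height, stride), 1)
    w = same_conv_size(same_conv_size(width, stride), 1)
    return (h, w, channels)


def identity_branch_shape(height, width, in_channels, channels, downsample=False):
    """Shape of the skip connection: unchanged unless the 1x1 conv is applied."""
    if not downsample:
        return (height, width, in_channels)
    return (same_conv_size(height, 2), same_conv_size(width, 2), channels)


# Trace a 32x32 feature map with 64 channels through the first block of
# four successive stages (64, 128, 256, 512 channels).
shape = (32, 32, 64)
for channels, downsample in [(64, False), (128, True), (256, True), (512, True)]:
    main = main_branch_shape(shape[0], shape[1], channels, downsample)
    skip = identity_branch_shape(shape[0], shape[1], shape[2], channels, downsample)
    assert main == skip, "residual addition needs matching shapes"
    shape = main
    print(shape)
```

Without the strided 1x1 convolution on the skip connection, a downsampling block would try to add a 16x16x128 tensor to a 32x32x64 one.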

In [4]:
class Resnet18(Model):
    def __init__(self, num_classes, **kwargs):
        """
            num_classes: number of classes in specific classification task.
        """
        super().__init__(**kwargs)
        
        self.conv1 = layers.Conv2D(64, kernel_size=(3, 3), strides=1, padding="same")
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.ReLU()
        
        self.layer1 = keras.Sequential([
            BasicBlock(64),
            BasicBlock(64)
        ])
        self.layer2 = keras.Sequential([
            BasicBlock(128, downsample=True),
            BasicBlock(128)
        ])
        self.layer3 = keras.Sequential([
            BasicBlock(256, downsample=True),
            BasicBlock(256)
        ])
        self.layer4 = keras.Sequential([
            BasicBlock(512, downsample=True),
            BasicBlock(512)
        ])
        self.avgpool = layers.GlobalAveragePooling2D()
        self.flat = layers.Flatten()
        self.fc = layers.Dense(num_classes)
        self.activate = layers.Softmax()
        
    def call(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        
        out = self.avgpool(out)
        out = self.flat(out)
        out = self.fc(out)
        out = self.activate(out)
        
        return out

### Create tf.data.Dataset
---
The tf.data API supports writing descriptive and efficient input pipelines.<br>
See [tf.data.Dataset](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) for more details.

- Note<br>
You can also call keras.Model.fit with NumPy arrays, TensorFlow tensors, or other supported input types.<br>
For multi-process training, however, the input data must be a tf.data.Dataset.

In [5]:
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
val_dataset = tf.data.Dataset.from_tensor_slices((val_images, val_labels))
train_dataset = train_dataset.batch(64)
val_dataset = val_dataset.batch(64)
# Number of batches per epoch; integer division keeps steps_per_epoch an int.
STEPS = len(train_images) // 64

2022-05-19 02:43:36.089136: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
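
The `STEPS` value is simply the number of batches in one pass over the training split. As a quick check of the arithmetic, assuming the 40,000 training and 10,000 validation images left after the 80/20 split above (the helper `steps_per_epoch` is an illustrative name):

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once.

    The last batch may be smaller than batch_size, hence the ceiling.
    """
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(40_000, 64))  # 625, because 40000 is a multiple of 64
print(steps_per_epoch(10_000, 64))  # 157, the last validation batch holds 16 images
```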


### Train
---
Use Model.fit from bigdl.nano.tf.keras to train with BigDL-Nano.

This method overrides tf.keras.Model.fit and adds extra parameters.


Additional parameters:
```
        :param num_processes:  when num_processes is not None, it specifies how many sub-processes
                               to launch to run pseudo-distributed training; when num_processes is None,
                               training will run in the current process.
                               
        :param backend: Use backend 'multiprocessing', 'horovod', 'ray', defaults to None.
                        When num_processes is not None, it specifies which backend to use when
                        launching sub-processes to run pseudo-distributed training;
                        when num_processes is None, this parameter has no effect.
```

In [8]:
single_none_model = Resnet18(10)
single_none_model.build(input_shape=(None, 32, 32, 3))

optimizer = keras.optimizers.Adam(learning_rate=0.0005)
# Use categorical_crossentropy because the labels are one-hot encoded.
single_none_model.compile(optimizer=optimizer,
              loss="categorical_crossentropy",
              metrics=['accuracy'])
single_none_model.summary()

Model: "resnet18_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_20 (Conv2D)          multiple                  1792      
                                                                 
 batch_normalization_20 (Bat  multiple                 256       
 chNormalization)                                                
                                                                 
 re_lu_9 (ReLU)              multiple                  0         
                                                                 
 sequential_7 (Sequential)   (None, 32, 32, 64)        148736    
                                                                 
 sequential_9 (Sequential)   (None, 16, 16, 128)       527488    
                                                                 
 sequential_11 (Sequential)  (None, 8, 8, 256)         2103552   
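
The parameter counts in the summary can be reproduced by hand, which is a useful sanity check on the block definitions. The sketch below assumes each `Conv2D` has a bias term (the Keras default) and each `BatchNormalization` layer holds four values per channel (gamma, beta, moving mean, moving variance). The function names are illustrative only.

```python
def conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights plus biases of a square Conv2D kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def bn_params(channels: int) -> int:
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels


def basic_block_params(in_ch: int, out_ch: int, downsample: bool = False) -> int:
    """Parameters of one BasicBlock: two 3x3 convs, two BNs, optional 1x1 shortcut."""
    total = conv_params(3, in_ch, out_ch) + bn_params(out_ch)
    total += conv_params(3, out_ch, out_ch) + bn_params(out_ch)
    if downsample:
        total += conv_params(1, in_ch, out_ch) + bn_params(out_ch)
    return total


def stage_params(in_ch: int, out_ch: int, downsample: bool) -> int:
    """A stage is two BasicBlocks; only the first one may downsample."""
    return basic_block_params(in_ch, out_ch, downsample) + basic_block_params(out_ch, out_ch)


print(conv_params(3, 3, 64))         # 1792    (first Conv2D on RGB input)
print(bn_params(64))                 # 256     (first BatchNormalization)
print(stage_params(64, 64, False))   # 148736  (first Sequential stage)
print(stage_params(64, 128, True))   # 527488  (second Sequential stage)
print(stage_params(128, 256, True))  # 2103552 (third Sequential stage)
```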
                                                        

### Single Process
---

In [13]:
start = time()
single_none_model.fit(train_dataset,
          epochs=10,
          steps_per_epoch=STEPS,
          validation_data=val_dataset)
single_none_train_time = time() - start
single_none_model.evaluate(test_images, test_labels, verbose=1)

Epoch 1/10
Epoch 2/10


Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[1.4491106271743774, 0.7170000076293945]

### Multiple Processes
---

In [7]:
multi_default_model = Resnet18(10)
optimizer = keras.optimizers.Adam(learning_rate=0.0005)
multi_default_model.compile(optimizer=optimizer,
                        loss="categorical_crossentropy",
                        metrics=['accuracy'])
start = time()
multi_default_model.fit(train_dataset, 
                      epochs=10, 
                      steps_per_epoch=STEPS,
                      validation_data=val_dataset,
                      num_processes=1,
                      backend="multiprocessing")
multi_default_train_time = time() - start
multi_default_model.evaluate(test_images, test_labels, verbose=1)

2022-05-18 01:25:40.549124: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: /tmp/tmpfa6yulm1/temp_model/assets


2022-05-18 01:25:47.838756: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-18 01:25:47.846586: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job worker -> {0 -> localhost:52264}
2022-05-18 01:25:47.846760: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:427] Started server with target: grpc://localhost:52264

Epoch 1/10

2022-05-18 01:27:15.317409: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:766] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorSliceDataset/_2"

Epoch 2/10

2022-05-18 01:28:39.352539: W tensorflow/core/framework/dataset.cc:744] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.


Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[1.173243522644043, 0.7610999941825867]

### Multiple Processes with horovod
---

In [15]:
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
multi_horovod_model = Resnet18(10)
optimizer = keras.optimizers.Adam(learning_rate=0.0005)
multi_horovod_model.compile(optimizer=optimizer,
                            loss="categorical_crossentropy",
                            metrics=['accuracy'])
start = time()
multi_horovod_model.fit(train_dataset, 
                        epochs=10, 
                        validation_data=val_dataset,
                        steps_per_epoch=STEPS,
                        num_processes=1,
                        backend="horovod")
multi_horovod_train_time = time() - start
multi_horovod_model.evaluate(test_images, test_labels, verbose=1)



INFO:tensorflow:Assets written to: /tmp/tmp00wo9m33/temp_model/assets


[0]<stderr>:2022-05-18 02:06:31.839981: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
[0]<stderr>:To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

[0]<stdout>:Epoch 1/10
[0]<stdout>:Epoch 2/10






[0]<stderr>:2022-05-18 02:08:11.761784: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.




[1.5419765710830688, 0.5123000144958496]
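
`Model.evaluate` returns the loss followed by the compiled metrics, so each list printed above is `[loss, accuracy]`. The sketch below collects the three runs side by side. The `results` dictionary simply copies the values printed in this notebook, and `best_run` is a hypothetical helper.

```python
# [loss, accuracy] pairs copied from the evaluate() outputs above.
results = {
    "single process": [1.4491106271743774, 0.7170000076293945],
    "multiprocessing": [1.173243522644043, 0.7610999941825867],
    "horovod": [1.5419765710830688, 0.5123000144958496],
}


def best_run(runs: dict) -> str:
    """Name of the run with the highest test accuracy."""
    return max(runs, key=lambda name: runs[name][1])


for name, (loss, accuracy) in results.items():
    print(f"{name:<16} loss={loss:.4f}  accuracy={accuracy:.2%}")

print("highest test accuracy:", best_run(results))
```

Each configuration was run once and no random seed is set, so these numbers are not a rigorous comparison. The timings stored in `single_none_train_time`, `multi_default_train_time`, and `multi_horovod_train_time` can be added to the same table.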