<a href="https://colab.research.google.com/github/nigoda/machine_learning/blob/main/32_TensorFlow_customization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **TensorFlow Customization**
So far, we have used methods provided by built-in TF APIs such as `tf.keras` and `tf.estimator`. While these constructs are sufficient to start any AI project, there may be situations where you have to implement custom models, loss functions, or metrics. TensorFlow 2.0 supports extending its functionality. In this notebook, we will learn how to customize TF 2.0 functionality.

This is an introductory TensorFlow notebook that shows how to:
*  Import the required package
*  Create and use tensors
*  Use GPU acceleration
*  Use `tf.data.Dataset`

## **Import TensorFlow**
To get started, import the `tensorflow` module. As of TensorFlow 2, eager execution is turned on by default. This enables a more interactive frontend to TensorFlow, the details of which we will discuss much later.

In [None]:
import tensorflow as tf

## **Tensor**
A Tensor is a multi-dimensional array. Similar to NumPy `ndarray` objects, `tf.Tensor` objects have a data type and a shape. Additionally, `tf.Tensor`s can reside in accelerator memory (like a GPU). TensorFlow offers a rich library of operations ([tf.add](https://www.tensorflow.org/api_docs/python/tf/math/add), [tf.matmul](https://www.tensorflow.org/api_docs/python/tf/linalg/matmul), [tf.linalg.inv](https://www.tensorflow.org/api_docs/python/tf/linalg/inv), etc.) that consume and produce `tf.Tensor`s. These operations automatically convert native Python types, for example:

In [None]:
print(tf.add(1, 2))
print(tf.add([1, 2], [3, 4]))
print(tf.square(5))
print(tf.reduce_sum([1, 2, 3]))

# Operator overloading is also supported
print(tf.square(2) + tf.square(3))

tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)


Each `tf.Tensor` has a shape and a datatype.

In [None]:
x = tf.matmul([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)

tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)
(1, 2)
<dtype: 'int32'>


The most obvious differences between NumPy arrays and `tf.Tensor`s are:
1. Tensors can be backed by accelerator memory (like GPU, TPU).
2. Tensors are immutable.
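
The immutability point can be illustrated with a minimal sketch, assuming eager execution (the TensorFlow 2 default). Item assignment on a `tf.Tensor` is rejected, so an "update" means building a new tensor, here with `tf.tensor_scatter_nd_update`, or switching to a `tf.Variable` when mutable state is needed. The variable names are illustrative only.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# Tensors do not support in-place item assignment.
try:
    t[0] = 10
    assignment_failed = False
except TypeError:
    assignment_failed = True
print("In-place assignment rejected:", assignment_failed)

# Build a new tensor with one element replaced; `t` itself is left untouched.
updated = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10])
print(t.numpy(), updated.numpy())

# For mutable state, use a tf.Variable and its assign method instead.
v = tf.Variable([1, 2, 3])
v.assign([10, 2, 3])
print(v.numpy())
```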

## NumPy Compatibility
Converting between a TensorFlow [`tf.Tensor`](https://www.tensorflow.org/api_docs/python/tf/Tensor) and a NumPy `ndarray` is easy:

*  TensorFlow operations automatically convert NumPy ndarrays to Tensors.
*  NumPy operations automatically convert Tensors to NumPy ndarrays.

Tensors are explicitly converted to NumPy ndarrays using their `.numpy()` method. These conversions are typically cheap since the array and `tf.Tensor` share the underlying memory representation, if possible. However, sharing the underlying representation isn't always possible, since the `tf.Tensor` may be hosted in GPU memory while NumPy arrays are always backed by host memory. In that case, the conversion involves a copy from GPU to host memory.

In [None]:
import numpy as np

ndarray = np.ones([3, 3])

print("TensorFlow operations convert numpy arrays to tensors automatically")
tensor = tf.multiply(ndarray, 42)
print(tensor)

print("And NumPy operations convert Tensors to numpy arrays automatically")
print(np.add(tensor, 1))

print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())

TensorFlow operations convert numpy arrays to tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)
And NumPy operations convert Tensors to numpy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]
The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
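
One detail worth checking when converting is the data type. The sketch below, with illustrative variable names, suggests that a conversion keeps the dtype of its source, while a tensor built directly from Python floats defaults to `float32` rather than NumPy's `float64`.

```python
import numpy as np
import tensorflow as tf

# NumPy -> Tensor -> NumPy keeps the original dtype.
ndarray32 = np.ones([2, 2], dtype=np.float32)
tensor32 = tf.convert_to_tensor(ndarray32)
round_trip = tensor32.numpy()
print(tensor32.dtype, round_trip.dtype)

# Defaults differ: NumPy uses float64 for Python floats, TensorFlow uses float32.
from_numpy = tf.convert_to_tensor(np.array([1.0]))
from_python = tf.constant([1.0])
print(from_numpy.dtype, from_python.dtype)
```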


## **GPU acceleration**

Many TensorFlow operations are accelerated using the GPU for computation. Without any annotations, TensorFlow automatically decides whether to use the GPU or CPU for an operation, copying the tensor between CPU and GPU memory if necessary. Tensors produced by an operation are typically backed by the memory of the device on which the operation executed, for example:


In [None]:
x = tf.random.uniform([3, 3])

print("Is there a GPU available: ")
print(tf.config.list_physical_devices("GPU"))

print("Is the Tensor on GPU #0: ")
print(x.device.endswith('GPU:0'))

Is there a GPU available: 
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Is the Tensor on GPU #0: 
True


## Device Name

The `Tensor.device` property provides a fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which this program is executing and the device within that host. This is required for distributed execution of a TensorFlow program. The string ends with `GPU:<N>` if the tensor is placed on the `N`-th GPU on the host.
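
To see what the device string contains, it can be split into its parts with `tf.DeviceSpec.from_string`. This is a minimal sketch that pins the tensor to the CPU so it runs on any machine; on a GPU host the same fields would describe the GPU instead.

```python
import tensorflow as tf

# Pin the tensor to the CPU so the example does not depend on a GPU.
with tf.device("CPU:0"):
    x = tf.random.uniform([2, 2])

# The fully qualified device name of the tensor.
print(x.device)

# Parse the name into its components (job, replica, task, device type, index).
spec = tf.DeviceSpec.from_string(x.device)
print(spec.job, spec.replica, spec.task, spec.device_type, spec.device_index)
```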

## Explicit Device Placement
In TensorFlow, *placement* refers to how individual operations are assigned to (placed on) a device for execution. As mentioned, when there is no explicit guidance provided, TensorFlow automatically decides which device executes an operation and copies tensors to that device if needed. Operations can also be placed explicitly on specific devices using the `tf.device` context manager, for example:

In [None]:
import time

def time_matmul(x):
  start = time.time()
  for loop in range(10):
    tf.matmul(x, x)

  result = time.time()-start

  print("10 loops: {:0.2f}ms".format(1000*result))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
  x = tf.random.uniform([1000, 1000])
  assert x.device.endswith("CPU:0")
  time_matmul(x)

# Force execution on GPU #0 if available
if tf.config.list_physical_devices("GPU"):
  print("On GPU:")
  with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("GPU:0")
    time_matmul(x)

On CPU:
10 loops: 192.01ms
On GPU:
10 loops: 2.46ms
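
When the same code has to run on machines with and without a GPU, one possible pattern is to choose the device name once and reuse it. The sketch below is an illustration under that assumption, not a required idiom; the `device` variable is a name introduced here.

```python
import tensorflow as tf

# Prefer the first GPU when one is visible, otherwise fall back to the CPU.
device = "GPU:0" if tf.config.list_physical_devices("GPU") else "CPU:0"

with tf.device(device):
    x = tf.random.uniform([3, 3])
    y = tf.matmul(x, x)

# The result lives on the device where the operation was executed.
print(device, y.device)
```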


## Datasets
This section uses the [`tf.data.Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) API to build a pipeline for feeding data to your model. The `tf.data.Dataset` API is used to build performant, complex input pipelines from simple, reusable pieces that will feed your model's training or evaluation loops.

## Create a source **Dataset**
Create a *source* dataset using one of the factory functions like `Dataset.from_tensors` or `Dataset.from_tensor_slices`, or using objects that read from files like `TextLineDataset` or `TFRecordDataset`. See the *TensorFlow Dataset guide* for more information.

In [None]:
ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# Create a temporary text file
import tempfile
_, filename = tempfile.mkstemp()

with open(filename, 'w') as f:
  f.write("""Line 1
  Line 2
  Line 3
    """)
  
ds_file = tf.data.TextLineDataset(filename)
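
The two factory functions named above differ in how they treat the first dimension of their input. As a small sketch with illustrative data: `from_tensors` wraps the whole input as a single element, while `from_tensor_slices` yields one element per row.

```python
import tensorflow as tf

data = [[1, 2], [3, 4], [5, 6]]

# One element containing the entire (3, 2) tensor.
whole = tf.data.Dataset.from_tensors(data)
# Three elements, each a slice of shape (2,).
sliced = tf.data.Dataset.from_tensor_slices(data)

whole_elems = [e.tolist() for e in whole.as_numpy_iterator()]
sliced_elems = [e.tolist() for e in sliced.as_numpy_iterator()]
print(len(whole_elems), whole_elems)
print(len(sliced_elems), sliced_elems)
```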

## Apply transformations
Use the transformation functions like `map`, `batch`, and `shuffle` to apply transformations to dataset records.

In [None]:
ds_tensors = ds_tensors.map(tf.square).shuffle(2).batch(2)

ds_file = ds_file.batch(2)
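
To see what each transformation contributes, it can help to apply them separately. The following sketch uses illustrative names: it first runs `map` and `batch` without shuffling, so the result is deterministic, and then checks that `shuffle` only reorders elements. Note that `shuffle(2)` draws from a buffer of just two elements, so it reorders records only locally.

```python
import tensorflow as tf

source = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# map applies a function per element; batch groups consecutive elements.
squared_batches = source.map(tf.square).batch(4)
batches = [b.tolist() for b in squared_batches.as_numpy_iterator()]
print(batches)  # the last batch is smaller because 6 is not a multiple of 4

# shuffle changes the order, not the content.
shuffled = source.map(tf.square).shuffle(2)
values = [int(v) for v in shuffled.as_numpy_iterator()]
print(values)
```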

## Iterate
`tf.data.Dataset` objects support iteration to loop over records:

In [None]:
print('Elements of ds_tensor:')
for x in ds_tensors:
  print(x)

print('\nElement in ds_file:')
for x in ds_file:
  print(x)

Elements of ds_tensor:
tf.Tensor([4 9], shape=(2,), dtype=int32)
tf.Tensor([16 25], shape=(2,), dtype=int32)
tf.Tensor([36  1], shape=(2,), dtype=int32)

Element in ds_file:
tf.Tensor([b'Line 1' b'  Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'  Line 3' b'    '], shape=(2,), dtype=string)
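
The file output above still carries the indentation and the trailing whitespace-only line that were written to the file. One way to clean such records inside the pipeline, sketched here with a separate temporary file and illustrative names, is to `map` with `tf.strings.strip` and `filter` out empty lines.

```python
import os
import tempfile
import tensorflow as tf

# Write the same kind of content as above, including the stray whitespace.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("Line 1\n  Line 2\n  Line 3\n    ")
    path = f.name

ds = (
    tf.data.TextLineDataset(path)
    .map(tf.strings.strip)                             # trim surrounding whitespace
    .filter(lambda line: tf.strings.length(line) > 0)  # drop empty lines
)

lines = [line.decode() for line in ds.as_numpy_iterator()]
print(lines)

os.remove(path)  # clean up the temporary file
```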
