# What is TensorFlow?

![](https://upload.wikimedia.org/wikipedia/commons/a/a4/TensorFlowLogo.png)


TensorFlow is an open source software library released in 2015 by Google to make it easier for developers to design, build, and train deep learning models. TensorFlow originated as an internal library that Google developers used to build models in-house, and we expect additional functionality to be added to the open source version as it is tested and vetted in the internal version. Although TensorFlow is only one of several options available to developers, we choose to use it here because of its thoughtful design and ease of use. We’ll briefly compare TensorFlow to alternatives in the next section.

At a high level, TensorFlow is a Python library that allows users to express arbitrary computation as a graph of data flows. Nodes in this graph represent mathematical operations, whereas edges represent data that is communicated from one node to another. Data in TensorFlow are represented as tensors, which are multidimensional arrays. Although this framework for thinking about computation is valuable in many different fields, TensorFlow is primarily used for deep learning in practice and research.
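
The graph idea described above can be illustrated without TensorFlow at all. The following is a minimal sketch in plain Python, assuming a hypothetical `Node` class that is not part of any library: each node holds an operation, its inputs play the role of edges, and nothing is computed until the graph is evaluated.

```python
import operator


class Node:
    """One operation in a tiny dataflow graph (hypothetical helper, not TensorFlow)."""

    def __init__(self, op, *inputs):
        self.op = op          # the mathematical operation this node performs
        self.inputs = inputs  # edges: upstream nodes or plain Python numbers

    def evaluate(self):
        # Pull values along the incoming edges, then apply this node's operation.
        values = [i.evaluate() if isinstance(i, Node) else i for i in self.inputs]
        return self.op(*values)


# Building the graph performs no arithmetic yet.
total = Node(operator.add, 5, 2)
scaled = Node(operator.mul, total, 3)

result = scaled.evaluate()  # the computation only happens here
print(result)
```

Separating "describe the computation" from "run the computation" is the same split that TensorFlow 1.x makes between building a graph and running it in a session.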


Two major versions of TensorFlow are in common use:

   TensorFlow 1.x   and   TensorFlow 2.x

In [None]:
##  Installation of Tensorflow 1.x

  $ conda create -n new_env python==3.6.9      #  Creating a new conda environment

  $ source activate new_env                    #  Activate the new environment (on Windows, omit 'source')

  $ pip install numpy pandas matplotlib scipy seaborn scikit-learn jupyter notebook flask

  $ pip install tensorflow==1.14.0

In [None]:
#  Importing Tensorflow and getting version

import tensorflow as tf              #  'tf' is alias here 

tf.__version__                       #  This returns version of Tensorflow

In [None]:
## Assigning a Variable in Tensorflow 1.X  and  calling  it

tf.constant(value)   ->  Creates a constant Tensor, whose value can't be modified 
                               after it is defined.

tf.placeholder(tf.dtype)  ->  Creates a placeholder of the given dtype; its value is
                                supplied at runtime.
                                   
tf.Variable(value)  ->  Creates a normal variable, whose value can be modified or updated later.
              

In [None]:
      ##########   ############   Concept of Sessions  in Tensorflow 1.x   ##############    #########
    
 =>  In Tensorflow 1.x, we need to create a session first. The session places the graph's operations
       on the available computational resources (CPU, GPU, memory) and runs them.
            
 =>  Printing a Tensorflow object by its name, outside a session, does not show its value.
      It only shows the symbolic Tensor (its name, shape and dtype).
    
      i.e.  variable = tf.constant('sudh')
            print(variable)          #  prints the Tensor description, not 'sudh'
            
 =>  To get actual values, the operations must be run inside a session with sess.run().

 =>  Code to create a session and run the code ...

         >>>  with tf.Session() as sess:
                  output = sess.run(variable)
                  print(output)
         

In [None]:
#  Python program to create and call a Tensorflow variable

import tensorflow as tf

# Create TensorFlow object called Tensor
x = tf.constant(5)                          #  This is a Constant variable            
y = tf.Variable(8)                          #  This is a Normal variable

with tf.Session() as sess:
        # Run the tf.constant operation in the session
        output = sess.run(x)
        print(output)

In [None]:
## Concepts of Tensors

Tensorflow Tensors behave much like Numpy n-d arrays, and every value that flows through a Tensorflow graph is a Tensor.
 
Patterns, pictures, images, text, speech, logos, digits, numbers and variables can all be represented as Tensors.

In [None]:
   ################    ###########   Tensors of different dimensions   ###########    #################
    
 >>>  A = tf.constant(1234)                               #  A is a 0-dimensional int32 tensor
 
 >>>  B = tf.constant([123,456,789])                      #  B is a 1-dimensional int32 tensor

 >>>  C = tf.constant([[123,456,789], [222,333,444]])     # C is a 2-dimensional int32 tensor


In [None]:
       #############    ###########   Creating a placeholder Variable    ############    ############## 
    
 =>  tf.placeholder(tf.dtype)   ->  This is used to create a Placeholder variable with the given dtype.

 =>  feed_dict = {key: value}   ->  We use feed_dict to assign values to Placeholder variables at runtime.

In [None]:
#  Creating a Placeholder variable in Tensorflow
x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'sudh'})
    print(output)

In [None]:
# Assign multiple values using this Placeholder variable.

x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    output_x = sess.run(x,feed_dict={x: 'Test String', y: 123, z: 45.67})
    output_y = sess.run(y, feed_dict={x: 'Test String', y: 123, z:45.67})
    print(output_x)
    print(output_y)

In [None]:
  ** Note - If the data passed to the `feed_dict` doesn't match the tensor type and can't be cast into
               the tensor type, you'll get an error such as `ValueError: invalid literal for...`.

## TensorFlow Mathematical operations

In [None]:
 =>  tf.add(x, y)          ->  This method adds the tensors x and y.
    
 =>  tf.subtract(x, y)     ->  This method subtracts y from x.

 =>  tf.multiply(x, y)     ->  This method multiplies the two tensors element-wise. 
    
 ** Note **  ->  These mathematical operations accept 2 numbers, 2 Tensors, or 1 number and 1 Tensor.

In [None]:
#  Performing mathematical operations over Tensors and Numbers

x = tf.add(5, 2)  
y = tf.add(tf.constant(5), tf.constant(4))

with tf.Session() as sess:
    output = sess.run(x)
    print(output)

In [None]:
       ############   #############   Casting the dtypes of Tensors    ############   ##############
    
 =>  Some mathematical operations require both operands to have the same dtype; otherwise they raise an error.   
    
 =>  tf.cast(Tensor, tf.dtype)    ->  This is used to cast the dtype of Tensor into a given dtype.

 e.g.  >>> tf.cast(tf.constant(5.0), tf.int32)
    

In [None]:
   #############    ############   Normal variables in Tensorflow    #############     ################
    
 =>  Normal variables are variables that can be modified after their creation.

 =>  tf.Variable() is used to create a Normal variable in Tensorflow.
    
 =>  init = tf.global_variables_initializer() must be created after the variables are defined
       and before they are used in a session.

 =>  Inside the session, we run this 'init' operation first.
    
    
    e.g.  x = tf.Variable(5)

          init = tf.global_variables_initializer()
          with tf.Session() as sess:
              sess.run(init)

In [None]:
#  Python code to create a Normal variable in Tensorflow

x = tf.Variable(5)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

The tf.global_variables_initializer() call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above. Using the tf.Variable class allows us to change the weights and bias, but an initial value needs to be chosen.

Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights helps keep the model from becoming stuck in the same place every time you train it. 

The tf.truncated_normal() function returns a tensor with random values from a normal distribution whose magnitude is no more than 2 standard deviations from the mean.

Since the weights are already helping prevent the model from getting stuck, you don't need to randomize the bias. Let's use the simplest solution, setting the bias to 0.



In [None]:
 tf.truncated_normal()  -> This is used to create a Tensor of normally distributed values, with none more than 2 standard deviations from the mean.
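
The 2-standard-deviation rule described above can be mimicked in plain Python. This is only a sketch of the idea (redraw any sample that lands too far from the mean), not TensorFlow's implementation; the function name and its `seed` parameter are assumptions made for the illustration.

```python
import random


def truncated_normal(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n samples, redrawing any that land more than 2 stddev from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            samples.append(value)  # keep only values inside the allowed band
    return samples


weights = truncated_normal(1000, seed=0)
print(min(weights), max(weights))
```

Every printed extreme stays inside the interval from -2 to 2, which keeps the initial weights small and avoids rare large outliers.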

In [None]:
## TensorFlow Linear Functions

The most common operation in neural networks is calculating the linear combination of inputs, weights, and biases. As a reminder, we can write the output of the linear operation as:

![](https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58a4d8b3_linear-equation/linear-equation.gif)

Here, **W** is a matrix of the weights connecting two layers. The output **y**, the input **x**, and the biases **b** are all vectors.
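
As a worked illustration of the linear operation, here is a plain-Python sketch. It assumes the row-vector convention y = xW + b, with W of shape (n_features, n_labels); the numbers are made up for the example.

```python
def linear(x, W, b):
    """Return y = xW + b for a single input vector x."""
    n_features, n_labels = len(W), len(W[0])
    assert len(x) == n_features and len(b) == n_labels
    return [sum(x[i] * W[i][j] for i in range(n_features)) + b[j]
            for j in range(n_labels)]


x = [1.0, 2.0, 3.0]     # 3 input features
W = [[0.1, 0.2],        # 3 x 2 weight matrix: 3 features, 2 labels
     [0.3, 0.4],
     [0.5, 0.6]]
b = [0.0, 0.0]          # bias initialised to zero
y = linear(x, W, b)
print(y)                # approximately [2.2, 2.8]
```

Each output element is a weighted sum of all inputs plus its own bias term, which is what a matrix multiplication followed by an addition computes for a whole layer at once.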

### Weights and Bias in TensorFlow

The goal of training a neural network is to modify weights and biases to best predict the labels. In order to use weights and bias, you'll need a Tensor that can be modified. This leaves out `tf.placeholder()` and `tf.constant()`, since those Tensors can't be modified. This is where the `tf.Variable` class comes in.



In [None]:
n_features = 120
n_labels = 5

weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
weights

In [None]:
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))

In [None]:
 =>  tf.zeros(shape=())  ->  This is used to create a Tensor with all 0's and in a given shape.
    
 =>  tf.ones(shape=())   ->  This is used to create a Tensor with all 1's and in a given shape.

## TensorFlow Softmax

The softmax function squashes its inputs, typically called **logits** or **logit scores**, to be between 0 and 1 and also normalizes the outputs such that they all sum to 1. This means the output of the softmax function is equivalent to a categorical probability distribution. It's the perfect function to use as the output activation for a network predicting multiple classes.

![](https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58950908_softmax-input-output/softmax-input-output.png)

We're using TensorFlow to build neural networks and, appropriately, there's a function for calculating softmax.

Easy as that! tf.nn.softmax() implements the softmax function for you. It takes in logits and returns softmax activations.



In [None]:
x = tf.nn.softmax([2.0, 1.0, 0.2])

In [None]:
#  Python code to calculate softmax function 

import tensorflow as tf


def run_2():
    output = None
    logit_data = [19,354,354,45,354,54]
    logits = tf.placeholder(tf.float32)
    
    # Calculate the softmax of the logits
    softmax = tf.nn.softmax(logits)   
    
    with tf.Session() as sess:
        # Feed in the logit data
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output

print(run_2())
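
To see what the softmax computes, the formula can be written out in plain Python. This sketch is not TensorFlow's implementation; it uses the common trick of subtracting the largest logit before exponentiating, which leaves the result unchanged but avoids very large intermediate values.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.2])
print([round(p, 4) for p in probs])

# The logits used in run_2(): the three equal, largest logits share the probability.
big = softmax([19, 354, 354, 45, 354, 54])
print([round(p, 4) for p in big])
```

The outputs always lie between 0 and 1 and sum to 1, and the largest logits receive the largest probabilities.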

## One-Hot Encoding

Transforming your labels into one-hot encoded vectors is pretty simple with scikit-learn using `LabelBinarizer`. Check it out below!
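
The transformation itself is simple enough to sketch in plain Python. The helper below is a hypothetical stand-in written for this illustration, not scikit-learn's `LabelBinarizer`: it sorts the distinct labels to fix a class order, then emits one vector per label with a single 1 at that label's position.

```python
def one_hot_labels(labels):
    """Map each label to a vector with a single 1 at its class position."""
    classes = sorted(set(labels))                  # fixed, sorted class order
    index = {c: i for i, c in enumerate(classes)}  # label -> column number
    encoded = [[1 if index[label] == i else 0 for i in range(len(classes))]
               for label in labels]
    return classes, encoded


classes, encoded = one_hot_labels(['cat', 'dog', 'bird', 'dog'])
print(classes)
for row in encoded:
    print(row)
```

Keeping the `classes` list alongside the encoded rows makes it possible to map a predicted column index back to its label later.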

## TensorFlow Cross Entropy

In the Intro to TFLearn lesson we discussed using cross entropy as the cost function for classification with one-hot encoded labels. Again, TensorFlow has a function to do the cross entropy calculations for us.

![](https://d17h27t6h515a5.cloudfront.net/topher/2017/February/589b18f5_cross-entropy-diagram/cross-entropy-diagram.png)

To create a cross entropy function in TensorFlow, you'll need to use two new functions:

* `tf.reduce_sum()`
* `tf.log()`

In [None]:
 =>  tf.reduce_sum(array_of_values)  ->  This returns the sum of all values of a Tensor.
    
 =>  tf.log(value)                   ->  This returns the natural logarithm of the given value.

In [None]:
#  Print the cross entropy using softmax_data and one_hot_encod_label.

import tensorflow as tf

softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

# Print cross entropy from session
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

with tf.Session() as session:
    output = session.run(cross_entropy, feed_dict={one_hot: one_hot_data, softmax: softmax_data})
    print(output)
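
As a cross-check of the calculation above, the same cross entropy can be computed by hand in plain Python. With a one-hot label only the term for the true class survives, so the result reduces to minus the natural log of the probability assigned to that class.

```python
import math


def cross_entropy(one_hot, probabilities):
    """D = -sum(y_i * ln(p_i)); terms with y_i == 0 contribute nothing and are skipped."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probabilities) if y != 0)


loss = cross_entropy([1.0, 0.0, 0.0], [0.7, 0.2, 0.1])
print(round(loss, 4))   # -ln(0.7), roughly 0.3567
```

A confident correct prediction (probability near 1) drives the loss toward 0, while a low probability on the true class makes it grow quickly.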


# TensorFlow 2.X

<img src='images/tf2.png'>

In [2]:
    #####################   ##########   Some basic things in Tensorflow 2.x   ###########   ##################    

[Tensorflow Methods Documentation](https://www.tensorflow.org/api_docs/python/tf/)

In [None]:
        ############    ###########    Creating a Constant in Tensorflow 2.x    ###########    #############
    
 >>> constant = tf.constant(42, dtype = tf.int32)    #  This is a Constant Tensor with dtype 'int32' .

 >>> variable = tf.Variable(42)                      #  This is a  Variable Tensor

In [None]:
 =>  Calling these Tensors by name shows their description (shape, dtype and value).
    
      >>> constant   ->  <tf.Tensor: shape=(), dtype=int32, numpy=42> 
            
 =>  This means that we have created a Constant Tensor with value '42' and 'dtype = int32', and that
       its value can be viewed as a Numpy array.
    
 =>  For accessing the value of this Constant Tensor, we call its numpy() method.

       i.e.,  >>> constant.numpy()  ->  42
    
 =>  constant.shape  ->  This gives the shape of Tensor.

In [None]:
##  Creating a Tensor with 2-d array 

tensor_2d = tf.constant([[4, 2], [9, 5]])
tensor_2d.numpy()

In [None]:
##  Adding operation between 2 different tensors

tensor_1 = tf.constant([[1, 2, 3], [1, 2, 3]])
tensor_2 = tf.constant([[3,4,5], [3, 4, 5]])

tensor_sum = tf.add(tensor_1, tensor_2)
tensor_sum

In [None]:
 =>  Creating a Tensor with Normally distributed data (mean 0, standard deviation 1)

      >>>  tf.random.normal(shape=(2,2), mean=0.0, stddev=1.0, dtype=tf.float32)
        
 =>  Creating a Tensor with Uniformly distributed data between 0 and 10
     
      >>>  tf.random.uniform(shape=(2,2), minval=0, maxval=10, dtype=tf.int32)


In [None]:
 ** NOTE  =>  In Tensorflow 2.x, tf.int32 is the default dtype for integers and tf.float32 is the default dtype for floats.

In [None]:
   #############   ###########   Re-assigning a Normal Variable in tensorflow 2.x   ###########   #############
    
 =>  variable.assign(value)  ->  This is used to re-assign a new value to a Variable.

 =>  assert does not assign anything; it is only used to check that the Variable holds the expected value.

In [None]:
          ###################    ################    Reshaping a Tensor    ################    ##################
    
 =>  A Tensor can be reshaped while retaining the same values, which is often required when constructing Neural networks.
    
 =>  tf.reshape(tensor, shape=[x,y])  ->  This method is used to reshape the Tensor to the given shape.

 ** Note **  ->  If one component of the shape is '-1', its size is inferred automatically from the number of elements.
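
How a '-1' dimension gets filled in can be shown with a little arithmetic. The helper below is a hypothetical plain-Python sketch of that inference rule, not TensorFlow code: the missing size is the total number of elements divided by the product of the known sizes.

```python
def infer_shape(num_elements, shape):
    """Replace a single -1 in shape with the size that makes the product match."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim  # product of all explicitly given sizes
    if -1 in shape:
        if known == 0 or num_elements % known != 0:
            raise ValueError("shape does not fit the number of elements")
        return [num_elements // known if dim == -1 else dim for dim in shape]
    if known != num_elements:
        raise ValueError("shape does not fit the number of elements")
    return list(shape)


print(infer_shape(12, [3, -1]))      # 12 elements in 3 rows
print(infer_shape(12, [-1, 2, 3]))   # 12 elements in blocks of 2 x 3
```

Only one dimension can be left open, because with two unknown sizes the answer would no longer be unique.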

In [None]:
         ##################     ###############   Rank of a Tensor    ################     ##################
    
 =>  The rank of a tensor is not the same as the rank of a matrix. 

 =>  The rank of a tensor is the number of indices required to uniquely select each element of the tensor. 
       Rank is also known as "order", "degree", or "ndims".
        
 >>>  tf.rank(tensor)  ->  This returns the Rank of Tensor.
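
The definition of rank as "number of indices needed" can be checked on ordinary nested lists. This is a plain-Python sketch with hypothetical helper names, assuming regular (non-ragged) nesting:

```python
def shape_of(value):
    """Shape of a regular (non-ragged) nested list; a bare number has shape ()."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None  # descend one nesting level
    return tuple(shape)


def rank_of(value):
    """Rank = number of indices needed to reach one element = len(shape)."""
    return len(shape_of(value))


A = 1234
B = [123, 456, 789]
C = [[123, 456, 789], [222, 333, 444]]
for t in (A, B, C):
    print(rank_of(t), shape_of(t))
```

This also shows why tensor rank differs from matrix rank: `[[1, 2], [2, 4]]` needs two indices, so its tensor rank is 2, yet its rows are multiples of each other, so its matrix rank is 1.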

In [None]:
      #############     ##############    Casting a Tensor to a Numpy Array    ##############     ################
    
   >>>  tensor.numpy()  ->  This method is used to cast a Tensor to a Numpy Array.

In [None]:
##  Some mathematical operations over Tensors

  >>> tf.square(Tensor)  ->  This method calculates the element-wise square of the tensor (use tf.sqrt for the square root).
    
  >>> tf.exp(Tensor)   ->  This method calculates exponential of tensor.


In [None]:
##  Elementwise operations over Tensors

 >>>  Tensor [+, -, *, /] Tensor  ->  This performs addition, subtraction like operations over two given Tensors.

##  Broadcasting of Tensors

 >>>  tensor * 4

In [None]:
 >>>  tf.transpose(Tensor)  ->  This gives the Transpose of given tensor.
    
 >>>  tf.matmul(Tensor_x, Tensor_y)  ->  This performs matrix multiplication over 2 given tensors.
                                           Both tensors should be compatible for Multiplication.

In [None]:
##  Casting a tensor

 >>> tf.cast(Tensor, dtype)   ->   This casts the given Tensor into specified dtype.

In [None]:
##  Ragged Tensors

 =>  Ragged tensors are Tensors whose slices can have different lengths.
    
 =>  A ragged Tensor is a nested Tensor whose inner Tensors have different sizes.
    
    e.g.  ragged = tf.ragged.constant([[9,7,4,3], [], [11, 12, 8], [3], [7, 6, 5, 4]])

In [None]:
##  Squared Differences of Tensors

 >>> tf.math.squared_difference(x,y)  ->  This method returns (x - y)^2, computed element-wise.
    

In [None]:
#  Calculation of squared differences

x = [1., 2, 3, 4, 5]; y = 8

squared_diff = tf.math.squared_difference(x,y)
squared_diff

 =>  (49, 36, 25, 16, 9)


In [None]:
## Getting reduced_mean of Tensors

 >>> tf.reduce_mean(input_tensor)  ->  This method returns the mean of all elements of the given Tensor.


In [None]:
 >>>  tf.argmax(Tensor)  ->  This returns the index of the maximum item of the Tensor.
 >>>  tf.argmin(Tensor)  ->  This returns the index of the minimum item of the Tensor.
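
The difference between an "arg" function and the plain maximum or minimum is easy to miss, so here is a small plain-Python sketch of the idea (the helpers are written for this illustration and are not the TensorFlow functions):

```python
def argmax(values):
    """Index of the largest element (in this sketch, the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def argmin(values):
    """Index of the smallest element."""
    return min(range(len(values)), key=lambda i: values[i])


scores = [0.1, 0.7, 0.2]
print(argmax(scores), max(scores))   # position of the largest value, then the value
print(argmin(scores), min(scores))   # position of the smallest value, then the value
```

This is why argmax is the usual last step after a softmax: the index it returns is the predicted class.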

# Using Data pipelines

> Data may also be passed into the fit method as a tf.data.Dataset.
> The from_tensor_slices() method converts the NumPy arrays into a dataset.
> The batch() and shuffle() methods are then chained onto it.

>Next, the map() method invokes a method on the input images, x, that randomly flips one in two of them across
the y-axis, effectively increasing the size of the image set

>Finally, the repeat() method means that the dataset will be re-fed from the beginning when its end is
reached (continuously)
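
The effect of chaining batch, shuffle and repeat can be sketched with plain Python generators. This is a simplified model with made-up helper names, not the tf.data implementation: in particular it shuffles all batches at once, whereas tf.data shuffles through a fixed-size buffer.

```python
import itertools
import random


def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def shuffle_then_repeat(batches, seed=0):
    """Yield the batches in a fresh random order, forever."""
    rng = random.Random(seed)
    while True:
        order = list(batches)
        rng.shuffle(order)
        yield from order


data = list(range(10))
batches = list(batch(data, batch_size=4))   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
stream = shuffle_then_repeat(batches)       # endless stream of batches

steps_per_epoch = len(batches)
epoch = list(itertools.islice(stream, steps_per_epoch))
print(epoch)
```

Two points follow from the sketch. Because the stream never ends, the consumer has to be told how many steps make up one epoch. And because batching happens before shuffling, it is whole batches that get reordered, not the individual elements inside them.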

In [1]:
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(train_x,train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x/255.0, test_x/255.0
epochs=10

In [3]:
batch_size = 32
buffer_size = 10000
training_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y)).batch(batch_size).shuffle(buffer_size)
training_dataset = training_dataset.map(lambda x, y: (tf.image.random_flip_left_right(x), y))
training_dataset = training_dataset.repeat()

In [5]:
testing_dataset = tf.data.Dataset.from_tensor_slices((test_x, test_y)).batch(batch_size).shuffle(10000)
testing_dataset = testing_dataset.repeat()

#### Building the model architecture

In [6]:
#Now in the fit() function, we can pass the dataset directly in, as follows:
model5 = tf.keras.models.Sequential([
 tf.keras.layers.Flatten(),
 tf.keras.layers.Dense(512,activation=tf.nn.relu),
 tf.keras.layers.Dropout(0.2),
 tf.keras.layers.Dense(10,activation=tf.nn.softmax)
])

#### Compiling the model

In [7]:
steps_per_epoch = len(train_x)//batch_size #required because of the repeat() on the dataset
optimiser = tf.keras.optimizers.Adam()
model5.compile (optimizer= optimiser, loss='sparse_categorical_crossentropy', metrics = ['accuracy'])

#### Fitting the model

In [8]:
model5.fit(training_dataset, epochs=epochs, steps_per_epoch = steps_per_epoch)

Train for 1875 steps
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x171b07b5888>

#### Evaluating the model

In [9]:
model5.evaluate(testing_dataset,steps=10)



[0.030064156164735324, 0.99375]

In [10]:
import datetime as dt
callbacks = [
  # Write TensorBoard logs to a timestamped `./log/<timestamp>/` directory
  tf.keras.callbacks.TensorBoard(log_dir='log/{}/'.format(dt.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")))
]

In [11]:
model5.fit(training_dataset, epochs=epochs, steps_per_epoch=steps_per_epoch,
          validation_data=testing_dataset,
          validation_steps=3)

Train for 1875 steps, validate for 3 steps
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x171de9e6388>

#### Evaluating

In [12]:
model5.evaluate(testing_dataset,steps=10)



[0.015026384776138001, 0.99375]

## Saving and loading Keras models

>The Keras API in TensorFlow has the ability to save and restore models easily. This is done as follows, and saves the model in the current directory. Of course, a longer path may be passed here:

#### Saving a model
    
`model.save('./model_name.h5')`

>This will save the model architecture, its weights, its training configuration (loss, optimizer), and the state of the optimizer, so that you can carry on training the model from where you left off.


>Loading a saved model is done as follows. Note that if you have compiled your model, the load will compile your model using the saved training configuration:

#### Loading a model

`from tensorflow.keras.models import load_model`  
`new_model = load_model('./model_name.h5')`

>It is also possible to save just the model weights and load them with this (in which case, you must build your architecture to load the weights into):

#### Saving the model weights only
    
    `model.save_weights('./model_weights.h5')`
    
>Then use the following to load it:

#### Loading the weights
    
    `model.load_weights('./model_weights.h5')`

# Keras datasets

>The following datasets are available from within Keras: boston_housing, cifar10, cifar100, fashion_mnist, imdb, mnist, and reuters.

>They are all accessed with the following function:

`load_data()`  

>For example, to load the fashion_mnist dataset, use the following:

`(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()`

![title](img/dataset.png)

### Using NumPy arrays with datasets

In [13]:
import tensorflow as tf
import numpy as np
number_items = 11
number_list1 = np.arange(number_items)
number_list2 = np.arange(number_items,number_items*2)

#### Create datasets, using the from_tensor_slices() method

In [14]:
number_list1_dataset = tf.data.Dataset.from_tensor_slices(number_list1)

#### Create an iterator on it using the make_one_shot_iterator() method:

In [15]:
iterator = tf.compat.v1.data.make_one_shot_iterator(number_list1_dataset)

#### Using them together, with the get_next method:

In [16]:
for item in number_list1_dataset:
    number = iterator.get_next().numpy()
    print(number)


0
1
2
3
4
5
6
7
8
9
10


>Note that executing this code twice in the same program run will raise an error because we are using a one-shot iterator

#### It's also possible to access the data in batches with the batch() method. Note that the first argument is the number of elements to put in each batch and the second is the self-explanatory drop_remainder argument:

In [18]:
number_list1_dataset = tf.data.Dataset.from_tensor_slices(number_list1).batch(3, drop_remainder = False)
iterator = tf.compat.v1.data.make_one_shot_iterator(number_list1_dataset)
for item in number_list1_dataset:
    number = iterator.get_next().numpy()
    print(number)

[0 1 2]
[3 4 5]
[6 7 8]
[ 9 10]
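
What drop_remainder changes is easiest to see side by side. The following plain-Python sketch uses a hypothetical `batched` helper (not the tf.data method) on the same eleven numbers:

```python
def batched(items, batch_size, drop_remainder=False):
    """Split items into batches; optionally discard a short final batch."""
    result = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and result and len(result[-1]) < batch_size:
        result.pop()  # the last batch is incomplete, so drop it
    return result


numbers = list(range(11))
print(batched(numbers, 3, drop_remainder=False))  # keeps the short batch [9, 10]
print(batched(numbers, 3, drop_remainder=True))   # every batch has exactly 3 items
```

Dropping the remainder loses a few elements but guarantees that every batch has the same shape, which some models and accelerators require.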


### There is also a zip method, which is useful for presenting features and labels together:

In [19]:
data_set1 = [1,2,3,4,5]
data_set2 = ['a','e','i','o','u']
data_set1 = tf.data.Dataset.from_tensor_slices(data_set1)
data_set2 = tf.data.Dataset.from_tensor_slices(data_set2)
zipped_datasets = tf.data.Dataset.zip((data_set1, data_set2))
iterator = tf.compat.v1.data.make_one_shot_iterator(zipped_datasets)
for item in zipped_datasets:
    number = iterator.get_next()
    print(number)


(<tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=string, numpy=b'a'>)
(<tf.Tensor: shape=(), dtype=int32, numpy=2>, <tf.Tensor: shape=(), dtype=string, numpy=b'e'>)
(<tf.Tensor: shape=(), dtype=int32, numpy=3>, <tf.Tensor: shape=(), dtype=string, numpy=b'i'>)
(<tf.Tensor: shape=(), dtype=int32, numpy=4>, <tf.Tensor: shape=(), dtype=string, numpy=b'o'>)
(<tf.Tensor: shape=(), dtype=int32, numpy=5>, <tf.Tensor: shape=(), dtype=string, numpy=b'u'>)


#### We can concatenate two datasets as follows, using the concatenate method:

In [21]:
datas1 = tf.data.Dataset.from_tensor_slices([1,2,3,5,7,11,13,17])
datas2 = tf.data.Dataset.from_tensor_slices([19,23,29,31,37,41])
datas3 = datas1.concatenate(datas2)
print(datas3)
iterator = tf.compat.v1.data.make_one_shot_iterator(datas3)
for i in range(14):
    number = iterator.get_next()
    print(number)


<ConcatenateDataset shapes: (), types: tf.int32>
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(5, shape=(), dtype=int32)
tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(11, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
tf.Tensor(17, shape=(), dtype=int32)
tf.Tensor(19, shape=(), dtype=int32)
tf.Tensor(23, shape=(), dtype=int32)
tf.Tensor(29, shape=(), dtype=int32)
tf.Tensor(31, shape=(), dtype=int32)
tf.Tensor(37, shape=(), dtype=int32)
tf.Tensor(41, shape=(), dtype=int32)


#### We can also do away with iterators altogether, as shown here:

In [22]:
epochs=2
for e in range(epochs):
    for item in datas3:
        print(item)


tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(5, shape=(), dtype=int32)
tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(11, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
tf.Tensor(17, shape=(), dtype=int32)
tf.Tensor(19, shape=(), dtype=int32)
tf.Tensor(23, shape=(), dtype=int32)
tf.Tensor(29, shape=(), dtype=int32)
tf.Tensor(31, shape=(), dtype=int32)
tf.Tensor(37, shape=(), dtype=int32)
tf.Tensor(41, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(5, shape=(), dtype=int32)
tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(11, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
tf.Tensor(17, shape=(), dtype=int32)
tf.Tensor(19, shape=(), dtype=int32)
tf.Tensor(23, shape=(), dtype=int32)
tf.Tensor(29, shape=(), dtype=int32)
tf.Tensor(31, shape=(), dtype=int32)
tf.Tensor(37, shape=(), dtype=int32)
tf.Tensor(4

### Using comma-separated value (CSV) files with datasets.

>CSV files are a very popular method of storing data. TensorFlow 2 contains flexible methods for dealing with them. 

>The main method here is tf.data.experimental.CsvDataset.

#### CSV Example 1

>With the following arguments, our dataset will consist of two items taken from each row of the
filename file, both of the float type, with the first line of the file ignored and columns 1 and 2 used
(column numbering is, of course, 0-based):


In [24]:
filename = ["./size_1000.csv"]
record_defaults = [tf.float32] * 2 # two required float columns
data_set = tf.data.experimental.CsvDataset(filename, record_defaults, header=True, select_cols=[1,2])
for item in data_set:
    print(item)

#### CSV example 2

In [None]:

In this example, and with the following arguments, our dataset will consist of one required float,
one optional float with a default value of 0.0, and an int, where there is no header in the CSV file and
only columns 1, 2, and 3 are imported:


In [22]:
filename = "mycsvfile.txt"
record_defaults = [tf.float32, tf.constant([0.0], dtype=tf.float32), tf.int32,]
data_set = tf.data.experimental.CsvDataset(filename, record_defaults, header=False, select_cols=[1,2,3])
for item in data_set:
    print(item)

(<tf.Tensor: shape=(), dtype=float32, numpy=428000.0>, <tf.Tensor: shape=(), dtype=float32, numpy=555.0>, <tf.Tensor: shape=(), dtype=int32, numpy=42>)
(<tf.Tensor: shape=(), dtype=float32, numpy=-5.3>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=int32, numpy=69>)
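
The roles of required fields, default values, and column selection can be modelled with Python's built-in csv module. This is a sketch under stated assumptions, not how CsvDataset is implemented: the `parse_csv` helper, the `REQUIRED` marker, and the sample file contents are all invented for the illustration.

```python
import csv
import io

REQUIRED = object()  # marker meaning "no default: the field must be present"


def parse_csv(text, converters, defaults, select_cols, header=False):
    """Yield tuples of the selected columns, filling empty optional fields."""
    rows = csv.reader(io.StringIO(text))
    if header:
        next(rows, None)  # skip the first line
    for row in rows:
        record = []
        for col, convert, default in zip(select_cols, converters, defaults):
            field = row[col].strip()
            if field == "":
                if default is REQUIRED:
                    raise ValueError(f"column {col} is required but empty")
                record.append(default)  # optional field: fall back to its default
            else:
                record.append(convert(field))
        yield tuple(record)


sample = "a,428000.0,555.0,42\nb,-5.3,,69\n"   # hypothetical file contents
records = list(parse_csv(sample, [float, float, int],
                         [REQUIRED, 0.0, REQUIRED], select_cols=[1, 2, 3]))
print(records)
```

In the second row the optional float is empty, so it is replaced by its default of 0.0, while an empty required field would raise an error instead.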


#### CSV example 3

In [24]:
#For our final example, our dataset will consist of two required floats and a required string, where the
#CSV file has a header variable:
filename = "file1.txt"
record_defaults = [tf.float32, tf.float32, tf.string ,]
dataset = tf.data.experimental.CsvDataset(filename, record_defaults, header=False)
for item in dataset:
    print(item[0].numpy(), item[1].numpy(),item[2].numpy().decode() )


12.6 23.4  Abc.co.uk
98.7 56.8  Xyz.com
34.2 68.1  Pqr.net


## TFRecords

>TFRecord format is a binary file format. For large files, it is a good choice because binary files take up less disc space, take less time to copy, and can be read very efficiently from the disc. All this can have a significant effect on the efficiency of your data pipeline and thus, the training time of your model. The format is also optimized in a
variety of ways for use with TensorFlow. It is a little complex because data has to be converted into
the binary format prior to storage and decoded when read back.
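
The encode, store, read back, decode cycle can be illustrated with the standard library alone. The sketch below is a deliberately simplified model and not the real TFRecord layout (real files also carry checksums and hold serialized protocol buffer messages): each record is just a length prefix followed by raw bytes, and the helper names are assumptions.

```python
import io
import struct


def write_record(stream, payload):
    """Write one record: an 8-byte little-endian length, then the raw bytes."""
    stream.write(struct.pack("<Q", len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield each payload back by reading its length prefix first."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of stream
        (length,) = struct.unpack("<Q", header)
        yield stream.read(length)


values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
encoded = struct.pack("<6f", *values)      # encode the floats as 32-bit binary

buffer = io.BytesIO()                      # in-memory stand-in for a file on disc
write_record(buffer, encoded)
buffer.seek(0)

decoded = [list(struct.unpack("<6f", payload)) for payload in read_records(buffer)]
print(decoded)
```

The reader has to know the structure ("six 32-bit floats") to make sense of the bytes, which is the same reason a parse function with a matching feature description is needed when reading TFRecord files.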

#### TFRecord example 1


>A TFRecord file is a sequence of binary strings. Its structure must be specified prior to
saving so that it can be properly written and subsequently read back.

>TensorFlow has two structures for this, 

`tf.train.Example` and `tf.train.SequenceExample`.

>We have to store each sample of our data in one of these structures, then serialize it, and use `tf.io.TFRecordWriter` to save it to disk.

>In the next example, 
the data is first converted to the binary format and then saved to disc.

>A feature is a dictionary containing the data that is passed to tf.train.Example prior to serialization and saving the data.

In [25]:
import tensorflow as tf
import numpy as np
data = np.array([10.,11.,12.,13.,14.,15.])
def npy_to_tfrecords(fname,data):
    writer = tf.io.TFRecordWriter(fname)
    feature={}
    feature['data'] = tf.train.Feature(float_list=tf.train.FloatList(value=data))
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    serialized = example.SerializeToString()
    writer.write(serialized)
    writer.close()
npy_to_tfrecords("./myfile.tfrecords",data)

>The code to read the record back is as follows. 

>A parse_function function is constructed that decodes the dataset read back from the file. This requires a dictionary (keys_to_features) with the same name and structure as the saved data:

In [26]:
data_set = tf.data.TFRecordDataset("./myfile.tfrecords")
def parse_function(example_proto):
    keys_to_features = {'data':tf.io.FixedLenSequenceFeature([], dtype = tf.float32, allow_missing = True) }
    parsed_features = tf.io.parse_single_example(serialized=example_proto, features=keys_to_features)
    return parsed_features['data']
data_set = data_set.map(parse_function)
iterator = tf.compat.v1.data.make_one_shot_iterator(data_set)
# array is retrieved as one item
item = iterator.get_next()
print(item)
print(item.numpy())
print(item[2].numpy())

tf.Tensor([10. 11. 12. 13. 14. 15.], shape=(6,), dtype=float32)
[10. 11. 12. 13. 14. 15.]
12.0


#### TFRecord example 2

In [25]:
filename = './students.tfrecords'
dataset = {
'ID': 61553,
'Name': ['Jones', 'Felicity'],
'Scores': [45.6, 97.2]}

>Using this, we can construct a tf.train.Example class, again using the `Feature()` method. Note how we have to encode our string:


In [27]:
ID = tf.train.Feature(int64_list=tf.train.Int64List(value=[dataset['ID']]))
Name = tf.train.Feature(bytes_list=tf.train.BytesList(value=[n.encode('utf-8') for n in dataset['Name']]))
Scores = tf.train.Feature(float_list=tf.train.FloatList(value=dataset['Scores']))
example = tf.train.Example(features=tf.train.Features(feature={'ID': ID, 'Name': Name, 'Scores': Scores }))

#### Serializing and writing this record to disc is the same as TFRecord example 1:

In [29]:
writer_rec = tf.io.TFRecordWriter(filename)
writer_rec.write(example.SerializeToString())
writer_rec.close()

#### To read this back, we just need to construct our parse_function function to reflect the structure of the record:

In [38]:
data_set = tf.data.TFRecordDataset("./students.tfrecords")
def parse_function(example_proto):
    keys_to_features = {'ID':tf.io.FixedLenFeature([], dtype = tf.int64),
    'Name':tf.io.VarLenFeature(dtype = tf.string),
    'Scores':tf.io.VarLenFeature(dtype = tf.float32)}
    parsed_features = tf.io.parse_single_example(serialized=example_proto, features=keys_to_features)
    return parsed_features["ID"], parsed_features["Name"],parsed_features["Scores"]

#### Parsing the data

In [40]:
dataset = data_set.map(parse_function)
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
items = iterator.get_next()

### Record is retrieved as one item

In [41]:
print(items)

(<tf.Tensor: shape=(), dtype=int64, numpy=61553>, <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000171A13D64C8>, <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x00000171AC2E0BC8>)


>Now we can extract our data from items (note that the string must be decoded from bytes, where the default encoding in Python 3 is utf8). Note also that the string and
the array of floats are returned as sparse arrays, and to extract them from the record, we use the sparse array values attribute:

In [42]:
print("ID: ",items[0].numpy())
name = items[1].values.numpy()
name1= name[0].decode()
name2 = name[1].decode('utf8')
print("Name:",name1,",",name2)
print("Scores: ",items[2].values.numpy())

ID:  61553
Name: Jones , Felicity
Scores:  [45.6 97.2]


### One-hot Encoding

>One-hot encoding (OHE) is where a tensor is constructed from the data labels with a 1 in each of
the elements corresponding to a label's value, and 0 everywhere else; that is, one of the bits in the
tensor is hot (1).

#### One-hot Encoding Example 1

>In this example, we are converting a decimal value of 7 to a one-hot encoded value of 0000000100 using
the `tf.one_hot()` method:

In [43]:
z = 7
z_train_ohe = tf.one_hot(z, depth=10).numpy()
print(z, "is ",z_train_ohe,"when one-hot encoded with a depth of 10")

7 is  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.] when one-hot encoded with a depth of 10


#### One-hot Encoding Example 2

>Using the fashion MNIST dataset.

>The original labels are integers from 0 to 9, so, for example, a label of 5 becomes 0000010000 when one-hot encoded, but note the difference between the index and the label stored at that index:

In [66]:
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

width, height, = 28,28
# total classes
n_classes = 10

#### loading the dataset

In [None]:
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

#### Split the training labels into training and validation sets

In [None]:
split = 50000
(y_train, y_valid) = y_train[:split], y_train[split:]

#### one-hot encode the labels using TensorFlow then convert back to numpy for display

In [68]:
y_train_ohe = tf.one_hot(y_train, depth=n_classes).numpy()
y_valid_ohe = tf.one_hot(y_valid, depth=n_classes).numpy()
y_test_ohe = tf.one_hot(y_test, depth=n_classes).numpy()

# show difference between the original label and a one-hot-encoded label
i=8
print(y_train[i]) # 'ordinary' number value of label at index i=8 is 5
# note the difference between the index of 8 and the label at that index which is 5
print(y_train_ohe[i]) 

5
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
