<a href="https://colab.research.google.com/github/jay05Hawk/TensorFlow_all/blob/main/Tensorflow_1_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Data pipelines

> Data may also be passed into the fit() method as a tf.data.Dataset.
> The from_tensor_slices() method converts the NumPy arrays into a dataset.
> The batch() and shuffle() methods are chained onto it to shuffle the examples and group them into batches.

> Next, the map() method applies a function to the input images, x, that randomly flips each one left to right (about the vertical axis) with probability one half, effectively increasing the variety of the image set.

> Finally, the repeat() method means that the dataset is re-fed from the beginning when its end is reached, so it never runs out.
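As a minimal sketch of the chain described above, the following uses a tiny synthetic array in place of MNIST. The names `images`, `labels`, and `toy_dataset` are illustrative and not part of the notebook, and the images are given an explicit channel axis because `tf.image.random_flip_left_right` expects one.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for an image set: ten 4x4 single-channel "images" with integer labels
images = np.random.rand(10, 4, 4, 1).astype("float32")
labels = np.arange(10)

toy_dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(buffer_size=10)  # shuffle individual examples
    .batch(4)                 # group them into batches of up to 4
    .map(lambda x, y: (tf.image.random_flip_left_right(x), y))  # random horizontal flip
    .repeat()                 # start again from the beginning when exhausted
)

# repeat() makes the dataset endless, so take() is used to bound the loop
batch_shapes = [tuple(x.shape) for x, _ in toy_dataset.take(4)]
print(batch_shapes)
```

The third batch is smaller because ten examples do not divide evenly by four, and the fourth batch already belongs to the next pass over the data.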

In [None]:
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(train_x,train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x/255.0, test_x/255.0
epochs=10

In [None]:
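batch_size = 32
buffer_size = 10000
# Shuffle individual examples first, then group them into batches
training_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y)).shuffle(buffer_size).batch(batch_size)
# random_flip_left_right expects a channel axis, so add one for the flip and drop it again afterwards
training_dataset = training_dataset.map(lambda x, y: (tf.image.random_flip_left_right(x[..., tf.newaxis])[..., 0], y))
training_dataset = training_dataset.repeat()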

In [None]:
testing_dataset = tf.data.Dataset.from_tensor_slices((test_x, test_y)).batch(batch_size).shuffle(buffer_size)
# Repeat the test set itself (not the training set) so that evaluation uses held-out data
testing_dataset = testing_dataset.repeat()

#### Building the model architecture

In [None]:
# Define the model; once it is compiled, the dataset can be passed directly to fit()
model5 = tf.keras.models.Sequential([
 tf.keras.layers.Flatten(),
 tf.keras.layers.Dense(512,activation=tf.nn.relu),
 tf.keras.layers.Dropout(0.2),
 tf.keras.layers.Dense(10,activation=tf.nn.softmax)
])

#### Compiling the model

In [None]:
steps_per_epoch = len(train_x) // batch_size  # required because of the repeat() on the dataset
optimiser = tf.keras.optimizers.Adam()
model5.compile(optimizer=optimiser, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
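Because repeat() makes the dataset endless, Keras cannot tell where an epoch ends, which is why steps_per_epoch is computed by hand. The arithmetic can be sketched as follows, assuming the standard MNIST training split of 60,000 images (the variable names are illustrative):

```python
num_train_examples = 60000  # size of the MNIST training split
batch_size = 32

# Number of full batches that make up one pass over the data
steps_per_epoch = num_train_examples // batch_size
# Examples left over after the last full batch
leftover = num_train_examples % batch_size

print(steps_per_epoch, leftover)  # 1875 0
```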

#### Fitting the model

In [None]:
model5.fit(training_dataset, epochs=epochs, steps_per_epoch = steps_per_epoch)

#### Evaluating the model

In [None]:
model5.evaluate(testing_dataset,steps=10)

In [None]:
import datetime as dt
callbacks = [
  # Write TensorBoard logs to a timestamped subdirectory of `./log`
  tf.keras.callbacks.TensorBoard(log_dir='log/{}/'.format(dt.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")))
]

In [None]:
model5.fit(training_dataset, epochs=epochs, steps_per_epoch=steps_per_epoch,
          validation_data=testing_dataset,
          validation_steps=3,
          callbacks=callbacks)  # pass the TensorBoard callback defined above

#### Evaluating

In [None]:
model5.evaluate(testing_dataset,steps=10)

## Saving and loading Keras models

>The Keras API in TensorFlow has the ability to save and restore models easily. This is done as follows, and saves the model in the current directory. Of course, a longer path may be passed here:

#### Saving a model
    
`model.save('./model_name.h5')`

>This will save the model architecture, its weights, its training configuration (loss, optimizer), and the state of the optimizer, so that you can carry on training the model from where you left off.


>Loading a saved model is done as follows. Note that if you have compiled your model, the load will compile your model using the saved training configuration:

#### Loading a model

`from tensorflow.keras.models import load_model`  
`new_model = load_model('./model_name.h5')`

>It is also possible to save just the model weights and load them with this (in which case, you must build your architecture to load the weights into):

#### Saving the model weights only
    
`model.save_weights('./model_weights.h5')`

>Then use the following to load them:

#### Loading the weights

`model.load_weights('./model_weights.h5')`
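A weights-only round trip might look like the sketch below. The helper `build_model()` and the file name are hypothetical, and a temporary directory is used so nothing is left on disk. The `.weights.h5` suffix is chosen on the assumption that newer Keras releases require it while older ones accept any `.h5` name.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    # The architecture must be rebuilt in code before weights can be loaded into it
    return tf.keras.models.Sequential([
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(2),
    ])


x = np.random.rand(3, 5).astype("float32")

model = build_model()
before = model(x).numpy()  # calling the model creates its variables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.weights.h5")
    model.save_weights(path)

    clone = build_model()
    clone(x)  # create the variables so there is something to load into
    clone.load_weights(path)

after = clone(x).numpy()
print(np.allclose(before, after))
```

The clone starts with different random weights, so matching predictions indicate that the saved values were restored.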

# Keras datasets

>The following datasets are available from within Keras: boston_housing, cifar10, cifar100, fashion_mnist, imdb, mnist, and reuters.

>They are all accessed with the function:

`load_data()`  

>For example, to load the fashion_mnist dataset, use the following:

`(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()`

### Using NumPy arrays with datasets

In [None]:
import tensorflow as tf
import numpy as np
number_items = 11
number_list1 = np.arange(number_items)
number_list2 = np.arange(number_items,number_items*2)

#### Create datasets, using the from_tensor_slices() method

In [None]:
number_list1_dataset = tf.data.Dataset.from_tensor_slices(number_list1)

#### Create an iterator on it using the make_one_shot_iterator() method:

In [None]:
iterator = tf.compat.v1.data.make_one_shot_iterator(number_list1_dataset)

#### Using them together, with the get_next method:

In [None]:
for item in number_list1_dataset:
    number = iterator.get_next().numpy()
    print(number)


>Note that executing this cell twice without recreating the iterator raises an error (tf.errors.OutOfRangeError), because a one-shot iterator is exhausted after a single pass over the data.
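The exhaustion behaviour can be seen in isolation with a short sketch (the three-element dataset and the variable names are illustrative):

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([0, 1, 2])
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)

# Three calls consume the three elements
values = [int(iterator.get_next().numpy()) for _ in range(3)]
print(values)

# A fourth call finds nothing left to return
try:
    iterator.get_next()
    exhausted = False
except tf.errors.OutOfRangeError:
    exhausted = True
print("iterator exhausted:", exhausted)
```

Creating a fresh iterator from the same dataset is enough to read the data again.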


#### It's also possible to access the data in batches with the batch() method. The first argument is the number of elements to put in each batch, and the optional drop_remainder argument controls whether a final, smaller batch is kept or discarded:

In [None]:
number_list1_dataset = tf.data.Dataset.from_tensor_slices(number_list1).batch(3, drop_remainder = False)
iterator = tf.compat.v1.data.make_one_shot_iterator(number_list1_dataset)
for item in number_list1_dataset:
    number = iterator.get_next().numpy()
    print(number)
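To make the effect of drop_remainder concrete, here is a small comparison on the same eleven numbers, iterating the datasets directly (the names `keep` and `drop` are illustrative):

```python
import numpy as np
import tensorflow as tf

numbers = np.arange(11)

keep = tf.data.Dataset.from_tensor_slices(numbers).batch(3, drop_remainder=False)
drop = tf.data.Dataset.from_tensor_slices(numbers).batch(3, drop_remainder=True)

keep_batches = [batch.numpy().tolist() for batch in keep]
drop_batches = [batch.numpy().tolist() for batch in drop]

print(keep_batches)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
print(drop_batches)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Dropping the remainder gives every batch the same shape, at the cost of discarding the last two elements here.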

#### There is also a zip method, which is useful for presenting features and labels together:

In [None]:
data_set1 = [1,2,3,4,5]
data_set2 = ['a','e','i','o','u']
data_set1 = tf.data.Dataset.from_tensor_slices(data_set1)
data_set2 = tf.data.Dataset.from_tensor_slices(data_set2)
zipped_datasets = tf.data.Dataset.zip((data_set1, data_set2))
iterator = tf.compat.v1.data.make_one_shot_iterator(zipped_datasets)
for item in zipped_datasets:
    number = iterator.get_next()
    print(number)
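A sketch of the feature and label use case, with made-up values, shows that each zipped element unpacks into a pair and that zipping stops at the end of the shorter dataset:

```python
import tensorflow as tf

# Illustrative features (two values per example) and integer labels
features = tf.data.Dataset.from_tensor_slices([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
labels = tf.data.Dataset.from_tensor_slices([0, 1, 0])

pairs = tf.data.Dataset.zip((features, labels))
collected = [(x.numpy().tolist(), int(y.numpy())) for x, y in pairs]
print(collected)

# Zipping with a shorter dataset yields only as many elements as the shorter one has
short_pairs = tf.data.Dataset.zip((features, labels.take(2)))
short_count = sum(1 for _ in short_pairs)
print(short_count)
```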


#### We can concatenate two datasets as follows, using the concatenate method:

In [None]:
datas1 = tf.data.Dataset.from_tensor_slices([1,2,3,5,7,11,13,17])
datas2 = tf.data.Dataset.from_tensor_slices([19,23,29,31,37,41])
datas3 = datas1.concatenate(datas2)
print(datas3)
iterator = tf.compat.v1.data.make_one_shot_iterator(datas3)
for i in range(14):
    number = iterator.get_next()
    print(number)


#### We can also do away with iterators altogether, as shown here:

In [None]:
epochs=2
for e in range(epochs):
    for item in datas3:
        print(item)


### Using comma-separated value (CSV) files with datasets

>CSV files are a very popular method of storing data. TensorFlow 2 contains flexible methods for dealing with them. 

>The main class here is tf.data.experimental.CsvDataset.

#### CSV Example 1


>With the following arguments, our dataset will consist of two items taken from each row of the
filename file, both of the float type, with the first line of the file ignored and columns 1 and 2 used
(column numbering is, of course, 0-based):


In [None]:
filename = ["./size_1000.csv"]
record_defaults = [tf.float32] * 2 # two required float columns
data_set = tf.data.experimental.CsvDataset(filename, record_defaults, header=True, select_cols=[1,2])
for item in data_set:
    print(item)
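Since size_1000.csv is not included here, the same arguments can be tried on a throwaway file. This sketch writes a three-column CSV with a header row into a temporary directory (the file name, column names, and values are invented) and reads back columns 1 and 2:

```python
import os
import tempfile

import tensorflow as tf

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w") as f:
        f.write("id,height,weight\n")  # header row, skipped by header=True
        f.write("1,1.5,60.0\n")
        f.write("2,1.75,82.5\n")

    record_defaults = [tf.float32] * 2  # two required float columns
    dataset = tf.data.experimental.CsvDataset(
        [path], record_defaults, header=True, select_cols=[1, 2])

    # Each element is a tuple with one scalar tensor per selected column
    rows = [(float(a.numpy()), float(b.numpy())) for a, b in dataset]

print(rows)
```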

#### CSV Example 2

In [None]:
filename = "mycsvfile.txt"
# Column 1: required float; column 2: optional float, defaulting to 0.0; column 3: required int
record_defaults = [tf.float32, tf.constant([0.0], dtype=tf.float32), tf.int32]
data_set = tf.data.experimental.CsvDataset(filename, record_defaults, header=False, select_cols=[1,2,3])
for item in data_set:
    print(item)
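The difference between a required column and one with a default shows up when a field is empty. In this sketch, which uses an invented file in a temporary directory, the second row leaves column 2 blank, so the default of 0.0 is substituted:

```python
import os
import tempfile

import tensorflow as tf

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as f:
        f.write("a,1.0,2.5,3\n")
        f.write("b,4.0,,6\n")  # column 2 is empty on this row

    record_defaults = [
        tf.float32,                            # required float
        tf.constant([0.0], dtype=tf.float32),  # optional float, default 0.0
        tf.int32,                              # required int
    ]
    dataset = tf.data.experimental.CsvDataset(
        path, record_defaults, header=False, select_cols=[1, 2, 3])

    rows = [(float(a.numpy()), float(b.numpy()), int(c.numpy()))
            for a, b, c in dataset]

print(rows)
```

An empty field in one of the required columns would raise an error instead of being filled in.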

#### CSV Example 3

In [None]:
# For our final example, our dataset will consist of two required floats and a required string.
# header=False means the first line is parsed as data; pass header=True if the file starts with a header row.
filename = "file1.txt"
record_defaults = [tf.float32, tf.float32, tf.string]
dataset = tf.data.experimental.CsvDataset(filename, record_defaults, header=False)
for item in dataset:
    print(item[0].numpy(), item[1].numpy(),item[2].numpy().decode() )
