# 5-Flower Dataset

`Machine Perception`
- Asking a machine learning model to learn to perceive what is in an image

`Computer Vision`
- A type of machine perception that is analogous to human sight


[flower-dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers)
- `'Daisy', 'Roses', 'Dandelions', 'Sunflowers' & 'Tulips'`



**This dataset should be treated as an example, not as a template, for the following reasons:**

1. Quantity
  - To train ML models from scratch, you'll need to collect millions of images.
  - There are alternative approaches that work with fewer images, but you should attempt to collect the largest dataset that is practical.
2. Data Format
  - Storing the images as individual JPEG files is very inefficient because most of your model training time will be spent waiting for data to be read.
  - It is better to use the TensorFlow Record (TFRecord) format.
3. Content
  - The dataset itself consists of found data: images that were not explicitly collected for the classification task.
  - It is better to collect data more purposefully.
4. Labelling
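
As a rough illustration of the data-format point (item 2), the sketch below writes one record to a TFRecord file and reads it back. The 8x8 black image, the feature names `image` and `label`, and the temporary file path are all assumptions made for this example; they are not taken from the flowers dataset.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Synthetic 8x8 black image standing in for a real photo (assumption).
pixels = np.zeros((8, 8, 3), dtype=np.uint8)
jpeg_bytes = tf.io.encode_jpeg(pixels).numpy()

# Pack the JPEG bytes and a label into one tf.train.Example.
# The feature names "image" and "label" are hypothetical.
example = tf.train.Example(features=tf.train.Features(feature={
    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
    "label": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"daisy"])),
}))

# Write the serialized example to a temporary TFRecord file.
path = os.path.join(tempfile.mkdtemp(), "flowers.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Read the record back and parse it with a matching feature spec.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.string),
}
for record in tf.data.TFRecordDataset(path):
    parsed = tf.io.parse_single_example(record, feature_spec)
    label = parsed["label"].numpy().decode("utf-8")
    print(label)
```

In a real pipeline many examples would be written per file, so that training reads a few large files sequentially rather than many small ones.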


## Confirming if GPU is being used

In [2]:
# import tensorflow as tf
# print(tf.version.VERSION)
# device_name = tf.test.gpu_device_name()

# if device_name != '/device:GPU:0':
#   raise SystemError('GPU device not found')
# print('Found GPU at: {}'.format(device_name))
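
The commented-out check above raises an error when no GPU is present. A gentler variant, sketched here under the assumption of TensorFlow 2.x, lists the visible GPUs with `tf.config.list_physical_devices` and simply reports what it finds.

```python
import tensorflow as tf

print("TensorFlow version:", tf.version.VERSION)

# List the GPUs that TensorFlow can see; an empty list means CPU only.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print("Found GPU(s):", [gpu.name for gpu in gpus])
else:
    print("No GPU found; TensorFlow will run on the CPU.")
```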

## Examining the Images

In [3]:
!gsutil cat gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/train_set.csv | head -5

'gsutil' is not recognized as an internal or external command,
operable program or batch file.


## Reading Image Data

1. `img = tf.io.read_file(filename)` reads the file's raw bytes; at this point the image is still compressed JPEG data.
2. `img = tf.image.decode_jpeg(img, channels=IMG_CHANNELS)` converts the compressed bytes into pixel data (this step is the decoding), with `channels` specifying the number of color channels (red, green, blue) to produce from the JPEG.
3. `img = tf.image.convert_image_dtype(img, tf.float32)` converts the decoded RGB pixel values, which are of type `uint8` in the range [0, 255], into floats scaled to lie in the range [0, 1].
4. `tf.image.resize(img, reshape_dims)` resizes the image to the desired shape.
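
For `uint8` input, the rescaling in step 3 amounts to dividing each pixel value by 255. The NumPy sketch below mimics that arithmetic on a few made-up pixel values; it illustrates the scaling only and is not how TensorFlow implements it internally.

```python
import numpy as np

# A few made-up uint8 pixel values covering the full [0, 255] range.
pixels_uint8 = np.array([0, 64, 128, 255], dtype=np.uint8)

# Cast to float32 and rescale so the values lie in [0, 1].
pixels_float = pixels_uint8.astype(np.float32) / 255.0
print(pixels_float)
```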



In [4]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

IMG_HEIGHT = 224
IMG_WIDTH  = 224
IMG_CHANNELS = 3

In [5]:
def read_and_decode(filename, reshape_dims):

  # Read the raw, still-compressed bytes from the file
  img = tf.io.read_file(filename)

  # Decode the compressed string into a 3D uint8 tensor, specifying the number
  # of color channels (red, green, blue) to produce from the JPEG
  img = tf.image.decode_jpeg(img, channels=IMG_CHANNELS)

  # Use `convert_image_dtype` to convert to floats in the [0, 1] range
  img = tf.image.convert_image_dtype(img, tf.float32)

  # Resize the image to the desired size
  return tf.image.resize(img, reshape_dims) 
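
Because the `gs://` paths in this notebook may not be reachable from every environment, the function can also be tried on a local file. The sketch below is one way to do that, assuming a synthetic random image and a temporary file path (both made up for illustration); it restates `read_and_decode` so the block runs on its own.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS = 224, 224, 3

def read_and_decode(filename, reshape_dims):
    img = tf.io.read_file(filename)                         # raw JPEG bytes
    img = tf.image.decode_jpeg(img, channels=IMG_CHANNELS)  # uint8 in [0, 255]
    img = tf.image.convert_image_dtype(img, tf.float32)     # float32 in [0, 1]
    return tf.image.resize(img, reshape_dims)

# Synthetic 100x150 RGB image standing in for a real flower photo (assumption).
rng = np.random.default_rng(seed=0)
pixels = rng.integers(0, 256, size=(100, 150, 3), dtype=np.uint8)

# Hypothetical local path; any writable location would do.
path = os.path.join(tempfile.mkdtemp(), "synthetic.jpg")
tf.io.write_file(path, tf.io.encode_jpeg(pixels))

img = read_and_decode(path, [IMG_HEIGHT, IMG_WIDTH])
print(img.shape, img.dtype)
```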

In [6]:
CLASS_NAMES = [item.numpy().decode('utf-8')
              for item in tf.strings.regex_replace(tf.io.gfile.glob("gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/*"),
                 "gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/", "")]

# for item in CLASS_NAMES:
#   if item.find(".") == -1:
#     print(item)

CLASS_NAMES = [item for item in CLASS_NAMES if item.find(".") == -1]
print("These are the available classes: ", CLASS_NAMES)

UnimplementedError: File system scheme 'gs' not implemented (file: 'gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/*')

## What's a Tensor

1D array -> vector

2D array -> matrix

*tensor* -> an array with any number of dimensions 

A matrix with 12 rows and 18 columns is said to have a *shape* of (12, 18) and a *rank* of 2

`x = np.array([2.0, 3.0, 1.0, 0.0])`

`x5d = np.zeros(shape=(4, 3, 7, 8, 3))`
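
To make shape and rank concrete, the short NumPy sketch below inspects the arrays mentioned above. In NumPy the rank is exposed as the `ndim` attribute; the variable name `matrix` is introduced here only for illustration.

```python
import numpy as np

x = np.array([2.0, 3.0, 1.0, 0.0])       # vector
matrix = np.zeros(shape=(12, 18))        # 12 rows, 18 columns
x5d = np.zeros(shape=(4, 3, 7, 8, 3))    # tensor with five dimensions

for name, arr in [("x", x), ("matrix", matrix), ("x5d", x5d)]:
    # shape lists the size of each dimension; ndim is the rank
    print(name, "shape:", arr.shape, "rank:", arr.ndim)
```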

To obtain hardware acceleration, convert the array to a TensorFlow tensor:

`tx = tf.convert_to_tensor(x, dtype=tf.float32)`

Convert back to a NumPy array with `x = tx.numpy()`
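
The round trip between NumPy and TensorFlow can be sketched as follows. The name `x_back` is introduced here for illustration so the original array is left untouched.

```python
import numpy as np
import tensorflow as tf

x = np.array([2.0, 3.0, 1.0, 0.0])

# NumPy array -> TensorFlow tensor (cast to float32 on the way)
tx = tf.convert_to_tensor(x, dtype=tf.float32)

# Arithmetic on the tensor is carried out by TensorFlow
tx = tx * 0.3

# TensorFlow tensor -> NumPy array
x_back = tx.numpy()
print(type(x_back).__name__, x_back.dtype)
```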

Mathematically, NumPy and TensorFlow operations are the same; they differ in where the work runs
  - NumPy arithmetic is done on the CPU
  - TensorFlow code can run on a GPU when one is available

`x = x * 0.3` is therefore typically less efficient than `tx = tx * 0.3` when a GPU is available

**It is more efficient to vectorize the code so that you can carry out a single in-place tensor operation instead of a bunch of tiny scalar operations**
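
The difference between many scalar operations and one vectorized operation can be sketched in NumPy. The array size is arbitrary, and no timings are claimed here; the point is only that both approaches compute the same result while the vectorized form is a single call.

```python
import numpy as np

x = np.arange(10_000, dtype=np.float32)

# Scalar approach: one tiny multiplication per element.
scaled_loop = np.empty_like(x)
for i in range(x.shape[0]):
    scaled_loop[i] = x[i] * 0.3

# Vectorized approach: one operation over the whole array.
scaled_vec = x * 0.3

print(np.allclose(scaled_loop, scaled_vec))
```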

## Visualizing Image Data

In [None]:
def show_image(filename):
  img = read_and_decode(filename, [IMG_HEIGHT, IMG_WIDTH])
  plt.imshow(img.numpy())

show_image("gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/daisy/754296579_30a9ae018c_n.jpg")

In [None]:
tulips = tf.io.gfile.glob("gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/tulips/*.jpg")
f, ax = plt.subplots(1, 5, figsize=(15, 15))

for idx, filename in enumerate(tulips[:5]):
  print(filename)

  img = read_and_decode(filename, [IMG_HEIGHT, IMG_WIDTH])
  ax[idx].imshow(img.numpy())
  ax[idx].axis('off')

In [None]:
tf.strings.split(tf.strings.regex_replace("gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/tulips/10094731133_94a942463c.jpg",
    "gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/", ""),'/')[0]

In [None]:
f, ax = plt.subplots(1, 5, figsize=(15, 15))

for idx, filename in enumerate([
  "gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/daisy/754296579_30a9ae018c_n.jpg",
  "gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/dandelion/3554992110_81d8c9b0bd_m.jpg",
  "gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/roses/7420699022_60fa574524_m.jpg",
  "gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/sunflowers/21518663809_3d69f5b995_n.jpg",
  "gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/tulips/8713398906_28e59a225a_n.jpg"]):

  img = read_and_decode(filename, [IMG_HEIGHT, IMG_WIDTH])

  ax[idx].imshow((img.numpy()))
  ax[idx].set_title(CLASS_NAMES[idx])
  ax[idx].axis('off')



## Reading the Dataset File

All the image files can be matched using a wildcard pattern:

`tf.io.gfile.glob("gs://cloud-ml-data/img/flower_photos/*/*.jpg")`

In [None]:
basename = tf.strings.regex_replace(filename, 
                                    "gs://cloud-ml-data/img/flower_photos/", "")
label = tf.strings.split(basename, '/')[0]
print(label)

In [None]:
def decode_csv(csv_row):
  # record_defaults specifies what TensorFlow needs to substitute in order to
  # handle a line where one or more values are missing
  record_defaults = ['path', 'flower']
  filename, label_string = tf.io.decode_csv(csv_row, record_defaults)
  img = read_and_decode(filename, [IMG_HEIGHT, IMG_WIDTH])
  return img, label_string
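
To see what `record_defaults` does in isolation, the sketch below calls `tf.io.decode_csv` on two hand-written rows without reading any image. The bucket name and file paths are hypothetical and exist only for this illustration.

```python
import tensorflow as tf

# Each default fixes the column type (string) and the fallback value
# used when that field is empty.
record_defaults = ["path", "flower"]

# Hypothetical row in "path,label" form.
csv_row = "gs://some-bucket/flower_photos/daisy/example.jpg,daisy"
filename, label = tf.io.decode_csv(csv_row, record_defaults)
print(filename.numpy().decode("utf-8"))
print(label.numpy().decode("utf-8"))

# A row whose first field is empty falls back to the default "path".
missing_path, other_label = tf.io.decode_csv(",tulips", record_defaults)
print(missing_path.numpy().decode("utf-8"))
```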

In [None]:
dataset = (tf.data.TextLineDataset(
    "gs://practical-ml-vision-book/flowers_5_jpeg/flower_photos/train_set.csv").
    map(decode_csv))

`take(3)` truncates the dataset to three items

Print out the average pixel value of each image using `tf.reduce_mean()`


label -> string tensor

```
tf.Tensor(b'daisy', shape=(), dtype=string) 
tf.Tensor([0.3588961  0.36257887 0.26933077], 
          shape=(3,), dtype=float32)
```

avg -> 1D tensor of length 3
  - we get a 1D tensor because of `axis=[0, 1]`

The image tensor has shape `[IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS]`. By providing `axis=[0, 1]`, we are asking TensorFlow to average over the height (axis=0) and width (axis=1) dimensions, but not over the RGB values, so one average remains per color channel
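
The effect of the `axis` argument can be checked on a tiny hand-made array. This sketch uses NumPy's `mean` with `axis=(0, 1)`, which reduces over the same dimensions as `tf.reduce_mean(img, axis=[0, 1])`; the 2x2 "image" and its values are made up for illustration.

```python
import numpy as np

# Tiny 2x2 "image" with 3 channels: shape is [height, width, channels].
img = np.array([[[0.0, 0.2, 0.4], [0.2, 0.4, 0.6]],
                [[0.4, 0.6, 0.8], [0.6, 0.8, 1.0]]], dtype=np.float32)

per_channel = img.mean(axis=(0, 1))  # average over height and width only
overall = img.mean()                 # average over every value

print(per_channel.shape)  # one value per color channel
print(overall.shape)      # a single scalar
```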

In [None]:
for img, label in dataset.take(3):
  avg = tf.math.reduce_mean(img, axis=[0, 1]) # Average pixel in the image
  print(label, avg)

## A Linear Model Using Keras
