# Flower Classification

In this project I will be participating in the [Petals to the Metal](https://www.kaggle.com/c/tpu-getting-started/overview) competition, a "getting started" competition on [Kaggle](https://www.kaggle.com). The objective is to classify images of different flowers.

## Setup
### Imports

In [12]:
import tensorflow as tf
AUTO = tf.data.experimental.AUTOTUNE
IMAGE_SIZE = [512, 512]

### Filenames

In [3]:
path = r'.\tfrecords-jpeg-512x512'
TRAINING_FILENAMES = tf.io.gfile.glob(path + '/train/*.tfrec')
VALIDATION_FILENAMES = tf.io.gfile.glob(path + '/val/*.tfrec')
TEST_FILENAMES = tf.io.gfile.glob(path + '/test/*.tfrec')

### Helper Functions
These functions can be found in the `petal_helper.py` file. I am re-creating them here to understand what they do. They are primarily used to prepare the data for use by the TPU.

In [33]:
def decode_image(image_data):
    '''Turn image data into an array of numbers
    Args:
        image_data: jpeg image extracted from a tfrecord file
    Returns:
        float32 tensor of shape [*IMAGE_SIZE, 3] with values in [0, 1]
    '''
    image = tf.image.decode_jpeg(image_data, channels = 3)
    image = tf.cast(image, tf.float32) / 255.0 # convert image to floats in [0, 1] range
    image = tf.reshape(image, [*IMAGE_SIZE , 3]) # explicit size needed for TPU
    return image
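
To check my understanding of `decode_image`, here is a minimal sketch that runs it on a synthetic image. The random pixels and the names `fake_pixels` and `fake_jpeg` are my own stand-ins, not competition data; the function body is repeated so the snippet runs on its own.

```python
import tensorflow as tf

IMAGE_SIZE = [512, 512]  # same value as in the Setup section

def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float32) / 255.0  # scale to [0, 1]
    image = tf.reshape(image, [*IMAGE_SIZE, 3])  # fix the static shape
    return image

# Hypothetical stand-in for a flower photo: random pixels encoded as JPEG bytes.
fake_pixels = tf.random.uniform([*IMAGE_SIZE, 3], maxval=256, dtype=tf.int32)
fake_jpeg = tf.io.encode_jpeg(tf.cast(fake_pixels, tf.uint8))

decoded = decode_image(fake_jpeg)
print(decoded.shape, decoded.dtype)
print(float(tf.reduce_min(decoded)), float(tf.reduce_max(decoded)))
```

The printed shape should match `[*IMAGE_SIZE, 3]`, and the minimum and maximum should fall inside the [0, 1] range.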

In [37]:
def read_labeled_tfrecord(example):
    LABELED_TFREC_FORMAT = {
        'image': tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        'class': tf.io.FixedLenFeature([], tf.int64) # shape [] means a single element
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = tf.cast(example['class'], tf.int32)
    return image, label

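def read_unlabeled_tfrecord(example):
    UNLABELED_TFREC_FORMAT = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'id': tf.io.FixedLenFeature([], tf.string)
        # no class, because this will be used on the test dataset
    }
    example = tf.io.parse_single_example(example, UNLABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    idnum = example['id'] # keep the id so predictions can be matched to test images
    return image, idnum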
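
To see what the feature dictionaries do, here is a small sketch that builds one record by hand and parses it back with the labeled format. The image bytes and the class id `7` are made up for illustration, so the image is not decoded here.

```python
import tensorflow as tf

LABELED_TFREC_FORMAT = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'class': tf.io.FixedLenFeature([], tf.int64),
}

# Hypothetical record: placeholder bytes instead of a real JPEG, made-up class id.
record = tf.train.Example(features=tf.train.Features(feature={
    'image': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b'not-a-real-jpeg'])),
    'class': tf.train.Feature(
        int64_list=tf.train.Int64List(value=[7])),
}))
serialized = record.SerializeToString()

# Parsing returns a dict of tensors keyed by the feature names.
parsed = tf.io.parse_single_example(serialized, LABELED_TFREC_FORMAT)
label = tf.cast(parsed['class'], tf.int32)
print(parsed['image'].dtype, parsed['class'].dtype, label.numpy())
```

Each `FixedLenFeature([], ...)` entry comes back as a scalar tensor, which is why the label can be cast directly without indexing.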

In [38]:
def load_dataset(filenames, labeled = True, ordered = False):
    # Read from tfrecords. For optimal performance, read from multiple files at once and
    # disregard data order. Order does not matter since we will be shuffling the data anyway.
    # It's generally a good idea to split the data into 16 parts when working with a TPU.
    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False # disable ordering; use whichever record is loaded first
    
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads = AUTO) 
    dataset = dataset.with_options(ignore_order)
    dataset = dataset.map(read_labeled_tfrecord if labeled else read_unlabeled_tfrecord,
                          num_parallel_calls = AUTO)
    # returns a dataset of (image, label) pairs if labeled = True or (image, id) if not labeled
    return dataset


In [39]:
load_dataset(TRAINING_FILENAMES)

<ParallelMapDataset shapes: ((512, 512, 3), ()), types: (tf.float32, tf.int32)>
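
The dataset summary above can be reproduced without the competition files. The sketch below writes three made-up records to a temporary file (the file name `toy.tfrec`, the black images, and the labels are my own assumptions) and runs them through the same read, parse, and map steps. The `hasattr` check is there because I am assuming newer TensorFlow releases prefer `deterministic` over `experimental_deterministic`.

```python
import os
import tempfile
import tensorflow as tf

AUTO = tf.data.experimental.AUTOTUNE
IMAGE_SIZE = [512, 512]
FORMAT = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'class': tf.io.FixedLenFeature([], tf.int64),
}

def parse(example):
    example = tf.io.parse_single_example(example, FORMAT)
    image = tf.image.decode_jpeg(example['image'], channels=3)
    image = tf.reshape(tf.cast(image, tf.float32) / 255.0, [*IMAGE_SIZE, 3])
    return image, tf.cast(example['class'], tf.int32)

# Write three hypothetical records (all-black images) to a temporary file.
jpeg = tf.io.encode_jpeg(tf.zeros([*IMAGE_SIZE, 3], tf.uint8)).numpy()
tmp_path = os.path.join(tempfile.mkdtemp(), 'toy.tfrec')
with tf.io.TFRecordWriter(tmp_path) as writer:
    for class_id in [0, 1, 2]:
        features = tf.train.Features(feature={
            'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg])),
            'class': tf.train.Feature(int64_list=tf.train.Int64List(value=[class_id])),
        })
        writer.write(tf.train.Example(features=features).SerializeToString())

# Allow records to arrive in any order, as load_dataset does when ordered = False.
options = tf.data.Options()
if hasattr(options, 'deterministic'):
    options.deterministic = False
else:
    options.experimental_deterministic = False

toy = tf.data.TFRecordDataset([tmp_path], num_parallel_reads=AUTO)
toy = toy.with_options(options).map(parse, num_parallel_calls=AUTO)

# Sort the labels because the read order is not guaranteed.
labels = sorted(int(label) for _, label in toy)
print(toy.element_spec)
print(labels)
```

The element spec should show the same image shape and label type as the summary for the training files, which suggests the pipeline depends only on the record format and not on the particular files.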