<a href="https://colab.research.google.com/github/mvenouziou/Project-Attention-Is-What-You-Get/blob/main/bms_molecular_translation_AttentionIsWhatYouGet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Attention is What You Get

This is my entry into the [Bristol-Myers Squibb Molecular Translation](https://www.kaggle.com/c/bms-molecular-translation)  Kaggle competition.

-----

AUTHOR: 

Mo Venouziou

- *Email: mvenouziou@gmail.com*
- *LinkedIn: www.linkedin.com/in/movenouziou/*

*June 2, 2021*

----



### Our Goal: Predict the "InChI" value of any given chemical compound diagram. 

International Chemical Identifiers ("InChI values") are a standardized encoding for describing chemical compounds. They take the form of a string of letters, numbers and delimiters, often 100 to 400 characters long.
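
To make the format concrete, here is a small sketch that splits one InChI string into its slash-separated layers. The helper name `split_inchi_layers` is made up for illustration and is not part of this notebook's pipeline. The example string is the standard InChI for ethanol, which is far shorter than a typical competition label.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into its prefix, formula and lettered layers."""
    prefix, _, body = inchi.partition('/')
    formula, *other_layers = body.split('/')
    # Layers after the formula begin with one identifying letter,
    # e.g. 'c' (atom connections) or 'h' (hydrogen positions).
    # Simplification: a repeated layer letter would overwrite the earlier entry.
    lettered = {layer[0]: layer[1:] for layer in other_layers if layer}
    return {'prefix': prefix, 'formula': formula, 'layers': lettered}


ethanol = 'InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3'
parts = split_inchi_layers(ethanol)
print(parts['prefix'], parts['formula'], parts['layers'])
```

Every label in this competition starts with the same `InChI=1S/` prefix, which is why it can double as the start-of-sentence token later on.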

The chemical diagrams are provided as PNG files, often of such low quality that it may take a human several seconds to decipher. 

Label length and image quality become a serious challenge here because we must predict labels for a very large number of images: there are 1.6 million images in the test set and 2.4 million images available in the training set!

In [None]:
"""
# Example (image, target label) pair
for val in train_ds.unbatch().take(1):
    print('Example Label:\n', val['InChI'].numpy())
    print('\nCorresponding Image:', plt.imshow(val['image'][:,:,0], cmap='binary'))
### note: load datasets before running this cell
"""

## MODEL STRUCTURE: 

**Image CNN + Attention Features encoder --> text Attention + CNN feature layer decoder.**

This is a hybrid approach with:
 
 - Image Encoder from [*Show, Attend and Tell: Neural Image Caption Generation with Visual Attention*](https://proceedings.mlr.press/v37/xuc15.pdf).  Generate image feature vectors using intermediate layer outputs from a pretrained CNN. (Here I use the more modern EfficientNet model with fixed weights and add a trainable Dense layer for customization.)
 
 - T2T encoder-decoder model from [*Attention Is All You Need*](https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf): self-attention feature extraction for both encoder and decoder, joint encoder-decoder attention feature interactions, and an (optional) dense prediction output block.

 - ***PLUS*** *(optional):* Decoder Output Blocks placed in Series (not stacked). This increases the number of trainable parameters without adding computational complexity at inference, while also allowing decoders to specialize on different regions of the output. (Note: Training is a bit trickier. My experiments show it is best to first train the decoders with shared weights, then allow them to vary later in training.)
 
 - ***PLUS*** *(optional):* Is attention really all you need? Add a convolutional layer to enhance text features before decoder self-attention, so that performance can be compared with and without the extra convolutional layer(s). The use of CNNs in NLP comes from [*Convolutional Sequence to Sequence Learning*](http://proceedings.mlr.press/v70/gehring17a.html).

 - ***PLUS*** *(optional):* Beam-Search Alternative, an extra decoding layer applied after the full logits prediction has been made. This takes the form of a bidirectional RNN applied to the full logits sequence. Because a full (initial) prediction has already been made, computations can be parallelized using stateful RNNs. (See more details below.)

*Optional features can be enabled/disabled using parameters in my model definitions.*
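
For readers new to the attention mechanism named in the list above, here is a minimal pure-Python sketch of scaled dot-product attention as defined in *Attention Is All You Need*. It is an illustration only, not the implementation used in this notebook: the function names and toy vectors are assumptions, and a real model would use batched tensor operations and multiple heads instead of Python lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key and value vectors."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights.append(softmax(scores))
    # each output is the attention-weighted average of the value vectors
    outputs = [[sum(w * v[j] for w, v in zip(row, values))
                for j in range(len(values[0]))]
               for row in weights]
    return outputs, weights


# toy example: 2 queries attending over 3 key/value pairs
outputs, weights = scaled_dot_product_attention(
    queries=[[1.0, 0.0], [0.0, 1.0]],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(weights)
```

In a joint encoder-decoder attention step of this kind, the queries would come from the text decoder and the keys and values from the image feature vectors.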

----

## NEXT STEPS:

 - Implement **TPU training**. (Currently runs on GPU. Note that a CPU alone is not enough to achieve acceptable inference speed.)

 - Experiment with **"Tokens-to-Token ViT"** in place of the image CNN. (Technique from [*Training Vision Transformers from Scratch on ImageNet*](https://arxiv.org/pdf/2101.11986.pdf).)
  
 - Train my **Beam-search Alternative**. 

    - Beam search is a technique to modify model predictions to reflect the (local) maximum likelihood estimate. However, it is *very* local in that computational expense increases quickly with the number of character steps taken into account. It is also a hard-coded algorithm, which is somewhat contrary to the philosophy of deep learning.

    - A *Beam-search Alternative* would be an extra decoding layer applied *after* the full logits prediction has been made. This might be in the form of a stateful, bidirectional RNN that is computationally parallelizable because it is applied to the full logits sequence.

    - This is coded and ready to train, although I have not yet had the time to do so.

 - Treat the number of convolutional layers (decoder feature extraction) and the number of decoders placed in series (decoder prediction output) as **new hyperparameters** to tune.
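
For reference, the standard beam search that the proposed alternative would replace can be sketched in a few lines of pure Python. This is a generic textbook version, not code from this notebook: `beam_search`, `toy_model` and the toy probabilities are all hypothetical, chosen so that the greedy choice (beam width 1) misses the most likely sequence.

```python
import math


def beam_search(step_log_probs, beam_width, max_len, eos_token):
    """Keep the `beam_width` best prefixes at every step.

    `step_log_probs(prefix)` must return a dict {token: log-probability}.
    Returns the best (sequence, total log-probability) pair found.
    """
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        # expand every surviving prefix by every possible next token
        candidates = [(prefix + (token,), score + logp)
                      for prefix, score in beams
                      for token, logp in step_log_probs(prefix).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == eos_token:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # prefixes cut off at max_len
    return max(finished, key=lambda c: c[1])


def toy_model(prefix):
    """Hypothetical next-token distribution over a three-step sequence."""
    if len(prefix) == 0:
        return {'a': math.log(0.6), 'b': math.log(0.4)}
    if len(prefix) == 1:
        if prefix[0] == 'a':
            return {'x': math.log(0.55), 'y': math.log(0.45)}
        return {'x': math.log(0.9), 'y': math.log(0.1)}
    return {'<EOS>': 0.0}


greedy = beam_search(toy_model, beam_width=1, max_len=3, eos_token='<EOS>')
wider = beam_search(toy_model, beam_width=2, max_len=3, eos_token='<EOS>')
print(greedy[0], wider[0])
```

Every step scores `beam_width` times the vocabulary size candidates, and the steps must run one after another, which is the cost that motivates looking for a learned, parallelizable replacement.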

-------------




### CITATIONS

- "Attention is All You Need." 
 - Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. NIPS (2017). *https://research.google/pubs/pub46201/*

- "Convolutional Sequence to Sequence Learning."
 
  - Gehring, J., Auli, M., Grangier, D., Yarats, D. & Dauphin, Y.N. (2017). Convolutional Sequence to Sequence Learning. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:1243-1252. Available from *http://proceedings.mlr.press/v70/gehring17a.html*.


-  "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention."
  -  Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R. & Bengio, Y. (2015). Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2048-2057. Available from *http://proceedings.mlr.press/v37/xuc15.html*.
            

- "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"

  - Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan. Preprint (2021). Available at *https://arxiv.org/abs/2101.11986*.

- Special thanks to [Darien Schettler](https://www.kaggle.com/dschettler8845/bms-efficientnetv2-tpu-e2e-pipeline-in-3hrs/notebook.) for leading readers to the "Show" and "Attention" papers cited above, and sharing his progress at various stages in the competition through public Kaggle notebooks. My work includes a few of his hyperparameter choices and selection of EfficientNet transfer model. I did not read or use his implementations of the papers above.

- It is possible my idea of a Beam Search Alternative is based on a lecture video from DeepLearning.ai's [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning)  on Coursera.

- **Dataset / Kaggle Competition:** "Bristol-Myers Squibb – Molecular Translation" competition on Kaggle (2021). *https://www.kaggle.com/c/bms-molecular-translation*

----


## Contents

1. [Imports](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=TjuUOVXao__C&line=4&uniqifier=1)
2. [Data Pipeline](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=lrLHKs5Ni7Sz)
3. [Model Layers](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=W0T-u0vZamI8)
    - [InChI Encoding](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=DYApmA2lf1hp&line=1&uniqifier=1)
    - [Image Encoding and Self-Attention](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=FESofcGdEaWF&line=1&uniqifier=1)
    - [Decoder Self-Attention](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=6qFDs9RTjvod&line=1&uniqifier=1)
    - [Joint Encoder-Decoder Attention](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=jP-t1MkKnD5L)
    - [Decoder Head (Prediction Output)](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=38GA7wtNEhqW&line=1&uniqifier=1)
    - [Update Mechanism](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=_2UR1DLljD0S&line=1&uniqifier=1)
4. [Full Model](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=D6GIs3f3rpu0&line=1&uniqifier=1)
5. [Training](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=otxdN02mf1ht&line=1&uniqifier=1)
6. [Inference](https://colab.research.google.com/drive/1i6LMwu7BRfs955U4AdtV2oaI_9_A_Awq#scrollTo=Sbvzr5rdmjgs&line=5&uniqifier=1)

---

In [None]:
#### PACKAGE IMPORTS ####

# TF Model design
import tensorflow as tf
from tensorflow import keras
from tensorflow.data import TFRecordDataset
from tensorflow.data.experimental import TFRecordWriter
#!pip install -q tensorflow_addons
#import tensorflow_addons as tfa

# Text processing
import re
import string

# Kaggle (for TPU)
#from kaggle_datasets import KaggleDatasets

# Visualizations
import matplotlib.pyplot as plt
%matplotlib inline
from PIL import Image
!pip install -U tensorboard_plugin_profile
%load_ext tensorboard

# data management
import numpy as np
import pandas as pd
import itertools

# file management
import os



## Model parameters

The 'ModelParameters' class manages global hyperparameters for portability between Colab and Kaggle notebook environments. Once set, all other cells will run on either platform.

On Colab, connection to my personal Google Drive is required, as ModelParameters will extract the dataset from a zip file to the hosted environment. This process may take several minutes. (It would not be difficult for the reader to update the code to point to their own drive and download the zip dataset using the Kaggle API code below.)

In [None]:
""" Kaggle API commands for downloading the compressed dataset from Kaggle's servers.

# imports
!pip uninstall -y kaggle
!pip install --upgrade pip
!pip install kaggle==1.5.6

# if needed, download data using '!kaggle competitions download -c bms-molecular-translation'
# then unzip with '! unzip bms-molecular-translation.zip -d datasets'
os.environ['KAGGLE_CONFIG_DIR'] = '/content/gdrive/MyDrive/Kaggle'  # api token location
"""

In [None]:
class ModelParameters:
    def __init__(self, cloud_server='kaggle'):
               
        # universal parameters
        self._cloud_server = cloud_server  # stored so that cloud_server() below has a value to return
        self._batch_size = 8  # adjust based on system RAM
        self._inference_batch_size = 256
        self._image_size = (224, 224)  # shape to process images in data pipeline
        self.SOS_string = 'InChI=1S/'  # start of sentence value
        self.EOS_string = '<EOS>'  # end of sentence value
        self._strategy = tf.distribute.get_strategy() # default distribution strategy in Tensorflow. May be overwritten below

        # File Paths
        
        if cloud_server == 'colab':  # Google Colab with GDrive (CPU / GPU)
            
            from google.colab import drive
            drive.mount('/content/gdrive/') 

            # unzip data
            if not os.path.isdir('/content/bms-molecular-translation'):
                !unzip -q /content/gdrive/MyDrive/Colab_Notebooks/models/MolecularTranslation/bms-molecular-translation.zip -d '/content/bms-molecular-translation'
            
            self._dataset_dir = 'bms-molecular-translation/'
            self._labels_dir = self._dataset_dir
            self._prepared_files_dir = '/content/gdrive/MyDrive/Colab_Notebooks/models/MolecularTranslation/'
            self._checkpoint_dir = '/content/gdrive/MyDrive/Colab_Notebooks/models/MolecularTranslation/checkpoints/'
            self._load_checkpoint_dir = self._checkpoint_dir
            self._csv_save_dir = self._prepared_files_dir

        elif cloud_server == 'colab_TPU':
            """ NOTE: not yet implemented """
            resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
            tf.config.experimental_connect_to_cluster(resolver)
            tf.tpu.experimental.initialize_tpu_system(resolver)
            self._strategy = tf.distribute.TPUStrategy(resolver)    

            # update batch size
            self._batch_size = 16 * self._strategy.num_replicas_in_sync
            self._inference_batch_size = 16 * self._strategy.num_replicas_in_sync

            """
            # add file system structure here
            """
                
        elif cloud_server == 'kaggle': # Kaggle cloud notebook (CPU / GPU)
            
            self._dataset_dir = '../input/bms-molecular-translation/'
            self._labels_dir = self._dataset_dir
            self._prepared_files_dir = '../input/periodic-table/'
            self._checkpoint_dir = './'
            self._load_checkpoint_dir = '../input/k/mvenou/bms-molecular-translation/checkpoints/'
            self._csv_save_dir = './'
        
        elif cloud_server == 'kaggle_TPU': # Enables Kaggle TPU 
            # Detect hardware, return appropriate distribution strategy
            
            tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection. No parameters necessary if TPU_NAME environment variable is set. On Kaggle this is always the case.
            print('Running on TPU ', tpu.master())

            tf.config.experimental_connect_to_cluster(tpu)
            tf.tpu.experimental.initialize_tpu_system(tpu)
            self._strategy = tf.distribute.experimental.TPUStrategy(tpu)

            print("REPLICAS: ", self._strategy.num_replicas_in_sync)
            
            # update batch size
            self._batch_size = 16 * self._strategy.num_replicas_in_sync
            self._inference_batch_size = 16 * self._strategy.num_replicas_in_sync

            # Data Path
            # (requires `from kaggle_datasets import KaggleDatasets`, which is commented out in the imports cell)
            self._dataset_dir = KaggleDatasets().get_gcs_path('bms-molecular-translation')
            self._labels_dir = KaggleDatasets().get_gcs_path('bms-molecular-translation')
            self._prepared_files_dir = KaggleDatasets().get_gcs_path('periodic-table')
            self._checkpoint_dir = './'
            self._load_checkpoint_dir = './'
            self._csv_save_dir = './'
            
        # common file paths
        self._periodic_table_csv = os.path.join(self._prepared_files_dir, 'periodic_table_elements.csv')
        self._vocab_csv = os.path.join(self._prepared_files_dir, 'vocab.csv')
        self._processed_labels_valid_csv = os.path.join(self._prepared_files_dir, 'processed_labels_valid.csv')
        self._processed_labels_train_csv = os.path.join(self._prepared_files_dir, 'processed_labels_train.csv')
            
        self._test_images_dir = os.path.join(self._dataset_dir, 'test/')
        self._train_images_dir = os.path.join(self._dataset_dir, 'train/')
        self._extra_labels_csv = os.path.join(self._labels_dir, 'extra_approved_InChIs.csv')
        self._train_labels_csv = os.path.join(self._labels_dir, 'train_labels.csv')
        self._sample_submission_csv = os.path.join(self._labels_dir, 'sample_submission.csv')
        
    # functions to access params
    def cloud_server(self):
        return self._cloud_server
    def strategy(self):
        return self._strategy
    def csv_save_dir(self):
        return self._csv_save_dir
    def processed_labels_train_csv(self):
        return self._processed_labels_train_csv
    def processed_labels_valid_csv(self):
        return self._processed_labels_valid_csv
    def train_labels_csv(self):
        return self._train_labels_csv
    def vocab_csv(self):
        return self._vocab_csv
    def periodic_table_csv(self):
        return self._periodic_table_csv
    def batch_size(self):
        return self._batch_size  
    def inference_batch_size(self):
        return self._inference_batch_size
    def image_size(self):
        return self._image_size    
    def SOS(self):
        return self.SOS_string
    def EOS(self):
        return self.EOS_string
    def dataset_dir(self):
        return self._dataset_dir
    def labels_dir(self):
        return self._labels_dir
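    def temp_dataset_dir(self):
        return self._temp_dataset_dir  # note: never assigned in __init__, so calling this raises AttributeError
    def images_dir(self):
        return self._images_dir  # note: never assigned in __init__, so calling this raises AttributeError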
    def train_images_dir(self):
        return self._train_images_dir
    def test_images_dir(self):
        return self._test_images_dir   
    def checkpoint_dir(self):
        return self._checkpoint_dir
    def load_checkpoint_dir(self):
        return self._load_checkpoint_dir
    def elements_csv_path(self):
        return self._periodic_table_csv

Initialize Parameter Options

In [None]:
PARAMETERS = ModelParameters(cloud_server='colab')



# **Input Pipeline**

Load train labels as DataFrame

In [None]:
# Load CSV as dataframe
train_labels_df = pd.read_csv(PARAMETERS.train_labels_csv())

In [None]:
train_labels_df.head()

| | image_id | InChI |
|---|---|---|
| 0 | 000011a64c74 | InChI=1S/C13H20OS/c1-9(2)8-15-13-6-5-10(3)7-12... |
| 1 | 000019cc0cd2 | InChI=1S/C21H30O4/c1-12(22)25-14-6-8-20(2)13(1... |
| 2 | 0000252b6d2b | InChI=1S/C24H23N5O4/c1-14-13-15(7-8-17(14)28-1... |
| 3 | 000026b49b7e | InChI=1S/C17H24N2O4S/c1-12(20)18-13(14-7-6-10-... |
| 4 | 000026fc6c36 | InChI=1S/C10H19N3O2S/c1-15-10(14)12-8-4-6-13(7... |


### InChI Text Parsing

We split each InChI label into its "vocabulary" of logical subunits, consisting of element abbreviations, numbers, common symbols and the required string 'InChI=1S/', which starts every InChI label. We want to narrow this vocabulary down to the smallest set represented in our training data. The functions below provide a system for finding this minimal set, as well as preparing a new CSV file with parsed labels ready to be fed into a tokenizer layer.

(For clarity and to reduce reliance on loading external files, the true code has been commented out and replaced with corresponding hard-coded values.)
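
One detail worth spelling out is that a regex alternation tries its alternatives from left to right, so multi-character tokens must come before their single-character prefixes. The sketch below shows the effect on a made-up InChI-style string (not a real compound). The helper `tokenize`, its `longest_first` flag and the shortened vocabulary are assumptions for illustration, not the notebook's own functions.

```python
import re


def tokenize(text, vocab, longest_first=True):
    """Split `text` into vocabulary tokens using a regex alternation."""
    if longest_first:
        # sorted() is stable, so tokens of equal length keep their order
        vocab = sorted(vocab, key=len, reverse=True)
    pattern = '|'.join(re.escape(token) for token in vocab)
    return re.findall(pattern, text)


toy_vocab = (['InChI=1S/', '(', ')', ',', '-', '/', 'B', 'Br', 'C', 'H', 'c', 'h']
             + [str(n) for n in range(20)])
sample = 'InChI=1S/C12H5Br/c1-12'  # made-up string, for illustration only

good = tokenize(sample, toy_vocab)
naive = tokenize(sample, toy_vocab, longest_first=False)
print(' '.join(good))
print(' '.join(naive))
```

With the naive ordering, 'Br' is read as 'B' (the 'r' is silently dropped) and '12' as '1' followed by '2'. This is the reason for listing 'Br' before 'B' and counting the numbers downward in a hard-coded vocabulary.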

In [None]:
def inchi_parsing_regex(parameters=PARAMETERS):
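    # regex for splitting an InChI string while preserving chemical element
    # abbreviations and numbers of up to three digits
    
    # shortcut: hard coded values
    # (order matters in a regex alternation: 'Br' must precede 'B', 'Cl' precede 'C', 'Si' precede 'S')
    vocab = [parameters.EOS(), parameters.SOS(), '(',
            ')', '+', ',', '-', '/', 'Br', 'B', 'Cl', 'C', 'D', 'F',
            'H', 'I', 'N', 'O', 'P', 'Si', 'S', 'T', 'b', 'c', 'h', 'i',
            'm', 's', 't']
        
    # numbers in descending order, so that longer numbers are tried before their shorter prefixes
    vocab += [str(num) for num in reversed(range(168))]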
    vocab = [re.escape(val) for val in vocab]
       
    """ # to create vocab from scratch, use:
    SOS = parameters.SOS()
    EOS = parameters.EOS()
    
    # load list of elements we should search for within InChI strings: 
    periodic_elements = pd.read_csv(PARAMETERS.periodic_table_csv(), header=None)[1].to_list()
    periodic_elements = periodic_elements + [val.lower() for val in periodic_elements] + [val.upper() for val in periodic_elements]
    
    punctuation = list(string.punctuation)
    punctuation = [re.escape(val) for val in punctuation]   # update values with regex escape chars added as needed

    three_dig_nums_list = [str(i) for i in range(1000, -1, -1)]

    vocab = [SOS, EOS] + periodic_elements + three_dig_nums_list + punctuation
    """

    split_elements_regex = rf"({'|'.join(vocab)})"
    
    return split_elements_regex

In [None]:
INCHI_PARSING_REGEX = inchi_parsing_regex()

def parse_InChI(texts, parsing_regex=INCHI_PARSING_REGEX):  
    return ' '.join(re.findall(parsing_regex, texts))

## Datasets

Here we create efficient tf.data.Dataset train / validation / test sets.

Our data pipeline reads our prepared CSV of (image filename, parsed InChI, standard InChI) tuples. (If this file is not found, it is created from scratch, which may take several minutes.) Iterating through the list, it loads batches of corresponding images and labels.

Our datasets contain the following information, accessible by dict keys: image, image_id, InChI, parsed_InChI. (The test set uses InChI = parsed_InChI = 'InChI=1S/', the known required starting value for any InChI code.)
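
Three small conventions from the `data_generator` code in this section are easier to see in isolation: images sit in folders nested by the first three characters of their id, the first 5% of label rows are held out for validation, and five end-of-sentence tokens are appended to each parsed training label. The pure-Python sketch below mirrors those steps. The helper names are hypothetical, and `posixpath` stands in for `os.path` so the output is the same on every platform.

```python
import posixpath


def image_relative_path(image_id):
    """Folder layout: <1st char>/<2nd char>/<3rd char>/<image_id>.png"""
    return posixpath.join(image_id[0], image_id[1], image_id[2],
                          image_id + '.png')


def split_row_ranges(num_rows, valid_fraction=0.05):
    """Return (train_rows, valid_rows); validation takes the leading rows."""
    num_valid = int(valid_fraction * num_rows)
    return range(num_valid, num_rows), range(num_valid)


def append_eos(parsed_inchi, eos='<EOS>', copies=5):
    """Append `copies` end-of-sentence tokens, separated by spaces."""
    return ' '.join([parsed_inchi] + [eos] * copies)


train_rows, valid_rows = split_row_ranges(1000)
print(image_relative_path('000011a64c74'))
print(len(train_rows), len(valid_rows))
print(append_eos('InChI=1S/ C 2 H 6 O'))
```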

In [None]:
def data_generator(image_set, labels_dataframe=None, parameters=PARAMETERS):
    
    # image parameters
    load_image_size = (350, 350)  # should be at least (224, 224), the EfficientNet model input size
    
    # get universal params
    batch_size = parameters.batch_size()
    inference_batch_size = parameters.inference_batch_size()
    target_size = parameters.image_size()
    SOS = parameters.SOS()
    EOS = parameters.EOS()
    
    # dataset options
    prefetch_size = tf.data.AUTOTUNE 
    inference_prefetch_size = tf.data.AUTOTUNE    
    options = tf.data.Options()
    options.experimental_optimization.autotune_buffers = True
    options.experimental_optimization.map_vectorization.enabled = True
    options.experimental_optimization.apply_default_optimizations = True
            
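    def preprocess_image(image):
        # layers
        # (note: `scale` is defined but not applied below; Keras EfficientNet models
        # include their own rescaling and expect pixel values in [0, 255])
        scale = keras.layers.experimental.preprocessing.Rescaling(1./255)
        resize = keras.layers.experimental.preprocessing.Resizing(
            height=target_size[0], width=target_size[1], interpolation='bicubic')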
        
        # preprocessing steps
        image = tf.cast(image, tf.float32)
        image = resize(image)
        
        return image
        
    # Train & Validation Datasets
    if image_set in ['train', 'valid']:
        DIRECTORY = os.path.join(parameters.dataset_dir(), 'train/')
        
        # try loading processed CSV
        try:
            if image_set == 'train':
                assert os.path.isfile(parameters.processed_labels_train_csv()) or parameters.cloud_server()=='kaggle_TPU'
                processed_labels_csv = parameters.processed_labels_train_csv()
                
            elif image_set == 'valid':
                assert os.path.isfile(parameters.processed_labels_valid_csv()) or parameters.cloud_server()=='kaggle_TPU'
                processed_labels_csv = parameters.processed_labels_valid_csv()
                
        # otherwise process data from scratch
        except Exception:
            valid_split = .05
            num_valid_samples = int(valid_split * len(labels_dataframe))

            if image_set == 'train':
                dataframe = labels_dataframe.iloc[num_valid_samples: ]  # get train split
            elif image_set == 'valid':
                dataframe = labels_dataframe.iloc[: num_valid_samples]  # get validation split

            # shuffle
            dataframe = dataframe.sample(frac=1)

            # add image path info
            dataframe['image_path'] =  dataframe['image_id'].apply(lambda x: os.path.join(x[0], x[1], x[2], x + '.png'))  # update image path info

            # prepare InChI labels for model       
            dataframe['parsed_InChI'] =  dataframe['InChI'].apply(lambda x: parse_InChI(x))

            # save as CSV
            dataframe = dataframe[['image_path', 'image_id', 'InChI', 'parsed_InChI']]  # set column order
            processed_labels_csv = ''.join([parameters.csv_save_dir(), 'processed_labels_', image_set, '.csv'])
            dataframe.to_csv(processed_labels_csv, index=False)

        # reload as dataset
        dataset = tf.data.experimental.make_csv_dataset(
                    file_pattern=processed_labels_csv, 
                    batch_size=1, 
                    column_names=['image_path', 'image_id', 'InChI', 'parsed_InChI'], 
                    column_defaults=[tf.string, tf.string, tf.string, tf.string],
                    label_name=None, select_columns=None, field_delim=',',
                    use_quote_delim=True, na_value='', header=True, num_epochs=None,
                    shuffle=True, shuffle_buffer_size=250, shuffle_seed=None,
                    prefetch_buffer_size=None, num_parallel_reads=5, sloppy=True,
                    num_rows_for_inference=100, compression_type=None, ignore_errors=False)
        
        # load images into dataset
        def load_image(image_path, target_size=target_size):
            
            # layer for initial image loading (to match test data pipeline)
            resize = keras.layers.experimental.preprocessing.Resizing(
                height=load_image_size[0], width=load_image_size[1], interpolation='bicubic')
            
            # update image path
            image_path = tf.strings.join([DIRECTORY, image_path])
            
            # load file (the competition images are PNG files)
            image = keras.layers.Lambda(lambda x: tf.io.read_file(x))(image_path)
            image = keras.layers.Lambda(lambda x: tf.io.decode_png(x, channels=1))(image)
            image = tf.cast(image, tf.float32)
            image = resize(image)
            
            # preprocessing / standardization
            image = preprocess_image(image)

            return image      

        # final dataset with column keys
        dataset = dataset.map(lambda x: {'image': load_image(x['image_path'][0]), 
                                         'image_id': x['image_id'][0], 
                                         'parsed_InChI': tf.strings.join([x['parsed_InChI'][0], EOS, EOS, EOS, EOS, EOS], separator=' '),
                                         'InChI': x['InChI'][0]}, 
                              num_parallel_calls=tf.data.AUTOTUNE)

        dataset = dataset.with_options(options)
        dataset = dataset.batch(batch_size)
        dataset = dataset.prefetch(prefetch_size)
    
    # Test Dataset
    elif image_set == 'test':
        dataset = tf.keras.preprocessing.image_dataset_from_directory(
            directory=parameters.test_images_dir(), labels='inferred', label_mode=None,
            class_names=None, color_mode='grayscale', batch_size=1, 
            image_size=load_image_size, shuffle=False, seed=None, validation_split=None, subset=None,
            interpolation='bicubic', follow_links=False)
        
        # set filenames as label
        image_id_ds = tf.data.Dataset.from_tensor_slices(dataset.file_paths)
        image_id_ds = image_id_ds.map(lambda x: tf.strings.split(x, os.path.sep)[-1],
                                      num_parallel_calls=tf.data.AUTOTUNE)
        
        # image preprocessing / standardization
        dataset = dataset.map(preprocess_image)
        
        # set InChI label as start value 'InChI=1S/'
        inchi_ds = image_id_ds.map(lambda x: tf.constant(SOS, dtype=tf.string),
                                   num_parallel_calls=tf.data.AUTOTUNE)
        
        # merge datasets
        dataset = tf.data.Dataset.zip((dataset, image_id_ds, inchi_ds))
        
        # set key names
        dataset = dataset.map(lambda x, y, z: {'image': tf.squeeze(x, axis=0), 
                                               'image_id': y, 
                                               'parsed_InChI': z,
                                               'InChI': z},
                              num_parallel_calls=tf.data.AUTOTUNE)
        
        dataset = dataset.with_options(options)
        dataset = dataset.batch(inference_batch_size)
        dataset = dataset.prefetch(inference_prefetch_size)
        
        
    else:  # invalid set name
        raise ValueError(f"image_set must be 'train', 'valid' or 'test', got {image_set!r}")
    
    return dataset

Create Test, Train and Validation Datasets

In [None]:
train_ds = data_generator(image_set='train', labels_dataframe=train_labels_df, parameters=PARAMETERS)
valid_ds = data_generator(image_set='valid', labels_dataframe=train_labels_df, parameters=PARAMETERS)
#test_ds = data_generator(image_set='test', parameters=PARAMETERS)

Examine data shapes

In [None]:
for val in train_ds.take(1):
    print('Train DS')
    print('image:', val['image'].shape, 'image_id:', val['image_id'].shape, 'InChI:', val['InChI'].shape, 'parsed_InChI:', val['parsed_InChI'].shape)

for val in valid_ds.take(1):
    print('\nValidation DS')
    print('image:', val['image'].shape, 'image_id:', val['image_id'].shape, 'InChI:', val['InChI'].shape, 'parsed_InChI:', val['parsed_InChI'].shape)

"""
for val in test_ds.take(1):
    print('\nTest DS')
    print('image:', val['image'].shape, 'image_id:', val['image_id'].shape, 'parsed_InChI:', val['parsed_InChI'].shape)
"""

### TF Records Implementation

In [None]:
# helper functions to create TFRecords file for use on TPU
def make_example(image, image_id, parsed_InChI, InChI):
    image_feature = tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[            
            tf.io.serialize_tensor(image).numpy()])
    )
    image_id_feature = tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[
            tf.io.serialize_tensor(image_id).numpy()])
    )
    parsed_InChI_feature = tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[
            tf.io.serialize_tensor(parsed_InChI).numpy()])
    )
    InChI_feature = tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[
            tf.io.serialize_tensor(InChI).numpy()])
    )

    features = tf.train.Features(feature={
        'image': image_feature,
        'image_id': image_id_feature,
        'parsed_InChI': parsed_InChI_feature,
        'InChI': InChI_feature
    })
    
    example = tf.train.Example(features=features)

    return example.SerializeToString()

def make_example_py_fn(image, image_id, InChI, parsed_InChI):
    # pass inputs in the order make_example expects: (image, image_id, parsed_InChI, InChI)
    return tf.py_function(func=make_example, 
                   inp=[image, image_id, parsed_InChI, InChI], 
                   Tout=tf.string)

def decode_example(example):    
    feature_description = {'image': tf.io.FixedLenFeature([], tf.string),
                            'image_id': tf.io.FixedLenFeature([], tf.string),
                            'parsed_InChI': tf.io.FixedLenFeature([], tf.string),
                            'InChI': tf.io.FixedLenFeature([], tf.string)}
    
    values = tf.io.parse_single_example(example, feature_description)
    values['image'] = tf.io.parse_tensor(values['image'], out_type=tf.float32)
    
    return values

In [None]:
def create_records(dataset, folder):
    
    num_shards = 200
    
    # map to Feature Examples
    dataset = dataset.unbatch().batch(128)
    dataset = dataset.map(lambda x: make_example_py_fn(x['image'], x['image_id'], x['InChI'], x['parsed_InChI']))
    
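    # note: datasets built with make_csv_dataset(num_epochs=None) repeat forever,
    # so pass a finite dataset here (e.g. dataset.take(n)) or writing never finishes.
    # Each shard also re-runs the full input pipeline and keeps every num_shards-th element.
    os.makedirs(os.path.join('./', folder), exist_ok=True)

    for shard_num in range(num_shards):
        path = os.path.join('./', folder, str(shard_num))
        writer = TFRecordWriter(path)
        
        this_shard = dataset.shard(num_shards, index=shard_num)
        writer.write(this_shard)
    
    return None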
        
def read_records(filename):

    dataset = TFRecordDataset(filename)
    return dataset.map(decode_example)
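
As a side note on how the sharding above divides the data: `Dataset.shard(num_shards, index)` keeps every `num_shards`-th element, starting at position `index`. The pure-Python stand-in below is an illustration of that rule only. The function `shard` here is a hypothetical helper, not the TensorFlow method.

```python
from itertools import islice


def shard(items, num_shards, index):
    """Yield every `num_shards`-th item, starting at position `index`."""
    return islice(items, index, None, num_shards)


records = list(range(10))
shards = [list(shard(records, num_shards=3, index=i)) for i in range(3)]
print(shards)

# every record lands in exactly one shard
assert sorted(x for s in shards for x in s) == records
```

Because each shard is produced by scanning the whole input again, writing 200 shards means 200 passes over the source dataset, which may account for much of the running time.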

In [None]:
"""
# This takes an incredible amount of time to run. 
# I think the code works, so I'm letting it run on Kaggle
# and will update when I have results

create_records(valid_ds, folder='valid_tfrec')
create_records(train_ds, folder='train_tfrec')
create_records(test_ds, folder='test_tfrec')
"""

# **Model Layers**

## InChI Encoding

Tokenizer and Embedding to convert parsed InChI strings to tensors of numbers
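
The idea behind the tokenizer and its inverse can be sketched without TensorFlow. The snippet below assumes, as I understand Keras `TextVectorization` to do in `int` mode, that id 0 is reserved for padding and id 1 for out-of-vocabulary tokens. The helper names and the shortened vocabulary are hypothetical.

```python
def build_lookup(vocab):
    """Map tokens to integer ids, reserving 0 (padding) and 1 (unknown)."""
    token_to_id = {token: i + 2 for i, token in enumerate(vocab)}
    id_to_token = {i: token for token, i in token_to_id.items()}
    return token_to_id, id_to_token


def encode(parsed_inchi, token_to_id, padded_length):
    """Split on spaces, look up ids, then crop or zero-pad to a fixed length."""
    ids = [token_to_id.get(token, 1) for token in parsed_inchi.split(' ')]
    ids = ids[:padded_length]
    return ids + [0] * (padded_length - len(ids))


def decode(ids, id_to_token):
    """Inverse lookup; padding ids are dropped."""
    return ' '.join(id_to_token.get(i, '[UNK]') for i in ids if i != 0)


toy_vocab = ['<EOS>', 'InChI=1S/', 'C', 'H', '2', '6', 'O', '/']
token_to_id, id_to_token = build_lookup(toy_vocab)

encoded = encode('InChI=1S/ C 2 H 6 O <EOS>', token_to_id, padded_length=10)
print(encoded)
print(decode(encoded, id_to_token))
```

Padding to a fixed length is what lets labels of different sizes be batched into a single tensor.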

In [None]:
def Tokenizer(parameters, padded_length):
    """ Note: crops / pads token sequences to padded_length.
    """

    SOS = parameters.SOS()
    EOS = parameters.EOS()
    
    # Create vocabulary for tokenizer
    def create_vocab():       
        # use the SOS / EOS values taken from the `parameters` argument above
        hard_coded_vocab = [EOS, SOS, '(',
            ')', '+', ',', '-', '/', 'B', 'Br',  'C', 'Cl', 'D', 'F',
            'H', 'I', 'N', 'O', 'P', 'S', 'Si', 'T', 'b', 'c', 'h', 'i',
            'm', 's', 't']
        
        numbers = [str(num) for num in range(168)]
        
        vocab = hard_coded_vocab + numbers
        
        """
        # get from saved file
        vocab = pd.read_csv(PARAMETERS.vocab_csv())['vocab_value'].to_list()   
        vocab = list(vocab)
        """

        """ 
        # To create from scratch, extract all vocab elements appearing in train set:
        df = pd.read_csv(PARAMETERS.train_labels_csv())  
        seg_len = 250000
        num_breaks = len(df) // seg_len

        vocab = set()
        for i in range(num_breaks):

            df_i =  df['InChI'].iloc[seg_len * i: seg_len * (i+1)]
            texts =  df_i.apply(lambda x: set(parse_InChI(x).split()))
            texts = texts.tolist()

            vocab = vocab.union(*texts)

            print(f'completed {i} / {num_breaks}')

        vocab = list(vocab)
        vocab_df = pd.DataFrame({'vocab_value': vocab})

        # save results
        filename = os.path.join(PARAMETERS.csv_save_dir(), 'vocab.csv')
        vocab_df.to_csv(filename, index=False)
        """
               
        return vocab

    vocab = create_vocab()
    
    # create tokenizer
    tokenizer_layer = tf.keras.layers.experimental.preprocessing.TextVectorization(
        standardize=None, split=lambda x: tf.strings.split(x, sep=' ', maxsplit=-1), 
        output_mode='int', output_sequence_length=padded_length, vocabulary=vocab)

    # record EOS token
    tokenized_EOS = tokenizer_layer(tf.constant([EOS]))
    
    # create inverse (de-tokenizer)
    inverse_tokenizer = tf.keras.layers.experimental.preprocessing.StringLookup(
        vocabulary=tokenizer_layer.get_vocabulary(), invert=True)

    return tokenizer_layer, inverse_tokenizer, tokenized_EOS

In [None]:
temp_tokenizer_layer, temp_inverse_tokenizer, temp_tokenized_EOS = \
    Tokenizer(parameters=PARAMETERS, padded_length=200)
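
To make the tokenizer's behavior concrete, here is a minimal plain-Python sketch of a space-splitting integer tokenizer with a fixed output length. It does not use TensorFlow. The `'<SOS>'` and `'<EOS>'` strings and the short vocabulary are hypothetical placeholders (the actual values come from `PARAMETERS`), and reserving index 0 for padding and index 1 for unknown tokens mirrors the convention of `TextVectorization` with `output_mode='int'`.

```python
# Plain-Python sketch of a space-splitting tokenizer with fixed-length output.
# '<SOS>' and '<EOS>' are hypothetical placeholders, not the notebook's values.
PAD, UNK = '', '[UNK]'
vocab = [PAD, UNK, '<EOS>', '<SOS>', 'C', 'H', 'O', '/', '1', '2', '6']
token_to_id = {tok: i for i, tok in enumerate(vocab)}


def tokenize(text, padded_length):
    """Split on spaces, map tokens to ids, then crop or zero-pad."""
    ids = [token_to_id.get(tok, token_to_id[UNK]) for tok in text.split(' ')]
    ids = ids[:padded_length]
    return ids + [0] * (padded_length - len(ids))


def detokenize(ids):
    """Inverse lookup; padding ids are dropped."""
    return ' '.join(vocab[i] for i in ids if i != 0)


ids = tokenize('C 2 H 6 O <EOS>', padded_length=8)
print(ids)              # [4, 9, 5, 10, 6, 2, 0, 0]
print(detokenize(ids))  # C 2 H 6 O <EOS>
```

Sequences longer than `padded_length` are cropped, which is why the padded length has to be chosen with the longest expected label in mind.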

InChI Input Prep Layer

In [None]:
def InchiPrep(tokenizer_layer, vocab_size, embedding_dim, name='InchiPrep'):
        
    # inputs
    parsed_inchi = keras.layers.Input([], dtype=tf.string, name='parsed_inchi')
    start_var = keras.layers.Input([1, embedding_dim], dtype=tf.float32)
    inputs = [parsed_inchi, start_var]
    
    # tokenize
    tokenized_inchi = tokenizer_layer(parsed_inchi)

    # split into input / target pairs
    inchi_target = tokenized_inchi
    
    inchi_input = keras.layers.Lambda(lambda x: x[:, :-1])(tokenized_inchi)
    inchi_input = keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, 
                    mask_zero=True, name='Embedding')(inchi_input)
    inchi_input = keras.layers.Reshape([-1, embedding_dim])(inchi_input)
    inchi_input = keras.layers.Concatenate(-2)([start_var, inchi_input])
    
    # outputs
    outputs = [inchi_input, inchi_target]
    
    return keras.Model(inputs, outputs, name=name)

In [None]:
temp_tokenizer_layer, _, _ = Tokenizer(PARAMETERS, padded_length=200)
temp_prep = InchiPrep(temp_tokenizer_layer, vocab_size=130, embedding_dim=100)
temp_prep.summary()

Model: "InchiPrep"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
parsed_inchi (InputLayer)       [(None,)]            0                                            
__________________________________________________________________________________________________
text_vectorization_9 (TextVecto (None, 200)          0           parsed_inchi[0][0]               
__________________________________________________________________________________________________
lambda_8 (Lambda)               (None, 199)          0           text_vectorization_9[0][0]       
__________________________________________________________________________________________________
Embedding (Embedding)           (None, 199, 100)     13000       lambda_8[0][0]                   
__________________________________________________________________________________________
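
The input / target split in `InchiPrep` can be pictured with plain token ids: the target is the full token sequence, and the input is the same sequence shifted right by one position with a start element in front. This is a sketch only. In the notebook the start element is a trainable embedding vector and the inputs are embeddings rather than ids, so the `'<START>'` marker and the function name below are hypothetical.

```python
def make_teacher_forcing_pair(token_ids, start_symbol='<START>'):
    """Inputs are the targets shifted right by one, led by a start symbol."""
    targets = list(token_ids)
    inputs = [start_symbol] + targets[:-1]
    return inputs, targets


inputs, targets = make_teacher_forcing_pair([4, 9, 5, 2, 0, 0])
for position, (seen, wanted) in enumerate(zip(inputs, targets)):
    print(position, seen, '->', wanted)
```

At every position the model sees the previous token and is asked for the current one, so both sequences keep the same length.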

# Image Encoder

Feature Extraction Step 1: Run the images through a pre-trained image network, extracting features as the output of an intermediate convolutional layer. [Technique from "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," cited at the top of this notebook.]  A dense layer is added for transfer learning and to control the dimension of the attention mechanism used later.

Transfer Model

In [None]:
TRANSFER_MODEL = tf.keras.applications.EfficientNetB0(
        include_top=False, pooling='max', weights='imagenet')

FEATURES_LAYER = 'block7a_project_bn'  # layer to extract features from

In [None]:
def ImageEncoder(image_shape, output_dim, use_dense_top, name='ImageEncoder'):

    # transfer model
    transfer_model = TRANSFER_MODEL
    features_layers = FEATURES_LAYER
    transfer_model.trainable=False  # optional. My system doesn't have enough RAM to train this portion of the model, at a reasonable batch size

    # sub-models
    features_model = keras.Model(inputs=transfer_model.inputs,
                                 outputs=transfer_model.get_layer(features_layers).output,
                                 name='transfer_features') 
    
    # Inputs
    image = keras.layers.Input(image_shape, dtype=tf.float32, name='image')
    inputs = [image]
    
    # Model Path
    image_features = image
    image_features = tf.image.grayscale_to_rgb(image_features)  # enable if needed
    image_features = tf.keras.applications.efficientnet.preprocess_input(image_features)  # preprocessing
    image_features = features_model(image_features)
    
    features_dim = image_features.shape[-1]
    image_features = keras.layers.Reshape([-1, features_dim])(image_features)
    if use_dense_top:
        image_features = keras.layers.Dense(output_dim, activation='relu',
                                            name='dense')(image_features)

    outputs = [image_features]
    
    return keras.Model(inputs, outputs, name=name)

In [None]:
ImageEncoder(image_shape=(224, 224,1), output_dim=208, use_dense_top=True).summary()

Model: "ImageEncoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
image (InputLayer)           [(None, 224, 224, 1)]     0         
_________________________________________________________________
tf.image.grayscale_to_rgb_18 (None, 224, 224, 3)       0         
_________________________________________________________________
transfer_features (Functiona (None, None, None, 320)   3634851   
_________________________________________________________________
reshape_33 (Reshape)         (None, 49, 320)           0         
_________________________________________________________________
dense (Dense)                (None, 49, 208)           66768     
Total params: 3,701,619
Trainable params: 66,768
Non-trainable params: 3,634,851
_________________________________________________________________
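
The shapes in the summary above can be checked by hand. A 224 x 224 input leaves a 7 x 7 grid of 320-channel feature vectors at the extracted layer, and the `Reshape` step flattens that grid into a sequence of 49 vectors for the attention layers. The sketch below reproduces the flattening with nested lists (no TensorFlow) and recomputes the `Dense` parameter count; the helper name is illustrative.

```python
def flatten_feature_map(feature_map):
    """Turn an H x W grid of C-dim vectors into a list of H*W vectors (row-major)."""
    return [vector for row in feature_map for vector in row]


height, width, channels = 7, 7, 320
feature_map = [[[0.0] * channels for _ in range(width)] for _ in range(height)]

sequence = flatten_feature_map(feature_map)
print(len(sequence), len(sequence[0]))  # 49 320

# Dense layer from 320 to 208 units: one weight per (input, output) pair plus biases
output_dim = 208
dense_params = channels * output_dim + output_dim
print(dense_params)  # 66768
```

The result matches the 66,768 trainable parameters reported in the summary, which confirms that only the dense layer is being trained here.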


## Encoder Attention

Feature Extraction Step 2: Now that we have basic feature vectors, we use self-attention to generate more complex features. This is the encoding step used in "Attention is All You Need," cited above. 

In [None]:
def EncoderAttention(decoder_units, num_att_elems, name='EncoderAttention'):

    # inputs
    encoder_vectors = keras.layers.Input([num_att_elems, decoder_units], name='encoder_vectors')   # from image encoder
    inputs = [encoder_vectors]

    # attention (uses "Attention is All You Need" structure)

    # encoder self-attention
    attention = tf.keras.layers.MultiHeadAttention(
            num_heads=8, key_dim=decoder_units//8, name='encoder_attention')(  # uses 'num_heads * key_dim = rnn_units' from paper
                query=encoder_vectors, value=encoder_vectors)
    
    attention = keras.layers.Dropout(rate=.1)(attention)
    attention = keras.layers.Add()([encoder_vectors, attention])
    attention = keras.layers.BatchNormalization()(attention)    

    # updated encoder vectors
    encoder_vectors = keras.layers.Dense(decoder_units, activation='relu')(attention)    
    
    encoder_vectors = keras.layers.Dropout(rate=.1)(encoder_vectors)
    encoder_vectors = keras.layers.Add()([attention, encoder_vectors])
    encoder_vectors = keras.layers.BatchNormalization()(encoder_vectors)     

    # output
    outputs = [encoder_vectors]

    return keras.Model(inputs, outputs, name=name)

In [None]:
EncoderAttention(decoder_units=320, num_att_elems=196).summary()

Model: "EncoderAttention"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_vectors (InputLayer)    [(None, 196, 320)]   0                                            
__________________________________________________________________________________________________
encoder_attention (MultiHeadAtt (None, 196, 320)     410880      encoder_vectors[0][0]            
                                                                 encoder_vectors[0][0]            
__________________________________________________________________________________________________
dropout_32 (Dropout)            (None, 196, 320)     0           encoder_attention[0][0]          
__________________________________________________________________________________________________
add_24 (Add)                    (None, 196, 320)     0           encoder_vectors[0]
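
For intuition about what the self-attention block computes, here is a minimal single-head sketch in plain Python. It is a simplification, not the layer used above: the learned query / key / value projections, the multiple heads, dropout and batch normalization are all left out, and the function names are illustrative. What remains is the core idea: every feature vector is updated with a weighted average of all feature vectors, then added back to itself through the skip connection.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention_with_residual(vectors):
    """Single-head scaled dot-product self-attention plus a skip connection."""
    scale = math.sqrt(len(vectors[0]))
    updated = []
    for query in vectors:
        # similarity of this vector to every vector in the sequence
        weights = softmax([dot(query, key) / scale for key in vectors])
        # weighted average of all vectors
        context = [sum(w * value[d] for w, value in zip(weights, vectors))
                   for d in range(len(query))]
        updated.append([q + c for q, c in zip(query, context)])  # residual add
    return updated


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention_with_residual(features)
print(len(out), len(out[0]))  # 3 2
```

The output keeps the input's shape, which is what allows the `Add` layers in `EncoderAttention` to combine a block's input with its output.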

## Decoder Self-Attention

Text Feature extraction + Encoder/Decoder Joint Attention interaction.

With use_convolutions set to False, this is the decoder self-attention feature-extraction step from "Attention is All You Need," cited above (with learned positional encoding). 

Includes an (optional) parameter to add a small convolutional layer for feature enhancement before the attention layer. This is included for experimentation / verification that attention really is all you need.


In [None]:
def DecoderAttention(embedding_dim, decoder_units, max_len, 
                     use_convolutions=True, name='DecoderAttention'):

    # inputs
    embedded_text = keras.layers.Input([max_len, embedding_dim], name='embedded_text')  # zero=masked
    mask = keras.layers.Input([max_len, max_len], name='mask')   # used to avoid leaking future info

    inputs = [embedded_text, mask]

    # positional encoding
    initializer = tf.random_normal_initializer()
    position_enc = tf.Variable(initializer(shape=[max_len, decoder_units], 
                                             dtype=tf.float32))
    position_enc = tf.expand_dims(position_enc, 0)  # for broadcasting against batch

    # mask the text input
    decoder_features = embedded_text * tf.expand_dims(mask[:, :, 0], axis=2)
    decoder_features = tf.keras.layers.Masking(mask_value=0.0)(decoder_features)  # not sure if Conv1D accepts masks?

    # (Optional, for experimentation) update features using convolution kernel
    if use_convolutions:
        # crop to unmasked input
        step = tf.math.argmin(mask[0, :, 0])
        decoder_features = tf.keras.layers.Conv1D(filters=decoder_units, kernel_size=3, 
                    strides=1, padding='same', groups=1)(decoder_features[:, :step, :])

        # pad back to full length for uniform (masked) Attention input
        decoder_features = tf.pad(decoder_features, [[0,0],[0, max_len - step], [0,0]])
    
    # Add positional encoding
    decoder_features = position_enc + decoder_features

    # Decoder Self-Attention Block (with mask)
    decoder_attention = tf.keras.layers.MultiHeadAttention(
            num_heads=8, key_dim=decoder_units//8, name='decoder_attention')(
                query=decoder_features, value=decoder_features, attention_mask=mask)

    decoder_attention = keras.layers.Dropout(rate=.1)(decoder_attention)            
    decoder_attention = keras.layers.Add()([decoder_features, decoder_attention])
    decoder_attention = keras.layers.BatchNormalization()(decoder_attention)     

    outputs = [decoder_attention]

    return keras.Model(inputs, outputs, name=name)
    

In [None]:
DecoderAttention(embedding_dim=352, decoder_units=352, max_len=200, use_convolutions=True).summary()

Model: "DecoderAttention"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
mask (InputLayer)               [(None, 200, 200)]   0                                            
__________________________________________________________________________________________________
tf.__operators__.getitem_15 (Sl (None, 200)          0           mask[0][0]                       
__________________________________________________________________________________________________
embedded_text (InputLayer)      [(None, 200, 352)]   0                                            
__________________________________________________________________________________________________
tf.expand_dims_6 (TFOpLambda)   (None, 200, 1)       0           tf.__operators__.getitem_15[0][0]
___________________________________________________________________________________
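
The `mask` input is what keeps future tokens out of the attention computation. The generation loop in this notebook builds it per step as a block of ones of size `(step + 1, step + 1)`, zero-padded to `max_len x max_len`. The plain-Python sketch below reproduces that construction and compares it with the more common lower-triangular causal mask. Both helper names are illustrative.

```python
def step_mask(step, max_len):
    """max_len x max_len mask with ones in the top-left (step+1) x (step+1) block."""
    visible = step + 1
    return [[1 if row < visible and col < visible else 0
             for col in range(max_len)]
            for row in range(max_len)]


def causal_mask(max_len):
    """Lower-triangular mask: position i may attend to positions <= i."""
    return [[1 if col <= row else 0 for col in range(max_len)]
            for row in range(max_len)]


for row in step_mask(step=1, max_len=4):
    print(row)
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [0, 0, 0, 0]
# [0, 0, 0, 0]

# For the position currently being predicted, both masks expose the same tokens.
print(step_mask(2, 5)[2] == causal_mask(5)[2])  # True
```

Since only the row for the current step is read out when a prediction is made, the block mask and the triangular mask agree where it matters for that step.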

##  Joint Encoder-Decoder Attention

This is the 'translation' component where our image features interact with our (masked) text features. Masking is used to prevent information leaks, so only known text values are used at a given time step. This is the encoder-decoder attention step from "Attention is All You Need."

In [None]:
def JointAttention(decoder_units, num_att_elems, max_len, name='JointAttention'):

    encoder_attention = keras.layers.Input([num_att_elems, decoder_units], name='encoder_attention')   # from image
    decoder_attention = keras.layers.Input([max_len, decoder_units], name='decoder_attention')   # from known text

    inputs = [encoder_attention, decoder_attention]
    
    # Encoder-Decoder Attention Block
    joint_attention = tf.keras.layers.MultiHeadAttention(
            num_heads=8, key_dim=decoder_units//8, name='joint_attention')(
                query=decoder_attention, value=encoder_attention)
            
    joint_attention = keras.layers.Dropout(rate=.1)(joint_attention)          
    joint_attention = keras.layers.Add()([decoder_attention, joint_attention])
    joint_attention = keras.layers.BatchNormalization()(joint_attention)    

    outputs = [joint_attention]

    return keras.Model(inputs, outputs, name=name)

In [None]:
JointAttention(decoder_units=32, num_att_elems=50, max_len=200, name='JointAttention').summary()

Model: "JointAttention"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
decoder_attention (InputLayer)  [(None, 200, 32)]    0                                            
__________________________________________________________________________________________________
encoder_attention (InputLayer)  [(None, 50, 32)]     0                                            
__________________________________________________________________________________________________
joint_attention (MultiHeadAtten (None, 200, 32)      4224        decoder_attention[0][0]          
                                                                 encoder_attention[0][0]          
__________________________________________________________________________________________________
dropout_35 (Dropout)            (None, 200, 32)      0           joint_attention[0][0
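
The parameter counts reported for the attention layers can be reproduced with a little arithmetic. The sketch assumes that each of the query, key, value and output projections is a dense map with a bias, and that the value dimension equals `key_dim` (the Keras default when `value_dim` is not given). The helper names are illustrative.

```python
def dense_params(input_dim, output_dim):
    """Weights plus biases of a dense projection."""
    return input_dim * output_dim + output_dim


def multi_head_attention_params(model_dim, num_heads, key_dim):
    """Query, key and value projections plus the output projection."""
    inner = num_heads * key_dim
    projections = 3 * dense_params(model_dim, inner)
    output = dense_params(inner, model_dim)
    return projections + output


print(multi_head_attention_params(320, 8, 320 // 8))  # 410880 (EncoderAttention summary)
print(multi_head_attention_params(32, 8, 32 // 8))    # 4224 (JointAttention summary)
```

Because `num_heads * key_dim` equals the model dimension in both cases, the count reduces to four dense layers of size `model_dim x model_dim`.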

## Decoder Head (Prediction Output)

This is where we use what was learned in the encoder-decoder attention to output predicted labels. It is the prediction step from "Attention is All You Need."

In [None]:
def DecoderHead(decoder_units, vocab_size, max_len, name='DecoderHead'):
    
    decoder_input = keras.layers.Input([max_len, decoder_units])  # from Decoder Attention layer

    inputs = [decoder_input]

    # Prediction Block
    decoder_out = keras.layers.Dense(decoder_units, activation='relu',
                kernel_initializer= tf.keras.initializers.HeNormal())(decoder_input)

    decoder_out = keras.layers.Dropout(rate=.1)(decoder_out)
    decoder_out = decoder_input + decoder_out
    decoder_out = keras.layers.BatchNormalization()(decoder_out)

    probs = keras.layers.Dense(vocab_size, activation='softmax',
                kernel_initializer= tf.keras.initializers.HeNormal())(decoder_out)

    outputs = [probs]

    return keras.Model(inputs, outputs, name=name)


In [None]:
DecoderHead(decoder_units=320, vocab_size=199, max_len=200).summary()

Model: "DecoderHead"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_18 (InputLayer)           [(None, 200, 320)]   0                                            
__________________________________________________________________________________________________
dense_36 (Dense)                (None, 200, 320)     102720      input_18[0][0]                   
__________________________________________________________________________________________________
dropout_36 (Dropout)            (None, 200, 320)     0           dense_36[0][0]                   
__________________________________________________________________________________________________
tf.__operators__.add_15 (TFOpLa (None, 200, 320)     0           input_18[0][0]                   
                                                                 dropout_36[0][0]       

## Update Mechanism (Optional)

*Note: this is fully coded but I have not had time to train parameters with it. I leave that as a future opportunity for exploration.*

NLP techniques typically output logits and select the highest-likelihood token at each step. This can be improved to a (local) maximum likelihood selection using a "beam step" that may override the initial prediction choice. 

This layer is an alternative system for updating predictions. Unlike "beam," it is trainable and includes longer-range dependencies (instead of the very "local" beam step). The entire original prediction is passed through an RNN (in the code below, a GRU that reads the sequence in reverse). 
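
For comparison, the conventional beam step mentioned above can be sketched in a few lines of plain Python. This illustrates ordinary beam search, not the trainable `BeamUpdate` layer. The probability table is made up for the example, and each token's distribution depends only on the previous token to keep things small.

```python
import math

# Hypothetical next-token probabilities, conditioned on the previous token only.
NEXT_TOKEN_PROBS = {
    '<SOS>': {'A': 0.6, 'B': 0.4},
    'A': {'x': 0.55, 'y': 0.45},
    'B': {'x': 0.9, 'y': 0.1},
}


def greedy_decode(num_steps):
    """Pick the single most likely token at every step."""
    sequence, log_prob = ['<SOS>'], 0.0
    for _ in range(num_steps):
        options = NEXT_TOKEN_PROBS[sequence[-1]]
        token = max(options, key=options.get)
        log_prob += math.log(options[token])
        sequence.append(token)
    return sequence[1:], log_prob


def beam_decode(num_steps, beam_width):
    """Keep the beam_width most likely partial sequences at every step."""
    beams = [(0.0, ['<SOS>'])]
    for _ in range(num_steps):
        candidates = []
        for log_prob, sequence in beams:
            for token, prob in NEXT_TOKEN_PROBS[sequence[-1]].items():
                candidates.append((log_prob + math.log(prob), sequence + [token]))
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = candidates[:beam_width]
    best_log_prob, best_sequence = beams[0]
    return best_sequence[1:], best_log_prob


print(greedy_decode(2)[0])               # ['A', 'x']
print(beam_decode(2, beam_width=2)[0])   # ['B', 'x']
```

Greedy decoding commits to `A` and ends with probability 0.33, while the beam keeps `B` alive long enough to find the 0.36 sequence. That is the override described above, and it only looks as far ahead as the beam survives.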

In [None]:
def BeamUpdate(decoder_units, beam_rnn_units, input_dim, vocab_size, name='BeamUpdate'):
    
    # rnn layers
    BeamUnit = keras.layers.GRU(beam_rnn_units, return_sequences=True, return_state=True, go_backwards=True)
    
    # Inputs
    beam_input = keras.layers.Input([None, input_dim], dtype=tf.float32, name='beam_input') 
    hidden_state = keras.layers.Input([decoder_units], dtype=tf.float32, name='hidden_state')
    
    inputs = [beam_input, hidden_state]

    # downscale hidden state to beam dims
    beam_hidden_state = keras.layers.Dense(beam_rnn_units, activation='relu',
                              kernel_initializer= tf.keras.initializers.HeNormal()
                              )(hidden_state)   
    # RNN
    beam_out, beam_hidden_state = \
        BeamUnit(beam_input, initial_state=[beam_hidden_state])  # beam 1

    
    # token probabilities (softmax output)
    probs = keras.layers.Dense(vocab_size, activation='softmax', name='dense_beam_probs',
                   kernel_initializer= tf.keras.initializers.HeNormal())(beam_out)

    outputs = [probs]
    
    return keras.Model(inputs, outputs, name=name)

In [None]:
temp_beam = BeamUpdate(decoder_units=320, beam_rnn_units=128, input_dim=130, vocab_size=199, name='BeamUpdate')
temp_beam.summary()

Model: "BeamUpdate"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
hidden_state (InputLayer)       [(None, 320)]        0                                            
__________________________________________________________________________________________________
beam_input (InputLayer)         [(None, None, 130)]  0                                            
__________________________________________________________________________________________________
dense_38 (Dense)                (None, 128)          41088       hidden_state[0][0]               
__________________________________________________________________________________________________
gru_6 (GRU)                     [(None, None, 128),  99840       beam_input[0][0]                 
                                                                 dense_38[0][0]          

# **Full Model**

All the components are combined into a full encoder/decoder model. This is implemented using the subclassing API with custom call, training, evaluation and prediction steps. Once initialized, the models have full access to the high-level model.fit(), model.compile() and model.save_weights() methods.

An extra feature is the option to place Decoder() elements in *series* (not stacked), so that each decoder head handles its own range of output positions. This adds more trainable parameters without affecting inference speed, and allows decoders to specialize on different regions of the text.

BaseTrainer() model has the BeamUpdate mechanism disabled. InchiGenerator() models include the BeamUpdate.
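
The "in series" arrangement amounts to routing each output position to one head. The toy sketch below counts how often each head is called over a 200-step sequence. The switch position of 50 mirrors `self.start1` in the class that follows; the head functions are hypothetical stand-ins for `DecoderHead` models.

```python
calls = {'head_0': 0, 'head_1': 0}


def make_head(name):
    """Build a dummy prediction head that records how often it runs."""
    def head(features):
        calls[name] += 1
        return f'{name}:{features}'
    return head


heads = [make_head('head_0'), make_head('head_1')]
SWITCH_STEP = 50  # position at which the second head takes over


def predict_position(step, features):
    """Route the features to the head responsible for this position."""
    head = heads[0] if step < SWITCH_STEP else heads[1]
    return head(features)


outputs = [predict_position(step, 'f') for step in range(200)]
print(calls)  # {'head_0': 50, 'head_1': 150}
```

Exactly one head runs per position, so the per-step cost matches a single-head model even though the total number of trainable parameters is larger.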

In [None]:
class BaseTrainer(keras.Model):
    
    def __init__(self, beam_rnn_units, attention_dim, use_dense_encoder_top, 
                 use_convolutions, use_dual_decoder, 
                 max_len, parameters, name='BaseTrainer', **kwargs):
        super().__init__(name=name, **kwargs)

        """
        Beam updates turned off. 
        Training conducted with teacher-forced inputs.

        note: dataset provided as (image, image_id, parsed_InChI, InChI)
        """
    
        # hard-coded parameters
        self.regularization_factor = 0.0
        
        # other params (required values)
        self.max_len = max_len  # max number of token prediction length
        self.beam_rnn_units = beam_rnn_units
        self.attention_dim = attention_dim
        self.use_dense_encoder_top = use_dense_encoder_top
        self.use_convolutions = use_convolutions
        self.use_dual_decoder = use_dual_decoder
        self.parameters = parameters
        self.SOS = parameters.SOS()
        self.EOS = parameters.EOS()
        
        # tokenizer / inverse tokenizer
        self.tokenizer_layer, self.inverse_tokenizer, self.tokenized_EOS = \
            Tokenizer(parameters=self.parameters, padded_length=self.max_len)
        self.vocab_size = len(self.tokenizer_layer.get_vocabulary())

    def get_config(self):
        config = {'beam_rnn_units': self.beam_rnn_units, 
                  'attention_dim':self.attention_dim,
                  'use_dense_encoder_top': self.use_dense_encoder_top,
                  'use_convolutions': self.use_convolutions,
                  'max_len':self.max_len,
                  'use_dual_decoder': self.use_dual_decoder,
                  'parameters': self.parameters}
        return config 
    
    def build(self, input_shape):
        # note: dataset prepared with dict keys (image, image_id, parsed_InChI, InChI)
        
        self.batch_size = input_shape['image'][0]

        # encoder
        self.image_shape = input_shape['image'][1:]  # drops batch dims       
        self.image_encoder = ImageEncoder(image_shape=self.image_shape, 
                                          output_dim=self.attention_dim,
                                          use_dense_top=self.use_dense_encoder_top,
                                          name='ImageEncoder')   

        # collect params
        self.decoder_units = self.image_encoder.output_shape[-1]
        self.num_att_vectors = self.image_encoder.output_shape[-2]
        self.embedding_dim = self.decoder_units  # required for consistency

        # trainable start value
        mean = 1.0 / self.embedding_dim
        initializer = tf.random_normal_initializer(mean=mean, stddev=.2*mean)
        self.start_var = tf.Variable(initializer(shape=[1, 1, self.embedding_dim]), 
                                     name='start_var')
        
        # InChI token embedding layers        
        self.inchi_prep = InchiPrep(self.tokenizer_layer, self.vocab_size, 
                                    self.embedding_dim, name='InchiPrep')
        self.embedding_layer = self.inchi_prep.get_layer('Embedding')

        # attentions
        self.encoder_attention = EncoderAttention(decoder_units=self.decoder_units, 
                                                  num_att_elems=self.num_att_vectors,
                                                  name='EncoderAttention')
        self.decoder_attention = DecoderAttention(embedding_dim=self.embedding_dim, 
                                                decoder_units=self.decoder_units, 
                                        #num_att_elems=self.num_att_vectors,
                                                max_len=self.max_len,
                                                use_convolutions=self.use_convolutions,
                                                name='DecoderAttention')   
        self.joint_attention = JointAttention(decoder_units=self.decoder_units, 
                                              num_att_elems=self.num_att_vectors,
                                              max_len=self.max_len,
                                              name='JointAttention') 
        
        # collect params
        self.rnn_input_dim = self.encoder_attention.output_shape[-1]
        beam_input_dim = self.vocab_size
        
        # decoders
        self.start1 = 50  # step to switch to next decoder
        self.decoder_0 = DecoderHead(self.decoder_units, self.vocab_size, 
                                     max_len=self.max_len, name='DecoderHead_0')
        
        if self.use_dual_decoder:
            self.decoder_1 = DecoderHead(self.decoder_units, self.vocab_size, 
                                         max_len=self.max_len, name='DecoderHead_1')
        else:
            self.decoder_1 = self.decoder_0

        # prediction update
        self.beam = BeamUpdate(self.decoder_units, self.beam_rnn_units, beam_input_dim, self.vocab_size, name='beam')


    def encoding_step(self, image, parsed_inchi):
        
        # text encoding
        start_var_batch = tf.tile(self.start_var, [self.batch_size, 1, 1])

        # tokenize and encode InChI
        inchi, targets = self.inchi_prep([parsed_inchi, start_var_batch])
        
        # image encoding
        image = self.image_encoder(image)
        encoder_attention = self.encoder_attention(image)

        return encoder_attention, inchi, targets

    def call(self, inputs, training=False):#, validation=False):        
        # note: dataset provided as (image, image_id, parsed_InChI, InChI)

        # generation step options
        use_beam = False
        use_preds = not training

        # inputs
        image = inputs['image']
        parsed_inchi = inputs['parsed_InChI']
        
        # Encoder
        encoder_attention, inchi, targets = self.encoding_step(image, parsed_inchi)

        # Decoder
        predictions, probs = self.generation_loop(use_preds, use_beam, 
                                                   encoder_attention,
                                                   inchi, targets)
        
        return targets, predictions, probs

    def predict(self, data):
        # note: dataset provided as (image, image_id, parsed_InChI, InChI)

        # get image id (passed through to output)  
        image_id = data['image_id'] 
        
        # generate predictions
        targets, predictions, probs = self(data, training=False)

        # convert back to string
        generated_predictions, generated_predictions_parsed = \
            self.tokens_to_string(predictions) 

        return image_id, generated_predictions

    def train_step(self, data):
        print('start')  # only shows upon function tracing

        # get loss and grads
        with tf.GradientTape() as tape:

            targets, predictions, probs = self(data, training=True)       
            loss = self.compiled_loss(targets, probs)

            # add any regularization losses
            loss += tf.math.reduce_sum(self.losses) * self.regularization_factor

        gradients = tape.gradient(loss, self.trainable_variables) 
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        
        # update metrics
        self.compiled_metrics.update_state(targets, probs)
        
        return {m.name: m.result() for m in self.metrics}

    # validation step
    def test_step(self, data):
        
        # Compute predictions
        targets, predictions, probs = self(data, training=False)       
        
        # record loss
        self.compiled_loss(targets, probs)

        # Update the metrics.
        self.compiled_metrics.update_state(targets, probs)

        return {m.name: m.result() for m in self.metrics}


    # Full Generation Loop
    def generation_loop(self, use_preds, use_beam, encoder_attention, inchi, target):
        
        # containers
        # note: JIT/XLA not compatible with dynamic size TensorArrays. Update def if using JIT
        tokens_array = tf.TensorArray(size=1, dtype=tf.int64, dynamic_size=True, tensor_array_name='tokens_array')
        probs_array = tf.TensorArray(size=1, dtype=tf.float32, dynamic_size=True, tensor_array_name='probs_array')
        
        # "while loop" function
        def loop_fn(step, continue_cond, inchi_step, tokens_array, probs_array):

            # create mask
            mask = tf.ones((step + 1, step + 1))
            mask = tf.pad(mask, [[0, self.max_len - step - 1], 
                                 [0, self.max_len - step - 1]])
            mask = tf.expand_dims(mask, 0)
            mask = tf.tile(mask, [self.batch_size, 1, 1])

            # attention update
            decoder_att = self.decoder_attention([inchi_step, mask])
            joint_attention = self.joint_attention([encoder_attention, decoder_att])
            
            # get char probs
            # (use correct decoder for position in sequence)
            if tf.math.less(step, self.start1):
                probs = self.decoder_0([joint_attention])  

            else:  # step >= self.start1
                probs = self.decoder_1([joint_attention])       

            # select current step's probabilities and predictions
            probs = probs[:, step:step+1, :]
            predictions = tf.argmax(probs, axis=-1)

            # save results
            tokens_array = tokens_array.write(step, predictions)
            probs_array = probs_array.write(step, probs)

            # check early stopping criteria
            if use_preds: 
                predictions = tf.expand_dims(predictions, axis=1)
                continue_cond = tf.math.reduce_any(predictions != self.tokenized_EOS)
                predictions = tf.squeeze(predictions, axis=1)
            
            # update continue condition
            if continue_cond:
                continue_cond = tf.math.less(step + 1, self.max_len)
            
            # prepare next input if continuing
            if continue_cond:
        
                step = step + 1
                if use_preds:
                    predictions = self.embedding_layer(predictions)

                    # pad to match inchi_step shape
                    predictions = tf.pad(predictions, 
                        [[0,0], [step, tf.math.maximum(0, self.max_len - step - 1)], [0,0]])

                    inchi_step = predictions + inchi_step
                
            # caution: make sure input is masked during decoder attention!

            return [step, continue_cond, inchi_step, tokens_array, probs_array]

        # stopping condition function
        def cond_fn(step, continue_cond, inchi_step, tokens_array, probs_array):
            return continue_cond

        # generation loop
        step = 0
        continue_cond = True

        inchi_step = inchi  # add step for final prediction
        # note: inchi_step masking done during decoder attention

        step, continue_cond, inchi_step, tokens_array, probs_array \
            = tf.while_loop(
                    cond=cond_fn, 
                    body=loop_fn, 
                    loop_vars=[step, continue_cond, inchi_step, tokens_array, probs_array], 
                    maximum_iterations=self.max_len,                
                    shape_invariants=[tf.TensorShape([]), # step
                                      tf.TensorShape([]), # continue_cond
                                      tf.TensorShape([None, self.max_len, self.embedding_dim]), # inchi_step
                                      None, #tokens_array
                                      None] #probs_array  # attention scores
                    )
        
        # unpack token arrays
        predicted_tokens = tokens_array.stack()  # predicted tokens, shape (steps, batch, 1)
        predicted_tokens = tf.squeeze(predicted_tokens, axis=-1)  # explicit axis: safe when batch or steps == 1
        predicted_tokens = tf.transpose(predicted_tokens, perm=[1, 0])   

        # unpack probs_array (no beam update)
        predicted_probs = probs_array.stack()  # predicted probabilities, shape (steps, batch, 1, vocab)
        predicted_probs = tf.squeeze(predicted_probs, axis=2)
        predicted_probs = tf.transpose(predicted_probs, perm=[1, 0, 2])  

        # (optional) beam update
        if use_beam:
            beam_inputs = predicted_probs

            # pad/crop to uniform length and mask
            mask_value = -1.0

            # pad (by max_len, so the crop below always yields exactly max_len steps)
            beam_inputs = tf.pad(beam_inputs, constant_values=mask_value,
                                 paddings=[[0, 0], [0, self.max_len], [0, 0]])
            # crop
            beam_inputs = beam_inputs[:, :self.max_len]
            # mask
            beam_inputs = tf.keras.layers.Masking(mask_value=mask_value)(beam_inputs)

            # create initial RNN state
            initial_state = tf.math.reduce_mean(encoder_attention, axis=1)

            # get probs and predictions
            predicted_probs = self.beam([beam_inputs, initial_state])
            predicted_tokens = tf.argmax(predicted_probs, axis=-1)

        return predicted_tokens, predicted_probs
    
    
    def tokens_to_string(self, tokens):
        parsed_string_vals = self.inverse_tokenizer(tf.constant(tokens))
        string_vals = keras.layers.Lambda(lambda x: tf.strings.reduce_join(x, axis=-1))(parsed_string_vals)

        # remove first EOS generated and everything after
        pattern = ''.join([self.EOS, '.*$'])
        string_vals = tf.strings.regex_replace(string_vals, pattern, rewrite='', 
                                               replace_global=True, name='remove_EOS')   

        return string_vals, parsed_string_vals
    
    def update_max_len(self, new_value):
        self.max_len = new_value

        # update padding on tokenizer and layers with params depending on it
        self.tokenizer_layer, self.inverse_tokenizer, self.tokenized_EOS = \
            Tokenizer(parameters=self.parameters, padded_length=self.max_len)

        self.inchi_prep = InchiPrep(self.tokenizer_layer, self.vocab_size, 
                                    self.embedding_dim, name='InchiPrep')
        
        self.embedding_layer = self.inchi_prep.get_layer('Embedding')
        return None
    
    def update_reg_factor(self, value):
        # note: might need to recompile model before next training
        self.regularization_factor = value

    """  
    # Our Tensorflow metric calculates Levenshtein scores of the tokens.
    # To calculate the true character-level score use this:
    
    !pip install leven
    from leven import levenshtein

    def compute_levenshtein_scores(self, inchi_true, inchi_predicted):
        scores = [levenshtein(pred, orig) for (pred, orig)
                  in zip(inchi_predicted.numpy().tolist(), inchi_true.numpy().tolist())]
        # cast first: reduce_mean on integers would truncate the average
        return tf.reduce_mean(tf.cast(scores, tf.float32))
    """

In [None]:
class InchiGenerator(BaseTrainer):
    """
    Beam updates turned on, training conducted using generated preds.
    """

    def __init__(self, base_model, name='BeamInchiTrainer', **kwargs):

        super().__init__(beam_rnn_units=base_model.beam_rnn_units, 
                         attention_dim=base_model.attention_dim,
                         use_dense_encoder_top=base_model.use_dense_encoder_top,
                         use_convolutions=base_model.use_convolutions,
                         use_dual_decoder=base_model.use_dual_decoder,
                         max_len=base_model.max_len,
                         parameters=base_model.parameters, 
                         name=name, **kwargs)
        
    def call(self, inputs, training=False):

        # generation step options
        use_beam = True
        use_preds = True

        # inputs
        # note: dataset provided as (image, image_id, parsed_InChI, InChI)
        image = inputs['image']      
        parsed_inchi = inputs['parsed_InChI']
        
        # Encoder
        encoder_attention, inchi, targets = self.encoding_step(image, parsed_inchi)

        # Decoder
        predictions, probs = self.generation_loop(use_preds, use_beam, 
                                                   encoder_attention,
                                                   inchi, targets)
        
        return targets, predictions, probs

In [None]:
class EditDistanceMetric(tf.keras.metrics.Metric):
    def __init__(self, name='edit_distance', **kwargs):
        super().__init__(name=name, **kwargs)
        self.edit_distance = self.add_weight(name='edit_distance', initializer='zeros')
        self.batch_counter = self.add_weight(name='batch_counter', initializer='zeros')
    
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.sparse.from_dense(y_true)
        y_pred = tf.sparse.from_dense(tf.argmax(y_pred, axis=-1))  # convert probs to preds

        # compute edit distance (of parsed tokens)
        edit_distance = tf.edit_distance(y_pred, y_true, normalize=False)
        self.edit_distance.assign_add(tf.reduce_mean(edit_distance))

        self.batch_counter.assign_add(1.)
    
    def result(self):
        return self.edit_distance / self.batch_counter

    def reset_state(self):
        # The state of the metric will be reset at the start of each epoch.
        self.edit_distance.assign(0.0)
        self.batch_counter.assign(0.0)
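
The metric above scores edit distance over parsed tokens, which can differ from the character-level distance used to compare InChI strings. As a rough illustration of that gap, here is a minimal pure-Python sketch of Levenshtein distance that works on either strings or token lists. It is not the `tf.edit_distance` implementation; `levenshtein` is an illustrative helper, and treating `C13` as a single token is a hypothetical tokenization chosen for the example.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Same prediction error, measured two ways (hypothetical tokenization).
token_score = levenshtein(["C13"], ["C20"])  # one token differs
char_score = levenshtein("C13", "C20")       # two characters differ
print(token_score, char_score)
```

A single wrong multi-character token counts once at the token level but several times at the character level, so the token-level metric can understate the character-level error.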

In [None]:
# Learning rate schedule used in "Attention Is All You Need":
#   lr = scale_factor * dim^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)

class LRScheduleAIAYN(tf.keras.optimizers.schedules.LearningRateSchedule):

    def __init__(self, scale_factor=1, warmup_steps=4000):  # warmup default reflects paper's value
        self.warmup_steps = tf.constant(warmup_steps, dtype=tf.float32)
        dim = tf.constant(352, dtype=tf.float32)  # note: the paper's d_model is 512
        self.scale = scale_factor * tf.math.pow(dim, -0.5)  # paper scales by d_model^-0.5

    def __call__(self, step):
        step = tf.cast(step, tf.float32)  # the optimizer passes an integer step counter
        opt_1 = tf.math.pow(step, -.5)
        opt_2 = step * tf.math.pow(self.warmup_steps, -1.5)
        return self.scale * tf.math.reduce_min([opt_1, opt_2])
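
For reference, the schedule in the paper is `lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)`: a linear warmup followed by inverse-square-root decay, with the two branches meeting at `step == warmup_steps`. The framework-free sketch below can be used to sanity-check values from the TensorFlow class. `aiayn_lr` is a hypothetical helper name, and the defaults are the paper's values rather than this notebook's settings.

```python
def aiayn_lr(step, d_model=512, warmup_steps=4000, scale_factor=1.0):
    """Learning rate at a given step under the warmup / inverse-sqrt schedule."""
    step = max(float(step), 1.0)          # avoid 0 ** -0.5 on the first step
    decay = step ** -0.5                  # governs after warmup
    warmup = step * warmup_steps ** -1.5  # governs during warmup (linear in step)
    return scale_factor * d_model ** -0.5 * min(decay, warmup)


# The rate rises until warmup_steps, then falls.
for s in (1, 1000, 4000, 16000):
    print(s, f"{aiayn_lr(s):.3e}")
```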

## Build Model

Model compile options

In [None]:
NAME_MODIFIER = ''

# build model
ATTENTION_DIM = 208  # note: only used if USE_DENSE_ENCODER_TOP = True.
                     # value from Darien Schettler public Kaggle notebook
BEAM_RNN_UNITS = 128  # note: only used in beam_model
USE_DENSE_ENCODER_TOP = True
USE_CONVOLUTIONS = False
USE_DUAL_DECODERS = False
if USE_CONVOLUTIONS:
    checkpoint_save_name = 'ConvAtt_model_weights' + NAME_MODIFIER
else:
    checkpoint_save_name = 'AISAYN_model_weights' + NAME_MODIFIER

LOAD_CHECKPOINT_FILE = os.path.join(PARAMETERS.load_checkpoint_dir(), checkpoint_save_name, checkpoint_save_name)
SAVE_CHECKPOINT_FILE = os.path.join(PARAMETERS.checkpoint_dir(), checkpoint_save_name, checkpoint_save_name)

Initialize model

In [None]:
# initialize models
for val in train_ds.take(1):
    with PARAMETERS.strategy().scope():  # ("distribution strategy" defined at top of notebook)
        model_base = BaseTrainer(beam_rnn_units=BEAM_RNN_UNITS, # only used in full model
                                 attention_dim=ATTENTION_DIM,  # see note where ATTENTION_DIM is defined
                                 use_dense_encoder_top=USE_DENSE_ENCODER_TOP,
                                 use_convolutions=USE_CONVOLUTIONS,
                                 use_dual_decoder=USE_DUAL_DECODERS,
                                 max_len=175,  # 133 is recommended value from Darien Schettler Kaggle notebook
                                 parameters=PARAMETERS, 
                                 name='BaseTrainer')
        
        # use larger batch size for inference
        # (model needs to be rebuilt to change batch size)
        model_inference = BaseTrainer(beam_rnn_units=BEAM_RNN_UNITS, # only used in full model
                                 attention_dim=ATTENTION_DIM,  # see note where ATTENTION_DIM is defined
                                 use_dense_encoder_top=USE_DENSE_ENCODER_TOP,                                
                                 use_convolutions=USE_CONVOLUTIONS,
                                 use_dual_decoder=USE_DUAL_DECODERS,
                                 max_len=175,  # 133 is recommended value from Darien Schettler Kaggle notebook
                                 parameters=PARAMETERS, 
                                 name='BaseInference')
        
        """ 
        # Beam-update model
        model_beam = InchiGenerator(model_base, name='InchiGenerator_inference')  
        """

    print('Models initialized.')

# build models / verify that function calls work
for val in train_ds.take(1):
    model_base(val, training=False)
    model_base(val, training=True)
    model_base.predict(val)

    """
    model_beam(val, training=False)
    model_beam(val, training=True)
    model_beam.predict(val)
    """

#for val2 in train_ds.unbatch().batch(PARAMETERS.inference_batch_size()).take(1):
#    model_inference(val2)

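# sync weights
# WARNING!: in Kaggle this loads from prev session saved weights
# (loading is currently disabled; uncomment the load_weights lines to restore a checkpoint)
try:
    pass  # keeps the block valid while the lines below are commented out
    #model_base.load_weights(LOAD_CHECKPOINT_FILE)  
    #model_inference.load_weights(LOAD_CHECKPOINT_FILE)
except Exception:
    print('No weights loaded')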

print('\n\n')    
model_base.summary()

Models initialized.



In [None]:
#%tensorboard --logdir logs

Test inference speed

In [None]:
%%timeit
# Base Model: inference speed (no beam)
# note: the %%timeit cell magic must be the first line of the cell

num_batches = 3  # processing time for 3 * 256 = 768 images
for val2 in train_ds.unbatch().batch(PARAMETERS.inference_batch_size()).take(num_batches):
    im_id, preds = model_inference.predict(val2)

In [None]:
"""
# Full model: inference speed (with beam)
%%timeit
num_batches = 3

for val in train_ds.unbatch().batch(PARAMETERS.inference_batch_size()).take(num_batches): 
    im_id, preds = (model_beam.predict(val))
"""

# Training

In [None]:
# learning rate scheduler

learning_rate = LRScheduleAIAYN(scale_factor=1, warmup_steps=4000)
#learning_rate = 1e-3

# optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate,
                    beta_1=0.9, beta_2=0.98, epsilon=1e-9)  # params from "Attention Is All You Need" paper (epsilon = 10^-9)
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)  # loss scaling guards against float16 gradient underflow under mixed precision

# callbacks
checkpoint = tf.keras.callbacks.ModelCheckpoint(SAVE_CHECKPOINT_FILE, monitor='loss', 
                    save_weights_only=True, save_best_only=True, save_freq='epoch')

#tensorboard = tf.keras.callbacks.TensorBoard(log_dir='./logs/')

Train base model

In [None]:
# Train base model (teacher-fed training, prediction-fed validation, no beam update)

# Note: if training separate decoder segments, initialize the second decoder's weights using:
# "model_base.decoder_1.set_weights(model_base.decoder_0.get_weights())"

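steps_per_epoch = 150
# note: each step consumes one batch, so this epoch count passes through the dataset
# roughly batch_size times; divide by the batch size for a single full pass
epochs = len(train_labels_df) // steps_per_epoch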

# choose variables to train
model_base.image_encoder.get_layer('dense').trainable=True
model_base.decoder_0.trainable = True
#model_base.decoder_1.trainable = True

# compile
model_base.compile(optimizer=optimizer, 
            loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
            metrics=['sparse_categorical_accuracy', EditDistanceMetric()])

# train
model_base.fit(train_ds, 
               epochs=epochs, steps_per_epoch=steps_per_epoch, 
               validation_data=valid_ds, validation_steps=10, validation_freq=6,
               callbacks=[checkpoint],#, tensorboard], 
               verbose=2, use_multiprocessing=True)

Train beam update model

In [None]:
"""
# train beam model (prediction-fed training and inference, includes beam update mech)

steps_per_epoch = 150
epochs = len(train_labels_df) // steps_per_epoch  # one full pass through the dataset

# sync weights
model_beam.load_weights(PARAMETERS.load_checkpoint_dir() + 'checkpoints')
print('Loaded saved weights')

# choose variables to train
model_base.decoder_0.trainable = True
model_base.decoder_1.trainable = True  # if multiple decoders enabled

# compile
model_beam.compile(optimizer=optimizer, 
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy', EditDistanceMetric()])

# train
model_beam.fit(train_ds, epochs=30, steps_per_epoch=20, 
               validation_data=valid_ds, validation_steps=3, validation_freq=5,
               callbacks=[checkpoint],#, tensorboard], 
               verbose=2, use_multiprocessing=True)
"""

# Inference

Here we define a function to conduct inference on the test set. Results are saved to "submission.csv".

Intermediate results are written to the same file at regular intervals. This allows inference to be conducted in stages and is a safeguard in case of interruptions before the full set has been processed. 
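
The staged approach boils down to counting how many full batches are already on record, skipping those, and saving periodically. The sketch below shows that logic with plain Python lists and dicts standing in for the dataset, model, and CSV file. All names (`resume_inference`, `predict_fn`, `save_fn`) are hypothetical and chosen for illustration only.

```python
def resume_inference(batches, recorded, predict_fn, batch_size, save_fn, save_freq=50):
    """Predict only the batches not yet covered by `recorded`, saving periodically."""
    done = len(recorded) // batch_size  # full batches already on record
    results = dict(recorded)            # image_id -> prediction; later entries overwrite
    for i, batch in enumerate(batches[done:]):
        for image_id in batch:
            results[image_id] = predict_fn(image_id)
        if i % save_freq == 0:
            save_fn(results)            # intermediate save
    save_fn(results)                    # final save so trailing batches are kept
    return results


# Toy run: the first batch is already recorded, so only the last two are processed.
saved_sizes = []
predictions = resume_inference(
    batches=[["a", "b"], ["c", "d"], ["e", "f"]],
    recorded={"a": "InChI_a", "b": "InChI_b"},
    predict_fn=lambda image_id: "InChI_" + image_id,
    batch_size=2,
    save_fn=lambda results: saved_sizes.append(len(results)),
    save_freq=2,
)
print(sorted(predictions))
```

Because the skip count uses integer division, a partially recorded batch is predicted again and its entries are overwritten, which mirrors dropping duplicates while keeping the last occurrence.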

In [None]:
def make_inference_progress(predictions_df, skip_recorded=True, save_freq=50, parameters=PARAMETERS):

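    # initialize model and build with inference batch size (test_ds)
    # (constructor arguments match the BaseTrainer call in the "Initialize model" cell)
    model_base = BaseTrainer(beam_rnn_units=BEAM_RNN_UNITS, 
                             attention_dim=ATTENTION_DIM,
                             use_dense_encoder_top=USE_DENSE_ENCODER_TOP,
                             use_convolutions=USE_CONVOLUTIONS,
                             use_dual_decoder=USE_DUAL_DECODERS,
                             max_len=175,
                             parameters=parameters, name='BaseTrainer')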
    for val in test_ds.take(1):
        model_base(val)
    model_full = InchiGenerator(model_base, name='InchiGenerator')
    for val in test_ds.take(1):
        model_full(val)

    # load saved weights
    model_full.load_weights(PARAMETERS.load_checkpoint_dir() + 'checkpoints')
    print('Loaded model')

    batch_size = 1024
    if skip_recorded:
        existing_batches = len(predictions_df) // batch_size
    else:
        existing_batches = 0
    i = 0

    for val in test_ds.skip(existing_batches):
        test_im_id, test_pred = model_full.predict(val, return_lev_score=False)

        # decode bytestrings
        test_im_id = [x.decode()[:-4] for x in test_im_id.numpy().tolist()]  # drops '.png'
        test_pred = [x.decode() for x in test_pred.numpy().tolist()]

        new_preds = pd.DataFrame({'image_id': test_im_id, 
                                  'InChI': test_pred})

        
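        # DataFrame.append was removed in pandas 2.0; pd.concat works across versions
        predictions_df = pd.concat([predictions_df, new_preds], ignore_index=True)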

        # save to CSV
        if i % save_freq == 0:
            predictions_df = predictions_df.drop_duplicates(subset='image_id', keep='last')
            predictions_df.to_csv(PARAMETERS.csv_save_dir() + 'submission.csv', index=False)
            print(f'iteration {i}')

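        i += 1

    # final save, so batches processed since the last intermediate save are not lost
    predictions_df = predictions_df.drop_duplicates(subset='image_id', keep='last')
    predictions_df.to_csv(PARAMETERS.csv_save_dir() + 'submission.csv', index=False)

    return predictions_df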

Load previously generated predictions

In [None]:
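try:
    predictions_df = pd.read_csv(PARAMETERS.csv_save_dir() + 'submission.csv')
except (FileNotFoundError, pd.errors.EmptyDataError):
    # no usable saved predictions yet: start from an empty dataframe
    predictions_df = pd.DataFrame({'image_id':[], 'InChI':[]}, dtype=str)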

Generate additional predictions

In [None]:
""" On first pass or to start from scratch, initialize the dataframe with:
predictions_df = pd.DataFrame({'image_id':[], 'InChI':[]}, dtype=str)
"""

# arguments match the make_inference_progress signature defined above
predictions_df = make_inference_progress(predictions_df, skip_recorded=True, save_freq=100, parameters=PARAMETERS)
predictions_df