<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Zero-shot-learning" data-toc-modified-id="Zero-shot-learning-0.1">Zero-shot learning</a></span></li></ul></li><li><span><a href="#Data" data-toc-modified-id="Data-1">Data</a></span><ul class="toc-item"><li><span><a href="#Download-word2vec" data-toc-modified-id="Download-word2vec-1.1">Download word2vec</a></span></li><li><span><a href="#Download-cifar10" data-toc-modified-id="Download-cifar10-1.2">Download cifar10</a></span></li><li><span><a href="#Train-test-split" data-toc-modified-id="Train-test-split-1.3">Train-test split</a></span></li><li><span><a href="#Label-embeddings" data-toc-modified-id="Label-embeddings-1.4">Label embeddings</a></span></li></ul></li><li><span><a href="#Model" data-toc-modified-id="Model-2">Model</a></span><ul class="toc-item"><li><span><a href="#Pretrained-VGG19" data-toc-modified-id="Pretrained-VGG19-2.1">Pretrained VGG19</a></span></li><li><span><a href="#Embeddings-prediction-model" data-toc-modified-id="Embeddings-prediction-model-2.2">Embeddings prediction model</a></span></li></ul></li><li><span><a href="#Train-model" data-toc-modified-id="Train-model-3">Train model</a></span><ul class="toc-item"><li><span><a href="#Most-similar" data-toc-modified-id="Most-similar-3.1">Most similar</a></span></li></ul></li></ul></div>

## Zero-shot learning

Some links:
* [DeViSE: A Deep Visual-Semantic Embedding Model](https://papers.nips.cc/paper/2013/file/7cce53cf90577442771720a370c3c723-Paper.pdf)
* [DeViSE Zero-shot learning](https://towardsdatascience.com/devise-zero-shot-learning-c62eed17e93d)
* [Zero-shot learning research by Max Planck Institute](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/zero-shot-learning/)
* [Zero-shot Learning: An Introduction](https://www.learnopencv.com/zero-shot-learning-an-introduction/)

In [7]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

tf.__version__, keras.__version__

('2.0.0', '2.2.4-tf')

# Data

## Download word2vec

In [2]:
# Download the vectors once and save them to disk (uncomment on the first run)

# import gensim.downloader as api
# wv = api.load('word2vec-google-news-300')
# wv.save("word2vec-google-news-300.model")

In [3]:
from gensim.models.keyedvectors import Word2VecKeyedVectors

# load word2vec model
wv = Word2VecKeyedVectors.load("word2vec-google-news-300.model")

## Download cifar10

In [5]:
# Load the CIFAR-10 dataset: 32x32 color images in 10 classes

data = tf.keras.datasets.cifar10.load_data()

(train_images, train_labels), (test_images, test_labels) = data

In [8]:
class_names = [
    'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse',
    'ship', 'truck'
]

list(zip(class_names, np.unique(train_labels)))

[('airplane', 0),
 ('automobile', 1),
 ('bird', 2),
 ('cat', 3),
 ('deer', 4),
 ('dog', 5),
 ('frog', 6),
 ('horse', 7),
 ('ship', 8),
 ('truck', 9)]

## Train-test split

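Split the data by label so that the test set contains only classes the model never sees during training. Classes up to and including `frog` (labels 0-6) form the training set; `horse`, `ship` and `truck` (labels 7-9) are held out.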

In [9]:
split_label = 'frog'
split_idx = class_names.index(split_label)

X_train = train_images[(train_labels <= split_idx).ravel()]
y_train = train_labels[train_labels <= split_idx]
print(f'Train features: {X_train.shape}, labels: {y_train.shape}')

X_test = test_images[(test_labels > split_idx).ravel()]
y_test = test_labels[test_labels > split_idx]
print(f'Test features: {X_test.shape}, labels: {y_test.shape}')

Train features: (35000, 32, 32, 3), labels: (35000,)
Test features: (3000, 32, 32, 3), labels: (3000,)
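
The same label-based split can be sketched on toy data. The helper name `split_by_label` and the tiny arrays are assumptions made for illustration; unlike the cell above, it splits a single array rather than the separate CIFAR-10 train and test arrays.

```python
import numpy as np

def split_by_label(images, labels, max_seen_label):
    """Split samples into seen (label <= max_seen_label) and unseen classes."""
    labels = np.asarray(labels).ravel()  # CIFAR-10 labels have shape (n, 1)
    seen = labels <= max_seen_label
    return images[seen], labels[seen], images[~seen], labels[~seen]

# Toy stand-in for CIFAR-10: 20 "images" of shape (2, 2, 3), labels 0..9 twice
toy_images = np.arange(20 * 2 * 2 * 3).reshape(20, 2, 2, 3)
toy_labels = np.tile(np.arange(10), 2).reshape(-1, 1)

X_seen, y_seen, X_unseen, y_unseen = split_by_label(toy_images, toy_labels, 6)

# The two label sets must not overlap, otherwise the evaluation is not zero-shot
assert set(y_seen).isdisjoint(set(y_unseen))
print(X_seen.shape, X_unseen.shape)
```

Checking that the two label sets are disjoint is a cheap guard against accidentally leaking a test class into training.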


## Label embeddings 

In [10]:
# Use word2vec model to translate labels (strings) to vectors of numbers
y_train = np.array([wv[class_names[y]] for y in y_train])
y_test = np.array([wv[class_names[y]] for y in y_test])
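
A minimal sketch of the same label-to-vector mapping, using a plain dictionary of made-up 4-dimensional vectors in place of the 300-dimensional word2vec model. The names `toy_wv` and `labels_to_embeddings` are hypothetical. It also checks up front that every class name has an embedding, which is worth doing because a class name missing from the vocabulary would otherwise fail in the middle of the conversion.

```python
import numpy as np

# Made-up embeddings standing in for the word2vec vectors
toy_wv = {
    "cat": np.array([0.9, 0.1, 0.0, -0.2], dtype=np.float32),
    "dog": np.array([0.8, 0.2, 0.1, -0.1], dtype=np.float32),
    "ship": np.array([-0.3, 0.7, 0.5, 0.0], dtype=np.float32),
}
toy_class_names = ["cat", "dog", "ship"]

def labels_to_embeddings(labels, class_names, embeddings):
    """Map integer labels to the embedding of the corresponding class name."""
    missing = [name for name in class_names if name not in embeddings]
    if missing:
        raise KeyError(f"No embedding for class names: {missing}")
    # One row per class, so the lookup becomes a single fancy-indexing step
    table = np.stack([embeddings[name] for name in class_names])
    return table[np.asarray(labels).ravel()]

toy_labels = np.array([0, 2, 2, 1])
y_emb = labels_to_embeddings(toy_labels, toy_class_names, toy_wv)
print(y_emb.shape)  # (4, 4)
```

Building one lookup table per class avoids querying the embedding model once per sample.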

# Model

## Pretrained VGG19

In [11]:
# include_top=False drops the ImageNet classifier head, which allows a
# non-default input size (32x32 here)

vgg19 = tf.keras.applications.VGG19(
    weights='imagenet', include_top=False, input_shape=(32, 32, 3))

## Embeddings prediction model

In [12]:
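def img_to_text_embeddings(img_model, num_classes):
    """Extend a pretrained image model (VGG19 here) with a few fully
    connected layers so that it predicts a word-embedding vector.

    `num_classes` is the size of the output layer, i.e. the embedding
    dimension (300 for the word2vec vectors used here).
    """

    # reuse all pretrained layers except the last one (the final pooling layer)
    model = keras.models.Sequential(img_model.layers[:-1])

    # flatten the convolutional feature maps into a vector
    model.add(keras.layers.Flatten())

    # use BatchNormalization as regularization and to avoid additional normalization
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Dense(256, activation='relu'))
    model.add(keras.layers.BatchNormalization())
    # NOTE: softmax keeps every output in [0, 1], while word2vec components
    # can be negative; a linear output layer may be a better fit for regression
    model.add(keras.layers.Dense(num_classes, activation='softmax'))

    return model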
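
The output layer of `img_to_text_embeddings` uses a softmax activation, which only produces non-negative values that sum to 1. The NumPy sketch below, with a made-up 4-dimensional target vector, illustrates why that may limit how closely the model can match an embedding that has negative components. It is an illustration under those assumptions, not a measurement of the model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical target embedding with negative components
target = np.array([0.5, -0.5, 0.5, -0.5])

# Pre-activation values that point exactly in the target direction
logits = 10.0 * target

linear_out = logits            # linear (identity) output layer
softmax_out = softmax(logits)  # softmax output layer

print("linear :", cosine(target, linear_out))
print("softmax:", cosine(target, softmax_out))
```

With a linear output the prediction can align perfectly with the target, while the softmax output cannot represent the negative components and its cosine similarity stays clearly below 1.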

In [13]:
model = img_to_text_embeddings(vgg19, 300)

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         2

# Train model

In [14]:
model.compile(
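    # important part: Keras' cosine_similarity loss is the negative cosine
    # similarity, so minimizing it maximizes the similarity between the
    # predicted and the target word vectors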
    loss=keras.losses.cosine_similarity,
    optimizer=keras.optimizers.Adam(),
    metrics=['mean_absolute_percentage_error',
             'mean_absolute_error']
)
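
To make the sign convention of the loss concrete, here is a NumPy sketch of a negative cosine similarity. It assumes the loss is computed as the negated dot product of the L2-normalized vectors, which is how recent TensorFlow 2 documentation describes `cosine_similarity`; it is worth checking the documentation for the installed version. The function name and example vectors are made up for illustration.

```python
import numpy as np

def negative_cosine_similarity(y_true, y_pred, eps=1e-12):
    """Negated cosine similarity along the last axis (lower is better)."""
    y_true = y_true / (np.linalg.norm(y_true, axis=-1, keepdims=True) + eps)
    y_pred = y_pred / (np.linalg.norm(y_pred, axis=-1, keepdims=True) + eps)
    return -np.sum(y_true * y_pred, axis=-1)

target = np.array([[1.0, 2.0, -1.0]])
same_direction = 3.0 * target             # same direction, different length
orthogonal = np.array([[1.0, 0.0, 1.0]])  # dot product with target is 0
opposite = -target

for name, pred in [("same", same_direction),
                   ("orthogonal", orthogonal),
                   ("opposite", opposite)]:
    print(name, negative_cosine_similarity(target, pred))
```

The loss is lowest (close to -1) when the prediction points in the same direction as the target, regardless of its length, and highest (close to 1) when it points the opposite way.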

In [28]:
# use a small subset of the data
history = model.fit(X_train[:1000], y_train[:1000], epochs=5)
# history = model.fit(X_train, y_train, epochs=5)

Train on 1000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [18]:
# model.evaluate(X_test, y_test)

In [29]:
pred = model.predict(X_train[:100].astype(np.float32))

In [30]:
# predicted embeddings
pred

array([[2.39669716e-05, 7.04840249e-06, 2.39799672e-04, ...,
        6.51157461e-06, 4.89254540e-04, 7.63902699e-06],
       [2.97556422e-03, 5.59292792e-04, 2.31011261e-04, ...,
        1.58765574e-03, 3.93970491e-04, 8.67550552e-04],
       [9.26235954e-36, 2.70010446e-33, 2.36223104e-06, ...,
        0.00000000e+00, 7.44081126e-06, 2.64058297e-36],
       ...,
       [1.57814687e-19, 1.24880346e-18, 5.17695953e-05, ...,
        1.56849365e-22, 1.34593967e-04, 4.70326721e-20],
       [1.52692127e-27, 7.26766212e-26, 1.27805988e-05, ...,
        7.17907902e-32, 3.65746091e-05, 4.44827465e-28],
       [1.10308835e-02, 2.56429776e-03, 5.96170066e-05, ...,
        1.26234954e-02, 5.57754320e-05, 4.33210330e-03]], dtype=float32)

## Most similar

Find the words whose word2vec embeddings are most similar to a predicted embedding. These words are the candidate labels for an image whose class was not seen during training.

In [38]:
# query word2vec with the predicted embedding of the first image
similar = wv.similar_by_vector(pred[0], topn=5)
similar
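
Searching the whole word2vec vocabulary can return words that are not class names at all. One possible variation, sketched here with made-up 3-dimensional vectors, ranks only a fixed list of candidate labels by cosine similarity to the predicted embedding. The names `rank_labels`, `toy_wv` and `toy_pred` are hypothetical; with the real data the candidates would be the held-out class names and the lookup would use `wv`.

```python
import numpy as np

def rank_labels(pred_vec, candidate_names, embeddings):
    """Rank candidate labels by cosine similarity to a predicted embedding."""
    names = list(candidate_names)
    table = np.stack([embeddings[n] for n in names]).astype(np.float64)
    table /= np.linalg.norm(table, axis=1, keepdims=True)
    query = pred_vec / np.linalg.norm(pred_vec)
    scores = table @ query
    order = np.argsort(-scores)  # highest similarity first
    return [(names[i], float(scores[i])) for i in order]

# Made-up embeddings standing in for the word2vec vectors of unseen classes
toy_wv = {
    "horse": np.array([0.9, 0.1, 0.0]),
    "ship": np.array([0.0, 0.8, 0.6]),
    "truck": np.array([0.1, 0.2, 0.9]),
}
toy_pred = np.array([0.1, 0.7, 0.7])  # hypothetical model output

ranking = rank_labels(toy_pred, ["horse", "ship", "truck"], toy_wv)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The first entry of the ranking is the predicted zero-shot label; the remaining scores give a rough sense of how confident that choice is.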