# Training and testing of unimodal speech and vision models

**Author:** Ryan Eloff<br>
**Contact:** ryan.peter.eloff@gmail.com<br>
**Date:** October 2018

Experiments notebook 1.

## Overview

We approach the *multimodal one-shot learning* problem by extending unimodal one-shot models to the multimodal case. 
Using speech and image background data (disjoint from the one-shot problem), we separately train unimodal speech and vision models that can perform one-shot classification in their respective modality.

This notebook demonstrates how to train and test these unimodal one-shot models, and reproduces the one-shot speech (and image) classification results presented in [our paper](https://arxiv.org/abs/1811.03875): 
R. Eloff, H. A. Engelbrecht, H. Kamper, "Multimodal One-Shot Learning of Speech and Images," 2018.

## Navigation

1. [Generate random model seeds](#seeds)<br>
2. [Unimodal one-shot speech models](#speech)<br>
    2.1. [Hyperparameters](#speech_params)<br>
    2.2. [Background training](#speech_train)<br>
    2.3. [One-shot speech testing](#speech_test)<br>
    2.4. [Summaries](#speech_summ)<br>
3. [Unimodal one-shot vision models](#vision)<br>
    3.1. [Hyperparameters](#vision_params)<br>
    3.2. [Background training](#vision_train)<br>
    3.3. [One-shot vision testing](#vision_test)<br>
    3.4. [Summaries](#vision_summ)<br>

### Imports:

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function


import os
import json
import datetime
import numpy as np

### Utility functions:

In [None]:
def test_speech_k_shot(model_dir, out_dir, random_seed, k_shot):
    print("--------------------------------------------------------------------------------")
    print("Testing speech model: --model-dir={}".format(model_dir))
    print("--------------------------------------------------------------------------------")
    !python ../src/multimodal/test_speech.py \
        --data-dir=../kaldi_features/tidigits \
        --model-dir={model_dir} \
        --output-dir={out_dir} \
        --random-seed={random_seed} \
        --n-queries=10 \
        --n-test-episodes=400 \
        --k-shot={k_shot} \
        --l-way=11


def test_speaker_invariance(model_dir, out_dir, random_seed):
    print("--------------------------------------------------------------------------------")
    print("Testing speech model: --model-dir={}".format(model_dir))
    print("--------------------------------------------------------------------------------")
    !python ../src/multimodal/test_speech.py \
        --data-dir=../kaldi_features/tidigits \
        --model-dir={model_dir} \
        --output-dir={out_dir} \
        --random-seed={random_seed} \
        --originator-type='difficult' \
        --n-queries=1 \
        --n-test-episodes=4000 \
        --k-shot=1 \
        --l-way=11


def test_vision_k_shot(model_dir, out_dir, random_seed, k_shot):
    print("--------------------------------------------------------------------------------")
    print("Testing vision model: --model-dir={}".format(model_dir))
    print("--------------------------------------------------------------------------------")
    !python ../src/multimodal/test_vision.py \
        --model-dir={model_dir} \
        --output-dir={out_dir} \
        --random-seed={random_seed} \
        --n-queries=10 \
        --n-test-episodes=400 \
        --k-shot={k_shot} \
        --l-way=10


def get_all_dirs(top_dir):
    all_dirs = [os.path.join(top_dir, file)
                  for file in os.listdir(top_dir)
                  if os.path.isdir(os.path.join(top_dir, file))]
    return all_dirs


def summarise_tests(result_dir, result_file='test_result.txt', speaker_invariance=False):
    # os.walk expects a single top-level path and recurses into its subdirectories
    overall_results = []
    easy_overall_results = []
    dist_overall_results = []
    for root, subdirs, files in os.walk(result_dir):
        subdirs.sort()
        for dirname in subdirs:
            res_file = os.path.join(root, dirname, result_file)
            if os.path.isfile(res_file):
                print("--------------------------------------------------------------------------------")
                print("Model summary: directory={}".format(os.path.join(root, dirname)))
                print("--------------------------------------------------------------------------------")
                with open(res_file, 'r') as fp:
                    results = fp.read()
                print('\tResults: {}'.format(results))
                overall_results.append(float(results.split('\n')[0].split('accuracy: ')[1]))
                if speaker_invariance:
                    invariance_results = results.split('\n')[1].strip().split('\t')
                    easy_overall_results.append(float(invariance_results[0].split('accuracy: ')[1]))
                    dist_overall_results.append(float(invariance_results[1].split('accuracy: ')[1]))
    conf_interval_95 = 1.96 * np.std(overall_results) / np.sqrt(len(overall_results))
    print("================================================================================")
    print("OVERALL: AVERAGE ACCURACY: {:.4f} % +- {:.4f} (total tests: {})"
          .format(np.mean(overall_results)*100, conf_interval_95*100, len(overall_results)))
    if speaker_invariance:
        # Only compute these intervals when speaker invariance results were collected
        easy_conf_interval_95 = 1.96 * np.std(easy_overall_results) / np.sqrt(len(easy_overall_results))
        dist_conf_interval_95 = 1.96 * np.std(dist_overall_results) / np.sqrt(len(dist_overall_results))
        print("\tAVERAGE EASY SPEAKER ACCURACY: {:.4f} % +- {:.4f} (total tests: {})"
              .format(np.mean(easy_overall_results)*100, easy_conf_interval_95*100, len(easy_overall_results)))
        print("\tAVERAGE DISTRACTOR SPEAKER ACCURACY: {:.4f} % +- {:.4f} (total tests: {})"
              .format(np.mean(dist_overall_results)*100, dist_conf_interval_95*100, len(dist_overall_results)))
    print("--------------------------------------------------------------------------------")

    
def summarise_train_time(result_dir, log_file='train_speech.log'):
    # os.walk expects a single top-level path and recurses into its subdirectories
    all_results = []
    n_train_results = 0
    for root, subdirs, files in os.walk(result_dir):
        subdirs.sort()
        for dirname in subdirs:
            next_file = os.path.join(root, dirname, log_file)
            if os.path.isfile(next_file):
                with open(next_file, 'r') as fp:
                    log_lines = fp.readlines()
                    start_time = datetime.datetime.strptime(log_lines[0].split(":INFO:")[0], "%Y-%m-%d %H:%M:%S,%f")
                    end_time = datetime.datetime.strptime(log_lines[-1].split(":INFO:")[0], "%Y-%m-%d %H:%M:%S,%f")
                    train_minutes = (end_time - start_time).total_seconds()/60.0
                    print(end_time - start_time)
                    all_results.append(train_minutes)
    # Report the mean training duration over all models found
    conf_interval_95 = 1.96 * np.std(all_results) / np.sqrt(len(all_results))
    print("Average train duration: {:.4f} min. +- {:.4f}".format(np.mean(all_results), conf_interval_95))

## 1. Generate random model seeds
<a id='seeds'></a>

We average results over 10 models trained with different seeds so that we can report average accuracies with 95% confidence intervals.

These seeds are generated as follows:

In [None]:
np.random.seed(42)
random_seeds = np.random.randint(1000, size=10)
print("Random seeds:", random_seeds)
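
As a rough illustration of this reporting scheme, the sketch below computes a mean and a normal-approximation 95% confidence interval (1.96 standard errors), using the same formula as `summarise_tests` above. The helper name `mean_and_conf_interval_95` and the accuracy values are made-up placeholders for illustration, not results from the paper.

```python
import numpy as np


def mean_and_conf_interval_95(values):
    """Return (mean, half-width) of a normal-approximation 95% interval."""
    values = np.asarray(values, dtype=float)
    # Same formula as the summary utilities: 1.96 * std / sqrt(n)
    half_width = 1.96 * np.std(values) / np.sqrt(len(values))
    return np.mean(values), half_width


# Hypothetical per-seed accuracies (placeholders for illustration only)
example_accuracies = [0.90, 0.92, 0.91, 0.89, 0.93]
mean_acc, ci_95 = mean_and_conf_interval_95(example_accuracies)
print("accuracy: {:.2f} % +- {:.2f}".format(mean_acc * 100, ci_95 * 100))
```

Note that `np.std` defaults to the population standard deviation (`ddof=0`); with only 10 models, a sample standard deviation (`ddof=1`) would give a slightly wider interval.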

## 2. Unimodal one-shot speech models
<a id='speech'></a>

Several different models can be used for one-shot speech classification.
We specifically investigate Siamese neural networks (and their advances) as a way to explicitly train unimodal distance metrics.
We also compare to directly using extracted speech features with classical dynamic time warping (DTW) and to transfer learning with neural network classifiers.
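
For readers unfamiliar with DTW, the following is a minimal sketch of the classical dynamic programming recursion. It assumes Euclidean frame-wise distances and no path-length normalisation; the DTW implementation used by the test script may differ in both respects, and `dtw_distance` is a hypothetical name used only here.

```python
import numpy as np


def dtw_distance(x, y):
    """DTW alignment cost between sequences x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    # Frame-wise Euclidean distances between every pair of frames
    frame_dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # cost[i, j]: minimal accumulated cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = frame_dist[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


# Toy 1-D "feature" sequences: b is a time-stretched copy of a
a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [1.0], [1.0], [2.0]])
print(dtw_distance(a, b))
```

Because DTW warps the time axis, the stretched copy aligns with the original at zero cost, which is what makes it a reasonable baseline for comparing spoken words of different durations.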

## 2.1. Hyperparameters
<a id='speech_params'></a>

The following hyperparameters were used to produce the (speech) results in the paper:

In [None]:
# Training options:
learn_rate = 1e-3
decay_rate = 0.96

# Network architectures:
kernel_sizes = [[39, 9], [1, 10]]
pool_sizes = [[1, 3], [1, 28]]
n_filters = [128, 128]
n_hidden_units = [2048]
n_linear_units = None
ffnn_n_hidden_units = [512, 512, 512]

# Regularization:
dropout_keep_prob = 0.9
dropout_channels = True
conv_batch_norm = 'after'
hidden_batch_norm = 'after'

# Batching:
balanced_batching = True
p_batch = 128  # 32 for Siamese CNN (offline)
k_batch = 8  # 2 for Siamese CNN (offline)
ffnn_batch_size = 200
cnn_batch_size = 200

# Feature selection:
feats_type = 'mfcc'
center_padded = True
n_padded = 120

# Early stopping on one-shot validation error:
l_way_validation = 20

# Siamese CNN parameters:
triplet_margin = 0.2
# Train episodes based on ~88400 examples and ~5500 classes (digits removed)
# - Siamese CNN (offline): choose 5500/32 ~ 200 (each epoch processes 200*32*2=12800 examples)
# - Siamese CNN (online): choose 5500/128 ~ 50 (each epoch processes 50*128*8=51200 examples)
# NOTE: offline variant takes long to train; online is more efficient and reaches SOTA in fewer epochs
offline_n_train_episodes = 200
online_n_train_episodes = 50
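
To make the role of `triplet_margin` concrete, here is a small sketch assuming the standard triplet hinge loss, max(0, margin + d(a, p) - d(a, n)), with Euclidean distance between embeddings. The distance function and the helper name `triplet_hinge_loss` are assumptions for illustration; the training script may use a different distance or triplet selection strategy.

```python
import numpy as np


def triplet_hinge_loss(anchor, positive, negative, margin=0.2):
    """Mean of max(0, margin + d(a, p) - d(a, n)) over a batch of triplets."""
    # Assumption: Euclidean distance between embeddings
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.mean(np.maximum(0.0, margin + d_pos - d_neg))


anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.0, 0.1], [0.0, 0.5]])
negative = np.array([[0.0, 1.0], [0.0, 0.6]])
# The first triplet satisfies the margin (no loss); the second violates it
print(triplet_hinge_loss(anchor, positive, negative))
```

With balanced batching, each batch holds `p_batch` concepts with `k_batch` examples each, so the online setting above forms triplets from 128 * 8 = 1024 examples per episode.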

## 2.2. Background training
<a id='speech_train'></a>

Neural network models are trained on a large set of isolated words from background speech data (the [Flickr 8k audio caption corpus](https://groups.csail.mit.edu/sls/downloads/flickraudio/), with all words that overlap with the one-shot test set removed). The aim is to learn features that are useful for measuring similarity between inputs, even for unseen classes.

### Dynamic Time Warping (DTW)

DTW requires no training; we only create the model directories containing the options that the test script reads.

In [None]:
for dtw_feats_type in ['mfcc', 'fbank']:
    for random_seed in random_seeds:
        model_dir = os.path.join("./models/speech/dtw", dtw_feats_type, "random_seed={}".format(random_seed))
        print("-----------------------------------------------------------------------------------")
        print("Create DTW speech model params: --random-seed={} --feats-type={}".format(random_seed, dtw_feats_type))
        print("-----------------------------------------------------------------------------------")
        # Create DTW model parameters
        !python ../src/multimodal/train_speech.py \
            --model-version=dtw \
            --data-dir=../kaldi_features/flickr_audio \
            --model-dir={model_dir} \
            --no-unique-dir \
            --n-max-epochs=100 \
            --random-seed={random_seed} \
            --feats-type={dtw_feats_type}

### Feedforward Neural Network (FFNN) Softmax Classifier

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/speech/ffnn_softmax/batch_size={}/random_seed={}".format(ffnn_batch_size,
                                                                                   random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Training FFNN classifier speech model: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    # Save base params file in model directory
    !python ../src/multimodal/train_speech.py \
        --model-version=feedforward_softmax \
        --model-dir={model_dir} \
        --no-unique-dir \
        --save-base-params
    # Load model params from file
    with open(os.path.join(model_dir, 'model_params.json'), 'r') as fp:
        model_params = json.load(fp)
    # Set new model params
    model_params['n_hidden_units'] = ffnn_n_hidden_units
    model_params['n_linear_units'] = None  # automatically adds output logits
    model_params['dropout_keep_prob'] = dropout_keep_prob
    model_params['hidden_batch_norm'] = hidden_batch_norm
    # Save updated model params to file
    with open(os.path.join(model_dir, 'model_params.json'), 'w') as fp:
        json.dump(model_params, fp, indent=4)
    # Train FFNN classifier speech model on updated model params
    !python ../src/multimodal/train_speech.py \
        --model-version=feedforward_softmax \
        --data-dir=../kaldi_features/flickr_audio \
        --model-dir={model_dir} \
        --no-unique-dir \
        --n-max-epochs=100 \
        --random-seed={random_seed} \
        -lr={learn_rate} \
        -dr={decay_rate} \
        --batch-size={ffnn_batch_size} \
        --center-padded \
        --n-padded={n_padded} \
        --feats-type={feats_type} \
        --k-shot=1 \
        --l-way={l_way_validation}

### Convolutional Neural Network (CNN) Softmax Classifier

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/speech/cnn_softmax/batch_size={}/random_seed={}".format(cnn_batch_size,
                                                                                  random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Training CNN classifier speech model: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    # Save base params file in model directory
    !python ../src/multimodal/train_speech.py \
        --model-version=convolutional_softmax \
        --model-dir={model_dir} \
        --no-unique-dir \
        --save-base-params
    # Load model params from file
    with open(os.path.join(model_dir, 'model_params.json'), 'r') as fp:
        model_params = json.load(fp)
    # Set new model params
    model_params['kernel_sizes'] = kernel_sizes
    model_params['pool_sizes'] = pool_sizes
    model_params['n_filters'] = n_filters
    model_params['n_hidden_units'] = n_hidden_units
    model_params['n_linear_units'] = None  # automatically adds output logits
    model_params['dropout_keep_prob'] = dropout_keep_prob
    model_params['dropout_channels'] = dropout_channels
    model_params['conv_batch_norm'] = conv_batch_norm
    model_params['hidden_batch_norm'] = hidden_batch_norm
    # Save updated model params to file
    with open(os.path.join(model_dir, 'model_params.json'), 'w') as fp:
        json.dump(model_params, fp, indent=4)
    # Train CNN classifier speech model on updated model params
    !python ../src/multimodal/train_speech.py \
        --model-version=convolutional_softmax \
        --data-dir=../kaldi_features/flickr_audio \
        --model-dir={model_dir} \
        --no-unique-dir \
        --n-max-epochs=100 \
        --random-seed={random_seed} \
        -lr={learn_rate} \
        -dr={decay_rate} \
        --batch-size={cnn_batch_size} \
        --center-padded \
        --n-padded={n_padded} \
        --feats-type={feats_type} \
        --k-shot=1 \
        --l-way={l_way_validation}

### Siamese CNN (offline)

In [None]:
# Adjust k_batch and p_batch since Siamese offline variant hits memory limit
# on a single Titan Xp GPU with larger batch sizes ...
p_batch_adjust = 32  # wide spread of concepts per batch
k_batch_adjust = 2   # few concept examples per batch

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/speech/siamese_offline/n_train={}/random_seed={}".format(offline_n_train_episodes,
                                                                                   random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Training Siamese CNN (offline) speech model: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    # Save base params file in model directory
    !python ../src/multimodal/train_speech.py \
        --model-version=siamese_triplet \
        --model-dir={model_dir} \
        --no-unique-dir \
        --save-base-params
    # Load model params from file
    with open(os.path.join(model_dir, 'model_params.json'), 'r') as fp:
        model_params = json.load(fp)
    # Set new model params
    model_params['kernel_sizes'] = kernel_sizes
    model_params['pool_sizes'] = pool_sizes
    model_params['n_filters'] = n_filters
    model_params['n_hidden_units'] = n_hidden_units
    model_params['n_linear_units'] = n_linear_units
    model_params['dropout_keep_prob'] = dropout_keep_prob
    model_params['dropout_channels'] = dropout_channels
    model_params['conv_batch_norm'] = conv_batch_norm
    model_params['hidden_batch_norm'] = hidden_batch_norm
    model_params['triplet_margin'] = triplet_margin
    # Save updated model params to file
    with open(os.path.join(model_dir, 'model_params.json'), 'w') as fp:
        json.dump(model_params, fp, indent=4)
    # Train siamese offline speech model on updated model params
    !python ../src/multimodal/train_speech.py \
        --model-version=siamese_triplet \
        --data-dir=../kaldi_features/flickr_audio \
        --model-dir={model_dir} \
        --no-unique-dir \
        --n-max-epochs=100 \
        --n-train-episodes={offline_n_train_episodes} \
        --random-seed={random_seed} \
        -lr={learn_rate} \
        -dr={decay_rate} \
        --balanced-batching \
        --p-batch={p_batch_adjust} \
        --k-batch={k_batch_adjust} \
        --center-padded \
        --n-padded={n_padded} \
        --feats-type={feats_type} \
        --k-shot=1 \
        --l-way={l_way_validation}

### Siamese CNN (online)

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/speech/siamese_online/n_train={}/random_seed={}".format(online_n_train_episodes,
                                                                                   random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Training Siamese CNN (online) speech model: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    # Save base params file in model directory
    !python ../src/multimodal/train_speech.py \
        --model-version=siamese_triplet_online \
        --model-dir={model_dir} \
        --no-unique-dir \
        --save-base-params
    # Load model params from file
    with open(os.path.join(model_dir, 'model_params.json'), 'r') as fp:
        model_params = json.load(fp)
    # Set new model params
    model_params['kernel_sizes'] = kernel_sizes
    model_params['pool_sizes'] = pool_sizes
    model_params['n_filters'] = n_filters
    model_params['n_hidden_units'] = n_hidden_units
    model_params['n_linear_units'] = n_linear_units
    model_params['dropout_keep_prob'] = dropout_keep_prob
    model_params['dropout_channels'] = dropout_channels
    model_params['conv_batch_norm'] = conv_batch_norm
    model_params['hidden_batch_norm'] = hidden_batch_norm
    model_params['triplet_margin'] = triplet_margin
    # Save updated model params to file
    with open(os.path.join(model_dir, 'model_params.json'), 'w') as fp:
        json.dump(model_params, fp, indent=4)
    # Train siamese online speech model on updated model params
    !python ../src/multimodal/train_speech.py \
        --model-version=siamese_triplet_online \
        --data-dir=../kaldi_features/flickr_audio \
        --model-dir={model_dir} \
        --no-unique-dir \
        --n-max-epochs=100 \
        --n-train-episodes={online_n_train_episodes} \
        --random-seed={random_seed} \
        -lr={learn_rate} \
        -dr={decay_rate} \
        --balanced-batching \
        --p-batch={p_batch} \
        --k-batch={k_batch} \
        --center-padded \
        --n-padded={n_padded} \
        --feats-type={feats_type} \
        --k-shot=1 \
        --l-way={l_way_validation}

## 2.3. One-shot speech testing
<a id='speech_test'></a>

We now test the trained unimodal speech models on three tasks on the [TIDigits speech corpus](https://catalog.ldc.upenn.edu/LDC93S10):

1. One-shot 11-way spoken digit classification
2. Five-shot 11-way spoken digit classification
3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors
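
The sketch below illustrates what a single K-shot L-way test episode involves: each query is assigned the label of its nearest example in the support set. It is only an illustration under assumptions: the embeddings are random stand-ins for model outputs, Euclidean distance is assumed, and `classify_queries` is a hypothetical helper rather than part of `test_speech.py`.

```python
import numpy as np


def classify_queries(support_x, support_y, query_x):
    """Label each query with the class of its nearest support example."""
    # Pairwise Euclidean distances, shape (n_queries, n_support)
    dists = np.linalg.norm(query_x[:, None, :] - support_x[None, :, :], axis=-1)
    return support_y[np.argmin(dists, axis=1)]


rng = np.random.RandomState(0)
l_way, k_shot, n_queries, dim = 11, 1, 10, 16
# Hypothetical class prototypes standing in for learned embeddings
prototypes = rng.randn(l_way, dim) * 5.0
support_y = np.repeat(np.arange(l_way), k_shot)
support_x = prototypes[support_y] + 0.1 * rng.randn(l_way * k_shot, dim)
query_y = rng.randint(l_way, size=n_queries)
query_x = prototypes[query_y] + 0.1 * rng.randn(n_queries, dim)

predictions = classify_queries(support_x, support_y, query_x)
print("episode accuracy:", np.mean(predictions == query_y))
```

The test functions defined earlier repeat this kind of episode many times (400 episodes with 10 queries each for the K-shot tasks) and report the accuracy over all queries.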

### Dynamic Time Warping (DTW)

1. One-shot 11-way spoken digit classification:

In [None]:
output_dir = "./results/speech/dtw/1_shot"
for dtw_feats_type in ['mfcc', 'fbank']:
    for random_seed in random_seeds:
        model_dir = os.path.join("./models/speech/dtw", dtw_feats_type, "random_seed={}".format(random_seed))
        out_dir = os.path.join(output_dir, dtw_feats_type, 'random_seed={}'.format(random_seed))
        test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 11-way spoken digit classification

In [None]:
output_dir = "./results/speech/dtw/5_shot"
for dtw_feats_type in ['mfcc', 'fbank']:
    for random_seed in random_seeds:
        model_dir = os.path.join("./models/speech/dtw", dtw_feats_type, "random_seed={}".format(random_seed))
        out_dir = os.path.join(output_dir, dtw_feats_type, 'random_seed={}'.format(random_seed))
        test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=5)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
output_dir = "./results/speech/dtw/speaker_invariance"
for dtw_feats_type in ['mfcc', 'fbank']:
    for random_seed in random_seeds:
        model_dir = os.path.join("./models/speech/dtw", dtw_feats_type, "random_seed={}".format(random_seed))
        out_dir = os.path.join(output_dir, dtw_feats_type, 'random_seed={}'.format(random_seed))
        test_speaker_invariance(model_dir, out_dir, random_seed)

### Feedforward Neural Network (FFNN) Softmax Classifier

1. One-shot 11-way spoken digit classification:

In [None]:
output_dir = "./results/speech/ffnn_softmax/1_shot/batch_size={}/".format(ffnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/speech/ffnn_softmax/batch_size={}/random_seed={}".format(ffnn_batch_size,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 11-way spoken digit classification

In [None]:
output_dir = "./results/speech/ffnn_softmax/5_shot/batch_size={}/".format(ffnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/speech/ffnn_softmax/batch_size={}/random_seed={}".format(ffnn_batch_size,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=5)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
output_dir = "./results/speech/ffnn_softmax/speaker_invariance/batch_size={}".format(ffnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/speech/ffnn_softmax/batch_size={}/random_seed={}".format(ffnn_batch_size,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speaker_invariance(model_dir, out_dir, random_seed)

### Convolutional Neural Network (CNN) Softmax Classifier

1. One-shot 11-way spoken digit classification:

In [None]:
output_dir = "./results/speech/cnn_softmax/1_shot/batch_size={}".format(cnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/speech/cnn_softmax/batch_size={}/random_seed={}".format(cnn_batch_size, random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 11-way spoken digit classification

In [None]:
output_dir = "./results/speech/cnn_softmax/5_shot/batch_size={}".format(cnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/speech/cnn_softmax/batch_size={}/random_seed={}".format(cnn_batch_size, random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=5)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
output_dir = "./results/speech/cnn_softmax/speaker_invariance/batch_size={}".format(cnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/speech/cnn_softmax/batch_size={}/random_seed={}".format(cnn_batch_size, random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speaker_invariance(model_dir, out_dir, random_seed)

### Siamese CNN (offline)

1. One-shot 11-way spoken digit classification:

In [None]:
output_dir = "./results/speech/siamese_offline/1_shot/n_train={}".format(offline_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/speech/siamese_offline/n_train={}/random_seed={}".format(offline_n_train_episodes,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 11-way spoken digit classification

In [None]:
output_dir = "./results/speech/siamese_offline/5_shot/n_train={}".format(offline_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/speech/siamese_offline/n_train={}/random_seed={}".format(offline_n_train_episodes,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=5)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
output_dir = "./results/speech/siamese_offline/speaker_invariance/n_train={}".format(offline_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/speech/siamese_offline/n_train={}/random_seed={}".format(offline_n_train_episodes,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speaker_invariance(model_dir, out_dir, random_seed)

### Siamese CNN (online)

1. One-shot 11-way spoken digit classification:

In [None]:
output_dir = "./results/speech/siamese_online/1_shot/n_train={}".format(online_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/speech/siamese_online/n_train={}/random_seed={}".format(online_n_train_episodes,
                                                                                  random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 11-way spoken digit classification

In [None]:
output_dir = "./results/speech/siamese_online/5_shot/n_train={}".format(online_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/speech/siamese_online/n_train={}/random_seed={}".format(online_n_train_episodes,
                                                                                  random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speech_k_shot(model_dir, out_dir, random_seed, k_shot=5)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
output_dir = "./results/speech/siamese_online/speaker_invariance/n_train={}".format(online_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/speech/siamese_online/n_train={}/random_seed={}".format(online_n_train_episodes,
                                                                                  random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_speaker_invariance(model_dir, out_dir, random_seed)

## 2.4. Summaries
<a id='speech_summ'></a>

This section presents summaries on the training and one-shot testing of the unimodal speech models.

### Dynamic Time Warping (DTW)

1. One-shot 11-way spoken digit classification:

In [None]:
result_dir = os.path.join('./results/speech/dtw/1_shot', feats_type)
summarise_tests(result_dir)

2. Five-shot 11-way spoken digit classification

In [None]:
result_dir = os.path.join('./results/speech/dtw/5_shot', feats_type)
summarise_tests(result_dir)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
result_dir = os.path.join('./results/speech/dtw/speaker_invariance', feats_type)
summarise_tests(result_dir, speaker_invariance=True)

### Feedforward Neural Network (FFNN) Softmax Classifier

0. Training:

In [None]:
result_dir = "./models/speech/ffnn_softmax/batch_size={}".format(ffnn_batch_size)
summarise_tests(result_dir, result_file='train_result.txt')
summarise_train_time(result_dir)

1. One-shot 11-way spoken digit classification:

In [None]:
result_dir = "./results/speech/ffnn_softmax/1_shot/batch_size={}".format(ffnn_batch_size)
summarise_tests(result_dir)

2. Five-shot 11-way spoken digit classification

In [None]:
result_dir = "./results/speech/ffnn_softmax/5_shot/batch_size={}".format(ffnn_batch_size)
summarise_tests(result_dir)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
result_dir = "./results/speech/ffnn_softmax/speaker_invariance/batch_size={}".format(ffnn_batch_size)
summarise_tests(result_dir, speaker_invariance=True)

### Convolutional Neural Network (CNN) Softmax Classifier

0. Training:

In [None]:
result_dir = "./models/speech/cnn_softmax/batch_size={}".format(cnn_batch_size)
summarise_tests(result_dir, result_file='train_result.txt')
summarise_train_time(result_dir)

1. One-shot 11-way spoken digit classification:

In [None]:
result_dir = "./results/speech/cnn_softmax/1_shot/batch_size={}".format(cnn_batch_size)
summarise_tests(result_dir)

2. Five-shot 11-way spoken digit classification

In [None]:
result_dir = "./results/speech/cnn_softmax/5_shot/batch_size={}".format(cnn_batch_size)
summarise_tests(result_dir)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
result_dir = "./results/speech/cnn_softmax/speaker_invariance/batch_size={}".format(cnn_batch_size)
summarise_tests(result_dir, speaker_invariance=True)

### Siamese CNN (offline)

0. Training:

In [None]:
result_dir = "./models/speech/siamese_offline/n_train={}".format(offline_n_train_episodes)
summarise_tests(result_dir, result_file='train_result.txt')
summarise_train_time(result_dir)

1. One-shot 11-way spoken digit classification:

In [None]:
result_dir = "./results/speech/siamese_offline/1_shot/n_train={}".format(offline_n_train_episodes)
summarise_tests(result_dir)

2. Five-shot 11-way spoken digit classification

In [None]:
result_dir = "./results/speech/siamese_offline/5_shot/n_train={}".format(offline_n_train_episodes)
summarise_tests(result_dir)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
result_dir = "./results/speech/siamese_offline/speaker_invariance/n_train={}".format(offline_n_train_episodes)
summarise_tests(result_dir, speaker_invariance=True)

### Siamese CNN (online)

0. Training:

In [None]:
result_dir = "./models/speech/siamese_online/n_train={}".format(online_n_train_episodes)
summarise_tests(result_dir, result_file='train_result.txt')
summarise_train_time(result_dir)

1. One-shot 11-way spoken digit classification:

In [None]:
result_dir = "./results/speech/siamese_online/1_shot/n_train={}".format(online_n_train_episodes)
summarise_tests(result_dir)

2. Five-shot 11-way spoken digit classification

In [None]:
result_dir = "./results/speech/siamese_online/5_shot/n_train={}".format(online_n_train_episodes)
summarise_tests(result_dir)

3. Speaker invariance for one-shot 11-way spoken digit classification in the presence of query speaker distractors

In [None]:
result_dir = "./results/speech/siamese_online/speaker_invariance/n_train={}".format(online_n_train_episodes)
summarise_tests(result_dir, speaker_invariance=True)

## 3. Unimodal one-shot vision models
<a id='vision'></a>

Similar to the unimodal speech case, we investigate Siamese neural networks for one-shot image classification,
and compare to directly matching image pixels with cosine similarity, and to transfer learning with neural network classifiers.
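
The pixel matching baseline can be pictured as a nearest-neighbour search over flattened images. The following is a minimal sketch under that assumption, not the code used by the test script; the function name `cosine_nearest_neighbour` and its arguments are hypothetical.

```python
import numpy as np


def cosine_nearest_neighbour(support_x, support_y, query_x):
    """Label each query with the label of its most cosine-similar support item."""
    # Flatten images to vectors: (n, height, width) -> (n, height * width)
    support = support_x.reshape(len(support_x), -1).astype(np.float64)
    query = query_x.reshape(len(query_x), -1).astype(np.float64)
    # L2-normalise rows so that a dot product equals cosine similarity
    eps = 1e-12  # guards against division by zero for all-zero images
    support /= np.linalg.norm(support, axis=1, keepdims=True) + eps
    query /= np.linalg.norm(query, axis=1, keepdims=True) + eps
    similarity = np.dot(query, support.T)  # shape (n_query, n_support)
    # Pick the support label with the highest similarity for each query
    return np.asarray(support_y)[np.argmax(similarity, axis=1)]
```

The same routine would apply to learned embeddings by passing feature vectors instead of raw pixels.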

## 3.1. Hyperparameters
<a id='vision_params'></a>

The following hyperparameters were used to produce the (vision) results in the paper:

In [None]:
# Training options:
learn_rate = 1e-3
decay_rate = 0.96

# Network architectures:
kernel_sizes = [[3, 3], [3, 3], [3, 3]]
pool_sizes = [[2, 2], [2, 2], None]
n_filters = [32, 64, 128]
n_hidden_units = [2048]
n_linear_units = 1024
ffnn_n_hidden_units = [512, 512, 512]

# Regularization: 
dropout_keep_prob = 0.9  # 0.7
dropout_channels = True
conv_batch_norm = 'before'  # 'after'
hidden_batch_norm = 'before'  # 'after'

# Batching:
balanced_batching = True
p_batch = 128  # 32 for Siamese CNN (offline)
k_batch = 8  # 2 for Siamese CNN (offline)
ffnn_batch_size = 200
cnn_batch_size = 200

# Early stopping on one-shot validation error:
l_way_validation = 20

# Siamese CNN parameters:
triplet_margin = 0.2
# Train episodes based on 19200 examples and 964 classes
# - Siamese CNN (offline): choose 19200/32 = 600 (each epoch processes 600*32*2=38400 examples)
# - Siamese CNN (online): choose 19200/128 = 150 (each epoch processes 150*128*8=153600 examples)
# NOTE: offline variant takes long to train; online is more efficient and reaches SOTA in fewer epochs
offline_n_train_episodes = 600
online_n_train_episodes = 150
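
The episode counts in the comments above follow from simple arithmetic on the number of background examples and the batch settings. As a quick check, the sketch below recomputes them; the helper names are hypothetical and only mirror the formulas written in the comments.

```python
n_background_examples = 19200  # taken from the comment in the cell above


def episodes_per_epoch(n_examples, p_batch):
    """Number of train episodes chosen as n_examples / p_batch."""
    return n_examples // p_batch


def examples_per_epoch(n_episodes, p_batch, k_batch):
    """Examples processed per epoch: each episode holds p_batch * k_batch examples."""
    return n_episodes * p_batch * k_batch


offline_episodes = episodes_per_epoch(n_background_examples, 32)   # 600
online_episodes = episodes_per_epoch(n_background_examples, 128)   # 150
offline_examples = examples_per_epoch(offline_episodes, 32, 2)     # 38400
online_examples = examples_per_epoch(online_episodes, 128, 8)      # 153600
```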

## 3.2. Background training
<a id='vision_train'></a>

Once again, neural network models are trained on a large set of images from background image data ([Omniglot handwritten characters](https://github.com/brendenlake/omniglot/); no overlap with one-shot test images), learning features which are useful for measuring similarity between inputs (even on unseen classes).
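
For the Siamese models, "useful for measuring similarity" can be made concrete with a triplet hinge loss, which pushes an anchor closer to an example of the same class than to an example of a different class by at least a margin. The sketch below is illustrative only: it assumes squared Euclidean distance between embeddings, which may differ from the distance used in the training script, and the function name is hypothetical. The default margin matches `triplet_margin = 0.2`.

```python
import numpy as np


def triplet_hinge_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss over a batch of (anchor, positive, negative) embeddings."""
    # Squared Euclidean distances per triplet (assumed distance measure)
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # A triplet contributes zero loss once d_neg exceeds d_pos by the margin
    return np.mean(np.maximum(0.0, margin + d_pos - d_neg))
```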

### Pixel Matching

No training is required; we only create directories containing the model options used by the test script.

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/vision/pixels/random_seed={}".format(random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Create pixel matching vision model params: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    !python ../src/multimodal/train_vision.py \
        --model-version=pixels \
        --model-dir={model_dir} \
        --no-unique-dir \
        --random-seed={random_seed}

### Feedforward Neural Network (FFNN) Softmax Classifier

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/vision/ffnn_softmax/batch_size={}/random_seed={}".format(ffnn_batch_size,
                                                                                   random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Training FFNN classifier vision model: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    # Save base params file in model directory
    !python ../src/multimodal/train_vision.py \
        --model-version=feedforward_softmax \
        --model-dir={model_dir} \
        --no-unique-dir \
        --save-base-params
    # Load model params from file
    with open(os.path.join(model_dir, 'model_params.json'), 'r') as fp:
        model_params = json.load(fp)
    # Set new model params
    model_params['n_hidden_units'] = ffnn_n_hidden_units
    model_params['n_linear_units'] = None  # automatically adds output logits
    model_params['dropout_keep_prob'] = dropout_keep_prob
    model_params['hidden_batch_norm'] = hidden_batch_norm
    # Save updated model params to file
    with open(os.path.join(model_dir, 'model_params.json'), 'w') as fp:
        json.dump(model_params, fp, indent=4)
    # Train ffnn softmax model on updated model params
    !python ../src/multimodal/train_vision.py \
        --model-version=feedforward_softmax \
        --model-dir={model_dir} \
        --no-unique-dir \
        --n-max-epochs=100 \
        --random-seed={random_seed} \
        -lr={learn_rate} \
        -dr={decay_rate} \
        --batch-size={ffnn_batch_size} \
        --k-shot=1 \
        --l-way={l_way_validation}

### Convolutional Neural Network (CNN) Softmax Classifier

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/vision/cnn_softmax/batch_size={}/random_seed={}".format(cnn_batch_size,
                                                                                  random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Training CNN classifier vision model: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    # Save base params file in model directory
    !python ../src/multimodal/train_vision.py \
        --model-version=convolutional_softmax \
        --model-dir={model_dir} \
        --no-unique-dir \
        --save-base-params
    # Load model params from file
    with open(os.path.join(model_dir, 'model_params.json'), 'r') as fp:
        model_params = json.load(fp)
    # Set new model params
    model_params['kernel_sizes'] = kernel_sizes
    model_params['pool_sizes'] = pool_sizes
    model_params['n_filters'] = n_filters
    model_params['n_hidden_units'] = n_hidden_units
    model_params['n_linear_units'] = None  # automatically adds output logits
    model_params['dropout_keep_prob'] = dropout_keep_prob
    model_params['dropout_channels'] = dropout_channels
    model_params['conv_batch_norm'] = conv_batch_norm
    model_params['hidden_batch_norm'] = hidden_batch_norm
    # Save updated model params to file
    with open(os.path.join(model_dir, 'model_params.json'), 'w') as fp:
        json.dump(model_params, fp, indent=4)
    # Train conv softmax model on updated model params
    !python ../src/multimodal/train_vision.py \
        --model-version=convolutional_softmax \
        --model-dir={model_dir} \
        --no-unique-dir \
        --n-max-epochs=100 \
        --random-seed={random_seed} \
        -lr={learn_rate} \
        -dr={decay_rate} \
        --batch-size={cnn_batch_size} \
        --k-shot=1 \
        --l-way={l_way_validation}

### Siamese CNN (offline)

In [None]:
# Adjust k_batch and p_batch since Siamese offline variant hits memory limit
# on a single Titan Xp GPU with larger batch sizes ...
p_batch_adjust = 32  # wide spread of concepts per batch
k_batch_adjust = 2   # few concept examples per batch
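
To illustrate what the two settings control, the sketch below draws one balanced batch of `p_batch` concepts with `k_batch` examples each. It is an assumption about how balanced batching works in general, not the sampler used by `train_vision.py`; the function name and arguments are hypothetical.

```python
import numpy as np


def sample_balanced_batch(labels, p_batch, k_batch, rng):
    """Return indices of p_batch classes with k_batch examples per class."""
    labels = np.asarray(labels)
    # Choose p_batch distinct concepts (classes) for this batch
    classes = rng.choice(np.unique(labels), size=p_batch, replace=False)
    batch = []
    for c in classes:
        # Choose k_batch distinct examples of the concept
        candidates = np.flatnonzero(labels == c)
        batch.extend(rng.choice(candidates, size=k_batch, replace=False))
    return np.array(batch)
```

With `p_batch=32` and `k_batch=2`, each batch holds 64 examples spread over 32 concepts.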

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/vision/siamese_offline/n_train={}/random_seed={}".format(offline_n_train_episodes,
                                                                                   random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Training Siamese CNN (offline) vision model: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    # Save base params file in model directory
    !python ../src/multimodal/train_vision.py \
        --model-version=siamese_triplet \
        --model-dir={model_dir} \
        --no-unique-dir \
        --save-base-params
    # Load model params from file
    with open(os.path.join(model_dir, 'model_params.json'), 'r') as fp:
        model_params = json.load(fp)
    # Set new model params
    model_params['kernel_sizes'] = kernel_sizes
    model_params['pool_sizes'] = pool_sizes
    model_params['n_filters'] = n_filters
    model_params['n_hidden_units'] = n_hidden_units
    model_params['n_linear_units'] = n_linear_units
    model_params['dropout_keep_prob'] = dropout_keep_prob
    model_params['dropout_channels'] = dropout_channels
    model_params['conv_batch_norm'] = conv_batch_norm
    model_params['hidden_batch_norm'] = hidden_batch_norm
    # Save updated model params to file
    with open(os.path.join(model_dir, 'model_params.json'), 'w') as fp:
        json.dump(model_params, fp, indent=4)
    # Train siamese offline vision model on updated model params
    !python ../src/multimodal/train_vision.py \
        --model-version=siamese_triplet \
        --model-dir={model_dir} \
        --no-unique-dir \
        --n-max-epochs=100 \
        --random-seed={random_seed} \
        --n-train-episodes={offline_n_train_episodes} \
        -lr={learn_rate} \
        -dr={decay_rate} \
        --balanced-batching \
        --p-batch={p_batch_adjust} \
        --k-batch={k_batch_adjust} \
        --k-shot=1 \
        --l-way={l_way_validation}

### Siamese CNN (online)

In [None]:
for random_seed in random_seeds:
    model_dir = "./models/vision/siamese_online/n_train={}/random_seed={}".format(online_n_train_episodes,
                                                                                  random_seed)
    print("-----------------------------------------------------------------------------------")
    print("Training Siamese CNN (online) vision model: --random-seed={}".format(random_seed))
    print("-----------------------------------------------------------------------------------")
    # Save base params file in model directory
    !python ../src/multimodal/train_vision.py \
        --model-version=siamese_triplet_online \
        --model-dir={model_dir} \
        --no-unique-dir \
        --save-base-params
    # Load model params from file
    with open(os.path.join(model_dir, 'model_params.json'), 'r') as fp:
        model_params = json.load(fp)
    # Set new model params
    model_params['kernel_sizes'] = kernel_sizes
    model_params['pool_sizes'] = pool_sizes
    model_params['n_filters'] = n_filters
    model_params['n_hidden_units'] = n_hidden_units
    model_params['n_linear_units'] = n_linear_units
    model_params['dropout_keep_prob'] = dropout_keep_prob
    model_params['dropout_channels'] = dropout_channels
    model_params['conv_batch_norm'] = conv_batch_norm
    model_params['hidden_batch_norm'] = hidden_batch_norm
    # Save updated model params to file
    with open(os.path.join(model_dir, 'model_params.json'), 'w') as fp:
        json.dump(model_params, fp, indent=4)
    # Train siamese online vision model on updated model params
    !python ../src/multimodal/train_vision.py \
        --model-version=siamese_triplet_online \
        --model-dir={model_dir} \
        --no-unique-dir \
        --n-max-epochs=100 \
        --n-train-episodes={online_n_train_episodes} \
        --random-seed={random_seed} \
        -lr={learn_rate} \
        -dr={decay_rate} \
        --balanced-batching \
        --p-batch={p_batch} \
        --k-batch={k_batch} \
        --k-shot=1 \
        --l-way={l_way_validation}

## 3.3. One-shot vision testing
<a id='vision_test'></a>

We now test the trained unimodal vision models on two tasks using the [MNIST handwritten digit dataset](http://yann.lecun.com/exdb/mnist/):

1. One-shot 10-way handwritten digit image classification
2. Five-shot 10-way handwritten digit image classification
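
A k-shot l-way task of this kind can be pictured as an episode: a support set with `k_shot` labelled examples for each of `l_way` classes, and a query to classify against it. The sketch below samples one such episode; it is a hedged illustration with a hypothetical function name, and the number of queries per episode and the sampling details in the test script may differ.

```python
import numpy as np


def sample_episode(labels, l_way, k_shot, rng):
    """Return (support_idx, query_idx) for one k-shot l-way episode."""
    labels = np.asarray(labels)
    # Choose l_way distinct classes for the episode
    classes = rng.choice(np.unique(labels), size=l_way, replace=False)
    support_idx = []
    remaining = []
    for c in classes:
        # Shuffle the examples of this class, then split support from the rest
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_idx.extend(idx[:k_shot])
        remaining.extend(idx[k_shot:])
    # Draw a single query from the examples not used in the support set
    query_idx = rng.choice(remaining)
    return np.array(support_idx), query_idx
```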

### Pixel Matching

1. One-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/pixels/1_shot"
for random_seed in random_seeds:
    model_dir = "./models/vision/pixels/random_seed={}".format(random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/pixels/5_shot"
for random_seed in random_seeds:
    model_dir = "./models/vision/pixels/random_seed={}".format(random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=5)

### Feedforward Neural Network (FFNN) Softmax Classifier

1. One-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/ffnn_softmax/1_shot/batch_size={}".format(ffnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/vision/ffnn_softmax/batch_size={}/random_seed={}".format(ffnn_batch_size,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/ffnn_softmax/5_shot/batch_size={}".format(ffnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/vision/ffnn_softmax/batch_size={}/random_seed={}".format(ffnn_batch_size,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=5)

### Convolutional Neural Network (CNN) Softmax Classifier

1. One-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/cnn_softmax/1_shot/batch_size={}".format(cnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/vision/cnn_softmax/batch_size={}/random_seed={}".format(cnn_batch_size,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/cnn_softmax/5_shot/batch_size={}".format(cnn_batch_size)
for random_seed in random_seeds:
    model_dir = "./models/vision/cnn_softmax/batch_size={}/random_seed={}".format(cnn_batch_size,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=5)

### Siamese CNN (offline)

1. One-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/siamese_offline/1_shot/n_train={}".format(offline_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/vision/siamese_offline/n_train={}/random_seed={}".format(offline_n_train_episodes,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/siamese_offline/5_shot/n_train={}".format(offline_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/vision/siamese_offline/n_train={}/random_seed={}".format(offline_n_train_episodes,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=5)

### Siamese CNN (online)

1. One-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/siamese_online/1_shot/n_train={}".format(online_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/vision/siamese_online/n_train={}/random_seed={}".format(online_n_train_episodes,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=1)

2. Five-shot 10-way handwritten digit image classification

In [None]:
output_dir = "./results/vision/siamese_online/5_shot/n_train={}".format(online_n_train_episodes)
for random_seed in random_seeds:
    model_dir = "./models/vision/siamese_online/n_train={}/random_seed={}".format(online_n_train_episodes,
                                                                                   random_seed)
    out_dir = os.path.join(output_dir, 'random_seed={}'.format(random_seed))
    test_vision_k_shot(model_dir, out_dir, random_seed, k_shot=5)

## 3.4. Summaries
<a id='vision_summ'></a>

This section presents summaries on the training and one-shot testing of the unimodal vision models.
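
When results are aggregated over models trained with different random seeds, a common summary is the mean accuracy with a measure of spread. The sketch below shows one such aggregation; it is not the notebook's `summarise_tests`, the function name is hypothetical, and the normal-approximation 95% interval is an assumption.

```python
import numpy as np


def summarise_accuracies(accuracies):
    """Return (mean, sample std, 95% interval half-width) over per-seed accuracies."""
    acc = np.asarray(accuracies, dtype=np.float64)
    mean = acc.mean()
    # Sample standard deviation; defined as zero for a single run
    std = acc.std(ddof=1) if len(acc) > 1 else 0.0
    # Half-width of a normal-approximation 95% confidence interval
    ci95 = 1.96 * std / np.sqrt(len(acc))
    return mean, std, ci95
```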

### Pixel Matching

1. One-shot 10-way handwritten digit image classification

In [None]:
result_dir = './results/vision/pixels/1_shot'
summarise_tests(result_dir)

2. Five-shot 10-way handwritten digit image classification

In [None]:
result_dir = './results/vision/pixels/5_shot'
summarise_tests(result_dir)

### Feedforward Neural Network (FFNN) Softmax Classifier

0. Training

In [None]:
result_dir = "./models/vision/ffnn_softmax/batch_size={}".format(ffnn_batch_size)
summarise_tests(result_dir, result_file='train_result.txt')
summarise_train_time(result_dir)

1. One-shot 10-way handwritten digit image classification

In [None]:
result_dir = "./results/vision/ffnn_softmax/1_shot/batch_size={}".format(ffnn_batch_size)
summarise_tests(result_dir)

2. Five-shot 10-way handwritten digit image classification

In [None]:
result_dir = "./results/vision/ffnn_softmax/5_shot/batch_size={}".format(ffnn_batch_size)
summarise_tests(result_dir)

### Convolutional Neural Network (CNN) Softmax Classifier

0. Training

In [None]:
result_dir = "./models/vision/cnn_softmax/batch_size={}".format(cnn_batch_size)
summarise_tests(result_dir, result_file='train_result.txt')
summarise_train_time(result_dir)

1. One-shot 10-way handwritten digit image classification

In [None]:
result_dir = "./results/vision/cnn_softmax/1_shot/batch_size={}".format(cnn_batch_size)
summarise_tests(result_dir)

2. Five-shot 10-way handwritten digit image classification

In [None]:
result_dir = "./results/vision/cnn_softmax/5_shot/batch_size={}".format(cnn_batch_size)
summarise_tests(result_dir)

### Siamese CNN (offline)

0. Training

In [None]:
result_dir = "./models/vision/siamese_offline/n_train={}".format(offline_n_train_episodes)
summarise_tests(result_dir, result_file='train_result.txt')
summarise_train_time(result_dir)

1. One-shot 10-way handwritten digit image classification

In [None]:
result_dir = "./results/vision/siamese_offline/1_shot/n_train={}".format(offline_n_train_episodes)
summarise_tests(result_dir)

2. Five-shot 10-way handwritten digit image classification

In [None]:
result_dir = "./results/vision/siamese_offline/5_shot/n_train={}".format(offline_n_train_episodes)
summarise_tests(result_dir)

### Siamese CNN (online)

0. Training

In [None]:
result_dir = "./models/vision/siamese_online/n_train={}".format(online_n_train_episodes)
summarise_tests(result_dir, result_file='train_result.txt')
summarise_train_time(result_dir)

1. One-shot 10-way handwritten digit image classification

In [None]:
result_dir = "./results/vision/siamese_online/1_shot/n_train={}".format(online_n_train_episodes)
summarise_tests(result_dir)

2. Five-shot 10-way handwritten digit image classification

In [None]:
result_dir = "./results/vision/siamese_online/5_shot/n_train={}".format(online_n_train_episodes)
summarise_tests(result_dir)