Copyright 2021 DeepMind Technologies Limited.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use
this file except in compliance with the License. You may obtain a copy of the
License at

[https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

# A Realistic Simulation Framework for Learning with Label Noise

In this colab, we provide metadata and data-loading examples for the noisy label datasets generated with the pseudo-labeling paradigm proposed in the paper *A Realistic Simulation Framework for Learning with Label Noise*.
We also provide the associated rater features. We consider four tasks: CIFAR10 [1], CIFAR100 [1], Patch Camelyon [2,3], and Cats vs Dogs [4]. For each task, we generate three synthetic noisy label datasets, named "low", "medium", and "high" according to the amount of label noise.

[1] Krizhevsky, Alex, and Geoffrey Hinton. "Learning multiple layers of features from tiny images." 2009.

[2] Veeling, Bastiaan S., Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling. "Rotation equivariant CNNs for digital pathology." In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 210-218. Springer, Cham, 2018.

[3] Bejnordi, Babak Ehteshami, Mitko Veta, Paul Johannes Van Diest, Bram Van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen AWM Van Der Laak et al. "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer." JAMA 318, no. 22 (2017): 2199-2210.

[4] Elson, Jeremy, John R. Douceur, Jon Howell, and Jared Saul. "Asirra: a CAPTCHA that exploits interest-aligned manual image categorization." In ACM Conference on Computer and Communications Security, vol. 7, pp. 366-374. 2007.

In [None]:
# @title Imports and global variable
import os
import matplotlib.pyplot as plt
import json
import tensorflow as tf

root_dir = '/root/directory/to/the/dataset/'
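
Every loading cell below builds its input pattern as `<root_dir>/<task_name>/<noise_level>/<split>*`, and the rater features live in `rater_features.json` inside each noise-level directory. As a minimal sketch, the hypothetical helpers below (`shard_pattern`, `rater_features_path`, `rater_count`) gather that layout and the per-task rater counts stated in this colab in one place; they are not part of the released code.

```python
import os

# Hypothetical helpers (not part of the released code) that mirror the
# path layout and rater counts used by the loading cells in this colab.
TASKS = ('cifar10', 'cifar100', 'patch_camelyon', 'cats_vs_dogs')
NOISE_LEVELS = ('low', 'medium', 'high')
SPLITS = ('train', 'valid')


def shard_pattern(root_dir, task_name, noise_level, split):
  """Returns the glob pattern matching the TFRecord shards of one split."""
  if task_name not in TASKS:
    raise ValueError(f'Unknown task: {task_name}')
  if noise_level not in NOISE_LEVELS:
    raise ValueError(f'Unknown noise level: {noise_level}')
  if split not in SPLITS:
    raise ValueError(f'Unknown split: {split}')
  return os.path.join(root_dir, task_name, noise_level, split) + '*'


def rater_features_path(root_dir, task_name, noise_level):
  """Returns the path of the rater-features JSON file for one dataset."""
  return os.path.join(root_dir, task_name, noise_level, 'rater_features.json')


def rater_count(task_name, noise_level):
  """Number of rater models per example, as stated in this colab."""
  if task_name == 'patch_camelyon':
    return 19 if noise_level == 'medium' else 20
  return {'cifar10': 10, 'cifar100': 11, 'cats_vs_dogs': 10}[task_name]


print(shard_pattern('/data/', 'cifar10', 'low', 'train'))  # /data/cifar10/low/train* on POSIX
```

The pattern can be passed straight to `tf.io.gfile.glob`, exactly as the cells below do with their `directory` variable.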

**CIFAR10 noisy label datasets**

**Download size**
*   79MB for each of the low, medium, and high noise datasets.

**Number of examples**
*   train: 19987, valid: 5021, for each of the low, medium, and high noise datasets.

Both the train and valid splits are subsampled from the train split of the original CIFAR10 dataset.

**Data features**
*   "image/raw": images as raw uint8 bytes, shape = (32, 32, 3) after decoding.
*   "image/class/label": clean label, tf.int64.
*   "noisy_labels": the noisy labels given by the rater models, a list of 10 tf.int64 integers (one per rater model).
*   "rater_ids": the IDs of the rater models, a list of 10 tf.string.
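
Each example carries one label per rater in "noisy_labels" alongside its clean label. A common way to collapse the rater labels into one training target is a majority vote; the plain-Python sketch below shows it, together with the fraction of raters that agree with the clean label. The helpers `majority_vote` and `agreement_with_clean` and the example labels are hypothetical, and majority voting is only one illustrative aggregation rule, not the method used in the paper.

```python
from collections import Counter


def majority_vote(noisy_labels):
  """Returns the most frequent label among the raters (hypothetical helper).

  Majority vote is one illustrative aggregation rule, not the paper's method.
  Ties go to the label encountered first, because Counter.most_common keeps
  first-encountered order among equal counts.
  """
  return Counter(noisy_labels).most_common(1)[0][0]


def agreement_with_clean(noisy_labels, clean_label):
  """Fraction of raters whose label matches the clean label."""
  return sum(label == clean_label for label in noisy_labels) / len(noisy_labels)


# Made-up labels for one CIFAR10 example with 10 raters.
labels = [3, 3, 5, 3, 3, 5, 3, 8, 3, 3]
print(majority_vote(labels))            # 3
print(agreement_with_clean(labels, 3))  # 0.7
```

With the parsed dataset, `list(features['noisy_labels'].numpy())` gives the per-rater labels of one example.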

**Rater features**
*   model_name: name of the model
*   accuracy: accuracy of the rater model on the rater validation set
*   loss: loss of the rater model on the rater validation set
*   experience: the total number of data points that the rater model has seen during training

In [None]:
# @title An example for loading CIFAR10 noisy label datasets
task_name = 'cifar10'

# One of ['low', 'medium', 'high']
noise_level = 'low'

# One of ['train', 'valid']. The `valid` split should be used for
# hyperparameter tuning. Models should be evaluated on the original test
# split of each task.
split = 'train'

# We have 10 rater models for CIFAR10.
num_raters = 10

directory = os.path.join(root_dir, task_name, noise_level, split) + '*'
raw_image_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(directory))

# Create a dictionary describing the features.
image_feature_description = {
    # the raw image
    'image/raw': tf.io.FixedLenFeature([], tf.string),
    # the clean label
    'image/class/label': tf.io.FixedLenFeature([1], tf.int64),
    # noisy labels from all the raters
    'noisy_labels': tf.io.FixedLenFeature([num_raters], tf.int64),
    # the IDs of rater models
    'rater_ids': tf.io.FixedLenFeature([num_raters], tf.string),
}

def _parse_image_function(example_proto):
  # Parse the input tf.train.Example proto using the dictionary above.
  return tf.io.parse_single_example(example_proto, image_feature_description)

parsed_image_dataset = raw_image_dataset.map(_parse_image_function)

for features in parsed_image_dataset.take(1):
  # Check the IDs of the rater models. The rater IDs are the same for all the
  # examples in the dataset.
  rater_ids = features['rater_ids']
  rater_id_string = [r.numpy().decode('utf-8') for r in rater_ids]
  print('The IDs of the rater models for this dataset are:')
  print(rater_id_string)
  # The clean label is parsed as a length-1 vector; take its only element.
  clean_label = features['image/class/label'].numpy()[0]
  print('The clean label for the following example is %d' % clean_label)
  noisy_labels = features['noisy_labels'].numpy()
  print('The noisy labels from the rater models are:')
  print(noisy_labels)
  image = tf.reshape(tf.io.decode_raw(features['image/raw'], tf.uint8),
                     (32, 32, 3))
  plt.imshow(image)
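
`tf.io.decode_raw(..., tf.uint8)` turns the stored bytes into a flat vector of 32 × 32 × 3 = 3072 uint8 values, and the `tf.reshape` to `(32, 32, 3)` reads them in row-major (height, width, channel) order. The pure-Python sketch below mimics that reshape so the indexing is explicit. The helper `pixel_at` is hypothetical, and the sketch assumes the bytes were written in the same HWC order that the reshape above implies.

```python
def pixel_at(raw_bytes, row, col, height=32, width=32, channels=3):
  """Returns the channel values of one pixel from a flat HWC byte string.

  Hypothetical helper. Assumes row-major HWC byte order, matching
  tf.reshape(tf.io.decode_raw(raw_bytes, tf.uint8), (height, width, channels)):
  the offset of pixel (row, col) is (row * width + col) * channels.
  """
  expected = height * width * channels
  if len(raw_bytes) != expected:
    raise ValueError(f'Expected {expected} bytes, got {len(raw_bytes)}')
  offset = (row * width + col) * channels
  return tuple(raw_bytes[offset:offset + channels])


# A synthetic 32x32 RGB image whose byte values cycle through 0..255.
raw = bytes(i % 256 for i in range(32 * 32 * 3))
print(len(raw))             # 3072
print(pixel_at(raw, 0, 1))  # (3, 4, 5)
```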

**CIFAR100 noisy label datasets**

**Download size**
*   82MB for each of the low, medium, and high noise datasets.

**Number of examples**
*   train: 20114, valid: 4978, for each of the low, medium, and high noise datasets.

Both the train and valid splits are subsampled from the train split of the original CIFAR100 dataset.

**Data features**
*   "image/encoded": images as raw uint8 bytes, shape = (32, 32, 3) after decoding.
*   "image/class/fine_label": clean fine-grained label, tf.int64.
*   "image/class/coarse_label": clean coarse label, tf.int64.
*   "noisy_labels": the noisy labels given by the rater models, a list of 11 tf.int64 integers (one per rater model).
*   "rater_ids": the IDs of the rater models, a list of 11 tf.string.
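
Because the rater IDs are the same for every example in a dataset, each position of "noisy_labels" can be attributed to one rater model. That makes it straightforward to estimate each rater's noise rate: the fraction of examples on which its label differs from the clean label (the fine-grained label, for CIFAR100). The sketch below works on hypothetical in-memory lists; the helper `per_rater_noise_rate`, the rater IDs, and the labels are all made up for illustration.

```python
def per_rater_noise_rate(rater_ids, noisy_label_rows, clean_labels):
  """Returns {rater_id: fraction of examples where the rater is wrong}.

  Hypothetical helper.
  Args:
    rater_ids: list of rater IDs, identical for every example.
    noisy_label_rows: one list of per-rater labels per example, aligned
      with `rater_ids`.
    clean_labels: one clean label per example.
  """
  errors = [0] * len(rater_ids)
  for row, clean in zip(noisy_label_rows, clean_labels):
    for i, label in enumerate(row):
      errors[i] += int(label != clean)
  n = len(clean_labels)
  return {rid: errors[i] / n for i, rid in enumerate(rater_ids)}


# Made-up data: 3 raters, 4 examples.
ids = ['rater_a', 'rater_b', 'rater_c']
rows = [[1, 1, 2], [7, 7, 7], [4, 0, 4], [9, 9, 3]]
clean = [1, 7, 4, 9]
print(per_rater_noise_rate(ids, rows, clean))
# {'rater_a': 0.0, 'rater_b': 0.25, 'rater_c': 0.5}
```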

**Rater features**
*   model_name: name of the model
*   accuracy: accuracy of the rater model on the rater validation set
*   loss: loss of the rater model on the rater validation set
*   mAP: the mean average precision of the rater model on the rater validation set
*   experience: the total number of data points that the rater model has seen during training
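
This colab does not state how the mAP rater feature was computed. One common non-interpolated definition is the following. For each class, sort the examples by that class's score, and average the precision observed at the rank of every positive example. The per-class values are then averaged over classes. The pure-Python sketch below implements that definition, assuming no tied scores; `average_precision` and `mean_average_precision` are hypothetical helpers, not the authors' code.

```python
def average_precision(scores, is_positive):
  """Non-interpolated AP for one class (assumes no tied scores).

  One common definition; the colab does not say which one was used for
  the rater features.
  """
  ranked = sorted(zip(scores, is_positive), key=lambda p: p[0], reverse=True)
  hits = 0
  precisions = []
  for rank, (_, positive) in enumerate(ranked, start=1):
    if positive:
      hits += 1
      precisions.append(hits / rank)
  return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_rows, labels, num_classes):
  """Averages one-vs-rest AP over the classes that have positives.

  score_rows[i][c] is the score of example i for class c; labels[i] is
  its true class.
  """
  aps = []
  for c in range(num_classes):
    positives = [label == c for label in labels]
    if any(positives):
      aps.append(average_precision([row[c] for row in score_rows], positives))
  if not aps:
    raise ValueError('No class has a positive example')
  return sum(aps) / len(aps)


print(average_precision([0.9, 0.8, 0.7], [True, False, True]))  # about 0.833
```

Skipping classes without positives keeps the average defined on small samples; other conventions count such classes as 0.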


In [None]:
# @title An example for loading CIFAR100 noisy label datasets
task_name = 'cifar100'

# One of ['low', 'medium', 'high']
noise_level = 'high'

# One of ['train', 'valid']. The `valid` split should be used for
# hyperparameter tuning. Models should be evaluated on the original test
# split of each task.
split = 'train'

# We have 11 rater models for CIFAR100.
num_raters = 11

directory = os.path.join(root_dir, task_name, noise_level, split) + '*'
raw_image_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(directory))

# Create a dictionary describing the features.
image_feature_description = {
    # the raw image
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    # the fine-grained clean label, value in [0, 99]
    'image/class/fine_label': tf.io.FixedLenFeature([1], tf.int64),
    # the coarse clean label, value in [0, 19]
    'image/class/coarse_label': tf.io.FixedLenFeature([1], tf.int64),
    # noisy labels from all the raters
    'noisy_labels': tf.io.FixedLenFeature([num_raters], tf.int64),
    # the IDs of rater models
    'rater_ids': tf.io.FixedLenFeature([num_raters], tf.string),
}

def _parse_image_function(example_proto):
  # Parse the input tf.train.Example proto using the dictionary above.
  return tf.io.parse_single_example(example_proto, image_feature_description)

parsed_image_dataset = raw_image_dataset.map(_parse_image_function)

for features in parsed_image_dataset.take(1):
  # Check the IDs of the rater models. The rater IDs are the same for all the
  # examples in the dataset.
  rater_ids = features['rater_ids']
  rater_id_string = [r.numpy().decode('utf-8') for r in rater_ids]
  print('The IDs of the rater models for this dataset are:')
  print(rater_id_string)
  # The clean label is parsed as a length-1 vector; take its only element.
  clean_label = features['image/class/fine_label'].numpy()[0]
  print('The clean label for the following example is %d' % clean_label)
  noisy_labels = features['noisy_labels'].numpy()
  print('The noisy labels from the rater models are:')
  print(noisy_labels)
  image = tf.reshape(tf.io.decode_raw(features['image/encoded'], tf.uint8),
                     (32, 32, 3))
  plt.imshow(image)

**Patch Camelyon noisy label datasets**

**Download size**
*   3.27GB for each of the low, medium, and high noise datasets.

**Number of examples**
*   train: 130982, valid: 16394, for each of the low, medium, and high noise datasets.

The train and valid splits are subsampled from the train and valid splits of the original Patch Camelyon dataset, respectively.

**Data features**
*   "image": PNG-encoded image bytes, shape=(96, 96, 3) after decoding.
*   "label": clean label, tf.int64.
*   "id": the ID of this image in the original Patch Camelyon dataset, a tf.string that begins with "train_" or "valid_".
*   "noisy_labels": the noisy labels given by the rater models, a list of 20 (low and high noise) or 19 (medium noise) tf.int64 integers.
*   "rater_ids": the IDs of the rater models, a list of 20 (low and high noise) or 19 (medium noise) tf.string.
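
The "id" feature records where each image came from in the original Patch Camelyon dataset and always begins with "train_" or "valid_". The loading cell below does not parse it; one option is to add `'id': tf.io.FixedLenFeature([], tf.string)` to `image_feature_description`, assuming the ID is stored as a single string per example. The hypothetical helper `original_split` then separates the origin split from the rest of the ID. The format of that remainder is not documented here, so it is returned unchanged, and the ID in the example is made up.

```python
def original_split(example_id):
  """Returns ('train' | 'valid', rest) for a Patch Camelyon example ID.

  Hypothetical helper. Relies only on the documented prefix ("train_" or
  "valid_"); the remainder is returned unchanged.
  """
  if isinstance(example_id, bytes):
    example_id = example_id.decode('utf-8')
  for prefix in ('train_', 'valid_'):
    if example_id.startswith(prefix):
      return prefix[:-1], example_id[len(prefix):]
  raise ValueError(f'Unexpected Patch Camelyon ID: {example_id!r}')


print(original_split(b'train_12345'))  # ('train', '12345')
```

Since the train and valid splits here are subsampled from the original train and valid splits respectively, every ID in the `train` split should map to `'train'`, which makes this a cheap consistency check.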

**Rater features**
*   model_name: name of the model
*   accuracy: accuracy of the rater model on the rater validation set
*   loss: loss of the rater model on the rater validation set
*   experience: the total number of data points that the rater model has seen during training

In [None]:
# @title An example for loading Patch Camelyon noisy label datasets
task_name = 'patch_camelyon'

# One of ['low', 'medium', 'high']
noise_level = 'medium'

# One of ['train', 'valid']. The `valid` split should be used for
# hyperparameter tuning. Models should be evaluated on the original test
# split of each task.
split = 'train'

# Patch Camelyon has 20 rater models for the low and high noise datasets
# and 19 for the medium noise dataset.
num_raters = 19 if noise_level == 'medium' else 20

directory = os.path.join(root_dir, task_name, noise_level, split) + '*'
raw_image_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(directory))

# Create a dictionary describing the features.
image_feature_description = {
    # the raw image
    'image': tf.io.FixedLenFeature([], tf.string),
    # the clean label, value in {0, 1}
    'label': tf.io.FixedLenFeature([1], tf.int64),
    # noisy labels from all the raters
    'noisy_labels': tf.io.FixedLenFeature([num_raters], tf.int64),
    # the IDs of rater models
    'rater_ids': tf.io.FixedLenFeature([num_raters], tf.string),
}

def _parse_image_function(example_proto):
  # Parse the input tf.train.Example proto using the dictionary above.
  return tf.io.parse_single_example(example_proto, image_feature_description)

parsed_image_dataset = raw_image_dataset.map(_parse_image_function)

for features in parsed_image_dataset.take(1):
  # Check the IDs of the rater models. The rater IDs are the same for all the
  # examples in the dataset.
  rater_ids = features['rater_ids']
  rater_id_string = [r.numpy().decode('utf-8') for r in rater_ids]
  print('The IDs of the rater models for this dataset are:')
  print(rater_id_string)
  # The clean label is parsed as a length-1 vector; take its only element.
  clean_label = features['label'].numpy()[0]
  print('The clean label for the following example is %d' % clean_label)
  noisy_labels = features['noisy_labels'].numpy()
  print('The noisy labels from the rater models are:')
  print(noisy_labels)
  image = tf.io.decode_png(features['image'])
  plt.imshow(image)
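
Patch Camelyon uses `tf.io.decode_png` because its "image" feature holds PNG-encoded bytes rather than raw pixels. To sanity-check a record without decoding it, note that the PNG specification fixes the first 8 bytes (the signature) and places the image width and height as big-endian 32-bit integers in the IHDR chunk that follows immediately, at byte offsets 16 and 20. The standard-library sketch below reads them; `png_size` is a hypothetical helper, and the header it is tested on is synthetic.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_size(png_bytes):
  """Returns (width, height) from the IHDR chunk of PNG-encoded bytes.

  Hypothetical helper. Layout: 8-byte signature, then the IHDR chunk's
  4-byte length and 4-byte type, then width and height (big-endian uint32).
  """
  if not png_bytes.startswith(PNG_SIGNATURE):
    raise ValueError('Not PNG data: bad signature')
  if png_bytes[12:16] != b'IHDR':
    raise ValueError('Malformed PNG: first chunk is not IHDR')
  width, height = struct.unpack('>II', png_bytes[16:24])
  return width, height


# Synthetic header only: signature + IHDR length/type + width/height.
header = (PNG_SIGNATURE + struct.pack('>I', 13) + b'IHDR' +
          struct.pack('>II', 96, 96))
print(png_size(header))  # (96, 96)
```

Applied to `features['image'].numpy()`, this should report (96, 96) for every Patch Camelyon record.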

**Cats vs Dogs noisy label datasets**

**Download size**
*   2.4MB for each of the low, medium, and high noise datasets.

**Number of examples**
*   train: 9302, valid: 1184, for each of the low, medium, and high noise datasets.

Both the train and valid splits are subsampled from the original Cats vs Dogs dataset.

**Data features**
*   "noisy_labels": the noisy labels given by the rater models, a list of 10 tf.int64 integers. Label 0 for cats, 1 for dogs.
*   "rater_ids": the IDs of the rater models, a list of 10 tf.string.
*   "image/filename": the filename of the image, corresponding to the filename in the original Cats vs Dogs dataset, tf.string.
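
The features listed above include neither image pixels nor a clean label. Each record links to the original Cats vs Dogs dataset only through "image/filename", so images have to be joined in from a separate copy of that dataset. The sketch below builds an index from filename to rater labels and turns the votes into a soft label: the fraction of raters who answered 1 (dog). The helpers `index_by_filename` and `dog_fraction` and the filenames are hypothetical, and how you obtain the original images is left open.

```python
def index_by_filename(records):
  """Maps each image filename to its list of rater labels (hypothetical).

  `records` is an iterable of (filename, noisy_labels) pairs, e.g. built
  from the parsed dataset with
  features['image/filename'][0].numpy().decode('utf-8') and
  features['noisy_labels'].numpy(). Raises if a filename repeats, since
  that would make the join ambiguous.
  """
  index = {}
  for filename, noisy_labels in records:
    if filename in index:
      raise ValueError(f'Duplicate filename: {filename}')
    index[filename] = list(noisy_labels)
  return index


def dog_fraction(noisy_labels):
  """Soft label: fraction of raters voting 1 (dog) rather than 0 (cat)."""
  return sum(1 for label in noisy_labels if label == 1) / len(noisy_labels)


# Made-up filenames and votes from 10 raters.
records = [('cat.1.jpg', [0] * 9 + [1]), ('dog.7.jpg', [1] * 6 + [0] * 4)]
index = index_by_filename(records)
print(dog_fraction(index['dog.7.jpg']))  # 0.6
```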


**Rater features**
*   model_name: name of the model
*   accuracy: accuracy of the rater model on the rater validation set
*   loss: loss of the rater model on the rater validation set
*   mAP: the mean average precision of the rater model on the rater validation set
*   auc_PR: the area under the precision-recall curve of the rater model on the rater validation set
*   auc_ROC: the area under the ROC curve of the rater model on the rater validation set
*   experience: the total number of data points that the rater model has seen during training
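
This colab does not say how auc_ROC was computed. As a standard reference point, ROC AUC equals the probability that a randomly chosen positive (dog, label 1) receives a higher score than a randomly chosen negative (cat, label 0), with ties counted as one half. The pure-Python sketch below computes it that way by comparing every positive/negative pair. That is quadratic in the number of examples, fine for a sanity check but not for large sets. `roc_auc` is a hypothetical helper and the scores are made up.

```python
def roc_auc(scores, labels):
  """ROC AUC via the pairwise (Mann-Whitney) formulation (hypothetical).

  scores: predicted probability of class 1 (dog) per example.
  labels: true labels, 0 (cat) or 1 (dog).
  """
  pos = [s for s, y in zip(scores, labels) if y == 1]
  neg = [s for s, y in zip(scores, labels) if y == 0]
  if not pos or not neg:
    raise ValueError('ROC AUC needs at least one positive and one negative')
  wins = 0.0
  for p in pos:
    for n in neg:
      if p > n:
        wins += 1.0
      elif p == n:
        wins += 0.5  # Ties count as half a correct ordering.
  return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.1, 0.85], [1, 1, 0, 0]))  # 0.75
```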

In [None]:
# @title An example for loading Cats vs Dogs noisy label datasets
task_name = 'cats_vs_dogs'

# One of ['low', 'medium', 'high']
noise_level = 'medium'

# One of ['train', 'valid']. The `valid` split should be used for
# hyperparameter tuning. Models should be evaluated on the original test
# split of each task.
split = 'train'

# We have 10 rater models for Cats vs Dogs.
num_raters = 10

directory = os.path.join(root_dir, task_name, noise_level, split) + '*'
raw_image_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(directory))

# Create a dictionary describing the features.
image_feature_description = {
    # noisy labels from all the raters
    'noisy_labels': tf.io.FixedLenFeature([num_raters], tf.int64),
    # the IDs of rater models
    'rater_ids': tf.io.FixedLenFeature([num_raters], tf.string),
    # filename of the image
    'image/filename': tf.io.FixedLenFeature([1], tf.string),
}

def _parse_image_function(example_proto):
  # Parse the input tf.train.Example proto using the dictionary above.
  return tf.io.parse_single_example(example_proto, image_feature_description)

parsed_image_dataset = raw_image_dataset.map(_parse_image_function)

for features in parsed_image_dataset.take(1):
  # Check the IDs of the rater models. The rater IDs are the same for all the
  # examples in the dataset.
  rater_ids = features['rater_ids']
  rater_id_string = [r.numpy().decode('utf-8') for r in rater_ids]
  print('The IDs of the rater models for this dataset are:')
  print(rater_id_string)
  print('Image filename:')
  print(features['image/filename'][0].numpy().decode('utf-8'))
  noisy_labels = features['noisy_labels'].numpy()
  print('The noisy labels from the rater models are:')
  print(noisy_labels)

In [None]:
# @title An example for loading rater features
for task_name in ['cifar10', 'cifar100', 'patch_camelyon', 'cats_vs_dogs']:
  for noise_level in ['low', 'medium', 'high']:
    # Avoid shadowing the built-in `dir`.
    json_path = os.path.join(root_dir, task_name, noise_level,
                             'rater_features.json')
    with tf.io.gfile.GFile(json_path, 'rb') as fj:
      rater_features_dict = json.load(fj)
    print(f'{task_name} / {noise_level}:')
    print(rater_features_dict)
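
The cell above only prints each JSON file, and this colab does not document its layout. The sketch below therefore assumes one: a mapping from rater ID to a dict holding the fields listed under **Rater features** (model_name, accuracy, loss, experience, and so on). Check the printed output first and adapt the helper if the structure differs. `rank_raters` is a hypothetical helper and the example values are made up.

```python
def rank_raters(rater_features, metric='accuracy', descending=True):
  """Sorts raters by one metric and returns (rater_id, value) pairs.

  ASSUMPTION (not documented in the colab): `rater_features` maps a rater
  ID to a dict containing the keys listed under "Rater features", e.g.
  'model_name', 'accuracy', 'loss', 'experience'.
  """
  items = list(rater_features.items())
  items.sort(key=lambda item: item[1][metric], reverse=descending)
  return [(rater_id, feats[metric]) for rater_id, feats in items]


# Made-up values in the assumed layout.
example_features = {
    'r1': {'model_name': 'model_a', 'accuracy': 0.8, 'loss': 0.5},
    'r2': {'model_name': 'model_b', 'accuracy': 0.9, 'loss': 0.3},
}
print(rank_raters(example_features))  # [('r2', 0.9), ('r1', 0.8)]
```

For a metric where lower is better, such as loss, call `rank_raters(rater_features_dict, metric='loss', descending=False)`.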