In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

# Including Metadata

``AUCMEDI`` allows us to include metadata alongside the images. This is useful when additional information, such as age or blood values, is available and may help with classification.  

We don't have such additional metadata for the dataset used here (images of colorectal cancer histology), but we will make some up for demonstration purposes.  

But first, the data need to be loaded.  

## Downloading and preparing the data

First, the data need to be loaded and prepared for ``AUCMEDI``.  
If you have questions concerning that part, have a look at the corresponding notebook.

In [2]:
from pathlib import Path
import wget
import zipfile

cwd = !pwd
datadir = cwd[0] + "/data"
Path(datadir).mkdir(parents=True, exist_ok=True)

#print('Beginning file download with wget module')

#url = 'https://zenodo.org/record/53169/files/Kather_texture_2016_image_tiles_5000.zip?download=1'
#wget.download(url, datadir)

#with zipfile.ZipFile("data/Kather_texture_2016_image_tiles_5000.zip","r") as zip_ref:
#    zip_ref.extractall("data")

from aucmedi.data_processing.io_data import input_interface
ds_loader = input_interface("directory", path_imagedir="data/Kather_texture_2016_image_tiles_5000", path_data=None, training=True, ohe=False)
(samples, class_ohe, nclasses, class_names, image_format) = ds_loader


2022-07-27 09:43:43.506665: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


## Generating Metadata

Now we will generate some random metadata. Being random, they carry no value for classification, but they let us see how to include metadata in the ``AUCMEDI`` pipeline.

In [3]:
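import numpy as np
from numpy import random

# One random value per image (the dataset contains 5000 tiles).
age = random.randint(low=30, high=100, size=5000)  # integer ages in [30, 100)
blood = random.randn(5000)                         # standard-normal "blood value"
# Stack both variables as columns: shape (5000, 2).
metadata = np.vstack((age, blood)).T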

Let's see what our metadata looks like:

In [4]:
metadata

array([[ 3.50000000e+01,  1.23243337e-02],
       [ 4.60000000e+01, -1.50783165e+00],
       [ 4.80000000e+01, -4.62923412e-01],
       ...,
       [ 7.20000000e+01, -6.80484845e-01],
       [ 4.40000000e+01, -2.83978910e-01],
       [ 8.50000000e+01, -4.69024810e-01]])

So our metadata are a 2-dimensional numpy array with the dimensions (n_samples, n_variables).  
The first column holds the age and the second column a blood value for each sample (both randomly generated).  
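The layout described above can be checked with a small NumPy sketch. The sample count of 8 and the random values are illustrative stand-ins, not the notebook's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 8  # small stand-in for the 5000 images

# One 1-D array per metadata variable, one entry per sample.
age = rng.integers(low=30, high=100, size=n_samples)  # upper bound is exclusive
blood = rng.standard_normal(n_samples)

# Stack the variables as columns: shape (n_samples, n_variables).
metadata = np.column_stack((age, blood))

assert metadata.shape == (n_samples, 2)
# Mixing integers and floats upcasts the whole array to float64,
# which is why the ages are printed as floats.
assert metadata.dtype == np.float64
```

Row `i` of `metadata` has to describe the same sample as entry `i` of the sample list, so the order of both must match.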

## Splitting our data into training, validation and test sets

When splitting the dataset with ``sampling_split``, we also pass the metadata (using the argument ``metadata``) so that they are split in the same way as the samples and labels.

In [5]:
from aucmedi.sampling.split import sampling_split
train, validation, test = sampling_split(samples, class_ohe, metadata=metadata, sampling=[0.5, 0.25, 0.25], 
                                         stratified=True, iterative=False, seed=123)

In [6]:
train

(array(['07_ADIPOSE/15FFE_CRC-Prim-HE-03_012.tif_Row_1_Col_601.tif',
        '01_TUMOR/10264_CRC-Prim-HE-07_025.tif_Row_1801_Col_1.tif',
        '07_ADIPOSE/13CEE_CRC-Prim-HE-05_032.tif_Row_1501_Col_901.tif',
        ..., '08_EMPTY/1470E_CRC-Prim-HE-06_005.tif_Row_4351_Col_2851.tif',
        '07_ADIPOSE/14420_CRC-Prim-HE-10_020.tif_Row_601_Col_1801.tif',
        '07_ADIPOSE/163E5_CRC-Prim-HE-06_004.tif_Row_1801_Col_451.tif'],
       dtype='<U64'),
 array([[0, 0, 0, ..., 0, 1, 0],
        [1, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 1, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 1],
        [0, 0, 0, ..., 0, 1, 0],
        [0, 0, 0, ..., 0, 1, 0]], dtype=uint8),
 array([[75.        ,  1.07417889],
        [88.        , -1.0580301 ],
        [90.        , -0.51070515],
        ...,
        [42.        ,  0.92283093],
        [87.        ,  0.90734877],
        [42.        ,  0.2433204 ]]))
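Each returned split is a tuple of (samples, labels, metadata) whose rows still belong together. The following NumPy sketch illustrates the idea of splitting all three arrays with the same indices. It is not how ``sampling_split`` is implemented (there is no stratification here), and `split_aligned` as well as the toy data are hypothetical.

```python
import numpy as np

def split_aligned(samples, labels, metadata, fractions, seed=0):
    """Hypothetical helper: shuffle once, then cut all arrays with the same indices."""
    n = len(samples)
    order = np.random.default_rng(seed).permutation(n)
    # Cut points from cumulative fractions, e.g. [0.5, 0.25, 0.25] with n=8 gives [4, 6].
    cuts = (np.cumsum(fractions)[:-1] * n).astype(int)
    return [(samples[idx], labels[idx], metadata[idx])
            for idx in np.split(order, cuts)]

# Toy data: sample i has the label i % 2 and the metadata row (i + 30, i / 10).
samples = np.array([f"img_{i}.tif" for i in range(8)])
labels = np.eye(2, dtype=np.uint8)[np.arange(8) % 2]
metadata = np.column_stack((np.arange(8) + 30, np.arange(8) / 10))

train, validation, test = split_aligned(samples, labels, metadata, [0.5, 0.25, 0.25])

# Every metadata row still belongs to the sample at the same position.
for name, meta_row in zip(train[0], train[2]):
    assert int(name[4:-4]) + 30 == meta_row[0]
```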

## Define the model

Now we define our ``NeuralNetwork``. If you have questions concerning that, have a look at the notebook "Custom Architecture" or "3 Pillars".  

Importantly, we have to set the argument `meta_variables` to 2 here (the number of metadata columns), so that AUCMEDI knows to include these variables.

In [7]:
from aucmedi.neural_network.model import NeuralNetwork
import tensorflow_addons as tfa

f1Score = tfa.metrics.F1Score(num_classes=nclasses, threshold=0.5)

model = NeuralNetwork(n_labels=nclasses, channels=3, 
                      loss="categorical_crossentropy", metrics=["categorical_accuracy", f1Score], 
                      activation_output="softmax", pretrained_weights=False, meta_variables = 2)

2022-07-27 09:43:46.850717: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-27 09:43:47.419450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22844 MB memory:  -> device: 0, name: NVIDIA TITAN RTX, pci bus id: 0000:3f:00.0, compute capability: 7.5
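To build some intuition for what `meta_variables = 2` implies for the input shapes: a common way to combine images and tabular metadata is to concatenate the metadata vector with the image features before the final classification layers. The NumPy sketch below assumes this general technique and only shows the shapes involved. It does not reproduce AUCMEDI's internals, and the feature size of 16 is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_features, meta_variables = 4, 16, 2

# Stand-in for the pooled feature vectors a CNN would produce per image.
image_features = rng.standard_normal((batch_size, n_features))
# Stand-in for the matching metadata rows (e.g. age and blood value).
meta_batch = rng.standard_normal((batch_size, meta_variables))

# Concatenate along the feature axis: every sample gets two extra inputs.
combined = np.concatenate([image_features, meta_batch], axis=1)

assert combined.shape == (batch_size, n_features + meta_variables)
```

The number of metadata columns therefore has to be known when the model is built, which is why it is passed to the constructor.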


## Train the model

Now the model can be trained. If you have questions concerning that, have a look at the notebook "Custom Architecture" or "3 Pillars".  

Here, it is important that we include the ``metadata`` for __each__ ``DataGenerator``. As the output above shows, the training ``metadata`` are in ``train[2]`` and, accordingly, the validation ``metadata`` are in ``validation[2]``.
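Before building the generators, it can help to unpack each split into named variables and check that the three parts fit together. A minimal sketch, assuming the (samples, labels, metadata) tuple layout shown above; `check_split` and the toy data are hypothetical and not part of AUCMEDI.

```python
import numpy as np

def check_split(split, n_meta):
    """Hypothetical helper: verify a (samples, labels, metadata) tuple is consistent."""
    samples, labels, metadata = split
    assert len(samples) == len(labels) == len(metadata), "row counts differ"
    assert metadata.ndim == 2 and metadata.shape[1] == n_meta, "unexpected metadata shape"
    return samples, labels, metadata

# Toy split with three samples, three classes and two metadata variables.
toy_train = (np.array(["a.tif", "b.tif", "c.tif"]),
             np.eye(3, dtype=np.uint8),
             np.array([[75.0, 1.07], [88.0, -1.06], [90.0, -0.51]]))

x_train, y_train, m_train = check_split(toy_train, n_meta=2)
```

With named variables, `m_train` plays the role of ``train[2]`` in the ``DataGenerator`` call.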

In [8]:
from aucmedi.data_processing.data_generator import DataGenerator

train_generator = DataGenerator(samples=train[0], path_imagedir="data/Kather_texture_2016_image_tiles_5000",
                                               labels=train[1], metadata=train[2], resize=model.meta_input, 
                                               standardize_mode=model.meta_standardize, 
                                               image_format=image_format, batch_size=32, data_aug=None, 
                                               grayscale=False, subfunctions=[], prepare_images=False, 
                                               sample_weights=None, seed=123, workers=1)
val_generator = DataGenerator(samples=validation[0], path_imagedir="data/Kather_texture_2016_image_tiles_5000",
                                               labels=validation[1], metadata=validation[2], resize=model.meta_input, 
                                               standardize_mode=model.meta_standardize, 
                                               image_format=image_format, batch_size=32, data_aug=None, 
                                               grayscale=False, subfunctions=[], prepare_images=False, 
                                               sample_weights=None, seed=123, workers=1)

history = model.train(training_generator=train_generator, validation_generator=val_generator, epochs=20, iterations=None, 
                                         callbacks=None, class_weights=None, transfer_learning=False)

Epoch 1/20




Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


If you want to know how to evaluate the performance of your model, have a look at the corresponding notebook.