# PET-CT Inference Tutorial
Contacts: eyuboglu@stanford.edu, gangus@stanford.edu

In this notebook we cover:
1. Loading hyperparameters from JSON
2. Building a model from those parameters and loading weights
3. How proper input should be structured
4. How to perform inference on the model
5. How output is structured

## Setup
Import various packages. Make sure you're in an environment with the `pet_ct` package installed.

In [1]:
# import requirements
%load_ext autoreload
%autoreload 2

import os
import json

import torch

import pet_ct.model.models as models
from pet_ct.learn.datasets import MTClassifierDataset
from pet_ct.learn.dataloaders import MTExamDataLoader
from pet_ct.util.util import set_logger

In [2]:
# TODO: change to package directory
os.chdir("/home/eyuboglu/fdg-pet-ct")

# TODO: set path to the tutorial directory relative to the package directory 
experiment_dir = "notebooks/tutorial"
set_logger(log_path=os.path.join(experiment_dir, "process.log"))

In [3]:
# select your CUDA devices if available
devices = [3]
cuda = True

## Loading hyper-parameters
We've included a params file at `notebooks/tutorial/params.json`. Please take a quick look at it to get a sense of its structure and of what the params include.

In [4]:
def load_params(path):
    """
    Loads parameters from .json file at path. 
    """
    with open(path) as f:
        params = json.load(f)["process_args"]
    
    # distribute shared params
    new_task_configs = []
    for task_config in params["task_configs"]:
        new_task_config = params["default_task_config"].copy()
        new_task_config.update(task_config)
        new_task_configs.append(new_task_config)
    task_configs = new_task_configs

    params["model_args"]["task_configs"] = task_configs
    params["dataset_args"]["task_configs"] = task_configs
    
    return params

params = load_params(os.path.join(experiment_dir, "params.json"))
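
To make the "distribute shared params" step concrete, here is a minimal sketch of the same merge on a toy dictionary. The keys used here (`task`, `num_classes`, `loss_weight`) are made up for illustration and are not taken from the real `params.json`.

```python
# Hypothetical params fragment; the real keys in params.json may differ.
toy_params = {
    "default_task_config": {"num_classes": 2, "loss_weight": 1.0},
    "task_configs": [
        {"task": "liver"},
        {"task": "lungs", "loss_weight": 2.0},  # overrides the default value
    ],
}

merged = []
for task_config in toy_params["task_configs"]:
    config = toy_params["default_task_config"].copy()  # shallow copy of the defaults
    config.update(task_config)                         # task-specific values win
    merged.append(config)

for config in merged:
    print(config)
```

Note that `dict.copy()` is shallow, so any nested dictionary inside the defaults would be shared between tasks rather than duplicated.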

## Building a model
Let's use the parameters we've loaded to build a model. We'll also load pretrained weights from `notebooks/tutorial/weights.tar`. 

In [5]:
def build_model(model_class, model_args, weights_path=None):
    model_class = getattr(models, model_class)
    model = model_class(cuda=cuda, devices=devices, **model_args)
    if weights_path is not None: 
        model.load_weights(weights_path, device=devices[0])
    return model
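
`build_model` resolves the class from its name with `getattr`, so the class to instantiate can be chosen in the params file. The sketch below shows the same pattern in isolation. `ToyClassifier` and `toy_models` are stand-ins invented for this example; they are not part of `pet_ct`.

```python
import types


class ToyClassifier:
    """Stand-in for a model class; not part of pet_ct."""

    def __init__(self, cuda=False, devices=None, **model_args):
        self.cuda = cuda
        self.devices = devices or []
        self.model_args = model_args


# Stand-in for the `pet_ct.model.models` module namespace.
toy_models = types.SimpleNamespace(ToyClassifier=ToyClassifier)


def build_toy_model(model_class, model_args):
    # getattr raises AttributeError if the name is not defined in the namespace
    cls = getattr(toy_models, model_class)
    return cls(cuda=False, devices=[], **model_args)


toy = build_toy_model("ToyClassifier", {"hidden_size": 16})
print(type(toy).__name__, toy.model_args)
```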

In [6]:
# build model and load weights, you should see 550/550 pretrained params loaded. 
model = build_model(params["model_class"], 
                    params["model_args"],
                    os.path.join(experiment_dir, "weights.tar"))

Loading I3D weights from models/i3d/model_flow.pth
Loaded 550/550 pretrained parametersfrom notebooks/tutorial/weights.tar matching 'None'.


## How to structure inputs?
To understand how to structure inputs properly we will load some training examples from our dataset. However, the `MTClassifierDataset` class below is designed for data in our databases at Stanford. You'll likely need to write your own dataset classes for your data. You should use `MTClassifierDataset` as a template.

In [7]:
# NOTE: building this dataset will likely not work for you 
# because you don't have access to our data. 
# We do so here simply to demonstrate the structure of the data.
dataset = MTClassifierDataset(**params["dataset_args"], split="test")
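
If you write your own dataset class, one possible starting point is a plain map-style dataset that returns an `(inputs, targets, info)` triple per exam, mirroring what the dataloader yields in this notebook. The sketch below is hypothetical: the class name, the `load_exam` and `load_targets` callables, and the `info` contents are assumptions, and we have not checked it against the interface that `MTExamDataLoader` expects.

```python
class ExamDataset:
    """Hypothetical map-style dataset; not part of pet_ct."""

    def __init__(self, exam_ids, load_exam, load_targets):
        self.exam_ids = list(exam_ids)
        self.load_exam = load_exam        # callable: exam_id -> input array/tensor
        self.load_targets = load_targets  # callable: exam_id -> {task: target}

    def __len__(self):
        return len(self.exam_ids)

    def __getitem__(self, index):
        exam_id = self.exam_ids[index]
        inputs = self.load_exam(exam_id)
        targets = self.load_targets(exam_id)
        info = {"exam_id": exam_id}
        return inputs, targets, info


# Toy usage with placeholder loaders instead of real image data.
dataset_sketch = ExamDataset(
    exam_ids=["exam_a", "exam_b"],
    load_exam=lambda exam_id: f"<pixels for {exam_id}>",
    load_targets=lambda exam_id: {"liver": [1.0, 0.0]},
)
inputs_sketch, targets_sketch, info_sketch = dataset_sketch[1]
print(len(dataset_sketch), info_sketch)
```

In practice `load_exam` would return a tensor for one exam and `load_targets` a dictionary keyed by task name.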

In [8]:
dataloader = MTExamDataLoader(dataset=dataset, 
                              num_workers=1, 
                              batch_size=1,
                              sampler="RandomSampler",
                              num_samples=200)
iterator = iter(dataloader)

Let's load an example from the dataloader and examine its structure. Each PET-CT exam is represented by a torch tensor with 4 axes. There's an additional axis for the mini-batch. It's important that your input to the model also matches this structure.

In [14]:
inputs, targets, info = next(iterator)
print(f"Input shape: {inputs.shape}")

Input shape: torch.Size([1, 205, 224, 224, 2])
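
Before calling the model on your own data, a quick shape check can catch a missing batch axis early. The sketch below assumes the five axes are (batch, slices, height, width, channels) with two channels. That reading is inferred from the shape printed above and is not confirmed by the notebook, and `check_exam_batch_shape` is a hypothetical helper.

```python
def check_exam_batch_shape(shape, channels=2):
    """Return True if `shape` has five positive axes and ends in `channels`.

    Assumed axis order: (batch, slices, height, width, channels).
    """
    shape = tuple(shape)
    if len(shape) != 5:
        return False
    return all(dim > 0 for dim in shape) and shape[-1] == channels


print(check_exam_batch_shape((1, 205, 224, 224, 2)))  # True
print(check_exam_batch_shape((205, 224, 224, 2)))     # False: no batch axis
```

Since `torch.Size` is a subclass of `tuple`, `check_exam_batch_shape(inputs.shape)` works without any conversion.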


## How to make a prediction?
Let's pass the inputs through the model using the `model.predict` function and examine the output.

In [15]:
output = model.predict(inputs)

## How is output structured?
Let's examine what the model outputs. 

In [16]:
print(f"Output is of type: {type(output)}.")
print(f"The keys of the dict are: {output.keys()}")

Output is of type: <class 'dict'>.
The keys of the dict are: dict_keys(['full', 'inguinal_lymph_node', 'left_lungs', 'carinal_lymph_node', 'cervical_lymph_node', 'paratracheal_lymph_node', 'right_lungs', 'pelvic_skeleton', 'axillary_lymph_node', 'iliac_lymph_node', 'supraclavicular_lymph_node', 'retroperitoneal_lymph_node', 'mouth', 'liver', 'abdominal_lymph_node', 'hilar_lymph_node', 'pelvis', 'spine', 'lungs', 'head', 'thoracic_lymph_node', 'neck', 'abdomen', 'skeleton', 'head_neck', 'chest'])


*Key:* The model outputs a **dictionary** with keys corresponding to each **task**.
Each key maps to the predictions for that task. Let's take a look at the output for the `liver` task. 

In [17]:
output["liver"]

tensor([[0.7053, 0.2947]], device='cuda:3', grad_fn=<SoftmaxBackward>)
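
Each row of a task's output holds one probability per class, so the predicted class is simply the index of the largest entry. The sketch below does this on plain nested lists. `predicted_classes` is a hypothetical helper, and the notebook does not state which class index corresponds to an abnormal finding, so the indices are left uninterpreted.

```python
def predicted_classes(task_probs):
    """Map each task to the index of its highest-probability class, per example."""
    return {
        task: [max(range(len(row)), key=row.__getitem__) for row in rows]
        for task, rows in task_probs.items()
    }


# Probabilities copied from the `liver` output above. With real tensors,
# `{task: probs.detach().cpu().tolist() for task, probs in output.items()}`
# produces this nested-list form.
example_output = {"liver": [[0.7053, 0.2947]]}
print(predicted_classes(example_output))  # {'liver': [0]}
```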

Notice that the **targets** (i.e. labels) that we loaded earlier have a structure very similar to that of the output.

In [18]:
print(f"targets is of type: {type(targets)}.")
print(f"The keys of the dict are: {targets.keys()}")

targets is of type: <class 'dict'>.
The keys of the dict are: dict_keys(['full', 'inguinal_lymph_node', 'left_lungs', 'carinal_lymph_node', 'cervical_lymph_node', 'paratracheal_lymph_node', 'right_lungs', 'pelvic_skeleton', 'axillary_lymph_node', 'iliac_lymph_node', 'supraclavicular_lymph_node', 'retroperitoneal_lymph_node', 'mouth', 'liver', 'abdominal_lymph_node', 'hilar_lymph_node', 'pelvis', 'spine', 'lungs', 'head', 'thoracic_lymph_node', 'neck', 'abdomen', 'skeleton', 'head_neck', 'chest'])


In [19]:
targets["liver"]

tensor([[0.9966, 0.0034]])

In this case the target for the `liver` task agrees with the output of the model: both assign the higher probability to the first class.
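
Because outputs and targets share the same keys, the comparison can be repeated for every task at once. The sketch below uses plain nested lists and the `liver` values shown above. `task_agreement` is a hypothetical helper and not part of `pet_ct`.

```python
def argmax(row):
    """Index of the largest entry in a list of numbers."""
    return max(range(len(row)), key=row.__getitem__)


def task_agreement(outputs, targets):
    """For each task in both dicts, check per example whether the top classes coincide."""
    return {
        task: [argmax(o) == argmax(t) for o, t in zip(outputs[task], targets[task])]
        for task in outputs
        if task in targets
    }


outputs_example = {"liver": [[0.7053, 0.2947]]}
targets_example = {"liver": [[0.9966, 0.0034]]}
print(task_agreement(outputs_example, targets_example))  # {'liver': [True]}
```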

## How to score the model on a dataset of examples?
What if we want to evaluate the model on a dataset of examples? For this we can use the `model.score` method.

In [20]:
metric_configs = [{'fn': 'accuracy'},
                  {'fn': 'roc_auc'},
                  {'fn': 'recall'},
                  {'fn': 'precision'},
                  {'fn': 'f1_score'}]
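
For reference, here is a minimal sketch of what four of these metrics mean for binary labels. It uses the standard textbook definitions and is not the implementation used by `pet_ct`. Treating class `1` as the positive class and returning `0.0` when a ratio is undefined are assumptions of this example. `roc_auc` is omitted because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (no predicted or no actual positives) fall back to 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1_score": f1}


toy_true = [1, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(toy_true, toy_pred))
```

As a consistency check on the formula, a precision of 0.91 with a recall of 1.0 gives an F1 of 2 · 0.91 · 1.0 / 1.91 ≈ 0.9529.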

In [21]:
metrics = model.score(dataloader, metric_configs=metric_configs)

Validation
100%|██████████| 200/200 [03:36<00:00,  1.16s/it]


We can take a look and see how the model did for this particular subset of the test set on each of the tasks. 

In [25]:
metrics.metrics

defaultdict(dict,
            {'full': {'accuracy': 0.91,
              'roc_auc': 0.8018925518925519,
              'recall': 1.0,
              'precision': 0.91,
              'f1_score': 0.9528795811518325},
             'inguinal_lymph_node': {'accuracy': 0.945,
              'roc_auc': 0.9163059163059163,
              'recall': 0.6363636363636364,
              'precision': 0.5,
              'f1_score': 0.56},
             'left_lungs': {'accuracy': 0.915,
              'roc_auc': 0.7625679347826086,
              'recall': 0.0625,
              'precision': 0.3333333333333333,
              'f1_score': 0.10526315789473684},
             'carinal_lymph_node': {'accuracy': 0.925,
              'roc_auc': 0.9069293478260869,
              'recall': 0.0625,
              'precision': 1.0,
              'f1_score': 0.11764705882352941},
             'cervical_lymph_node': {'accuracy': 0.915,
              'roc_auc': 0.7785278045644486,
              'recall': 0.0,
              'pr