## Working with patient datasets

This notebook provides an introduction to working with patient datasets using `scikit-rt`.

Documentation for `scikit-rt` is at:  
[https://scikit-rt.github.io/scikit-rt/](https://scikit-rt.github.io/scikit-rt/)

This notebook uses the dataset:

Peihan Li, "SPECT_CT_data.zip", Figshare dataset (2020)  
https://doi.org/10.6084/m9.figshare.12579707.v1

If the dataset is not already present on the computer where this notebook is run, it will be downloaded to the directory specified by `topdir` in the first code cell below. The download file is 1.6 GB, so the download may take a while.

## Module import and data download

The following cell imports the modules needed for this example, defines the path to the data directory, downloads the example dataset if not already present, obtains the list of paths to patient folders, and sets some viewer options.

In [None]:
from pathlib import Path
from skrt import set_viewer_options, BetterViewer, Patient
from skrt.core import alphanumeric, compress_user, download, Defaults

# Define URL of source dataset, and local data directory.
url = "https://figshare.com/ndownloader/files/23528954/SPECT_CT_data.zip"
topdir = Path("~/data/spect_ct").expanduser()
datadir = topdir / Path(url).stem

# Download dataset if not already present.
if not datadir.exists():
    download(url, topdir, unzip=True)
    
# Obtain sorted list of paths to patient folders.
paths = sorted(list(datadir.glob("0*")))
    
# Set Matplotlib runtime configuration (optional).
set_viewer_options()

# In place of interactive images, display static graphics that can be saved with notebook.
# Defaults().no_ui = True
# Omit user part of paths when printing object attributes.
Defaults().compress_user = True

## Utility functions

The following utility functions are defined for this notebook.
- `get_n_file()`: count the number of files below a directory;
- `print_paths()`: print file paths below a directory, one path per line, with an optional maximum number of paths.

In [None]:
def get_n_file(data_dir):
    """
    Count number of files below a directory, ignoring hidden files.
    
    **Parameter:**
    data_dir: str, pathlib.Path
        Path to directory below which files are to be counted.
    """
    return len([path for path in Path(data_dir).glob("**/[!.]*") if path.is_file()]) 

def print_paths(data_dir, max_path=None):
    """
    Print paths to files below a directory, ignoring hidden files.
    
    File paths are listed in natural order, with one path per line.
    
    **Parameters:**
    data_dir: str, pathlib.Path
        Path to directory below which file paths are to be printed.
        
    max_path: int/None, default=None
        Indication of maximum number of paths to print.  If a non-negative
        integer, the first <max_path> paths are printed.  If a negative
        integer, the last abs(<max_path>) paths are printed.  If None,
        all paths are printed.
    """
    local_paths = sorted(list(Path(data_dir).glob("**/[!.]*")), key=alphanumeric)
    if max_path is None:
        selected_paths = local_paths
    else:
        if max_path >= 0:
            selected_paths = local_paths[: max_path]
        else:
            selected_paths = local_paths[max_path:]
    for path in selected_paths:
        print(compress_user(path))
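
As a self-contained illustration of what these helpers rely on, the sketch below builds a temporary directory and applies the same glob pattern, file filter, and slicing. It uses only the standard library. `natural_key` is a hypothetical stand-in for `skrt.core.alphanumeric`, written on the assumption that natural ordering means comparing embedded numbers as integers. The file names are invented.

```python
import re
import tempfile
from pathlib import Path


def natural_key(path):
    """Hypothetical natural-sort key: digit runs are compared as integers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", str(path))]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ["im10.dcm", "im2.dcm", "im1.dcm", ".hidden"]:
        (root / "sub" / name).touch()

    # The pattern skips names starting with ".", but matches directories too.
    matches = list(root.glob("**/[!.]*"))
    files = [path for path in matches if path.is_file()]

    n_match = len(matches)
    n_file = len(files)
    names = [path.name for path in sorted(files, key=natural_key)]

    # Positive and negative limits select from the start and end of the list.
    first_two = names[:2]
    last_two = names[-2:]

print(n_match, n_file)
print(names)
print(first_two, last_two)
```

The pattern matches the subdirectory as well as the visible files, which is why `get_n_file()` filters with `is_file()`.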

## Unsorted DICOM data

Each element of the list `paths` identifies a folder containing unsorted DICOM data for a single patient.  Choose any element, then print the number of files that it contains, and a listing of the file paths, for example:

```
idx = 4
print(f"Number of files: {get_n_file(paths[idx])}")
print_paths(paths[idx])
```

Q1: What types of data can you identify from the listing of file paths?

Read the data from the chosen path, and print the study attributes, for example:

```
p1 = Patient(paths[idx], unsorted_dicom=True)
print(p1.get_studies())
```

This will show the images, structure sets, doses and plans in each study.  The attributes include a number of unique identifiers (UIDs), which are used in sorting the data.  In a future release of `scikit-rt`, these are likely to be deleted once the sorting is completed.
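
To give a rough idea of how identifiers can drive sorting, the sketch below groups a few invented records by study UID and then by modality. This is not the `scikit-rt` implementation. The records, UID values and file names are all hypothetical, and real DICOM files carry this information in their headers rather than in tuples.

```python
from collections import defaultdict

# Hypothetical records: (study UID, modality, file name).
records = [
    ("1.2.3", "CT", "a.dcm"),
    ("1.2.3", "RTSTRUCT", "b.dcm"),
    ("1.2.4", "NM", "c.dcm"),
    ("1.2.3", "CT", "d.dcm"),
]

# Group file names first by study UID, then by modality.
studies = defaultdict(lambda: defaultdict(list))
for study_uid, modality, filename in records:
    studies[study_uid][modality].append(filename)

for study_uid in sorted(studies):
    for modality in sorted(studies[study_uid]):
        print(study_uid, modality, studies[study_uid][modality])
```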

## Sorted DICOM data

Copy the data, sorted by data types, for example:

```
sorted_dir = f"sorted_{p1.id}"
p1.copy_dicom(sorted_dir)
```

Warnings about missing DICOM tags may be produced during copying, but these can be ignored.

Print the number of sorted files, and a listing of the file paths.

Q2: What might cause the number of sorted files to be different from the number of unsorted files?

Q3: What types of data can you identify from the listing of sorted files?

Read the sorted data, and print the study attributes.

Q4: What differences are there between the study attributes for sorted data and those for unsorted data?

## Accessing data objects

The images, structure sets, doses and plans associated with a patient (across all studies) or with a single study can be accessed with the methods:

```
get_images()
get_structure_sets()
get_doses()
get_plans()
```

It's possible to require that the objects returned relate to one or more imaging modalities, or that the objects be linked to other types of objects, for example:

```
# Obtain ct images with associated structure set(s) and dose(s).
get_images("ct", associations=["structure_sets", "doses"])
```
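
The selection logic described above can be pictured with plain Python objects. The sketch below is an illustration only, not the library's code. `ImageRecord` and `select_images` are hypothetical names, and the sketch assumes that an image must have every listed association to be selected.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical stand-in for an image and the objects linked to it."""
    modality: str
    structure_sets: list = field(default_factory=list)
    doses: list = field(default_factory=list)


def select_images(images, modality=None, associations=()):
    """Keep images of the given modality that have every listed association."""
    selected = []
    for image in images:
        if modality is not None and image.modality != modality:
            continue
        # An empty list of linked objects counts as a missing association.
        if all(getattr(image, name) for name in associations):
            selected.append(image)
    return selected


images = [
    ImageRecord("ct", structure_sets=["ss1"], doses=["dose1"]),
    ImageRecord("ct", structure_sets=["ss2"]),
    ImageRecord("nm"),
]

n_all = len(select_images(images))
n_ct = len(select_images(images, "ct"))
n_ct_linked = len(select_images(images, "ct", ["structure_sets", "doses"]))
print(n_all, n_ct, n_ct_linked)
```

Each extra requirement can only shrink the list, so the three lengths requested in the exercise below should be non-increasing.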

For the patient dataset that you chose earlier, print the lengths of the lists of:

- all images;
- all images of type "ct";
- all images of type "ct" with associated structure set(s) and dose(s).

## Viewing data objects

Images, structure sets and doses can be viewed interactively using the `BetterViewer` class, for example:

```
structure_set = p1.get_structure_sets("ct")[0]
image = structure_set.get_image()
BetterViewer(images=[image], rois=[structure_set])
```

If more than one image is passed as argument, the images are displayed side by side.

Images, structure sets and doses have a `view()` method, which creates a `BetterViewer` instance, passing the calling object as argument, for example:

```
image.view(rois=structure_set)
```

Try viewing, side by side, a ct image, and a ct image with structure set superimposed.

## Filtering structure sets and standardising ROI names

When working with structure sets, it can be useful to filter them so that only the ROIs relevant to an analysis are kept.  The same ROI may be labelled differently in different structure sets, so standardising names can also be useful.  A new structure set with filtered and renamed ROIs can be obtained, with the original left unaltered, for example:

```
filtered_structure_set = p1.get_structure_sets("ct")[0].filtered_copy(
    names={"spinal_cord": "cord*"}, keep_renamed_only=True)
```
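
The mapping from a standard name to a wildcard pattern can be illustrated with the standard library's `fnmatch`. The sketch below is a simplified picture under stated assumptions, not the `scikit-rt` implementation. `standardise_names` is a hypothetical helper, the ROI labels are invented, and case-insensitive matching is an assumption made for the sketch.

```python
from fnmatch import fnmatch


def standardise_names(roi_names, names, keep_renamed_only=False):
    """Map original ROI names to standard names using wildcard patterns."""
    result = {}
    for original in roi_names:
        new_name = None
        for standard, patterns in names.items():
            if isinstance(patterns, str):
                patterns = [patterns]
            # Lower-case both sides so that matching ignores case (assumption).
            if any(fnmatch(original.lower(), p.lower()) for p in patterns):
                new_name = standard
                break
        if new_name is not None:
            result[original] = new_name
        elif not keep_renamed_only:
            result[original] = original
    return result


labels = ["CORD", "Heart", "Lung_L"]
renamed_only = standardise_names(
    labels, {"spinal_cord": "cord*"}, keep_renamed_only=True)
everything = standardise_names(labels, {"spinal_cord": "cord*"})
print(renamed_only)
print(everything)
```

In this sketch, `keep_renamed_only=True` drops any ROI whose label matches none of the patterns, while the default keeps unmatched ROIs under their original labels.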

Try creating and viewing a structure set that contains "heart", "lung_left", "lung_right", "spinal_cord".  Include a legend identifying the ROIs.

Q5: What happens to the legend as you scroll through the image?