# Kili Tutorial: Importing medical data into a video project 

In this tutorial, we will show you how to import DICOM data into a [Kili video project](https://docs.kili-technology.com/docs/labeling-video-assets#grouping-objects-on-multiple-frames). Such projects let you annotate volumes of image data, with each slice of a volume imported as one frame.

The data we use comes from [The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection](https://wiki.cancerimagingarchive.net/display/Public/TCGA-LUAD). We selected 3 scans out of this dataset.

## Downloading data

Let's first download the scans. We host these files in a .zip archive on Google Cloud Storage.

In [None]:
import os
import shutil
import subprocess

import requests
import tqdm

In [None]:
if "recipes" in os.getcwd():
    os.chdir("..")

In [None]:
os.makedirs(os.path.expanduser("~/Downloads"), exist_ok=True)

We download the archive with `requests` and save it to `~/Downloads`:

In [None]:
content_url = "https://storage.googleapis.com/label-public-staging/recipes/assets/TCGA-LUAD.zip"
response = requests.get(content_url)
response.raise_for_status()  # Fail early instead of writing an error page to disk
with open(os.path.expanduser("~/Downloads/TCGA-LUAD.zip"), "wb") as f:
    f.write(response.content)

In [None]:
shutil.unpack_archive(
    os.path.expanduser("~/Downloads/TCGA-LUAD.zip"), os.path.expanduser("~/Downloads/")
)
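Before reading the files, you may want to check what the archive actually contains. The sketch below uses a hypothetical helper, `count_dicom_files` (not part of any library), that counts the `.dcm` files per directory directly from the zip with Python's standard `zipfile` module. Assuming each scan sits in its own directory, you would expect one entry per scan.

```python
import posixpath
import zipfile
from collections import Counter


def count_dicom_files(zip_path):
    """Count the .dcm files per directory inside a zip archive, without extracting it."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Zip member names always use forward slashes, whatever the OS
            if name.lower().endswith(".dcm"):
                counts[posixpath.dirname(name)] += 1
    return dict(counts)
```

For example, `count_dicom_files(os.path.expanduser("~/Downloads/TCGA-LUAD.zip"))` returns a dictionary mapping each directory to its number of slices.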

## Reading data

We will read the DICOM files with [pydicom](https://pydicom.github.io/pydicom/stable/). First, let's list the `.dcm` files of each scan and sort them in slice order.

In [None]:
ASSET_ROOT = os.path.expanduser("~/Downloads/TCGA-LUAD")

sorted_files = {}
asset_number = 0
for root, dirs, files in os.walk(ASSET_ROOT):
    dirs.sort()  # Walk directories in a deterministic order
    dcm_files = [os.path.join(root, name) for name in files if name.endswith(".dcm")]
    if dcm_files:  # Only directories that contain DICOM slices become assets
        sorted_files[f"asset-{asset_number+1}"] = sorted(
            dcm_files,
            # File names are expected to look like "1-042.dcm": sort on the numeric slice index
            key=lambda path: int(os.path.split(path)[-1].split("-")[1].split(".")[0]),
        )
        asset_number += 1
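Sorting numerically matters: a plain string sort would place slice 10 before slice 2. The sketch below isolates the sort key in a hypothetical `slice_index` helper, assuming the `<prefix>-<index>.dcm` naming that the cell above expects. A more robust option, not used in this tutorial, would be to sort on each slice's DICOM `InstanceNumber` attribute read with pydicom, since file names are not guaranteed to follow this pattern.

```python
import os


def slice_index(path):
    """Return the slice number of a file named like '1-042.dcm'.

    Assumes the '<prefix>-<index>.<ext>' naming expected by the sort key above.
    """
    stem = os.path.splitext(os.path.basename(path))[0]  # e.g. '1-042'
    return int(stem.split("-")[1])


files = ["1-10.dcm", "1-2.dcm", "1-1.dcm"]
print(sorted(files))                   # ['1-1.dcm', '1-10.dcm', '1-2.dcm'] (string order)
print(sorted(files, key=slice_index))  # ['1-1.dcm', '1-2.dcm', '1-10.dcm'] (slice order)
```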

Let's look at what is inside the dataset by saving one slice (the 21st) of each scan:

In [None]:
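%pip install numpy Pillow pydicom
import numpy as np
import pydicom
from PIL import Image


def read_dcm_image(path):
    """Read a DICOM file and return its pixel data as an 8-bit RGB PIL image."""
    dicom = pydicom.dcmread(path)
    image = dicom.pixel_array.astype(np.float64)
    # At the time of writing, Kili did not support windowing in the application,
    # so we advise you to reduce the intensity range to 256 values (0-255).
    value_range = max(image.max() - image.min(), 1)  # Avoid dividing by zero on uniform slices
    image = (image - image.min()) / value_range * 255
    return Image.fromarray(image.astype(np.uint8)).convert("RGB")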


os.makedirs("./recipes/img", exist_ok=True)
for asset_key in sorted_files.keys():
    print(asset_key)
    im = read_dcm_image(sorted_files[asset_key][20])  # 21st slice of each scan
    im.save(f"./recipes/img/frame_dicom_data_{asset_key}.png")

![asset-1](./img/frame_dicom_data_asset-1.png)

![asset-2](./img/frame_dicom_data_asset-2.png)

![asset-3](./img/frame_dicom_data_asset-3.png)
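The min-max scaling in `read_dcm_image` stretches each slice independently, so the same tissue can get a different grey level from one slice to the next. A common alternative in CT viewers is fixed intensity windowing: convert stored values to Hounsfield units (HU) with the DICOM `RescaleSlope` and `RescaleIntercept` attributes, clip them to a window defined by a center and a width, and map that window to 0-255. Below is a minimal sketch with a hypothetical `apply_window` helper; the lung window used in the example (center -600 HU, width 1500 HU) is a commonly used setting, not something prescribed by Kili or by this dataset.

```python
import numpy as np


def apply_window(pixels, center, width, slope=1.0, intercept=0.0):
    """Map raw DICOM pixel values to 8-bit grey levels using a fixed window.

    slope and intercept play the role of the DICOM RescaleSlope and
    RescaleIntercept attributes, which convert stored values to HU on CT.
    """
    hu = np.asarray(pixels, dtype=np.float64) * slope + intercept
    low, high = center - width / 2, center + width / 2
    # Values outside the window saturate to black (0) or white (255)
    clipped = np.clip(hu, low, high)
    return np.round((clipped - low) / (high - low) * 255).astype(np.uint8)


# Hypothetical raw values, windowed with a lung window (center -600 HU, width 1500 HU)
raw = np.array([[-2000, -1350], [-600, 150], [1000, 3000]])
print(apply_window(raw, center=-600, width=1500))
```

With pydicom, you could call something like `apply_window(dicom.pixel_array, -600, 1500, float(dicom.RescaleSlope), float(dicom.RescaleIntercept))` inside `read_dcm_image`, assuming your files carry these attributes (CT files usually do).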

## Extracting and serving images

For each DICOM (`.dcm`) file, let's extract the image and save it as a `.jpeg` file next to the original.

In [None]:
sorted_images = {}
for asset_key, files in sorted_files.items():
    images = []
    for file in tqdm.tqdm(files, desc=asset_key):
        im = read_dcm_image(file)
        im_file = os.path.splitext(file)[0] + ".jpeg"  # Same path, .jpeg extension
        im.save(im_file, format="JPEG")
        images.append(im_file)
    sorted_images[asset_key] = images

We now have JPEG images that Kili can process.

## Creating the project

We can now import these assets into a `VIDEO` project!

Let's begin by creating the project:

In [None]:
## You can also create the interface directly in the Kili application.
interface = {
    "jobs": {
        "JOB_0": {
            "mlTask": "OBJECT_DETECTION",
            "tools": ["rectangle"],
            "instruction": "Segment the right class",
            "required": 1,
            "isChild": False,
            "content": {
                "categories": {
                    "BONE": {"name": "Bone", "children": [], "color": "#0755FF"},
                    "LUNG": {"name": "Lung", "children": [], "color": "#EEBA00"},
                    "TISSUE_0": {"name": "Tissue", "children": [], "color": "#941100"},
                },
                "input": "radio",
            },
        }
    }
}
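If you have many classes, writing this dictionary by hand gets tedious. Here is a minimal sketch, with a hypothetical `make_object_detection_job` helper, that rebuilds the same structure from a list of `(key, name, color)` tuples. It only mirrors the keys used in `JOB_0` above and does not cover other Kili job options.

```python
def make_object_detection_job(categories, instruction, tools=("rectangle",)):
    """Build a single-choice object detection job shaped like JOB_0 above.

    `categories` is a list of (key, display_name, color) tuples.
    """
    return {
        "mlTask": "OBJECT_DETECTION",
        "tools": list(tools),
        "instruction": instruction,
        "required": 1,
        "isChild": False,
        "content": {
            "categories": {
                key: {"name": name, "children": [], "color": color}
                for key, name, color in categories
            },
            "input": "radio",
        },
    }


generated_interface = {
    "jobs": {
        "JOB_0": make_object_detection_job(
            [
                ("BONE", "Bone", "#0755FF"),
                ("LUNG", "Lung", "#EEBA00"),
                ("TISSUE_0", "Tissue", "#941100"),
            ],
            instruction="Segment the right class",
        )
    }
}
```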

In [None]:
## Authentication
from kili.client import Kili

# If you use Kili SaaS, use the URL 'https://cloud.kili-technology.com/api/label/v2/graphql'
api_endpoint = os.getenv("KILI_API_ENDPOINT")
kili = Kili(api_endpoint=api_endpoint)

## Project creation
project = kili.create_project(
    description="Demo Video project",
    input_type="VIDEO",
    json_interface=interface,
    title="[Kili SDK Notebook]: Frame DICOM data",
)
project_id = project["id"]
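The cell above relies on environment variables. Below is a minimal sketch, assuming your API key is stored in `KILI_API_KEY` (as far as we know, the variable the Kili client reads when `api_key` is not passed) and falling back to the SaaS endpoint quoted in the comment above. It fails early with a clear message when the key is missing; `resolve_kili_settings` is a hypothetical helper.

```python
import os

# The SaaS endpoint quoted in the authentication cell
DEFAULT_ENDPOINT = "https://cloud.kili-technology.com/api/label/v2/graphql"


def resolve_kili_settings(environ=None):
    """Return (api_endpoint, api_key) read from environment variables.

    Raises RuntimeError if KILI_API_KEY is missing or empty.
    """
    environ = os.environ if environ is None else environ
    api_key = environ.get("KILI_API_KEY")
    if not api_key:
        raise RuntimeError("Set the KILI_API_KEY environment variable before creating the client")
    return environ.get("KILI_API_ENDPOINT", DEFAULT_ENDPOINT), api_key
```

You could then write `api_endpoint, api_key = resolve_kili_settings()` followed by `kili = Kili(api_endpoint=api_endpoint, api_key=api_key)`.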

## Importing images

Finally, let's import the volumes using `append_many_to_dataset` (`appendManyToDataset` in the GraphQL API; see [link](https://staging.cloud.kili-technology.com/docs/python-graphql-api/python-api/#append_many_to_dataset)). The key argument is `json_content_array`, a list of lists of strings: each element is the list of URLs or paths pointing to the images (frames) of one volume, in slice order.
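To make the expected shape concrete, here is a minimal sketch with a hypothetical `build_import_arrays` helper. It turns a dictionary like `sorted_images` (external ID mapped to ordered frame paths) into the two parallel lists passed as `external_id_array` and `json_content_array`; the optional `to_content` callback can convert paths to URLs.

```python
def build_import_arrays(images_by_asset, to_content=None):
    """Turn {external_id: [frame1, frame2, ...]} into two parallel lists.

    contents[i] holds the ordered frames of external_ids[i].
    """
    external_ids = list(images_by_asset.keys())
    contents = [
        list(to_content(frames)) if to_content else list(frames)
        for frames in images_by_asset.values()
    ]
    # Each asset must be a list of strings (one path or URL per frame)
    for external_id, frames in zip(external_ids, contents):
        if not all(isinstance(frame, str) for frame in frames):
            raise ValueError(f"{external_id}: every frame must be a string path or URL")
    return external_ids, contents


example = {
    "asset-1": ["/data/s1/1-001.jpeg", "/data/s1/1-002.jpeg"],
    "asset-2": ["/data/s2/1-001.jpeg"],
}
ids, content = build_import_arrays(example)
```

In this notebook, `build_import_arrays(sorted_images, files_to_urls)` would produce the same arrays as the URL-based import cell.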

To demonstrate how this works with URLs (as you would with images hosted in the cloud), let's first serve these images locally:

In [None]:
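import sys

# Serve ASSET_ROOT over HTTP from a background process. Passing the command as a list
# (instead of a shell string) keeps paths containing spaces intact.
subprocess.Popen(
    [sys.executable, "-m", "http.server", "8001", "--directory", ASSET_ROOT],
    close_fds=True,
)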
ROOT_URL = "http://localhost:8001/"
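`subprocess.Popen` leaves the server running as a separate process. If you prefer a server you can stop from the notebook, here is a minimal sketch that runs the standard library's `http.server.ThreadingHTTPServer` in a background thread. `serve_directory` is a hypothetical helper, and port 0 lets the operating system pick a free port.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=0):
    """Serve `directory` over HTTP from a background thread.

    Returns (server, root_url). Call server.shutdown() and server.server_close()
    when the files no longer need to be served.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("localhost", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    actual_port = server.server_address[1]  # The port actually bound (useful when port=0)
    return server, f"http://localhost:{actual_port}/"
```

For example, `server, ROOT_URL = serve_directory(ASSET_ROOT)` replaces the two previous lines. Keep in mind that `localhost` URLs only resolve on the machine running the server.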

In [None]:
def files_to_urls(files):
    # Build each URL from the path relative to ASSET_ROOT, the directory served above
    return [ROOT_URL + os.path.relpath(f, ASSET_ROOT).replace(os.sep, "/") for f in files]

In [None]:
kili.append_many_to_dataset(
    project_id=project_id,
    external_id_array=list(sorted_images.keys()),
    json_content_array=list(map(files_to_urls, sorted_images.values())),
)

Alternatively, as mentioned above, you can pass the local paths to your images directly:

In [None]:
kili.append_many_to_dataset(
    project_id=project_id,
    external_id_array=list(map(lambda key: f"local-path-{key}", sorted_images.keys())),
    json_content_array=list(sorted_images.values()),
)

## Back to the interface

We can check that our assets were imported (3 volumes, each imported twice: once from URLs and once from local paths, hence 6 assets)...

In [None]:
ds_size = kili.count_assets(project_id=project_id)
print(ds_size)
assert ds_size == 6

![assets_inserted](img/assets_inserted.png)

...and we can now annotate them!

![frame_annotation](img/frame_annotation.png)

In [None]:
kili.delete_project(project_id=project_id)