<a href="https://colab.research.google.com/github/kili-technology/kili-python-sdk/blob/main/recipes/importing_coco.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to import COCO annotations into Kili

In this tutorial, we will demonstrate how to import [COCO](https://cocodataset.org/) annotations into Kili.

## Setup

Let's start by installing Kili:

In [None]:
%pip install kili[image-utils] numpy

In [None]:
import getpass
import json
import os
from pprint import pprint

import numpy as np

from kili.client import Kili

## Data collection

We will use a subset of the `val2017` split of the [COCO dataset](https://cocodataset.org/#download) to illustrate how to import COCO annotations into Kili. The full dataset can be downloaded from the same page.

In [None]:
!curl https://raw.githubusercontent.com/kili-technology/kili-python-sdk/main/recipes/datasets/coco2017/annotations/captions_val2017_filtered.json --output captions_val2017_filtered.json
!curl https://raw.githubusercontent.com/kili-technology/kili-python-sdk/main/recipes/datasets/coco2017/annotations/instances_val2017_filtered.json --output instances_val2017_filtered.json
!curl https://raw.githubusercontent.com/kili-technology/kili-python-sdk/main/recipes/datasets/coco2017/annotations/person_keypoints_val2017_filtered.json --output person_keypoints_val2017_filtered.json

## COCO format

The format is described [here](https://cocodataset.org/#format-data).

The file `instances_val2017_filtered.json` contains the following keys:

In [None]:
with open("instances_val2017_filtered.json") as f:
    instances_val2017 = json.load(f)
print(instances_val2017.keys())

dict_keys(['annotations', 'categories', 'images', 'info', 'licenses'])


Each annotation contains the ID of the image it belongs to, the category ID, the segmentation, and the bounding box (as `[x, y, width, height]` in pixels):

In [None]:
pprint(instances_val2017["annotations"][0])

{'area': 88.52115000000006,
 'bbox': [102.49, 118.47, 7.9, 17.31],
 'category_id': 64,
 'id': 22328,
 'image_id': 37777,
 'iscrowd': 0,
 'segmentation': [[110.39,
                   135.78,
                   110.39,
                   127.62,
                   110.01,
                   119.6,
                   106.87,
                   118.47,
                   104.37,
                   120.1,
                   102.49,
                   122.73,
                   103.74,
                   125.49,
                   105.24,
                   128.88,
                   106.87,
                   132.39,
                   107.38,
                   135.78,
                   110.39,
                   135.65]]}
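
To make the two geometry fields concrete, here is a minimal plain-Python sketch of the arithmetic involved: a COCO `bbox` is `[x, y, width, height]` with the origin at the top-left corner, and a polygon `segmentation` is a flat list `[x1, y1, x2, y2, ...]`. Dividing by the image size gives coordinates in `[0, 1]`. The function names, the corner ordering, and the 640x480 image size are assumptions made for illustration; this is not the Kili SDK helper used later in the tutorial.

```python
def coco_bbox_to_normalized_corners(bbox, img_width, img_height):
    """Turn a COCO [x, y, width, height] box into four normalized corners.

    The corner order (bottom-left, top-left, top-right, bottom-right) is an
    arbitrary choice for this sketch.
    """
    x, y, w, h = bbox
    corners = [(x, y + h), (x, y), (x + w, y), (x + w, y + h)]
    return [{"x": cx / img_width, "y": cy / img_height} for cx, cy in corners]


def coco_polygon_to_normalized_points(flat_polygon, img_width, img_height):
    """Pair up a flat [x1, y1, x2, y2, ...] polygon and normalize each point."""
    xs = flat_polygon[0::2]
    ys = flat_polygon[1::2]
    return [{"x": px / img_width, "y": py / img_height} for px, py in zip(xs, ys)]


# Values taken from the annotation printed above; the image size is hypothetical.
IMG_WIDTH, IMG_HEIGHT = 640, 480
corners = coco_bbox_to_normalized_corners(
    [102.49, 118.47, 7.9, 17.31], IMG_WIDTH, IMG_HEIGHT
)
points = coco_polygon_to_normalized_points(
    [110.39, 135.78, 110.39, 127.62, 110.01, 119.6], IMG_WIDTH, IMG_HEIGHT
)
print(corners)
print(points)
```

In the real import, the image width and height come from the matching entry in the `images` list of the annotation file.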


We can print the categories of COCO this way:

In [None]:
for category in instances_val2017["categories"]:
    print(category["id"], category["name"])

1 person
2 bicycle
3 car
4 motorcycle
5 airplane
6 bus
7 train
8 truck
9 boat
10 traffic light
11 fire hydrant
13 stop sign
14 parking meter
15 bench
16 bird
17 cat
18 dog
19 horse
20 sheep
21 cow
22 elephant
23 bear
24 zebra
25 giraffe
27 backpack
28 umbrella
31 handbag
32 tie
33 suitcase
34 frisbee
35 skis
36 snowboard
37 sports ball
38 kite
39 baseball bat
40 baseball glove
41 skateboard
42 surfboard
43 tennis racket
44 bottle
46 wine glass
47 cup
48 fork
49 knife
50 spoon
51 bowl
52 banana
53 apple
54 sandwich
55 orange
56 broccoli
57 carrot
58 hot dog
59 pizza
60 donut
61 cake
62 chair
63 couch
64 potted plant
65 bed
67 dining table
70 toilet
72 tv
73 laptop
74 mouse
75 remote
76 keyboard
77 cell phone
78 microwave
79 oven
80 toaster
81 sink
82 refrigerator
84 book
85 clock
86 vase
87 scissors
88 teddy bear
89 hair drier
90 toothbrush


The file `captions_val2017_filtered.json` contains transcription data:

In [None]:
with open("captions_val2017_filtered.json") as f:
    captions_val2017 = json.load(f)
print(captions_val2017.keys())

dict_keys(['annotations', 'images', 'info', 'licenses'])


In [None]:
print(captions_val2017["annotations"][0])

{'caption': 'A small closed toilet in a cramped space.', 'id': 441, 'image_id': 331352}


In this dataset, each image has 5 captions given by different annotators.
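
Since the captions are stored as a flat list, it can help to group them by `image_id` before working with them. The sketch below shows one way to do that with a `defaultdict`. The helper name `group_captions_by_image` is hypothetical, and only the first sample caption comes from the file; the other two are made up for illustration.

```python
from collections import defaultdict


def group_captions_by_image(caption_annotations):
    """Map each image_id to the list of its caption strings, in file order."""
    captions_by_image = defaultdict(list)
    for ann in caption_annotations:
        captions_by_image[ann["image_id"]].append(ann["caption"])
    return dict(captions_by_image)


sample_annotations = [
    {"caption": "A small closed toilet in a cramped space.", "id": 441, "image_id": 331352},
    # The two entries below are invented for this example.
    {"caption": "A hypothetical second caption.", "id": 442, "image_id": 331352},
    {"caption": "A hypothetical caption for another image.", "id": 443, "image_id": 1},
]

grouped = group_captions_by_image(sample_annotations)
for image_id, captions in grouped.items():
    print(image_id, len(captions), captions[0])
```

On the real data, `group_captions_by_image(captions_val2017["annotations"])` would give one list of captions per image.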

The file `person_keypoints_val2017_filtered.json` contains keypoints data:

In [None]:
with open("person_keypoints_val2017_filtered.json") as f:
    person_keypoints_val2017 = json.load(f)
print(person_keypoints_val2017.keys())

dict_keys(['annotations', 'categories', 'images', 'info', 'licenses'])


In [None]:
pprint(person_keypoints_val2017["annotations"][0])

{'area': 17376.91885,
 'bbox': [388.66, 69.92, 109.41, 277.62],
 'category_id': 1,
 'id': 200887,
 'image_id': 397133,
 'iscrowd': 0,
 'keypoints': [433,
               94,
               2,
               434,
               90,
               2,
               0,
               0,
               0,
               443,
               98,
               2,
               0,
               0,
               0,
               420,
               128,
               2,
               474,
               133,
               2,
               396,
               162,
               2,
               489,
               173,
               2,
               0,
               0,
               0,
               0,
               0,
               0,
               419,
               214,
               2,
               458,
               215,
               2,
               411,
               274,
               2,
               458,
               273,
               2,
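
In the COCO format, the `keypoints` list is a flat sequence of `(x, y, v)` triples, where `v` is a visibility flag: `0` means the keypoint is not labeled, `1` labeled but not visible, and `2` labeled and visible. A minimal sketch for reading it follows; the helper names are hypothetical and the input is just the first four triples of the annotation printed above.

```python
def split_keypoints(flat_keypoints):
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into dictionaries."""
    triples = zip(flat_keypoints[0::3], flat_keypoints[1::3], flat_keypoints[2::3])
    return [{"x": x, "y": y, "visibility": v} for x, y, v in triples]


def labeled_keypoints(flat_keypoints):
    """Keep only keypoints that were actually labeled (visibility flag > 0)."""
    return [kp for kp in split_keypoints(flat_keypoints) if kp["visibility"] > 0]


# First four triples of the annotation shown above.
first_keypoints = [433, 94, 2, 434, 90, 2, 0, 0, 0, 443, 98, 2]

all_points = split_keypoints(first_keypoints)
visible_points = labeled_keypoints(first_keypoints)
print(len(all_points), len(visible_points))
```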
               

## Kili project creation

Let's create the Kili project that will contain the images and annotations of the COCO dataset.

In [None]:
if "KILI_API_KEY" not in os.environ:
    KILI_API_KEY = getpass.getpass("Please enter your API key: ")
else:
    KILI_API_KEY = os.environ["KILI_API_KEY"]

In [None]:
kili = Kili(
    api_key=KILI_API_KEY,  # no need to pass the API_KEY if it is already in your environment variables
    # api_endpoint="https://cloud.kili-technology.com/api/label/v2/graphql",
    # the line above can be uncommented and changed if you are working with an on-premise version of Kili
)

In [None]:
json_interface = {"jobs": {}}

In [None]:
json_interface["jobs"]["TRANSCRIPTION_JOB"] = {
    "content": {"input": "textField"},
    "instruction": "Caption",
    "mlTask": "TRANSCRIPTION",
    "required": 1,
    "isChild": False,
}

In [None]:
category_id_to_name = {
    category["id"]: category["name"] for category in instances_val2017["categories"]
}

In [None]:
categories = {
    category["name"]: {"children": [], "name": category["name"], "id": category["id"]}
    for category in instances_val2017["categories"]
}
pprint(categories)

{'airplane': {'children': [], 'id': 5, 'name': 'airplane'},
 'apple': {'children': [], 'id': 53, 'name': 'apple'},
 'backpack': {'children': [], 'id': 27, 'name': 'backpack'},
 'banana': {'children': [], 'id': 52, 'name': 'banana'},
 'baseball bat': {'children': [], 'id': 39, 'name': 'baseball bat'},
 'baseball glove': {'children': [], 'id': 40, 'name': 'baseball glove'},
 'bear': {'children': [], 'id': 23, 'name': 'bear'},
 'bed': {'children': [], 'id': 65, 'name': 'bed'},
 'bench': {'children': [], 'id': 15, 'name': 'bench'},
 'bicycle': {'children': [], 'id': 2, 'name': 'bicycle'},
 'bird': {'children': [], 'id': 16, 'name': 'bird'},
 'boat': {'children': [], 'id': 9, 'name': 'boat'},
 'book': {'children': [], 'id': 84, 'name': 'book'},
 'bottle': {'children': [], 'id': 44, 'name': 'bottle'},
 'bowl': {'children': [], 'id': 51, 'name': 'bowl'},
 'broccoli': {'children': [], 'id': 56, 'name': 'broccoli'},
 'bus': {'children': [], 'id': 6, 'name': 'bus'},
 'cake': {'children': [], 'id

In [None]:
json_interface["jobs"]["OBJECT_DETECTION_JOB"] = {
    "content": {"categories": categories, "input": "radio"},
    "instruction": "BBox",
    "mlTask": "OBJECT_DETECTION",
    "required": 0,
    "tools": ["rectangle"],
    "isChild": False,
}

In [None]:
json_interface["jobs"]["SEGMENTATION_JOB"] = {
    "content": {"categories": categories, "input": "radio"},
    "instruction": "Segment",
    "mlTask": "OBJECT_DETECTION",
    "required": 0,
    "tools": ["semantic"],
    "isChild": False,
}

In [None]:
project = kili.create_project(
    title="[Kili SDK Notebook]: COCO 2017",
    input_type="IMAGE",
    json_interface=json_interface,
)

## Importing images

Now that our project is created, let's import the images:

In [None]:
content_array = []
external_id_array = []
for image in instances_val2017["images"]:
    content_array.append(image["flickr_url"].replace("http://", "https://"))
    external_id_array.append(str(image["id"]))

In [None]:
kili.append_many_to_dataset(
    project["id"], content_array=content_array, external_id_array=external_id_array
)

100%|██████████| 10/10 [00:01<00:00,  8.93it/s]


{'id': 'cljilughf00kc0j5yh367gp3v'}

![image.png](attachment:02ee1a22-27b9-4889-a87b-2b31a91aa8fa.png)

## Importing annotations

In [None]:
import itertools

from kili.utils.labels.bbox import (
    bbox_points_to_normalized_vertices,
    point_to_normalized_point,
)
from kili.utils.labels.image import mask_to_normalized_vertices

In [None]:
json_response_array = []

for image_id in external_id_array:
    img_info = [img for img in instances_val2017["images"] if img["id"] == int(image_id)][0]
    img_width = img_info["width"]
    img_height = img_info["height"]

    # json response contains the label data for the image
    json_resp = {}

    # Transcription job
    img_captions = [
        ann for ann in captions_val2017["annotations"] if ann["image_id"] == int(image_id)
    ]
    json_resp["TRANSCRIPTION_JOB"] = {
        "text": img_captions[0]["caption"]  # only take the 1st caption
    }

    # Object detection and segmentation jobs
    coco_annotations = [
        ann for ann in instances_val2017["annotations"] if ann["image_id"] == int(image_id)
    ]
    kili_annotations_bbox = []
    kili_annotations_segm = []
    for coco_ann in coco_annotations:
        # Object detection job
        x, y, width, height = coco_ann["bbox"]
        kili_ann_bbox = {
            "children": {},
            "boundingPoly": [
                {
                    "normalizedVertices": bbox_points_to_normalized_vertices(
                        bottom_left={"x": x, "y": y + height},
                        bottom_right={"x": x + width, "y": y + height},
                        top_right={"x": x + width, "y": y},
                        top_left={"x": x, "y": y},
                        img_height=img_height,
                        img_width=img_width,
                        origin_location="top_left",
                    )
                }
            ],
            "categories": [{"name": category_id_to_name[coco_ann["category_id"]]}],
            "type": "rectangle",
            "mid": str(coco_ann["id"]) + "_bbox",
        }
        kili_annotations_bbox.append(kili_ann_bbox)

        # Segmentation job
        coco_segmentations = coco_ann["segmentation"]
        if coco_ann["iscrowd"] == 0:
            # a single object (iscrowd=0 in which case polygons are used)
            for coco_segm in coco_segmentations:
                kili_ann_segm = {
                    "children": {},
                    "boundingPoly": [
                        {
                            "normalizedVertices": [
                                point_to_normalized_point(
                                    point={"x": x, "y": y},
                                    img_height=img_height,
                                    img_width=img_width,
                                    origin_location="top_left",
                                )
                                for x, y in itertools.zip_longest(*[iter(coco_segm)] * 2)
                            ]
                        }
                    ],
                    "categories": [{"name": category_id_to_name[coco_ann["category_id"]]}],
                    "type": "semantic",
                    "mid": str(coco_ann["id"]) + "_segm",
                }
                kili_annotations_segm.append(kili_ann_segm)
        else:
            # a crowd (iscrowd=1 in which case RLE (run-length encoding) is used)
            rle_counts = coco_segmentations["counts"]
            # we work with a flat image to simplify the code
            mask = np.zeros(img_height * img_width, dtype=np.uint8)
            pixel_index = 0
            for i, count in enumerate(rle_counts):
                if i % 2 == 0:
                    # we skip pixels
                    pixel_index += count
                else:
                    # we set pixels' value
                    mask[pixel_index : pixel_index + count] = 255
                    pixel_index += count

            # COCO RLE runs are stored in column-major order, so we reshape
            # the flat mask to (width, height) and transpose it to get the
            # same shape as the image, i.e. (height, width)
            mask = mask.reshape((img_width, img_height)).T

            # we convert the mask to normalized vertices
            # hierarchy is not used here. It is used for polygons with holes.
            normalized_vertices, hierarchy = mask_to_normalized_vertices(mask)
            for contour in normalized_vertices:
                kili_ann_segm = {
                    "children": {},
                    "boundingPoly": [{"normalizedVertices": contour}],
                    "categories": [{"name": category_id_to_name[coco_ann["category_id"]]}],
                    "type": "semantic",
                    "mid": str(coco_ann["id"]) + "_segm_crowd",
                }
                kili_annotations_segm.append(kili_ann_segm)

    json_resp["OBJECT_DETECTION_JOB"] = {"annotations": kili_annotations_bbox}
    json_resp["SEGMENTATION_JOB"] = {"annotations": kili_annotations_segm}

    json_response_array.append(json_resp)
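
The crowd branch above decodes uncompressed run-length encoding by hand. The standalone sketch below isolates that step on a tiny 2x3 mask so the column-major layout is easier to see. The function name `decode_uncompressed_rle` is hypothetical, and the sketch assumes `counts` is a list of integers (uncompressed RLE), with runs alternating between background and foreground and starting with background.

```python
import numpy as np


def decode_uncompressed_rle(counts, img_height, img_width):
    """Decode uncompressed COCO RLE counts into a (height, width) uint8 mask."""
    flat = np.zeros(img_height * img_width, dtype=np.uint8)
    position = 0
    for run_index, run_length in enumerate(counts):
        if run_index % 2 == 1:
            # Odd-indexed runs are foreground pixels.
            flat[position : position + run_length] = 255
        # Even-indexed runs are background and are simply skipped.
        position += run_length
    # Runs go down each column first, hence reshape to (width, height) and transpose.
    return flat.reshape((img_width, img_height)).T


# 1 background pixel, 2 foreground pixels, 3 background pixels on a 2x3 image.
mask = decode_uncompressed_rle([1, 2, 3], img_height=2, img_width=3)
print(mask)
# [[  0 255   0]
#  [255   0   0]]
```

The two foreground pixels land at the bottom of the first column and the top of the second, which is what column-major ordering implies. Reshaping with `order="F"` to `(img_height, img_width)` would give the same result.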

In [None]:
kili.append_labels(
    asset_external_id_array=external_id_array,
    project_id=project["id"],
    json_response_array=json_response_array,
)

100%|██████████| 10/10 [00:01<00:00,  9.32it/s]


[{'id': 'cljinqsjc00050jb98m9c58j7'},
 {'id': 'cljinqsjc00060jb90cgb68bs'},
 {'id': 'cljinqsjd00070jb99t2bg1po'},
 {'id': 'cljinqsjd00080jb947f6et8k'},
 {'id': 'cljinqsjd00090jb9d6nl4mj9'},
 {'id': 'cljinqsjd000a0jb98k28cdmu'},
 {'id': 'cljinqsjd000b0jb98hoofpwf'},
 {'id': 'cljinqsjd000c0jb9elhih8la'},
 {'id': 'cljinqsjd000d0jb9733ldicu'},
 {'id': 'cljinqsjd000e0jb9emvuclwn'}]