<a href="https://colab.research.google.com/github/kili-technology/kili-python-sdk/blob/master/recipes/label_parsing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to use the label parser

In [None]:
%pip install kili

In [None]:
from kili.client import Kili
from kili.utils.labels.parsing import ParsedLabel

In [None]:
kili = Kili()

## `ParsedLabel` class

The `ParsedLabel` class represents a Kili label.

This class directly inherits from `dict`, and thus behaves like a dictionary.

In [None]:
print(ParsedLabel.__bases__[0])

<class 'dict'>


Converting a label to a `ParsedLabel` is as simple as:

In [None]:
json_interface = {
    "jobs": {
        "CLASSIFICATION_JOB": {
            "content": {
                "categories": {
                    "A": {"children": [], "name": "A"},
                    "B": {"children": [], "name": "B"},
                },
                "input": "radio",
            },
            "instruction": "Class",
            "mlTask": "CLASSIFICATION",
            "required": 1,
            "isChild": False,
        }
    }
}

my_label = {
    "author": {"email": "first.last@kili-technology.com", "id": "123456"},
    "id": "clh0fsi9u0tli0j666l4sfhpz",
    "jsonResponse": {"CLASSIFICATION_JOB": {"categories": [{"confidence": 100, "name": "A"}]}},
    "labelType": "DEFAULT",
    "secondsToLabel": 5,
}

my_parsed_label = ParsedLabel(my_label, json_interface=json_interface, input_type="IMAGE")

In [None]:
print(my_parsed_label["author"]["email"])

first.last@kili-technology.com

The `jsonResponse` dict key is not accessible anymore:

In [None]:
try:
    my_parsed_label["jsonResponse"]
except KeyError as err:
    print(f"The key {err} is not accessible anymore.")

The key 'jsonResponse' is not accessible anymore.


It is replaced by the `.jobs` attribute.
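To make this behavior concrete, here is a minimal sketch of how a `dict` subclass can hide a key and expose its content as an attribute. `ToyParsedLabel` is a hypothetical stand-in written for this illustration only: it is not the Kili implementation and it does not parse anything.

```python
class ToyParsedLabel(dict):
    """Hypothetical stand-in for illustration only (not the Kili class)."""

    def __init__(self, label: dict) -> None:
        # Shallow-copy the label so that the caller's dict keeps its keys.
        super().__init__(label)
        # Remove the raw response from the dict keys and expose it as an attribute.
        self.jobs = self.pop("jsonResponse", {})


toy_label = ToyParsedLabel(
    {
        "id": "clh0fsi9u0tli0j666l4sfhpz",
        "jsonResponse": {"CLASSIFICATION_JOB": {"categories": [{"confidence": 100, "name": "A"}]}},
    }
)

print("jsonResponse" in toy_label)  # False
print(toy_label["id"])  # clh0fsi9u0tli0j666l4sfhpz
print(toy_label.jobs["CLASSIFICATION_JOB"]["categories"][0]["name"])  # A
```

The other keys remain reachable with the usual dictionary syntax, while the job data lives only under `.jobs`.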

## `.jobs` attribute

The `.jobs` attribute of a `ParsedLabel` class is a dictionary-like object that contains the parsed labels.

The keys are the names of the jobs, and the values are the parsed job responses.

Let's create a simple Kili project to illustrate this.

## Classification job

We define a JSON interface with two classification jobs:

- a single-class classification job, with name `SINGLE_CLASS_JOB` and three categories `A`, `B` and `C`
- a multi-class classification job, with name `MULTI_CLASS_JOB` and three categories `D`, `E` and `F`.

In [None]:
json_interface = {
    "jobs": {
        "SINGLE_CLASS_JOB": {
            "content": {
                "categories": {
                    "A": {"children": [], "name": "A"},
                    "B": {"children": [], "name": "B"},
                    "C": {"children": [], "name": "C"},
                },
                "input": "radio",
            },
            "instruction": "Class",
            "mlTask": "CLASSIFICATION",
            "required": 1,
            "isChild": False,
        },
        "MULTI_CLASS_JOB": {
            "content": {
                "categories": {
                    "D": {"children": [], "name": "D"},
                    "E": {"children": [], "name": "E"},
                    "F": {"children": [], "name": "F"},
                },
                "input": "checkbox",
            },
            "instruction": "Class",
            "mlTask": "CLASSIFICATION",
            "required": 1,
            "isChild": False,
        },
    }
}
project_id = kili.create_project(
    input_type="TEXT", json_interface=json_interface, title="Label parsing tutorial"
)["id"]

We also upload some assets to the project:

In [None]:
kili.append_many_to_dataset(
    project_id,
    content_array=["text1", "text2", "text3"],
    external_id_array=["asset1", "asset2", "asset3"],
);

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.43it/s]


{'id': 'clh0hbf8e12hv0j960bzya8wm'}

Once the assets are uploaded, we can start labeling them manually through the Kili UI.

For this tutorial, we will simply upload existing labels.

In [None]:
labels_to_upload = [
    {
        "SINGLE_CLASS_JOB": {"categories": [{"confidence": 75, "name": "A"}]},
        "MULTI_CLASS_JOB": {
            "categories": [{"confidence": 1, "name": "D"}, {"confidence": 1, "name": "E"}]
        },
    },
    {
        "SINGLE_CLASS_JOB": {"categories": [{"confidence": 50, "name": "B"}]},
        "MULTI_CLASS_JOB": {
            "categories": [{"confidence": 2, "name": "E"}, {"confidence": 2, "name": "F"}]
        },
    },
    {
        "SINGLE_CLASS_JOB": {"categories": [{"confidence": 25, "name": "C"}]},
        "MULTI_CLASS_JOB": {
            "categories": [{"confidence": 3, "name": "F"}, {"confidence": 3, "name": "D"}]
        },
    },
]
kili.append_labels(
    json_response_array=labels_to_upload,
    project_id=project_id,
    asset_external_id_array=["asset1", "asset2", "asset3"],
);

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.54it/s]


[{'id': 'clh0hbi8u12j30j965fe86ss6'},
 {'id': 'clh0hbi8u12j40j969nvx8vj5'},
 {'id': 'clh0hbi8u12j50j96holkcb2j'}]
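When labels are uploaded programmatically like this, it can help to check that every category name exists in the JSON interface. The sketch below does that check on plain dictionaries. `find_unknown_categories` is a hypothetical helper, not part of the Kili SDK, and the interface is a trimmed-down copy that keeps only the fields the helper reads.

```python
def find_unknown_categories(json_interface: dict, json_response: dict) -> list:
    """Return (job_name, category_name) pairs that the interface does not define."""
    unknown = []
    for job_name, job_response in json_response.items():
        job = json_interface["jobs"].get(job_name)
        # Categories defined for this job (empty if the job itself is unknown).
        allowed = set(job["content"]["categories"]) if job else set()
        for category in job_response.get("categories", []):
            if category["name"] not in allowed:
                unknown.append((job_name, category["name"]))
    return unknown


# Trimmed-down interface (assumption: only these fields matter for the check).
toy_interface = {
    "jobs": {
        "SINGLE_CLASS_JOB": {"content": {"categories": {"A": {}, "B": {}, "C": {}}}},
        "MULTI_CLASS_JOB": {"content": {"categories": {"D": {}, "E": {}, "F": {}}}},
    }
}
valid_response = {
    "SINGLE_CLASS_JOB": {"categories": [{"confidence": 75, "name": "A"}]},
    "MULTI_CLASS_JOB": {
        "categories": [{"confidence": 1, "name": "D"}, {"confidence": 1, "name": "E"}]
    },
}
invalid_response = {"SINGLE_CLASS_JOB": {"categories": [{"confidence": 75, "name": "Z"}]}}

print(find_unknown_categories(toy_interface, valid_response))  # []
print(find_unknown_categories(toy_interface, invalid_response))  # [('SINGLE_CLASS_JOB', 'Z')]
```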

When querying labels using `kili.labels()`, it is possible to automatically parse the labels using the `output_format` argument:

In [None]:
# labels is a list of ParsedLabel objects
labels = kili.labels(project_id, output_format="parsed_label")

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  3.20it/s]


In [None]:
print(len(labels))

3


In [None]:
print(type(labels[0]))

<class 'kili.utils.labels.parsing.ParsedLabel'>


Using the `.jobs` attribute with the job name, one can access the label's data:

In [None]:
print(labels[0].jobs["SINGLE_CLASS_JOB"])

{'categories': [{'name': 'A', 'confidence': 75}]}


In [None]:
print(labels[0].jobs["SINGLE_CLASS_JOB"].categories)

[{'name': 'A', 'confidence': 75}]


In [None]:
print(labels[0].jobs["SINGLE_CLASS_JOB"].categories[0].name)

A


Since `SINGLE_CLASS_JOB` is a single-category classification job, the `.category` attribute is available, and is an alias for `.categories[0]`:

In [None]:
print(labels[0].jobs["SINGLE_CLASS_JOB"].category.name)
print(labels[0].jobs["SINGLE_CLASS_JOB"].category.confidence)

A
75


The `.category` attribute is not available for multi-category classification jobs, and accessing it raises an error:

In [None]:
try:
    print(labels[0].jobs["MULTI_CLASS_JOB"].category.name)
except Exception as err:
    print("Error: ", err)

Error:  The attribute 'category' is not compatible with the job.
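One way to picture this restriction is a property that only resolves for single-choice jobs. The sketch below is a guess at the general idea, not the Kili implementation: `ToyJobResponse` is hypothetical, the use of `AttributeError` is an assumption (the tutorial does not state which exception type is raised), and so is the rule that `"radio"` inputs are the single-category ones.

```python
class ToyJobResponse:
    """Hypothetical stand-in for a parsed classification job response."""

    def __init__(self, categories: list, input_type: str) -> None:
        self.categories = categories
        # "radio" or "checkbox", as in the "input" field of the JSON interface.
        self._input_type = input_type

    @property
    def category(self) -> dict:
        # Assumption: only "radio" jobs are guaranteed to hold a single category.
        if self._input_type != "radio":
            raise AttributeError("The attribute 'category' is not compatible with the job.")
        return self.categories[0]


single = ToyJobResponse([{"name": "A", "confidence": 75}], input_type="radio")
multi = ToyJobResponse(
    [{"name": "D", "confidence": 1}, {"name": "E", "confidence": 1}], input_type="checkbox"
)

print(single.category["name"])  # A
try:
    multi.category
except AttributeError as err:
    print("Error:", err)
```

Failing loudly here avoids silently returning only the first of several selected categories.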


It is also possible to iterate over the jobs of each label, since `.jobs` supports `.items()` like a dictionary:

In [None]:
for i, label in enumerate(labels):
    print(f"\nLabel {i}")
    for job_name, job_data in label.jobs.items():
        print(job_name, ": ", job_data.categories)


Label 0
SINGLE_CLASS_JOB :  [{'name': 'A', 'confidence': 75}]
MULTI_CLASS_JOB :  [{'name': 'D', 'confidence': 1}, {'name': 'E', 'confidence': 1}]

Label 1
SINGLE_CLASS_JOB :  [{'name': 'B', 'confidence': 50}]
MULTI_CLASS_JOB :  [{'name': 'E', 'confidence': 2}, {'name': 'F', 'confidence': 2}]

Label 2
SINGLE_CLASS_JOB :  [{'name': 'C', 'confidence': 25}]
MULTI_CLASS_JOB :  [{'name': 'F', 'confidence': 3}, {'name': 'D', 'confidence': 3}]
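The same nested iteration can feed simple statistics, such as how often each category was selected per job. The sketch below works on plain-dict copies of the values printed above so that it runs without a Kili project; with real parsed labels, the loop would read `label.jobs.items()` and `job_data.categories` instead. The variable names are illustrative.

```python
from collections import Counter

# Plain-dict copies of the job data printed above, one entry per label.
jobs_per_label = [
    {
        "SINGLE_CLASS_JOB": [{"name": "A", "confidence": 75}],
        "MULTI_CLASS_JOB": [{"name": "D", "confidence": 1}, {"name": "E", "confidence": 1}],
    },
    {
        "SINGLE_CLASS_JOB": [{"name": "B", "confidence": 50}],
        "MULTI_CLASS_JOB": [{"name": "E", "confidence": 2}, {"name": "F", "confidence": 2}],
    },
    {
        "SINGLE_CLASS_JOB": [{"name": "C", "confidence": 25}],
        "MULTI_CLASS_JOB": [{"name": "F", "confidence": 3}, {"name": "D", "confidence": 3}],
    },
]

# Count each (job name, category name) pair across all labels.
category_counts = Counter(
    (job_name, category["name"])
    for jobs in jobs_per_label
    for job_name, categories in jobs.items()
    for category in categories
)

print(category_counts[("SINGLE_CLASS_JOB", "A")])  # 1
print(category_counts[("MULTI_CLASS_JOB", "D")])  # 2
```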


## Convert to Python dict

A `ParsedLabel` is a custom class and is not serializable by default. However, it is possible to convert it to a Python dict using the `to_dict` method:

In [None]:
label = labels[0]
print(type(label))

<class 'kili.utils.labels.parsing.ParsedLabel'>


In [None]:
label_as_dict = label.to_dict()
print(type(label_as_dict))

<class 'dict'>
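The reason a plain `dict` is convenient is that standard tools such as the `json` module only know how to serialize built-in types. The sketch below illustrates this with a hypothetical `ToyCategory` class that has its own `to_dict` method; it says nothing about the exact structure returned by `ParsedLabel.to_dict()`.

```python
import json


class ToyCategory:
    """Hypothetical custom class, used only to illustrate serialization."""

    def __init__(self, name: str, confidence: int) -> None:
        self.name = name
        self.confidence = confidence

    def to_dict(self) -> dict:
        # Expose the object's data using built-in types only.
        return {"name": self.name, "confidence": self.confidence}


category = ToyCategory("A", 75)

# The json module rejects objects of unknown custom classes.
try:
    json.dumps(category)
except TypeError as err:
    print("Error:", err)

# Converting to a dict first makes the data serializable.
serialized = json.dumps(category.to_dict())
print(serialized)  # {"name": "A", "confidence": 75}
```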


## Object detection job

## Transcription job

## Video job

## Named entities recognition job

## Named entities recognition in PDF job

## Relation job

### Named entities relation job

### Object detection relation job

## Pose estimation job

## Children jobs