# How to export a Kili project

## Outline

This tutorial explains the multiple ways to export a Kili project. It describes:

 * Methods to export the labels one by one, after filtering
 * The solutions for performing a full-project export

The methods are illustrated with code snippets.

## Export methods

With Kili, once you have annotated enough assets, you can export the data programmatically to train a machine learning algorithm with it. There are several ways to do it:

 * Fetch the assets and/or the labels one by one using [`.assets`](https://python-sdk-docs.kili-technology.com/latest/sdk/asset/#kili.queries.asset.__init__.QueriesAsset.assets) or [`.labels`](https://python-sdk-docs.kili-technology.com/latest/sdk/label/#kili.queries.label.__init__.QueriesLabel.labels), perform the data transformation yourself and then write the data to one or several output files.
 * Export the whole project as a dataset. To do that, use the [`.export_labels`](https://python-sdk-docs.kili-technology.com/latest/sdk/label/#kili.queries.label.__init__.QueriesLabel.export_labels) method that creates an archive containing the labels in your chosen format.

## Preliminary steps

1) Fetch the project ID from the Kili UI (in Settings / Admin):
![Get Project ID from UI](../../assets/get_project_id.jpg)

2) Ensure that your Kili API key has been set as an environment variable:
```bash
export KILI_API_KEY=<YOUR_API_KEY>
```

3) If Kili has not been installed yet, install Kili.

In [None]:
!pip install --upgrade kili

4) Import packages and instantiate `Kili`:

In [71]:
from kili.client import Kili
from pathlib import Path

kili = Kili()

## Set up a project

In [72]:
# create project
json_interface_object_detection = {
    "jobs": {
        "JOB_0": {
            "content": {
                "categories": {
                    "OBJECT_A": {"children": [], "name": "Object A", "color": "#733AFB"},
                    "OBJECT_B": {"children": [], "name": "Object B", "color": "#3CD876"},
                },
                "input": "radio",
            },
            "instruction": "Categories",
            "isChild": False,
            "tools": ["rectangle"],
            "mlTask": "OBJECT_DETECTION",
            "models": {},
            "isVisible": True,
            "required": 1,
        }
    }
}
your_project_id = kili.create_project(
    input_type="IMAGE",
    json_interface=json_interface_object_detection,
    title="export kili project tutorial",
)["id"]
kili.update_properties_in_project(
    your_project_id,
    use_honeypot=True,
    honeypot_mark=0.95,
    consensus_mark=0.95,
    min_consensus_size=2,
    consensus_tot_coverage=1,
)

# add assets
assets = kili.append_many_to_dataset(
    project_id=your_project_id,
    content_array=[
        "https://storage.googleapis.com/label-public-staging/car/car_1.jpg",
        "https://storage.googleapis.com/label-public-staging/car/car_2.jpg",
    ],
    external_id_array=["car_1", "car_2"],
    disable_tqdm=True,
    is_honeypot_array=[True, True],
)
kili.update_properties_in_assets(
    external_ids=["car_1"], is_used_for_consensus_array=[True], project_id=your_project_id
)

# add members
roles = kili.append_to_roles(
    project_id=your_project_id, user_email="john.doe@kili-technology.com", role="LABELER"
)
john_doe_id = [
    user for user in roles["roles"] if user["user"]["email"] == "john.doe@kili-technology.com"
][0]["user"]["id"]
roles = kili.append_to_roles(
    project_id=your_project_id, user_email="john.smith@kili-technology.com", role="LABELER"
)
john_smith_id = [
    user for user in roles["roles"] if user["user"]["email"] == "john.smith@kili-technology.com"
][0]["user"]["id"]

# add honeypot label
json_response_ground_truth = {
    "JOB_0": {
        "annotations": [
            {
                "categories": [{"name": "OBJECT_A"}],
                "mid": "20230111125258113-44528",
                "type": "rectangle",
                "boundingPoly": [
                    {
                        "normalizedVertices": [
                            {"x": 0.6101435505380516, "y": 0.7689773770786136},
                            {"x": 0.6101435505380516, "y": 0.39426226491370664},
                            {"x": 0.8962087421313937, "y": 0.39426226491370664},
                            {"x": 0.8962087421313937, "y": 0.7689773770786136},
                        ]
                    }
                ],
                "polyline": [],
                "children": {},
            }
        ]
    }
}
kili.create_honeypot(
    json_response_ground_truth, asset_external_id="car_1", asset_id=None, project_id=your_project_id
)

# add some annotations
json_response_labeler_a = {
    "JOB_0": {
        "annotations": [
            {
                "boundingPoly": [
                    {
                        "normalizedVertices": [
                            {"x": 0.437473570026755, "y": 0.662407024077585},
                            {"x": 0.437473570026755, "y": 0.28769191191267807},
                            {"x": 0.7235387616200971, "y": 0.28769191191267807},
                            {"x": 0.7235387616200971, "y": 0.662407024077585},
                        ]
                    }
                ],
                "categories": [{"name": "OBJECT_A"}],
                "mid": "20230111125258113-44528",
                "type": "rectangle",
                "children": {},
            }
        ]
    }
}
json_response_labeler_b = {
    "JOB_0": {
        "annotations": [
            {
                "boundingPoly": [
                    {
                        "normalizedVertices": [
                            {"x": 0.537473570026755, "y": 0.662407024077585},
                            {"x": 0.537473570026755, "y": 0.28769191191267807},
                            {"x": 0.7235387616200971, "y": 0.28769191191267807},
                            {"x": 0.7235387616200971, "y": 0.662407024077585},
                        ]
                    }
                ],
                "categories": [{"name": "OBJECT_A"}],
                "mid": "20230111125258113-44528",
                "type": "rectangle",
                "children": {},
            }
        ]
    }
}
kili.append_labels(
    asset_external_id_array=["car_1", "car_1"],
    json_response_array=[json_response_labeler_a, json_response_labeler_b],
    disable_tqdm=True,
    author_id_array=[john_doe_id, john_smith_id],
    project_id=your_project_id,
);

## Exporting assets and labels one by one

The following subsections show how to fetch the assets of a project and select the labels to export.

### Exporting the latest labels per asset

First, fetch the assets:

In [73]:
assets = kili.assets(your_project_id, fields=["externalId", "latestLabel.jsonResponse"])


Now if you print an asset, you will see that you can access its `latestLabel`:

In [74]:
print(assets[0])

{'latestLabel': {'jsonResponse': {'JOB_0': {'annotations': [{'categories': [{'name': 'OBJECT_A'}], 'mid': '20230111125258113-44528', 'type': 'rectangle', 'boundingPoly': [{'normalizedVertices': [{'x': 0.6101435505380516, 'y': 0.7689773770786136}, {'x': 0.6101435505380516, 'y': 0.39426226491370664}, {'x': 0.8962087421313937, 'y': 0.39426226491370664}, {'x': 0.8962087421313937, 'y': 0.7689773770786136}]}], 'polyline': [], 'children': {}}]}}}, 'externalId': 'car_1'}


You can now read each label and, for example, write its category name to a text file:

In [75]:
for asset in assets:
    if asset["latestLabel"]:  # check if asset has annotations
        class_ = asset["latestLabel"]["jsonResponse"]["JOB_0"]["annotations"][0]["categories"][0][
            "name"
        ]
        with Path(asset["externalId"] + ".txt").open("w", encoding="utf-8") as f:
            f.write(class_)
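
The loop above reads only the first annotation of `JOB_0`. When an asset carries several bounding boxes, it may be more useful to collect every category name. The helper below is a minimal sketch, assuming the `jsonResponse` layout printed earlier; `category_names` and the sample data are hypothetical and not part of the Kili SDK.

```python
from typing import Any, Dict, List


def category_names(json_response: Dict[str, Any], job_name: str = "JOB_0") -> List[str]:
    """Return the category name of every annotation of a job, in order."""
    job = json_response.get(job_name, {})
    names = []
    for annotation in job.get("annotations", []):
        for category in annotation.get("categories", []):
            names.append(category["name"])
    return names


# Hypothetical sample shaped like the `latestLabel.jsonResponse` printed above.
sample_response = {
    "JOB_0": {
        "annotations": [
            {"categories": [{"name": "OBJECT_A"}], "type": "rectangle"},
            {"categories": [{"name": "OBJECT_B"}], "type": "rectangle"},
        ]
    }
}
print(category_names(sample_response))
```

An asset without a label for the job simply yields an empty list, so the caller can decide whether to skip the file or write an empty one.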

### Filtering specific labels per asset through the method filters

You can specify label filters directly in the [`.assets`](https://python-sdk-docs.kili-technology.com/latest/sdk/asset/#kili.queries.asset.__init__.QueriesAsset.assets) and the [`.labels`](https://python-sdk-docs.kili-technology.com/latest/sdk/label/#kili.queries.label.__init__.QueriesLabel.labels) methods. The available filters are listed in the argument list of each of these methods.

When done, you can write the conversion code to get the data in the format that you need.
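
As an illustration of such conversion code, the sketch below turns the `normalizedVertices` of a rectangle into a pixel bounding box. It assumes that coordinates are normalized to the image size with the origin at the top-left corner, and that the image width and height are known from elsewhere; `to_pixel_bbox` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def to_pixel_bbox(
    normalized_vertices: List[Dict[str, float]], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert normalized rectangle vertices to (left, top, width, height) in pixels."""
    xs = [vertex["x"] for vertex in normalized_vertices]
    ys = [vertex["y"] for vertex in normalized_vertices]
    left = round(min(xs) * image_width)
    top = round(min(ys) * image_height)
    right = round(max(xs) * image_width)
    bottom = round(max(ys) * image_height)
    return left, top, right - left, bottom - top


# Hypothetical rectangle, in the same vertex layout as the labels of this tutorial.
vertices = [
    {"x": 0.2, "y": 0.6},
    {"x": 0.2, "y": 0.2},
    {"x": 0.4, "y": 0.2},
    {"x": 0.4, "y": 0.6},
]
print(to_pixel_bbox(vertices, image_width=1000, image_height=500))
```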

**Get only the assets with a consensus mark above 0.5:**

In [76]:
assets = kili.assets(
    your_project_id, fields=["externalId", "id", "consensusMark"], consensus_mark_gt=0.5
)
print(assets)
# + asset conversion code

[{'externalId': 'car_1', 'id': 'clcrqd4ux001gj7vzb6undd6p', 'consensusMark': 0.6504290982818591}]





**Get all the labels with a honeypot mark of at least 0.1:**

In [77]:
labels = kili.labels(
    your_project_id,
    fields=["labelOf.externalId", "honeypotMark", "author.email", "id"],
    honeypot_mark_gte=0.1,
)
print(labels)
# + label conversion code

[{'author': {'email': 'john.doe@kili-technology.com'}, 'labelOf': {'externalId': 'car_1'}, 'honeypotMark': 0.16527040499137607, 'id': 'clcrqd79309zw0lq23m0yentr'}, {'author': {'email': 'john.smith@kili-technology.com'}, 'labelOf': {'externalId': 'car_1'}, 'honeypotMark': 0.20754115450190522, 'id': 'clcrqd79309zx0lq25wny43sf'}]





**Get all the labels added by a specific project member:**

In [78]:
labels = kili.labels(
    your_project_id, fields=["labelOf.externalId", "author.email", "id"], user_id=john_doe_id
)
print(labels)
# + label conversion code

[{'author': {'email': 'john.doe@kili-technology.com'}, 'labelOf': {'externalId': 'car_1'}, 'id': 'clcrqd79309zw0lq23m0yentr'}]





This code will return a list of labels authored by John Doe.

One can also use the `author_in` parameter to filter by name directly.
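
The "label conversion code" left open in the cells above depends on your target format. As one possible sketch, the function below flattens label records shaped like the printed output into CSV text using only the standard library. The column names and the `labels_to_csv` helper are hypothetical choices.

```python
import csv
import io
from typing import Any, Dict, List


def labels_to_csv(labels: List[Dict[str, Any]]) -> str:
    """Flatten label records into CSV text with one row per label."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label_id", "asset_external_id", "author_email"])
    for label in labels:
        writer.writerow(
            [label["id"], label["labelOf"]["externalId"], label["author"]["email"]]
        )
    return buffer.getvalue()


# Sample record copied from the output printed above.
sample_labels = [
    {
        "author": {"email": "john.doe@kili-technology.com"},
        "labelOf": {"externalId": "car_1"},
        "id": "clcrqd79309zw0lq23m0yentr",
    }
]
csv_text = labels_to_csv(sample_labels)
print(csv_text)
```

The function expects the fields `id`, `labelOf.externalId` and `author.email` to have been requested in the `fields` argument.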

### Filtering specific labels per asset through the label properties

You can also look for specific labels, for example the latest review label per user, and dump the result into a JSON file. Use the field `"labels.isLatestReviewLabelForUser"` to check whether a label is the latest review label of its author.

In [79]:
import json

assets = kili.assets(
    your_project_id,
    fields=["externalId", "labels.jsonResponse", "labels.isLatestReviewLabelForUser"],
)

for asset in assets:
    if asset["labels"]:  # check if asset has annotations
        for label in asset["labels"]:
            if label["isLatestReviewLabelForUser"] and "JOB_0" in label["jsonResponse"]:
                annotation = label["jsonResponse"]["JOB_0"]
                with Path(asset["externalId"] + ".json").open("w", encoding="utf-8") as f:
                    f.write(json.dumps(annotation))
                break  # once we find a latest label done by a reviewer, we move on to the next asset.


### Filtering the latest label per annotator

When working on a project with [consensus](https://docs.kili-technology.com/docs/consensus-overview) enabled, it can be useful to export the latest label made by each annotator:

In [80]:
from collections import defaultdict

assets = kili.assets(
    your_project_id,
    fields=[
        "externalId",
        "labels.author.email",
        "labels.createdAt",
        "labels.labelType",
        "labels.jsonResponse",
    ],
)

for asset in assets:
    if asset["labels"]:
        labels_by_user = defaultdict(list)  # all DEFAULT labels, grouped by author email
        for label in asset["labels"]:
            if label["labelType"] == "DEFAULT":
                labels_by_user[label["author"]["email"]].append(label)
        latest_label_per_user = {
            email: max(labels, key=lambda x: x["createdAt"])
            for email, labels in labels_by_user.items()
        }
        with (Path("/tmp") / (asset["externalId"] + ".json")).open("w", encoding="utf-8") as f:
            f.write(json.dumps(latest_label_per_user))
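
The snippet above compares `createdAt` values as strings, which gives the right order as long as all timestamps share the same ISO 8601 layout. A slightly more defensive sketch parses them first. It assumes timestamps such as `2023-01-11T12:52:58.113Z` (the exact format is an assumption); the helper names and sample labels are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_default_label_per_author(labels: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep, for each author email, the most recent label of type DEFAULT."""
    latest: Dict[str, Dict[str, Any]] = {}
    for label in labels:
        if label["labelType"] != "DEFAULT":
            continue
        email = label["author"]["email"]
        current = latest.get(email)
        if current is None or parse_created_at(label["createdAt"]) > parse_created_at(
            current["createdAt"]
        ):
            latest[email] = label
    return latest


# Hypothetical labels of a single asset.
sample_labels = [
    {
        "author": {"email": "a@example.com"},
        "createdAt": "2023-01-11T12:52:58.113Z",
        "labelType": "DEFAULT",
        "jsonResponse": {"version": 1},
    },
    {
        "author": {"email": "a@example.com"},
        "createdAt": "2023-01-12T08:00:00.000Z",
        "labelType": "DEFAULT",
        "jsonResponse": {"version": 2},
    },
    {
        "author": {"email": "b@example.com"},
        "createdAt": "2023-01-10T09:30:00.000Z",
        "labelType": "REVIEW",
        "jsonResponse": {"version": 3},
    },
]
result = latest_default_label_per_author(sample_labels)
print({email: label["jsonResponse"] for email, label in result.items()})
```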

## Exporting a whole project

Kili has a method to export the whole project into specific export formats. It can be useful when your goal is to use one of the standard output formats.

### Available formats


| Format        | UI  | Python Client | Command Line Interface |
| ------------- | --- | ------------- | ---------------------- |
| Kili (raw)    | ✅   | ✅             | ✅                      |
| Kili (simple) | ✅   | ❌             | ❌                      |
| YOLO V4       | ✅   | ✅             | ✅                      |
| YOLO V5       | ✅   | ✅             | ✅                      |
| YOLO V7       | ❌   | ✅             | ✅                      |
| Pascal VOC    | ✅   | ✅             | ✅                      |
| COCO          | ❌   | ✅             | ✅                      |

### The `.export_labels` method

The [`.export_labels`](https://python-sdk-docs.kili-technology.com/latest/sdk/label/#kili.queries.label.__init__.QueriesLabel.export_labels) method enables the export of a full project. It does the following preprocessing:

* Only fetches the labels of types `"DEFAULT"` and `"REVIEW"` (see the [label types explanations](https://docs.kili-technology.com/docs/asset-lifecycle#label-types-and-definitions-throughout-an-asset-lifecycle)).
* If specified, selects a subset of asset ids.
* Exports labels to one of the standard formats (only available for a restricted set of ML tasks).
* Using various method arguments, you can decide:
    * Whether or not to include the assets in the export
    * Whether to export just the latest label or all the labels
    * Whether to create one folder for all the jobs or one folder per job
    * Whether or not to export the label-related data into one single file
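
The options above map to keyword arguments of `.export_labels`. The sketch below only assembles such arguments so they can be reviewed before the call. The names `with_assets`, `layout` (with the values `"split"` for one folder per job and `"merged"` for one folder for all jobs) and `single_file` are assumptions based on the SDK reference and should be checked against the version you have installed. `build_export_kwargs` is a hypothetical helper, and the option for latest versus all labels is left out here.

```python
from typing import Any, Dict


def build_export_kwargs(
    project_id: str,
    filename: str,
    fmt: str = "kili",
    *,
    with_assets: bool = True,
    layout: str = "split",
    single_file: bool = False,
) -> Dict[str, Any]:
    """Collect and sanity-check keyword arguments for an export call."""
    # Assumed allowed values; verify them in the SDK reference.
    if layout not in ("split", "merged"):
        raise ValueError(f"Unexpected layout: {layout!r}")
    return {
        "project_id": project_id,
        "filename": filename,
        "fmt": fmt,
        "with_assets": with_assets,
        "layout": layout,
        "single_file": single_file,
    }


export_kwargs = build_export_kwargs(
    "<YOUR_PROJECT_ID>",
    "/tmp/export.zip",
    fmt="kili",
    with_assets=False,
    layout="merged",
    single_file=True,
)
print(export_kwargs)
# With a configured client, the call would then be: kili.export_labels(**export_kwargs)
```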

Note that some formats are by default single-file, while others use many files:

|Format|Single file|Multiple files|
|---|---|---|
|Kili|✅ |✅ |
|Yolo|❌|✅ |
|Pascal VOC|❌|✅ |
|COCO|✅|❌|

For all formats, the output archive also contains a `README.kili.txt` file. Here is an example of its contents:
```
Exported Labels from KILI
=========================

- Project name: Awesome annotation project
- Project identifier: abcdefghijklmnop
- Project description: This project contains labels, most of which are awesome.
- Export date: 20221125-093324
- Exported format: kili
- Exported labels: latest
```
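
After an export, it can be handy to check what the archive contains and to read this README without unpacking everything. The sketch below uses only the standard library. It builds a small stand-in archive so that it runs without a real export; the entry `labels/car_1.json` is a hypothetical file name, not the documented layout of a Kili export.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List, Optional, Tuple


def inspect_export(archive_path: Path) -> Tuple[List[str], Optional[str]]:
    """Return the archive's file names and the README text, if one is present."""
    with zipfile.ZipFile(archive_path) as archive:
        names = sorted(archive.namelist())
        readme_names = [name for name in names if name.endswith("README.kili.txt")]
        readme = archive.read(readme_names[0]).decode("utf-8") if readme_names else None
    return names, readme


# Build a tiny stand-in archive so the sketch runs without a real export.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_archive = Path(tmp_dir) / "export.zip"
    with zipfile.ZipFile(demo_archive, "w") as archive:
        archive.writestr("README.kili.txt", "Exported Labels from KILI\n- Exported format: kili\n")
        archive.writestr("labels/car_1.json", "{}")  # hypothetical entry name
    file_names, readme_text = inspect_export(demo_archive)

print(file_names)
print(readme_text)
```

With a real export, calling `inspect_export(Path("/tmp/export.zip"))` would list the files produced for the chosen format.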


### Kili format, one file per asset

The following code snippet exports the whole asset payload and the associated labels, with one JSON file per asset, into the `/tmp/export.zip` archive.

In [82]:
kili.export_labels(
    project_id=your_project_id,
    filename="/tmp/export.zip",
    fmt="kili",
)

Fetching assets...
/tmp/export.zip


### Kili format, one file for the whole project

This code snippet exports the whole asset payload and the associated labels as a single file for the whole project, into the `/tmp/export.zip` archive.

In [83]:
kili.export_labels(
    project_id=your_project_id,
    filename="/tmp/export.zip",
    fmt="kili",
    single_file=True,
)

Fetching assets...
/tmp/export.zip


### YOLO formats

When you have at least one Object Detection job with bounding boxes, you can also export to one of the YOLO formats. You can choose `"yolo_v4"`, `"yolo_v5"` or `"yolo_v7"`. The formats differ in the structure of the metadata YAML file, which specifies the object classes. In all cases, one file per asset is produced, containing the last created `DEFAULT` or `REVIEW` label. Each YOLO label has the following shape:
```
2        0.25 0.67 0.26 0.34
^        ^    ^    ^    ^
class    x    y    w    h
```
where:

   * `class` is the class index in the classes list contained in the YOLO metadata file.
   * `x` is the x-coordinate relative to the image width (between 0.0 and 1.0) of the center of the bounding box.
   * `y` is the y-coordinate relative to the image height (between 0.0 and 1.0) of the center of the bounding box.
   * `w` is the width relative to the image width (between 0.0 and 1.0) of the bounding box.
   * `h` is the height relative to the image height (between 0.0 and 1.0) of the bounding box.
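
To make the field definitions above concrete, the sketch below computes such a line from the `normalizedVertices` of a rectangle. It illustrates the format only and is not the SDK's implementation; `to_yolo_line`, the six-decimal formatting and the sample rectangle are hypothetical choices.

```python
from typing import Dict, List


def to_yolo_line(class_index: int, normalized_vertices: List[Dict[str, float]]) -> str:
    """Format one rectangle as a YOLO line: class x_center y_center width height."""
    xs = [vertex["x"] for vertex in normalized_vertices]
    ys = [vertex["y"] for vertex in normalized_vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    x_center = min(xs) + width / 2
    y_center = min(ys) + height / 2
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical rectangle spanning x in [0.2, 0.4] and y in [0.2, 0.6].
vertices = [
    {"x": 0.2, "y": 0.6},
    {"x": 0.2, "y": 0.2},
    {"x": 0.4, "y": 0.2},
    {"x": 0.4, "y": 0.6},
]
yolo_line = to_yolo_line(0, vertices)
print(yolo_line)
```

All four values stay relative to the image size, so no image dimensions are needed for the conversion.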

Here is an example of a YOLO annotation over an image:

![yolo on an image](../../assets/teslabb.jpg)

Here is how to export to YOLO (in this example, YOLOv5):

In [84]:
kili.export_labels(
    project_id=your_project_id,
    filename="/tmp/export.zip",
    fmt="yolo_v5",
)

Fetching assets...
/tmp/export.zip


Note that a standard YOLO file format must also include:

* The path root to the assets
* The `train`, `val` and `test` subfolders

How to split the data across these folders is up to the ML engineer or data scientist, so no code snippet is provided here.

### COCO format

To export your data into the COCO format, run the following code:

In [85]:
kili.export_labels(
    project_id=your_project_id,
    filename="/tmp/export.zip",
    fmt="coco",
)

Project needs at least one OBJECT_DETECTION task.


This will create an archive containing both:

 * The COCO annotation file
 * The `data/` folder with all the assets
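
Once the annotation file has been loaded with `json.load`, a quick sanity check is to count the annotations per category. The sketch below relies only on the standard COCO keys `images`, `categories` and `annotations`; the sample dictionary is hypothetical, and the name of the annotation file inside the archive is not assumed here.

```python
from collections import Counter
from typing import Any, Dict


def annotations_per_category(coco: Dict[str, Any]) -> Dict[str, int]:
    """Count COCO annotations per category name."""
    id_to_name = {category["id"]: category["name"] for category in coco["categories"]}
    counts = Counter(
        id_to_name[annotation["category_id"]] for annotation in coco["annotations"]
    )
    return dict(counts)


# Hypothetical content of a COCO annotation file.
sample_coco = {
    "images": [{"id": 1, "file_name": "car_1.jpg", "width": 1000, "height": 500}],
    "categories": [{"id": 0, "name": "OBJECT_A"}, {"id": 1, "name": "OBJECT_B"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 0, "bbox": [200, 100, 200, 200]},
        {"id": 2, "image_id": 1, "category_id": 0, "bbox": [10, 20, 30, 40]},
        {"id": 3, "image_id": 1, "category_id": 1, "bbox": [50, 60, 70, 80]},
    ],
}
category_counts = annotations_per_category(sample_coco)
print(category_counts)
```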


## Summary

In this tutorial, we have seen several ways to export labels from a Kili project:

* Using [`.assets`](https://python-sdk-docs.kili-technology.com/latest/sdk/asset/#kili.queries.asset.__init__.QueriesAsset.assets) and [`.labels`](https://python-sdk-docs.kili-technology.com/latest/sdk/label/#kili.queries.label.__init__.QueriesLabel.labels) and their filtering arguments, a subset of assets or labels can be selected and then exported.
* Using [`.export_labels`](https://python-sdk-docs.kili-technology.com/latest/sdk/label/#kili.queries.label.__init__.QueriesLabel.export_labels), the whole project can be exported into a standard output format.

In [None]:
kili.delete_project(your_project_id);