# How to convert a Tagtog project to Kili

## Setup

In [None]:
%pip install kili rich

Note: you may need to restart the kernel to use updated packages.


In [None]:
from kili.client import Kili
from pathlib import Path
from rich import print_json

In [None]:
# With no arguments, the client reads the API key from the KILI_API_KEY environment variable.
kili = Kili()

## Get data

For this recipe, we will use data from the [Apartment reviews](https://tagtog.com/JaqenNLP/ApartmentReviews) project made by Jennifer D. Ames.

The dataset consists of guests' reviews of apartments and houses used for short-term lodging and tourism-related activities.

The reviews have been obtained from Inside Airbnb.

Each review has been manually annotated in tagtog.

Let's download the project's data by clicking this [link](https://tagtog.com/JaqenNLP/ApartmentReviews/-downloads/dataset-as-anndoc) (you need to be logged in to tagtog to do so).

Once the .zip file is downloaded, we can unzip it:

In [None]:
!tar -xvf tagtog_ApartmentReviews.zip

x ApartmentReviews/plain.html/pool/a8Wus9Ave5EJ5V38LGmGOeO8ZMTm-text.plain.html
x ApartmentReviews/plain.html/pool/aVw3oWUq3vsAeBwmyqXTqW15QHm0-text.plain.html
x ApartmentReviews/plain.html/pool/a17Fq.yqQAyG00iT.SWHQQHfrbii-text.plain.html
x ApartmentReviews/plain.html/pool/avkJCm9Pd39.lq1eJ07uJCs5fuDO-text.plain.html
x ApartmentReviews/plain.html/pool/asoNpKvw_4cUsOhh1LaRmBummt24-text.plain.html
x ApartmentReviews/plain.html/pool/a9aynETGolCx_JTJnRHMQ1lbeyUy-text.plain.html
x ApartmentReviews/ann.json/master/pool/avkJCm9Pd39.lq1eJ07uJCs5fuDO-text.ann.json
x ApartmentReviews/plain.html/pool/aAeiuwXUsrzSHQ2Rb4wdHhKEmTTe-text.plain.html
x ApartmentReviews/ann.json/master/pool/asoNpKvw_4cUsOhh1LaRmBummt24-text.ann.json
x ApartmentReviews/ann.json/master/pool/a8Wus9Ave5EJ5V38LGmGOeO8ZMTm-text.ann.json
x ApartmentReviews/ann.json/master/pool/aVw3oWUq3vsAeBwmyqXTqW15QHm0-text.ann.json
x ApartmentReviews/ann.json/master/pool/a9aynETGolCx_JTJnRHMQ1lbeyUy-text.ann.json
x ApartmentReviews/ann.js
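
The `tar -xvf` call above relies on a `tar` build that understands .zip archives (the `x` prefix in the output is typical of bsdtar); other builds may refuse the file. As a portable alternative, here is a minimal sketch using Python's standard `zipfile` module. The helper name `extract_archive` is ours, not part of the recipe.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of a .zip archive into dest_dir.

    Returns the list of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Usage, assuming the archive was saved next to the notebook:
# extract_archive("tagtog_ApartmentReviews.zip")
```

Extracting into the current directory gives the same `ApartmentReviews/` folder layout that the rest of the recipe expects.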

In [None]:
print(Path("ApartmentReviews/README.md").read_text(encoding="utf-8"))

This dataset lives in: https://tagtog.com/JaqenNLP/ApartmentReviews

This zip was generated with:
  * date: _2023-05-10T11:25:54.902Z_
  * search: `*`
  * total found documents: **228**

The dataset is here written in the [anndoc format](https://docs.tagtog.com/anndoc.html). Use the `annotations-legend.json` file to help you interpret the annotations.


What great things will you do with the dataset? :-) Enjoy!



Let's take a look at the ontology:

In [None]:
print_json(Path("ApartmentReviews/annotations-legend.json").read_text(encoding="utf-8"))

The ontology shows 10 entity types (keys starting with `e_`), 1 document label (key starting with `m_`), and 14 entity labels (keys starting with `f_`).

An entity label is attached to an annotated entity to give more information about it. For example, the `"f_17": "LocationFeature"` label can be attached to an entity of type `"e_5": "Location"`.

Read more about the ontology [here](https://tagtog.com/JaqenNLP/ApartmentReviews/-settings).
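
To make the three key prefixes concrete, the sketch below groups a legend dictionary by prefix. The sample legend is illustrative: only `e_5` and `f_17` come from the text above, while the `m_0` key is a hypothetical placeholder for the document label.

```python
from collections import defaultdict

# Meaning of each key prefix, as described above.
KINDS = {"e": "entity types", "m": "document labels", "f": "entity labels"}


def group_legend(legend):
    """Group legend entries by the prefix of their key (e_, m_ or f_)."""
    groups = defaultdict(dict)
    for key, name in legend.items():
        prefix = key.split("_", 1)[0]
        groups[KINDS.get(prefix, "other")][key] = name
    return dict(groups)


# Illustrative legend: "m_0" is a made-up key, not taken from the real file.
sample_legend = {"e_5": "Location", "f_17": "LocationFeature", "m_0": "Sentiment"}
grouped = group_legend(sample_legend)
```

On the real file, the same function could be fed `json.loads(Path("ApartmentReviews/annotations-legend.json").read_text(encoding="utf-8"))`.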

The `plain.html/pool` folder contains the reviews in HTML format:

In [None]:
print(
    Path("ApartmentReviews/plain.html/pool/a.km05GoV2Uh1mw9QR.UNiNXWUL8-text.plain.html").read_text(
        encoding="utf-8"
    )
)

<!DOCTYPE html >
<html id="a.km05GoV2Uh1mw9QR.UNiNXWUL8-text" data-origid="text" class="anndoc" data-anndoc-version="3.6" lang="" xml:lang="" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta charset="UTF-8"/>
    <meta name="generator" content="net.tagtog.anndoc.v3.parsers.general.PlainTextParser_NewParagraphAfter1Newline_v2_0_0"/>
    <title>a.km05GoV2Uh1mw9QR.UNiNXWUL8-text</title>
  </head>
  <body>
    <article>
      <section data-type="">
        <div class="content">
          <p id="s1p1">The house is beautiful, but it is next to 2 very busy roads. There are no aircons in any rooms and to open the windows you hear the cars and trucks pass - very loud. There is no braai facilities and when we asked x 2 we were told the owner had to approve... still waiting! The rooms on the lower floor is empty and depressing, very hot! Kids slept on couches with sliding doors open, so we could not arm the alarm at night. The aircons on the upper level leaks water onto table and electric
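
Since the review text sits inside `<p id="...">` elements, it has to be pulled out of the HTML before it can be imported as plain text. A minimal sketch with the standard library's `html.parser` follows; the class name `ParagraphExtractor` is ours, and the sample document is a shortened stand-in for the file shown above.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, keyed by its id attribute."""

    def __init__(self):
        super().__init__()
        self.paragraphs = {}
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current_id = dict(attrs).get("id")
            if self._current_id is not None:
                self.paragraphs[self._current_id] = ""

    def handle_endtag(self, tag):
        if tag == "p":
            self._current_id = None

    def handle_data(self, data):
        # Only keep text that appears inside an identified paragraph.
        if self._current_id is not None:
            self.paragraphs[self._current_id] += data


def extract_paragraphs(html_text):
    parser = ParagraphExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.paragraphs


# Shortened stand-in for a plain.html file.
sample_html = (
    '<html><head><meta charset="UTF-8"/><title>doc</title></head><body>'
    '<p id="s1p1">The house is beautiful, but it is next to 2 very busy roads.</p>'
    "</body></html>"
)
paragraphs = extract_paragraphs(sample_html)
```

Keeping the paragraph ids (`s1p1`, ...) is deliberate: they are the natural handle for locating annotated text within a document.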

The annotations are stored in the folder `ann.json/master/pool`:

In [None]:
print_json(
    Path(
        "ApartmentReviews/ann.json/master/pool/a.km05GoV2Uh1mw9QR.UNiNXWUL8-text.ann.json"
    ).read_text(encoding="utf-8")
)
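
To get a feel for how such a file can be read, here is a hedged sketch that lists the entities of one annotation document in readable form. It assumes the anndoc layout, in which `entities` is a list whose items carry a `classId`, a `part` (the paragraph id), `offsets` and `fields`; check the printed JSON above before relying on it. The sample values and every id other than `e_5` and `f_17` are invented for illustration.

```python
def summarize_entities(annotation, legend):
    """Return (entity type, annotated text, {label name: value}) tuples.

    Assumes the anndoc structure: each entity has "classId", "offsets"
    (a list of {"start", "text"}) and "fields" keyed by f_ ids.
    """
    summary = []
    for entity in annotation.get("entities", []):
        entity_type = legend.get(entity["classId"], entity["classId"])
        text = " ".join(offset["text"] for offset in entity.get("offsets", []))
        labels = {
            legend.get(field_id, field_id): field["value"]
            for field_id, field in entity.get("fields", {}).items()
        }
        summary.append((entity_type, text, labels))
    return summary


# Invented sample, shaped like the assumed anndoc structure.
sample_legend = {"e_5": "Location", "f_17": "LocationFeature"}
sample_annotation = {
    "entities": [
        {
            "classId": "e_5",
            "part": "s1p1",
            "offsets": [{"start": 42, "text": "2 very busy roads"}],
            "fields": {"f_17": {"value": "highway"}},
        }
    ]
}
entity_summary = summarize_entities(sample_annotation, sample_legend)
```

Resolving ids through the legend first makes it easier to compare the tagtog annotations with the category names used on the Kili side.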

## Create the Kili project

We can now create the Named Entity Recognition (NER) project in Kili.

To do so, we need to define a JSON interface based on the `annotations-legend.json` file.
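
The interface can be written by hand, but its child classification jobs all share the same shape. As an illustration, the sketch below builds one such job from an entity label name and its allowed values. The helpers `to_category_key` and `build_child_job` are hypothetical, not part of the Kili SDK, and the key format they produce is only a convention (it does not reproduce suffixes such as `_0`).

```python
import re


def to_category_key(name):
    """Turn a tagtog name into an UPPER_SNAKE_CASE key, e.g. HostOpinion -> HOST_OPINION."""
    # Insert an underscore at each lower-to-upper case boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Replace any run of other characters (spaces, commas...) with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", spaced).strip("_")
    return cleaned.upper()


def build_child_job(instruction, values, input_type="radio"):
    """Build a child classification job for one entity label."""
    return {
        "content": {
            "categories": {
                to_category_key(value): {"children": [], "name": value} for value in values
            },
            "input": input_type,
        },
        "instruction": instruction,
        "mlTask": "CLASSIFICATION",
        "required": 0,
        "isChild": True,
    }


quiet_job = build_child_job("isLocationQuiet", ["Quiet", "Noisy"])
```

A child job is then referenced by its job name in the `children` list of each entity category it applies to.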

In [None]:
json_interface = {
    "jobs": {
        "CLASSIFICATION_JOB": {
            "content": {
                "categories": {
                    "POSITIVE": {"children": [], "name": "positive"},
                    "NEUTRAL": {"children": [], "name": "neutral"},
                    "NEGATIVE": {"children": [], "name": "negative"},
                },
                "input": "radio",
            },
            "instruction": "Sentiment",
            "mlTask": "CLASSIFICATION",
            "required": 1,
            "isChild": False,
        },
        "NAMED_ENTITIES_RECOGNITION_JOB": {
            "content": {
                "categories": {
                    "HOST_OPINION": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_5",
                            "CLASSIFICATION_JOB_7",
                        ],
                        "color": "#472CED",
                        "name": "HostOpinion",
                    },
                    "ROOM_OPINION": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_6",
                            "CLASSIFICATION_JOB_7",
                        ],
                        "name": "RoomOpinion",
                        "color": "#5CE7B7",
                    },
                    "BATHROOM_OPINION": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_7",
                            "CLASSIFICATION_JOB_9",
                        ],
                        "name": "BathroomOpinion",
                        "color": "#D33BCE",
                    },
                    "FOOD_OPINION": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_3",
                            "CLASSIFICATION_JOB_7",
                        ],
                        "name": "FoodOpinion",
                        "color": "#FB753C",
                    },
                    "LOCATION": {
                        "children": [
                            "CLASSIFICATION_JOB_0",
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_7",
                            "CLASSIFICATION_JOB_11",
                        ],
                        "name": "Location",
                        "color": "#3BCADB",
                    },
                    "KITCHEN_OPINION": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_7",
                            "CLASSIFICATION_JOB_13",
                        ],
                        "name": "KitchenOpinion",
                        "color": "#199CFC",
                    },
                    "PRICE_PAYMENT": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_7",
                            "CLASSIFICATION_JOB_10",
                        ],
                        "name": "PricePayment",
                        "color": "#FA484A",
                    },
                    "APARTMENT_OPINION": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_2",
                            "CLASSIFICATION_JOB_7",
                            "CLASSIFICATION_JOB_12",
                        ],
                        "name": "ApartmentOpinion",
                        "color": "#ECB82A",
                    },
                    "LOYALTY": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_4",
                            "CLASSIFICATION_JOB_7",
                        ],
                        "name": "Loyalty",
                        "color": "#3CD876",
                    },
                    "TECH_OPINION": {
                        "children": [
                            "CLASSIFICATION_JOB_1",
                            "CLASSIFICATION_JOB_7",
                            "CLASSIFICATION_JOB_8",
                        ],
                        "name": "TechOpinion",
                        "color": "#733AFB",
                    },
                },
                "input": "radio",
            },
            "instruction": "Entity type",
            "mlTask": "NAMED_ENTITIES_RECOGNITION",
            "required": 1,
            "isChild": False,
        },
        "CLASSIFICATION_JOB_0": {
            "content": {
                "categories": {
                    "CENTER": {"children": [], "name": "center"},
                    "AIRPORT_0": {"children": [], "name": "airport"},
                    "UNDERGROUND": {"children": [], "name": "underground"},
                    "TRAM": {"children": [], "name": "tram"},
                    "BUS": {"children": [], "name": "bus"},
                    "TRAIN": {"children": [], "name": "train"},
                    "HIGHWAY_0": {"children": [], "name": "highway"},
                    "PARKING": {"children": [], "name": "parking"},
                    "SHOPPING_0": {"children": [], "name": "shopping"},
                    "TOURISM_0": {"children": [], "name": "tourism"},
                    "RESTAURANTS": {"children": [], "name": "restaurants"},
                    "PUBS_0": {"children": [], "name": "pubs"},
                    "NATURE_0": {"children": [], "name": "nature"},
                    "ARRIVAL_0": {"children": [], "name": "arrival"},
                    "SAFETY": {"children": [], "name": "safety"},
                    "OWNVEHICLE": {"children": [], "name": "ownvehicle"},
                    "TAXI": {"children": [], "name": "taxi"},
                },
                "input": "radio",
            },
            "instruction": "LocationFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_1": {
            "content": {
                "categories": {
                    "IS_CRITICAL_PROBLEM": {"children": [], "name": "isCriticalProblem"}
                },
                "input": "checkbox",
            },
            "instruction": "isCriticalProblem",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_2": {
            "content": {
                "categories": {
                    "CLEANLINESS": {"children": [], "name": "cleanliness"},
                    "STAIRS": {"children": [], "name": "stairs"},
                    "COMPLIANT_DESCRIPTION_0": {"children": [], "name": "compliantDescription"},
                    "LOOK": {"children": [], "name": "look"},
                    "SPACE": {"children": [], "name": "space"},
                    "CLIMATE_0": {"children": [], "name": "climate"},
                    "PETS": {"children": [], "name": "pets,"},
                    "GYM_0": {"children": [], "name": "gym"},
                    "SAFETY": {"children": [], "name": "safety"},
                },
                "input": "radio",
            },
            "instruction": "ApartmentFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_3": {
            "content": {
                "categories": {
                    "BREAKFAST_0": {"children": [], "name": "breakfast"},
                    "DRINKS_0": {"children": [], "name": "drinks"},
                    "SNACKS": {"children": [], "name": "snacks"},
                },
                "input": "radio",
            },
            "instruction": "FoodFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_4": {
            "content": {
                "categories": {
                    "SOLO_0": {"children": [], "name": "solo"},
                    "COUPLES_0": {"children": [], "name": "couples"},
                    "FAMILY_0": {"children": [], "name": "family"},
                    "FRIENDS": {"children": [], "name": "friends,"},
                    "REPEAT": {"children": [], "name": "repeat"},
                },
                "input": "radio",
            },
            "instruction": "LoyaltyFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_5": {
            "content": {
                "categories": {
                    "FRIENDLINESS_0": {"children": [], "name": "friendliness"},
                    "POLITENESS_0": {"children": [], "name": "politeness"},
                    "COMMUNICATION": {"children": [], "name": "communication"},
                },
                "input": "radio",
            },
            "instruction": "HostCharacter",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_6": {
            "content": {
                "categories": {
                    "BED_0": {"children": [], "name": "bed"},
                    "WARDROBE": {"children": [], "name": "wardrobe,"},
                    "CHAIR_0": {"children": [], "name": "chair"},
                    "DESK": {"children": [], "name": "desk"},
                },
                "input": "radio",
            },
            "instruction": "RoomItem",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_7": {
            "content": {
                "categories": {
                    "POSITIVE_0": {"children": [], "name": "positive"},
                    "NEGATIVE": {"children": [], "name": "negative"},
                },
                "input": "radio",
            },
            "instruction": "Sentiment",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_8": {
            "content": {
                "categories": {
                    "WIFI_0": {"children": [], "name": "wifi"},
                    "KEYPAD": {"children": [], "name": "keypad,"},
                    "SMARTHOME_0": {"children": [], "name": "smarthome"},
                    "MOBILE": {"children": [], "name": "mobile,"},
                    "TV": {"children": [], "name": "tv,"},
                    "VIDEOGAMES": {"children": [], "name": "videogames"},
                },
                "input": "radio",
            },
            "instruction": "TechFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_9": {
            "content": {
                "categories": {
                    "SHOWER_0": {"children": [], "name": "shower"},
                    "BATHTUB": {"children": [], "name": "bathtub,"},
                    "WC": {"children": [], "name": "wc,"},
                    "AMENITIES": {"children": [], "name": "amenities,"},
                    "HAIRDRYER": {"children": [], "name": "hairdryer"},
                },
                "input": "radio",
            },
            "instruction": "BathroomFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_10": {
            "content": {
                "categories": {
                    "REFUND_0": {"children": [], "name": "refund"},
                    "DEPOSIT": {"children": [], "name": "deposit,"},
                    "PAYMENT_METHODS": {"children": [], "name": "payment methods"},
                    "EXTRA_CHARGES_0": {"children": [], "name": "extra charges"},
                },
                "input": "radio",
            },
            "instruction": "PaymentFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_11": {
            "content": {
                "categories": {
                    "QUIET": {"children": [], "name": "Quiet"},
                    "NOISY": {"children": [], "name": "Noisy"},
                },
                "input": "radio",
            },
            "instruction": "isLocationQuiet",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_12": {
            "content": {
                "categories": {
                    "GARDEN": {"children": [], "name": "garden"},
                    "PATIO": {"children": [], "name": "patio"},
                    "BALCON": {"children": [], "name": "balcon,"},
                    "TERRACE": {"children": [], "name": "terrace,"},
                    "VIEW": {"children": [], "name": "view,"},
                    "POOL": {"children": [], "name": "pool,"},
                    "BARBECUE": {"children": [], "name": "barbecue"},
                },
                "input": "radio",
            },
            "instruction": "OutdoorFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
        "CLASSIFICATION_JOB_13": {
            "content": {
                "categories": {"APPLIANCES": {"children": [], "name": "appliances"}},
                "input": "radio",
            },
            "instruction": "KitchenFeature",
            "mlTask": "CLASSIFICATION",
            "required": 0,
            "isChild": True,
        },
    }
}
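
With this many cross-references, a quick consistency check before creating the project can catch typos. The sketch below verifies that every job listed under a category's `children` exists and is flagged `isChild`. The function name `find_invalid_children` is ours, and the tiny interface is only there to exercise it; on the real data it would be called as `find_invalid_children(json_interface)`.

```python
def find_invalid_children(json_interface):
    """Return (job, category, child, reason) tuples for broken child references."""
    jobs = json_interface["jobs"]
    problems = []
    for job_name, job in jobs.items():
        for category_key, category in job["content"]["categories"].items():
            for child in category.get("children", []):
                if child not in jobs:
                    problems.append((job_name, category_key, child, "missing job"))
                elif not jobs[child].get("isChild", False):
                    problems.append((job_name, category_key, child, "isChild is not True"))
    return problems


# Tiny interface with one deliberate error: CLASSIFICATION_JOB_99 is not defined.
tiny_interface = {
    "jobs": {
        "NAMED_ENTITIES_RECOGNITION_JOB": {
            "content": {
                "categories": {
                    "LOCATION": {
                        "children": ["CLASSIFICATION_JOB_11", "CLASSIFICATION_JOB_99"],
                        "name": "Location",
                    }
                },
                "input": "radio",
            },
            "isChild": False,
        },
        "CLASSIFICATION_JOB_11": {
            "content": {
                "categories": {"QUIET": {"children": [], "name": "Quiet"}},
                "input": "radio",
            },
            "isChild": True,
        },
    }
}
child_problems = find_invalid_children(tiny_interface)
```

An empty list means every reference resolves to a child job.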

In [None]:
project_id = kili.create_project(
    input_type="TEXT", json_interface=json_interface, title="Tagtog to Kili recipe"
)["id"]