# Kili Tutorial: Import rich-text assets
When dealing with textual data, style can convey a lot of meaning. If you annotate a long list or a legal text, displaying structured text instead of plain text lets your annotators quickly grasp patterns within the document.

In [None]:
import os

# !pip install kili # uncomment if you don't have kili installed already
from kili.client import Kili

api_endpoint = os.getenv(
    "KILI_API_ENDPOINT"
)  # If you use Kili SaaS, use the url 'https://cloud.kili-technology.com/api/label/v2/graphql'

kili = Kili(api_endpoint=api_endpoint)

## 1. Discover rich-text Kili format

Let's create a named-entity recognition project. We are going to label American legal texts:

In [None]:
json_interface = {
    "jobs": {
        "JOB_0": {
            "mlTask": "NAMED_ENTITIES_RECOGNITION",
            "instruction": "Categories",
            "required": 1,
            "isChild": False,
            "isVisible": True,
            "content": {
                "categories": {
                    "INSTRUCTIONS": {"name": "Instructions", "children": [], "color": "#cc4125"},
                    "PREAMBULE": {"name": "Preambule", "children": [], "color": "#ffd966"},
                    "RIGHTS": {"name": "Rights", "children": [], "color": "#76a5af"},
                    "REFERENCE_TO_GOD": {
                        "name": "Reference to God",
                        "children": [],
                        "color": "#c27ba0",
                    },
                },
                "input": "radio",
            },
        }
    }
}
project = kili.create_project(json_interface=json_interface, input_type="TEXT", title="massive")
project_id = project["id"]
project = kili.update_properties_in_project(
    project_id=project_id, consensus_tot_coverage=100, min_consensus_size=2
)

When you insert plain text in Kili, you provide a `content_array`. For rich text, you provide a `json_content_array` instead. Given a JSON content, let's write a function that inserts a new asset:

In [None]:
from random import random


def create_asset_from_json_content(json_content):
    """Insert one rich-text asset built from `json_content` into the project."""
    json_content_array = [json_content]
    kili.append_many_to_dataset(
        project_id=project_id, content_array=[""], json_content_array=json_content_array
    )

`json_content` contains nodes. Nodes can be either element nodes or text nodes.

- An element node can have children (that is, a list of other element or text nodes).
  - By default, element nodes are `<div />`.
  - Possible types for an element node are:
    - `blockquote`
    - `h1`
    - `h2`
    - `h3`
    - `h4`
    - `li`
    - `ol`
    - `p`
    - `table`
    - `tbody`
    - `td`
    - `thead`
    - `tr`
    - `ul`
  - Possible styles for an element node are (see the [Mozilla reference](https://developer.mozilla.org/fr/docs/Web/CSS) to learn more about CSS):
    - `alignItems`
    - `alignSelf`
    - `background`
    - `backgroundColor`
    - `border`
    - `borderBottom`
    - `borderLeft`
    - `borderRadius`
    - `borderRight`
    - `borderTop`
    - `color`
    - `display`
    - `flexDirection`
    - `float`
    - `fontWeight`
    - `height`
    - `margin`
    - `marginBottom`
    - `marginLeft`
    - `marginRight`
    - `marginTop`
    - `maxHeight`
    - `maxWidth`
    - `minHeight`
    - `minWidth`
    - `padding`
    - `paddingBottom`
    - `paddingLeft`
    - `paddingRight`
    - `paddingTop`
    - `textAlign`
    - `textDecoration`
    - `textIndent`
    - `width`

- A text node can contain text.
  - By default, text nodes are `<span />`.
  - Text nodes are identified by an `id`. The ID must be unique across the whole document. This allows entities to overlap two or more text nodes.
  - Possible styles for a text node are:
    - `bold: true`
    - `code: true`
    - `italic: true`
    - `underline: true`
    - `display`
    - `float`
    - `fontWeight`
    - `margin`
    - `marginBottom`
    - `marginLeft`
    - `marginRight`
    - `marginTop`
    - `padding`
    - `paddingBottom`
    - `paddingLeft`
    - `paddingRight`
    - `paddingTop`
    - `textAlign`
    - `textDecoration`
    - `textIndent`
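
The ID-uniqueness rule above can be checked before import. The following is a minimal sketch, not part of the Kili client: `iter_text_nodes` and `find_duplicate_ids` are hypothetical helper names, and the sketch assumes that a text node is any dictionary with a `text` key, as in the examples of this tutorial.

```python
from collections import Counter


def iter_text_nodes(nodes):
    """Yield every text node (a dict with a "text" key) found in a list of nodes."""
    for node in nodes:
        if "text" in node:
            yield node
        else:
            # Element node: recurse into its children, if any.
            yield from iter_text_nodes(node.get("children", []))


def find_duplicate_ids(json_content):
    """Return the sorted list of text-node IDs that appear more than once."""
    # Text nodes without an "id" key are ignored by this sketch.
    counts = Counter(node["id"] for node in iter_text_nodes(json_content) if "id" in node)
    return sorted(node_id for node_id, count in counts.items() if count > 1)


example = [
    {
        "children": [
            {
                "type": "p",
                "children": [
                    {"id": "intro-0", "text": "Hello, "},
                    {"id": "intro-1", "bold": True, "text": "world"},
                ],
            },
            {"type": "p", "children": [{"id": "intro-0", "text": "Duplicate ID"}]},
        ]
    }
]
print(find_duplicate_ids(example))  # ['intro-0']
```

An empty list means every text node ID in the tree is distinct.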

Let's see some real examples with the corresponding result in Kili's interface!

**WARNING**: In the examples below, IDs are randomly generated, but you probably do not want this in a real project!
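
The tutorial does not say which ID scheme to use instead. One option, sketched below under assumptions, is to derive each ID from the document name and the node position with `uuid.uuid5`, so that the same input always yields the same IDs. `ID_NAMESPACE`, `stable_node_id`, `text_node` and the `example.com` URL are hypothetical names chosen for illustration.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID works as long as it never changes.
ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/kili-rich-text")


def stable_node_id(document_name, position):
    """Build a reproducible ID from a document name and the node's position in it."""
    return str(uuid.uuid5(ID_NAMESPACE, f"{document_name}/{position}"))


def text_node(document_name, position, text, **styles):
    """Create a text node whose ID is the same on every run."""
    return {"id": stable_node_id(document_name, position), "text": text, **styles}


nodes = [
    text_node("declaration", 0, "The unanimous Declaration", bold=True, underline=True),
    text_node("declaration", 1, " of the thirteen United States of America.", bold=True),
]
stable_json_content = [{"children": [{"type": "p", "children": nodes}]}]
```

The resulting `stable_json_content` has the same shape as the `json_content` values used in the examples of this section, so it could be passed to `create_asset_from_json_content` in the same way.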

In [None]:
# One simple text node
json_content = [
    {
        "children": [
            {
                "id": f"{random()}",
                "text": "The unanimous Declaration of the thirteen United States of America.",
            }
        ]
    }
]
create_asset_from_json_content(json_content)

<img src="./img/rich_text_1.png">

In [None]:
# Some basic text style
json_content = [
    {
        "children": [
            {
                "id": f"{random()}",
                "bold": True,
                "underline": True,
                "text": "The unanimous Declaration of the thirteen United States of America.",
            }
        ]
    }
]
create_asset_from_json_content(json_content)

<img src="./img/rich_text_2.png">

In [None]:
# You can mix styled text nodes with plain-text nodes
json_content = [
    {
        "children": [
            {
                "type": "p",
                "children": [
                    {
                        "id": f"{random()}",
                        "bold": True,
                        "underline": True,
                        "text": "The unanimous Declaration",
                    },
                    {
                        "id": f"{random()}",
                        "bold": True,
                        "text": " of the thirteen United States of America.",
                    },
                    {
                        "id": f"{random()}",
                        "text": "When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.",
                    },
                ],
            }
        ]
    }
]
create_asset_from_json_content(json_content)

<img src="./img/rich_text_3.png">

In [None]:
# Finalize with a title, a subtitle and proper margins
json_content = [
    {
        "children": [
            {
                "type": "h1",
                "children": [
                    {
                        "border": "1px solid black",
                        "textAlign": "center",
                        "children": [{"id": f"{random()}", "text": "Declaration of Independence"}],
                    },
                ],
            },
            {
                "type": "h2",
                "children": [{"id": f"{random()}", "text": "In Congress, July 4, 1776"}],
            },
            {
                "type": "p",
                "children": [
                    {
                        "id": f"{random()}",
                        "bold": True,
                        "underline": True,
                        "text": "The unanimous Declaration",
                    },
                    {
                        "id": f"{random()}",
                        "bold": True,
                        "text": " of the thirteen United States of America.",
                    },
                    {
                        "id": f"{random()}",
                        "text": "When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.",
                    },
                ],
            },
            {
                "type": "p",
                "marginLeft": "30px",
                "marginRight": "30px",
                "border": "red",
                "children": [
                    {
                        "id": f"{random()}",
                        "text": "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.",
                    },
                    {
                        "type": "ul",
                        "children": [
                            {
                                "type": "li",
                                "children": [
                                    {
                                        "id": f"{random()}",
                                        "text": "That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed,",
                                    },
                                ],
                            },
                            {
                                "type": "li",
                                "children": [
                                    {
                                        "id": f"{random()}",
                                        "text": "That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness.",
                                    },
                                ],
                            },
                        ],
                    },
                    {
                        "id": f"{random()}",
                        "text": "Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world.",
                    },
                ],
            },
        ],
    },
]
create_asset_from_json_content(json_content)

<img src="./img/rich_text_4.png">

## 2. Convert HTML to rich-text Kili format

To get a better grasp of the rich-text Kili format, you can transform raw HTML snippets directly into it. For that purpose, we will use [BeautifulSoup 4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to convert [a simple table from the Mozilla documentation](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/table):

In [None]:
!pip install beautifulsoup4

In [None]:
html_doc = """
<table>
    <thead>
        <tr>
            <th colspan="2">The table header</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>The table body</td>
            <td border="1px solid #333">with two columns</td>
        </tr>
    </tbody>
</table>
"""

In [None]:
styles = {
    "table": {"border": "1px solid #333"},
    "td": {"border": "1px solid #333"},
    "th": {"backgroundColor": "#333", "color": "#fff"},
}


def from_html_to_kili(html):
    """Recursively convert a BeautifulSoup tag into a Kili rich-text node."""
    if html is None:
        return html
    # Copy the default styles so the shared `styles` dict is never mutated.
    attributes = dict(styles.get(html.name, {}))
    children = [from_html_to_kili(child) for child in html.find_all(recursive=False)]
    if not children:
        # Leaf tag: wrap its text in a single text node.
        children.append({"id": f"{random()}", "text": html.text.strip()})
    attributes["children"] = children
    # The BeautifulSoup root is named "[document]" and keeps the default node type.
    if html.name != "[document]":
        attributes["type"] = html.name
    return attributes
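
Because `from_html_to_kili` copies each HTML tag name into the node `type`, it can emit types that are absent from the list in section 1: the `<th>` tag of `html_doc` is one such case. The sketch below is a hypothetical helper, not part of the Kili client, that reports those types. How Kili treats a type outside that list is not stated in this tutorial, so the result should be read as a hint to check, not as an error.

```python
# Element types listed in section 1 of this tutorial.
DOCUMENTED_TYPES = {
    "blockquote", "h1", "h2", "h3", "h4", "li", "ol",
    "p", "table", "tbody", "td", "thead", "tr", "ul",
}


def undocumented_types(nodes):
    """Return the set of "type" values in the tree that are not in DOCUMENTED_TYPES."""
    found = set()
    for node in nodes:
        node_type = node.get("type")
        if node_type is not None and node_type not in DOCUMENTED_TYPES:
            found.add(node_type)
        # Text nodes have no children, so the recursion stops there.
        found |= undocumented_types(node.get("children", []))
    return found


# Hand-written tree mirroring the shape of the converted table (styles omitted).
header_row = {"type": "tr", "children": [
    {"type": "th", "children": [{"id": "th-0", "text": "The table header"}]},
]}
body_row = {"type": "tr", "children": [
    {"type": "td", "children": [{"id": "td-0", "text": "The table body"}]},
    {"type": "td", "children": [{"id": "td-1", "text": "with two columns"}]},
]}
table_content = [{"children": [{"type": "table", "children": [
    {"type": "thead", "children": [header_row]},
    {"type": "tbody", "children": [body_row]},
]}]}]

print(undocumented_types(table_content))  # {'th'}
```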

In [None]:
from bs4 import BeautifulSoup
import pprint

soup = BeautifulSoup(html_doc, "html.parser")
json_content = from_html_to_kili(soup)
print("Inserted JSON content:")
pprint.pprint([json_content])
create_asset_from_json_content([json_content])

Once inserted in Kili, the table looks like this:

<img src="./img/rich_text_5.png">

In [None]:
# Five assets were inserted above: four in section 1 and one in section 2.
assets = kili.assets(project_id=project_id)
assert len(assets) == 5