# Named Entity Recognition

In this tutorial, we show you how to get started with a Named Entity Recognition (NER) project.

Here are the steps in this tutorial:

1. [What is Named Entity Recognition (NER)?](#ner)
2. [Connecting to Kili](#connect)
3. [Creating the project, setting up the interface](#project)
4. [Importing data](#data)
5. [Labeling](#labeling)
6. [Exporting labels](#export)
7. [Quality Management](#quality)
8. [More advanced concepts](#concepts)

# What is Named Entity Recognition (NER)?<a id='ner'></a>

The first key concept to understand is Named Entity Recognition (NER) itself. NER is a sub-task of information extraction that locates named entities mentioned in unstructured text and classifies them into predefined categories such as names, organizations, locations, medical codes, time expressions, and quantities.

NER has many use cases, such as the anonymisation of documents containing personal data. This is especially relevant for protected data such as medical documents or court decisions.

To make things more concrete, let's examine an example. Let's say we have the text below:

![](../img/asset_ner.png)

We want to mark which words in this document are nouns, adjectives, and verbs. We'll use Kili to do this in this tutorial, and afterwards, the document will look like this:

![](../img/asset_ner_annotated.png)

Recognizing nouns, adjectives, and verbs is easy for a human. A machine has to learn it, and training a machine learning model to recognize them in a document requires large quantities of annotated assets. With Kili, you can label your data efficiently. In this tutorial, we will create our NER project step by step.
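
To make "locating and classifying" entities concrete, here is a minimal sketch that is independent of Kili. It represents each entity as a character span plus a category and applies the anonymisation use case mentioned above. The sentence, the category names, and the `Entity` and `anonymise` helpers are illustrative assumptions, not part of the platform.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    start: int   # index of the first character of the entity
    end: int     # index one past the last character
    label: str   # predefined category, e.g. "PERSON"


def anonymise(text: str, entities: List[Entity]) -> str:
    """Replace each entity span with its category in square brackets."""
    # Work from the end of the text so that earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label}]" + text[ent.end:]
    return text


text = "Alice Martin visited Paris in 2019."
entities = [
    Entity(0, 12, "PERSON"),
    Entity(21, 26, "LOCATION"),
    Entity(30, 34, "DATE"),
]
print(anonymise(text, entities))
# prints: [PERSON] visited [LOCATION] in [DATE].
```

The labels you produce in the rest of this tutorial carry the same kind of information: a begin offset, an end offset, and a category.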

# Connecting to Kili <a id='connect'></a>

The first step is to be able to connect to the platform.

If you use the SaaS version of Kili (see [here](https://cloud.kili-technology.com/docs/hosting/saas/)), you use by default the Auth0 login identification, or your company's authentication if it has been implemented. 

<img src="../img/auth0.png" width="400" />

If you use Kili on-premise (see [here](https://cloud.kili-technology.com/docs/hosting/on-premise-entreprise/)), you will probably use our own authentication:

<img src="../img/noauth0.png" width="400" />

You need your organization admin to create your profile, and depending on the authentication implementation, you can sign up and set your password, or use the temporary one provided to you by the admin. 

If everything succeeds, you should arrive at the Projects page shown at the beginning of the next section.

# Creating the project <a id='project'></a>

## List of projects

You arrive on the list of projects. The list is empty if you have not created any project yet.

![](../img/project_list.png)

You can refer to this [document](https://cloud.kili-technology.com/docs/concepts/definitions/) to find the definitions of key concepts at Kili. One of them is a project, which is a combination of:
- a dataset (a list of assets)
- members (project users; each can have different roles)
- an interface, describing the annotation plan.

## Create the project 

You can either create a project [from the interface](https://cloud.kili-technology.com/docs/projects/new-project/#docsNav) or [from the API](https://github.com/kili-technology/kili-playground/blob/master/recipes/create_project.ipynb).

To create a project from the interface, select `Create New` from the list of projects. Next, type your project's name and a description, and select `Text Named-Entities Recognition`. Finally, press `Save` as shown below:

![](../img/getting_started/create_ner_project.gif)

<details>
<summary style="display: list-item;"> Follow these instructions to create a NER project from the API </summary>

From the API, you can create a project with a single call, which allows you to store and share project interfaces:
- First, [connect to Kili](https://github.com/kili-technology/kili-playground/blob/master/README.md#get-started)


```python
# Authentication
import os

# !pip install kili # uncomment if you don't have kili installed already
from kili.client import Kili

api_key = os.getenv('KILI_USER_API_KEY')
api_endpoint = os.getenv('KILI_API_ENDPOINT') # If you use Kili SaaS, use the url 'https://cloud.kili-technology.com/api/label/v2/graphql'

kili = Kili(api_key=api_key, api_endpoint=api_endpoint)
```
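If one of the environment variables is missing, `os.getenv` silently returns `None`. As a hedged sketch, the hypothetical helper below (`load_kili_settings` is not part of the Kili client) fails early when the API key is absent and falls back to the SaaS endpoint quoted in the comment above. That fallback is an assumption that only suits SaaS users.

```python
import os

# SaaS endpoint quoted in the comment of the authentication snippet.
SAAS_ENDPOINT = "https://cloud.kili-technology.com/api/label/v2/graphql"


def load_kili_settings(env=os.environ):
    """Return (api_key, api_endpoint), failing early if the key is missing."""
    api_key = env.get("KILI_USER_API_KEY")
    if not api_key:
        raise RuntimeError("KILI_USER_API_KEY is not set")
    # Assumption: default to the SaaS endpoint when none is configured.
    api_endpoint = env.get("KILI_API_ENDPOINT", SAAS_ENDPOINT)
    return api_key, api_endpoint
```

You could then write `api_key, api_endpoint = load_kili_settings()` and pass both values to `Kili(...)` as shown above.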

- Then call the method `create_project` : <a id='command'></a>
```python
kili.create_project(
    title='Project Title',
    description='Project Description',
    input_type='TEXT',
    json_interface=interface
)
```

with `interface` such as:


```python
interface = {
	"jobRendererWidth": 0.25,
	"jobs": {
		"JOB_0": {
			"mlTask": "NAMED_ENTITIES_RECOGNITION",
			"instruction": "Categories",
			"required": 1,
			"isChild": False,  # Python booleans, not JSON's false/true
			"isVisible": True,
			"content": {
				"categories": {
					"INTERJECTION": {
						"name": "Interjection",
						"children": [],
						"color": "#0755FF"
					},
					"NOUN": {
						"name": "Noun",
						"children": [],
						"color": "#EEBA00"
					}
				},
				"input": "radio"
			}
		}
	}
}
```


```python
result = kili.create_project(
    title='Project Title',
    description='Project Description',
    input_type='TEXT',
    json_interface=interface
)
print(result)
```

```python
Out: {'id': 'ckm4pmqmk0000d49k6ewu2um5'}
```
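If you need more categories, writing the interface by hand gets repetitive. The sketch below generates the same structure from a mapping of categories. `build_ner_interface` is a hypothetical helper written for this tutorial, not a Kili function, and it only reproduces the fields shown in the interface above.

```python
def build_ner_interface(categories, instruction="Categories"):
    """Build a single-job NER interface from {KEY: (display name, color)}."""
    return {
        "jobRendererWidth": 0.25,
        "jobs": {
            "JOB_0": {
                "mlTask": "NAMED_ENTITIES_RECOGNITION",
                "instruction": instruction,
                "required": 1,
                "isChild": False,
                "isVisible": True,
                "content": {
                    "categories": {
                        key: {"name": name, "children": [], "color": color}
                        for key, (name, color) in categories.items()
                    },
                    "input": "radio",
                },
            }
        },
    }


interface = build_ner_interface({
    "INTERJECTION": ("Interjection", "#0755FF"),
    "NOUN": ("Noun", "#EEBA00"),
})
print(list(interface["jobs"]["JOB_0"]["content"]["categories"]))
```

The resulting `interface` dictionary can be passed as `json_interface` to `create_project`, exactly like the hand-written one.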
</details>

## Access your project

This creates a project with a simple interface with two types of Named Entities: "Interjection" and "Noun". Once logged in, you can see your project in the list of projects: 

![](../img/project_ner_in_list.png)

Click on it. You arrive on the overview of the project:

![](../img/project_overview.png)

If you want to modify or view the interface, go to the “Settings” tab.

<img src="../img/sidebar_settings.png" width=100/>

You can find both the form and the JSON versions of the interface:

![](../img/project_ner_jobs.png)

Furthermore, you can also do Named Entities Relation, another task related to information extraction, in which you create relationships between your named entities. Currently, the Kili platform only supports relationships of type “One to Many”. To set this up, go back to the settings of the project and select the `Add a new job` button. You should see a modal. Click on `Named Entities Relations`, then choose `Radio Button`.

![](../img/ner_relation_setting_up.png)

The `Named Entities Relations` job will be added to your project. It will look like this:

![](../img/ner_relation_settings.png)

You can now create relationships between the named entities you already created. Choose the name of your relation on the left and select which entities are in a "One to Many" relationship.

[Find out how to modify the interface dynamically!](https://cloud.kili-technology.com/docs/projects/customize-interface/#docsNav)

If you want to go back to the list of projects, either click on Kili Technology in the top bar, or on the list of projects in the sidebar:

<img src="../img/sidebar_listprojects.png" width=100>

<details>
<summary style="display: list-item;"> Follow these instructions to find the identifier of a project created from the API </summary>

When you run the [command](#command) to create a project, it outputs a unique identifier of the project. This identifier is used to recognize, access, and modify the project from the API.

```python
kili.create_project(
    title='Project Title',
    description='Project Description',
    input_type='TEXT',
    json_interface=interface
)
```

Example of such an output:

```python
{'id': 'ckkpj7stx1bxc0jvk1gn9cu5v'}
```
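The API calls later in this tutorial take a `project_id` argument. A minimal sketch of keeping the identifier around, using the example output above as a stand-in for the value returned by `create_project`:

```python
# Example return value copied from the output above; in practice this is
# whatever kili.create_project(...) returned.
result = {'id': 'ckkpj7stx1bxc0jvk1gn9cu5v'}

# Keep the identifier for later calls such as kili.assets(project_id=...).
project_id = result['id']
print(project_id)
```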

Another way to get this project identifier is to look at the URL of your project page:

![](../img/url_project.png)
</details>

# Importing data <a id='data'></a>

The next step is to import data.

You can import data either [from the interface](https://cloud.kili-technology.com/docs/data-ingestion/data-ingestion-made-easy/) or [from the API](https://cloud.kili-technology.com/docs/python-graphql-api/recipes/import_assets/#kili-tutorial-importing-assets). 


To import data from the interface, go to the `Dataset` section in your project and then click on `Add New`. There you'll see two tabs. From the first one, called `Upload Local Data`, you can upload files stored on your computer. In the second tab, called `Connect Cloud Data`, you provide a .csv file containing the URLs of your data stored in the cloud. These steps are shown below:

![](../img/import_assets.gif)
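
For the cloud option, the .csv file can be generated with a few lines of Python. This is only a sketch: the URLs are placeholders, `urls_to_csv` is a hypothetical helper, and the header names `externalId` and `url` are assumptions, so check the data-ingestion documentation linked above for the exact columns Kili expects.

```python
import csv
import io

# Hypothetical cloud URLs; replace with the locations of your own text files.
urls = [
    "https://example.com/texts/doc-1.txt",
    "https://example.com/texts/doc-2.txt",
]


def urls_to_csv(urls):
    """Return CSV text with one row per asset URL."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed header names; the real ones may differ.
    writer.writerow(["externalId", "url"])
    for index, url in enumerate(urls, start=1):
        writer.writerow([f"txt {index}", url])
    return buffer.getvalue()


print(urls_to_csv(urls))
```

Write the returned text to a file and select that file in the `Connect Cloud Data` tab.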

# Labeling <a id='labeling'></a>

When you create a project, you automatically become an admin of the project. This means that you can start labeling right away. If you want to add members to the project, follow [these instructions](https://cloud.kili-technology.com/docs/projects/settings/#manage-project-members).

To annotate a specific asset, you can go to the “Dataset” tab (in the side panel):

<img src="../img/sidebar_dataset.png" width=100>

![](../img/ner_dataset.png)

In the table of assets, simply click on the line of the asset (a text document here) you want to annotate.

## Label the first asset in the queue

You can start to annotate right away with the `Start Labeling` button. 

## How to label?

You arrive on the asset `txt - 1`: 

![](../img/text1.png)

Select the category you want (Interjection or Noun) by clicking on the corresponding radio button, or by pressing the key underlined in the class name: "i" for Interjection and "n" for Noun.

Once a category is selected, annotate by selecting the word you want in your text.

![](../img/getting_started/ner_annotation.gif)

Then, click on `Submit` to send the label.

<details>
<summary style="display: list-item;"> Follow these instructions to add a label from the API </summary>

For that, you need to know the identifier of the asset (a text document here). You can get it either from the URL when you are on an asset:

![](https://raw.githubusercontent.com/kili-technology/kili-playground/master/recipes/img/asset_id_url.png)

or from the API, by retrieving the assets of the project:

```python
assets = kili.assets(
    project_id=project_id,
    fields=['id']
)
asset_id = assets[0]['id']
print(asset_id)
```

Then add the label with `append_to_labels`:

```python
kili.append_to_labels(
    json_response=json_response,
    label_asset_id=asset_id,
    project_id=project_id
)
```

    {'id': 'ckm4pmzlj0009d49k1avaeubv'}

With a `json_response` such as:

```python
json_response = {
    "JOB_0": {
        "annotations": [
            {
                "beginId": "__default__",
                "beginOffset": 252,
                "categories": [
                    {
                        "name": "NOUN",
                        "confidence": 100
                    }
                ],
                "content": "Proin",
                "endId": "__default__",
                "endOffset": 257,
                "mid": "2021050514450488-34689"
            }
        ]
    }
}
```
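Computing the offsets by hand is error-prone. In the example above, `endOffset - beginOffset` equals the length of `content`, which suggests that the offsets are character positions in the asset text. Under that assumption, here is a sketch of a hypothetical helper, `make_annotation`, that builds a `json_response` for the first occurrence of a word. Treating `mid` as an arbitrary unique string is also an assumption.

```python
import uuid


def make_annotation(text, word, category, job="JOB_0"):
    """Build a json_response annotating the first occurrence of `word`."""
    begin = text.find(word)
    if begin == -1:
        raise ValueError(f"{word!r} not found in text")
    return {
        job: {
            "annotations": [
                {
                    "beginId": "__default__",
                    "beginOffset": begin,
                    "categories": [{"name": category, "confidence": 100}],
                    "content": word,
                    "endId": "__default__",
                    # Same convention as the example: end = begin + length.
                    "endOffset": begin + len(word),
                    # Assumption: `mid` only needs to be a unique string.
                    "mid": uuid.uuid4().hex,
                }
            ]
        }
    }


text = "Lorem ipsum dolor sit amet. Proin eget nunc."
json_response = make_annotation(text, "Proin", "NOUN")
annotation = json_response["JOB_0"]["annotations"][0]
print(text[annotation["beginOffset"]:annotation["endOffset"]])
```

Slicing the text with the two offsets should give back the annotated word, which is a quick sanity check before sending the label.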
    
</details>

# Exporting labels <a id='export'></a>

## Through the interface

In the Dataset tab, you can export your labels. 

![](../img/ner_labeled.png)

1. Choose your format and click on “Download”. An asynchronous job is triggered to prepare your data.
2. When you get a notification, click on it, then click on the “Download” button to download your data.

Notification appears | Notification list
:--:|:--:
![](../img/notification_appears.png) | <img src="../img/notification_opened.png" width=400>

If you chose the Kili API Format, you get this file:

```
[
    {
        "content": "https://staging.cloud.kili-technology.com/api/label/v2/files?id=2dd1efbd-20f9-4b1e-8da5-75758562f62f",
        "externalId": "txt 1",
        "id": "ckob8lwha00070j2118gr55ld",
        "jsonMetadata": {},
        "labels": [
            {
                "author": {
                    "email": "email of the author of the label",
                    "id": "id of the author of the label"
                },
                "createdAt": "2021-05-05T14:29:09.020Z",
                "isLatestLabelForUser": true,
                "jsonResponse": {
                    "JOB_0": {
                        "annotations": [
                            {
                                "beginId": "__default__",
                                "beginOffset": 252,
                                "categories": [
                                    {
                                        "confidence": 100,
                                        "name": "NOUN"
                                    }
                                ],
                                "content": "Proin",
                                "endId": "__default__",
                                "endOffset": 257,
                                "mid": "2021050514450488-34689"
                            }
                        ]
                    }
                },
                "labelType": "DEFAULT",
                "modelName": null,
                "skipped": false
            }
        ]
    }
]
```
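Once downloaded, the export can be flattened into one row per entity. The sketch below uses a trimmed inline sample and assumes that each label carries its `jsonResponse` directly, with the annotations under `JOB_0`. `iter_entities` is a hypothetical helper, not part of the Kili client.

```python
import json

# Trimmed sample following the export shape shown above.
export = json.loads("""
[
  {
    "externalId": "txt 1",
    "labels": [
      {
        "jsonResponse": {
          "JOB_0": {
            "annotations": [
              {
                "beginOffset": 252,
                "endOffset": 257,
                "content": "Proin",
                "categories": [{"name": "NOUN", "confidence": 100}]
              }
            ]
          }
        },
        "skipped": false
      }
    ]
  }
]
""")


def iter_entities(assets, job="JOB_0"):
    """Yield (externalId, category, content, beginOffset, endOffset) tuples."""
    for asset in assets:
        for label in asset.get("labels", []):
            if label.get("skipped"):
                continue  # skipped labels carry no annotations to use
            job_response = label.get("jsonResponse", {}).get(job, {})
            for ann in job_response.get("annotations", []):
                for category in ann["categories"]:
                    yield (asset["externalId"], category["name"],
                           ann["content"], ann["beginOffset"], ann["endOffset"])


for row in iter_entities(export):
    print(row)
# prints: ('txt 1', 'NOUN', 'Proin', 252, 257)
```

For a real export, replace the inline sample with `json.load` on the file you downloaded.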

[For details on the data export, click here](https://cloud.kili-technology.com/docs/data-export/data-export/#docsNav)

<details>
<summary style="display: list-item;"> Follow these instructions to export labels from the API </summary>


Our API uses GraphQL. Simply choose the fields you want to fetch by specifying a list:


```python
labels = kili.labels(
    project_id=project_id,
    fields=['id', 'createdAt', 'labelOf.externalId']
)
assert len(labels) > 0
labels
```




    [{'labelOf': {'externalId': 'txt 1'},
      'id': 'ckm4pmzlj0009d49k1avaeubv',
      'createdAt': '2021-03-11T10:10:20.984Z'}]



Of course, you have plenty more options/filters:


```python
help(kili.labels)
```

    Help on method labels in module kili.queries.label:
    
    labels(asset_id: str = None, asset_status_in: List[str] = None, asset_external_id_in: List[str] = None, author_in: List[str] = None, created_at: str = None, created_at_gte: str = None, created_at_lte: str = None, fields: list = ['author.email', 'author.id', 'id', 'jsonResponse', 'labelType', 'secondsToLabel', 'skipped'], first: int = None, honeypot_mark_gte: float = None, honeypot_mark_lte: float = None, id_contains: List[str] = None, json_response_contains: List[str] = None, label_id: str = None, project_id: str = None, skip: int = 0, skipped: bool = None, type_in: List[str] = None, user_id: str = None) method of kili.playground.Playground instance
        Get an array of labels from a project given a set of criteria
        
        Parameters
        ----------
        - asset_id : str, optional (default = None)
            Identifier of the asset.
        - asset_status_in : list of str, optional (default = None)
            Returned labels should have a status that belongs to that list, if given.
            Possible choices : {'TODO', 'ONGOING', 'LABELED', 'REVIEWED'}
        - asset_external_id_in : list of str, optional (default = None)
            Returned labels should have an external id that belongs to that list, if given.
        - author_in : list of str, optional (default = None)
            Returned labels should have a label whose status belongs to that list, if given.
        - created_at : string, optional (default = None)
            Returned labels should have a label whose creation date is equal to this date.
            Formatted string should have format : "YYYY-MM-DD"
        - created_at_gt : string, optional (default = None)
            Returned labels should have a label whose creation date is greater than this date.
            Formatted string should have format : "YYYY-MM-DD"
        - created_at_lt : string, optional (default = None)
            Returned labels should have a label whose creation date is lower than this date.
            Formatted string should have format : "YYYY-MM-DD"
        - fields : list of string, optional (default = ['author.email', 'author.id', 'id', 'jsonResponse', 'labelType', 'secondsToLabel', 'skipped'])
            All the fields to request among the possible fields for the labels.
            See [the documentation](https://cloud.kili-technology.com/docs/python-graphql-api/graphql-api/#label) for all possible fields.
        - first : int, optional (default = None)
            Maximum number of labels to return.  Can only be between 0 and 100.
        - honeypot_mark_gt : float, optional (default = None)
            Returned labels should have a label whose honeypot is greater than this number.
        - honeypot_mark_lt : float, optional (default = None)
            Returned labels should have a label whose honeypot is lower than this number.
        - id_contains : list of str, optional (default = None)
            Filters out labels not belonging to that list. If empty, no filtering is applied.
        - json_response_contains : list of str, optional (default = None)
            Returned labels should have a substring of the jsonResponse that belongs to that list, if given.
        - label_id : str
            Identifier of the label.
        - project_id : str
            Identifier of the project.
        - skip : int, optional (default = None)
            Number of labels to skip (they are ordered by their date of creation, first to last).
        - skipped : bool, optional (default = None)
            Returned labels should have a label which is skipped
        - type_in : list of str, optional (default = None)
            Returned labels should have a label whose type belongs to that list, if given.
        - user_id : str
            Identifier of the user.
        
        
        Returns
        -------
        - a result object which contains the query if it was successful, or an error message else.
        
        Examples
        -------
        >>> # List all labels of a project and their assets external ID
        >>> playground.labels(project_id=project_id, fields=['jsonResponse', 'labelOf.externalId'])
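
The help text above notes that `first` can only be between 0 and 100, so retrieving more labels than that means paging with `skip`. Here is a minimal sketch of such a loop. `fetch_all` is a hypothetical helper, and a fake fetch function stands in for the API so that the example runs offline.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every item by calling fetch_page(skip=..., first=...) repeatedly."""
    items = []
    skip = 0
    while True:
        page = fetch_page(skip=skip, first=page_size)
        items.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return items
        skip += page_size


# Stand-in for the API so the sketch runs offline: 250 fake labels.
fake_labels = [{"id": f"label-{i}"} for i in range(250)]


def fake_fetch(skip, first):
    return fake_labels[skip:skip + first]


all_labels = fetch_all(fake_fetch)
print(len(all_labels))
# prints: 250
```

With the real client, assuming `kili.labels` returns a list as in the output shown earlier, the call would look like `fetch_all(lambda skip, first: kili.labels(project_id=project_id, skip=skip, first=first))`.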
    
</details>

# Quality Management<a id='quality'></a>

To ensure that your model performs well, your annotations must be of good quality. Kili gives you two main ways to measure annotation quality: consensus and honeypot. Consensus measures the agreement between annotations made by different annotators on the same asset. Honeypot compares the annotations of your annotators to a gold standard that you provide beforehand.
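
To give an intuition for both measures, the sketch below scores the overlap between two sets of NER annotations with a Jaccard index. This is an illustrative agreement measure chosen for this example; it is not Kili's actual consensus or honeypot formula, and `span_set` and `agreement` are hypothetical helpers.

```python
def span_set(annotations):
    """Reduce annotations to a set of (beginOffset, endOffset, category)."""
    return {
        (ann["beginOffset"], ann["endOffset"], cat["name"])
        for ann in annotations
        for cat in ann["categories"]
    }


def agreement(annotations_a, annotations_b):
    """Jaccard overlap between two annotation lists: 1.0 means identical."""
    a, b = span_set(annotations_a), span_set(annotations_b)
    if not a and not b:
        return 1.0  # two empty labels agree trivially
    return len(a & b) / len(a | b)


noun = [{"name": "NOUN"}]
annotator_1 = [
    {"beginOffset": 0, "endOffset": 5, "categories": noun},
    {"beginOffset": 252, "endOffset": 257, "categories": noun},
]
annotator_2 = [
    {"beginOffset": 252, "endOffset": 257, "categories": noun},
]
gold = annotator_1  # pretend annotator 1's label is the gold standard

print(agreement(annotator_1, annotator_2))  # consensus-style comparison
print(agreement(gold, annotator_2))         # honeypot-style comparison
```

Both comparisons use the same function. The difference is what an annotator's label is compared with: another annotator's label for consensus, a reference label for honeypot.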

To access the quality management tab, go to “Settings”, then “Quality Management” as shown below:

![](../img/access_quality_management.png)

Please follow the links below for detailed information:
- [Quality management](https://cloud.kili-technology.com/docs/quality/quality-management/#docsNav)
- Setting up quality metrics: [Consensus](https://cloud.kili-technology.com/docs/quality/consensus/#docsNav) and [Honeypot](https://cloud.kili-technology.com/docs/quality/honeypot/)

# More advanced concepts <a id='concepts'></a>

Here we list some of the more advanced features:

- [Importing predictions](https://cloud.kili-technology.com/docs/python-graphql-api/recipes/import_predictions/#docsNav)
- [Reviewing the labels](https://cloud.kili-technology.com/docs/quality/review-process/#docsNav)
- [Issue/Question system](https://cloud.kili-technology.com/docs/quality/question-issue/#docsNav)
- [More on Named Entities Recognition](https://cloud.kili-technology.com/docs/interfaces-text-pdf/named-entities-recognition/)

[The full API definition can be found here](https://cloud.kili-technology.com/docs/python-graphql-api/python-api/#docsNav)