# Segmentation datasets annotation standard

The standard describes:
* components of annotation specifications;
* recommendations on the content and design of the components;
* methods for validating annotation;
* annotation tools and services.

The actions recommended by the standard are aimed at:
* increasing the reproducibility of dataset generation;
* improving the quality of labels.

**The description should be sufficient to allow a new analyst to:**
* **write a parser for the data and annotations;**
* **generate a new, consistent dataset;**
* **annotate a new dataset in the same way.**

## Table of contents

* [Segmentation](#Segmentation)
    * [Segmentation task](#Segmentation-task)
    * [Applications](#Applications)
    * [Types of segmentation](#Types-of-segmentation)
* [Dataset description](#Dataset-description)
* [Annotation description](#Annotation-description)
    * [Annotation dictionary](#Annotation-dictionary)
    * [Annotation guidelines](#Annotation-guidelines)
    * [Instructions for annotators](#Instructions-for-annotators-text--video)
    * [Examples of annotation](#Learning-examples---good-and-bad-annotations)
    * [Annotation format](#Annotation-format)
    * [Number of annotators](#Number-of-annotators)
    * [Annotation validation](#Annotation-validation)
* [Annotation tools](#Annotation-tools)
    * [CVAT](#CVAT)
    * [LabelMe](#LabelMe)
    * [LabelBox](#LabelBox)
    * [Supervise.ly](#Supervisely)
    * [Yandex.Toloka](#YandexToloka)
    * [DBrain](#DBrain)
    * [Playment.io](#Playment.io)
* [Annotation description example](#Annotation-description-example)

## Segmentation

### Segmentation task

Segmentation is the process of splitting an image into groups of pixels according to the similarity of their characteristics. Segmentation can also be considered as the classification of each pixel in the image.
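
Treating segmentation as per-pixel classification can be illustrated with a minimal NumPy sketch. It assumes that some model produces a score for every class at every pixel; the resulting mask simply stores, for each pixel, the index of the highest-scoring class. The function name `scores_to_mask` and the toy scores are illustrative only.

```python
import numpy as np

def scores_to_mask(scores: np.ndarray) -> np.ndarray:
    """Turn per-class scores of shape (C, H, W) into an (H, W) mask of class ids.

    Every pixel is classified independently: it gets the index of its
    highest-scoring class.
    """
    return np.argmax(scores, axis=0)

# Hypothetical scores for a 2x3 image and 3 classes (0 = background).
scores = np.array([
    [[0.9, 0.1, 0.2], [0.8, 0.1, 0.1]],    # class 0
    [[0.05, 0.7, 0.1], [0.1, 0.8, 0.2]],   # class 1
    [[0.05, 0.2, 0.7], [0.1, 0.1, 0.7]],   # class 2
])
mask = scores_to_mask(scores)
print(mask.tolist())  # [[0, 1, 2], [0, 1, 2]]
```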

### Applications

Segmentation is applied to a wide range of tasks because it provides detailed, pixel-level information about the objects in an image.

Examples of common areas and tasks for segmentation:

* Medical image analysis
    * detection and localization of tumors
    * determination of tissue volumes
* Analysis of satellite images
    * building a road graph
    * car detection
    * detection of illegal buildings
* Self-driving cars
    * detection of pedestrians in sight
    * detection of other road users
    * identification of surrounding objects

### Types of segmentation

* **Semantic segmentation**

In semantic segmentation, pixels are grouped according to a semantic attribute. For example, pedestrians, cars, chairs and dogs form four different groups of pixels; one group can include many objects without distinguishing between them, as in the example below:

<img src="./images/Ex_SS.jpg" style="width: 700px;"/>

* **Instance segmentation**

Instance segmentation is a more difficult task: individual objects in the image are both identified and classified. Thus, even objects of the same type, such as cars or chairs, are segmented separately, as in the example below:

<img src="./images/Ex_IS.jpg" style="width: 700px;"/>
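
The difference between the two types can be shown on label arrays. In this sketch (an illustration, not a format required by the standard), an instance mask stores a separate id per object, and a hypothetical `instance_to_class` table maps each instance to its class. Collapsing instances yields a semantic mask; the reverse conversion is impossible because object identity is lost.

```python
import numpy as np

# Instance mask: 0 = background, every other value is a separate object.
instance_mask = np.array([
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 2],
    [3, 3, 0, 0, 0],
])
# Hypothetical class of each instance: 1 = car, 2 = person.
instance_to_class = {1: 1, 2: 1, 3: 2}

def instances_to_semantic(instance_mask, instance_to_class):
    """Replace every instance id with its class id (object identity is dropped)."""
    semantic = np.zeros_like(instance_mask)
    for instance_id, class_id in instance_to_class.items():
        semantic[instance_mask == instance_id] = class_id
    return semantic

semantic_mask = instances_to_semantic(instance_mask, instance_to_class)
# The two cars (instances 1 and 2) now share the same label 1.
print(semantic_mask.tolist())  # [[0, 1, 1, 0, 1], [0, 1, 1, 0, 1], [2, 2, 0, 0, 0]]
```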


## Dataset description

The main purpose of the dataset description is to ensure reproducibility of data collection and preparation. This may be required to update or enlarge the dataset, or to re-collect the data if it is lost.

The dataset description should be produced during data collection/preparation or immediately afterwards, independently of the labeling process. Later, the description can be extended with the labeling procedures.

#### Data source
Data source description should include:
* Data owner - the legal entity or division that owns the data, and an employee who can provide access to it. You should also specify how to contact the data owner.
* Data storage - if possible, indicate where the data is stored: which databases / tables, which storage systems, etc.
* Labeling process - a reference to the documents described below.

#### Obtaining the data
It is necessary to describe the procedure for obtaining the data, which may include: filing a request to the data owner, coordination with the security department, the anonymization process, etc.

#### Format and dataset size
It is necessary to describe the file formats of the dataset and its size: in which files the data is stored, whether any transformations were applied, and what metadata is available. The dataset annotation should be described in the same way. It is advisable to specify the distribution of classes in the data.

E.g.:

`
Data is stored in .jpg files. It contains every 48th frame of original_video.avi, starting from 1:23:45.678. This procedure resulted in 12345 images. The dataset is labeled for the multi-class segmentation task: class "background" covers 75% of the pixels, class "person" 21%, class "helmet" 4%. The annotation for each image is stored in XML format.
`
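
The class distribution mentioned above can be computed directly from the masks. A minimal sketch, assuming each mask is a 2-D array of integer class ids where id `i` corresponds to `class_names[i]`; the toy masks are made up for illustration.

```python
import numpy as np

def class_distribution(masks, class_names):
    """Share of pixels per class over a list of (H, W) masks of class ids."""
    n = len(class_names)
    counts = np.zeros(n, dtype=np.int64)
    for mask in masks:
        per_mask = np.bincount(mask.ravel(), minlength=n)
        if per_mask.size > n:
            # A value >= n means the mask uses a class that is not listed.
            raise ValueError("mask contains a class id outside class_names")
        counts += per_mask
    total = counts.sum()
    return {name: float(counts[i] / total) for i, name in enumerate(class_names)}

masks = [np.array([[0, 0], [1, 2]]), np.array([[0, 0], [0, 1]])]
shares = class_distribution(masks, ["background", "person", "helmet"])
for name, share in shares.items():
    print(f"{name}: {share:.1%}")  # background: 62.5%, person: 25.0%, helmet: 12.5%
```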

## Annotation description

### Annotation dictionary

The main objective of the dictionary is to clarify which characteristics of an object should be used to assign it to a certain class. For example, when labeling aerial images, lakes and rivers can be classified as `Water`, while ponds located within cities can be classified as `Populated areas`.

The dictionary allows you either to separate semantically close objects into different classes or to combine them into one class. The dictionary is the first and highest-level part of the labeling documentation.

|<p align="left">Class|<p align="left">Description|
|---|---|
|<p align="left">Dense forest|<p align="left">Forest through which the ground is not visible|
|<p align="left">Rare forest|<p align="left">Forest through which the ground is visible|
|<p align="left">Ground|<p align="left">Flat surface without large vegetation, not a swamp|
|<p align="left">Water|<p align="left">Lakes, rivers|
|<p align="left">Swamp|<p align="left">Swamps, marshes, possibly the surroundings of lakes|
|<p align="left">Rocks|<p align="left">Terrain with a significant amount of scattered stones, difficult for vehicles to pass|
|<p align="left">Road|<p align="left">Any visible roads (asphalt, gravel, etc.) and parking lots along the roads|
|<p align="left">Populated areas|<p align="left">Territories with residential buildings and related infrastructure|
|<p align="left">Exclusive areas|<p align="left">Pipelines, power lines, drilling and production sites|
|<p align="left">Other|<p align="left">All other objects|
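
It can also help to keep a machine-readable copy of the dictionary next to the table, so that masks can be checked for values the dictionary does not define. The class ids below are a hypothetical assignment, not part of the example dictionary.

```python
import numpy as np

# Hypothetical mapping of mask values to the dictionary classes.
CLASSES = {
    0: "Other",
    1: "Dense forest",
    2: "Rare forest",
    3: "Ground",
    4: "Water",
    5: "Swamp",
    6: "Rocks",
    7: "Road",
    8: "Populated areas",
    9: "Exclusive areas",
}

def unknown_labels(mask: np.ndarray) -> set:
    """Return the mask values that are not defined in the dictionary."""
    return set(np.unique(mask).tolist()) - set(CLASSES)

print(unknown_labels(np.array([[1, 4], [4, 12]])))  # {12}
```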

### Annotation guidelines

The guidelines should be prepared beforehand and refined during the labeling process.

They must answer frequently asked questions that annotators may have, resolve ambiguities and give recommendations for specific cases:

|<p align="left">Question|<p align="left">Answer|
|---|---|
|<p align="left">How to deal with images of poor quality?|<p align="left">If it is impossible to identify the classes described in the dictionary, do not label the image; mark it as incorrect.|
|<p align="left">How to deal with objects covered by clouds?|<p align="left">If a cloud covers an object so that its boundaries are indistinguishable, or if the cloud covers more than 15% of the object, draw the object's boundary along the cloud boundary. If the cloud does not intersect the boundaries of the object and occupies less than 15% of the object, it can be considered part of the object.|
|<p align="left">How to deal with concatenated images and collages?|<p align="left">If the photos are concatenated so that the objects match (e.g. the coastline and the roads are continuous), label as usual. If not, mark the photo as incorrect.|
|<p align="left">How precise should the labels be?|<p align="left">Labeling should be carried out with a tolerance of 5 pixels. All pixels inside the object boundary must belong to the object's class. Pixels more than 5 pixels outside the boundary must belong to the background or another class. Pixels on the boundary and within 5 pixels outside it may belong either to the object or to other objects / background.|
|<p align="left">How to label blurry / mixed pixels?|<p align="left">If a pixel's class is unclear due to blurring or transparency of objects, assign the pixel to the class whose color is closest to the pixel's color.|
|<p align="left">What to do with intersecting / overlapping objects?|<p align="left">If objects overlap (for example, a road or railway bridge across a river), label the priority classes first, if any. Then label the remaining objects in order from the top / closest object (in the example, the bridge) to the lower / farther ones.|
|<p align="left">Should small structures be labeled?|<p align="left">If possible, try to label small structures (3-5 pixels). If a small structure belongs to a priority class, it must be labeled.|
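
The 5-pixel tolerance rule can be checked automatically when a reference mask is available. The sketch below reads the rule literally: every reference pixel must be labeled, and no labeled pixel may lie more than `tolerance` pixels outside the reference. It assumes binary masks and measures distance with a square (8-connected) neighborhood, which is a simplification; a Euclidean distance transform would give a slightly different band. The function names are hypothetical.

```python
import numpy as np

def dilate(mask, radius):
    """Grow a boolean mask by `radius` pixels using an 8-connected neighborhood."""
    out = mask.copy()
    h, w = out.shape
    for _ in range(radius):
        padded = np.pad(out, 1, constant_values=False)
        grown = np.zeros_like(out)
        # OR together the mask shifted to each of the 9 neighbor positions.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                grown |= padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        out = grown
    return out

def tolerance_violations(annotation, reference, tolerance=5):
    """Count pixels that break the tolerance rule relative to a reference mask."""
    annotation = annotation.astype(bool)
    reference = reference.astype(bool)
    missed_inside = reference & ~annotation                # object pixels left unlabeled
    too_far = annotation & ~dilate(reference, tolerance)   # labels beyond the band
    return int(missed_inside.sum() + too_far.sum())

reference = np.zeros((20, 20), dtype=bool)
reference[5:8, 5:8] = True
loose = dilate(reference, 3)   # a generous outline, still inside the 5-pixel band
stray = reference.copy()
stray[19, 19] = True           # a stray pixel 12 pixels away from the object
print(tolerance_violations(loose, reference), tolerance_violations(stray, reference))  # 0 1
```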


### Instructions for annotators (text + video)

You should prepare instructions for annotators, both text and video.

The text instruction should describe how to use the annotation software, including:
* data download (if required);
* the annotation software tools the annotator should use (for example, "Brush", "Fill", "Polygon", etc.);
* the annotation process;
* saving the annotation (if required).

The video instruction should show the process described above with commentary; it can be recorded as a screencast.

### Learning examples - good and bad annotations

When designing an annotation task, it is highly recommended to create a set of good training examples. It should contain at least a few dozen very carefully labeled images. With these examples, annotators can become familiar with the annotation classes and understand the details described in the annotation guidelines.

Some of these examples can be given to annotators as tasks and used as ground truth to evaluate the annotators' performance.
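
Scoring annotators on the ground-truth examples can be as simple as averaging IoU over those images. A minimal sketch, assuming binary masks stored in dicts keyed by image name; the function names and the toy masks are hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

def annotator_score(annotations, ground_truth):
    """Mean IoU of one annotator over all ground-truth examples."""
    return float(np.mean([iou(annotations[name], ground_truth[name]) for name in ground_truth]))

ground_truth = {"img1": np.array([[1, 1], [0, 0]]), "img2": np.array([[1, 0], [1, 0]])}
careful = {name: mask.copy() for name, mask in ground_truth.items()}
sloppy = {"img1": np.array([[1, 0], [0, 0]]), "img2": np.array([[1, 0], [1, 0]])}
print(annotator_score(careful, ground_truth), annotator_score(sloppy, ground_truth))  # 1.0 0.75
```

An annotator whose score falls below a chosen threshold (for example 0.8, an arbitrary value here) could be retrained or excluded; for multi-class masks the IoU would typically be averaged over classes.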

In addition to good examples, you should create a set of bad examples to demonstrate typical errors.

### Annotation format

The most popular formats to store annotations:
* json
* xml
* png / jpg / jpeg
* csv / tsv

Different tools support one or more of these formats. The internal structure of JSON, XML and CSV files differs from tool to tool, so a new loading script is usually needed for each new tool.
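
A loader usually converts a tool's export into one internal representation. The sketch below parses a hypothetical TSV layout (image id, then a JSON list of polygons with `label` and `points`); it is not the format of any particular tool and would need to be adapted to the actual export.

```python
import csv
import io
import json

def load_tsv_annotations(stream):
    """Parse tab-separated rows (image id, JSON list of polygons) into a dict.

    Hypothetical layout: each polygon is {"label": str, "points": [[x, y], ...]}.
    """
    annotations = {}
    # QUOTE_NONE keeps the double quotes inside the JSON column untouched.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip empty lines
        image_id, markup = row
        annotations[image_id] = [
            (polygon["label"], [tuple(point) for point in polygon["points"]])
            for polygon in json.loads(markup)
        ]
    return annotations

sample = io.StringIO('img_001.jpg\t[{"label": "Water", "points": [[0, 0], [4, 0], [4, 3]]}]\n')
loaded = load_tsv_annotations(sample)
print(loaded)  # {'img_001.jpg': [('Water', [(0, 0), (4, 0), (4, 3)])]}
```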

### Number of annotators

When training models, the quality of input data is very important: garbage in, garbage out. Therefore, you need a large amount of high-quality data.

To select the number of annotators *L* required to label one image:
1. Use the learning examples to assess the quality of annotators.
2. Select annotators who meet the quality requirements and follow the instructions.
3. Estimate how many such annotators must label a single image to reach the required quality.

Choose the total number of annotators *K* as follows:
1. Estimate time *t* required for one annotator to label a single image.
2. Let *T* be the total time allocated for annotation.
3. The number of annotators is calculated by the formula:

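$$ K = \left\lceil \frac{t \cdot N \cdot L}{T} \right\rceil, $$

where *N* is the total number of images in the dataset; *t* and *T* must be expressed in the same time units, and the result is rounded up to a whole number of annotators.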
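
The formula translates directly into code. The numbers in the example are hypothetical: 10 minutes per image, 12345 images, 3 annotators per image and 9600 minutes (160 hours) allocated for annotation.

```python
import math

def total_annotators(t, n_images, per_image, total_time):
    """K = ceil(t * N * L / T); t and total_time must use the same time unit."""
    return math.ceil(t * n_images * per_image / total_time)

K = total_annotators(t=10, n_images=12345, per_image=3, total_time=9600)
print(K)  # 39  (370350 / 9600 = 38.58, rounded up)
```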

### Annotation validation

In order to use the annotated dataset to train models, you need to validate the annotation.

There are several ways to do it:

* By intersection - using metrics such as PPV, Dice, IoU.
* By distance - Mean Distance, Hausdorff distance.
* Voting methods.
* Using annotators' scores on the learning examples.
* Validation by eye.

These methods can be applied separately and in combination.
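
The intersection and distance metrics listed above can be computed with NumPy alone. A minimal sketch for a pair of binary masks; here the Hausdorff distance is taken between the full foreground pixel sets (boundary-based variants are also common) and uses a brute-force pairwise distance matrix, which is fine for small masks but memory-hungry for large ones.

```python
import numpy as np

def overlap_metrics(pred, ref):
    """PPV, Dice and IoU between two binary masks of the same shape."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.logical_and(pred, ref).sum()
    fp = np.logical_and(pred, ~ref).sum()
    fn = np.logical_and(~pred, ref).sum()
    ppv = tp / (tp + fp) if tp + fp else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    return {"ppv": float(ppv), "dice": float(dice), "iou": float(iou)}

def hausdorff(pred, ref):
    """Symmetric Hausdorff distance (in pixels) between two foreground pixel sets."""
    a, b = np.argwhere(pred), np.argwhere(ref)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks must contain foreground pixels")
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    # Farthest distance from a point of one set to the nearest point of the other.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

ref = np.zeros((4, 4), dtype=bool)
ref[0:2, 0:2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:3] = True   # one extra column labeled
print(overlap_metrics(pred, ref), hausdorff(pred, ref))
```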

For example, consider the two-stage annotation process of the LUNA16 dataset, performed by four radiologists: in the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions; in the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion.
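
Voting combines several annotations of the same image into one. A minimal pixel-wise sketch, assuming binary masks of equal shape; `min_votes` is a parameter of this illustration (for example, requiring agreement of three out of four annotators), not a value prescribed by the standard.

```python
import numpy as np

def vote(masks, min_votes):
    """Pixel-wise vote: foreground where at least `min_votes` masks agree."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stacked.sum(axis=0) >= min_votes

readers = [
    np.array([[1, 1, 0, 0]]),
    np.array([[1, 1, 1, 0]]),
    np.array([[1, 0, 1, 0]]),
    np.array([[1, 1, 0, 1]]),
]
consensus = vote(readers, min_votes=3)
print(consensus.astype(int))  # [[1 1 0 0]]
```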

## Annotation tools

The annotation tool is chosen based on the task, the availability of computational and human resources, and data confidentiality.

For example, if you have your own server, a confidential dataset and a large number of annotators, you can choose CVAT, since it scales well and the data does not leave your server. If the dataset is open and small, and there are few annotators, you can choose Supervise.ly or LabelBox.

The selected tool and its specific settings, if any, must be included in the annotation description to ensure reproducibility of the process.

### CVAT

A powerful tool, suitable for large-scale annotation. The server is packaged with Docker and can be set up in about an hour. The service supports different roles: an administrator who creates and edits jobs, annotators, etc.

Annotation can use labels, polylines, bounding boxes and polygons; it is exported to XML files.

It has an extensive [user manual](https://github.com/opencv/cvat/blob/develop/cvat/apps/documentation/user_guide.md).
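
A parser for CVAT XML can be written with the standard library. The sketch below assumes the "CVAT for images" XML layout, where each `<image>` element contains `<polygon>` elements with a `label` attribute and `points` stored as `x,y` pairs separated by `;`. It should be checked against an actual export of the CVAT version in use; for files, `ET.parse(path).getroot()` can replace `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

def parse_cvat_polygons(xml_text):
    """Map image name -> list of (label, [(x, y), ...]) for <polygon> elements."""
    root = ET.fromstring(xml_text)
    result = {}
    for image in root.iter("image"):
        shapes = []
        for polygon in image.iter("polygon"):
            # "x1,y1;x2,y2;..." -> [(x1, y1), (x2, y2), ...]
            points = [
                tuple(float(value) for value in pair.split(","))
                for pair in polygon.get("points").split(";")
            ]
            shapes.append((polygon.get("label"), points))
        result[image.get("name")] = shapes
    return result

SAMPLE = """<annotations>
  <image id="0" name="frame_000001.jpg" width="640" height="480">
    <polygon label="person" occluded="0" points="10.0,20.0;30.0,20.0;30.0,60.0" z_order="0"/>
  </image>
</annotations>"""

parsed = parse_cvat_polygons(SAMPLE)
print(parsed)
```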

### LabelMe

A simple tool, easy to install and run on a local machine.
It allows you to annotate with polygons, which is convenient for instance segmentation.
The annotation is exported to JSON, which can be converted into a PNG mask, the original image and additional files such as the list of classes.
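
Converting LabelMe JSON into a mask mainly means rasterizing the polygons. The sketch below assumes LabelMe's JSON fields `imageHeight`, `imageWidth` and `shapes` (each with `label`, `points` and `shape_type`) and a hypothetical `class_ids` mapping from label names to mask values. It rasterizes with a simple even-odd test on pixel centers, so pixels on polygon edges may differ slightly from LabelMe's own conversion; later shapes overwrite earlier ones.

```python
import json
import numpy as np

def polygon_to_mask(points, height, width):
    """Rasterize a polygon by testing pixel centers with the even-odd rule."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        crosses = (y1 > py) != (y2 > py)  # edge spans the pixel's row
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_cross)  # toggle for each crossing to the right
    return inside

def labelme_to_mask(doc, class_ids):
    """Build a class-id mask from a LabelMe-style JSON dict (polygons only)."""
    h, w = doc["imageHeight"], doc["imageWidth"]
    mask = np.zeros((h, w), dtype=np.uint8)
    for shape in doc["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue
        mask[polygon_to_mask(shape["points"], h, w)] = class_ids[shape["label"]]
    return mask

doc = json.loads(
    '{"imageHeight": 4, "imageWidth": 4, "shapes": [{"label": "lake", '
    '"points": [[0, 0], [2, 0], [2, 2], [0, 2]], "shape_type": "polygon"}]}'
)
mask = labelme_to_mask(doc, {"lake": 1})
print(mask)
```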

### LabelBox

An online platform that allows annotating a few thousand images per year for free. The interface can be configured to customize the annotation process.

It has a convenient pixel-level annotation interface: a magic brush that fills in areas (as in Photoshop). The magic brush sometimes reduces accuracy, but it lets you go beyond polygons and create discontinuous, ragged masks; it also greatly speeds up annotation.

The platform supports consensus between annotations, but for segmentation it behaves oddly.

Downloading annotations is rather inconvenient: the export consists of JSON and CSV files with many links to PNG masks and markup.

The image below compares annotations of the original image (center) made with the LabelBox magic brush (right) and with Supervise.ly polygons (left).

<img src="./images/1_compare.jpg" style="width: 1500px;"/>

### Supervise.ly

A convenient online tool. It allows you to upload images without any accompanying files. A project can contain several datasets, and classes can be shared between them.

It supports polygons and bitmap filling for segmentation, and allows you to download masks as well as JSON files with classes and polygon points.

In the example below, the image was annotated for the classes "Water", "Forest", "Mountains" and "Other". The process took about 5-10 minutes.

Mask, original image and overlay:
<img src="./images/2_compare.jpg" style="width: 1200px;"/>

### Yandex.Toloka

An online platform that supports a number of annotation tasks: labeling, segmentation, bounding boxes, side-by-side image comparison, walking tasks.

It has customizable interfaces; the price of the markup is set by the user, so the process can be sped up by raising the price and attracting more annotators. The quality of annotations is low: out of five annotations of a small image, two were of moderate quality, one was poor and two were meaningless. For a large image, only one annotation out of five was of acceptable quality.

Annotations can be exported as a TSV file containing an image identifier and a JSON with the markup and some other parameters.

### DBrain

A platform in the alpha testing stage.

It supports segmentation markup with polygons. It has a large pool of annotators, each rated by quality and speed. The platform also claims to monitor annotator behavior, e.g. whether they take breaks.

It claims to have its own consensus-based validation algorithms but does not disclose the details.

## Annotation description example

The annotation description should be a single text file, with links to supplementary materials, such as videos and examples.

The file structure should correspond to the structure of the "Annotation description" section. Text descriptions and tables should be placed in the corresponding subsections of the file; other materials should be available via links to the files:


* The annotation dictionary and guidelines should be arranged as two tables.
* Annotation instructions should be arranged as a subsection of the description. It may also contain a link to a video file stored in the same folder.
* Examples of good and bad markup should be stored in a separate directory; the description should provide a link to this directory.
* The annotation format should be described in detail in the appropriate section of the file. The description should be sufficient for a new analyst to write a parser on their own. You can also attach a link to the parsing script.
* The number of annotators should also be specified in the corresponding section of the file: the total number of annotators *K*, the number of annotators per image *L* and the time *t* required for one annotator to process one image.
* The description of the validation procedure should include the selected validation method(s), the metrics and the criteria used to distinguish good markup from bad. You should also attach a link to the code used for validation.

The annotation description, along with all the materials, the dataset description and the dataset itself, must be packed into an archive and stored and distributed in that form.

Example structure of the annotation description:

```
Annotation_%dataset_name%/
|--- Description.doc
|--- Instructions.avi
|--- Examples
|    |--- Good
|    |    |--- Example1 
|    |    |--- Example2
|    |--- Bad
|    |    |--- Example1
|    |    |--- Example2
|--- Annotation_parser.py
|--- Validation_script.py
|--- Annotation_tool_configuration
|--- ...
```
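
Before archiving, the package can be checked for the expected files. A minimal sketch, assuming the layout above; the `REQUIRED` list and the `Annotation_demo` folder name are hypothetical and should be adapted to the actual package.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical list of mandatory items; adapt it to the actual package.
REQUIRED = [
    "Description.doc",
    "Instructions.avi",
    "Examples/Good",
    "Examples/Bad",
    "Annotation_parser.py",
    "Validation_script.py",
]

def missing_items(root):
    """Return the required items that do not exist under `root`."""
    root = Path(root)
    return [item for item in REQUIRED if not (root / item).exists()]

def pack(root):
    """Zip the package folder (including its top-level name) next to itself."""
    root = Path(root)
    missing = missing_items(root)
    if missing:
        raise FileNotFoundError(f"missing items: {', '.join(missing)}")
    return shutil.make_archive(str(root), "zip", root_dir=root.parent, base_dir=root.name)

# Usage on a throwaway folder that starts out incomplete.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "Annotation_demo"
    (package / "Examples" / "Good").mkdir(parents=True)
    (package / "Examples" / "Bad").mkdir()
    first_check = missing_items(package)
    print(first_check)
    for name in first_check:
        (package / name).touch()
    archive_created = Path(pack(package)).exists()
```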