# Converter from LabelBox raw format to CCAgT format

In [1]:
import json

from CCAgT_utils.converters.LabelBox import LabelBox_Annotations

## Open files
- helper/auxiliary categories file: a file describing each category: its color representation, category name, category ID, and the category's unique ID in the LabelBox file.
- LabelBox file: a file with the raw data exported from [LabelBox](https://docs.labelbox.com/docs/export-labels). The file in this repository contains only a few samples, with all fields sanitized.
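
The helper file is what ties a LabelBox schema ID to a CCAgT category ID. As a minimal sketch of that lookup, assuming entries shaped like the helper categories (the entries, the schema ID strings, and the function name `build_schema_lookup` below are hypothetical placeholders, not part of `CCAgT_utils`):

```python
# Hypothetical, trimmed-down helper entries; the schema ID strings are
# placeholders and not real LabelBox identifiers.
categories = [
    {'name': 'Nucleus', 'id': 1, 'labelbox_schemaId': 'schema-nucleus'},
    {'name': 'Other', 'id': 2, 'labelbox_schemaId': 'schema-other'},
]


def build_schema_lookup(categories):
    """Map each LabelBox schema ID to the numeric category ID."""
    return {cat['labelbox_schemaId']: cat['id'] for cat in categories}


schema_to_id = build_schema_lookup(categories)
print(schema_to_id['schema-nucleus'])
```

A plain dictionary is enough here because each schema ID is expected to identify exactly one category.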

In [2]:
helper_path = '../../data/samples/CCAgT_dataset_metadata.json'
labelbox_raw_path = '../../data/samples/sanitized_sample_labelbox.json'

In [3]:
with open(helper_path, 'r') as hf:
    dataset_helper = json.load(hf)
    
categories_helpper = dataset_helper['categories']
categories_helpper[0], len(categories_helpper)

({'color': [21, 62, 125],
  'name': 'Nucleus',
  'id': 1,
  'labelbox_schemaId': '<Unique ID for category Nucleus>',
  'minimal_area': 500,
  'supercategory': ''},
 7)

In [4]:
with open(labelbox_raw_path, 'r') as hf:
    labelbox_raw = json.load(hf)

labelbox_raw[0], len(labelbox_raw)

({'ID': 'ID 1',
  'DataRow ID': 'DataRow ID 1',
  'Labeled Data': '<URL for the image>',
  'Label': {'objects': [{'featureId': '<ID for this annotation - 0>',
     'schemaId': '<Unique ID for category Nucleus>',
     'color': '#1CE6FF',
     'title': 'nucleus',
     'value': 'nucleus',
     'polygon': [{'x': 474.332, 'y': 783.996},
      {'x': 478.165, 'y': 785.663},
      {'x': 481.165, 'y': 787.329},
      {'x': 484.332, 'y': 789.663},
      {'x': 487.499, 'y': 792.163},
      {'x': 491.332, 'y': 795.829},
      {'x': 493.832, 'y': 797.829},
      {'x': 497.999, 'y': 801.829},
      {'x': 500.832, 'y': 804.329},
      {'x': 502.665, 'y': 806.996},
      {'x': 503.832, 'y': 810.996},
      {'x': 504.499, 'y': 814.829},
      {'x': 504.832, 'y': 818.829},
      {'x': 505.332, 'y': 822.829},
      {'x': 505.165, 'y': 826.996},
      {'x': 504.499, 'y': 830.163},
      {'x': 503.999, 'y': 832.996},
      {'x': 501.165, 'y': 837.496},
      {'x': 498.832, 'y': 840.829},
      {'x': 497.16

## Initialize the LabelBox annotations class

In [5]:
lb_ann = LabelBox_Annotations(labelbox_raw, categories_helpper)

## Using the converter

The `LabelBox_Annotations` class, from the `converters.LabelBox` module, is responsible for converting data from the LabelBox format to the CCAgT format (a DataFrame). For more about the LabelBox format, see the [LabelBox docs](https://docs.labelbox.com/docs). In summary, the LabelBox raw format contains one JSON object for each labeled *image*; the **Label** field of each image holds the label data for all annotations of that image.

`LabelBox_Annotations` takes the raw JSON data from LabelBox and the auxiliary categories file, and generates the CCAgT format (using the `to_CCAgT` method).
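
To make the raw layout concrete, the sketch below walks records shaped like the sample and counts annotations by geometry type. The sample record is hypothetical, the function name `count_geometry_types` is made up for illustration, and the `point` key for point annotations is an assumption (the sample output above only shows `polygon`).

```python
from collections import Counter

# Hypothetical record shaped like the sanitized sample; values are made up.
raw = [
    {
        'ID': 'ID 1',
        'Label': {
            'objects': [
                {'schemaId': 'schema-a',
                 'polygon': [{'x': 0.0, 'y': 0.0},
                             {'x': 4.0, 'y': 0.0},
                             {'x': 4.0, 'y': 3.0}]},
                # Assumption: point annotations are stored under a 'point' key.
                {'schemaId': 'schema-b', 'point': {'x': 1.5, 'y': 2.5}},
            ],
        },
    },
]


def count_geometry_types(records):
    """Count annotations per geometry type across all image records."""
    counts = Counter()
    for record in records:
        label = record.get('Label')
        # Skip records whose Label is missing or is not a JSON object.
        if not isinstance(label, dict):
            continue
        for obj in label.get('objects', []):
            if 'polygon' in obj:
                counts['polygon'] += 1
            elif 'point' in obj:
                counts['point'] += 1
            else:
                counts['other'] += 1
    return counts


print(count_geometry_types(raw))
```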

### The to_CCAgT method

This method performs the following steps:
- Transform the JSON labels into a pandas DataFrame;
- Remove images that were skipped during the labelling process;
- Drop information not used in the conversion;
- Get the name of each image from the External ID field;
- Remove duplicated image labels: some images (~10) were uploaded to LabelBox more than once, so duplicates are detected by image name and the copy with the most labels is kept;
- Transform the geometry from JSON into a [shapely](https://pypi.org/project/Shapely/) geometry;
- Transform the DataFrame into a CCAgT_Annotation class.
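
The steps above can be approximated in plain Python. This is only a sketch under stated assumptions, not the library's implementation: it assumes skipped images carry a truthy `Skipped` field, that `External ID` holds the image file name, and it handles polygons only. The records and the names `image_name` and `flatten_labels` are hypothetical.

```python
import os


def image_name(external_id):
    """Derive the image name by dropping the file extension (assumption)."""
    return os.path.splitext(external_id)[0]


def flatten_labels(records):
    """Return one row per polygon annotation, after filtering and dedup."""
    best = {}
    for record in records:
        # Assumption: skipped images are flagged with a truthy 'Skipped' field.
        if record.get('Skipped'):
            continue
        objects = (record.get('Label') or {}).get('objects', [])
        name = image_name(record['External ID'])
        # For duplicated uploads, keep the copy with the most labels.
        if name not in best or len(objects) > len(best[name]):
            best[name] = objects

    rows = []
    for name, objects in best.items():
        for obj in objects:
            if 'polygon' not in obj:
                continue  # other geometry types are out of scope here
            # (x, y) tuples are the form shapely.geometry.Polygon accepts.
            coords = [(p['x'], p['y']) for p in obj['polygon']]
            rows.append({'image_name': name,
                         'schemaId': obj['schemaId'],
                         'coords': coords})
    return rows


# Hypothetical records: one image uploaded twice, one skipped image.
tri_a = {'schemaId': 's1', 'polygon': [{'x': 0.0, 'y': 0.0},
                                       {'x': 4.0, 'y': 0.0},
                                       {'x': 4.0, 'y': 3.0}]}
tri_b = {'schemaId': 's1', 'polygon': [{'x': 5.0, 'y': 5.0},
                                       {'x': 6.0, 'y': 5.0},
                                       {'x': 6.0, 'y': 6.0}]}
records = [
    {'External ID': 'img_a.jpg', 'Label': {'objects': [tri_a]}},
    {'External ID': 'img_a.jpg', 'Label': {'objects': [tri_a, tri_b]}},
    {'External ID': 'img_b.jpg', 'Skipped': True, 'Label': {}},
]
rows = flatten_labels(records)
```

Keeping the copy with the most labels favors the more complete annotation when an image was uploaded twice.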


In [6]:
CCAgT_ann = lb_ann.to_CCAgT()

  arr = construct_1d_object_array_from_listlike(values)


In [7]:
CCAgT_ann.df

| | image_name | geometry | category_id |
|---|---|---|---|
| 0 | H_10177_-266400_93960 | POLYGON ((474.332 783.996, 478.165 785.663, 48... | 1 |
| 1 | H_10177_-266400_93960 | POLYGON ((273.936 780.0170000000001, 279.936 7... | 1 |
| 2 | H_10177_-266400_93960 | POLYGON ((196.728 861.475, 201.394 866.1420000... | 1 |
| 3 | H_10177_-266400_93960 | POLYGON ((81.04000000000001 712.246, 85.874 71... | 1 |
| 4 | H_10177_-266400_93960 | POLYGON ((358.957 450.621, 363.207 455.621, 36... | 1 |
| ... | ... | ... | ... |
| 243 | M_2042_-257760_58320 | POLYGON ((879.61 868.979, 877.36 868.104, 874.... | 2 |
| 244 | M_2042_-257760_58320 | POINT (511.444 1178.395) | 3 |
| 245 | M_2042_-257760_58320 | POLYGON ((1580.444 84.479, 1571.11 83.812, 156... | 4 |
| 246 | M_2042_-257760_58320 | POLYGON ((489.944 800.229, 483.194 803.229, 47... | 4 |
