supervisely-ecosystem/group-reference-objects-into-batches

Group reference objects into batches

Overview • How To Run • How To Use • Result JSON format

Overview

Labeling for tagging or classification tasks becomes complex when the annotation team has to deal with hundreds or thousands of tags or classes. This app groups items from a catalog by one or several columns and then splits the groups into small batches. The app is part of a larger tagging/classification pipeline; for related apps, see the retail collection.

Let's consider a retail case as an example:

  • We need to label product shelves: draw a bounding box around every object and assign the correct class from the catalog
  • Catalog size: 1350 unique items
  • Annotation team size: 150 labelers
  • Images look like this:

Putting a bounding box around every object is a feasible task. Assigning the correct product identifier from a huge catalog to every bbox, however, is hard and time-consuming. One approach is to split the catalog across all labelers: in our case, 1350 unique items / 150 labelers = 9 items per batch. Each labeler then goes through all bboxes and matches them against only the 9 items in their batch.
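The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the idea, not the app's actual code: the `split_into_batches` helper and the placeholder item names are assumptions made for the example.

```python
import math


def split_into_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Placeholder identifiers standing in for the 1350 catalog items.
catalog = [f"item_{i}" for i in range(1350)]
labelers = 150

# Round up so that every item lands in some batch even when the
# catalog size is not an exact multiple of the team size.
batch_size = math.ceil(len(catalog) / labelers)
batches = split_into_batches(catalog, batch_size)

print(batch_size, len(batches))
```

With the numbers from the retail case, every labeler receives exactly one batch of 9 items.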

Key advantages of this approach:

  • each labeler knows their batch very well: it's easy to keep 9 items in mind
  • the chance of error is reduced significantly
  • if a bbox matches one of the 9 items in the batch, it takes just a few clicks to assign the correct tag

How To Run

Step 1: Add the app to your team from Ecosystem if it is not there yet.

Step 2: Run the app

Step 3: Wait until the UI is ready

How To Use

Watch the video

Step 1: Define the path to the CSV product catalog in Team Files and press the Preview catalog button

Before:

After:

Step 2: Define the path to the directory with JSON reference files that were created with the app "Create JSON with reference items", press the Preview files button, select the files that should be used, and then press the Validate button.

Before:

After:

Step 3: Match the reference item with a column from the CSV catalog, choose the groupBy columns from the catalog (order matters), define the batch size, and press the Create groups button. Then preview the groups. You can change the grouping parameters and press the Create groups button again. If you are satisfied with the results, set the save path and press the Save button. The resulting groups will be saved to Team Files in JSON format.

Before:

After:

Step 4: Stop the app manually
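The grouping done in Step 3 can be approximated by the following minimal sketch: rows are grouped by the chosen columns in order, and each group is then cut into batches of the requested size. The `group_and_batch` function, the `items` key, and the sample rows (including the "Acme" row) are hypothetical and serve only as illustration; the app's real implementation and output may differ.

```python
from collections import defaultdict


def group_and_batch(rows, group_by, batch_size):
    """Group rows by the given columns (in order), then split each group
    into batches of at most batch_size rows."""
    groups = defaultdict(list)
    for row in rows:
        # The group key is the tuple of values in the groupBy columns.
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)

    batches = []
    for key, members in groups.items():
        for start in range(0, len(members), batch_size):
            chunk = members[start:start + batch_size]
            batches.append({
                "batch_index": len(batches),
                "items_count": len(chunk),
                "group_columns": dict(zip(group_by, key)),
                "items": chunk,  # hypothetical key, for illustration only
            })
    return batches


# Hypothetical catalog rows; only the columns needed for grouping are shown.
rows = [
    {"upc": 6750711, "category": "Accessories", "brand": "Samsung"},
    {"upc": 7930356, "category": "Accessories", "brand": "Samsung"},
    {"upc": 9994737, "category": "Accessories", "brand": "Samsung"},
    {"upc": 1000001, "category": "Accessories", "brand": "Acme"},
]

batches = group_and_batch(rows, group_by=["category", "brand"], batch_size=2)
for batch in batches:
    print(batch["batch_index"], batch["group_columns"], batch["items_count"])
```

Because a batch never crosses a group boundary in this sketch, the last batch of a group may be smaller than the requested batch size.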

Result JSON format

[
  {
    "batch_index": 0,
    "items_count": 3,
    "group_columns": {
      "category": "Accessories",
      "sub-category": "Portable Power Banks",
      "brand": "Samsung"
    },
    "key_col_name": "upc",
    "references": {
      "6750711": ["..."],
      "7930356": ["..."],
      "9994737": ["..."]
    },
    "references_catalog_info": {
      "6750711": {
        "brand": "Samsung",
        "name": "Samsung Universal 3100mAh Portable External Battery Charger - White",
        "upc": 6750711,
        "weight": "5.6 ounces",
        "category": "Accessories",
        "sub-category": "Portable Power Banks",
        "price": 17.99,
        "merchant": "Bestbuy.com"
      },
      "7930356": {
        "brand": "Samsung",
        "name": "Samsung Universal 3100mAh Portable External Battery Charger - White",
        "upc": 7930356,
        "weight": "5.6 ounces",
        "category": "Accessories",
        "sub-category": "Portable Power Banks",
        "price": 14.84,
        "merchant": "accessorynet"
      },
      "9994737": {
        "brand": "Samsung",
        "name": "Samsung Universal 3100mAh Portable External Battery Charger - White",
        "upc": 9994737,
        "weight": "5.6 ounces",
        "category": "Accessories",
        "sub-category": "Portable Power Banks",
        "price": 22.99,
        "merchant": "Bestbuy.com"
      }
    },
    "catalog_path": "/reference_items/1120-water-catalog.csv"
  },
  {
    "batch_index": 1,
    "...": "..."
  }
]

The result JSON is a list of objects, each describing one batch of reference objects:

  • batch_index - index of the batch
  • items_count - number of items in the batch
  • group_columns - names of the columns and the corresponding values used to group items (groupBy operation)
  • key_col_name - name of the column in the CSV catalog that is used to match a reference item with the correct row of the product catalog
  • references - dictionary with reference examples for every item (the format is the same as in the reference items format)
  • references_catalog_info - information from the catalog for every reference item
  • catalog_path - path to the catalog in Team Files
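As a rough illustration of how a downstream script might consume this file, the sketch below parses a trimmed copy of the example above and checks that each batch is internally consistent. The `summarize_batches` helper is an assumption made for this example and is not part of the app. Note that the keys of `references` and `references_catalog_info` are strings, while the `upc` values inside the catalog info are numbers, so the comparison converts them.

```python
import json

# Trimmed copy of the example above; elided reference lists stay elided.
sample = """
[
  {
    "batch_index": 0,
    "items_count": 3,
    "group_columns": {"category": "Accessories", "brand": "Samsung"},
    "key_col_name": "upc",
    "references": {"6750711": ["..."], "7930356": ["..."], "9994737": ["..."]},
    "references_catalog_info": {
      "6750711": {"brand": "Samsung", "upc": 6750711},
      "7930356": {"brand": "Samsung", "upc": 7930356},
      "9994737": {"brand": "Samsung", "upc": 9994737}
    },
    "catalog_path": "/reference_items/1120-water-catalog.csv"
  }
]
"""


def summarize_batches(batches):
    """Return (batch_index, items_count, reference keys) for every batch,
    raising ValueError if a batch is inconsistent."""
    summary = []
    for batch in batches:
        keys = list(batch["references"])
        if batch["items_count"] != len(keys):
            raise ValueError(f"batch {batch['batch_index']}: items_count mismatch")
        key_col = batch["key_col_name"]
        for ref_key, info in batch["references_catalog_info"].items():
            # JSON object keys are strings; catalog values may be numbers.
            if str(info[key_col]) != ref_key:
                raise ValueError(f"batch {batch['batch_index']}: key {ref_key} mismatch")
        summary.append((batch["batch_index"], batch["items_count"], keys))
    return summary


summary = summarize_batches(json.loads(sample))
print(summary)
```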