# Annotate a new image dataset using PyLabel and jupyter-bbox-widget
Start from a new dataset without annotations and use [jupyter-bbox-widget](https://github.com/gereleth/jupyter-bbox-widget) and PyLabel to label images and save the annotations in COCO, VOC, or YOLO format, all within a Jupyter notebook.

In [1]:
import logging
logging.getLogger().setLevel(logging.CRITICAL)
!pip install pylabel > /dev/null


In [2]:
from pylabel import importer

## Import Images to Create a New Dataset
In this example, no annotations exist yet. The path should point to a directory containing the images that you want to annotate.
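As a rough illustration of what an images-only import has to do, the sketch below scans a directory and builds one row per image file. It is not PyLabel's implementation: `list_images` is a hypothetical helper, and sorting the filenames is a choice made here for reproducibility (the PyLabel output further down is not sorted).

```python
import os


def list_images(path, ends_with=".jpg"):
    """Return one dict per image file in `path` (hypothetical helper)."""
    # Match the suffix case-insensitively so "a.JPG" is picked up too.
    filenames = sorted(
        f for f in os.listdir(path) if f.lower().endswith(ends_with.lower())
    )
    # Assign a sequential id to each image, mirroring an img_id column.
    return [
        {"img_id": img_id, "img_filename": filename}
        for img_id, filename in enumerate(filenames)
    ]


# Only runs if the sample dataset has already been downloaded.
path_to_images = "data/coco128/images/train2017"
if os.path.isdir(path_to_images):
    print(list_images(path_to_images)[:5])
```

Each row starts with only the image fields filled in; the annotation fields stay empty until boxes are drawn.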

In [3]:
import os, zipfile

# Download the sample YOLO dataset (coco128)
os.makedirs("data", exist_ok=True)
!wget "https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip" -O data/coco128.zip
with zipfile.ZipFile("data/coco128.zip", 'r') as zip_ref:
    zip_ref.extractall("data")

--2021-11-07 09:45:51--  https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip
Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/7a208a00-e19d-11eb-94cf-5222600cc665?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20211107%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20211107T174551Z&X-Amz-Expires=300&X-Amz-Signature=90b601b583e3e4b779986a8197d42e7e0f8cf154f9c7530228db08c765c22261&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=264818686&response-content-disposition=attachment%3B%20filename%3Dcoco128.zip&response-content-type=application%2Foctet-stream [following]
--2021-11-07 09:45:51--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/7a208a00-e19d-11eb-94cf-5222600cc665?X-Amz-Algori

In [4]:
path_to_images = "data/coco128/images/train2017"
dataset = importer.ImportImagesOnly(path=path_to_images, ends_with=".jpg", name="coco128")
dataset.df.head()

id,img_folder,img_filename,img_path,img_id,img_width,img_height,img_depth,ann_segmented,ann_bbox_xmin,ann_bbox_ymin,...,ann_area,ann_segmentation,ann_iscrowd,ann_pose,ann_truncated,ann_difficult,cat_id,cat_name,cat_supercategory,split
0,,000000000612.jpg,,0,640,480,3,,,,...,,,,,,,,,,
1,,000000000404.jpg,,1,426,640,3,,,,...,,,,,,,,,,
2,,000000000438.jpg,,2,640,480,3,,,,...,,,,,,,,,,
3,,000000000389.jpg,,3,640,480,3,,,,...,,,,,,,,,,
4,,000000000564.jpg,,4,520,640,3,,,,...,,,,,,,,,,


## Edit Annotations
Use jupyter-bbox-widget to draw, inspect, edit, and save annotations without leaving the Jupyter notebook.

In [5]:
classes = ['person','boat', 'bear', "cousin"]
dataset.labeler.UseBBoxWidget(new_classes=classes)

VBox(children=(IntProgress(value=0, description='Progress', max=128), BBoxWidget(classes=['cousin', 'bear', 'p…

## Instructions
- Select one of the classes defined above in the widget (the run recorded below used 'cousin')
- Draw a box around the object you want to label
- Click **Submit**

When you click **Submit**, the annotations for that image are updated. Run the cell below to verify that the new annotation has been recorded for that image.

You can repeat the steps to add and view additional bounding boxes. 
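To make the update step more concrete, here is a minimal sketch of how one drawn box could be flattened into an annotation row. It assumes the widget reports each box as a dict with `x`, `y`, `width`, `height`, and `label` keys; the helper `bbox_to_row` and the `ann_bbox_xmax`, `ann_bbox_ymax`, `ann_bbox_width`, and `ann_bbox_height` column names are assumptions for illustration, not PyLabel's actual code.

```python
def bbox_to_row(bbox, img_filename):
    """Convert one widget-style box dict into a flat annotation row.

    Assumes `bbox` has pixel-valued 'x', 'y', 'width', 'height' keys and a
    'label' key; these key names are an assumption, not a guaranteed API.
    """
    xmin, ymin = bbox["x"], bbox["y"]
    width, height = bbox["width"], bbox["height"]
    return {
        "img_filename": img_filename,
        "ann_bbox_xmin": xmin,
        "ann_bbox_ymin": ymin,
        "ann_bbox_xmax": xmin + width,   # right edge in pixels
        "ann_bbox_ymax": ymin + height,  # bottom edge in pixels
        "ann_bbox_width": width,
        "ann_bbox_height": height,
        "ann_area": width * height,      # box area in square pixels
        "cat_name": bbox["label"],
    }


# Hypothetical box, not taken from the dataset above.
example = {"x": 10, "y": 20, "width": 30, "height": 40, "label": "person"}
row = bbox_to_row(example, "example.jpg")
print(row)
```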

In [10]:
dataset.df['cat_name'].value_counts()


          127
cousin      1
Name: cat_name, dtype: int64

In [7]:
# Export the annotations in YOLO format
dataset.path_to_annotations = 'data/coco128/labels/newlabels/'
os.makedirs(dataset.path_to_annotations, exist_ok=True)
dataset.export.ExportToYoloV5()

# View the YOLO annotation file for one image
!cat data/coco128/labels/newlabels/../../images/train2017/000000000078.txt


74 0.762851 0.196119 0.349886 0.385474
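Each line of a YOLO label file holds a class id followed by the box center x, center y, width, and height, all normalized to the range 0 to 1 by the image dimensions. The sketch below converts such a line back to pixel coordinates. The helper name `yolo_line_to_pixel_box` is hypothetical, and the 640x480 image size is an assumption for illustration because the size of this particular image is not shown above.

```python
def yolo_line_to_pixel_box(line, img_width, img_height):
    """Parse 'class cx cy w h' (normalized) into pixel xmin, ymin, w, h."""
    class_id, cx, cy, w, h = line.split()
    # Scale the normalized values back to pixels.
    box_w = float(w) * img_width
    box_h = float(h) * img_height
    # YOLO stores the box center, so shift by half the size to get the corner.
    xmin = float(cx) * img_width - box_w / 2
    ymin = float(cy) * img_height - box_h / 2
    return int(class_id), xmin, ymin, box_w, box_h


# The line printed above, with an ASSUMED image size of 640x480.
print(yolo_line_to_pixel_box("74 0.762851 0.196119 0.349886 0.385474", 640, 480))
```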


In [31]:
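from math import isnan

# Select the rows that were labeled 'cousin' in the widget
mimi = dataset.df[dataset.df.cat_name=='cousin'].copy()

# Map each category name to its id (later rows overwrite earlier ones)
categories = dict(zip(dataset.df.cat_name, dataset.df.cat_id))
# Remove invalid entries: the blank name used by unannotated rows, then NaN ids
categories.pop("", None)
categories = {k: v for k, v in categories.items() if not isnan(v)}

#widget_output['cat_id'] = GetCatId(widget_output['cat_name'], categories)

mimi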

id,img_folder,img_filename,img_path,img_id,img_width,img_height,img_depth,ann_segmented,ann_bbox_xmin,ann_bbox_ymin,...,ann_area,ann_segmentation,ann_iscrowd,ann_pose,ann_truncated,ann_difficult,cat_id,cat_name,cat_supercategory,split
127,,000000000612.jpg,,0,640,480,3,,158,293,...,9078,,,,,,0,cousin,,


{}

In [21]:
[v for v in categories.values()]

[nan]
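The `[nan]` output above shows that category ids can be missing. Note also that `math.isnan` raises a `TypeError` when given a string, so the filter above would fail if the ids were stored as text. The following is a minimal sketch of a more defensive name-to-id mapping; `build_category_map` and `is_missing` are hypothetical helpers working on plain lists, not part of PyLabel.

```python
import math


def is_missing(value):
    """True for None, empty strings, and float NaN."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def build_category_map(names, ids):
    """Map category name -> id, skipping rows where either is missing."""
    categories = {}
    for name, cat_id in zip(names, ids):
        if is_missing(name) or is_missing(cat_id):
            continue
        categories[name] = cat_id
    return categories


# Hypothetical values mixing blank names, NaN ids, and a string id.
names = ["", "cousin", "", "bear"]
ids = [float("nan"), 0, "", "1"]
categories = build_category_map(names, ids)
print(categories)
```

With a DataFrame, the same function could be called as `build_category_map(dataset.df.cat_name, dataset.df.cat_id)`, since `zip` accepts any pair of iterables.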