Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[unreleased]

New features

Enhancements

Bug fixes

Q3 2024 Release 1.8.0

New features

Add TabularValidator (#1498)
Add Clean Transform for tabular data type (#1520)

Enhancements

Set label name with parents to avoid duplicates for AstypeAnnotations (#1492)
Pass Keyword Argument to TabularDataBase (#1522)
Support hierarchical structure for ImageNet dataset format (#1528)
Enable dtype argument when calling media.data (#1546)

Bug fixes

Preserve end_frame information of a video when it is zero. (#1541)
Changed the Datumaro format to ensure exported videos have relative paths and to prevent the same video from being overwritten. (#1547)

Q2 2024 Release 1.7.0

New features

Support 'Video' media type in datumaro format (#1491)
Add ann_types property for dataset (#1422, #1479)
Add AnnotationType.rotated_bbox for oriented object detection (#1459)
Add DOTA data format for oriented object detection task (#1475)
Add AstypeAnnotations Transform (#1484)
Enhance DatasetItem annotations for semantic segmentation model training use case (#1503)
Add TabularValidator (#1498)
Add Clean Transform for tabular data type (#1520)
Add notebook for data handling of kaggle dataset (#1534)

Enhancements

Fix ambiguous COCO format detector (#1442)
Get target information for tabular dataset (#1471)
Add ExtractedMask and update the importers that can use it (#1480)
Improve PIL and COLOR_BGR context image decode performance (#1501)
Improve get_area() of Polygon through Shoelace formula (#1507)
Improve _Shape point converter (#1508)

Bug fixes

Split the video directory into subsets to avoid overwriting (#1485)
Update docs to replace --save-images with --save-media (#1514)

May 2024 Release 1.6.1

Enhancements

Prevent AcLauncher for OpenVINO 2024.0 (#1450)

Bug fixes

Modify lxml dependency constraint (#1460)
Fix CLI error occurring when installed with default option only (#1444, #1454)
Relax Pillow dependency constraint (#1436)
Modify Numpy dependency constraint (#1435)
Relax old pandas version constraint (#1467)

Apr. 2024 Release 1.6.0

New features

Changed supported Python version range (>=3.9, <=3.11) (#1269)
Support MMDetection COCO format (#1213)
Develop JsonSectionPageMapper in Rust API (#1224)
Add Filtering via User-Provided Python Functions (#1230, #1233)
Remove supporting MacOS platform (#1235)
Support Kaggle image data (KaggleImageCsvBase, KaggleImageTxtBase, KaggleImageMaskBase, KaggleVocBase, KaggleYoloBase) (#1240)
Add __getitem__() for random accessing with O(1) time complexity (#1247)
Add Data-aware Anchor Generator (#1251)
Support bounding box import within Kaggle extractors and add KaggleCocoBase (#1273)

Enhancements

Optimize Python import to make CLI entrypoint faster (#1182)
Add ImageColorScale context manager (#1194)
Enhance visualizer to toggle plot title visibility (#1228)
Enhance Datumaro data format detect() to be memory-bounded and performant (#1229)
Change RoIImage and MosaicImage to have np.uint8 dtype as default (#1245)
Enable image backend and color channel format to be selectable (#1246)
Boost up CityscapesBase and KaggleImageMaskBase by dropping np.unique (#1261)
Enhance RISE algorithm for explainable AI (#1263)
Enhance explore unit test to use real dataset from ImageNet (#1266)
Fix each method of the comparator to be used separately (#1290)
Bump ONNX version to 1.16.0 (#1376)
Print the color channel format (RGB) for datum stats command (#1389)
Add ignore_index argument to Mask.as_class_mask() and Mask.as_instance_mask() (#1409)

Bug fixes

Fix wrong example of Datumaro dataset creation in document (#1195)
Fix wrong command to install datumaro from github (#1202, #1207)
Update document to correct wrong datum project import command and add filtering example to filter out items containing annotations. (#1210)
Fix label compare of distance method (#1205)
Fix Datumaro visualizer's import errors after introducing lazy import (#1220)
Fix broken link to supported formats in readme (#1221)
Fix Kinetics data format to have media data (#1223)
Handling undefined labels at the annotation statistics (#1232)
Add unit test for item rename (#1237)
Fix a bug in the previous behavior when importing nested datasets in the project (#1243)
Fix Kaggle importer when adding duplicated labels (#1244)
Fix input tensor shape in model interpreter for OpenVINO 2023.3 (#1251)
Add default value for target in prune cli (#1253)
Remove deprecated MediaManager (#1262)
Fix explore command without project (#1271)
Fix COCO importer to allow importing only bboxes (#1360)
Fix resize transform for RleMask annotation (#1361)
Fix import YOLO variants from extractor when urls is not specified (#1362)

Jan. 2024 Release 1.5.2

Enhancements

Add memory bounded datumaro data format detect to release 1.5.1 (#1241)
Bump version string to 1.5.2 (#1249)
Remove Protobuf version limitation (<4) (#1248)

Nov. 2023 Release 1.5.1

Enhancements

Enhance Datumaro data format stream importer performance (#1153)
Change image default dtype from float32 to uint8 (#1175)
Add comparison level-up doc (#1174)
Add ImportError to catch GitPython import error (#1174)

Bug fixes

Modify the draw function in the visualizer not to raise an error for unsupported annotation types. (#1180)
Correct explore path in the related document. (#1176)
Fix errata in the voc document. Color values in the labelmap.txt should be separated by commas, not colons. (#1162)
Fix hyperlink errors in the document (#1159, #1161)
Fix memory unbounded Arrow data format export/import (#1169)
Update CVAT format doc to bypass warning (#1183)

15/09/2023 - Release 1.5.0

New features

Add SAMAutomaticMaskGeneration transform (#1168)
Add tabular data import/export (#1089)
Support video annotation import/export (#1124)
Add multiframework (PyTorch, Tensorflow) converter (#1125)
Add SAM OVMS and Triton server Docker image builders (#1129)
Add SAMBboxToInstanceMask transform (#1133, #1134)
Add ConfigurableValidator (#1142)

Enhancements

Enhance ClassificationValidator for multi-label classification datasets with label_groups (#1116)
Replace Roboflow xml.etree with defusedxml (#1117)
Define GroupType with IntEnum, where 0 is EXCLUSIVE (#1116)
Add Rust API to optimize COCOPageMapper performance (#1120)
Support a dictionary input in addition to a single image input for the model launcher to support Segment Anything Model (#1133)
Remove deprecated features announced for removal in 1.5.0 (#1140)
Add multi-threading option to ModelTransform and SAMBboxToInstanceMask (#1145, #1149)

Bug fixes

COCO exporter can export annotations even if there is no media, except for mask annotations, which require media info. (#1147, #1158)
Fix bugs for Tile transform (#1123)
Disable Roboflow Tfrecord format when Tensorflow is not installed (#1130)
Raise VcsAlreadyExists error if vcs directory exists (#1138)

27/07/2023 - Release 1.4.1

Bug fixes

Report errors for COCO (stream) and Datumaro importers (#1110)

21/07/2023 - Release 1.4.0

New features

Add documentation and notebook example for Prune API (#1070)
Changed supported Python version range (>=3.8, <=3.11) (#1083)
Migrate OpenVINO v2023.0.0 (#1036)
Add Roboflow data format support (COCO JSON, Pascal VOC XML, YOLOv5-PyTorch, YOLOv7-PyTorch, YOLOv8, YOLOv5 Oriented Bounding Boxes, Multiclass CSV, TFRecord, CreateML JSON) (#1044)
Add MissingAnnotationDetection transform (#1049, #1063, #1064)
Add OVMSLauncher (#1056)
Add Prune API (#1058)
Add TritonLauncher (#1059)
Migrate DVC v3.0.0 (#1072)
Stream dataset import/export (#1077, #1081, #1082, #1091, #1093, #1098, #1102)
Support mask annotations for CVAT data format (#1078)

Enhancements

Support list query for explorer (#1087)
Update contributing.md (#1094)
Update 3rd-party.txt for release 1.4.0 (#1099)
Give notice that the deprecation works will be done in datumaro==1.5.0 (#1085)
Unify COCO, Datumaro, VOC, YOLO importer/exporter progress reporter descriptions (#1100)
Enhance import performance for built-in plugins (#1031)
Change default dtype of load_image() to np.uint8 (#1041)
Add OTX ATSS detector model interpreter & refactor interfaces (#1047)
Refactor Launcher and ModelInterpreter (#1055)
Add CVAT data format document (#1060)
Reduce peak memory usage when importing COCO and Datumaro formats (#1061)
Enhance the error message for datum stats to be more user friendly (#1069)
Refactor dataset.py to separate DatasetStorage (#1073)

Bug fixes

Create cache dir under only writable filesystem (#1088)
Fix: Dataset infos() can be broken if a transform not redefining infos() is stacked on the top (#1101)
Fix warnings in test_visualizer.py (#1039)
Fix LabelMe data format (#1053)
Prevent installing protobuf>=4 (#1054)
Fix UnionMerge (#1086)

26/05/2023 - Release 1.3.2

Enhancements

Let CocoBase continue even if an InvalidAnnotationError is raised (#1050)

Bug fixes

Install dvc version to 2.x (#1048)
Replace np.append() in Validator (#1050)

26/05/2023 - Release 1.3.1

Bug fixes

Fix Cityscapes format mis-detection (#1029)

25/05/2023 - Release 1.3.0

New features

Add CocoRoboflowImporter (#976, #1000)
Add SynthiaSfImporter and SynthiaAlImporter (#987)
Add intermediate skill docs for filter (#996)
Add VocInstanceSegmentationImporter and VocInstanceSegmentationExporter (#997)
Add Segment Anything data format support (#1005, #1009)
Add Correct transformation (#1006)
Implement ReindexAnnotations transform (#1008)
Add notebook examples for importing/exporting detection and segmentation data (#1020, #1023)
Update CLI from diff to compare, add TableComparator (#1012)

Enhancements

Use autosummary for fully-automatic Python module docs generation (#973)
Enrich stack trace for better user experience when importing (#992)
Save and load hashkey for explorer (#981) (#1003)
Add MOT and MOTS data format docs (#999)
Improve RemoveAnnotations to remove specific annotations with ids (#1004)
Add Jupyter notebook example of noisy label detection for detection tasks (#1011)

Bug fixes

Fix Mapillary Vistas data format (#977)
Fix bytes property returning None if function is given to data (#978)
Fix Synthia-Rand data format (#987)
Fix person_layout categories and action_classification attributes in imported Pascal-VOC dataset (#997)
Drop a malformed transform from StackedTransform automatically (#1001)
Fix Cityscapes to drop ImgsFine directory (#1023)

04/05/2023 - Release 1.2.1

Bug fixes

Fix project level CVAT for images format import (#980)
Fix an info message when using the convert CLI command with no args.input_format (#982)
Fix media contents not returning bytes in arrow format (#986)

20/04/2023 - Release 1.2.0

New features

Add Skill Up section to documentation (#920, #933, #935, #945, #949, #953, #959, #960, #967)
Add LossDynamicsAnalyzer for noisy label detection (#928)
Add Apache Arrow format support (#931, #948)
Add sort transform (#931)

Enhancements

Add multiprocessing to DatumaroBinaryBase (#897)
Refactor merge code (#901, #906)
Refactor download CLI commands (#909)
Refactor CLI commands w/ and w/o project (#910, #952)
Refactor Media to be initialized from explicit sources (#911, #921, #944)
Refactor hl_ops.py (#912)
Add tfds:uc_merced and tfds:eurosat download (#914)
Migrate documentation framework to Sphinx (#917, #922, #947, #954, #958, #961, #962, #963, #964, #965, #969)
Update merge tutorial for real life usecase (#930)
Abbreviate "detect-format" to "detect" for prettifying (#951)

Bug fixes

Add UserWarning if an invalid media_type comes to image statistics computation (#891)
Fix negated is_encrypted (#907)
Save extra images of PointCloud when exporting to datumaro format (#918)
Fix log issue when importing celeba and align celeba dataset (#919)

28/03/2023 - Release 1.1.1

Bug fixes

Fix to not export absolute media path in Datumaro and DatumaroBinary formats (#896)
Change pypi_publish.yml to publish_sdist_to_pypi.yml (#895)

23/03/2023 - Release 1.1.0

New features

Add with_subset_dirs decorator (Add ImagenetWithSubsetDirsImporter) (#816)
Add CommonSemanticSegmentationWithSubsetDirsImporter (#826)
Add DatumaroBinary format (#828, #829, #830, #831, #880, #883)
Add Explorer CLI documentation (#838)
Add version to dataset exported as datumaro format (#842)
Add Ava action data format support (#847)
Add Shift Analyzer (both covariate and label shifts) (#855)
Add YOLO Loose format (#856)
Add Ultralytics YOLO format (#859)

Enhancements

Refactor Datumaro format code and test code (#824)
Add publish to PyPI Github action (#867)
Add --no-media-encryption option (#875)

Bug fixes

Fix image filenames and anomaly mask appearance in MVTec exporter (#835)
Fix CIFAR10 and 100 detect function (#836)
Fix celeba and align_celeba detect function (#837)
Choose the top priority detect format for all directory depths (#839)
Fix MVTec format detect function (#843)
Fix wrong __len__() of Subset when the item is removed (#854)
Fix mask visualization bug (#860)
Fix detect unit tests to test false negatives as well (#868)

24/02/2023 - Release v1.0.0

New features

Add Data Explorer (#773)
Add Ellipse annotation type (#807)
Add MVTec anomaly data support (#810)

Enhancements

Refactor existing tests (#803)
Raise ImportError on importing malformed COCO directory (#812)
Remove the duplicated and cyclical category context in documentation (#822)

Bug fixes

Fix for importing CVAT image 1.1 data format exported to project level (#795)
Fix a problem on setting log-level via CLI (#800)
Fix code format with the latest black==23.1.0 (#802)
Fix Explain command cannot find the model (#721) (#804)
Fix a problem found on model remove CLI command (#805)

27/01/2023 - Release v0.5.0

New features

Add Tile transformation (#790)
Add Video keyframe extraction (#791)
Add TileTransform documentation and Jupyter notebook example (#794)
Add MergeTile transformation (#796)

Enhancements

Improved mask_to_rle performance (#770)

Deprecated

N/A

Removed

N/A

Bug fixes

Fix MacOS CI failures (#789)
Fix auto-documentation for the data_format plugins (#793)

Security

Add security.md file for the SDL (#798)

06/12/2022 - Release v0.4.0.1

New features

Support for exclusive labels with LabelGroup (#742)
Jupyter samples
- Introducing how to merge datasets (#738)
- Introducing how to visualize dataset (#747)
- Introducing how to filter dataset (#748)
- Introducing how to transform dataset (#759)
Visualization Python API
- Bbox feature (#744)
- Label, Points, Polygon, PolyLine, and Caption visualization features (#746)
- Mask, SuperResolution, Depth visualization features (#747)
Documentation for Python API (#753)
- dataset handler, visualizer, filter descriptions (#761)
__repr__ for Dataset (#750)
Support for exporting as CVAT video format (#757)
CodeCov coverage reporting feature to CI/CD (#756)
Jupyter notebook example rendering to documentation (#758)
An interface to manipulate 'infos' to store the dataset meta-info (#767)
'bbox' annotation when importing a COCO dataset (#772)

Enhancements

Wrap title text according to its plot width (#769)
Get list of subsets and support only Image media type in visualizer (#768)

Deprecated

N/A

Removed

N/A

Bug fixes

Correcting static type checking (#743)
Fixing a VOC dataset export when a label contains 'space' (#771)

Security

N/A

06/09/2022 - Release v0.3.1

New features

Support for custom media types, new PointCloud media type, DatasetItem.media and .media_as(type) members (#539)
[API] A way to request dataset and extractor media type with media_type (#539)
BraTS format (import-only) (.npy and .nii.gz), new MultiframeImage media type (#628)
Common Semantic Segmentation dataset format (import-only) (#685)
An option to disable data/ prefix inclusion in YOLO export (#689)
New command describe-downloads to print information about downloadable datasets (#678)
Detection for Cityscapes format (#680)
Maximum recursion --depth parameter for detect-dataset CLI command (#680)
An option to save a single subset in the download command (#697)
Common Super Resolution dataset format (import-only) (#700)
Kinetics 400/600/700 dataset format (import-only) (#706)
NYU Depth Dataset V2 format (import-only) (#712)

Enhancements

env.detect_dataset() now returns a list of detected formats at all recursion levels instead of just the lowest one (#680)
Open Images: allowed to store annotations file in root path as well (#680)
Improved parsing error messages in COCO, VOC and YOLO formats (#684, #686, #687)
YOLO format now supports almost any subset names, except backup, names and classes (instead of just train and valid). The reserved names now raise an error on exporting. (#688)

Deprecated

--save-images is replaced with --save-media in CLI and converter API (#539)
[API] image, point_cloud and related_images of DatasetItem are replaced with media and media_as(type) members and c-tor parameters (#539)

Removed

N/A

Bug fixes

Detection for LFW format (#680)
Adding depth value of image when dataset is exported in VOC format (#726)
Handling numerical labels in task chains properly (#726)
Fixing the issue that annotations inside another annotation (polygon) are duplicated during import for VOC format (#726)

Security

N/A

21/02/2022 - Release v0.3

New features

Ability to import a video as frames with the video_frames format and to split a video into frames with the datum util split_video command (#555)
--subset parameter in the image_dir format (#555)
MediaManager API to control loaded media resources at runtime (#555)
Command to detect the format of a dataset (#576)
More comfortable access to library API via import datumaro (#630)
CLI command-like free functions (export, transform, ...) (#630)
Reading specific annotation files for train dataset in Cityscapes (#632)
Random sampling transforms (random_sampler, label_random_sampler) to create smaller datasets from bigger ones (#636, #640)
API to report dataset import and export progress; API to report dataset import and export errors and take action (skip, fail) (supported in COCO, VOC and YOLO formats) (#650)
Support for downloading the ImageNetV2 and COCO datasets (#653, #659)
A way for formats to signal that they don't support detection (#665)
Removal transforms to remove items/annotations/attributes from dataset (remove_items, remove_annotations, remove_attributes) (#670)

Enhancements

Allowed direct file paths in datum import. Such sources are imported like when the rpath parameter is specified, however, only the selected path is copied into the project (#555)
Improved stats performance, added new filtering parameters, image stats (unique, repeated) moved to the dataset section, removed mean and std from the dataset section (#621)
Allowed Image creation from just size info (#634)
Added image search in VOC XML-based subformats (#634)
Added image path equality checks in simple merge, when applicable (#634)
Supported saving box attributes when downloading the TFDS version of VOC (#668)
Switched to a pyproject.toml-based build (#671)

Deprecated

TBD

Removed

Official support of Python 3.6 (due to its EOL) (#617)
Backward compatibility annotation symbols in components.extractor (#630)

Bug fixes

Prohibited calling add, import and export commands without a project (#555)
Calling make_dataset on empty project tree now produces the error properly (#555)
Saving (overwriting) a dataset in a project when rpath is used (#613)
Output image extension preserving in the Resize transform (#606)
Memory overuse in the Resize transform (#607)
Invalid image pixels produced by the Resize transform (#618)
Numeric warnings that sometimes occurred in stats command (e.g. #607) (#621)
Added missing item attribute merging in simple merge (#634)
Inability to disambiguate VOC from LabelMe in some cases (#658)

Security

TBD

28/01/2022 - Release v0.2.3

New features

Command to download public datasets (#582)
Extension autodetection in ByteImage (#595)
MPII Human Pose Dataset (import-only) (.mat and .json) (#584)
MARS format (import-only) (#585)

Enhancements

The pycocotools dependency lower bound is raised to 2.0.4. (#449)
smooth_line from datumaro.util.annotation_util - the function is renamed to approximate_line and has updated interface (#592)

Deprecated

Python 3.6 support

Removed

TBD

Bug fixes

Fails in multimerge when lines are not approximated and when there are no label categories (#592)
Cannot convert LabelMe dataset, that has no subsets (#600)

Security

TBD

24/12/2021 - Release v0.2.2

New features

Video reading API (#521)
Python API documentation (#526)
Mapillary Vistas dataset format (Import-only) (#537)
Datumaro can now be installed on Windows on Python 3.9 (#547)
Import for SYNTHIA dataset format (#532)
Support of score attribute in KITTI detection (#571)
Support for Accuracy Checker dataset meta files in formats (#553, #569, #575)
Import for VoTT dataset format (#573)
Image resizing transform (#581)

Enhancements

The following formats can now be detected unambiguously: ade20k2017, ade20k2020, camvid, coco, cvat, datumaro, icdar_text_localization, icdar_text_segmentation, icdar_word_recognition, imagenet_txt, kitti_raw, label_me, lfw, mot_seq, open_images, vgg_face2, voc, widerface, yolo (#531, #536, #550, #557, #558)
Allowed Pytest-native tests (#563)
Allowed export options in the datum merge command (#545)

Deprecated

Using Image, ByteImage from datumaro.util.image - these classes are moved to datumaro.components.media (#538)

Removed

Equality comparison support between datumaro.components.media.Image and numpy.ndarray (#568)

Bug fixes

Bug #560: import issue with MOT dataset when using seqinfo.ini file (#564)
Empty lines in VOC subset lists are not ignored (#587)

Security

TBD

16/11/2021 - Release v0.2.1

New features

Import for CelebA dataset format. (#484)

Enhancements

File people.txt became optional in LFW (#509)
File image_ids_and_rotation.csv became optional in Open Images (#509)
Allowed underscores (_) in subset names in COCO (#509)
Allowed annotation files with arbitrary names in COCO (#509)
The icdar_text_localization format is no longer detected in every directory (#531)
Updated pycocotools version to 2.0.2 (#534)

Deprecated

TBD

Removed

TBD

Bug fixes

Unhandled exception when a file is specified as the source for a COCO or MOTS dataset (#530)
Exporting dataset without color attribute into the icdar_text_segmentation format (#556)

Security

TBD

14/10/2021 - Release v0.2

New features

A new installation target: pip install datumaro[default], which should be used by default. The plain datumaro target is intended for library users. (#238)
Dataset and project versioning capabilities (Git-like) (#238)
"dataset revpath" concept in CLI, allowing to pass a dataset path with the dataset format in diff, merge, explain and info CLI commands (#238)
import, remove, commit, checkout, log, status, info CLI commands (#238)
Coco*Extractor classes now have an option to preserve label IDs from the original annotation file (#453)
patch CLI command to patch datasets (#401)
ProjectLabels transform to change dataset labels for merging etc. (#401, #478)
Support for custom labels in the KITTI detection format (#481)
Type annotations and docs for Annotation classes (#493)
Options to control label loading behavior in imagenet_txt import (#434, #489)

Enhancements

A project can contain and manage multiple datasets instead of a single one. CLI operations can be applied to the whole project, or to separate datasets. Datasets are modified inplace, by default (#328)
CLI help for builtin plugins doesn't require project (#328)
Annotation-related classes were moved into a new module, datumaro.components.annotation (#439)
Rollback utilities replaced with Scope utilities (#444)
The Project class from datumaro.components is changed completely (#238)
diff and ediff are joined into a single diff CLI command (#238)
Projects use new file layout, incompatible with old projects. An old project can be updated with datum project migrate (#238)
Inheriting CliPlugin is not required in plugin classes (#238)
Importers do not create Projects anymore and just return a list of extractor configurations (#238)

Deprecated

TBD

Removed

import, project merge CLI commands (#238)
Support for project hierarchies. A project cannot be a source anymore (#238)
Project cannot have independent internal dataset anymore. All the project data must be stored in the project data sources (#238)
datumaro_project format (#238)
Unused path field of DatasetItem (#455)

Bug fixes

Deprecation warning in open_images_format.py (#440)
lazy_image returning unrelated data sometimes (#409)
Invalid call to pycocotools.mask.iou (#450)
Importing of Open Images datasets without image data (#463)
Return value type in Dataset.is_modified (#401)
Remapping of secondary categories in RemapLabels (#401)
VOC dataset patching for classification and segmentation tasks (#478)
Exported mask label ids in KITTI segmentation (#481)
Missing label for Points read in the LFW format (#494)

Security

TBD

24/08/2021 - Release v0.1.11

New features

The Open Images format now supports bounding box and segmentation mask annotations (#352, #388).
Bounding boxes values decrement transform (#366)
Improved error reporting in Dataset (#386)
Support ADE20K format (import only) (#400)
Documentation website at https://openvinotoolkit.github.io/datumaro (#420)

Enhancements

Datumaro no longer depends on scikit-image (#379)
Dataset remembers export options on saving / exporting for the first time (#386)

Deprecated

TBD

Removed

TBD

Bug fixes

Application of remap_labels to dataset categories of different length (#314)
Patching of datasets in formats (#348)
Improved Cityscapes export performance (#367)
Incorrect format of *_labelIds.png in Cityscapes export (#325, #342)
Item id in ImageNet format (#371)
Double quotes for ICDAR Word Recognition (#375)
Wrong display of builtin formats in CLI (#332)
Non utf-8 encoding of annotation files in Market-1501 export (#392)
Import of ICDAR, PASCAL VOC and VGGFace2 images from subdirectories on Windows (#392)
Saving of images with Unicode paths on Windows (#392)
Calling ProjectDataset.transform() with a string argument (#402)
Attributes casting for CVAT format (#403)
Loading of custom project plugins (#404)
Reading, writing anno file and saving name of the subset for test subset (#447)

Security

Fixed unsafe unpickling in CIFAR import (#362)

14/07/2021 - Release v0.1.10

New features

Support for import/export zip archives with images (#273)
Subformat importers for VOC and COCO (#281)
Support for KITTI dataset segmentation and detection format (#282)
Updated YOLO format user manual (#295)
ItemTransform class, which describes item-wise dataset Transforms (#297)
keep-empty export parameter in VOC format (#297)
A base class for dataset validation plugins (#299)
Partial support for the Open Images format; only images and image-level labels can be read/written (#291, #315).
Support for Supervisely Point Cloud dataset format (#245, #353)
Support for KITTI Raw / Velodyne Points dataset format (#245)
Support for CIFAR-100 and documentation for CIFAR-10/100 (#301)

Enhancements

Tensorflow AVX check is made optional in API and disabled by default (#305)
Extensions for images in ImageNet_txt are now mandatory (#302)
Several dependencies now have lower bounds (#308)

Deprecated

TBD

Removed

TBD

Bug fixes

Incorrect image layout on saving and a problem with encoding on loading (#284)
An error when XPath filter is applied to the dataset or its subset (#259)
Tracking of Dataset changes done by transforms (#297)
Improved CLI startup time in several cases (#306)

Security

Known issue: loading CIFAR can result in arbitrary code execution (#327)

03/06/2021 - Release v0.1.9

New features

Support for escaping in attribute values in LabelMe format (#49)
Support for Segmentation Splitting (#223)
Support for CIFAR-10/100 dataset format (#225, #243)
Support for COCO panoptic and stuff format (#210)
Documentation file and integration tests for Pascal VOC format (#228)
Support for MNIST and MNIST in CSV dataset formats (#234)
Documentation file for COCO format (#241)
Documentation file and integration tests for YOLO format (#246)
Support for Cityscapes dataset format (#249)
Support for Validator configurable threshold (#250)

Enhancements

LabelMe format saves dataset items with their relative paths by subsets without changing names (#200)
Allowed arbitrary subset count and names in classification and detection splitters (#207)
Annotation-less dataset elements now participate in subset splitting (#211)
Classification task in LFW dataset format (#222)
Testing is now performed with pytest instead of unittest (#248)

Deprecated

TBD

Removed

TBD

Bug fixes

Added support for auto-merging (joining) of datasets with no labels and having labels (#200)
Allowed explicit label removal in remap_labels transform (#203)
Image extension in CVAT format export (#214)
Added a label "face" for bounding boxes in Wider Face (#215)
Allowed adding "difficult", "truncated", "occluded" attributes when converting to Pascal VOC if these attributes are not present (#216)
Empty lines in YOLO annotations are ignored (#221)
Export in VOC format when no image info is available (#239)
Fixed saving attribute in WiderFace extractor (#251)

Security

TBD

31/03/2021 - Release v0.1.8

New features

TBD

Enhancements

Added an option to allow undeclared annotation attributes in CVAT format export (#192)
COCO exports images in separate dirs by subsets. Added an option to control this (#195)

Deprecated

TBD

Removed

TBD

Bug fixes

Instance masks of background class no longer introduce an instance (#188)
Added support for label attributes in Datumaro format (#192)

Security

TBD

24/03/2021 - Release v0.1.7

New features

OpenVINO plugin examples (#159)
Dataset validation for classification and detection datasets (#160)
Arbitrary image extensions in formats (import and export) (#166)
Ability to set a custom subset name for an imported dataset (#166)
CLI support for NDR (#178)

Enhancements

Common ICDAR format is split into 3 sub-formats (#174)

Deprecated

TBD

Removed

TBD

Bug fixes

The ability to work with file names containing Cyrillic and spaces (#148)
Image reading and saving in ICDAR formats (#174)
Unnecessary image loading on dataset saving (#176)
Allowed spaces in ICDAR captions (#182)
Saving of masks in VOC when masks are not requested (#184)

Security

TBD

03/02/2021 - Release v0.1.6.1 (hotfix)

New features

TBD

Enhancements

TBD

Deprecated

TBD

Removed

TBD

Bug fixes

Images with no annotations are exported again in VOC formats (#123)
Inference result for only one output layer in OpenVINO launcher (#125)

Security

TBD

02/26/2021 - Release v0.1.6

New features

Icdar13/15 dataset format (#96)
Laziness, source caching, tracking of changes and partial updating for Dataset (#102)
Market-1501 dataset format (#108)
LFW dataset format (#110)
Support of polygons' and masks' confusion matrices and mismatching classes in diff command (#117)
Add near duplicate image removal plugin (#113)
Sampler Plugin that analyzes inference result from the given dataset and selects samples for annotation (#115)

Enhancements

OpenVINO model launcher is updated for OpenVINO r2021.1 (#100)

Deprecated

TBD

Removed

TBD

Bug fixes

High memory consumption and low performance of mask import/export, #53 (#101)
Masks, covered by class 0 (background), should be exported with holes inside (#104)
diff command invocation problem with missing class methods (#117)

Security

TBD

01/23/2021 - Release v0.1.5

New features

WiderFace dataset format (#65, #90)
Function to transform annotations to labels (#66)
Dataset splits for classification, detection and re-id tasks (#68, #81)
VGGFace2 dataset format (#69, #82)
Unique image count statistic (#87)
Installation with pip by name datumaro

Enhancements

Dataset class extended with new operations: save, load, export, import_from, detect, run_model (#71)
Allowed importing Extractor-only defined formats (in Project.import_from, dataset.import_from and CLI/project import) (#71)
datum project ... commands replaced with datum ... commands (#84)
Supported more image formats in ImageNet extractors (#85)
Allowed adding Importer-defined formats as project sources (source add) (#86)
Added max search depth in ImageDir format and importers (#86)

Deprecated

datum project ... CLI context (#84)

Removed

TBD

Bug fixes

Allow plugins inherited from Extractor (instead of only SourceExtractor) (#70)
Windows installation with pip for pycocotools (#73)
YOLO extractor path matching on Windows (#73)
Fixed inplace file copying when saving images (#76)
Fixed labelmap parameter type checking in VOC converter (#76)
Fixed model copying on addition in CLI (#94)

Security

TBD

12/10/2020 - Release v0.1.4

New features

CamVid dataset format (#57)
Ability to install opencv-python-headless dependency with DATUMARO_HEADLESS=1 environment variable instead of opencv-python (#62)

Enhancements

Allow empty supercategory in COCO (#54)
Allow Pascal VOC to search in subdirectories (#50)

Deprecated

TBD

Removed

TBD

Bug fixes

TBD

Security

TBD

10/28/2020 - Release v0.1.3

New features

ImageNet and ImageNetTxt dataset formats (#41)

Enhancements

TBD

Deprecated

TBD

Removed

TBD

Bug fixes

Default label-map parameter value for VOC converter (#34)
Randomness of random split transform (#38)
Transform.subsets() method (#38)
Supported unknown image formats in TF Detection API converter (#40)
Supported empty attribute values in CVAT extractor (#45)

Security

TBD

10/05/2020 - Release v0.1.2

New features

ByteImage class to represent encoded images in memory and avoid recoding on save (#27)

Enhancements

Implementation of format plugins simplified (#22)
default is now a default subset name, instead of None. The values are interchangeable. (#22)
Improved performance of transforms (#22)

Deprecated

TBD

Removed

image/depth value from VOC export (#27)

Bug fixes

Zero division errors in dataset statistics (#31)

Security

TBD

09/24/2020 - Release v0.1.1

New features

reindex option in COCO and CVAT converters (#18)
Support for relative paths in LabelMe format (#19)
MOTS png mask format support (https://github.com/openvinotoolkit/datumaro/21)

Enhancements

TBD

Deprecated

TBD

Removed

TBD

Bug fixes

TBD

Security

TBD

09/10/2020 - Release v0.1.0

New features

Initial release

Template

## [Unreleased]
### New features
- TBD

### Enhancements
- TBD

### Deprecated
- TBD

### Removed
- TBD

### Bug fixes
- TBD

### Security
- TBD
