[Datumaro] Add dataset statistics #1668

zhiltsov-max · 2020-06-08T11:33:58Z

Motivation and context

The stats command now outputs JSON files.
Fixed mean and std computation for empty datasets.
Added new metrics for the stats command:
- Image count, annotations count, unannotated images
- Per-label annotations count, histogram
- Per-type annotations count
- Per-label attribute counts, values, and their histogram (as requested in Implement extended annotations statistics #1783)
- Per-label segment pixel count and histogram (i.e. class balance for segmentations)
- Segment (bbox, polygons, masks) areas histogram

Example output:

{
    "annotations": {
        "labels": {
            "attributes": {
                "gender": {
                    "count": 358,
                    "distribution": {
                        "female": [
                            149,
                            0.41620111731843573
                        ],
                        "male": [
                            209,
                            0.5837988826815642
                        ]
                    },
                    "values count": 2,
                    "values present": [
                        "female",
                        "male"
                    ]
                },
                "view": {
                    "count": 340,
                    "distribution": {
                        "__undefined__": [
                            4,
                            0.011764705882352941
                        ],
                        "front": [
                            54,
                            0.1588235294117647
                        ],
                        "left": [
                            14,
                            0.041176470588235294
                        ],
                        "rear": [
                            235,
                            0.6911764705882353
                        ],
                        "right": [
                            33,
                            0.09705882352941177
                        ]
                    },
                    "values count": 5,
                    "values present": [
                        "__undefined__",
                        "front",
                        "left",
                        "rear",
                        "right"
                    ]
                }
            },
            "count": 2038,
            "distribution": {
                "car": [
                    340,
                    0.16683022571148184
                ],
                "cyclist": [
                    194,
                    0.09519136408243375
                ],
                "head": [
                    354,
                    0.17369970559371933
                ],
                "ignore": [
                    100,
                    0.04906771344455348
                ],
                "left_hand": [
                    238,
                    0.11678115799803729
                ],
                "person": [
                    358,
                    0.17566241413150147
                ],
                "right_hand": [
                    77,
                    0.037782139352306184
                ],
                "road_arrows": [
                    326,
                    0.15996074582924436
                ],
                "traffic_sign": [
                    51,
                    0.025024533856722278
                ]
            }
        },
        "segments": {
            "area distribution": [
                {
                    "count": 1318,
                    "max": 11425.1,
                    "min": 0.0,
                    "percent": 0.9627465303140978
                },
                {
                    "count": 1,
                    "max": 22850.2,
                    "min": 11425.1,
                    "percent": 0.0007304601899196494
                },
                {
                    "count": 0,
                    "max": 34275.3,
                    "min": 22850.2,
                    "percent": 0.0
                },
                {
                    "count": 0,
                    "max": 45700.4,
                    "min": 34275.3,
                    "percent": 0.0
                },
                {
                    "count": 0,
                    "max": 57125.5,
                    "min": 45700.4,
                    "percent": 0.0
                },
                {
                    "count": 0,
                    "max": 68550.6,
                    "min": 57125.5,
                    "percent": 0.0
                },
                {
                    "count": 0,
                    "max": 79975.7,
                    "min": 68550.6,
                    "percent": 0.0
                },
                {
                    "count": 0,
                    "max": 91400.8,
                    "min": 79975.7,
                    "percent": 0.0
                },
                {
                    "count": 0,
                    "max": 102825.90000000001,
                    "min": 91400.8,
                    "percent": 0.0
                },
                {
                    "count": 50,
                    "max": 114251.0,
                    "min": 102825.90000000001,
                    "percent": 0.036523009495982466
                }
            ],
            "avg. area": 5411.624543462382,
            "pixel distribution": {
                "car": [
                    13655,
                    0.0018431496518735067
                ],
                "cyclist": [
                    939005,
                    0.12674674030446592
                ],
                "head": [
                    0,
                    0.0
                ],
                "ignore": [
                    5501200,
                    0.7425510702956085
                ],
                "left_hand": [
                    0,
                    0.0
                ],
                "person": [
                    954654,
                    0.12885903974805205
                ],
                "right_hand": [
                    0,
                    0.0
                ],
                "road_arrows": [
                    0,
                    0.0
                ],
                "traffic_sign": [
                    0,
                    0.0
                ]
            }
        }
    },
    "annotations by type": {
        "bbox": {
            "count": 548
        },
        "caption": {
            "count": 0
        },
        "label": {
            "count": 0
        },
        "mask": {
            "count": 0
        },
        "points": {
            "count": 669
        },
        "polygon": {
            "count": 821
        },
        "polyline": {
            "count": 0
        }
    },
    "annotations count": 2038,
    "dataset": {
        "image mean": [
            107.06903686941979,
            79.12831698580979,
            52.95829558185416
        ],
        "image std": [
            49.40237673503467,
            43.29600731496902,
            35.47373007603151
        ],
        "images count": 100
    },
    "images count": 100,
    "subsets": {},
    "unannotated images": [
        "img00051",
        "img00052",
        "img00053",
        "img00054",
        "img00055",
        "img00056",
        "img00057",
        "img00058",
        "img00059",
        "img00060",
        "img00061",
        "img00062",
        "img00063",
        "img00064",
        "img00065",
        "img00066",
        "img00067",
        "img00068",
        "img00069",
        "img00070",
        "img00071",
        "img00072",
        "img00073",
        "img00074",
        "img00075",
        "img00076",
        "img00077",
        "img00078",
        "img00079",
        "img00080",
        "img00081",
        "img00082",
        "img00083",
        "img00084",
        "img00085",
        "img00086",
        "img00087",
        "img00088",
        "img00089",
        "img00090",
        "img00091",
        "img00092",
        "img00093",
        "img00094",
        "img00095",
        "img00096",
        "img00097",
        "img00098",
        "img00099",
        "img00100"
    ],
    "unannotated images count": 50
}
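
The `[count, ratio]` pairs and the ten equal-width area bins in the output above can be reproduced in a few lines of Python. This is only a sketch, not Datumaro's implementation: the function names `label_distribution` and `area_histogram` are hypothetical, and the binning rule (equal-width bins between the smallest and largest area, with the largest value placed in the last bin) is an assumption inferred from the example output.

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to [count, share of all annotations]."""
    counts = Counter(labels)
    total = sum(counts.values())
    # An empty input yields an empty mapping, so no division by zero occurs.
    return {name: [n, n / total] for name, n in sorted(counts.items())}


def area_histogram(areas, bins=10):
    """Split areas into equal-width bins (hypothetical helper, not Datumaro's API)."""
    if not areas:
        return []
    lo, hi = min(areas), max(areas)
    width = (hi - lo) / bins
    hist = [
        {"min": lo + i * width, "max": lo + (i + 1) * width, "count": 0}
        for i in range(bins)
    ]
    for area in areas:
        # Clamp the index so the largest area lands in the last bin;
        # if all areas are equal (width == 0), everything goes to bin 0.
        index = min(int((area - lo) / width), bins - 1) if width else 0
        hist[index]["count"] += 1
    for entry in hist:
        entry["percent"] = entry["count"] / len(areas)
    return hist
```

In the example output, the bin counts add up to 1369, which matches the 548 bbox plus 821 polygon annotations, so the `percent` field there appears to be the bin count divided by the number of segments, as in this sketch.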

How has this been tested?

Unit tests

How to test:

datum project import -i <path> -f <format>
datum project stats -p <project/path>
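
Once the stats command has written its JSON report, the report can be inspected programmatically. The following is a minimal sketch, assuming the report uses the same keys as the example output above. The function name `summarize_stats` is hypothetical, and the path is left as a parameter because the thread does not state where the file is written.

```python
import json


def summarize_stats(path):
    """Load a stats report and return a few headline numbers."""
    with open(path, encoding="utf-8") as f:
        stats = json.load(f)

    by_type = stats.get("annotations by type", {})
    return {
        "images": stats["images count"],
        "annotations": stats["annotations count"],
        "unannotated": stats["unannotated images count"],
        # Annotation types that actually occur in the dataset.
        "types present": sorted(t for t, v in by_type.items() if v["count"] > 0),
    }
```

For the example output above, "types present" would contain bbox, points, and polygon, since the other types have a count of 0.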

Checklist

I submit my changes into the develop branch
I have added description of my changes into CHANGELOG file
I have updated the documentation accordingly
I have added tests to cover my changes
I have linked related issues (read github docs)
I have increased versions of npm packages if it is necessary (cvat-canvas,
cvat-core, cvat-data and cvat-ui)

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.
I have updated the license header for each file (see an example below)

# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

coveralls · 2020-06-08T11:50:44Z

Pull Request Test Coverage Report for Build 6838

58 of 87 (66.67%) changed or added relevant lines in 2 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage increased (+0.1%) to 69.036%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
datumaro/datumaro/cli/contexts/project/__init__.py	1	9	11.11%
datumaro/datumaro/components/operations.py	57	78	73.08%

Files with Coverage Reduction	New Missed Lines	%
cvat/apps/engine/media_extractors.py	1	77.29%

Totals
Change from base Build 6836:	0.1%
Covered Lines:	11211
Relevant Lines:	15745

💛 - Coveralls

nmanovic · 2020-06-24T15:02:44Z

@zhiltsov-max, could you please point me to a ready-to-use model (bin, xml, py)? How can the feature be integrated or used in CVAT?

zhiltsov-max · 2020-06-24T16:35:10Z

You can either:

- Use an accuracy checker model. There is a model at datumaro/tests/assets/pytorch_launcher, where you need to adjust a few paths to be absolute or just available from the point of invocation. You need to install accuracy-checker with pip from their repo (https://github.com/opencv/open_model_zoo/tree/master/tools/accuracy_checker).
- Use an OpenVINO model. There are no examples; you should provide an xml, a bin, and an interpreter.py. There are a number of models available in OMZ, and any detector would fit.
- Imagine you launched a model and obtained a project. You can specify a path to another project to compare with using datum quality --inference-path.

At this moment there is no interface in CVAT, but this is a topic for further development.

nmanovic · 2020-07-13T04:18:06Z

@zhiltsov-max , could you please resolve conflicts?
@azhavoro, could you please look at the PR and test it?

nmanovic · 2020-07-14T07:39:03Z

Here is an overview of what got changed by this pull request:

Complexity increasing per file
==============================
- datumaro/datumaro/cli/commands/quality.py  8
- datumaro/datumaro/components/operations.py  18

See the complete overview on Codacy

nmanovic

LGTM

zhiltsov-max changed the title ~~[Datumaro] Add basic dataset quality estimation~~ [Dependent] [Datumaro] Add basic dataset quality estimation Jun 9, 2020

zhiltsov-max requested a review from nmanovic as a code owner June 10, 2020 17:32

zhiltsov-max changed the title ~~[Dependent] [Datumaro] Add basic dataset quality estimation~~ [Datumaro] Add basic dataset quality estimation Jun 24, 2020

nmanovic requested a review from azhavoro June 25, 2020 14:07

azhavoro previously approved these changes Jun 26, 2020

zhiltsov-max dismissed azhavoro’s stale review via d365b71 July 9, 2020 15:23

zhiltsov-max changed the title ~~[Datumaro] Add basic dataset quality estimation~~ [Datumaro] Add dataset statistics Jul 10, 2020

azhavoro previously approved these changes Jul 14, 2020

zhiltsov-max dismissed azhavoro’s stale review via 9a6817d July 14, 2020 07:36

azhavoro previously approved these changes Jul 14, 2020

zhiltsov-max added 5 commits August 3, 2020 12:02

Add statistics command

0e7a85f

Add tests

a9b8232

Update changelog

bef0927

fix test

0974d88

handle image absence

cc0fb8d

zhiltsov-max dismissed azhavoro’s stale review via cc0fb8d August 3, 2020 09:11

zhiltsov-max force-pushed the zm/ann-quality-main branch from 9a6817d to cc0fb8d Compare August 3, 2020 09:11

zhiltsov-max requested a review from azhavoro August 3, 2020 09:14

Merge branch 'develop' into zm/ann-quality-main

98daedc

nmanovic approved these changes Aug 7, 2020

nmanovic merged commit eaeb67d into develop Aug 7, 2020

nmanovic deleted the zm/ann-quality-main branch August 7, 2020 19:18

snyk-bot mentioned this pull request Apr 14, 2021

[Snyk] Upgrade react-redux from 7.2.2 to 7.2.3 #3089

Merged
