
Add feature with_offset for rawframe dataset #48

Merged 11 commits on Jul 29, 2020

Conversation

kennymckormick
Member

Add the feature 'with_offset' for the rawframe dataset; this feature is needed to train recognition models on untrimmed datasets.

@codecov

codecov bot commented Jul 23, 2020

Codecov Report

Merging #48 into master will increase coverage by 0.13%.
The diff coverage is 93.15%.


@@            Coverage Diff             @@
##           master      #48      +/-   ##
==========================================
+ Coverage   84.87%   85.00%   +0.13%     
==========================================
  Files          73       73              
  Lines        3827     3889      +62     
  Branches      618      629      +11     
==========================================
+ Hits         3248     3306      +58     
- Misses        480      482       +2     
- Partials       99      101       +2     
Flag Coverage Δ
#unittests 85.00% <93.15%> (+0.13%) ⬆️


Impacted Files Coverage Δ
mmaction/models/backbones/resnet3d.py 89.69% <76.47%> (-0.89%) ⬇️
mmaction/datasets/rawframe_dataset.py 87.36% <95.23%> (+2.00%) ⬆️
mmaction/datasets/pipelines/loading.py 93.13% <100.00%> (+0.04%) ⬆️
mmaction/models/backbones/resnet3d_slowfast.py 87.97% <100.00%> (+0.07%) ⬆️
mmaction/models/backbones/resnet_tsm.py 95.08% <100.00%> (+1.67%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7dc99cf...cedfbd2.

@@ -0,0 +1,2 @@
test_imgs 2 5 127
Contributor

are you defining a new dataset annotation format?

Contributor

what does each column mean?

Member Author

I added an optional format for ann_file: instead of 'frame_dir num_frame label[s]', it can be 'frame_dir start_idx num_frame label[s]'. The code remains compatible with the original format.
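A minimal sketch of how such a line could be parsed, supporting both formats. The helper name and dictionary keys here are illustrative, not the actual RawframeDataset implementation:

```python
def parse_rawframe_line(line, with_offset=False):
    """Parse one rawframe annotation line.

    Original format:     frame_dir num_frame label[s]
    With-offset format:  frame_dir start_idx num_frame label[s]
    """
    parts = line.split()
    info = {'frame_dir': parts[0]}
    if with_offset:
        info['offset'] = int(parts[1])
        info['total_frames'] = int(parts[2])
        labels = [int(x) for x in parts[3:]]
    else:
        info['total_frames'] = int(parts[1])
        labels = [int(x) for x in parts[2:]]
    # Multi-label samples keep a list; single-label samples keep an int.
    info['label'] = labels if len(labels) > 1 else labels[0]
    return info

# The example line from the test annotation file in the diff above:
print(parse_rawframe_line('test_imgs 2 5 127', with_offset=True))
# {'frame_dir': 'test_imgs', 'offset': 2, 'total_frames': 5, 'label': 127}
```

With `with_offset=False` the same helper handles the original three-column format, which is what keeps the two formats compatible.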

Contributor

What does num_frame mean in this context: the length of the video or the length of the segment?

Member Author

The length of the video clip (which is part of the entire untrimmed video).

Contributor

I think we need a doc page to describe all possible annotation formats we used. @dreamerlin

Collaborator

We can prepare an annotation_description.md in docs/ to describe all possible annotation formats.

@innerlee
Contributor

innerlee commented Jul 24, 2020

I'm thinking about a short descriptor for the annotation file, resembling the channel-order description.

Say we let

  • X: video
  • Y: label
  • S: snippet start index (begins with 0; this index is included)
  • D: duration of snippet
  • L: length of video

Then this can be represented as " XSDY". Note the leading space, which represents the delimiter.
Other formats could be ",XLY", meaning comma-delimited, giving video name, video length, and label.

For one-based frame datasets, one can add an extra column S=1, with format " VSDL".
We can deprecate the many options like start_index, etc.

cc. @dreamerlin @hellock @kennymckormick

and @JoannaLXY , could this fit into localization tasks? Anything else needed to be added?
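A rough sketch of how such a descriptor string could drive a generic parser. The function name and field names are hypothetical, invented for illustration:

```python
# Sketch of parsing a format descriptor such as " XSDY" or ",XLY".
# The first character is the delimiter; the remaining characters map
# columns, in order, to named fields.
FIELD_NAMES = {
    'X': 'video',
    'Y': 'label',
    'S': 'start_index',
    'D': 'duration',
    'L': 'length',
}

def parse_with_descriptor(line, descriptor):
    delim, columns = descriptor[0], descriptor[1:]
    values = line.split(delim)
    record = {}
    for code, value in zip(columns, values):
        name = FIELD_NAMES[code]
        # The video column stays a string; all other columns are integers.
        record[name] = value if code == 'X' else int(value)
    return record

print(parse_with_descriptor('test_imgs 2 5 127', ' XSDY'))
# {'video': 'test_imgs', 'start_index': 2, 'duration': 5, 'label': 127}
```

One descriptor string per annotation file would then replace the separate with_offset, start_index, etc. switches.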

@dreamerlin dreamerlin changed the title Hd/with offset Add feature with_offset for rawframe dataset Jul 24, 2020
@kennymckormick
Member Author

> I'm thinking about a short descriptor for the annotation file, resembling the channel-order description. […] We can deprecate the many options like start_index, etc.

That's a good idea. Where should we place the description?

@innerlee
Contributor

Proposal 2

On second thought, maybe there is an easier solution: unify all recognition (single-class) annotations to
video start_index length class

This should cover all use cases, right?

@kennymckormick
Member Author

> Proposal 2: maybe there is an easier solution, which is to unify all recognition (single-class) annotations to `video start_index length class`. This should cover all use cases, right?

I think the current design is OK since:

  1. To unify all recognition (single-class) annotations, you would need to change a lot of code: data preprocessing, datasets, etc.
  2. The current code is both forward and backward compatible, so we don't need extra effort to unify annotations. Also consider this case: we may need to add many unforeseen features in the future. For example, to support HVU, we need a dataset that supports multiple attributes per sample (action class, scene class, sentiment class, etc.). At that point, the effort made now may become worthless.

@innerlee
Contributor

The current design is not ideal because it stacks many switches. What's worse, those switches are not independent of each other.

A better design is needed. Any idea?

@kennymckormick
Member Author

@hellock @dreamerlin Any good ideas about annotation design for recognition? I think a JSON file containing a list of dictionaries (each representing one sample) is a good choice. Each dictionary would look like dict(frame_dir='path', num_frame=100, label=1), etc. This design is good because it can easily be extended to fit future needs.
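A small sketch of what such a JSON annotation could look like. The file contents and key names below are hypothetical, chosen to mirror the dict example above:

```python
import json

# Hypothetical JSON annotation: a list of dicts, one per sample.
# New fields (e.g. HVU-style multi-attribute labels) can be added to a
# dict without breaking readers that only know the older keys.
ann = '''
[
  {"frame_dir": "test_imgs", "offset": 2, "total_frames": 5, "label": 127},
  {"frame_dir": "other_clip", "total_frames": 300, "label": 4}
]
'''
samples = json.loads(ann)
for sample in samples:
    # Optional keys are simply absent; readers fall back to a default.
    print(sample['frame_dir'], sample.get('offset', 0), sample['label'])
```

Compared with a fixed-column txt list, the extensibility comes for free: every sample carries its own named fields, so optional columns never shift positions.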

@hellock
Member

hellock commented Jul 26, 2020

I suggest refactoring as early as possible. Once there are more users, backward-compatibility issues will become technical debt. We can define a primary annotation format described in a JSON file, which will be more extensible.

@kennymckormick
Member Author

> I suggest refactoring as early as possible. Once there are more users, backward-compatibility issues will become technical debt. We can define a primary annotation format described in a JSON file, which will be more extensible.

You mean we can support both txt lists and JSON, while using JSON as the primary format?

@innerlee
Contributor

Let's merge it for now and leave the unification of data annotations to the future.

@innerlee innerlee merged commit a9aa5f8 into open-mmlab:master Jul 29, 2020
@kennymckormick kennymckormick deleted the hd/with_offset branch September 7, 2020 09:07