Dataset Generation #2

XiongweiWu · 2020-08-10T04:05:29Z

Three questions:

1. In 1_split_filter.py#L46-L48, as I understand it, a sampled image should not contain any objects from the VOC classes. However, this implementation seems to exclude only images with tiny objects.
2. In 2_balance.py#L57, does each category contain no more than 80 instances?
3. How do I generate final_split_voc_10_shot_instances_train2017.json?
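
To make the first question concrete, here is a minimal sketch of the two possible behaviours on COCO-style annotation dicts: dropping only the VOC-class annotations, or dropping every image that contains a VOC-class object. This is not the code from 1_split_filter.py; the function name `split_non_voc`, the flag `drop_images_with_voc`, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def split_non_voc(coco, voc_cat_ids, drop_images_with_voc=False):
    """Hypothetical helper: keep only non-VOC annotations.

    If drop_images_with_voc is True, any image that contains at least one
    VOC-class annotation is removed entirely (the stricter behaviour).
    """
    voc_cat_ids = set(voc_cat_ids)

    # Group annotations by the image they belong to.
    anns_by_img = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_img[ann["image_id"]].append(ann)

    images, annotations = [], []
    for img in coco["images"]:
        anns = anns_by_img.get(img["id"], [])
        has_voc = any(a["category_id"] in voc_cat_ids for a in anns)
        if drop_images_with_voc and has_voc:
            continue  # stricter variant: skip the whole image
        kept = [a for a in anns if a["category_id"] not in voc_cat_ids]
        if kept:
            images.append(img)
            annotations.extend(kept)
    return images, annotations


# Toy data (made up): category 1 plays the role of a VOC class.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 50},
        {"id": 12, "image_id": 2, "category_id": 50},
    ],
}

loose_imgs, loose_anns = split_non_voc(toy, voc_cat_ids=[1])
strict_imgs, strict_anns = split_non_voc(toy, voc_cat_ids=[1], drop_images_with_voc=True)
print([i["id"] for i in loose_imgs], [a["id"] for a in loose_anns])
print([i["id"] for i in strict_imgs], [a["id"] for a in strict_anns])
```

With the default setting, image 1 stays in the split with its VOC annotation removed; with `drop_images_with_voc=True`, image 1 is dropped altogether.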

fanq15 · 2020-08-10T04:15:22Z

1. It picks the non-VOC classes.
2. 80 is the minimum instance number in each class.
3. You can use the given final_split_voc_10_shot_instances_train2017.json in the new_annotations dir for a fair comparison.

XiongweiWu · 2020-08-10T04:33:20Z

1. So in your non-VOC set, the images may also contain VOC class instances (just not labeled)?
2. It seems that you first compute the total number of instances per class across all images, stored in 'all_cls_dict'. Then, for each image, if any category it contains has fewer than 80 instances in 'all_cls_dict', all instances in that image are saved for training; otherwise all of its instances are discarded, and the instances of categories with more than 80 instances are removed. I am a bit confused about this file.
3. Could you provide a 30-shot json file?
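
The procedure described in the second question above can be sketched as follows. This only mirrors the reading given in the question, not the actual contents of 2_balance.py; `all_cls_dict` is the name quoted in the question, while `keep_images_with_rare_class`, the `threshold` parameter, and the toy annotations are assumptions.

```python
from collections import Counter


def count_instances_per_class(annotations):
    """Total number of instances per category across all images."""
    return Counter(a["category_id"] for a in annotations)


def keep_images_with_rare_class(annotations, threshold=80):
    """Hypothetical filter: keep every instance of an image if at least one
    of its categories has fewer than `threshold` instances overall;
    otherwise discard all instances of that image."""
    all_cls_dict = count_instances_per_class(annotations)
    keep_ids = {
        a["image_id"]
        for a in annotations
        if all_cls_dict[a["category_id"]] < threshold
    }
    return [a for a in annotations if a["image_id"] in keep_ids]


# Toy annotations (made up); a threshold of 2 stands in for 80.
toy_anns = [
    {"id": 1, "image_id": 1, "category_id": 1},
    {"id": 2, "image_id": 1, "category_id": 2},
    {"id": 3, "image_id": 2, "category_id": 1},
    {"id": 4, "image_id": 2, "category_id": 1},
]
kept = keep_images_with_rare_class(toy_anns, threshold=2)
print([a["id"] for a in kept])
```

In this toy case category 2 has a single instance, so image 1 is kept with both of its instances, while image 2 (only the frequent category 1) is discarded. Note that a filter of this shape does not cap the frequent categories, so it would not balance the classes by itself.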

fanq15 · 2020-08-10T04:49:48Z

1. Yes. The VOC instances are ignored.
2. About 2_balance.py:
   2.1. Yes, it should be the instance number per class. I fixed the wording in my previous answer.
   2.2. There is a bug in 2_balance.py, and it does not actually balance the categories. However, this bug does not affect training or evaluation. I will fix it and see whether image balancing improves performance.
3. There is no 30-shot json file currently. I will add it later.

XiongweiWu closed this as completed Aug 10, 2020
