fewview_train subset JSON contain frames that belong in both of train and test sets #49

Closed
zhizdev opened this issue Aug 2, 2022 · 2 comments

zhizdev commented Aug 2, 2022

I am trying to use the CO3Dv2 dataset, but I ran into an unexpected issue with the few-view train subset lists in set_lists/set_lists_fewview_train.json.

As defined in `co3d.implicitron.dataset.json_index_dataset_map_provider_v2.py`, line 104, each JSON file should have the following structure:

Each `set_lists_<subset_name_l>.json` file contains the following dictionary:
{
    "train": [
        (sequence_name: str, frame_number: int, image_path: str),
        ...
    ],
    "val": [
        (sequence_name: str, frame_number: int, image_path: str),
        ...
    ],
    "test": [
        (sequence_name: str, frame_number: int, image_path: str),
        ...
    ],
}

In the case of the tv, hydrant, donut (and I believe all) categories, in set_lists_fewview_train.json, all of the frames (image_path) under "train" are also under "test".

However, set_lists_fewview_dev.json and set_lists_fewview_test.json contain clearly separated "train" and "test" frames.
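The overlap described above can be checked with a few lines of Python. This is only a minimal sketch: `load_set_lists` and `shared_image_paths` are hypothetical helper names, the toy entries and their paths are made up for illustration, and the only thing taken from the dataset is the `(sequence_name, frame_number, image_path)` entry layout quoted above.

```python
import json


def load_set_lists(path):
    """Load a set_lists_<subset_name>.json file into a dict of lists."""
    with open(path, "r") as f:
        return json.load(f)


def shared_image_paths(set_lists, first="train", second="test"):
    """Return the image paths that appear in both named setlists."""
    # Each entry is (sequence_name, frame_number, image_path).
    first_paths = {entry[2] for entry in set_lists[first]}
    second_paths = {entry[2] for entry in set_lists[second]}
    return first_paths & second_paths


# Toy stand-in for a loaded set list file (made-up sequence names and paths).
toy_set_lists = {
    "train": [["seq_a", 0, "tv/seq_a/images/frame000000.jpg"]],
    "val": [
        ["seq_a", 0, "tv/seq_a/images/frame000000.jpg"],
        ["seq_a", 1, "tv/seq_a/images/frame000001.jpg"],
    ],
    "test": [
        ["seq_a", 0, "tv/seq_a/images/frame000000.jpg"],
        ["seq_a", 1, "tv/seq_a/images/frame000001.jpg"],
    ],
}

shared = shared_image_paths(toy_set_lists, "train", "test")
print(len(shared), "of", len(toy_set_lists["train"]), "train frames also appear in test")
```

On real data, the dict returned by `load_set_lists` for a category's set_lists/set_lists_fewview_train.json would take the place of `toy_set_lists`.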

I am not sure whether this behavior is a design choice or a bug. My goal is to train a model only on the training set, and not on the dev or test sets. What would be the correct JSON subset list and subset to use?

zhizdev changed the title from "fewview_train subset JSON contain frames that belong in all of train, val, and test sets" to "fewview_train subset JSON contain frames that belong in both of train and test sets" on Aug 2, 2022
@davnov134
Contributor

Hello, this is by design.

Tl;dr: Indeed, using the train setlist of set_lists_fewview_train is the best way to train your few-view model.

In more detail, all frames within a category are separated into 6 sets named <sequence_set>_<known|unseen>, i.e.:

train_unseen
train_known
dev_unseen
dev_known
test_unseen
test_known

The set_lists_fewview_*.json set lists are defined as follows:

set_lists_fewview_train: {
    "train": train_known,
    "val": train_known + train_unseen,
    "test": train_known + train_unseen,
}
set_lists_fewview_dev: {
    "train": train_known,
    "val": dev_known + dev_unseen,
    "test": dev_known + dev_unseen,
}
set_lists_fewview_test: {
    "train": train_known,
    "val": dev_known + dev_unseen,
    "test": test_known + test_unseen,
}
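To make the composition above concrete, here is a small sketch that assembles the three few-view set lists from the six frame sets. It is illustrative only: `build_fewview_set_lists` is a hypothetical helper, not a function from the dataset code, and the frame names are made-up placeholders.

```python
def build_fewview_set_lists(frames):
    """Assemble the three few-view set lists from the six frame sets.

    `frames` maps each of the six set names (e.g. "train_known") to a list
    of frame entries.
    """
    train_all = frames["train_known"] + frames["train_unseen"]
    dev_all = frames["dev_known"] + frames["dev_unseen"]
    test_all = frames["test_known"] + frames["test_unseen"]
    return {
        "set_lists_fewview_train": {
            "train": frames["train_known"],
            "val": train_all,
            "test": train_all,
        },
        "set_lists_fewview_dev": {
            "train": frames["train_known"],
            "val": dev_all,
            "test": dev_all,
        },
        "set_lists_fewview_test": {
            "train": frames["train_known"],
            "val": dev_all,
            "test": test_all,
        },
    }


# Made-up frame names, one or two per set, purely for illustration.
toy_frames = {
    "train_known": ["tr_k0", "tr_k1"],
    "train_unseen": ["tr_u0"],
    "dev_known": ["dv_k0"],
    "dev_unseen": ["dv_u0"],
    "test_known": ["te_k0"],
    "test_unseen": ["te_u0"],
}

fewview = build_fewview_set_lists(toy_frames)
train_list = fewview["set_lists_fewview_train"]
# Every "train" frame of set_lists_fewview_train is also in its "test" list.
print(set(train_list["train"]) <= set(train_list["test"]))
```

With this layout, the "train" list is identical in all three files, and only set_lists_fewview_train repeats those frames in its "val" and "test" lists.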

For your case specifically, the train setlist of set_lists_fewview_train contains only the train_known frames, which are the ones that should be used for training. The val and test setlists of set_lists_fewview_train, however, contain train_known and ALSO train_unseen. This is why you see that all frames from train are also in test (and in val).

The "val" and "test" sets also contain the "train" views because, when validating or testing, one needs access to the "known" source views (from the train set) in order to generate the unseen views. This requires both the known and the unseen views to live in the same set of loaded images.

Indeed, if you inspect the eval_batches files, you will discover that the first (target) frame in an eval batch is always drawn from the unseen set of frames, while the rest of the frames come from the known frames.
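A check of that convention could look roughly like the sketch below. It rests on assumptions that are not confirmed in this thread: that each eval batch is a list of `(sequence_name, frame_number, image_path)` entries like the set lists, and that a frame's type is a string ending in "known" or "unseen". The helper name `batch_follows_convention` and the toy data are hypothetical.

```python
def batch_follows_convention(batch, frame_types):
    """Return True if the first frame is unseen and all other frames are known.

    `batch` is a list of (sequence_name, frame_number, image_path) entries
    (assumed layout). `frame_types` maps (sequence_name, frame_number) to a
    frame type string such as "train_unseen" or "train_known".
    """
    types = [frame_types[(seq, num)] for seq, num, _path in batch]
    target_is_unseen = types[0].endswith("unseen")
    sources_are_known = all(t.endswith("known") for t in types[1:])
    return target_is_unseen and sources_are_known


# Made-up frame types and one made-up eval batch.
toy_frame_types = {
    ("seq_a", 0): "train_known",
    ("seq_a", 1): "train_known",
    ("seq_a", 2): "train_unseen",
}
toy_batch = [
    ["seq_a", 2, "tv/seq_a/images/frame000002.jpg"],  # target view
    ["seq_a", 0, "tv/seq_a/images/frame000000.jpg"],  # source view
    ["seq_a", 1, "tv/seq_a/images/frame000001.jpg"],  # source view
]

print(batch_follows_convention(toy_batch, toy_frame_types))
```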

In order to find out which frames are known/unseen, feel free to inspect the meta.frame_type fields in frame_annotations.jgz.
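One way to read those fields is sketched below. The assumptions are mine and should be verified against the actual files: that frame_annotations.jgz is a gzip-compressed JSON list of per-frame records, and that each record carries `sequence_name`, `frame_number`, and a `meta` dict with a `frame_type` string. The helper names `load_frame_types` and `count_frame_types` are hypothetical.

```python
import gzip
import json
from collections import Counter


def load_frame_types(path):
    """Map (sequence_name, frame_number) to the frame's meta.frame_type.

    Assumes `path` points to a gzip-compressed JSON list of frame records.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        annotations = json.load(f)
    return {
        (a["sequence_name"], a["frame_number"]): a["meta"]["frame_type"]
        for a in annotations
    }


def count_frame_types(frame_types):
    """Count how many frames fall into each frame type."""
    return Counter(frame_types.values())
```

Calling `count_frame_types(load_frame_types(path))` on a category's frame_annotations.jgz would then give the number of frames in each of the known and unseen sets.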

I hope this helps, let me know if further clarification is needed.


zhizdev commented Aug 3, 2022

Thank you so much for the reply! This is super helpful!
