
Voxel Size #56

Open
Steven-m2ai opened this issue Aug 5, 2022 · 8 comments


@Steven-m2ai

Hello,

I am experimenting with a custom dataset on ImVoxelNet.
My dataset is ~2000 images, and I am running into extreme overfitting issues. For example, the predictions on validation images follow the same pattern as some of the training-image predictions.

I was looking into what could be the cause; I could try playing with the learning rate and scheduler. However, I was also looking into the voxel size and count. Do you think these could have any effect on the outcome? Any other advice? Thanks!

@filaPro
Contributor

filaPro commented Aug 5, 2022

Hi @Steven-m2ai ,

The overfitting issues are strange from my point of view; 2000 images should be enough. Can you share your config, train/val metrics, and some info about the dataset (indoor/outdoor, number of classes, ...)?

I think the number of voxels and the voxel size are important for model quality, but they are probably not connected with overfitting. Smaller voxels may lead to better accuracy, but require much more memory for the 3D convolutions.
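For intuition, the grid covers n_voxels * voxel_size meters along each axis, so the default (40, 40, 16) grid of 0.16 m voxels spans about 6.4 x 6.4 x 2.56 m. A minimal sketch of the trade-off (plain Python, illustrative numbers only):

n_voxels = (40, 40, 16)          # grid resolution (x, y, z)
voxel_size = (0.16, 0.16, 0.16)  # meters per voxel

# Spatial extent covered by the voxel grid.
extent = [n * s for n, s in zip(n_voxels, voxel_size)]
print(extent)  # [6.4, 6.4, 2.56] meters

# Halving the voxel size at a fixed extent doubles every grid dimension,
# i.e. ~8x more voxels for the 3D convolutions to process.
n_voxels_fine = [int(e / (0.16 / 2)) for e in extent]
print(n_voxels_fine)  # [80, 80, 32]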

Are you sure your projection matrices are fine? You can visualize whether your 3D object centers project onto the 2D object centers here.
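A rough check could look like the following (assuming one 3x4 camera projection matrix per image; the names here are illustrative, not the repository's API):

import numpy as np

def project_center(center_3d, proj_3x4):
    """Project a 3D box center (x, y, z) to pixel coordinates (u, v)."""
    p = proj_3x4 @ np.append(center_3d, 1.0)  # homogeneous multiplication
    return p[:2] / p[2]                       # perspective division

# Overlay the projected centers on the image and compare them with the
# 2D box centers; a systematic offset usually means a wrong axis order
# or wrong extrinsics.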

You can also try a smaller model to prevent overfitting, e.g. ResNet50 -> ResNet18.
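In the config this would roughly amount to the following (a sketch, not tested; ResNet18 stage widths are [64, 128, 256, 512]):

model = dict(
    type='ImVoxelNet',
    pretrained='torchvision://resnet18',
    backbone=dict(
        type='ResNet',
        depth=18,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 256, 512],  # ResNet18 stage widths
        out_channels=256,
        num_outs=4))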

@Steven-m2ai
Author

Steven-m2ai commented Aug 5, 2022

Hello @filaPro,

Thank you for your response. A little about the dataset:
~2000 images taken from 10 different video streams (~200 frames per video). I only have one class to consider, and the dataset is indoor. (If you do not mind, could I email you the extra details about the dataset?)

Okay, I understand that the voxel size is important for quality, since the 3D convolution sizes depend on it. The projection matrices should be fine, since I plotted all the ground truths of my dataset and they look good.

I haven't tried playing with the learning rate or a smaller model yet. Maybe that is a good road to take.

The Config File

model = dict(
    type='ImVoxelNet',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=4),
    neck_3d=dict(
        type='FastIndoorImVoxelNeck',
        in_channels=256,
        out_channels=128,
        n_blocks=[1, 1, 1]),
    bbox_head=dict(
        type='SunRgbdImVoxelHeadV2',
        n_classes=10,
        n_channels=128,
        n_reg_outs=7,
        n_scales=3,
        limit=27,
        centerness_topk=18),
    n_voxels=(40, 40, 16),                                          # number of voxels : CAN CHANGE [original: (40, 40, 16)]
    voxel_size=(0.16, 0.16, 0.16))                                     # size of voxels : CAN CHANGE [original: (0.16, 0.16, 0.16)]  0.05, 0.0325, 0.1375
train_cfg = dict()
test_cfg = dict(
    nms_pre=1000,
    nms_thr=.15,
    use_rotate_nms=True,
    score_thr=0.05)
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

dataset_type = 'SunRgbdMultiViewDataset'
data_root = 'data/dataset1/'

class_names = ('box',)

train_pipeline = [
    dict(type='LoadAnnotations3D'),
    dict(
        type='MultiViewPipeline',
        n_images=1,
        transforms=[
            dict(type='LoadImageFromFile'),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='Resize', img_scale=[(512, 384), (768, 576)], multiscale_mode='range', keep_ratio=True),      # data augmentation: CAN CHANGE
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32)]),
    dict(type='SunRgbdRandomFlip'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['img', 'gt_bboxes_3d', 'gt_labels_3d'])]
test_pipeline = [
    dict(
        type='MultiViewPipeline',
        n_images=1,
        transforms=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', img_scale=(640, 480), keep_ratio=True),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32)]),
    dict(type='DefaultFormatBundle3D', class_names=class_names, with_label=False),
    dict(type='Collect3D', keys=['img'])]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'sunrgbd_infos_train.pkl',
            pipeline=train_pipeline,
            classes=class_names,
            filter_empty_gt=True,
            box_type_3d='Depth')),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'sunrgbd_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        test_mode=True,
        box_type_3d='Depth'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'sunrgbd_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        test_mode=True,
        box_type_3d='Depth'))

optimizer = dict(                                                               # optimizer: CAN CHANGE
    type='AdamW',
    lr=0.0001,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0)}))
optimizer_config = dict(grad_clip=dict(max_norm=35., norm_type=2))
lr_config = dict(policy='step', step=[8, 11])

total_epochs = 36                                                              # epochs: CAN CHANGE (original:12)

checkpoint_config = dict(interval=1, max_keep_ckpts=1)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
evaluation = dict(interval=1)
dist_params = dict(backend='nccl')
find_unused_parameters = True  # todo: fix number of FPN outputs
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

@filaPro
Contributor

filaPro commented Aug 5, 2022

> If you do not mind, could I email you the extra details about the dataset?

I think yes, if there is something more to share.

> The projection matrices should be fine since I plotted all the ground truths

Have you plotted with our visualization functions? The SUN RGB-D axis order or something similar may differ from your coordinate system...

> I only have one class to consider

You should set n_classes to 1.
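In the posted config that is the bbox_head block, i.e. (sketch of the one-line change):

bbox_head=dict(
    type='SunRgbdImVoxelHeadV2',
    n_classes=1,  # was 10; there is one class ('box') in this dataset
    n_channels=128,
    n_reg_outs=7,
    n_scales=3,
    limit=27,
    centerness_topk=18),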

So, what are your metrics on train/val? Are you able to achieve 100% accuracy on a small subset of train?

@Steven-m2ai
Author

Yes, I will send you an email with more information.

Yes, I plotted by using the same method that you use; I simply modified your visualization function to show the ground truth instead of predictions.

Yes, I have set n_classes to 1.

I am able to achieve a super high mAP (0.8) on a test set that is a subset of the training set, but on a new, unseen set I get a very low mAP (0.24).

I will send you more information and extra visualizations via email.
Thank you for your time.

@filaPro
Contributor

filaPro commented Aug 5, 2022

> Yes, I have set n_classes to 1

But in your config it is 10. However, it is not important.

I saw your images. The results on test should be much better with 2000 images in train. I think something is wrong with the training or inference. Can you check without SunRgbdRandomFlip? Maybe it is wrong for your data. Just comment out that line and the line with RandomFlip.
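For concreteness, that is the train_pipeline from the posted config with both flip transforms commented out:

train_pipeline = [
    dict(type='LoadAnnotations3D'),
    dict(
        type='MultiViewPipeline',
        n_images=1,
        transforms=[
            dict(type='LoadImageFromFile'),
            # dict(type='RandomFlip', flip_ratio=0.5),  # disabled for this check
            dict(
                type='Resize',
                img_scale=[(512, 384), (768, 576)],
                multiscale_mode='range',
                keep_ratio=True),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32)]),
    # dict(type='SunRgbdRandomFlip'),  # disabled for this check
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['img', 'gt_bboxes_3d', 'gt_labels_3d'])]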

@Steven-m2ai
Author

Steven-m2ai commented Aug 5, 2022

Oh yes, what I meant is that I changed it to 1 in my config when you pointed it out. I believe that won't solve the overfitting, but thanks for catching it.

I am training without the SunRgbdRandomFlip and RandomFlip augmentations. So far, the mAP seems stagnant at 0.24 +/- 0.05 from epoch 1 to 12. Maybe this means those augmentations are fine for my case.

Do you think the issue might be low diversity in the dataset? I.e., the frames do not look very different from each other, so maybe that creates a data imbalance?

@filaPro
Contributor

filaPro commented Aug 7, 2022

Hard to help you here :( Does reducing the model size help?

@Steven-m2ai
Author

Steven-m2ai commented Aug 9, 2022

Hello,

Yes, I guess this is a hard problem to debug. I will keep thinking about it.

Conceptually, is it true that for the indoor head there is no concept of "anchor boxes"? Rather, each voxel acts as a center point, and we use the ground-truth min/max in each dimension to get the deltas (x, y, z)? Then these deltas are the targets the model tries to predict?
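(For reference, a rough sketch of the anchor-free targets described above, assuming axis-aligned boxes; the actual head also regresses rotation, and the names here are illustrative:)

import numpy as np

def face_distance_targets(points, box_min, box_max):
    """FCOS-style 3D targets: distances from each point (voxel center)
    to the six faces of an axis-aligned ground-truth box."""
    lows = points - box_min            # distances to the min-x/y/z faces
    highs = box_max - points           # distances to the max-x/y/z faces
    targets = np.concatenate([lows, highs], axis=-1)  # shape (N, 6)
    inside = targets.min(axis=-1) > 0  # positives: points inside the box
    return targets, inside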

Perhaps I can open a separate issue for conceptual questions; I am very interested in your work and would love to really understand the pipeline implemented here.
