Input shape dimensions = C x T x H x W ? #16

darshvirbelandis · 2022-09-08T21:28:25Z

Channel x Time(or NumFrames) x Height x Width

I am attempting to load my model in the following format

Question 1:

How do I input 16 frames of size 456x456 into the EfficientNet model?

I am trying to classify 16 frame snippets from video clips.

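    # Load the model
    from efficientnet_pytorch_3d import EfficientNet3D
    from torchsummary import summary  # assumption: summary() comes from torchsummary

    model_EfficientNet3D = EfficientNet3D.from_name("efficientnet-b7", in_channels=3)
    # With torchsummary, input_size excludes the batch dimension: (C, T, H, W)
    summary(model_EfficientNet3D, input_size=(3, 16, 456, 456))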

I have 16 images I want to send into the EfficientNet3D, is this possible?

A similar comment was made by @shijianjian here : #11 (comment)

"Say, change from Conv3D(kernel_size=(3, 3, 3)) to Conv3D(kernel_size=(1, 3, 3)) will probably work for your case."

I am very lost here because I don't understand where to actually change this code.

I can't even find this specific code in the model file: Conv3D(kernel_size=(3, 3, 3))

Also, since I installed EfficientNet3D with pip (pip install efficientnet-pytorch), I am having trouble understanding how to modify the actual model code.

If I were to load the efficientnet-pytorch model manually with PyTorch, where and how would I load the model weights?

Please help me in any way, this is a wonderful project and I am grateful for the contribution. Just need a bit of support on loading the model.

Question 2:

How can I use this 3D-EfficientNet model as a backbone feature extractor? I would need to export features at a certain layer instead of getting a final classification.

Thanks so much !!

shijianjian · 2022-10-11T09:09:39Z

EfficientNet-PyTorch-3D/efficientnet_pytorch_3d/model.py

Line 129 in 3e79bcd

    
    self._conv_stem = Conv3d(in_channels, out_channels, kernel_size=3, stride=2, bias=False)

Here, kernel_size=3 means kernel_size=(3, 3, 3). You may update the corresponding code. The same applies to the pooling layers and any other layers, if you need to change them.
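To illustrate the point about kernel sizes, here is a minimal sketch using plain `torch.nn.Conv3d` rather than the repo's own `Conv3d` wrapper, which may pad differently, so the exact output sizes in the repo can differ. The channel counts, the small 64x64 frame size, and the `stride=(1, 2, 2)` choice are assumptions made for illustration only.

```python
import torch
from torch import nn

# Dummy clip in (B, C, D, H, W) layout; H and W are kept small so this runs quickly.
x = torch.rand(1, 3, 16, 64, 64)

# An integer kernel_size is expanded to the same value on all three axes.
conv_cubic = nn.Conv3d(3, 8, kernel_size=3, stride=2, bias=False)

# A (1, 3, 3) kernel does not mix information across the depth (frame) axis.
# The stride of 1 along depth is an illustrative assumption, not the repo's setting.
conv_flat = nn.Conv3d(3, 8, kernel_size=(1, 3, 3), stride=(1, 2, 2), bias=False)

print(conv_cubic.kernel_size)   # (3, 3, 3)
print(conv_cubic(x).shape)      # torch.Size([1, 8, 7, 31, 31])
print(conv_flat(x).shape)       # torch.Size([1, 8, 16, 31, 31])
```

With the cubic kernel the depth axis shrinks along with height and width, while the (1, 3, 3) kernel with a depth stride of 1 leaves all 16 frames in place.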

Plus, I do not think a pretrained 3D model is publicly available at the moment. If you have one, you have to make sure its architecture is the same as in this repo. That can be a bit overwhelming if you did not train your model with this repo.
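As a rough sketch of why the architecture has to match, the snippet below saves the weights of a tiny stand-in network and tries to load them into a variant with a different kernel size. `make_model` is a hypothetical helper and the one-layer network is only a placeholder for the real EfficientNet3D; the behaviour of `state_dict` and `load_state_dict` is standard PyTorch.

```python
import torch
from torch import nn

def make_model(kernel_size):
    """Hypothetical one-layer stand-in for a real 3D network."""
    return nn.Sequential(nn.Conv3d(3, 8, kernel_size=kernel_size, bias=False))

# Pretend this model was trained elsewhere and its weights were saved.
trained = make_model(kernel_size=3)
state = trained.state_dict()

# Same architecture: the weights load without complaint.
same = make_model(kernel_size=3)
same.load_state_dict(state)

# Different kernel size: the weight tensors have different shapes,
# so load_state_dict raises a RuntimeError about a size mismatch.
different = make_model(kernel_size=(1, 3, 3))
try:
    different.load_state_dict(state)
    mismatch = False
except RuntimeError:
    mismatch = True

print(mismatch)  # True
```

The same applies to weights trained outside this repo: any difference in layer names or tensor shapes has to be resolved before the checkpoint will load.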

shijianjian · 2022-10-11T09:10:16Z

The input shape should be BxCxDxHxW, where D means depth (for video clips, the frame axis T).
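For the 16-frame case from the question, this layout could be built roughly as follows. This is a minimal sketch that assumes each frame is already a (C, H, W) tensor; `frames_to_clip` is a hypothetical helper, not part of the repo, and the random tensors stand in for decoded video frames.

```python
import torch

def frames_to_clip(frames):
    """Stack a list of (C, H, W) frame tensors into one (1, C, D, H, W) clip."""
    clip = torch.stack(frames, dim=1)  # (C, D, H, W): frames land on the new depth axis
    return clip.unsqueeze(0)           # (1, C, D, H, W): add the batch axis

# 16 dummy RGB frames of size 456x456
frames = [torch.rand(3, 456, 456) for _ in range(16)]
clip = frames_to_clip(frames)
print(clip.shape)  # torch.Size([1, 3, 16, 456, 456])
```

Stacking on `dim=1` keeps the channel axis first, so the frame index ends up in the D position that the model expects.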

darshvirbelandis mentioned this issue on Sep 9, 2022, in: Input size different to 1x64x64x64 throws Exception #11 (Open)

darshvirbelandis changed the title ~~Input shape dimensions = N x C x T x H x W ?~~ Input shape dimensions = C x T x H x W ? Sep 9, 2022
