Input shape dimensions = C x T x H x W ? #16

Open
darshvirbelandis opened this issue Sep 8, 2022 · 2 comments

Comments


darshvirbelandis commented Sep 8, 2022

Channel x Time (or NumFrames) x Height x Width

I am attempting to load my model in the following format

Question 1:

How do I input 16 frames of size 456x456 into the EfficientNet model?

I am trying to classify 16 frame snippets from video clips.

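    # Load the model
    from efficientnet_pytorch_3d import EfficientNet3D
    from torchsummary import summary  # assumption: the post does not say where summary comes from

    model_EfficientNet3D = EfficientNet3D.from_name("efficientnet-b7", in_channels=3)
    # With torchsummary, input_size excludes the batch dimension: C x T x H x W
    summary(model_EfficientNet3D, input_size=(3, 16, 456, 456))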

I have 16 images I want to send into the EfficientNet3D. Is this possible?

A similar comment was made by @shijianjian here: #11 (comment)

"Say, change from Conv3D(kernel_size=(3, 3, 3)) to Conv3D(kernel_size=(1, 3, 3)) will probably work for your case."

I am very lost here because I don't understand where to actually change this code.

I can't even find this specific code in the model file: Conv3D(kernel_size=(3, 3, 3))

Also, since I am using the pip install efficientnet-pytorch for this EfficientNet3D, I am having trouble understanding how to modify the actual model code, since it's a pip install.

If I were to manually load the efficientnet-pytorch model with PyTorch, where and how would I be able to load the model weights?

Please help me in any way, this is a wonderful project and I am grateful for the contribution. Just need a bit of support on loading the model.

Question 2:

How can I use this 3D-EfficientNet model as a backbone feature extractor? I would need to export features at a certain layer instead of getting a final classification.

Thanks so much !!

darshvirbelandis changed the title from "Input shape dimensions = N x C x T x H x W ?" to "Input shape dimensions = C x T x H x W ?" on Sep 9, 2022

shijianjian (Owner) commented

    self._conv_stem = Conv3d(in_channels, out_channels, kernel_size=3, stride=2, bias=False)

Here, kernel_size=3 means kernel_size=(3, 3, 3). You can update the corresponding code, and do the same for the pooling layers and any other layers if needed.
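To make the effect of that change concrete, here is a small sketch of the shape arithmetic only. It uses the standard convolution output-size formula (dilation 1) and does not touch the repo's code. The padding values and the (1, 2, 2) stride are assumptions chosen for illustration; the repo may pad differently.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output length along one axis for a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_output_shape(in_shape, kernel_size, stride, padding):
    """Apply the per-axis formula to a (D, H, W) shape.

    An int argument is expanded to a 3-tuple, the same way
    kernel_size=3 is read as (3, 3, 3).
    """
    def as_triple(value):
        return (value,) * 3 if isinstance(value, int) else tuple(value)

    k, s, p = as_triple(kernel_size), as_triple(stride), as_triple(padding)
    return tuple(
        conv_output_size(n, ki, si, pi)
        for n, ki, si, pi in zip(in_shape, k, s, p)
    )


clip = (16, 456, 456)  # D x H x W: 16 frames of 456 x 456

# Cubic kernel with stride 2 (assumed padding 1): the frame axis is halved too.
cubic = conv3d_output_shape(clip, kernel_size=3, stride=2, padding=1)

# Spatial-only kernel (assumed stride and padding): frames are not mixed
# and the frame axis keeps its length.
spatial = conv3d_output_shape(
    clip, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)
)

print(cubic)    # (8, 228, 228)
print(spatial)  # (16, 228, 228)
```

With a kernel of 1 along the first axis, the layer treats every frame separately, which is the behaviour the quoted suggestion from #11 is aiming for.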

Also, I do not think a pretrained 3D model exists in the wild right now. If you have one, you have to make sure its architecture is the same as the one in this repo. Loading weights will be difficult if you did not train your model with this repo.
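One way to check whether a checkpoint matches the architecture before loading it is to compare parameter names and shapes. The sketch below works on plain dicts of name to shape; with PyTorch these could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. The helper name and the example entries are hypothetical and only illustrate the idea.

```python
def compare_state_shapes(model_shapes, checkpoint_shapes):
    """Return (missing, unexpected, mismatched) parameter names.

    missing:    in the model but not in the checkpoint
    unexpected: in the checkpoint but not in the model
    mismatched: in both, but with different shapes
    """
    model_keys, ckpt_keys = set(model_shapes), set(checkpoint_shapes)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    mismatched = sorted(
        k for k in model_keys & ckpt_keys
        if tuple(model_shapes[k]) != tuple(checkpoint_shapes[k])
    )
    return missing, unexpected, mismatched


# Made-up shapes: a 3D stem weight is 5-D (out, in, kD, kH, kW),
# while a checkpoint from a 2D network has a 4-D stem weight.
model_shapes = {
    "_conv_stem.weight": (64, 3, 3, 3, 3),
    "_fc.weight": (1000, 2560),
}
checkpoint_shapes = {
    "_conv_stem.weight": (64, 3, 3, 3),
    "_fc.weight": (1000, 2560),
    "extra.bias": (10,),
}

missing, unexpected, mismatched = compare_state_shapes(model_shapes, checkpoint_shapes)
print(missing)     # []
print(unexpected)  # ['extra.bias']
print(mismatched)  # ['_conv_stem.weight']
```

If all three lists are empty, the checkpoint at least has the right names and shapes; any entry means the weights were saved from a different architecture.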

shijianjian (Owner) commented

The input shape should be B x C x D x H x W, where B is the batch size and D is the depth. For a video clip, D is the number of frames.
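As a minimal sketch of building such an input from 16 separate frames, assuming each frame is already a C x H x W tensor (the `frames_to_batch` helper and the blank placeholder frames are illustrative, not part of this repo):

```python
import torch


def frames_to_batch(frames):
    """Stack per-frame C x H x W tensors into one 1 x C x D x H x W batch."""
    clip = torch.stack(frames, dim=1)  # C x D x H x W, D = number of frames
    return clip.unsqueeze(0)           # add the batch axis, B = 1


# Placeholder data: 16 blank RGB frames of 456 x 456, as in the question.
frames = [torch.zeros(3, 456, 456) for _ in range(16)]
batch = frames_to_batch(frames)
print(tuple(batch.shape))  # (1, 3, 16, 456, 456)
```

A tensor of this shape can be passed to the model directly. Tools that ask for an input size without the batch axis would take (3, 16, 456, 456) instead.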
