Tensors must have same number of dimensions : got 5 and 3 #238

Closed
satwiksunnam19 opened this issue Sep 21, 2022 · 8 comments

Comments

@satwiksunnam19

Hello @lucidrains @stevenwalton
I have been trying to implement the standard ViT in 3D space. In the ViT code, I changed the Rearrange in the patch embedding as follows:

Rearrange('b e (h p1) (w p2) (d p3) -> b (e p1 p2 p3) h w d', p1=patch_size, p2=patch_size, p3=patch_size)

These patch embeddings are then concatenated with the cls tokens, cls_tokens = repeat(self.cls_token, '() n e -> b n e', b=b), which throws an error due to a dimensionality mismatch. How can I change the shape of cls_tokens to match the dimensionality of the patch embeddings?

Could you help me find a solution to this problem?
Thanks & Regards
Satwik Sunnam
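The error most likely comes from the torch.cat that joins the cls token and the patches: the Rearrange above produces a 5D tensor (b, e·p³, h, w, d) because it keeps the h, w, d patch grid as separate axes, while the cls tokens are 3D (b, 1, dim). Below is a minimal plain-PyTorch sketch of one possible fix, assuming the input is laid out as (batch, channels, depth, height, width) and every spatial size is divisible by a single cubic patch size: flatten the patch grid into the sequence axis and each patch into the feature axis, project to the embedding width, then prepend the cls token. The names patchify_3d and PatchEmbed3D are hypothetical and not part of vit-pytorch.

```python
import torch
import torch.nn as nn


def patchify_3d(x: torch.Tensor, patch: int) -> torch.Tensor:
    """Hypothetical helper: (b, c, D, H, W) volume -> (b, n, patch**3 * c) tokens.

    n = (D / patch) * (H / patch) * (W / patch). Should match the einops pattern
    'b c (d p1) (h p2) (w p3) -> b (d h w) (p1 p2 p3 c)'.
    """
    b, c, D, H, W = x.shape
    assert D % patch == 0 and H % patch == 0 and W % patch == 0, "sizes must divide by patch"
    d, h, w = D // patch, H // patch, W // patch
    # split each spatial axis into (grid position, offset inside the patch)
    x = x.reshape(b, c, d, patch, h, patch, w, patch)
    # -> (b, d, h, w, p1, p2, p3, c): grid axes first, patch contents last
    x = x.permute(0, 2, 4, 6, 3, 5, 7, 1)
    # merge grid axes into the sequence axis, patch contents into the feature axis
    return x.reshape(b, d * h * w, patch ** 3 * c)


class PatchEmbed3D(nn.Module):
    """Hypothetical module: patchify, project to `dim`, prepend a cls token."""

    def __init__(self, channels: int, patch: int, dim: int):
        super().__init__()
        self.patch = patch
        self.proj = nn.Linear(patch ** 3 * channels, dim)
        # (1, 1, dim), the '() n e' layout the repeat call in the question expects
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(patchify_3d(x, self.patch))         # (b, n, dim)
        cls = self.cls_token.expand(tokens.shape[0], -1, -1)  # (b, 1, dim)
        return torch.cat((cls, tokens), dim=1)                # (b, n + 1, dim): both 3D


embed = PatchEmbed3D(channels=1, patch=8, dim=64)
out = embed(torch.randn(2, 1, 32, 32, 32))
print(out.shape)  # torch.Size([2, 65, 64]): 4 * 4 * 4 = 64 patches + 1 cls token
```

Because both tensors are 3D after patchify_3d and the projection, the concatenation along dim=1 no longer hits the dimension mismatch.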

@stevenwalton
Contributor

CCT only supports 2D images, and I don't think Phil has incorporated 3D ViTs into the library. Luckily, they aren't that difficult to write from scratch. I'm not very familiar with voxel-based networks, but a quick Google search turns up some 3D ViT GitHub repositories and a few survey papers, so I would look at those to see how they transform the data.

@lucidrains
Owner

@satwiksunnam19 let me know if this helps https://github.com/lucidrains/vit-pytorch#3d-vit

@stevenwalton would be happy to offer a 3D CCT version if you haven't completed it already, Satwik.

@satwiksunnam19
Author

@lucidrains I have completed the vit-3d model and it's working well. I can add my vit-3d project to your GitHub repository if you'd like me to.

@lucidrains
Owner

@satwiksunnam19 that would be a great contribution! 🙏

@lucidrains
Owner

@satwiksunnam19 I went ahead and added it. Thank you for reporting that it works well!

@satwiksunnam19
Author

Hello @lucidrains @stevenwalton
To handle 2D images, we reshape the image $\mathbf{x} \in \mathbb{R}^{H \times W \times C}$ into a sequence of flattened 2D patches $\mathbf{x}_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$, where $(H, W)$ is the resolution of the original image, $C$ is the number of channels, $(P, P)$ is the resolution of each image patch, and $N = HW/P^2$ is the resulting number of patches. This is the 2D case.

Could you extend this to the 3D model and explain how the flattened patches and their sizes are determined? I'm confused about this part. Please respond to this ASAP.

@stevenwalton
Contributor

I think you're having trouble understanding the tokenization process for ViTs in general. CCT isn't that complicated (the main motivation of our work is how much simple changes can do): it simply patches and embeds in a single action, which allows for a better embedding than working with explicit patches. Compare CCT to ViT. The flattening happens because we're creating a pixel-space domain: Channels x Height x Width -> Channels x Pixels. As per the original ViT paper:
[Image from the original ViT paper]
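To make the "patch and embed in a single action" idea concrete, here is a minimal PyTorch sketch. It is an illustration only, not CCT's actual tokenizer, which uses its own convolution and pooling settings: a convolution whose kernel size equals its stride visits each non-overlapping patch exactly once, so it acts as a linear projection of every flattened patch, and flattening its output grid yields the token sequence. Swapping Conv2d for Conv3d and adding a frame axis gives the 3D version. All sizes (224×224 inputs, 16-frame clips, P = 16, Pf = 2 frames per patch, dim = 64) are arbitrary assumptions.

```python
import torch
import torch.nn as nn

dim, P, Pf = 64, 16, 2  # embedding width, spatial patch size, frames per patch (assumed)

# 2D: (b, C, H, W) -> (b, dim, H/P, W/P) -> (b, HW/P^2, dim)
tok2d = nn.Conv2d(3, dim, kernel_size=P, stride=P)
img = torch.randn(2, 3, 224, 224)
seq2d = tok2d(img).flatten(2).transpose(1, 2)

# 3D: (b, C, F, H, W) -> (b, dim, F/Pf, H/P, W/P) -> (b, (F/Pf)(HW/P^2), dim)
tok3d = nn.Conv3d(3, dim, kernel_size=(Pf, P, P), stride=(Pf, P, P))
vid = torch.randn(2, 3, 16, 224, 224)
seq3d = tok3d(vid).flatten(2).transpose(1, 2)

print(seq2d.shape)  # torch.Size([2, 196, 64]): 14 * 14 patches
print(seq3d.shape)  # torch.Size([2, 1568, 64]): 8 * 14 * 14 patches
```

The only structural change between the two halves is the extra frame axis in the input and in the kernel/stride, which is why the 3D sequence length picks up a factor of F/Pf.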

If you look at the 3D code, you'll notice Phil only changes a few lines (i.e., Conv2D -> Conv3D and adding frames). The frames are just another dimension in the tensor, and in this case they are counted in the pixel space (now a frame-pixel space). The process is quite similar once you pay attention to these minor changes. Phil took the time to make this code very readable, explicitly marking which variables are frame-related. But you'll also need to update your equation to incorporate the frames and how many of them you include in a given "patch".
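Following that suggestion, one way to write the 3D version of the ViT reshaping is sketched below, assuming a clip (or volume) of $F$ frames and patches that span $P_f$ frames and $P \times P$ pixels; the symbols $F$ and $P_f$ are introduced here for illustration and assume $F$ is divisible by $P_f$ and $H$, $W$ by $P$.

```latex
\mathbf{x} \in \mathbb{R}^{F \times H \times W \times C}
\;\longrightarrow\;
\mathbf{x}_p \in \mathbb{R}^{N \times (P_f \cdot P^2 \cdot C)},
\qquad
N = \frac{F}{P_f} \cdot \frac{H}{P} \cdot \frac{W}{P} = \frac{FHW}{P_f P^2}
```

With $F = P_f = 1$ this reduces to the 2D case: $N = HW/P^2$ patches, each of length $P^2 \cdot C$. For example, $F = 16$, $H = W = 128$, $P = 16$, $P_f = 2$, $C = 3$ gives $N = 8 \cdot 8 \cdot 8 = 512$ patches, each flattened to $2 \cdot 16^2 \cdot 3 = 1536$ values.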

"please respond to this ASAP."

Forgive us. We have research of our own to perform and busy work schedules that we also need to address at high priority.

@satwiksunnam19
Author

Thanks @lucidrains @stevenwalton
