Tensors must have same number of dimensions : got 5 and 3 #238

satwiksunnam19 · 2022-09-21T10:05:24Z

Hello @lucidrains @stevenwalton
I have been trying to implement the standard ViT in 3D space. In the patch embedding I changed the Rearrange as follows:
Rearrange('b e (h p1) (w p2) (d p3) -> b (e p1 p2 p3) h w d',p1=patch_size,p2=patch_size,p3=patch_size)
These patch embeddings are then combined with the cls tokens, cls_tokens = repeat(self.cls_token, '() n e -> b n e', b=b), which throws an error because of a dimensionality mismatch. How can I change the shape of cls_tokens to match the dimensionality of the patch embeddings?

Can you help me find a solution to this problem?
Thanks & Regards
Satwik Sunnam


stevenwalton · 2022-09-22T22:33:48Z

CCT only supports 2D images, and I don't think Phil has incorporated 3D ViTs into the library. Luckily they aren't that difficult to write from scratch. I'm not very familiar with voxel-based networks, but a quick Google search turns up some 3D ViT GitHub repositories and a few survey papers, so I would look at those to see how they transform the data.

lucidrains · 2022-10-17T16:21:39Z

@satwiksunnam19 let me know if this helps https://github.com/lucidrains/vit-pytorch#3d-vit

@stevenwalton would be happy to offer a 3D CCT version if you haven't completed it already, Satwik

satwiksunnam19 · 2022-10-18T00:54:47Z


@lucidrains I have completed the ViT-3D model and it's working well. I can add my ViT-3D project to your GitHub repository if you'd like me to.

lucidrains · 2022-10-18T17:01:27Z

@satwiksunnam19 that would be a great contribution! 🙏

lucidrains · 2022-10-29T18:34:03Z

@satwiksunnam19 I went ahead and added it, thank you for reporting that it works well!

0.38.1

satwiksunnam19 · 2022-11-24T21:39:55Z

Hello @lucidrains @stevenwalton
To handle 2D images, we reshape the image $x \in \mathbb{R}^{H \times W \times C}$ into a sequence of flattened 2D patches $x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$, where $(H, W)$ is the resolution of the original image, $C$ is the number of channels, $(P, P)$ is the resolution of each image patch, and $N = HW/P^2$ is the resulting number of patches. This is the case for the 2D model.

Can you change the perspective to the 3D model and explain how the flattened patches and their sizes are determined? I'm confused about this part. Please respond to this ASAP.

stevenwalton · 2022-11-25T22:33:54Z

I think you're having trouble understanding the tokenization process for ViTs in general. CCT isn't that complicated (our work's main motivation is showing how simple changes can do a lot): it simply patches and embeds in a single action (allowing for better embedding) rather than working with specific patches. Compare CCT to ViT. The flattening happens because we're creating a pixel-space domain: Channels x Height x Width -> Channels x Pixels, as in the original ViT paper.
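A minimal sketch of a convolutional tokenizer in the spirit of CCT, assuming a single conv layer followed by ReLU and max pooling. The kernel sizes and the embedding width of 128 are illustrative choices, not the library's defaults. The flatten(2) call is the Channels x Height x Width -> Channels x Pixels step, and the transpose puts the pixel axis first so that each spatial position becomes one token.

```python
import torch
from torch import nn

class ConvTokenizer2D(nn.Module):
    """Simplified CCT-style tokenizer (sketch; layer sizes are assumptions)."""

    def __init__(self, in_channels=3, dim=128):
        super().__init__()
        # Convolution embeds and downsamples in one pass instead of cutting fixed patches
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, dim, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x):
        x = self.conv(x)          # (B, dim, H', W')
        x = x.flatten(2)          # (B, dim, H'*W'): channels x pixels
        return x.transpose(1, 2)  # (B, H'*W', dim): one token per pixel

tokens = ConvTokenizer2D()(torch.randn(2, 3, 32, 32))  # 32x32 pooled to 16x16 -> 256 tokens
```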

If you also look at the 3D code you'll notice Phil only changes a few lines (i.e. Conv2D -> Conv3D and adding frames). The frames are just another dimension in the tensor, and in this case the frames are counted in the pixel space (now frame-pixel space). The process is quite similar if you can just pay attention to these minor changes. Phil took the time to make this code very readable, explicitly specifying which variables are frame related. But you'll also need to update your equation to incorporate the frames and how many you're looking at for a given "patch".

please respond to this ASAP.

Forgive us. We have research of our own to perform and busy work schedules that we also need to address at high priority.

satwiksunnam19 · 2022-11-26T01:14:36Z

Thanks @lucidrains @stevenwalton

lucidrains added a commit that referenced this issue Oct 29, 2022

add a 3d version of cct, addressing #238

61450ae

lucidrains closed this as completed Oct 29, 2022

lucidrains added a commit that referenced this issue Oct 29, 2022

add a 3d version of cct, addressing #238

ad1e6df


lucidrains added a commit that referenced this issue Oct 29, 2022

add a 3d version of cct, addressing #238 0.38.1

cb6d749
