
Some questions about the hyperparameters #15

Closed
qsisi opened this issue Jun 4, 2022 · 7 comments

qsisi commented Jun 4, 2022

Hello! Here are some questions for the code part on dataset 3DMatch.

  1. The dimensionality of the output from the KPConv backbone is set to 528. Theoretically, any number divisible by both 6 and 4 should work here, since in my understanding dividing by 6 is for the rotary positional embedding and dividing by 4 is for the multi-head attention. So why not choose 516 as the output dimensionality? Is there any reason behind this choice?
  2. Your implementation of the rotary positional embedding is impressive. I noticed that you first voxelize the raw 3D coordinates (without flooring the output) in order to scale them, and I believe vol_bnds = [-3.6, -2.4, 1.14] is the minimal coordinate over the whole 3DMatch train/val/test set? Also, does voxelizing (i.e., scaling) the coordinates before positional embedding give better results than using the raw 3D coordinates directly? Could you give some hints about it?

Thank you very much for your help.


rabbityl commented Jun 5, 2022

Hi.

  1. The reason is that we think position codes with the same frequency should fall into the same head of the transformer, so the feature dimension needs to be divisible by 24 (4 heads × 6).
  2. The voxel size controls the starting frequencies of the position code. Lower frequencies lead to smoother signals that reflect long-range distance, and higher frequencies lead to fluctuating signals that reflect short-range distance.
    Using the raw coordinates leads to frequencies that are too high. Fig. 2 in this paper is intuitive: https://arxiv.org/pdf/2104.06405.pdf
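
The divisibility constraint above can be sketched in code. This is a minimal, self-contained 3D rotary embedding, not the repository's actual implementation: the function name `rotary_3d` and the frequency schedule are illustrative assumptions. Each axis (x, y, z) gets d/3 channels, consumed in pairs, hence d must be divisible by 6; grouping per attention head adds the factor of 4.

```python
import torch

def rotary_3d(feats, coords, base=10000.0):
    """Illustrative 3D rotary positional embedding (not the repo's exact code).

    feats:  (N, d) features, d divisible by 6
    coords: (N, 3) point coordinates (already scaled, e.g. by voxel size)
    """
    n, d = feats.shape
    assert d % 6 == 0, "two channels per frequency per axis (x, y, z)"
    d_axis = d // 3                                                # channels per axis
    freqs = 1.0 / (base ** (torch.arange(0, d_axis, 2) / d_axis))  # (d_axis/2,)

    out = []
    for axis in range(3):
        f = feats[:, axis * d_axis:(axis + 1) * d_axis]
        ang = coords[:, axis:axis + 1] * freqs                     # (N, d_axis/2)
        cos, sin = ang.cos(), ang.sin()
        f1, f2 = f[:, 0::2], f[:, 1::2]                            # interleaved pairs
        # 2-D rotation of each channel pair by its frequency-scaled coordinate
        out.append(torch.stack([f1 * cos - f2 * sin,
                                f1 * sin + f2 * cos], dim=-1).flatten(1))
    return torch.cat(out, dim=-1)
```

A useful property to note: since every channel pair is rotated by the same angle for a given position, dot products between two features encoded at the same position are unchanged, and dot products across positions depend only on the coordinate difference.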

rabbityl closed this as completed Jun 5, 2022

qsisi commented Jun 5, 2022

Thank you for your prompt reply.

Sorry to check again: you mean that by applying voxelization, the voxelized coordinates lead to a higher frequency, resulting in smoother signals, right?

Also, does vol_bnds = [-3.6, -2.4, 1.14] denote the minimal coordinates over the whole 3DMatch train/val/test set?

Thanks.


rabbityl commented Jun 5, 2022

Sorry, I wrote it the wrong way. Voxelization with 0.04 m leads to lower frequencies and smoother signals.

Yes, it is the min coordinate.
The vol_bnds is meant to cancel global translation. This is not necessary for rotary positional encoding, since it always reveals relative distance, but it could affect absolute encodings such as the sinusoidal one.
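
The normalization described here can be sketched as follows. The `vol_bnds` value is the one quoted in this thread, and the voxel size is one of the values discussed; the function name `normalize_coords` is illustrative, not the repo's API.

```python
import numpy as np

# Values taken from this thread (vol_bnds is arbitrary for a relative code).
vol_bnds = np.array([-3.6, -2.4, 1.14])
voxel_size = 0.04

def normalize_coords(xyz):
    """Cancel global translation and rescale before positional encoding.

    Subtracting the lower bound only matters for absolute encodings
    (e.g. sinusoidal); dividing by the voxel size changes the effective
    frequencies of the position code.
    """
    return (np.asarray(xyz) - vol_bnds) / voxel_size
```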


qsisi commented Jun 5, 2022

Thanks.

Now I get your point: by voxelizing, raw coordinates such as [0.3900, 0.9669, 0.7839] are scaled to [0.3900, 0.9669, 0.7839] / 0.08 = [4.875, 12.08625, 9.79875], which has a lower frequency, leading to smoother signals.

Also, the voxel_size setting for 3DMatch seems to be 0.08 m instead of 0.04 m?


qsisi commented Jun 28, 2022

Sorry to bother you again. May I ask how you obtained the exact vol_bnds = [-3.6, -2.4, 1.14] for 3DMatch? When I iterate through the train/val/test sets of 3DMatch (provided by PREDATOR), their min coordinates turn out to be [-1.5, -1.5, 0.5]. Could you give some hints about it?


Yang-L1 commented Jun 28, 2022

I remember [-3.6, -2.4, 1.14] was from 4DMatch.
Our positional encoding is relative, i.e., changing the starting point does not affect the position encoding.
Therefore, vol_bnds is not a crucial parameter; you can use any number for it.
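
A quick way to see why the choice of vol_bnds is irrelevant for a rotary code: for a single rotated channel pair, the attention-style dot product depends only on the coordinate difference, so a global shift (i.e., a different vol_bnds) cancels out. This is an illustrative check, not code from the repo; the names `rot` and `score` are made up for the sketch.

```python
import math

def rot(vec, angle):
    # rotate a 2-D channel pair by `angle` (one rotary frequency, 1-D position)
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (x * c - y * s, x * s + y * c)

def score(f, g, pf, pg, shift=0.0):
    # dot product of rotary-encoded features; `shift` mimics moving vol_bnds
    rf = rot(f, pf + shift)
    rg = rot(g, pg + shift)
    return rf[0] * rg[0] + rf[1] * rg[1]
```

For any `shift`, `score(f, g, pf, pg, shift)` equals `score(f, g, pf, pg, 0.0)`, because the two rotations compose into one by the angle difference `pg - pf`.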


qsisi commented Jun 28, 2022

Thanks for your reply. Indeed, those boundaries are not crucial for relative positional encoding. I'm currently trying a sparse convolution library that needs the min coordinates over all input point clouds, so the mismatch between the min coordinates I computed and the ones in Lepard confused me. Now it makes sense. Anyway, thanks for your reply.
