Reflexion for improvement #42

Closed · tchaton opened this issue Nov 13, 2019 · 9 comments

tchaton commented Nov 13, 2019

Dear @HuguesTHOMAS,

I was wondering if you were going to keep working on improving the model.

I think KPConv could be improved in at least three ways.

  1. The convolution definition doesn't use the distance to the central point in its formula.

[screenshot: the KPConv convolution formula]

It could have been (a): g(y_i) = sum_k( d(||y_i|| / R) * h(y_i, x_k) * W_k ) or (b): g(y_i) = sum_k( d(y_i) * h(y_i, x_k) * W_k ), writing d for the extra modulation function (a rough sketch is given at the end of this comment).
(a): d takes a value between 0 and 1 (the normalized distance to the center) and performs a guidance on it.
(b): d takes y_i directly and does the same.

It would be similar to the following, by analogy with image convolutions.
[screenshot: pixel adaptive convolution, where the kernel K depends on the guidance feature f(0,0)]

  2. Sigma could be made variable, or the density of each point could be used, as in the following:

[screenshots of the referenced density-based formulations]
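
For concreteness, here is a rough NumPy sketch of what modification 1 could look like (my own illustrative code with hypothetical names, not the KPConv implementation): the kernel-point correlation h(y_i, x_k) is optionally multiplied by an extra modulation d that depends on the distance of each neighbor to the center.

```python
import numpy as np

def kpconv_like(neighbor_offsets, neighbor_feats, kernel_points, kernel_weights,
                sigma, radius, use_center_modulation=True):
    """KPConv-style kernel for one center point, with an optional distance-to-center term.

    neighbor_offsets: (N, 3) offsets y_i of the neighbors to the center point
    neighbor_feats:   (N, C_in) input features f_i
    kernel_points:    (K, 3) positions x_k of the kernel points
    kernel_weights:   (K, C_in, C_out) weight matrices W_k
    """
    # Linear correlation h(y_i, x_k) = max(0, 1 - ||y_i - x_k|| / sigma)
    dists = np.linalg.norm(neighbor_offsets[:, None, :] - kernel_points[None, :, :], axis=-1)
    h = np.maximum(0.0, 1.0 - dists / sigma)                          # (N, K)

    if use_center_modulation:
        # Proposed extra term d(||y_i|| / R): here a simple linear decay (one possible choice)
        d = 1.0 - np.linalg.norm(neighbor_offsets, axis=-1) / radius  # (N,)
        h = h * np.clip(d, 0.0, 1.0)[:, None]

    # Sum over kernel points, then aggregate the neighbor features
    per_neighbor = np.einsum('nk,kio->nio', h, kernel_weights)        # (N, C_in, C_out)
    return np.einsum('ni,nio->o', neighbor_feats, per_neighbor)       # (C_out,)
```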

HuguesTHOMAS (Owner) commented:

Hi @tchaton,

Thank you for your interest in KPConv. I agree with you that the model could still be improved. However, the modifications you propose would not be beneficial in my opinion.

  1. Adding a term depending on the distance to the center?

As one of the kernel points is already forced to be at the center, we already have a weight that depends on this distance. Now, if I look at "pixel adaptive convolution", I believe they add a dependency on the feature located at the center, not on the distance to the center. In the figure you used, K depends on f(0,0) and not on (0,0). So the equation you propose is totally different from the example in the image you gave.

  2. Sigma variable / depending on the input density?

Using a variable sigma does not make sense in our convolution definition. Here is why: the input point cloud is subsampled so that its density is uniform. For example, let's imagine all input points are spaced by approximately 1 cm. Then sigma is also chosen as 1 cm or a bit more, so that the weights of each kernel point are applied only to a few input points, ideally just one. This gives us the smallest kernel possible, like the 3x3 pixel kernels in images. With this kind of kernel, any kind of normalization (like the Monte Carlo one) is useless.
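
To make the subsampling argument concrete, here is a minimal grid subsampling sketch (a simplified version for illustration, not the optimized routine used in the repo): all points falling in the same cell of size dl are averaged, which caps the density at roughly one point per cell.

```python
import numpy as np

def grid_subsample(points, dl):
    """Average all points that fall into the same voxel of size dl (simplified sketch)."""
    voxels = np.floor(points / dl).astype(np.int64)    # integer cell coordinates
    _, inverse, counts = np.unique(voxels, axis=0, return_inverse=True, return_counts=True)
    sums = np.zeros((counts.shape[0], points.shape[1]))
    np.add.at(sums, inverse, points)                   # accumulate the points of each cell
    return sums / counts[:, None]                      # one averaged point per cell
```

With sigma chosen as dl (or a bit more), each kernel point then influences only a few of the subsampled points.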

Now, that being said, I do not think our work is perfect, and there are many ways to look for improvement. If you want my opinion, I think one of the most promising directions is the relation between feature aggregation and geometric encoding. Any point convolution can be fed features or a constant value. If you feed a constant value, the convolution is going to encode the shape of the input; if you feed features, the convolution is going to aggregate them into a more complex feature. In a standard network, the first layer does geometric encoding and the next layers do feature aggregation. But you could imagine ways to improve that.
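
A tiny self-contained sketch of this distinction (the point_conv function and its toy kernel are hypothetical, not the actual code): feeding a constant value makes the output depend only on the local geometry, while feeding real features mixes geometry and feature content.

```python
import numpy as np

def point_conv(neighbor_offsets, neighbor_feats, kernel_fn):
    """Generic point convolution: per-neighbor weights from geometry, applied to features."""
    weights = kernel_fn(neighbor_offsets)                    # (N, C_in, C_out), geometry only
    return np.einsum('ni,nio->o', neighbor_feats, weights)

rng = np.random.default_rng(0)
offsets = rng.normal(scale=0.05, size=(32, 3))
# Toy kernel: weights derived only from the neighbor offsets
kernel_fn = lambda y: np.tile(np.linalg.norm(y, axis=1)[:, None, None], (1, 1, 8))

# Geometric encoding: constant input features, output encodes only the neighborhood shape
geometric = point_conv(offsets, np.ones((32, 1)), kernel_fn)

# Feature aggregation: real input features, output combines geometry and features
aggregated = point_conv(offsets, rng.normal(size=(32, 1)), kernel_fn)
```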

Best,
Hugues

XuyangBai commented:

Hi @HuguesTHOMAS

Great explanation! I have some ideas on modification 2 (sorry if my example is not suitable). For CAD datasets (like ModelNet40) or very dense indoor scenes (S3DIS, Semantic3D, etc.), it is true that the density is uniform if you use a grid subsampling strategy. But if KPConv is used for outdoor scenes like KITTI, there are still density variations even under grid subsampling, because the regions far away from the LiDAR center are very sparse. Besides, I think that without normalization over the number of neighbouring points in the local neighbourhood, KPConv might not be able to be directly trained and tested with a different first_subsampling_dl.
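
For clarity, the normalization over the number of neighboring points mentioned above could look like this (illustrative code only, not from the repo):

```python
import numpy as np

def normalize_by_neighbor_count(summed_messages, neighbor_counts):
    """Divide each point's summed contributions by its number of radius neighbors.

    summed_messages: (P, C) features summed over each point's neighborhood
    neighbor_counts: (P,)  number of neighbors found within the radius
    """
    return summed_messages / np.maximum(neighbor_counts, 1)[:, None]
```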

Best,
Xuyang.

tchaton commented Nov 27, 2019

@HuguesTHOMAS, thanks for your answer.

  1. Density. You might be right, but I am so amazed by the ability of t-SNE/UMAP to perform dimensionality reduction that having a fixed sigma seemed weird to me. I was thinking we could use their definition of the local density, or even conditional probabilities. Lots of things could be tried. For example: perform the radius search using an adaptive standard deviation from the points to the kernel weights, so that the weights are defined locally for every N neighboring points of the center. We could perform nested Relation-Shape encoding: once to create the weights, with a message from the kernel points to the neighboring points, and a second time to project them onto the center of the sphere. Or the weights for the message passing could use this density. Things to brainstorm there (see the sketch after this list).


  2. I have been working on ECC a lot. The best convolution I got was from performing attention first between weights and features, then using the message transformed through an MLP to perform either a Hadamard product (similar to Relation-Shape without attention) or a matrix product. It got me 4 points up in mIoU on S3DIS.
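
A rough sketch of the adaptive-sigma idea from point 1 (hypothetical code, loosely inspired by how t-SNE calibrates its local bandwidths): take a per-point sigma from the distance to the k-th nearest neighbor.

```python
import numpy as np
from scipy.spatial import cKDTree

def adaptive_sigma(points, k=16, scale=1.0):
    """Per-point bandwidth taken from the distance to the k-th nearest neighbor."""
    tree = cKDTree(points)
    # query k+1 neighbors because the closest neighbor of each point is itself
    dists, _ = tree.query(points, k=k + 1)
    return scale * dists[:, -1]                     # (P,) one sigma per point
```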

tchaton closed this as completed Nov 27, 2019
tchaton reopened this Nov 27, 2019

tchaton commented Nov 27, 2019

@HuguesTHOMAS, one last thing: it would be great to export a much cleaner version of KPConv to PyTorch. It would be amazing to have it implemented within PyTorch Geometric. Do you have any plans to do so? Or would you like to participate in its implementation?

NB: I don't think we have good enough models for production right now. Getting around 75-80 mIoU would allow automating so many tasks. The current API of KPConv makes that hard, especially with the TensorFlow / CUDA troubles, and therefore improvements are limited. I think it is the best model out there and it deserves a lean implementation for everyone to use.
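
To give the PyTorch discussion a concrete starting point, here is a very small sketch of the rigid KPConv operator in plain PyTorch (my own rough port under simplifying assumptions; it ignores all the neighbor-search and batching tricks of the real implementation):

```python
import torch
import torch.nn as nn

class RigidKPConv(nn.Module):
    """Minimal rigid KPConv: correlate neighbors with fixed kernel points, then sum."""

    def __init__(self, in_channels, out_channels, kernel_points, sigma):
        super().__init__()
        # kernel_points: (K, 3) fixed kernel point positions around the origin
        self.register_buffer("kernel_points", kernel_points)
        k = kernel_points.shape[0]
        self.weights = nn.Parameter(0.1 * torch.randn(k, in_channels, out_channels))
        self.sigma = sigma

    def forward(self, neighbor_offsets, neighbor_feats):
        # neighbor_offsets: (B, N, 3) offsets of the neighbors to each center point
        # neighbor_feats:   (B, N, C_in) features of those neighbors
        kp = self.kernel_points.unsqueeze(0).expand(neighbor_offsets.shape[0], -1, -1)
        dists = torch.cdist(neighbor_offsets, kp)                      # (B, N, K)
        h = torch.clamp(1.0 - dists / self.sigma, min=0.0)             # linear correlation
        weighted = torch.einsum("bnk,kio->bnio", h, self.weights)      # (B, N, C_in, C_out)
        return torch.einsum("bni,bnio->bo", neighbor_feats, weighted)  # (B, C_out)
```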

HuguesTHOMAS (Owner) commented:

Hi guys, and thanks for your interest in this work,

I will try to answer all your questions.

HuguesTHOMAS commented Nov 28, 2019

To @XuyangBai:
You are right; when dealing with varying densities, we have to take two problems into account:

  • High densities: too many points mean long computations
  • Low densities: too few points mean no information

In my answer, I only explained how the subsampling deals with the first problem. Now, what about the second problem? Let's take the example of the image below (from my first article).

[figure from the first article: the same scene with neighborhoods at increasing scales, including a sparse neighborhood C at the first scale]

We can see that the density in neighborhood C is too low at the first scale. With only one or two points, you cannot get any information on the scene geometry. If you add normalization, it won't change anything, because the information is simply not there; you need to look at bigger scales. Because our network has multiple layers with increasing scales, the geometry of the sparser regions is still going to be detected by the network, just not in the first layer.

In practice, I have two examples showing that our method is very robust to density variations:

  • On Semantic3D (fixed laser scans with very sparse points far away from the laser), we have this problem and our performance is still very strong.
  • On SemanticKITTI (the example you took), we have the same problem and we still rank among the first methods on their benchmark.

Finally, training and testing with different first_subsampling_dl values is not a problem, since the ratio between the convolution radius and the grid size is kept the same. I don't think adding normalization would help, as the grid already normalizes the data for a given radius.
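
A small illustration of that fixed ratio (the parameter names and the 2.5 factor are examples, not the actual config values): the convolution radius of every layer is a constant multiple of that layer's grid size, and each layer doubles the grid size, so changing first_subsampling_dl only rescales everything globally.

```python
def layer_radii(first_subsampling_dl, radius_ratio=2.5, num_layers=5):
    """Convolution radius per layer, always radius_ratio times that layer's grid size."""
    return [radius_ratio * first_subsampling_dl * (2 ** layer) for layer in range(num_layers)]

# layer_radii(0.04) and layer_radii(0.08) are identical up to a global factor of 2,
# which is why training and testing with different first_subsampling_dl still works.
```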

HuguesTHOMAS (Owner) commented:

To @tchaton,

I agree with you, a lot could be tried to perform convolutions on point clouds with adaptive neighborhoods. The ideas you describe are similar to the convolutions described in Dynamic Graph CNN for Learning on Point Clouds. They let the network learn its own neighborhood relations, which helps capture long-range relationships between similar patterns in the point cloud. Using dimensionality reduction techniques to help such a network learn the best possible relations would definitely constitute a strong basis for an article. Furthermore, such a high-level operation might need to be placed in the last layers of the network, like deformable KPConv. As you said, many things to brainstorm here.
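
For reference, the core of the dynamic-neighborhood idea in DGCNN can be summarized in a few lines (a simplified sketch, not the paper's code): the k-NN graph is recomputed in feature space at every layer, so "neighbors" can be far apart in 3D but similar in feature content.

```python
import torch

def knn_feature_graph(features, k=20):
    """Indices of the k nearest neighbors of each point in feature space (DGCNN-style).

    features: (N, C) per-point features at the current layer.
    Returns:  (N, k) neighbor indices, recomputed at every layer.
    """
    pairwise = torch.cdist(features, features)                 # (N, N) feature distances
    # take the k+1 closest and drop the first column (each point is its own nearest neighbor)
    return pairwise.topk(k + 1, largest=False).indices[:, 1:]
```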

In the case of KPConv, the whole point of the method is to understand that geometric relationships need to be used carefully if you want spatially consistent convolutions and a "standard way" to convolve point clouds. Adding such an enhancement, even with kernel points like KPConv, would just make it a different method in my opinion, which would deserve a different name, because it is not a minor adjustment but a very big theoretical difference. I can only encourage you to look in such a direction and would be very glad to help, and even collaborate if you want to.

This leads me to your second message. I totally agree, a clean version of KPConv in PyTorch would be great. Unfortunately, I have never used PyTorch and don't know its subtleties, and the code of KPConv needs a lot of implementation tricks to run efficiently. I would be happy to participate in its implementation, but without help, I won't have the time to do it.

Best,
Hugues

tchaton commented Nov 28, 2019

@HuguesTHOMAS, I believe I added you on Facebook, so we can talk easily about all of that.

XuyangBai commented:

@HuguesTHOMAS Thanks a lot. The problem brought by varying densities is a really interesting point, and your explanation makes sense to me! I will take a look at your first paper and think about it.

Best,
Xuyang.
