PSTNET on SemanticKitti/nuScenes dataset #2

Closed
sandeepnmenon opened this issue May 26, 2021 · 6 comments

@sandeepnmenon

Great work on the Point Tubes. I am particularly interested in the 4D semantic segmentation applications.
I was wondering whether you have tried PSTNet on benchmark datasets such as SemanticKITTI or nuScenes.
Their point cloud sequences are much sparser than those in the SYNTHIA dataset.

Thank you

@hehefan
Owner

hehefan commented Jun 3, 2021

Hi @sandeepnmenon,

Apologies for my late response, and thanks for your suggestions. However, we currently have no plans to apply our method to the SemanticKITTI or nuScenes datasets.

Our method focuses on spatio-temporal modeling, and especially on temporal modeling. There are already many excellent static point cloud approaches, so we would like to pay attention to different temporal structures. Whether a point cloud is sparse or dense is mainly a spatial modeling problem.

PSTNet is actually a prototype that models point cloud sequences/videos in a decomposition manner. The spatial modeling method can be directly replaced with other static point cloud approaches which are good at sparse point cloud modelling. We may try these two datasets in the future with different PSTNet variants or extensions.

Thank you.
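
The decomposition described above can be illustrated with a minimal NumPy sketch, under stated assumptions. `toy_spatial_encoder` is a hypothetical stand-in for any static point cloud model, and `temporal_conv` is a plain 1D convolution over per-frame features, not PSTNet's actual layer. The sketch also uses one global descriptor per frame for brevity, whereas PSTConv works on local regions. None of the names, shapes, or weights come from the repository.

```python
import numpy as np

def toy_spatial_encoder(frame):
    """Hypothetical static point cloud model: (N, 3) points -> (6,) descriptor."""
    # Any per-frame model could be plugged in here instead.
    return np.concatenate([frame.mean(axis=0), frame.max(axis=0)])

def temporal_conv(per_frame, weights):
    """1D convolution over time.

    per_frame: (T, C_in) features, one row per frame.
    weights:   (K, C_in, C_out), K odd, one matrix per temporal offset.
    Frames outside the sequence are skipped (equivalent to zero padding).
    """
    T = per_frame.shape[0]
    K, _, c_out = weights.shape
    radius = K // 2
    out = np.zeros((T, c_out))
    for t in range(T):
        for k in range(K):
            s = t + k - radius
            if 0 <= s < T:
                out[t] += per_frame[s] @ weights[k]
    return out

def encode_sequence(frames, spatial_encoder, weights):
    """Spatial step per frame, then temporal step across frames."""
    per_frame = np.stack([spatial_encoder(f) for f in frames])  # (T, C_in)
    return temporal_conv(per_frame, weights)                    # (T, C_out)

rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 16, 3))    # 4 frames, 16 points each
weights = rng.normal(size=(3, 6, 8))    # temporal kernel size 3
features = encode_sequence(frames, toy_spatial_encoder, weights)
print(features.shape)  # (4, 8)
```

Because `encode_sequence` only requires that `spatial_encoder` maps one frame to a feature vector, a model better suited to sparse LiDAR frames could be substituted without touching the temporal step.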

@sandeepnmenon
Author

Thank you @hehefan for the insights. I would like to try sequence classification with those two datasets using PSTConv.
In the given model for sequence classification


I see all layers are PSTConv.
As per your suggestion

The spatial modeling method can be directly replaced with other static point cloud approaches which are good at sparse point cloud modelling.

I would like to use my static point cloud model along with the temporal modeling described in the paper. Which part of the MSRAction model is the spatial modeling method?
Can you give a small example of how PSTConv can be added to a static point cloud classification pipeline so that it models temporal information?

Thank you

@hehefan
Owner

hehefan commented Jun 11, 2021

Hi @sandeepnmenon,

You might want to modify the following section
https://github.com/hehefan/Point-Spatio-Temporal-Convolution/blob/main/modules/pst_convolutions.py#L168-L193
This section searches for neighbours and then encodes the spatial local structure. You can replace the encoding logic with your own code.

Best regards.
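
As a rough illustration of "search neighbours, then encode the local structure", here is a minimal NumPy sketch. It does not reproduce the linked repository code: a brute-force radius search is swapped in for whatever grouping routine the repository uses, and `group_neighbors`, `encode_local`, and `displacement_with_distance` are hypothetical names. The point is only that the encoding is passed in as `encode_fn`, so it can be replaced.

```python
import numpy as np

def group_neighbors(points, anchors, radius, k):
    """Return (M, k) indices of up to k points within `radius` of each anchor."""
    d2 = ((anchors[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (M, N)
    idx = np.empty((len(anchors), k), dtype=int)
    for m in range(len(anchors)):
        inside = np.flatnonzero(d2[m] <= radius ** 2)
        if inside.size == 0:
            # Assumption: fall back to the single nearest point.
            inside = np.array([d2[m].argmin()])
        # Pad short groups by repeating the first neighbor found.
        pad = np.full(max(0, k - inside.size), inside[0])
        idx[m] = np.concatenate([inside[:k], pad])
    return idx

def displacement_with_distance(displacement):
    """Hypothetical encoding: (M, k, 3) offsets -> (M, k, 4) offsets plus norm."""
    dist = np.linalg.norm(displacement, axis=-1, keepdims=True)
    return np.concatenate([displacement, dist], axis=-1)

def encode_local(points, anchors, idx, encode_fn):
    """Encode each anchor's neighborhood and max-pool over its neighbors."""
    neighbors = points[idx]                         # (M, k, 3)
    displacement = neighbors - anchors[:, None, :]  # offsets relative to anchor
    encoded = encode_fn(displacement)               # (M, k, C), swappable part
    return encoded.max(axis=1)                      # (M, C)

rng = np.random.default_rng(0)
points = rng.uniform(size=(64, 3))
anchors = points[:8]
idx = group_neighbors(points, anchors, radius=0.5, k=4)
local = encode_local(points, anchors, idx, displacement_with_distance)
print(local.shape)  # (8, 4)
```

Replacing `displacement_with_distance` with a learned per-neighbor network would change the encoding while leaving the neighbor search untouched.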

@sandeepnmenon
Author

Hi @hehefan
The paper mentions an experiment on 4D semantic segmentation. Will that model be released?
Since my semantic segmentation spatial model also follows a UNet architecture, I am not sure how to incorporate just its spatial features into the PSTConv code mentioned above.

@hehefan
Owner

hehefan commented Jun 19, 2021

Hi @sandeepnmenon,

PSTConv is a basic module to capture the spatio-temporal local structure for point cloud sequences or videos. It is independent of the specific PSTNet architectures for 3D action recognition or 4D semantic segmentation. For the segmentation architecture, please refer to point_segmentation.py. This architecture may provide insights into how to build UNet-style frameworks.

BTW, for segmentation, the transformer-based network ("Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos") seems to work better than the point spatio-temporal convolution. This is probably because convolution is rigid at the edges or borders of objects, while a transformer is flexible.

Best regards.
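
For readers unfamiliar with UNet-style point networks, the decoder side is often built from feature interpolation plus skip connections. The sketch below shows one common choice, PointNet++-style inverse-distance interpolation over the 3 nearest coarse points. Whether `point_segmentation.py` does exactly this is not confirmed here, and `interpolate_features` and `decoder_step` are hypothetical names.

```python
import numpy as np

def interpolate_features(coarse_xyz, coarse_feat, fine_xyz, k=3, eps=1e-8):
    """Upsample features from M coarse points to N fine points.

    Each fine point takes an inverse-squared-distance weighted average of
    the features of its k nearest coarse points.
    """
    d2 = ((fine_xyz[:, None, :] - coarse_xyz[None, :, :]) ** 2).sum(-1)  # (N, M)
    nn = np.argsort(d2, axis=1)[:, :k]                # (N, k) nearest coarse ids
    nn_d2 = np.take_along_axis(d2, nn, axis=1)        # (N, k)
    w = 1.0 / (nn_d2 + eps)
    w /= w.sum(axis=1, keepdims=True)                 # weights sum to 1
    return (coarse_feat[nn] * w[..., None]).sum(axis=1)  # (N, C)

def decoder_step(coarse_xyz, coarse_feat, fine_xyz, skip_feat):
    """One UNet-style decoder step: upsample, then concatenate the skip features."""
    up = interpolate_features(coarse_xyz, coarse_feat, fine_xyz)
    return np.concatenate([up, skip_feat], axis=1)

rng = np.random.default_rng(0)
fine_xyz = rng.uniform(size=(32, 3))
coarse_xyz = fine_xyz[::4]                  # 8 subsampled points
coarse_feat = rng.normal(size=(8, 5))       # encoder output at coarse level
skip_feat = rng.normal(size=(32, 2))        # encoder features at fine level
fused = decoder_step(coarse_xyz, coarse_feat, fine_xyz, skip_feat)
print(fused.shape)  # (32, 7)
```

In a sequence model, a step like this would typically be applied per frame after the spatio-temporal encoder, so that every input point receives a feature for per-point classification.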

@sandeepnmenon
Author

Thank you @hehefan. The code and that paper are really helpful.
Closing this issue.

PS: Is it possible to release the code for 4D semantic segmentation using the P4Transformer? I started a thread in that repo (hehefan/P4Transformer#4).

Thank you again.
