PSTNET on SemanticKitti/nuScenes dataset #2
Comments
Hi @sandeepnmenon, apologies for my late response, and thanks for your suggestions. However, we currently do not have a plan to apply our method to the Semantic Kitti or nuScenes datasets. Our method focuses on spatio-temporal modeling, and especially on temporal modeling. There are already many excellent static point cloud approaches, so we would like to pay attention to different temporal structures. Sparsity versus density is mainly a spatial-modeling problem. PSTNet is essentially a prototype that models point cloud sequences/videos in a decomposed manner: the spatial modeling component can be directly replaced with other static point cloud approaches that are good at sparse point cloud modeling. We may try these two datasets in the future with different PSTNet variants or extensions. Thank you.
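For intuition, the decomposition described above (per-frame spatial modeling followed by temporal modeling across frames) can be sketched roughly as below. This is a toy illustration, not the repo's API: all function names are made up, the spatial encoder is a trivial k-NN average standing in for any static point cloud backbone, and the example assumes points correspond across frames, which real point cloud videos do not guarantee.

```python
import numpy as np

def spatial_encode(frame_points, k=4):
    """Toy spatial encoder: mean over the k nearest neighbors of each point.
    frame_points: (N, 3). Stands in for any static backbone (PointNet++, sparse conv, ...)."""
    diffs = frame_points[:, None, :] - frame_points[None, :, :]   # (N, N, 3)
    dists = np.linalg.norm(diffs, axis=-1)                        # pairwise distances
    idx = np.argsort(dists, axis=1)[:, :k]                        # k nearest (incl. self)
    return frame_points[idx].mean(axis=1)                         # (N, 3) features

def temporal_conv(features, kernel):
    """Temporal convolution over per-frame features, 'same' zero padding in time.
    features: (T, N, C); kernel: (Kt,) scalar taps shared across points/channels."""
    T, N, C = features.shape
    pad = len(kernel) // 2
    padded = np.concatenate([np.zeros((pad, N, C)), features, np.zeros((pad, N, C))])
    out = np.zeros_like(features)
    for t in range(T):
        for j, w in enumerate(kernel):
            out[t] += w * padded[t + j]
    return out

# A 5-frame sequence of 16 points each.
seq = np.random.default_rng(0).normal(size=(5, 16, 3))
spatial = np.stack([spatial_encode(f) for f in seq])               # spatial modeling per frame
st_features = temporal_conv(spatial, np.array([0.25, 0.5, 0.25]))  # temporal modeling
print(st_features.shape)  # (5, 16, 3)
```

Because the two stages are decoupled, swapping `spatial_encode` for a sparse-point-cloud backbone leaves the temporal stage untouched, which is the replaceability point made above.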
Thank you @hehefan for the insights. I would like to try sequence classification on those two datasets using PSTConv. I see that all layers are PSTConv. As per your suggestion, I would like to use my static point cloud model along with the temporal modeling described in this paper. Which part of the MSRAction model is the spatial modeling method? Thank you
Hi @sandeepnmenon, you might want to modify the following section. Best regards.
Hi @hehefan
Hi @sandeepnmenon, PSTConv is a basic module that captures the spatio-temporal local structure of point cloud sequences or videos. It is independent of the specific PSTNet architectures for 3D action recognition or 4D semantic segmentation. For the segmentation architecture, please refer to point_segmentation.py. This architecture may provide insight into how to build UNet-style frameworks. BTW, for segmentation, the transformer-based network ("Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos") seems to work better than point spatio-temporal convolution. This is probably because convolution is rigid at the edges or borders of objects, while the transformer is flexible. Best regards.
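The UNet-style encode/decode pattern mentioned above can be caricatured in 1-D: downsample the point set, process at the coarse level, then interpolate features back to the dense points and fuse with a skip connection. This is purely illustrative and is not the code in point_segmentation.py; every name here is hypothetical, and nearest-neighbor interpolation stands in for the distance-weighted interpolation real pipelines use.

```python
import numpy as np

def downsample(points, feats, stride=2):
    """Keep every `stride`-th point (toy stand-in for farthest point sampling)."""
    return points[::stride], feats[::stride]

def upsample(points_sparse, feats_sparse, points_dense):
    """Nearest-neighbor feature interpolation back onto the dense point set."""
    d = np.abs(points_dense[:, None] - points_sparse[None, :])  # (dense, sparse) distances
    nearest = d.argmin(axis=1)
    return feats_sparse[nearest]

points = np.linspace(0.0, 1.0, 8)   # 8 "points" on a line
feats = np.arange(8, dtype=float)   # one feature per point

p2, f2 = downsample(points, feats)  # encoder: 8 -> 4 points
f2 = f2 * 2.0                       # stand-in for convolution at the coarse level
up = upsample(p2, f2, points)       # decoder: interpolate back to 8 points
fused = np.concatenate([up[:, None], feats[:, None]], axis=1)  # skip connection
print(fused.shape)  # (8, 2)
```

The same encode/decode shape carries over to point cloud sequences; the segmentation architecture additionally has to propagate features through time, which is where PSTConv (or the P4Transformer attention) comes in.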
Thank you @hehefan. The code and that paper are really helpful. PS: Is it possible to release the code for 4D semantic segmentation using the P4Transformer? I started a thread in that repo (hehefan/P4Transformer#4). Thank you again.
Great work on the Point Tubes. I am particularly interested in the 4D semantic segmentation applications.
I was wondering whether you have tried PSTNet on benchmark datasets such as Semantic Kitti or nuScenes.
These point cloud sequences are much sparser than the SYNTHIA dataset.
Thank you