
question about the network architecture for set transformer #3

Closed
amiltonwong opened this issue Feb 16, 2020 · 3 comments

Comments

@amiltonwong

amiltonwong commented Feb 16, 2020

Hi, @yoonholee ,

Thanks a lot for adding the code for the point cloud part. Looking into the network, I see that no SAB modules are included in the decoder. Is that because appending SAB modules would increase the time complexity, even though they could enhance the expressiveness of the representations? It seems that doing so would improve classification accuracy. Have you performed related experiments?

THX!

@yoonholee
Collaborator

I tried architectures made of various combinations of [SAB, ISAB, PMA, fc, dropout], including using SAB in the decoder, for this task when we were writing the paper. Using no SAB in the decoder worked best in my experiments.

My best guess for why this simple decoder worked best here is that classification is such a simple task that we don't need to model many high-order interactions to achieve good accuracy. Set Transformers seem to be most useful in tasks with complex interactions among items, such as clustering.
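
For concreteness, here is a minimal PyTorch sketch of the kind of architecture being described: an ISAB encoder and a decoder that is just PMA followed by a fully-connected layer, with no SAB. The block implementations below are simplified stand-ins for the paper's MAB/SAB/ISAB/PMA (a single hidden width, `nn.MultiheadAttention` inside), and the dimensions, head counts, and inducing-point counts are illustrative assumptions, not the repository's exact settings.

```python
import torch
import torch.nn as nn


class MAB(nn.Module):
    """Multihead Attention Block: H = LN(X + Att(X, Y, Y)); MAB(X, Y) = LN(H + rFF(H))."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ln0 = nn.LayerNorm(dim)
        self.ln1 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, X, Y):
        H = self.ln0(X + self.attn(X, Y, Y, need_weights=False)[0])
        return self.ln1(H + self.ff(H))


class SAB(nn.Module):
    """Set Attention Block: full self-attention over the set, SAB(X) = MAB(X, X).
    Defined for reference; the decoder discussed in this thread does not use it."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.mab = MAB(dim, num_heads)

    def forward(self, X):
        return self.mab(X, X)


class ISAB(nn.Module):
    """Induced SAB: route attention through m learned inducing points,
    so attention cost is O(n*m) in the set size n instead of O(n^2)."""
    def __init__(self, dim, num_heads, num_inds):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inds, dim) * 0.02)
        self.mab0 = MAB(dim, num_heads)  # inducing points attend to the set
        self.mab1 = MAB(dim, num_heads)  # the set attends back to the summary

    def forward(self, X):
        H = self.mab0(self.inducing.expand(X.size(0), -1, -1), X)
        return self.mab1(X, H)


class PMA(nn.Module):
    """Pooling by Multihead Attention: k learned seed vectors pool the set to k outputs."""
    def __init__(self, dim, num_heads, num_seeds=1):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(1, num_seeds, dim) * 0.02)
        self.mab = MAB(dim, num_heads)

    def forward(self, X):
        return self.mab(self.seeds.expand(X.size(0), -1, -1), X)


class SetClassifier(nn.Module):
    """Point-cloud-style classifier: ISAB encoder, PMA + fc decoder (no SAB in the decoder)."""
    def __init__(self, dim_input=3, dim_hidden=128, num_heads=4,
                 num_inds=32, num_classes=40):
        super().__init__()
        self.embed = nn.Linear(dim_input, dim_hidden)
        self.encoder = nn.Sequential(
            ISAB(dim_hidden, num_heads, num_inds),
            ISAB(dim_hidden, num_heads, num_inds),
        )
        self.decoder = nn.Sequential(          # PMA + dropout only, no SAB here
            PMA(dim_hidden, num_heads, num_seeds=1),
            nn.Dropout(0.5),
        )
        self.fc = nn.Linear(dim_hidden, num_classes)

    def forward(self, X):                      # X: (batch, n_points, dim_input)
        Z = self.encoder(self.embed(X))        # (batch, n_points, dim_hidden)
        pooled = self.decoder(Z).squeeze(1)    # (batch, dim_hidden)
        return self.fc(pooled)                 # (batch, num_classes)


if __name__ == "__main__":
    logits = SetClassifier()(torch.randn(8, 1000, 3))  # 8 clouds of 1000 xyz points
    print(logits.shape)  # torch.Size([8, 40])
```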

@amiltonwong
Author

amiltonwong commented Feb 16, 2020

@yoonholee ,

Thanks for your reply. One further question: have you also tried increasing the number of ISAB layers in the encoder (e.g. more than two), given that the original Transformer uses six attention layers in its encoder? Did you observe any performance differences or memory limitations?

THX!

@yoonholee
Collaborator

Yes, we shared building blocks but tuned the architectures to suit each specific task. Using two worked best in point cloud classification, but we used 3 ISABs in clustering and 4 SABs in the anomaly detection experiment (see appendix).

Memory was never an issue when using ISABs (see fig 1 of the appendix); the problem was mainly that larger architectures took longer to optimize and/or overfitted. I personally think that if this were somehow solved, stacking more attention-based blocks should yield better results across the board.
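
To make the depth comparison concrete, the encoder from the sketch earlier in the thread can be parameterized by the number of stacked ISABs; this is only an illustration of the per-task tuning described here, not the exact configurations from the paper. Because each ISAB attends through m inducing points, its attention cost grows as O(n·m) in the set size n rather than O(n²), which is why memory stays manageable as blocks are stacked.

```python
# Builds on the ISAB class (and torch / nn imports) from the sketch above;
# depth, widths, and head counts are illustrative assumptions.
def make_isab_encoder(dim_hidden=128, num_heads=4, num_inds=32, n_layers=2):
    """Stack `n_layers` ISABs; per the thread, 2 worked best for point cloud
    classification and 3 for clustering (with 4 SABs for anomaly detection)."""
    return nn.Sequential(*[ISAB(dim_hidden, num_heads, num_inds)
                           for _ in range(n_layers)])

shallow = make_isab_encoder(n_layers=2)
deeper = make_isab_encoder(n_layers=4)
X = torch.randn(8, 1000, 128)                 # a batch of already-embedded sets
print(shallow(X).shape, deeper(X).shape)      # both (8, 1000, 128); deeper just has more parameters
```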
