
question about the network architecture for set transformer #3

Closed
amiltonwong opened this issue Feb 16, 2020 · 3 comments

Comments

@amiltonwong

amiltonwong commented Feb 16, 2020

Hi, @yoonholee ,

Thanks a lot for adding the code for the point cloud part. Looking into the network, I see that no SAB modules are included in the decoder. Is that because appending SAB modules would increase the time complexity, even though they could enhance the expressiveness of the representations? It seems that doing so would improve classification accuracy. Have you performed related experiments?

THX!

@yoonholee
Collaborator

I tried architectures made of various combinations of [SAB, ISAB, PMA, fc, dropout], including using SAB in the decoder, for this task when we were writing the paper. Using no SAB in the decoder worked best in my experiments.

My best guess for why this simple decoder worked best here is that classification is such a simple task that we don't need to model many high-order interactions to achieve good accuracy. Set Transformers seem to be most useful in tasks with complex interactions among items, such as clustering.
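
For concreteness, here is a minimal PyTorch sketch of the kind of architecture being described: an ISAB encoder and a decoder that is just PMA followed by a fully-connected layer, with no SAB. The block implementations below are simplified stand-ins for the paper's MAB/SAB/ISAB/PMA (a single hidden width, `nn.MultiheadAttention` inside), and the dimensions, head counts, and inducing-point counts are illustrative assumptions, not the repository's exact settings.

```python
import torch
import torch.nn as nn


class MAB(nn.Module):
    """Multihead Attention Block: H = LN(X + Att(X, Y, Y)); MAB(X, Y) = LN(H + rFF(H))."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ln0 = nn.LayerNorm(dim)
        self.ln1 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, X, Y):
        H = self.ln0(X + self.attn(X, Y, Y, need_weights=False)[0])
        return self.ln1(H + self.ff(H))


class SAB(nn.Module):
    """Set Attention Block: full self-attention over the set, SAB(X) = MAB(X, X).
    Defined for reference; the decoder discussed in this thread does not use it."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.mab = MAB(dim, num_heads)

    def forward(self, X):
        return self.mab(X, X)


class ISAB(nn.Module):
    """Induced SAB: route attention through m learned inducing points,
    so attention cost is O(n*m) in the set size n instead of O(n^2)."""
    def __init__(self, dim, num_heads, num_inds):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inds, dim) * 0.02)
        self.mab0 = MAB(dim, num_heads)  # inducing points attend to the set
        self.mab1 = MAB(dim, num_heads)  # the set attends back to the summary

    def forward(self, X):
        H = self.mab0(self.inducing.expand(X.size(0), -1, -1), X)
        return self.mab1(X, H)


class PMA(nn.Module):
    """Pooling by Multihead Attention: k learned seed vectors pool the set to k outputs."""
    def __init__(self, dim, num_heads, num_seeds=1):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(1, num_seeds, dim) * 0.02)
        self.mab = MAB(dim, num_heads)

    def forward(self, X):
        return self.mab(self.seeds.expand(X.size(0), -1, -1), X)


class SetClassifier(nn.Module):
    """Point-cloud-style classifier: ISAB encoder, PMA + fc decoder (no SAB in the decoder)."""
    def __init__(self, dim_input=3, dim_hidden=128, num_heads=4,
                 num_inds=32, num_classes=40):
        super().__init__()
        self.embed = nn.Linear(dim_input, dim_hidden)
        self.encoder = nn.Sequential(
            ISAB(dim_hidden, num_heads, num_inds),
            ISAB(dim_hidden, num_heads, num_inds),
        )
        self.decoder = nn.Sequential(          # PMA + dropout only, no SAB here
            PMA(dim_hidden, num_heads, num_seeds=1),
            nn.Dropout(0.5),
        )
        self.fc = nn.Linear(dim_hidden, num_classes)

    def forward(self, X):                      # X: (batch, n_points, dim_input)
        Z = self.encoder(self.embed(X))        # (batch, n_points, dim_hidden)
        pooled = self.decoder(Z).squeeze(1)    # (batch, dim_hidden)
        return self.fc(pooled)                 # (batch, num_classes)


if __name__ == "__main__":
    logits = SetClassifier()(torch.randn(8, 1000, 3))  # 8 clouds of 1000 xyz points
    print(logits.shape)  # torch.Size([8, 40])
```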

@amiltonwong
Author

amiltonwong commented Feb 16, 2020

@yoonholee ,

Thanks for your reply. One further question: have you also tried increasing the number of ISAB layers in the encoder (e.g. more than two), given that the original Transformer uses six attention layers in its encoder? Did you observe any performance differences or memory limitations?

THX!

@yoonholee
Collaborator

Yes, we shared building blocks but tuned the architectures to suit each specific task. Using two worked best in point cloud classification, but we used 3 ISABs in clustering and 4 SABs in the anomaly detection experiment (see appendix).

Memory was never an issue when using ISABs (see fig 1 of the appendix); the problem was mainly that larger architectures took longer to optimize and/or overfitted. I personally think that if this were somehow solved, stacking more attention-based blocks should yield better results across the board.
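
To make the depth comparison concrete, the encoder from the sketch earlier in the thread can be parameterized by the number of stacked ISABs; this is only an illustration of the per-task tuning described here, not the exact configurations from the paper. Because each ISAB attends through m inducing points, its attention cost grows as O(n·m) in the set size n rather than O(n²), which is why memory stays manageable as blocks are stacked.

```python
# Builds on the ISAB class (and torch / nn imports) from the sketch above;
# depth, widths, and head counts are illustrative assumptions.
def make_isab_encoder(dim_hidden=128, num_heads=4, num_inds=32, n_layers=2):
    """Stack `n_layers` ISABs; per the thread, 2 worked best for point cloud
    classification and 3 for clustering (with 4 SABs for anomaly detection)."""
    return nn.Sequential(*[ISAB(dim_hidden, num_heads, num_inds)
                           for _ in range(n_layers)])

shallow = make_isab_encoder(n_layers=2)
deeper = make_isab_encoder(n_layers=4)
X = torch.randn(8, 1000, 128)                 # a batch of already-embedded sets
print(shallow(X).shape, deeper(X).shape)      # both (8, 1000, 128); deeper just has more parameters
```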
