Hi,

Wu et al. recently published a paper on Memorizing Transformers (transformers with states/memory), which extends their receptive field to unbounded contexts (https://www.youtube.com/watch?v=5AoOpFFjW28&list=PL0NRmB0fnLJQJ3fuIk3yVULtm6_JnQ_zI, https://arxiv.org/abs/2203.08913). I am curious how you think S4/SaShiMi compares with this new transformer model. My hunch is that S4 might be theoretically similar if you use the exponential measure density.
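For concreteness, the exponential measure I have in mind is something like HiPPO's translated Laguerre (LagT) measure, which weights the input history with exponentially decaying importance:

```latex
% HiPPO-LagT (translated Laguerre) measure: exponentially decaying
% weight on the history up to time t
\mu^{(t)}(x) = e^{-(t - x)} \, \mathbb{1}_{(-\infty,\, t]}(x)
```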
This model seems somewhat different from S4 in spirit. Its memorization mechanism is closer to the line of work on memory-augmented neural networks (MANNs), where memory is handled by heuristic memory banks. By contrast, S4's mechanism has a precise mathematical interpretation as function reconstruction: the state stores the coefficients of the input history projected onto an orthogonal polynomial basis, so the history (up to the chosen measure) can be decoded back from the state. That said, S4's mechanism may not be sufficient for all settings, and some form of memory augmentation could be useful.
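To make "function reconstruction" concrete, here is a minimal sketch using the HiPPO-LegS measure (uniform over the history so far); the discretization and step sizes are simplified relative to the papers. A single fixed-size state, updated online, suffices to approximately reconstruct the entire input seen so far:

```python
import numpy as np

def make_legs(N):
    # HiPPO-LegS matrices (Gu et al., 2020): A[n, k] = sqrt((2n+1)(2k+1))
    # for n > k, n + 1 on the diagonal, 0 above; B[n] = sqrt(2n+1).
    q = np.sqrt(2 * np.arange(N) + 1.0)
    A = np.tril(np.outer(q, q), -1) + np.diag(np.arange(N) + 1.0)
    return A, q

N, L = 64, 1000
A, B = make_legs(N)
I = np.eye(N)

t = np.linspace(0.0, 1.0, L)
f = np.sin(10 * t) + 0.5 * np.sin(23 * t)  # toy input signal

# Online state update for c'(t) = -(1/t) A c(t) + (1/t) B f(t),
# discretized with the bilinear transform; the effective step 1/k
# reflects LegS's timescale invariance.
c = np.zeros(N)
for k in range(1, L):
    step = 1.0 / k
    c = np.linalg.solve(
        I + (step / 2) * A,
        (I - (step / 2) * A) @ c + step * B * f[k],
    )

# Decode the whole history from the final state alone:
# f(s) ~ sum_n c_n sqrt(2n+1) P_n(2s/t - 1), with P_n Legendre polynomials.
coeffs = c * np.sqrt(2 * np.arange(N) + 1.0)
f_hat = np.polynomial.legendre.legval(2 * t / t[-1] - 1.0, coeffs)
print("mean |f - f_hat|:", np.abs(f - f_hat).mean())
```

Note there is no heuristic bank of stored keys/values here: the "memory" is the coefficient vector `c`, and what it remembers (and forgets) is determined exactly by the choice of measure.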
For future questions that are not related to this codebase, please send an email instead.