Hi, thanks for releasing the code and paper!
I am a newbie. I read your paper and experiments carefully and tried to follow the reported hyperparameters. I am currently reproducing the S4 results on the LRA ListOps task. I implemented the S4 network myself, where each S4 block consists of:
S4 layer → activation → linear layer → activation
I used 6 such blocks, followed by average pooling and then a classification head. However, after training, the training accuracy reaches about 59%, while the test accuracy is only 51%, which shows obvious overfitting. Additionally, when the dropout rate is raised to 0.2-0.3, the loss tends to explode (becomes NaN). It also seems that using tanh as the activation function performs slightly better.
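To make my setup concrete, here is a minimal PyTorch sketch of the architecture I described. The names (`S4Block`, `S4Classifier`, `make_s4_layer`, `d_model`) are just illustrative, and the `nn.Identity` placeholder stands in for my own S4 layer so the sketch runs on its own; it is not your implementation.

```python
import torch
import torch.nn as nn


class S4Block(nn.Module):
    """One block as described above: S4 layer -> activation -> linear layer -> activation.
    `s4_layer` stands in for my own S4 implementation (not shown here)."""
    def __init__(self, s4_layer, d_model, dropout=0.0):
        super().__init__()
        self.s4 = s4_layer
        self.linear = nn.Linear(d_model, d_model)
        self.act = nn.Tanh()  # tanh seemed to work slightly better for me
        self.drop = nn.Dropout(dropout)

    def forward(self, x):  # x: (batch, length, d_model)
        x = self.act(self.s4(x))
        x = self.act(self.linear(x))
        return self.drop(x)


class S4Classifier(nn.Module):
    """Six S4 blocks, then average pooling over the sequence, then a classification head."""
    def __init__(self, make_s4_layer, d_model, n_classes, n_blocks=6, dropout=0.0):
        super().__init__()
        self.blocks = nn.Sequential(
            *[S4Block(make_s4_layer(), d_model, dropout) for _ in range(n_blocks)]
        )
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):  # x: (batch, length, d_model)
        x = self.blocks(x)
        return self.head(x.mean(dim=1))  # average pooling over the length dimension


# Placeholder layer just so the sketch runs; my real model plugs in my S4 layer here.
model = S4Classifier(lambda: nn.Identity(), d_model=128, n_classes=10)
out = model(torch.randn(4, 2000, 128))  # ListOps: 10-way classification, long sequences
print(out.shape)  # torch.Size([4, 10])
```

Note that I am not using residual connections or layer normalization around the blocks; if the reference setup relies on them, that could explain the gap.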
Could you please share more details about the experimental setup that are important for reproducing the reported results?
Thanks a lot!