jaaack-wang/rnn-seq2seq-learning

This repository contains the source code, data, and some of the experimental notebooks for the paper accepted at the 16th ICGI, titled Learning Transductions and Alignments with RNN Seq2seq Models. Under unified training and evaluation conditions and through comprehensive experiments, I compared learning results across tasks, model configurations, and generalization settings, and highlighted factors that influence the learning capabilities and generalization capacity of RNN seq2seq models.

For everything related to the experiments, including data, training logs, trained models, and experimental results, please check this project folder on Google Drive or this OSF project, which contains a complete copy of everything.

Transduction Tasks

For a given string, the four transduction tasks can be defined as follows (see the Python sketch after the list):

  • Identity: the given string itself. Ex: abc ===> abc
  • Reversal: the reverse of the given string. Ex: abc ===> cba
  • Total reduplication: two copies of the given string. Ex: abc ===> abcabc
  • Quadratic copying: n copies of the given string of length n. Ex: abc ===> abcabcabc
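
As a concrete reference, here is a minimal Python sketch of the four string functions exactly as defined above (the function names are mine, not necessarily those used in the repo's code):

def identity(s: str) -> str:
    return s

def reversal(s: str) -> str:
    return s[::-1]

def total_reduplication(s: str) -> str:
    return s * 2

def quadratic_copying(s: str) -> str:
    # n copies of a string of length n
    return s * len(s)

assert identity("abc") == "abc"
assert reversal("abc") == "cba"
assert total_reduplication("abc") == "abcabc"
assert quadratic_copying("abc") == "abcabcabc"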

These four functions have traditionally been studied from the viewpoint of Finite State Transducers (FSTs) and characterized accordingly. The FST-theoretic characterizations suggest the following complexity hierarchy for these tasks: quadratic copying (polyregular function) > total reduplication (regular) > reversal (regular) > identity (rational).

Learning Input-target Alignments

RNN seq2seq models take an encoder-decoder structure, where the decoder only “writes” after the encoder has “read” all the input symbols, unlike the interleaved read-and-write operations of FSTs.

The following figure shows the conjectured mechanism by which RNN seq2seq models learn the identity and reversal functions. The other two functions can be learned in a similar manner, with quadratic copying (input-specified reduplication) additionally requiring counting.
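
For orientation, below is a generic attention-less RNN seq2seq sketch in PyTorch that makes the read-then-write structure explicit. It is illustrative only and not the repository's actual implementation; the class and parameter names are mine:

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len); without attention, the final hidden state
        # must summarize the entire input sequence
        _, hidden = self.rnn(self.emb(src))
        return hidden

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt_in, hidden):
        # "writing" starts only after "reading" is finished: decoding is conditioned
        # on the encoder's final hidden state and previously generated symbols
        output, hidden = self.rnn(self.emb(tgt_in), hidden)
        return self.out(output), hidden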

Basic Findings

  • Generalization abilities: RNN seq2seq models, attentional or not, are prone to learning a function that fits the training or in-distribution data. Their out-of-distribution generalization abilities are highly limited. In other words, they are not learning the underlying data generation functions, probably because of the inherent limitation of auto-regressive models.

    • Please note that task complexity is strongly tied to the structure of the learner. For example, RNNs with a few hundred parameters can easily learn the identity function, whereas RNN seq2seq models cannot. See: RNNs-learn-identity.
  • Attention: makes learning the alignment between input and target sequences significantly more efficient and robust, but does not overcome the out-of-distribution generalization limitation.

  • Task complexity: for attention-less RNN seq2seq models, the empirical difficulty ordering is quadratic copying > total reduplication > identity > reversal. Identity turns out harder than reversal because of the long-term dependency learning issue that comes with RNNs trained with backpropagation and gradient descent.

  • RNN variants: the effect of the RNN variant on seq2seq models is complicated and interacts with other factors, e.g., attention and the task to learn. Generally, GRU and LSTM are more expressive than SRNN, and SRNN cannot count.

The following figure shows full-sequence accuracy (on unseen examples) per input length across the four tasks for the three types of RNN seq2seq models, where only input lengths 6-15 were seen during training.
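
As a rough illustration of how such per-length results can be computed, the following sketch groups full-sequence (exact-match) accuracy by input length; the `pairs` data structure is an assumption, not the format used by the repo's evaluation code:

from collections import defaultdict

def accuracy_per_length(pairs):
    # pairs: iterable of (input_string, gold_output, model_output) triples
    correct, total = defaultdict(int), defaultdict(int)
    for src, gold, pred in pairs:
        total[len(src)] += 1
        correct[len(src)] += int(pred == gold)  # full-sequence (exact-match) accuracy
    return {n: correct[n] / total[n] for n in sorted(total)}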

Reproduce the results

To reproduce the results, simply download everything in the project folder, upload the folder to your Google Drive, and re-run the notebooks in the notebooks folder in a GPU runtime. Alternatively, you can save the project folder to your Google Drive by clicking the "Add shortcut to Drive" button. In that case, you should be able to run any of the notebooks successfully, but the results will not be saved to your Google Drive (unless you download the entire folder and upload it to your Google Drive).
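
For reference, a typical Colab setup for accessing the Drive copy of the project folder looks like the following; the folder path is a hypothetical example and should be adjusted to wherever you placed or shortcut the folder:

from google.colab import drive
import os

drive.mount('/content/drive')  # authorize access to your Google Drive
os.chdir('/content/drive/MyDrive/rnn-seq2seq-learning')  # hypothetical path; adjust as needed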

It is recommended that you subscribe to Google Colab Pro+ in order to reproduce the results.

Citation

@misc{zw2023rnn-seq2seq,
  doi = {10.48550/ARXIV.2303.06841},
  url = {https://arxiv.org/abs/2303.06841},
  author = {Wang, Zhengxiang},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Learning Transductions and Alignments with RNN Seq2seq Models},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution 4.0 International}
}

Reuse the code

  • The project folder provides a neat yet customizable pipeline for conducting experiments with RNN seq2seq models. Just make sure that your data is saved in the same format as the data provided inside the data folder.

  • If you prefer a highly customizable codebase for running experiments from the command line, consider the following two repositories of mine:
