References

bahdanau2015: D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
bengio2013: Y. Bengio, N. Léonard, and A. C. Courville, "Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation," CoRR, vol. abs/1308.3432, 2013, [Online]. Available: http://arxiv.org/abs/1308.3432
bert2019: J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional Transformers for language understanding," Minneapolis, USA, 2019, vol. 1, pp. 4171-4186. [Online]. Available: https://aclweb.org/anthology/papers/N/N19/N19-1423/
chan2016: W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, "Listen, Attend and Spell: A neural network for Large Vocabulary Conversational Speech Recognition," Mar. 2016, pp. 4960-4964. doi: 10.1109/ICASSP.2016.7472621.
chen1994: X.-H. Chen, A. P. Dempster, and J. S. Liu, "Weighted finite population sampling to maximize entropy," Biometrika, vol. 81, no. 3, pp. 457-469, 1994, doi: 10.2307/2337119.
cho2014: K. Cho et al., "Learning phrase representations using RNN Encoder-Decoder for Statistical Machine Translation," Doha, Qatar, 2014, pp. 1724-1734. [Online]. Available: https://www.aclweb.org/anthology/D14-1179
fan1962: C. T. Fan, M. E. Muller, and I. Rezucha, "Development of sampling plans by using sequential (item by item) selection techniques and digital computers," Journal of the American Statistical Association, vol. 57, no. 298, pp. 387-402, Jun. 1962, doi: 10.1080/01621459.1962.10480667.
grathwohl2017: W. Grathwohl, D. Choi, Y. Wu, G. Roeder, and D. K. Duvenaud, "Backpropagation through the Void: Optimizing control variates for black-box gradient estimation," CoRR, vol. abs/1711.00123, 2017.
graves2006: A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist Temporal Classification: Labelling unsegmented sequence data with recurrent neural networks," New York, NY, USA, 2006, pp. 369-376. doi: 10.1145/1143844.1143891.
gulcehre2015: Ç. Gülçehre et al., "On using monolingual corpora in neural machine translation," CoRR, vol. abs/1503.03535, 2015, [Online]. Available: http://arxiv.org/abs/1503.03535
heafield2011: K. Heafield, "KenLM: Faster and smaller language model queries," in Proceedings of the Sixth Workshop on Statistical Machine Translation, Edinburgh, Scotland, 2011, pp. 187-197.
howard1972: S. Howard, "Discussion on Professor Cox's paper," Journal of the Royal Statistical Society, vol. 34, no. 2, pp. 210-211, Jan. 1972, doi: 10.1111/j.2517-6161.1972.tb00900.x.
luong2015: T. Luong, H. Pham, and C. D. Manning, "Effective approaches to attention-based neural machine translation," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 2015, pp. 1412-1421.
mengerson1996: K. L. Mengersen and R. L. Tweedie, "Rates of convergence of the Hastings and Metropolis algorithms," The Annals of Statistics, vol. 24, no. 1, pp. 101-121, Feb. 1996, doi: 10.1214/aos/1033066201.
mikolov2010: T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, "Recurrent neural network based language model," presented at Interspeech, Makuhari, Japan, 2010.
park2019: D. S. Park et al., "SpecAugment: A simple data augmentation method for automatic speech recognition," in Proc. Interspeech, 2019, pp. 2613-2617, doi: 10.21437/Interspeech.2019-2680.
park2020: D. S. Park et al., "SpecAugment on large scale datasets," May 2020, pp. 6879-6883, doi: 10.1109/ICASSP40776.2020.9053205.
prabhavalkar2018: R. Prabhavalkar et al., "Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models," presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 4839-4843.
sabour2018: S. Sabour, W. Chan, and M. Norouzi, "Optimal Completion Distillation for Sequence Learning," CoRR, vol. abs/1810.01398, 2018.
tucker2017: G. Tucker, A. Mnih, C. J. Maddison, J. Lawson, and J. Sohl-Dickstein, "REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models," in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 2627-2636.
vaswani2017: A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 5998-6008.
williams1992: R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, no. 3, pp. 229-256, May 1992.
