List of publications using Lingvo.

Translation

[1] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean, “Google's neural machine translation system: Bridging the gap between human and machine translation,” tech. rep., 2016. [ pdf ]
[2] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. Viégas, M. Wattenberg, G. Corrado, M. Hughes, and J. Dean, “Google's multilingual neural machine translation system: Enabling zero-shot translation,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 339–351, 2017. [ DOI | pdf ]
[3] A. Eriguchi, M. Johnson, O. Firat, H. Kazawa, and W. Macherey, “Zero-shot cross-lingual classification using multilingual neural machine translation,” arXiv preprint arXiv:1809.04686, 2018. [ pdf ]
[4] A. Bapna, M. X. Chen, O. Firat, Y. Cao, and Y. Wu, “Training deeper neural machine translation models with transparent attention,” in Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [ pdf ]
[5] C. Cherry, G. Foster, A. Bapna, O. Firat, and W. Macherey, “Revisiting character-based neural machine translation with capacity and compression,” in Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [ pdf ]
[6] M. X. Chen, O. Firat, A. Bapna, M. Johnson, W. Macherey, G. Foster, L. Jones, M. Schuster, N. Shazeer, N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, Z. Chen, Y. Wu, and M. Hughes, “The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation,” in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), 2018. [ pdf ]
[7] J. Kuczmarski and M. Johnson, “Gender-aware natural language translation,” 2018. [ pdf ]
[8] R. Aharoni, M. Johnson, and O. Firat, “Massively multilingual neural machine translation,” 2019. [ pdf ]
[9] J. Luo, Y. Cao, and R. Barzilay, “Neural decipherment via minimum-cost flow: From Ugaritic to Linear B,” 2019. [ http ]
[10] N. Arivazhagan, C. Cherry, W. Macherey, C.-C. Chiu, S. Yavuz, R. Pang, W. Li, and C. Raffel, “Monotonic infinite lookback attention for simultaneous machine translation,” in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), 2019. [ http ]
[11] M. Freitag, I. Caswell, and S. Roy, “APE at scale and its implications on MT evaluation biases,” 2019. [ pdf | http ]
[12] N. Arivazhagan, A. Bapna, O. Firat, D. Lepikhin, M. Johnson, M. Krikun, M. X. Chen, Y. Cao, G. Foster, C. Cherry, W. Macherey, Z. Chen, and Y. Wu, “Massively multilingual neural machine translation in the wild: Findings and challenges,” 2019. [ arXiv | http ]
[13] Y. Huang, Y. Cheng, A. Bapna, O. Firat, M. X. Chen, D. Chen, H. Lee, J. Ngiam, Q. V. Le, Y. Wu, and Z. Chen, “GPipe: Efficient training of giant neural networks using pipeline parallelism,” in Advances in Neural Information Processing Systems, 2019. [ http ]

Speech recognition

[1] C.-C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, K. Gonina, N. Jaitly, B. Li, J. Chorowski, and M. Bacchiani, “State-of-the-art speech recognition with sequence-to-sequence models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[2] S. Toshniwal, T. N. Sainath, R. J. Weiss, B. Li, P. Moreno, E. Weinstein, and K. Rao, “Multilingual speech recognition with a single end-to-end model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[3] B. Li, T. N. Sainath, K. Sim, M. Bacchiani, E. Weinstein, P. Nguyen, Z. Chen, Y. Wu, and K. Rao, “Multi-Dialect Speech Recognition With a Single Sequence-to-Sequence Model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[4] T. N. Sainath, R. Prabhavalkar, S. Kumar, S. Lee, A. Kannan, D. Rybach, V. Schogol, P. Nguyen, B. Li, Y. Wu, Z. Chen, and C. C. Chiu, “No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[5] D. Lawson, C. C. Chiu, G. Tucker, C. Raffel, K. Swersky, and N. Jaitly, “Learning hard alignments with variational inference,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[6] A. Kannan, Y. Wu, P. Nguyen, T. N. Sainath, Z. Chen, and R. Prabhavalkar, “An analysis of incorporating an external language model into a sequence-to-sequence model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[7] R. Prabhavalkar, T. N. Sainath, Y. Wu, P. Nguyen, Z. Chen, C. C. Chiu, and A. Kannan, “Minimum Word Error Rate Training for Attention-based Sequence-to-sequence Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[8] T. N. Sainath, C. C. Chiu, R. Prabhavalkar, A. Kannan, Y. Wu, P. Nguyen, and Z. Chen, “Improving the Performance of Online Neural Transducer Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[9] C. C. Chiu and C. Raffel, “Monotonic Chunkwise Attention,” in Proc. International Conference on Learning Representations (ICLR), 2018. [ pdf ]
[10] I. Williams, A. Kannan, P. Aleksic, D. Rybach, and T. N. Sainath, “Contextual Speech Recognition in End-to-End Neural Network Systems using Beam Search,” in Proc. Interspeech, 2018. [ pdf ]
[11] C. C. Chiu, A. Tripathi, K. Chou, C. Co, N. Jaitly, D. Jaunzeikare, A. Kannan, P. Nguyen, H. Sak, A. Sankar, J. Tansuwan, N. Wan, Y. Wu, and X. Zhang, “Speech recognition for medical conversations,” in Proc. Interspeech, 2018. [ pdf ]
[12] R. Pang, T. N. Sainath, R. Prabhavalkar, S. Gupta, Y. Wu, S. Zhang, and C. C. Chiu, “Compression of End-to-End Models,” in Proc. Interspeech, 2018. [ pdf ]
[13] S. Toshniwal, A. Kannan, C. C. Chiu, Y. Wu, T. N. Sainath, and K. Livescu, “A comparison of techniques for language model integration in encoder-decoder speech recognition,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018. [ pdf ]
[14] G. Pundak, T. N. Sainath, R. Prabhavalkar, A. Kannan, and D. Zhao, “Deep context: End-to-end contextual speech recognition,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018. [ pdf ]
[15] B. Li, Y. Zhang, T. N. Sainath, Y. Wu, and W. Chan, “Bytes are all you need: End-to-end multilingual speech recognition and synthesis with bytes,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[16] J. Guo, T. N. Sainath, and R. J. Weiss, “A spelling correction model for end-to-end speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[17] U. Alon, G. Pundak, and T. N. Sainath, “Contextual speech recognition with difficult negative training examples,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[18] Y. Qin, N. Carlini, I. Goodfellow, G. Cottrell, and C. Raffel, “Imperceptible, robust, and targeted adversarial examples for automatic speech recognition,” in Proc. International Conference on Machine Learning (ICML), 2019. [ pdf ]
[19] D. S. Park, W. Chan, Y. Zhang, C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” arXiv preprint, 2019. [ pdf ]
[20] B. Li, T. N. Sainath, R. Pang, and Z. Wu, “Semi-supervised training for end-to-end models via weak distillation,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[21] S.-Y. Chang, R. Prabhavalkar, Y. He, T. N. Sainath, and G. Simko, “Joint endpointing and decoding with end-to-end models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[22] J. Heymann, K. C. Sim, and B. Li, “Improving CTC using stimulated learning for sequence modeling,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[23] A. Bruguier, R. Prabhavalkar, G. Pundak, and T. N. Sainath, “Phoebe: Pronunciation-aware contextualization for end-to-end speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[24] Y. He, T. N. Sainath, R. Prabhavalkar, I. McGraw, R. Alvarez, D. Zhao, D. Rybach, A. Kannan, Y. Wu, R. Pang, Q. Liang, D. Bhatia, Y. Shangguan, B. Li, G. Pundak, K. C. Sim, T. Bagby, S.-Y. Chang, K. Rao, and A. Gruenstein, “Streaming end-to-end speech recognition for mobile devices,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[25] K. Irie, R. Prabhavalkar, A. Kannan, A. Bruguier, D. Rybach, and P. Nguyen, “On the choice of modeling unit for sequence-to-sequence speech recognition,” in Proc. Interspeech, 2019. [ pdf ]
[26] C. Peyser, H. Zhang, T. N. Sainath, and Z. Wu, “Improving Performance of End-to-End ASR on Numeric Sequences,” in Proc. Interspeech, 2019. [ pdf ]
[27] D. Zhao, T. N. Sainath, D. Rybach, D. Bhatia, B. Li, and R. Pang, “Shallow-fusion end-to-end contextual biasing,” in Proc. Interspeech, 2019. [ pdf ]
[28] T. N. Sainath, R. Pang, D. Rybach, Y. He, R. Prabhavalkar, W. Li, M. Visontai, Q. Liang, T. Strohman, Y. Wu, I. McGraw, and C.-C. Chiu, “Two-pass end-to-end speech recognition,” in Proc. Interspeech, 2019. [ pdf ]
[29] C.-C. Chiu, W. Han, Y. Zhang, R. Pang, S. Kishchenko, P. Nguyen, A. Narayanan, H. Liao, S. Zhang, A. Kannan, R. Prabhavalkar, Z. Chen, T. Sainath, and Y. Wu, “A comparison of end-to-end models for long-form speech recognition,” 2019. [ pdf ]
[30] A. Narayanan, R. Prabhavalkar, C. Chiu, D. Rybach, T. Sainath, and T. Strohman, “Recognizing long-form speech using streaming end-to-end models,” 2019. [ pdf ]
[31] T. N. Sainath, R. Pang, R. Weiss, Y. He, C.-C. Chiu, and T. Strohman, “An attention-based joint acoustic and text on-device end-to-end model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
[32] Z. Lu, L. Cao, Y. Zhang, C.-C. Chiu, and J. Fan, “Speech sentiment analysis via pre-trained features from end-to-end ASR models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
[33] D. Park, Y. Zhang, C.-C. Chiu, Y. Chen, B. Li, W. Chan, Q. Le, and Y. Wu, “SpecAugment on large scale datasets,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020. [ pdf ]
[34] T. Sainath, Y. He, B. Li, A. Narayanan, R. Pang, A. Bruguier, S.-Y. Chang, W. Li, R. Alvarez, Z. Chen, C.-C. Chiu, D. Garcia, A. Gruenstein, K. Hu, M. Jin, A. Kannan, Q. Liang, I. McGraw, C. Peyser, R. Prabhavalkar, G. Pundak, D. Rybach, Y. Shangguan, Y. Sheth, T. Strohman, M. Visontai, Y. Wu, Y. Zhang, and D. Zhao, “A streaming on-device end-to-end model surpassing server-side conventional model quality and latency,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
[35] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, “Conformer: Convolution-augmented transformer for speech recognition,” in Proc. Interspeech, 2020. [ pdf ]
[36] W. Han, Z. Zhang, Y. Zhang, J. Yu, C.-C. Chiu, J. Qin, A. Gulati, R. Pang, and Y. Wu, “ContextNet: Improving convolutional neural networks for automatic speech recognition with global context,” in Proc. Interspeech, 2020. [ pdf ]
[37] W. Li, J. Qin, C.-C. Chiu, R. Pang, and Y. He, “Parallel rescoring with transformer for streaming on-device speech recognition,” in Proc. Interspeech, 2020.
[38] D. S. Park, Y. Zhang, Y. Jia, W. Han, C.-C. Chiu, B. Li, Y. Wu, and Q. V. Le, “Improved noisy student training for automatic speech recognition,” in Proc. Interspeech, 2020. [ pdf ]

Language understanding

[1] A. Kannan, K. Chen, D. Jaunzeikare, and A. Rajkomar, “Semi-Supervised Learning for Information Extraction from Dialogue,” in Proc. Interspeech, 2018. [ pdf ]
[2] S. Yavuz, C. C. Chiu, P. Nguyen, and Y. Wu, “CaLcs: Continuously Approximating Longest Common Subsequence for Sequence Level Optimization,” in Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [ pdf ]
[3] P. Haghani, A. Narayanan, M. Bacchiani, G. Chuang, N. Gaur, P. Moreno, R. Prabhavalkar, Z. Qu, and A. Waters, “From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018. [ pdf ]
[4] M. X. Chen, B. N. Lee, G. Bansal, Y. Cao, S. Zhang, J. Lu, J. Tsay, Y. Wang, A. M. Dai, Z. Chen, T. Sohn, and Y. Wu, “Gmail smart compose: Real-time assisted writing,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery, 2019. [ pdf | http ]

Speech synthesis

[1] J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan, R. A. Saurous, Y. Agiomyrgiannakis, and Y. Wu, “Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ sound examples | pdf ]
[2] J. Chorowski, R. J. Weiss, R. A. Saurous, and S. Bengio, “On using backpropagation for speech texture generation and voice conversion,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ sound examples | pdf ]
[3] Y. Jia, Y. Zhang, R. J. Weiss, Q. Wang, J. Shen, F. Ren, Z. Chen, P. Nguyen, R. Pang, I. Lopez-Moreno, and Y. Wu, “Transfer learning from speaker verification to multispeaker text-to-speech synthesis,” in Advances in Neural Information Processing Systems, 2018. [ sound examples | pdf ]
[4] W. N. Hsu, Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Y. Wang, Y. Cao, Y. Jia, Z. Chen, J. Shen, P. Nguyen, and R. Pang, “Hierarchical generative modeling for controllable speech synthesis,” in Proc. International Conference on Learning Representations (ICLR), 2019. [ sound examples | pdf ]
[5] W. N. Hsu, Y. Zhang, R. J. Weiss, Y. A. Chung, Y. Wang, Y. Wu, and J. Glass, “Disentangling correlated speaker and noise for speech synthesis via data augmentation and adversarial factorization,” in NeurIPS 2018 Workshop on Interpretability and Robustness in Audio, Speech, and Language, 2018. [ pdf ]
[6] H. Zen, V. Dang, R. Clark, Y. Zhang, R. J. Weiss, Y. Jia, Z. Chen, and Y. Wu, “LibriTTS: A corpus derived from LibriSpeech for text-to-speech,” in Proc. Interspeech, 2019. [ data | pdf ]
[7] F. Biadsy, R. J. Weiss, P. Moreno, D. Kanevsky, and Y. Jia, “Parrotron: An end-to-end speech-to-speech conversion model and its applications to hearing-impaired speech and speech separation,” in Proc. Interspeech, 2019. [ sound examples | pdf ]
[8] Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Z. Chen, R. J. Skerry-Ryan, Y. Jia, A. Rosenberg, and B. Ramabhadran, “Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning,” in Proc. Interspeech, 2019. [ sound examples | pdf ]
[9] G. Sun, Y. Zhang, R. J. Weiss, Y. Cao, H. Zen, and Y. Wu, “Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020. [ sound examples | pdf ]
[10] G. Sun, Y. Zhang, R. J. Weiss, Y. Cao, H. Zen, A. Rosenberg, B. Ramabhadran, and Y. Wu, “Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020. [ sound examples | pdf ]

Speech translation

[1] R. J. Weiss, J. Chorowski, N. Jaitly, Y. Wu, and Z. Chen, “Sequence-to-sequence models can directly translate foreign speech,” in Proc. Interspeech, 2017. [ pdf ]
[2] Y. Jia, M. Johnson, W. Macherey, R. J. Weiss, Y. Cao, C. C. Chiu, N. Ari, S. Laurenzo, and Y. Wu, “Leveraging weakly supervised data to improve end-to-end speech-to-text translation,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[3] Y. Jia, R. J. Weiss, F. Biadsy, W. Macherey, M. Johnson, Z. Chen, and Y. Wu, “Direct speech-to-speech translation with a sequence-to-sequence model,” in Proc. Interspeech, 2019. [ sound examples | pdf ]

Speech enhancement

[1] S. Ding, Y. Jia, K. Hu, and Q. Wang, “Textual echo cancellation,” 2020. [ sound examples | pdf ]

Optimization

[1] R. Anil, V. Gupta, T. Koren, K. Regan, and Y. Singer, “Second order optimization made practical,” arXiv preprint arXiv:2002.09018, 2020. [ pdf ]
[2] N. Agarwal, R. Anil, E. Hazan, T. Koren, and C. Zhang, “Disentangling adaptive gradient methods from learning rates,” arXiv preprint arXiv:2002.11803, 2020. [ pdf ]
[3] R. Anil, V. Gupta, T. Koren, and Y. Singer, “Memory efficient adaptive optimization,” in Advances in Neural Information Processing Systems, pp. 9749–9758, 2019. [ pdf ]