A reading list of surveys, review papers, and key papers in deep learning:
- Deep Learning, Yoshua Bengio, Ian Goodfellow, Aaron Courville, MIT Press, In preparation.
- Representation Learning: A Review and New Perspectives, Yoshua Bengio, Aaron Courville, Pascal Vincent, Arxiv, 2012. cited by 1746.
- Learning Deep Architectures for AI, Yoshua Bengio, Foundations & Trends in Machine Learning, 2009. (Monograph/review paper.)
- Deep Machine Learning – A New Frontier in Artificial Intelligence Research, a survey paper by Itamar Arel, Derek C. Rose, and Thomas P. Karnowski. 2010, cited by 367.
- Supervised sequence labelling with recurrent neural networks, Graves, A. Springer (2012).
- Deep Learning in Neural Networks: An Overview, Schmidhuber, J. (2014). 75 pages, 850+ references.
- Deep learning, LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Nature 521, no. 7553 (2015): 436-444.
- Playing Atari with deep reinforcement learning, Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. arXiv preprint arXiv:1312.5602 (2013).
- Recurrent Models of Visual Attention. Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu. arXiv preprint, 2014.
- ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, NIPS 2012.
- Going Deeper with Convolutions, Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, 19-Sept-2014.
- Learning Hierarchical Features for Scene Labeling, Clement Farabet, Camille Couprie, Laurent Najman and Yann LeCun, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
- Learning Convolutional Feature Hierarchies for Visual Recognition, Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michaël Mathieu and Yann LeCun, Advances in Neural Information Processing Systems (NIPS 2010), 23, 2010.
- A novel connectionist system for unconstrained handwriting recognition. Graves, Alex, et al. Pattern Analysis and Machine Intelligence, IEEE Transactions on 31.5 (2009): 855-868.
- Deep, big, simple neural nets for handwritten digit recognition. Cireşan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2010). Neural computation, 22(12), 3207-3220.
- Multi-column deep neural networks for image classification, Ciresan, Dan, Ueli Meier, and Jürgen Schmidhuber. Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
- A committee of neural networks for traffic sign classification. Ciresan, D., Meier, U., Masci, J., & Schmidhuber, J. (2011, July). In Neural Networks (IJCNN), The 2011 International Joint Conference on (pp. 1918-1921). IEEE.
- Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing, Antoine Bordes, Xavier Glorot, Jason Weston and Yoshua Bengio (2012), in: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS).
- Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. Socher, R., Huang, E. H., Pennington, J., Ng, A. Y., and Manning, C. D. (2011a). In NIPS’2011.
- Semi-supervised recursive autoencoders for predicting sentiment distributions. Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., and Manning, C. D. (2011b). In EMNLP’2011.
- Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012.
- Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Graves, Alex, and Jürgen Schmidhuber. Neural Networks 18.5 (2005): 602-610.
- Distributed representations of words and phrases and their compositionality. Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. In Advances in Neural Information Processing Systems, pp. 3111-3119. 2013.
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio. EMNLP 2014.
- Sequence to sequence learning with neural networks. Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. Advances in Neural Information Processing Systems. 2014.
- Measuring invariances in deep networks. Goodfellow, Ian, et al. Advances in neural information processing systems 22 (2009): 646-654.
- Better Mixing via Deep Representations. Bengio, Yoshua, et al. arXiv preprint arXiv:1207.4404 (2012).
- Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach. Xavier Glorot, Antoine Bordes and Yoshua Bengio. in: Proceedings of the Twenty-eighth International Conference on Machine Learning (ICML’11), pages 97-110, 2011.
- Raina, Rajat, et al. Self-taught learning: transfer learning from unlabeled data. Proceedings of the 24th international conference on Machine learning. ACM, 2007.
- R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu and P. Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493-2537, 2011.
- Mesnil, Grégoire, et al. Unsupervised and transfer learning challenge: a deep learning approach. Unsupervised and Transfer Learning Workshop, in conjunction with ICML. 2011.
- Ciresan, D. C., Meier, U., & Schmidhuber, J. (2012, June). Transfer learning for Latin and Chinese characters with deep neural networks. In Neural Networks (IJCNN), The 2012 International Joint Conference on (pp. 1-6). IEEE.
- Goodfellow, Ian, Aaron Courville, and Yoshua Bengio. Large-Scale Feature Learning With Spike-and-Slab Sparse Coding. ICML 2012.
- Improving neural networks by preventing co-adaptation of feature detectors. Hinton, Geoffrey E., et al. arXiv preprint arXiv:1207.0580 (2012).
- Practical recommendations for gradient-based training of deep architectures, Yoshua Bengio, U. Montreal, arXiv report:1206.5533, Lecture Notes in Computer Science Volume 7700, Neural Networks: Tricks of the Trade Second Edition, Editors: Grégoire Montavon, Geneviève B. Orr, Klaus-Robert Müller, 2012.
- A practical guide to training Restricted Boltzmann Machines, Geoffrey Hinton, Technical Report UTML TR 2010-003, University of Toronto, 2010.
- Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Bruno Olshausen and David Field, Nature, 1996. cited by 4441.
- Kavukcuoglu, Koray, Marc’Aurelio Ranzato, and Yann LeCun. Fast inference in sparse coding algorithms with applications to object recognition. arXiv preprint arXiv:1010.3467 (2010).
- Efficient sparse coding algorithms. Honglak Lee, Alexis Battle, Raina Rajat and Andrew Y. Ng. In NIPS 19, 2007.
- Sparse coding with an overcomplete basis set: A strategy employed by V1?. Olshausen, Bruno A., and David J. Field. Vision Research 37.23 (1997): 3311-3326.
- Deterministic Boltzmann learning performs steepest descent in weight-space. Hinton, Geoffrey E. Neural computation 1.1 (1989): 143-150. cited by 197.
- Modeling high-dimensional discrete data with multi-layer neural networks. Bengio, Yoshua, and Samy Bengio. Advances in Neural Information Processing Systems 12 (2000): 400-406. cited by 57.
- Greedy layer-wise training of deep networks. Bengio, Yoshua, et al. Advances in neural information processing systems 19 (2007): 153.
- Nonlocal estimation of manifold structure. Bengio, Yoshua, Martin Monperrus, and Hugo Larochelle. Neural Computation 18.10 (2006): 2509-2528.
- Reducing the dimensionality of data with neural networks. Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. Science 313.5786 (2006): 504-507. cited by 4567.
- Sparse feature learning for deep belief networks. Marc’Aurelio Ranzato, Y-Lan Boureau, and Yann LeCun. Advances in neural information processing systems 20 (2007): 1185-1192. cited by 448.
- Scaling learning algorithms towards AI. Bengio, Yoshua, and Yann LeCun. Large-Scale Kernel Machines 34 (2007). cited by 594.
- Representational power of restricted boltzmann machines and deep belief networks. Le Roux, Nicolas, and Yoshua Bengio. Neural Computation 20.6 (2008): 1631-1649.
- Temporal-Kernel Recurrent Neural Networks. Sutskever, Ilya, and Geoffrey Hinton. Neural Networks 23.2 (2010): 239-243. cited by 20.
- Deep belief networks are compact universal approximators. Le Roux, Nicolas, and Yoshua Bengio. Neural computation 22.8 (2010): 2192-2207. cited by 74.
- On the expressive power of deep architectures. Bengio, Yoshua, and Olivier Delalleau. Algorithmic Learning Theory. Springer Berlin/Heidelberg, 2011. cited by 110.
- When Does a Mixture of Products Contain a Product of Mixtures?. Montufar, Guido F., and Jason Morton. arXiv preprint arXiv:1206.0387 (2012). cited by 23.
- On the Number of Linear Regions of Deep Neural Networks. Montúfar, Guido, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. arXiv preprint arXiv:1402.1869 (2014). cited by 101.
- The Manifold Tangent Classifier. Salah Rifai, Yann Dauphin, Pascal Vincent, Yoshua Bengio and Xavier Muller, in: NIPS’2011. cited by 111.
- Discriminative Learning of Sum-Product Networks. Gens, Robert, and Pedro Domingos, NIPS 2012 Best Student Paper. cited by 105.
- Maxout networks. Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013). Technical Report, Université de Montréal; published at ICML 2013.
- Fast dropout training. Wang, Sida, and Christopher Manning. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 118-126. 2013. cited by 103.
- Deep sparse rectifier neural networks. Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP Volume, vol. 15, pp. 315-323. 2011. cited by 807.
- Building High-level Features Using Large Scale Unsupervised Learning. Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng, ICML 2012.
- Neural probabilistic language models. Bengio, Yoshua, et al. Innovations in Machine Learning (2006): 137-186. Section 3 of this paper discusses asynchronous SGD.
- Large scale distributed deep networks. Dean, Jeffrey, et al. Advances in Neural Information Processing Systems. 2012. cited by 797.
- Training Recurrent Neural Networks. Ilya Sutskever, PhD thesis, University of Toronto, 2012. cited by 115.
- Learning long-term dependencies with gradient descent is difficult. Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. Neural Networks, IEEE Transactions on 5.2 (1994): 157-166.
- Long short-term memory. Hochreiter, Sepp, and Jürgen Schmidhuber. Neural computation 9.8 (1997): 1735-1780. cited by 3787.
- Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). In A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press.
- Learning complex, extended sequences using the principle of history compression. Schmidhuber, J. (1992). Neural Computation, 4(2), 234-242. cited by 213.
- Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006, June). In Proceedings of the 23rd international conference on Machine learning (pp. 369-376). ACM.
- Training Deep and Recurrent Neural Networks with Hessian-Free Optimization, James Martens and Ilya Sutskever, Neural Networks: Tricks of the Trade, 2012. cited by 54.
- No More Pesky Learning Rates. Schaul, Tom, Sixin Zhang, and Yann LeCun. arXiv preprint arXiv:1206.1106 (2012). cited by 132.
- Topmoumoute online natural gradient algorithm. Le Roux, Nicolas, Pierre-Antoine Manzagol, and Yoshua Bengio. Neural Information Processing Systems (NIPS). 2007. cited by 85.
- SGD-QN: Careful quasi-Newton stochastic gradient descent. Bordes, Antoine, Léon Bottou, and Patrick Gallinari. The Journal of Machine Learning Research 10 (2009): 1737-1754. cited by 205.
- Understanding the difficulty of training deep feedforward neural networks. Glorot, Xavier, and Yoshua Bengio. Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10). Society for Artificial Intelligence and Statistics. 2010. cited by 1112.
- Deep learning via Hessian-free optimization. Martens, James. Proceedings of the 27th International Conference on Machine Learning (ICML). Vol. 951. 2010. cited by 337.
- Flat minima. Hochreiter, Sepp, and Jürgen Schmidhuber. Neural Computation, 9.1 (1997): 1-42. cited by 103.
- Revisiting natural gradient for deep networks. Pascanu, Razvan, and Yoshua Bengio. arXiv preprint arXiv:1301.3584 (2013). cited by 61.
- Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Dauphin, Yann N., Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. In Advances in Neural Information Processing Systems, pp. 2933-2941. 2014. cited by 160.
- Deep Boltzmann machines. Salakhutdinov, Ruslan, and Geoffrey E. Hinton. Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2009. cited by 889.
- Scholarpedia page on Deep Belief Networks.
- An Efficient Learning Procedure for Deep Boltzmann Machines, Ruslan Salakhutdinov and Geoffrey Hinton, Neural Computation August 2012, Vol. 24, No. 8: 1967-2006. cited by 273.
- Deep Boltzmann Machines and the Centering Trick. Montavon, Grégoire, and Klaus-Robert Müller. Neural Networks: Tricks of the Trade (2012): 621-637.
- Efficient learning of deep boltzmann machines. Salakhutdinov, Ruslan, and Hugo Larochelle. International Conference on Artificial Intelligence and Statistics. 2010. cited by 271.
- Learning deep generative models. Salakhutdinov, Ruslan. Diss. University of Toronto, 2009. cited by 93.
- Multi-prediction deep Boltzmann machines. Goodfellow, Ian, et al. Advances in Neural Information Processing Systems. 2013.
- Unsupervised Models of Images by Spike-and-Slab RBMs, Aaron Courville, James Bergstra and Yoshua Bengio, in: ICML’2011. cited by 49.
- Regularized Auto-Encoders Estimate Local Statistics, Guillaume Alain, Yoshua Bengio and Salah Rifai, Université de Montréal, arXiv report 1211.4246, 2012.
- A Generative Process for Sampling Contractive Auto-Encoders. Salah Rifai, Yoshua Bengio, Yann Dauphin and Pascal Vincent, in: ICML’2012, Edinburgh, Scotland, U.K., 2012.
- Contractive Auto-Encoders: Explicit invariance during feature extraction, Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot and Yoshua Bengio, in: ICML’2011.
- Disentangling factors of variation for facial expression recognition, Salah Rifai, Yoshua Bengio, Aaron Courville, Pascal Vincent and Mehdi Mirza, in: ECCV’2012.
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Vincent, Pascal, et al. The Journal of Machine Learning Research 11 (2010): 3371-3408.
- A connection between score matching and denoising autoencoders. Vincent, Pascal. Neural computation 23.7 (2011): 1661-1674.
- Marginalized denoising autoencoders for domain adaptation. Chen, Minmin, et al. arXiv preprint arXiv:1206.4683 (2012).
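For readers new to the list, a few of the entries above describe techniques simple enough to sketch in code. The dropout entry (Improving neural networks by preventing co-adaptation of feature detectors, Hinton et al., 2012) amounts to randomly zeroing units during training; a minimal NumPy sketch of the commonly used "inverted" variant, with illustrative names and shapes not taken from the paper:

```python
import numpy as np

def dropout(x, p_drop, rng, train=True):
    # Inverted dropout: during training, zero each unit with probability
    # p_drop and rescale the survivors by 1/(1 - p_drop) so the expected
    # activation is unchanged; at test time the layer is the identity.
    if not train or p_drop == 0.0:
        return x
    mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
    return x * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((4, 8))
h_train = dropout(h, 0.5, rng)              # each unit is either 0.0 or 2.0
h_test = dropout(h, 0.5, rng, train=False)  # identity at test time
```

The rescaling during training (rather than at test time, as in the original paper) is a widespread equivalent formulation that leaves inference code untouched.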
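Similarly, the corruption step at the heart of the stacked denoising autoencoder entries (Vincent et al., 2010, and the score-matching connection, Vincent, 2011) can be sketched in a few lines; one common choice is masking noise, shown here with an illustrative function name and toy data, not code from the papers:

```python
import numpy as np

def masking_corruption(x, nu, rng):
    # Corrupt the input by zeroing a random fraction nu of its entries;
    # a denoising autoencoder is trained to reconstruct the clean x
    # from this corrupted version, forcing it to learn structure in the
    # data rather than the identity map.
    return x * (rng.random(x.shape) >= nu)

rng = np.random.default_rng(1)
x = np.ones((2, 10))
x_tilde = masking_corruption(x, 0.3, rng)  # noisy copy fed to the encoder
```

The clean input x is kept as the reconstruction target; only the encoder sees x_tilde.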