harsh306/awesome-nn-optimization

Content

Popular Optimization Algorithms

Normalization Methods

  • BatchNorm [Link]
  • Weight Norm [Link]
  • Spectral Norm [Link]
  • Cosine Normalization [Link]
  • L2 Regularization versus Batch and Weight Normalization Link
  • WHY GRADIENT CLIPPING ACCELERATES TRAINING: A THEORETICAL JUSTIFICATION FOR ADAPTIVITY Link
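
The papers above cover normalizing activations or weights and clipping gradients. As a rough illustration (not taken from any single paper; function names, epsilon, and threshold values are illustrative), here is a minimal NumPy sketch of a batch-norm forward pass and clipping a set of gradients by their global norm:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a (batch, features) activation to zero mean / unit variance
    per feature, then rescale with learnable gamma and beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their combined L2 norm does not
    exceed max_norm (the mechanism analyzed in the clipping paper above)."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]
```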

On Convexity and Generalization of Neural Networks

  • Convex Neural Networks [Link]
  • Breaking the Curse of Dimensionality with Convex Neural Networks [Link]
  • UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION [Link]
  • Optimal Control Via Neural Networks: A Convex Approach. [Link]
  • Input Convex Neural Networks [Link]
  • A New Concept of Convex based Multiple Neural Networks Structure. [Link]
  • SGD Converges to Global Minimum in Deep Learning via Star-convex Path [Link]
  • A Convergence Theory for Deep Learning via Over-Parameterization Link
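
Several entries above construct networks that are convex in their input (e.g. Input Convex Neural Networks). A minimal sketch of such a forward pass, assuming non-negative hidden-to-hidden weights and a convex, non-decreasing activation (ReLU), which is what keeps the output convex in x; layer names and shapes are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def icnn_forward(x, Wx_list, Wz_list, b_list):
    """Input-convex forward pass: z_{k+1} = relu(Wz_k @ z_k + Wx_k @ x + b_k)
    with Wz_k >= 0. Non-negative Wz and a convex non-decreasing activation
    keep the output convex as a function of x."""
    z = relu(Wx_list[0] @ x + b_list[0])
    for Wx, Wz, b in zip(Wx_list[1:], Wz_list, b_list[1:]):
        z = relu(Wz.clip(min=0.0) @ z + Wx @ x + b)
    return z
```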

Continuation Methods and Curriculum Learning

  • Curriculum Learning [Link]
  • SOLVING RUBIK’S CUBE WITH A ROBOT HAND Link
  • Noisy Activation Function [Link]
  • Mollifying Networks [Link]
  • Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks Link Talk
  • Automated Curriculum Learning for Neural Networks Link
  • On The Power of Curriculum Learning in Training Deep Networks Link
  • On-line Adaptative Curriculum Learning for GANs Link
  • Parameter Continuation with Secant Approximation for Deep Neural Networks and Step-up GAN Link
  • HashNet: Deep Learning to Hash by Continuation. [Link]
  • Learning Combinations of Activation Functions. [Link]
  • Learning and development in neural networks: The importance of starting small (1993) Link
  • Flexible shaping: How learning in small steps helps Link
  • Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning Link
  • RETHINKING CURRICULUM LEARNING WITH INCREMENTAL LABELS AND ADAPTIVE COMPENSATION Link
  • Parameter Continuation Methods for the Optimization of Deep Neural Networks Link
  • Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection [Link](https://www.aclweb.org/anthology/W18-6314.pdf)
  • Reinforcement Learning based Curriculum Optimization for Neural Machine Translation Link
  • EVOLUTIONARY POPULATION CURRICULUM FOR SCALING MULTI-AGENT REINFORCEMENT LEARNING Link
  • ENTROPY-SGD: BIASING GRADIENT DESCENT INTO WIDE VALLEYS Link
  • NEIGHBOURHOOD DISTILLATION: ON THE BENEFITS OF NON END-TO-END DISTILLATION Link
  • LEARNING TO EXECUTE Link
  • Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing Link
  • Data Parameters: A New Family of Parameters for Learning a Differentiable Curriculum Link
  • Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search Link
  • Continuation Methods and Curriculum Learning for Learning to Rank Link
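
Most curriculum-learning papers above share one mechanism: score examples by difficulty and expose the model to progressively larger (harder) subsets. A minimal sketch of that pacing idea, assuming a user-supplied difficulty score per example (the linear pacing function and all names are illustrative):

```python
import numpy as np

def curriculum_batches(X, y, difficulty, epoch, total_epochs,
                       batch_size=32, start_frac=0.2):
    """Yield minibatches from the easiest fraction of the data, growing
    that fraction linearly from start_frac to 1.0 over training."""
    order = np.argsort(difficulty)              # easiest examples first
    frac = start_frac + (1.0 - start_frac) * min(1.0, epoch / total_epochs)
    pool = order[: max(batch_size, int(frac * len(X)))]
    np.random.shuffle(pool)
    for i in range(0, len(pool), batch_size):
        idx = pool[i:i + batch_size]
        yield X[idx], y[idx]
```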

On Loss Surfaces and Generalization of Deep Neural Networks

  • Exact solutions to the nonlinear dynamics of learning in deep linear neural networks Link
  • QUALITATIVELY CHARACTERIZING NEURAL NETWORK OPTIMIZATION PROBLEMS [Link]
  • The Loss Surfaces of Multilayer Networks [Link]
  • Visualizing the Loss Landscape of Neural Nets [Link]
  • The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens [Link]
  • How regularization affects the critical points in linear networks. [Link]
  • Local minima in training of neural networks [Link]
  • Necessary and Sufficient Geometries for Gradient Methods Link
  • Fine-grained Optimization of Deep Neural Networks Link
  • SCORE-BASED GENERATIVE MODELING THROUGH STOCHASTIC DIFFERENTIAL EQUATIONS Link
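
A recurring tool in the loss-surface papers above is plotting the loss along a line or random direction in parameter space. A minimal sketch of that idea, assuming a user-supplied loss(params) callable on a flat parameter vector (the filter normalization used by Li et al. is omitted for brevity):

```python
import numpy as np

def loss_along_direction(loss_fn, params, direction=None,
                         radius=1.0, num_points=51):
    """Evaluate loss_fn(params + alpha * direction) for alpha in
    [-radius, radius]; with a random unit direction this gives a 1-D
    slice of the loss surface around the current parameters."""
    if direction is None:
        direction = np.random.randn(*params.shape)
    direction = direction / np.linalg.norm(direction)
    alphas = np.linspace(-radius, radius, num_points)
    losses = [loss_fn(params + a * direction) for a in alphas]
    return alphas, np.array(losses)
```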

Dynamics, Bifurcations, and the Difficulty of Training RNNs

  • Deep Equilibrium Models Link
  • Bifurcations of Recurrent Neural Networks in Gradient Descent Learning [Link]
  • On the difficulty of training recurrent neural networks [Link]
  • Understanding and Controlling Memory in Recurrent Neural Networks [Link]
  • Dynamics and Bifurcation of Neural Networks [Link]
  • Context Aware Machine Learning [Link]
  • The trade-off between long-term memory and smoothness for recurrent networks [Link]
  • Dynamical complexity and computation in recurrent neural networks beyond their fixed point [Link]
  • Bifurcations in discrete-time neural networks: controlling complex network behaviour with inputs [Link]
  • Interpreting Recurrent Neural Networks Behaviour via Excitable Network Attractors [Link]
  • Bifurcation analysis of a neural network model Link
  • A Differentiable Physics Engine for Deep Learning in Robotics Link
  • Deep learning for universal linear embeddings of nonlinear dynamics Link
  • Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations Link
  • Analysis of gradient descent learning algorithms for multilayer feedforward neural networks Link
  • A dynamical model for the analysis and acceleration of learning in feedforward networks Link
  • A bio-inspired bistable recurrent cell allows for long-lasting memory Link
  • Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation [Link](https://www.frontiersin.org/articles/10.3389/fncom.2017.00024/full)
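
A common thread in the RNN papers above is that backpropagated gradients pass through a long product of Jacobians, so their norm grows or shrinks roughly with the spectral radius of the recurrent weight matrix. A tiny numerical illustration of that effect for a linear recurrence (the matrices and step count are arbitrary):

```python
import numpy as np

def gradient_norm_through_time(W, steps=100, seed=0):
    """Track how the norm of a backpropagated vector evolves when it is
    repeatedly multiplied by the recurrent Jacobian W^T (linear case)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(W.shape[0])
    norms = []
    for _ in range(steps):
        g = W.T @ g
        norms.append(np.linalg.norm(g))
    return np.array(norms)

# spectral radius < 1 -> vanishing gradients, > 1 -> exploding gradients
W_small = 0.9 * np.eye(8)
W_large = 1.1 * np.eye(8)
print(gradient_norm_through_time(W_small)[-1])   # close to zero
print(gradient_norm_through_time(W_large)[-1])   # very large
```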

Poor Local Minima and Sharp Minima

  • Adding One Neuron Can Eliminate All Bad Local Minima Link
  • Deep Learning without Poor Local Minima Link
  • Elimination of All Bad Local Minima in Deep Learning Link
  • How to escape saddle points efficiently. Link
  • Depth with Nonlinearity Creates No Bad Local Minima in ResNets Link
  • Sharp Minima Can Generalize For Deep Nets Link
  • Asymmetric Valleys: Beyond Sharp and Flat Local Minima Link
  • A Reparameterization-Invariant Flatness Measure for Deep Neural Networks Link
  • A Simple Weight Decay Can Improve Generalization Link
  • Finding Critical and Gradient-Flat Points of Deep Neural Network Loss Functions Link
  • The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens Link
  • Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization Link
  • Flatness is a False Friend Link
  • Are Saddles Good Enough for Deep Learning Link
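
Many of the flat-vs-sharp-minima papers above estimate sharpness by how much the loss rises under small weight perturbations. A minimal Monte-Carlo sketch of that estimate, assuming a user-supplied loss(params) callable; the perturbation scale and names are illustrative, and the reparameterization caveats raised in the papers above still apply:

```python
import numpy as np

def sharpness_estimate(loss_fn, params, radius=0.05, num_samples=20, seed=0):
    """Average increase in loss when the parameters are perturbed by random
    noise of norm `radius`; larger values indicate a sharper minimum."""
    rng = np.random.default_rng(seed)
    base = loss_fn(params)
    increases = []
    for _ in range(num_samples):
        noise = rng.standard_normal(params.shape)
        noise *= radius / np.linalg.norm(noise)
        increases.append(loss_fn(params + noise) - base)
    return float(np.mean(increases))
```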

Initialization of Neural Network

  • Deep learning course notes Link
  • On the importance of initialization and momentum in deep learning Link
  • The Break-Even Point on Optimization Trajectories of Deep Neural Networks Link
  • THE EARLY PHASE OF NEURAL NETWORK TRAINING Link
  • One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers Link
  • PCA-Initialized Deep Neural Networks Applied To Document Image Analysis Link
  • Understanding the difficulty of training deep feedforward neural networks Link
  • Unitary Evolution of RNNs Link
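
The initialization papers above motivate variance-preserving schemes such as Glorot/Xavier and He initialization. A minimal NumPy sketch of both rules, assuming a dense layer with fan_in inputs and fan_out outputs:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    """Glorot & Bengio (2010): keep activation variance roughly constant
    across layers for tanh-like units."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def he_normal(fan_in, fan_out, rng=None):
    """He et al. (2015): scale for ReLU units, which zero half the inputs."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))
```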

Momentum in Optimization

  • RETHINKING THE HYPERPARAMETERS FOR FINE-TUNING Link
  • Momentum Residual Neural Networks Link
  • Smooth momentum: improving lipschitzness in gradient descent Link
  • Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning link
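
The momentum papers above build on the classical heavy-ball update v ← μv − η∇f(w), w ← w + v. A minimal sketch of one such step, with Nesterov's look-ahead variant as a flag (step sizes are illustrative):

```python
def momentum_step(w, v, grad_fn, lr=0.01, mu=0.9, nesterov=False):
    """One heavy-ball / Nesterov momentum update; grad_fn(w) returns the
    gradient of the objective at w."""
    if nesterov:
        g = grad_fn(w + mu * v)   # gradient at the look-ahead point
    else:
        g = grad_fn(w)
    v = mu * v - lr * g
    return w + v, v
```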

Batch Size Optimization

  • ON LARGE-BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA Link
  • Revisiting Small Batch Training for Deep Neural Networks Link
  • LARGE BATCH TRAINING OF CONVOLUTIONAL NETWORKS Link
  • Large Batch Optimization for Deep Learning: Training BERT in 76 minutes Link
  • DON’T DECAY THE LEARNING RATE, INCREASE THE BATCH SIZE Link
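
Two practical rules recur in the large-batch papers above: scale the learning rate with the batch size, and grow the batch size instead of decaying the learning rate. A minimal schedule sketch (base values, milestones, and the growth factor are illustrative):

```python
def scaled_lr(base_lr, base_batch, batch_size):
    """Linear scaling rule: the learning rate grows in proportion to batch size."""
    return base_lr * batch_size / base_batch

def batch_size_schedule(epoch, base_batch=256, milestones=(30, 60, 80)):
    """Instead of decaying the learning rate at each milestone, multiply the
    batch size (the 'don't decay the learning rate' recipe listed above)."""
    factor = sum(1 for m in milestones if epoch >= m)
    return base_batch * (2 ** factor)
```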

Degeneracy of Neural Networks

  • Exact solutions to the nonlinear dynamics of learning in deep linear neural networks Link
  • Avoiding pathologies in very deep networks Link
  • Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice Link
  • SKIP CONNECTIONS ELIMINATE SINGULARITIES Link
  • How degenerate is the parametrization of neural networks with the ReLU activation function? Link
  • Theory of Deep Learning III: explaining the non-overfitting puzzle Link
  • Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks Link
  • Understanding Deep Learning: Expected Spanning Dimension and Controlling the Flexibility of Neural Networks Link
  • The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens Link
  • PYHESSIAN: Neural Networks Through the Lens of the Hessian Link
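
Several degeneracy papers above (dynamical isometry, orthogonal initialization) start from orthogonal weight matrices, so singular values begin at 1 and signals neither shrink nor blow up through depth. A minimal sketch via QR decomposition of a Gaussian matrix (gain and shapes are illustrative):

```python
import numpy as np

def orthogonal_init(fan_out, fan_in, gain=1.0, seed=0):
    """Return a (fan_out, fan_in) matrix with orthonormal rows or columns,
    obtained from the QR decomposition of a random Gaussian matrix."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((max(fan_out, fan_in), min(fan_out, fan_in)))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))          # fix signs to make the factorization unique
    if fan_out < fan_in:
        q = q.T
    return gain * q
```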

Convergence Analysis in Deep Learning

  • A CONVERGENCE ANALYSIS OF GRADIENT DESCENT FOR DEEP LINEAR NEURAL NETWORKS Link
  • A Convergence Theory for Deep Learning via Over-Parameterization Link
  • Convergence Analysis of Homotopy-SGD for Non-Convex Optimization Link

Multi-Task Learning with curricula

  • Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. Link
  • Learning a Multitask Curriculum for Neural Machine Translation. Link
  • Self-paced Curriculum Learning. Link
  • Curriculum Learning of Multiple Tasks. Link

Constrained Optimization for Deep Learning

  • A Primal-Dual Formulation for Deep Learning with Constraints Link
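
The primal-dual formulation above turns a constrained problem min f(w) s.t. g(w) ≤ 0 into a saddle-point problem on the Lagrangian, descending in the weights and ascending in the multiplier. A minimal sketch of one such step, assuming the caller supplies differentiable f and g (step sizes and names are illustrative):

```python
def primal_dual_step(w, lam, grad_f, grad_g, g, lr_w=0.01, lr_lam=0.01):
    """One step on L(w, lam) = f(w) + lam * g(w): gradient descent in w,
    projected gradient ascent in lam >= 0."""
    w = w - lr_w * (grad_f(w) + lam * grad_g(w))
    lam = max(0.0, lam + lr_lam * g(w))
    return w, lam
```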

Reinforcement Learning and Curriculum

  • Object-Oriented Curriculum Generation for Reinforcement Learning Link
  • Teacher-Student Curriculum Learning Link
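
Teacher-Student Curriculum Learning above picks the next training task by estimated learning progress. A rough epsilon-greedy sketch of that teacher, assuming the caller keeps a recent score history per task (all names and the progress estimate are illustrative):

```python
import numpy as np

def pick_task(score_histories, eps=0.1, seed=None):
    """Choose the task whose recent scores show the largest absolute slope
    (learning progress); explore uniformly with probability eps."""
    rng = np.random.default_rng(seed)
    if rng.random() < eps or any(len(h) < 2 for h in score_histories):
        return int(rng.integers(len(score_histories)))
    progress = [abs(h[-1] - h[0]) / (len(h) - 1) for h in score_histories]
    return int(np.argmax(progress))
```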

Tutorials, Surveys and Blogs

  • Curriculum Learning: A Survey Link
  • A Comprehensive Survey on Curriculum Learning Link
  • Off the Convex Path blog: https://www.offconvex.org/
  • An overview of gradient descent optimization algorithms [Link]
  • Review of second-order optimization techniques in artificial neural networks backpropagation Link
  • Linear Algebra and data Link
  • Why Momentum Really Works [Blog]
  • Optimization [Book]
  • Optimization for deep learning: theory and algorithms Link
  • Generalization Error in Deep Learning Link
  • Automatic Differentiation in Machine Learning: a Survey Link
  • Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey Link
  • Automatic Curriculum Learning For Deep RL: A Short Survey Link
  • The Generalization Mystery: Sharp vs Flat Minima Link

Contributing

If you've found any informative resources that you think belong here, be sure to submit a pull request or create an issue!

If you find this helpful, consider buying me a coffee :)