Suggested reading:
- Understanding the difficulty of training deep feedforward neural networks (Glorot & Bengio, AISTATS 2010)
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava et al., JMLR 2014)
- Adam: A Method for Stochastic Optimization (Kingma & Ba, ICLR 2015; see the update-rule sketch after this list)
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe & Szegedy, ICML 2015)
- Random Search for Hyper-Parameter Optimization (Bergstra & Bengio, JMLR 2012)
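
For the Adam entry above, the core of the method is a bias-corrected, moment-based parameter update. Below is a minimal NumPy sketch of a single Adam step; the update equations and default hyperparameters (lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8) follow the Kingma & Ba paper, while the function name `adam_step` and the convention of threading the optimizer state through arguments and return values are illustrative choices, not part of any particular library's API.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step (Kingma & Ba, ICLR 2015).

    w    : parameter array
    grad : gradient of the loss w.r.t. w
    m, v : running first- and second-moment estimates (same shape as w)
    t    : timestep, starting at 1 (needed for bias correction)
    """
    m = beta1 * m + (1 - beta1) * grad        # exponential moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # exponential moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction: moments start at zero
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v
```

Usage is a loop that initializes `m` and `v` to zeros of the same shape as `w` and increments `t` from 1 on each call; the bias correction terms matter mostly in the first few hundred steps, when the moving averages are still biased toward their zero initialization.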