I solved only Problems 1, 2, 6, and 7, and note that my implementation is NOT perfect.
Everything is implemented in Python only; I'm sorry there is no MATLAB implementation.
- Problem 1: implement the batch steepest descent method and Newton's method (a sketch of both updates follows this list)
- Problem 2: implement the proximal gradient (PG) method (see the PG sketch below)
- Problem 6: the nuclear norm (its proximal operator is sketched below)
- Problem 7: implement Adam, AdaGrad, AdaDelta, RMSProp, and Nadam (some of these updates are sketched below)
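
For Problem 1, here is a minimal sketch of the two updates on a toy quadratic. It assumes the gradient and Hessian are supplied as callables; all names are illustrative, not my actual implementation:

```python
import numpy as np

def steepest_descent(grad, x0, lr=0.1, n_iter=100):
    """Batch steepest descent: step against the full gradient."""
    x = x0.copy()
    for _ in range(n_iter):
        x -= lr * grad(x)
    return x

def newton(grad, hess, x0, n_iter=20):
    """Newton's method: rescale the gradient step by the inverse Hessian."""
    x = x0.copy()
    for _ in range(n_iter):
        x -= np.linalg.solve(hess(x), grad(x))
    return x

# Toy example: minimize f(x) = 0.5 * x^T A x - b^T x
# (Newton solves this quadratic in a single step).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```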
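For Problem 2, a sketch of the proximal gradient method, assuming an l1-regularized least-squares objective (the lasso case covered in Vandenberghe's proxgrad notes linked below); the group-lasso variant changes only the prox to blockwise shrinkage:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, n_iter=500):
    """PG iteration for min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth gradient
    for _ in range(n_iter):
        g = A.T @ (A @ x - b)          # gradient of the smooth part
        x = soft_threshold(x - g / L, lam / L)
    return x
```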
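For Problem 6, the proximal operator of the nuclear norm is soft-thresholding of the singular values (see the math.stackexchange link below); a minimal sketch:

```python
import numpy as np

def prox_nuclear(X, t):
    """Prox of t * ||.||_* : soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt
```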
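For Problem 7, sketches of three of the update rules, with default hyperparameters taken from the respective papers/slides; AdaDelta and Nadam follow the same pattern (see Ruder's overview below). Again, these are illustrative, not my exact code:

```python
import numpy as np

def adagrad(grad, x0, lr=0.01, eps=1e-8, n_iter=1000):
    """AdaGrad (Duchi et al., 2011): divide by accumulated squared gradients."""
    x, G = x0.copy(), np.zeros_like(x0)
    for _ in range(n_iter):
        g = grad(x)
        G += g * g
        x -= lr * g / (np.sqrt(G) + eps)
    return x

def rmsprop(grad, x0, lr=0.001, rho=0.9, eps=1e-8, n_iter=1000):
    """RMSProp (Hinton's slides): exponential moving average of squared gradients."""
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(n_iter):
        g = grad(x)
        v = rho * v + (1 - rho) * g * g
        x -= lr * g / (np.sqrt(v) + eps)
    return x

def adam(grad, x0, lr=0.001, b1=0.9, b2=0.999, eps=1e-8, n_iter=1000):
    """Adam (Kingma & Ba, 2015): bias-corrected first and second moments."""
    x, m, v = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for t in range(1, n_iter + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)      # bias correction for the first moment
        v_hat = v / (1 - b2 ** t)      # bias correction for the second moment
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x
```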
See also the following references:
- https://jp.mathworks.com/help/matlab/ref/max.html
- https://www.programiz.com/python-programming/methods/built-in/max
- https://qiita.com/taka_horibe/items/9536931fbb26a6c51f6b
- https://www.cvxpy.org/tutorial/intro/index.html
- https://www.cvxpy.org/
- https://qiita.com/hagityann224/items/68fb4a07e90fffa0dadf
- https://medium.com/technology-nineleaps/logistic-regression-gradient-descent-optimization-part-1-ed320325a67e
- https://y-uti.hatenablog.jp/entry/2016/02/11/182602
- https://stats.stackexchange.com/questions/68391/hessian-of-logistic-function
- https://chrisyeh96.github.io/2018/06/11/logistic-regression.html#multinomial-logistic-regression-via-cross-entropy
- http://deeplearning.stanford.edu/tutorial/supervised/SoftmaxRegression/
- (important) https://houxianxu.github.io/2015/04/23/logistic-softmax-regression/
- https://www.sciencedirect.com/topics/mathematics/steepest-descent-method
- https://myenigma.hatenablog.com/entry/20141221/1419163905
- https://www.cs.cmu.edu/~mgormley/courses/10701-f16/slides/lecture5.pdf
- https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization
- http://www.seas.ucla.edu/~vandenbe/236C/lectures/proxgrad.pdf
- (important for group lasso) https://qiita.com/msekino/items/9f217fcd735513627f65
- https://stanford.edu/~boyd/papers/prox_algs/lasso.html
- (important for group lasso) https://qiita.com/AnchorBlues/items/4e50d3b98a40c8b3086e#%E3%82%82%E3%81%A3%E3%81%A8%E8%A9%B3%E3%81%97%E3%81%8F
- (Japanese) https://qiita.com/ZoneTsuyoshi/items/8ef6fa1e154d176e25b8
- (English) http://ruder.io/optimizing-gradient-descent/index.html#nadam
- (paper) https://arxiv.org/pdf/1609.04747.pdf
- (paper) https://arxiv.org/pdf/1412.6980.pdf
- (code) https://colab.research.google.com/drive/1Ll77yBSeHtjzkEYWDjFCOQJp0yNUpZJ-#scrollTo=chRrDYOAXowh
- (paper) http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
- (code same as above) https://colab.research.google.com/drive/1Ll77yBSeHtjzkEYWDjFCOQJp0yNUpZJ-#scrollTo=chRrDYOAXowh
- (unpublished but first appeared in) https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
- (paper) http://cs229.stanford.edu/proj2015/054_report.pdf
- (paper) https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ
- (code by T. Dozat) https://github.com/tdozat/Optimization/blob/master/tensorflow/nadam.py
- https://towardsdatascience.com/understanding-support-vector-machine-part-1-lagrange-multipliers-5c24a52ffc5e
- https://medium.com/machine-learning-101/chapter-2-svm-support-vector-machine-theory-f0812effc72
- https://math.stackexchange.com/questions/2009274/whats-the-proximal-operator-of-the-nuclear-norm-optimization-problem
- http://yamagensakam.hatenablog.com/entry/2018/02/14/075106
(worked on 2019/7/9 - 2019/7/31)