
ENH: Gamma Scaling Experiments #71

@nray

Description


Context
In Kosta’s papers, single- and two-layer NNs are represented as random objects and their bias and variance are studied. In particular, a scaled normalization of each layer’s output is introduced before it is fed to the next layer, as in Equation 1 of the reference *Normalization effects on deep neural networks*.

We are interested in whether such scaling has any effect on the performance of RVFL-type networks. Do numerical studies give any guidance on whether gamma scaling affects the RVFL's generalization properties?

In the rest of this issue, I use the term gamma to mean gamma-scaled normalization.
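As a concrete anchor, here is a minimal sketch of what gamma-scaled normalization means for one hidden layer. I am assuming the factor is $N^{-\gamma}$ applied to the post-activation output, where $N$ is the layer width, following Equation 1 of the reference; the function name and exact placement of the factor are my own illustration:

```python
import numpy as np

def gamma_scaled_layer(X, W, b, gamma, activation=np.tanh):
    """Hidden-layer output scaled by N**(-gamma), where N is the layer
    width. gamma = 1/2 recovers the usual 1/sqrt(N) scaling; gamma = 1
    gives mean-field scaling. (Sketch only; placement of the factor is
    an assumption based on Eq. 1 of the cited reference.)"""
    N = W.shape[1]                # number of hidden units in this layer
    H = activation(X @ W + b)     # activation applied first
    return H / N**gamma           # then the gamma scaling

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))       # 5 samples, 3 features
W = rng.normal(size=(3, 100))     # random (fixed) RVFL hidden weights
b = rng.normal(size=100)
H = gamma_scaled_layer(X, W, b, gamma=0.5)
```

Note that for a fixed width, changing $\gamma$ only rescales the design matrix by a constant factor; the interesting effects show up as $N$ grows.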

General questions:

  1. Is there a universal gamma, or a range of gammas, that leads to consistent performance across multiple datasets?
    1. This is unlikely and conflicts with the no-free-lunch theorem.
    2. Kosta’s response seemed to agree that there is no “free lunch” here.
  2. Are there specific properties of the dataset that correlate with gamma, so that we can tell a priori whether such scaling would improve RVFL accuracy?
    1. Kosta provided an expression for the variance, $\sigma(NN) = C N^{2/\gamma - 1} e^{-At}$ as $N \to \infty$, where $t$ is the running time, $C(X, w, a)$ is a constant depending on the input $X$, the initialization $w$, and the activation $a$, and $A$ is a positive-definite matrix whose eigenvalues also depend on $X, w, a$. However, the functional forms of $C$ and $A$ in terms of $X, w, a$ are not clear.
    2. Also, how the variance $\sigma(NN)$ relates to the accuracy of the function approximation is something I do not fully understand.

Experiment assumptions
Activation function: non-polynomial and slowly increasing, e.g., $\tanh$, sigmoid
Gamma $\gamma$: $[1/2, 1]$
Initialization weight distribution: normal

Experiment 1: Study the effect of $\gamma$ on solution accuracy for RVFL with the direct solve. Is there one specific value that gives the best approximation?

Requirement: We need to scale the output of each layer before feeding it to the next layer, as described in the reference.

How to code this up in the GFDL library:

  1. Introduce an extra parameter to our GFDL base class, as in here.
  2. Scale the design matrix for each layer after the activation function has been applied, as in here.

Note: This design does not scale the input to the first layer or the direct links.
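The two steps above could look roughly like the following. The class and attribute names here are hypothetical, not the actual GFDL API; the point is only where the scaling factor is applied relative to the activation and the direct links:

```python
import numpy as np

class GammaScaledRVFL:
    """Illustrative sketch (hypothetical names, not the GFDL API):
    gamma is stored as an extra constructor parameter, and each layer's
    design matrix is scaled after the activation is applied."""

    def __init__(self, in_dim, widths, gamma=0.5, activation=np.tanh, seed=0):
        self.gamma = gamma
        self.activation = activation
        rng = np.random.default_rng(seed)
        self.layers = []
        dim = in_dim
        for n in widths:
            self.layers.append((rng.normal(size=(dim, n)), rng.normal(size=n)))
            dim = n

    def transform(self, X):
        blocks = [X]  # direct links: left unscaled, per the note above
        H = X
        for W, b in self.layers:
            H = self.activation(H @ W + b)     # step 2a: activation first...
            H = H / W.shape[1] ** self.gamma   # ...then scale the design matrix
            blocks.append(H)
        return np.hstack(blocks)

model = GammaScaledRVFL(in_dim=3, widths=[10, 20], gamma=0.5)
Z = model.transform(np.ones((4, 3)))   # stacked design matrix for the solve
```

The raw input block in `blocks[0]` is intentionally never scaled, matching the note that neither the first layer's input nor the direct links are touched.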

Experiment 2: Confirm experiment 1 with an iterative solve.

Kosta’s response to why an iterative solver is needed:

The direct solution is great, but it is also hard to analyze analytically. If the solvers converge, it makes sense to analyze the limit for large enough N. If they do not converge, it could mean different things: the algorithm may have hit a local minimum, or the failure may be an artifact of the numerical algorithm. The hope is that it should converge.

Constraint: We need to specify the learning rate explicitly, so the learning rate has to be a hyper-parameter, or at the very least exposed through the solver API even if it is constant.
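To illustrate why the learning rate must be explicit, here is a minimal full-batch gradient-descent solver for the least-squares problem; this is an illustration, not the library's solver, and convergence hinges entirely on the step size `lr` being chosen appropriately:

```python
import numpy as np

def lstsq_gd(H, y, lr, n_iter=5000):
    """Full-batch gradient descent for min ||H @ beta - y||^2 / n.
    lr must satisfy lr < 2 / lambda_max((2/n) H.T @ H) for convergence,
    which is why it cannot be hidden inside the solver."""
    beta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ beta - y) / len(y)
        beta -= lr * grad
    return beta

rng = np.random.default_rng(1)
H = rng.normal(size=(50, 4))             # stand-in for an RVFL design matrix
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = H @ beta_true                        # noiseless synthetic target
beta = lstsq_gd(H, y, lr=0.1)
```

Because the least-squares objective is convex, a non-converging run here can only come from a bad step size or numerical issues, which is exactly the ambiguity mentioned in Kosta's response above.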

Why can the Ridge solver in the current library not be re-purposed?

  1. Currently, `reg_alpha=None` always leads to the direct solver path.
  2. Even if we manage to call the Ridge solver when `reg_alpha=None`, the learning rate for the SGD-style solver used by Ridge is a heuristic and not exposed through its API.

How to code this up ?

  1. Implement standalone solvers (the current approach, but I am considering the second option to avoid maintaining such solvers ourselves).
  2. Use other SGD implementations such as `torch.optim` or scikit-learn's `SGDRegressor`.

The current design for exposing iterative solvers for the ordinary least-squares formulation is to add a solver type to the `fit` method's arguments, as in here, along with kwargs to pass solver-specific arguments through.
