SGDFusion function issue #256
Comments
I think you are right. Could you contribute a PR for us to merge?
Yes, I'll do the PR. Thanks.
@minkkang Hi, could you move GetLocalRate() behind regularization(), as the latter also changes the learning_param.diff? That would make it exactly the same as the no-fusion version.
Hi. When I analyzed the "SGDFusion" version (GetLocalRate -> Regularize & update) and the "non-SGDFusion" version (Regularize -> GetLocalRate -> update), I'm not sure, but I think the "SGDFusion" version is right if GetLocalRate (LARS) is executed after Normalize.

Reading the LARS paper ("Large Batch Training of Convolutional Networks with Layer-wise Adaptive Rate Scaling" [1]), the flow of LARS is as follows:

Parameters: base LR γ0, momentum m, weight decay β, LARS coefficient η, number of steps T

g[t] ← ∇L(w[t])  // obtain a stochastic gradient for the current mini-batch (1)
γ[t] ← γ0 * (1 − t/T)^2  // compute the global learning rate (2)
λ ← ||w[t]|| / (||g[t]|| + β * ||w[t]||)  // compute the local LR λ (3)
v[t+1] ← m * v[t] + γ[t+1] * λ * (g[t] + β * w[t])  // update the momentum (4)
w[t+1] ← w[t] − v[t+1]  // update the weights (5)

But the flow of the "non-SGDFusion" version is as follows:

g[t] ← ∇L(w[t])  // call the Normalize function
g[t] ← β * w[t] + g[t]  // call the ComputeUpdateValue function
λ ← ||w[t]|| / (||g[t]|| + β * ||w[t]||)  // compute the local LR λ (3), but on the already-regularized g[t]
v[t+1] ← m * v[t] + γ[t+1] * λ * g[t]  // update the momentum (4)
w[t+1] ← w[t] − v[t+1]  // update the weights (5)

In this flow, the v[t+1] value is changed from the LARS original.

The flow of the "SGDFusion" version is as follows:

g[t] ← ∇L(w[t])  // call the SGDFusion function
λ ← ||w[t]|| / (||g[t]|| + β * ||w[t]||)  // compute the local LR λ (3)
// execute Normalize (it should be executed before GetLocalRate)
v[t+1] ← m * v[t] + γ[t+1] * λ * (g[t] + β * w[t])  // update the momentum (4)
w[t+1] ← w[t] − v[t+1]  // update the weights (5)

In this flow, the v[t+1] value is the same as the LARS original:

v[t+1] ← m * v[t] + γ[t+1] * { ||w[t]|| / (||∇L(w[t])|| + β * ||w[t]||) } * (∇L(w[t]) + β * w[t])  // LARS original

I think the "SGDFusion" version is the same as the LARS algorithm in [1]. So I think we just have to change the flow so that GetLocalRate is executed after normalization. If I'm right, I'll change the "non-SGDFusion" version.
@ftian1
Sorry for the late response due to Chinese New Year. Yes, I think your analysis is right; the non-fusion version should be updated.
Thank you for the reply. I'll do a PR after changing the flow.
Hi, I'm an Intel Caffe user.
I think I found a wrong flow in the SGDFusion function (sgd_solver.cpp).
When using the GCC compiler, or when not using iter_size, there is no problem. But when using the Intel compiler together with iter_size, LARS has a problem.
As far as I know, when using the Intel compiler, the SGD_FUSION option is turned on.
In the SGD_FUSION flow, execution happens in the order GetLocalRate (which includes LARS), then Normalize, then regularization & update.
Normalize divides the diff data (mutable_cpu_diff or mutable_prv_diff) by iter_size, but LARS is affected by sumsq_diff and sumsq_data.
So I think GetLocalRate should be executed after Normalize.
After changing the SGD_FUSION flow to Normalize -> GetLocalRate -> regularization & update, LARS works fine.
Would you check SGD_FUSION?
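The proposed ordering can be sketched as below. The function and type names are hypothetical stand-ins for the Normalize / GetLocalRate / Regularize steps discussed in this issue, not the actual Intel Caffe code in sgd_solver.cpp, and momentum is omitted for brevity:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a parameter blob: weights plus accumulated gradient.
struct Blob {
  std::vector<double> data;  // weights w
  std::vector<double> diff;  // gradient g, accumulated over iter_size batches
};

// Divide the accumulated diff by iter_size (sketch of Normalize).
void Normalize(Blob& b, int iter_size) {
  for (double& d : b.diff) d /= iter_size;
}

// LARS local LR, step (3): lambda = ||w|| / (||g|| + beta * ||w||)
// (sketch of GetLocalRate; uses sumsq_data and sumsq_diff as described above).
double GetLocalRate(const Blob& b, double beta) {
  double sumsq_data = 0.0, sumsq_diff = 0.0;
  for (double x : b.data) sumsq_data += x * x;
  for (double x : b.diff) sumsq_diff += x * x;
  double nw = std::sqrt(sumsq_data), ng = std::sqrt(sumsq_diff);
  return nw / (ng + beta * nw);
}

// Add the weight-decay term: g <- g + beta * w (sketch of Regularize).
void Regularize(Blob& b, double beta) {
  for (std::size_t i = 0; i < b.diff.size(); ++i) b.diff[i] += beta * b.data[i];
}

// Proposed order: Normalize -> GetLocalRate -> Regularize -> update.
void Step(Blob& b, int iter_size, double beta, double lr) {
  Normalize(b, iter_size);                // must come before GetLocalRate
  double lambda = GetLocalRate(b, beta);  // LARS now sees the normalized diff
  Regularize(b, beta);
  for (std::size_t i = 0; i < b.data.size(); ++i)
    b.data[i] -= lr * lambda * b.diff[i];  // plain SGD update, momentum omitted
}
```

The point of the sketch is only the ordering: GetLocalRate runs after Normalize, so iter_size no longer scales the LARS local rate.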