
Why not initialize SAG/SAGA memory with 0 and divide by seen indices so far as in sklearn? #199

Open
NikZak opened this issue Oct 14, 2022 · 0 comments

Comments


NikZak commented Oct 14, 2022

Why don't you initialize the gradient memory with 0 and divide by the number of indices seen so far in the SAG algorithm, as suggested in the paper?

In the update of x in Algorithm 1, we normalize the direction d by the total number of data points n. When initializing with y_i = 0 we believe this leads to steps that are too small on early iterations of the algorithm where we have only seen a fraction of the data points, because many y_i variables contributing to d are set to the uninformative zero-vector. Following Blatt et al. [2007], the more logical normalization is to divide d by m, the number of data points that we have seen at least once

The SAGA paper suggests a similar procedure:

Our algorithm assumes that initial gradients are known for each f_i at the starting point x0. Instead, a heuristic may be used where during the first pass, data-points are introduced one-by-one, in a non-randomized order, with averages computed in terms of those data-points processed so far. This procedure has been successfully used with SAG [1].
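For reference, here is a minimal sketch (not this repository's actual code) of the normalization both papers describe, applied to SAG on a least-squares objective: the gradient memory y_i starts at the zero vector and the averaged direction is divided by m, the number of data points seen at least once, rather than by the total number n.

```python
import numpy as np

def sag_least_squares(A, b, step_size, n_epochs=10, rng=None):
    """Sketch of SAG on f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2,
    normalizing by the number of samples seen so far (m), not by n."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    x = np.zeros(d)
    grad_memory = np.zeros((n, d))   # y_i, initialized to the zero vector
    grad_sum = np.zeros(d)           # running sum of the stored y_i
    seen = np.zeros(n, dtype=bool)   # which indices have been visited
    m = 0                            # number of distinct indices seen so far

    for _ in range(n_epochs):
        for _ in range(n):
            i = rng.integers(n)
            if not seen[i]:
                seen[i] = True
                m += 1
            # fresh gradient of the i-th component
            g_new = (A[i] @ x - b[i]) * A[i]
            # replace the stored gradient for index i and update the sum
            grad_sum += g_new - grad_memory[i]
            grad_memory[i] = g_new
            # divide by m (data points seen at least once), not by n
            x -= step_size * grad_sum / m
    return x

# usage sketch
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = A @ rng.standard_normal(5)
x_hat = sag_least_squares(A, b, step_size=0.01, n_epochs=50, rng=0)
```

After the first full pass m equals n, so the update reduces to the standard SAG step; the only behavioral difference is on early iterations, where zero-initialized y_i would otherwise shrink the step.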

@NikZak NikZak changed the title Why not initialize SAG/SAGA memory with 0 and divide by seen updates as in sklearn? Why not initialize SAG/SAGA memory with 0 and divide by seen indices so far as in sklearn? Oct 14, 2022