How to handle randomness in optimizer #4
@se4u Not one of the authors here, but it seems the lib relies on seeding the global random state. You can see how the authors use this here: `nevergrad/nevergrad/benchmark/xpbase.py`, lines 119 to 124 (commit 7fb29af).

So I guess if you use either base Python or numpy randomness, it should work.
Hi, we are happy to welcome new algorithms indeed!

**Optimizer implementation**

You are correct in your understanding: implementing these 4 functions should make it work (only one of them is absolutely necessary).

**Randomness**

As @tomjorquera has correctly seen (thanks for starting to answer ;) ), we rely on seeding during the experiments to make the benchmark reproducible. However, this also means we avoid seeding inside the optimizer, so that we can run several optimizations with different seeds. In a nutshell, as long as you do not seed inside the optimizer, everything should work fine.

**Experiments**

You have several options:
My preference goes to the second option, so that everybody can use your experiment, and so that it does not modify existing experiments (for reproducibility of the existing experiments). In any case, I'll try to help you in the process if I am available (the Christmas / New Year period may slow down my answer speed though :s), and don't hesitate to ask more questions if anything remains unclear!
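The seeding scheme described above can be sketched with a toy optimizer (all names here, `RandomSearch` and `run`, are hypothetical stand-ins, not nevergrad's actual API): as long as the optimizer draws from the global numpy generator and is never seeded internally, seeding once in the experiment runner makes a run reproducible, while different seeds give different runs.

```python
import numpy as np


class RandomSearch:
    """Toy optimizer that uses the *global* numpy rng (no internal seeding)."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.best_x = None
        self.best_y = float("inf")

    def ask(self) -> np.ndarray:
        # Draws from the global generator, so the experiment controls the seed.
        return np.random.normal(size=self.dimension)

    def tell(self, x: np.ndarray, y: float) -> None:
        if y < self.best_y:
            self.best_x, self.best_y = x, y


def run(seed: int, budget: int = 50) -> float:
    np.random.seed(seed)  # seeding happens in the experiment, not the optimizer
    opt = RandomSearch(3)
    for _ in range(budget):
        x = opt.ask()
        opt.tell(x, float(np.sum(x ** 2)))
    return opt.best_y
```

With this split, `run(42)` twice gives identical results, while `run(43)` gives a genuinely different optimization run.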
Just mentioning that SPSA is very (very) welcome.
I added SPSA to my fork of nevergrad and compared it on the benchmark experiments. The results are in the plots linked below.
The class for SPSA is the following. I hardcoded an rng inside the class so that experiments are reproducible without modifying anything outside the class.

```python
from typing import Optional

import numpy as np

# Assumed context: this lives in nevergrad/optimization/optimizerlib.py,
# where `base` and `registry` are available.
from . import base
from .base import registry


@registry.register
class SPSA(base.Optimizer):
    """Simultaneous Perturbation Stochastic Approximation.

    Every even ask returns t - c_k * delta and every odd ask returns
    t + c_k * delta; once both evaluations have been told back, the
    gradient estimate (yp - ym) / (2 * c_k) * delta updates t.
    """

    def __init__(self, dimension: int, budget: Optional[int] = None, num_workers: int = 1) -> None:
        super().__init__(dimension, budget=budget, num_workers=num_workers)
        self.rng = np.random.RandomState(seed=1234)
        self.init = True
        self.idx = 0
        self.delta = self.ym = self.yp = None
        self.t = np.zeros(self.dimension)
        self.avg = np.zeros(self.dimension)

    @staticmethod
    def ck(k: int) -> float:
        """Perturbation-size gain sequence c_k."""
        return 1e-1 / (k // 2 + 1) ** 0.101

    @staticmethod
    def ak(k: int) -> float:
        """Step-size gain sequence a_k."""
        return 1e-5 / (k // 2 + 1 + 10) ** 0.602

    def _internal_ask(self) -> base.ArrayLike:
        k = self.idx
        if k % 2 == 0:
            if not self.init:
                self.t -= (self.ak(k) * (self.yp - self.ym) / 2 / self.ck(k)) * self.delta
                self.avg += (self.t - self.avg) / (k // 2 + 1)
            self.delta = 2 * self.rng.randint(2, size=self.dimension) - 1
            return self.t - self.ck(k) * self.delta
        return self.t + self.ck(k) * self.delta

    def _internal_tell(self, x: base.ArrayLike, value: float) -> None:
        setattr(self, ("ym" if self.idx % 2 == 0 else "yp"), value)
        self.idx += 1
        if self.init and self.yp is not None and self.ym is not None:
            self.init = False

    def _internal_provide_recommendation(self) -> base.ArrayLike:
        return self.avg
```

The code and plots are here: https://github.com/se4u/nevergrad
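For readers outside nevergrad, the ask/tell mechanics of the class can be condensed into a standalone loop. This is only a sketch: `spsa_minimize` is a hypothetical helper, and the step-size constant here (1e-1) is deliberately larger than the class's conservative 1e-5 so the effect is visible on a toy problem.

```python
import numpy as np


def spsa_minimize(f, x0, budget, seed=1234):
    """Standalone sketch of the SPSA iteration used by the class above.

    Each iteration spends exactly two function evaluations, matching the
    paired even/odd asks of the class.
    """
    rng = np.random.RandomState(seed)
    theta = np.asarray(x0, dtype=float)
    avg = np.zeros_like(theta)
    for k in range(budget // 2):
        ck = 1e-1 / (k + 1) ** 0.101           # perturbation size c_k
        ak = 1e-1 / (k + 1 + 10) ** 0.602      # step size a_k (illustrative constant)
        delta = 2 * rng.randint(2, size=theta.size) - 1  # Rademacher directions
        ym = f(theta - ck * delta)             # the "even" ask of the class
        yp = f(theta + ck * delta)             # the "odd" ask of the class
        ghat = (yp - ym) / (2 * ck) * delta    # simultaneous-perturbation gradient estimate
        theta = theta - ak * ghat
        avg += (theta - avg) / (k + 1)         # iterate averaging, as in the class
    return avg
```

The pairing is why the class keeps `self.idx`, `self.yp`, and `self.ym` as state: a single gradient estimate only exists once both evaluations of a pair have been told back.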
I'm happy you were able to obtain results so quickly! If you are willing to submit a pull request, you are most welcome to do so.
@jrapin I used my own rng because I didn't want to affect the external rng by executing my own code. Also, this makes it easy to test in the future what the effect of the randomness inherent in the SPSA algorithm is, versus the random noise in the problem. And it makes the class self-contained: no matter when you initialize it, its behavior will be consistent.

Regarding feedback, I think the code structure is pretty intuitive, so that's great. I guess some guidelines about how to submit new algorithms would be useful. For example, I can just submit a pull request after adding the above class, but should I also update anything else?

One more thing is about asynchronous vs synchronous execution. I am not sure exactly what happens in asynchronous execution. The SPSA code won't work in an asynchronous manner because of the ask/tell API: SPSA needs to get the values of y(θ + δ) and y(θ - δ), and it needs to know the current iteration to choose a step size, so I am maintaining state in the form of `self.idx`, `self.yp`, and `self.ym`.
For testing, I tend to seed outside on a deterministic problem; I'll add some unittests doing just that eventually (we have some internally). Anyway, this is a minor detail. If you prefer it this way I'm fine with it, as long as we don't try to benchmark it on a deterministic problem (in this case, averaging over multiple runs would not make sense...). I'll try to improve the README indeed, but I think knowing the code too well makes it difficult to realize what is difficult for a newcomer. That is why I am asking for feedback ;) You are also welcome to improve the README if you want (but I know well that it is not the most interesting of works :D). Concerning the experiments, you are raising good points that we had not considered so far. @teytaud any thoughts about it?
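The kind of unittest described here (seed outside the optimizer, run on a deterministic problem, compare two runs) might look like the following sketch. `random_search` is a hypothetical stand-in for any optimizer that draws from the global rng, not a real nevergrad optimizer.

```python
import unittest

import numpy as np


class ReproducibilityTest(unittest.TestCase):
    """Seed outside the optimizer on a deterministic problem and check
    that two runs produce identical results."""

    @staticmethod
    def random_search(budget: int, dim: int) -> np.ndarray:
        best_x, best_y = None, float("inf")
        for _ in range(budget):
            x = np.random.normal(size=dim)   # draws from the *global* rng
            y = float(np.sum(x ** 2))        # deterministic sphere function
            if y < best_y:
                best_x, best_y = x, y
        return best_x

    def test_seeded_runs_agree(self) -> None:
        np.random.seed(12)                   # seeding happens in the test
        first = self.random_search(20, 3)
        np.random.seed(12)
        second = self.random_search(20, 3)
        np.testing.assert_array_equal(first, second)
```

A test like this would fail for any optimizer that seeds its own rng from a varying source (e.g. the clock), which is exactly the property being discussed.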
As for the sequence of calls: if the implementation does not support parallelization, just notify it with the corresponding flag on the class.
I have kept thinking about how the algorithms should be seeded. The more I think about it, the more I think they should not be seeded:
So, as soon as I have some time (this is not the best time of year for this :D), I will prepare:
I'll post back here when this is ready ;). Sorry for changing my mind about it; it is indeed an important decision you are pointing out here!
I just wanted to mention that your results are super exciting! The version of SPSA you are using is supposed to converge at rate simple-regret = O(1/n^alpha) for alpha = 2/3 or ... ? I guess some noise models with a lot of asymmetry in terms of variance could be more in favor of SPSA... not sure though.
@teytaud The asymptotic convergence rate using my hyperparameters (a = 0.602, γ = 0.101) is k^(-(0.602 - 2·0.101)/2) = 1/k^(1/5), so in your terminology alpha = 1/5. There are some conditions which basically put restrictions on a and γ:¹ a ∈ [0.602, 1] and γ ∈ [a/6, a/2). The fastest rate is achieved when a is highest, i.e. 1, and γ is lowest, i.e. 1/6; with those hyperparameters the convergence rate is alpha = 1/3. But Prof. Spall suggests using the more conservative values, and those are what I used. I think the analysis could probably be simplified and streamlined compared to his approach, although I haven't tried it.

[1] Spall, James C. "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation." IEEE Transactions on Automatic Control 37.3 (1992): 332-341; see also Theorem 7.2 on Google Books.

Also, I wanted to mention that regarding submitting a PR, I am waiting for:
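As a sanity check on the exponent arithmetic above, the relation alpha = (a - 2γ)/2 can be evaluated for both parameter choices. `spsa_rate` is a hypothetical helper written just for this check, not part of nevergrad.

```python
def spsa_rate(a: float, gamma: float) -> float:
    """Simple-regret exponent alpha for SPSA gains a_k ~ 1/k^a, c_k ~ 1/k^gamma.

    Encodes the relation alpha = (a - 2 * gamma) / 2 from the discussion above.
    """
    return (a - 2 * gamma) / 2


conservative = spsa_rate(0.602, 0.101)  # Spall's conservative gains -> 1/5
fastest = spsa_rate(1.0, 1 / 6)         # fastest admissible gains   -> 1/3
```

Both values match the exponents quoted in the comment: 0.2 for the conservative gains and 1/3 at the admissible extreme.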
I am getting the following two errors during testing. The first one is about my class not supporting parallelization; the second is about reproducibility. I am not sure why the second error is happening: I get it regardless of whether I set a fixed seed in my optimizer's rng or not.
Both of the above cause errors. For convenience, I added a WIP PR: #16
Thanks! Let's follow up on the PR ;)
Please close the issue if everything is fine for you. |
I should mention that after some more work SPSA turns out to be very good.
Oh wow, really good to hear. I am trying to read the pdf, but I must apologize: I wasn't quite able to figure out how to adapt the scale. Also, I looked at Section 4 ("Statistics over all benchmarks") and Section 5 ("Conclusion"), and I didn't see SPSA being competitive. How should I read the results?
Hi, I have previously worked¹ on a gradient-free optimization algorithm called SPSA²˒³, and I have Matlab/MEX code⁴ that I can port to Python easily. I am interested in benchmarking SPSA against other zeroth-order optimization algorithms using nevergrad.
I am following the instructions for benchmarking a new optimizer given in adding-your-own-experiments-andor-optimizers-andor-function. My understanding is that I can just add a new `SPSA` class in `nevergrad/optimization/optimizerlib.py` and implement its functions, then add `SPSA` to the `optims` variable in the right experiment function in the `nevergrad/benchmark/experiments.py` module, and then I should be able to generate graphs like `docs/resources/noise_r400s12_xpresults_namecigar,rotationTrue.png`.
However, SPSA itself uses an rng in the `_internal_ask` function, but the optimizer `base` class does not take any seed in the `__init__` function. What would be a good way to make the experiments reproducible in such a situation?

[1] Pushpendre Rastogi, Jingyi Zhu, James C. Spall (2016). Efficient implementation of Enhanced Adaptive Simultaneous Perturbation Algorithms. CISS 2016, pdf
[2] https://en.wikipedia.org/wiki/Simultaneous_perturbation_stochastic_approximation
[3] https://www.chessprogramming.org/SPSA
[4] https://github.com/se4u/FASPSA/blob/master/src/SPSA.m