add support for alternative cmaes implementation #546
Conversation
Hi @dietmarwo! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Hi @dietmarwo
Thank you for your contribution!
I've added a few comments below before we can merge this ;)
@@ -256,6 +257,82 @@ def __init__(

    CMA = ParametrizedCMA().set_name("CMA", register=True)
    DiagonalCMA = ParametrizedCMA(diagonal=True).set_name("DiagonalCMA", register=True)

    class _FCMA(base.Optimizer):
Since most of the code is similar to CMA, the code should be shared as much as possible. Avoiding code duplication makes it much easier to iterate in the long run. Two ways of doing it here:
- add an argument in the original CMA (fcmaes: bool = False); if some arguments are not supported in one or the other version (popsize? can it be added to the original CMA? and diagonal for FCMA?), raise an error/warning if it is set when it shouldn't be.
- inherit from the original CMA and override the methods that are different. If I'm correct, that's only the es property.

I have a strong preference for the first option.
Ok, will implement the first option.
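For reference, a minimal sketch of what the first option could look like (illustrative only: class and method names are simplified, and the backend construction is stubbed out rather than calling the real pycma/fcmaes APIs):

```python
class CMALike:
    """Sketch of a single CMA optimizer class with an fcmaes switch."""

    def __init__(self, *, fcmaes: bool = False, diagonal: bool = False) -> None:
        if fcmaes and diagonal:
            # option unsupported by one backend: fail loudly, as suggested
            raise RuntimeError("fcmaes does not support diagonal=True")
        self._fcmaes = fcmaes
        self._diagonal = diagonal
        self._es = None

    def _build_es(self) -> str:
        # placeholder: the real code would instantiate either pycma's or
        # fcmaes' evolution strategy here
        return "fcmaes-backend" if self._fcmaes else "pycma-backend"

    @property
    def es(self) -> str:
        # the `es` property is the only backend-dependent piece, making it
        # the natural dispatch point
        if self._es is None:
            self._es = self._build_es()
        return self._es
```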
    parametrization: IntOrParameter,
    budget: Optional[int] = None,
    num_workers: int = 1,
    scale: float = 1.0,
In practice scale is used in some experiments, but the parametrization is now supposed to hold one as well, so it's mostly a legacy argument. If you opt for a different class, I would advise removing it.
"scale is a very important parameter"

Agreed, but it's just a scaling of the space; it's more general than just CMA, and hence already handled in the parametrization, which normalizes the space. So I think it's redundant, but we can always keep it anyway.

Both the original CMA and FCMA support scale and popsize. Should we define both via parametrization?

Then let's make it possible to define both in the algorithm configuration (aka have them both in the init).
Have to check how the parametrization normalizes the space. Could be that 0.3 is a better default than 1.0, see "is expected to lie within about x0 +- 3*sigma0" here:
https://github.com/CMA-ES/pycma/blob/master/cma/evolution_strategy.py

    sigma0
        initial standard deviation. The problem variables should
        have been scaled, such that a single standard deviation
        on all variables is useful and the optimum is expected to
        lie within about x0 +- 3*sigma0.
One of the very aims of the parametrization is to provide a bridge between the input space of the function (which may be discrete, continuous or a mixture of both, with different scaling etc.) and a space which is centered around 0 and has standard deviation 1. So if the problem was correctly parametrized by the user, the solution is indeed supposed to lie in x0 +- 3*sigma0.
Then again, parametrization is harder and more important than most people think, so it's not always true, but the best guess you can make is that the vectors you need to handle in _internal_ask and _internal_tell are as close as you can know to a centered and reduced variable.
I overlooked the importance of the parametrization package. So in transforms.py you define transformations mapping input arguments to a standardized interval, and Affine(Transform) rescales (and recenters) an input variable. This means that sigma0=1.0 is the "correct" default value both for the original cma and fcma.
Most of the transformations do not happen in transforms.py anymore, but in data.py. But yes, you got it right ;) and sigma0=1.0 is a "best initial guess" (since I would expect ill-parametrization to happen quite often; I'm trying hard to make it simpler and simpler though ;) )
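To make the discussion concrete, here is an illustrative affine rescaling in the spirit of what the parametrization does (not nevergrad's actual code): a bounded variable is recentered around 0, which is why sigma0 = 1.0 becomes a sensible default step size.

```python
class Affine:
    """Illustrative transform mapping x in [lb, ub] to y in [-1, 1] and back."""

    def __init__(self, lb: float, ub: float) -> None:
        self.lb = lb
        self.ub = ub

    def forward(self, x: float) -> float:
        # recenter and rescale the input to the standardized space
        return 2.0 * (x - self.lb) / (self.ub - self.lb) - 1.0

    def backward(self, y: float) -> float:
        # inverse transform back to the original bounds
        return self.lb + (y + 1.0) * (self.ub - self.lb) / 2.0


t = Affine(10.0, 30.0)
assert t.forward(20.0) == 0.0  # the midpoint maps to the center
```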
@@ -7,6 +7,7 @@

    from collections import deque
    import warnings
    import cma
    from fcmaes import cmaes
This adds a dependency but it has not been added to the requirements.
I'd try to avoid adding dependencies when I can, so I would recommend making this step by step: for now we would add fcmaes as a dependency in requirements/bench.txt so that it is always available at least in the benchmarks.
Then, in order to avoid breaking all the code when it is not installed, add the import inline:

    @property
    def es(self) -> tp.Any:  # typing not possible since not imported :(
        try:
            from fcmaes import cmaes
        except ImportError as e:
            raise ImportError("Please install fcmaes (pip install fcmaes) to use FCMA optimizers") from e
        if self._es is None:
            ...

That would make it a soft dependency for now, and eventually, if it becomes more and more relevant compared to the current CMA, we could move it to a hard dependency.
Sounds reasonable. Will do as you propose.
The ci/circleci check still fails since the soft dependency is not fulfilled. This requires the change in requirements/bench.txt. Shall I add this dependency as part of the PR?
Yes please do. bench.txt requirements are "extra" requirements, installed if required by the users.
    class ParametrizedFCMA(base.ConfiguredOptimizer):
        """FCMA-ES optimizer, wrapping external implementation: https://github.com/dietmarwo/fast-cma-es
You should add a few more lines about how different it is from the original implementation and in which cases one should use your implementation.
Ok. Short answer: if you don't use DiagonalOnly, FCMA should be the better/faster option; at least this is what my experiments using your benchmarks suggest. But with both variants it is really important to use "export MKL_NUM_THREADS=1" (or "export OPENBLAS_NUM_THREADS=1" for OpenBLAS) if num_workers > 1 for the benchmark.
When BLAS multithreading conflicts with multithreading at the benchmark level, CMA can slow down by up to a factor of 10 for higher dimensions.
Maybe we need to add a context manager which sets these variables when calling CMA? Inside ask and/or tell, in order to avoid border effects.
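A sketch of such a context manager (hypothetical, not existing nevergrad code; note that BLAS libraries typically read these variables at load time, so in practice a tool like threadpoolctl may be needed instead):

```python
import os
from contextlib import contextmanager
from typing import Iterator

_BLAS_VARS = ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


@contextmanager
def single_threaded_blas() -> Iterator[None]:
    """Temporarily force single-threaded BLAS, restoring old values on exit."""
    saved = {k: os.environ.get(k) for k in _BLAS_VARS}
    for k in _BLAS_VARS:
        os.environ[k] = "1"
    try:
        yield
    finally:
        # restore the previous values to avoid the border effects mentioned above
        for k, v in saved.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v


with single_threaded_blas():
    assert os.environ["MKL_NUM_THREADS"] == "1"
```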
fcmaes provides a retry mechanism where I set these variables internally. I tried that with Nevergrad and it failed somehow, probably because the variables were not propagated to the workers. I will investigate this further.
Can you elaborate on the mechanism?
            super().__init__(_FCMA, locals())

    FCMA = ParametrizedFCMA().set_name("FCMA", register=True)
    MiniFCMA = ParametrizedFCMA(scale=0.1, popsize=13).set_name("MiniFCMA", register=True)
These Mini variants are mostly used in benchmarks; if this is a variant you want to use in benchmarks, please move it to optimization/experimentalvariants.py (this is new from this week) instead.
In the future, we want to keep only the very main variants of the algorithms in optimizerlib (so FCMA but not MiniFCMA).
popsize will not be in the parametrization; the parametrization only provides information about how the optimization space is structured, so I expect that popsize will always remain a configuration option of the optimizer.
Then we should also leave scaling as a configuration option of the optimizer. You wrote above "hence already handled in the parametrization which normalizes the space", which leaves scaling as an additional parameter not related to the optimization space.
Cf. the discussion above about sigma? Anyway, I'd rather have both popsize and sigma for the time being.
nevergrad/optimization/base.py (outdated)

@@ -456,6 +456,13 @@ def minimize(

            if new_sugg:
                sleeper.start_timer()
            remaining_budget = self.budget - self.num_ask
            # kills the budget if CMA-ES or FCMA-ES found an optimum
            # should be replaced by a general concept checking whether an optimization algo signals it wants to stop
            if hasattr(self, '_es') and self._es is not None:
Agreed, that's something we will need (and more generally also a way to specify a stopping criterion for the optimizer).
For now I'm not really fond of adding this specific code here; I'll give it a thought.
@teytaud any idea?
What you would really like to do, if you have let's say a budget of 100000 and CMA stops after 10000 evals, is to have a number of retries. But there is a problem with that: you shouldn't use the same guess, and without bounds you don't know how to generate different guesses.

If we drop the code above we get wrong time measurements for CMA and FCMA, since you measure the time for evals after termination of the optimization.
For now we have not been investigating much the actual time needed by the optimizers, since we usually assume that this time is negligible compared to the function evaluation time.
I don't mean it does not make sense (especially given that I feel BO optimizers are a pain to benchmark because they are so sloooooow), but we don't have a solution for this so far indeed :s
I saw you have benchmarks with very high dimensionality (> 1000). Since covariance matrix adaptation has quadratic complexity, the time needed by the optimizer could become relevant.
Besides that, there are interesting optimization problems like https://www.esa.int/gsp/ACT/projects/gtop/messenger_full/ where evaluation is fast. For this problem fcmaes is four times as fast (including function evaluation) as the original cmaes.
Removed the code from the PR as requested.
Actually, if you want to have some mechanism for CMA without interfering with the base optimizer class, you could change the budget right in the ask or tell method of (F)CMA.
Then again, it's hacky, and since there is no framework for this it may have weird behavior in the benchmarks, but we can always remove it if that fails. So far in the benchmarks we have only wanted to compare with the same budget, so some optimizers are asked to start all over again from a different starting point if they consider they have converged.
If you have ideas on how to deal with this for all optimizers correctly, please open an issue to discuss it (if you don't have an idea but do want to see the feature, please open it anyway :D), but that may take some time to get right for the whole package :s
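A hypothetical sketch of the hack described above (attribute names mirror the diff earlier in the thread; FakeES stands in for a real CMA-ES object whose stop() returns a non-empty dict once a termination criterion fires):

```python
class FakeES:
    """Stand-in for a CMA-ES instance; stop() mimics pycma's interface."""

    def __init__(self, stopped: bool) -> None:
        self._stopped = stopped

    def stop(self) -> dict:
        return {"tolfun": 1e-11} if self._stopped else {}


class EarlyStoppingCMA:
    """Sketch: shrink the optimizer's own budget once the ES has converged."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.num_ask = 0
        self._es = None  # would hold the wrapped CMA-ES instance

    def tell(self) -> None:
        self.num_ask += 1
        # kill the remaining budget when the underlying ES has converged,
        # so the benchmark loop stops asking for new points
        if self._es is not None and self._es.stop():
            self.budget = self.num_ask
```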
Implemented the requested changes, updated recorded_recommendations.csv using pytest.

In summary, you can add fcmaes in bench.txt.

After updating bench.txt I see github showing 24 files changed, but that was after I merged master. In reality, if you check master...dietmarwo:patch/master/AddFcmaes you see the real difference to nevergrad/master, which is as it should be (4 files). Do you

In https://github.com/facebookresearch/nevergrad/blob/master/mypy.ini

Not really, never had this kind of problem. Maybe you need to update the master branch in your fork as well? It seems it is behind the main master branch, and the missing commits are the ones mixed with your changes.
A few details to change, but it looks quite nice.
I'll need to see the actual exact diff to make sure at some point; try updating your fork's master, hopefully that will work :s
    popsize = max(self.num_workers, 4 + int(3 * np.log(self.dimension))) if self._popsize is None else self._popsize
    if self._fcmaes:
        if self._diagonal:
            raise RuntimeError("fcmaes doesn't support diagonal=True, use fcmaes=False")
This check should be performed right in the init of ParametrizedCMA; this way we'd get the error much earlier.
Will apply your suggestions. Did you see https://github.community/t5/How-to-use-Git-and-GitHub/GitHub-pull-requests-showing-invalid-diff-for-already-merged/td-p/3000 ? I think this applies here, since I merged from master. My forked master is up to date. Maybe the only option is to close this PR and create a new one based on a new branch from current master, if comparing using master...dietmarwo:patch/master/AddFcmaes is not sufficient. Github says: "When you merge commits from a Pull Request into a branch, the commit SHA changes from what the original commit SHA's were."
All green now and all your suggestions are in. Regarding "I'll need to see the actual exact diff": master...dietmarwo:patch/master/AddFcmaes is the actual exact diff between your master and what is to be merged from my fork branch. If you want a correct diff on the PR, I fear I have to create a new one and we close this one.
I'd rather have an actual diff. Maybe I'm over-careful, but if github can't get the diff right, I'm afraid it may have the merge wrong as well, and who knows what can happen then :D I would try this in the following order:

When I can see the actual diff in the PR it should be good to go, as far as I can tell from the other diff.
Maybe it wasn't clear: I updated my master branch already yesterday; this is the reason you see these (your) master branch commits in the diff, since the reference point is the old branching point from when the PR was created. Github now shows your commits as part of the PR, which is wrong, since it

What do you mean by "changing the PR branches back and forth"? What exactly does master...dietmarwo:patch/master/AddFcmaes compare?

By the way, if really something unexpected happens, it's very easy to revert the change. But you need to do a "squash and merge" - the squash part will combine all commits into one.
Force-pushed from 3f2f1a8 to 4a266ef.
Squashed all commits on my side and merged again with master - there was a new conflict - and now it shows the correct changes. I didn't know that squashing on my side has this effect. Nevertheless,
Looks good to me!
Thanks for contributing ;)
Did some experiments on which setting is optimal for NGO / Shiva.
For now I think NGO/Shiva is not very good for small budgets, which may explain the interest of the popsize (pop sizes have not been tuned so far, and they most definitely should be).
Just mentioning that I launched a vast campaign of experiments comparing CMA and FCMAES.
Just make sure you do a "pip install fcmaes --upgrade" before testing, since there are some improvements.

My impression is that https://github.com/CMA-ES/libcmaes is the real "official" cma-es implementation. It's quite similar to fcmaes (it uses an EIGEN3-based C++ base implementation), but there is no efficient pure Python version as fcmaes has. If you are not happy with https://github.com/CMA-ES/pycma and need a fast pure Python implementation, fcmaes is probably still the only option. If you need changes / additional features, I can support you and will also accept PRs for fcmaes. Recently I wrote two new documentation chapters,
https://github.com/dietmarwo/fast-cma-es/blob/master/HyperparameterOptimization.adoc
and
https://github.com/dietmarwo/fast-cma-es/blob/master/DelayedUpdate.adoc ,
and therefore slightly extended the interface to further enhance the support for parallel objective function evaluation. It seems CMA-ES outperforms Bayesian Optimization for hyperparameter optimization, although this was never tested in the literature.
On Thu, Dec 31, 2020 at 09:47, Teytaud <notifications@github.com> wrote:
Just mentioning that I launch a vast campaign of experiments for comparing
CMA and FCMAES.
In particular, we have a problem with some cases in which CMA is too slow
(I mean, the internal computational cost is too big) -- let us see if FCMA
solves this.
We are preparing a vast comparison between cma, bayesian optimization and many others; you are already a significant contributor to this forthcoming work :-) You can contact me at: Whatsapp +44 7540 143007.
Found the reason for the high-dimension speedup of pycma:
https://github.com/CMA-ES/pycma/blob/02112373b52b89b161d957c5df1af64afac8556b/cma/sampler.py#L309
There is a lazy_update_gap parameter which avoids the covariance update for large dimensions. It is trivial to add this feature in fcmaes if you think it is useful. Large dimensions were never in my focus up to now.
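A conceptual sketch of the lazy update idea (illustrative names, not pycma's actual implementation): the expensive factorization of the covariance matrix is refreshed only every lazy_update_gap iterations, which saves time in high dimensions at the cost of sampling from a slightly outdated distribution.

```python
import numpy as np


class LazySampler:
    """Samples from N(0, C), refreshing the factorization only periodically."""

    def __init__(self, dim: int, lazy_update_gap: int = 5) -> None:
        self.C = np.eye(dim)
        self.gap = lazy_update_gap
        self._chol = np.linalg.cholesky(self.C)
        self._stale_iters = 0

    def update_covariance(self, C: np.ndarray) -> None:
        self.C = C
        self._stale_iters += 1
        if self._stale_iters >= self.gap:
            # the expensive O(d^3) factorization happens only here
            self._chol = np.linalg.cholesky(self.C)
            self._stale_iters = 0

    def sample(self, rng: np.random.Generator) -> np.ndarray:
        # sampling uses the (possibly slightly outdated) factorization
        return self._chol @ rng.standard_normal(self.C.shape[0])
```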
On Thu, Dec 31, 2020 at 19:29, Teytaud <notifications@github.com> wrote:
We prepare a vast comparison between cma, bayesian optimization and many
others, you are already a significant contributor to this forthcoming work
:-) you can contact me at:
Whatsapp +44 7540 143007.
Messenger under name "Olivier Teytaud"
Email ***@***.***
Types of changes
Motivation and Context / Related issue
Bad performance of cmaes for higher dimensions
How Has This Been Tested (if it applies)
Tested by including the new optimization algorithm in several benchmarks and executing them successfully.
Checklist
Purpose of this PR is:
a) Enable experiments with an alternative CMA implementation - install with "pip install fcmaes".
b) Exposing popsize as an important CMA parameter.
c) Reading _es.stop() / es.stop to check whether CMA has terminated, and killing the rest of the assigned budget in this case. Otherwise the time comparison is unfair to other algos that are not able to see whether they are stuck. Maybe some of them can; then this idea should be generalized.
General observations:
Benchmarks where the optimization algos are configured with workers=1 run fastest if:
a) Benchmark multiprocessing is configured using workers=multiprocessing.cpu_count(). I checked this with a variety of CPUs from Intel and AMD.
b) The algos use only one thread, which means you have to set "export MKL_NUM_THREADS=1" (or "export OPENBLAS_NUM_THREADS=1" if numpy is configured to use OpenBLAS). Without doing this, CMA, which is heavily based on BLAS, suffers badly - sometimes it is a factor of 10 slower.
c) You use "export MKL_DEBUG_CPU_TYPE=5" if you are on an AMD CPU; otherwise Intel MKL "ignores" the advanced SIMD instructions of your CPU.
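The settings from a) to c) can be combined as follows (the benchmark invocation itself is hypothetical; only the exports are the point here):

```shell
# Limit BLAS to one thread per worker so benchmark-level multiprocessing
# does not fight with BLAS-level multithreading:
export MKL_NUM_THREADS=1        # Intel MKL builds of numpy
export OPENBLAS_NUM_THREADS=1   # OpenBLAS builds of numpy
export MKL_DEBUG_CPU_TYPE=5     # AMD CPUs only: let MKL use advanced SIMD paths
# then run the benchmark with one worker per core (illustrative command):
# python -m nevergrad.benchmark <experiment> --num_workers=$(nproc)
```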
popsize is an important CMA parameter which should be benchmarked using different settings.
A higher popsize means a broader search, which can slow down the optimization for lower budgets
but often pays off when the budget is higher.
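For reference, the default popsize used in the diff above (4 + 3*ln(dimension), lower-bounded by the number of workers) can be computed as follows; `default_popsize` is a hypothetical helper name:

```python
import numpy as np


def default_popsize(dimension: int, num_workers: int = 1) -> int:
    """CMA default population size, as in the diff: max(workers, 4 + 3*ln(d))."""
    return max(num_workers, 4 + int(3 * np.log(dimension)))


assert default_popsize(10) == 10                 # 4 + int(3 * ln(10)) = 10
assert default_popsize(2, num_workers=16) == 16  # num_workers dominates
```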
Scaling should not only be viewed as a scalar; often the dimensions should be scaled separately. FCMA does this automatically if bounds are defined: (lb, ub) is mapped onto ([-1]*dim, [1]*dim). Nevergrad doesn't provide the optimization algorithms with information about bounds, which may be a bad idea.
The CMA implementation https://github.com/dietmarwo/fast-cma-es offered as an alternative is much faster in higher dimensions because it better utilizes the underlying BLAS library and avoids loops where possible. There is no "DiagonalOnly" option supported yet, but tests have to show whether this option still makes sense.
With all the benchmarks / tests implemented in Nevergrad it should be possible to check the results.
My tests have shown that the advantage of FCMA is higher if both CMAs are compared standalone or with an efficient parallel retry mechanism. But FCMA is nevertheless significantly faster, especially in higher dimensions, outperforming most of the other algorithms regarding execution time.