
add support for alternative cmaes implementation #546

Merged
2 commits merged into facebookresearch:master on Mar 1, 2020

Conversation

dietmarwo (Contributor):

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context / Related issue

Bad performance of cmaes for higher dimensions

How Has This Been Tested (if it applies)

Tested by including the new optimization algorithm in several benchmarks and executing
them successfully.

Checklist

  • The documentation is up-to-date with the changes I made.
  • [x] I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
  • All tests passed, and additional code has been covered with new tests.

The purpose of this PR is:

a) Enable experiments with an alternative CMA implementation -
install with "pip install fcmaes".

b) Expose popsize as an important CMA parameter.

c) Read _es.stop() / es.stop to check whether CMA has terminated, and
kill the rest of the assigned budget in that case. Otherwise the time comparison
with other algos that are not able to see whether they are stuck is unfair. Maybe some
of them can; then this idea should be generalized.

General observations:

  • Benchmarks where the optimization algos are configured with workers=1 run fastest if

    a) Benchmark multiprocessing is configured using workers=multiprocessing.cpu_count().
    I checked this with a variety of CPUs from Intel + AMD.

    b) The algos use only one thread, which means you have to set
    "export MKL_NUM_THREADS=1", or "export OPENBLAS_NUM_THREADS=1" if numpy
    is configured to use OpenBLAS (see the sketch after this list). Without this,
    CMA, which is heavily based on BLAS, suffers badly - sometimes it is a factor
    of 10 slower.

    c) Use "export MKL_DEBUG_CPU_TYPE=5" if you are using an AMD CPU, otherwise
    Intel MKL "ignores" the advanced SIMD instructions of your CPU.

  • popsize is an important CMA parameter which should be benchmarked using different settings.
    Higher popsize means a broader search which can slow down the optimization for lower budgets
    but often pays off when the budget is higher.

  • Scaling should not only be viewed as a scalar; often the dimensions should be
    scaled separately. FCMA does this automatically if bounds are defined: (lb, ub) is
    mapped onto ([-1]*dim, [1]*dim). Nevergrad doesn't provide the optimization algorithms
    with information about bounds, which may be a bad idea.
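
A minimal illustration of observation (b) above (hypothetical, not part of this PR): the thread limits have to be pinned before numpy - and the BLAS library it links against - is loaded, since most BLAS builds read these variables only once, at load time.

import os

# Pin BLAS to a single thread *before* importing numpy; most BLAS builds
# read these variables only when the library is loaded.
os.environ["MKL_NUM_THREADS"] = "1"        # numpy built against Intel MKL
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # numpy built against OpenBLAS
os.environ["MKL_DEBUG_CPU_TYPE"] = "5"     # observation (c): MKL on AMD CPUs

import numpy as np  # imported only after the limits are set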

The CMA implementation https://github.com/dietmarwo/fast-cma-es offered
as an alternative is much faster in higher dimensions because it better utilizes
the underlying BLAS library and avoids loops where possible. There is no
"DiagonalOnly" option supported yet, but tests have to show whether this option
still makes sense.

With all the benchmarks / tests implemented in Nevergrad it should be possible to check:

  • Whether there are cases where FCMA performs worse than CMA.
  • How the execution times compare.
  • Which parameters are optimal for FCMA.
  • How to reconfigure NGO - if and when to use FCMA, and with which parameters - to achieve
    optimal results.

My tests have shown that the advantage of FCMA is higher when both CMAs are compared standalone
or with an efficient parallel retry mechanism. But nevertheless FCMA is significantly
faster, especially in higher dimensions, outperforming most of the other algorithms
regarding execution time.

facebook-github-bot:

Hi @dietmarwo!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot added the "CLA Signed" label on Feb 26, 2020.
facebook-github-bot:

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

jrapin (Contributor) left a comment:

Hi @dietmarwo
Thank you for your contribution!
I've added a few comments below before we can merge this ;)

@@ -256,6 +257,82 @@ def __init__(
CMA = ParametrizedCMA().set_name("CMA", register=True)
DiagonalCMA = ParametrizedCMA(diagonal=True).set_name("DiagonalCMA", register=True)

class _FCMA(base.Optimizer):
Contributor:

Since most of the code is similar to CMA, the code should be shared as much as possible. Avoiding code duplication makes it much easier to iterate in the long run. Two ways of doing it here:

  • add an argument to the original CMA (fcmaes: bool = False); if some arguments are not supported in one or the other version (popsize? can it be added to the original CMA? and diagonal for FCMA?), raise an error/warning if it is set and shouldn't be.
  • inherit from the original CMA and override the methods that are different. If I'm correct, that's only the es property.

I have a strong preference for the first option.

Contributor Author:

Ok, will implement the first option.

parametrization: IntOrParameter,
budget: Optional[int] = None,
num_workers: int = 1,
scale: float = 1.0,
Contributor:

In practice scale is used in some experiments, but the parametrization is now supposed to hold one as well, so it's mostly a legacy argument. If you opt for a different class, I would advise removing it.

Contributor:

"scale is a very important parameter"

Agreed, but it's just a scaling of the space; it's more general than just CMA, hence already handled by the parametrization, which normalizes the space. So I think it's redundant, but we can always keep it anyway.

"Both the original CMA and FCMA support scale and popsize. Should we define both via parametrization?"

Then let's make it possible to define both in the algorithm configuration (aka have them both in the init).

dietmarwo (Contributor Author), Feb 27, 2020:

Have to check how the parametrization normalizes the space. Could be that 0.3 is a better default than 1.0; see "is expected to lie within about x0 +- 3*sigma0" here:

https://github.com/CMA-ES/pycma/blob/master/cma/evolution_strategy.py

sigma0
    initial standard deviation. The problem variables should
    have been scaled, such that a single standard deviation
    on all variables is useful and the optimum is expected to
    lie within about x0 +- 3*sigma0.

Contributor:

One of the very aims of the parametrization is to provide a bridge between the input space of the function (which may be discrete, continuous or a mixture of both, with different scaling etc.) and a space which is centered around 0 and has standard deviation 1. So if the problem was correctly parametrized by the user, the solution is indeed supposed to lie within x0 +- 3*sigma0.
Then again, parametrization is harder and more important than most people think, so it's not always true; but the best guess you can make is that the vectors you need to handle in _internal_ask and _internal_tell are as close as you can know to a centered and reduced variable.

Contributor Author:

I overlooked the importance of the parametrization package. So in transforms.py you define transformations mapping input arguments to a standardized interval; Affine(Transform) rescales (and recenters) an input variable. This means that sigma0=1.0 is the "correct" default value for both the original CMA and FCMA.
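
For illustration, a hypothetical sketch of such a mapping (not nevergrad's actual code) - the same idea fcmaes uses when it maps (lb, ub) onto ([-1]*dim, [1]*dim):

import numpy as np

def to_standardized(x: np.ndarray, lb: np.ndarray, ub: np.ndarray) -> np.ndarray:
    # affine map [lb, ub] -> [-1, 1] per dimension (recenters and rescales)
    return 2.0 * (x - lb) / (ub - lb) - 1.0

def from_standardized(y: np.ndarray, lb: np.ndarray, ub: np.ndarray) -> np.ndarray:
    # inverse map [-1, 1] -> [lb, ub]
    return lb + (y + 1.0) * (ub - lb) / 2.0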

Contributor:

Most of the transformations do not happen in transforms.py anymore, but in data.py - but yes, you got it right ;) and sigma0=1.0 is a "best initial guess" (I would expect ill-parametrization to happen quite often; I'm trying hard to make it simpler and simpler though ;) )

@@ -7,6 +7,7 @@
from collections import deque
import warnings
import cma
from fcmaes import cmaes
Contributor:

This adds a dependency, but it has not been added to the requirements.
I'd try to avoid adding dependencies when I can, so I would recommend making this step by step:
for now we would add fcmaes as a dependency in requirements/bench.txt so that it is always available at least in the benchmarks.

Then, in order to avoid breaking all the code when it is not installed, add the import inline:

@property
def es(self) -> tp.Any:  # typing not possible since not imported :(
    try:
        from fcmaes import cmaes
    except ImportError as e:
        raise ImportError("Please install fcmaes (pip install fcmaes) to use FCMA optimizers") from e
    if self._es is None:
        ...

That would make it a soft dependency for now, and eventually, if it becomes more and more relevant compared to the current CMA, we could move it to a hard dependency.

Contributor Author:

Sounds reasonable. Will do as you propose.

Contributor Author:

The ci/circleci check still fails since the soft dependency is not fulfilled.
It requires the change in requirements/bench.txt. Shall I add this dependency as
part of the PR?

Contributor:

Yes, please do. The bench.txt requirements are "extra" requirements, installed by users if required.



class ParametrizedFCMA(base.ConfiguredOptimizer):
"""FCMA-ES optimizer, wrapping external implementation: https://github.com/dietmarwo/fast-cma-es
Contributor:

You should add a few more lines about how it differs from the original implementation and in which cases we should use your implementation.

Contributor Author:

Ok. Short answer: if you don't use DiagonalOnly, FCMA should be the better/faster option. At least this is what my experiments using your benchmarks suggest. But with both variants
it is really important to use "export MKL_NUM_THREADS=1" (or "export OPENBLAS_NUM_THREADS=1" for OpenBLAS) if num_workers > 1 for the benchmark.
When BLAS multithreading conflicts with multithreading at the benchmark level, CMA can slow down by up to a factor of 10 in higher dimensions.

Contributor:

Maybe we need to add a context manager which sets these variables when calling CMA? Inside ask and/or tell, in order to avoid side effects.
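
Something like the following (a hypothetical sketch; note that most BLAS libraries read these variables only at load time, so an env-var context manager may come too late, and a tool such as threadpoolctl may be needed in practice):

import os
from contextlib import contextmanager

@contextmanager
def single_threaded_blas():
    # save current values, force single-threaded BLAS, restore on exit
    keys = ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")
    saved = {k: os.environ.get(k) for k in keys}
    for k in keys:
        os.environ[k] = "1"
    try:
        yield
    finally:
        for k, v in saved.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v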

Contributor Author:

fcmaes provides a retry mechanism where I set these variables internally. I tried that with Nevergrad and it failed somehow, probably because they were not propagated to the workers. Will investigate this further.

Contributor:

Can you elaborate on the mechanism?

super().__init__(_FCMA, locals())

FCMA = ParametrizedFCMA().set_name("FCMA", register=True)
MiniFCMA = ParametrizedFCMA(scale=0.1, popsize=13).set_name("MiniFCMA", register=True)
Contributor:

These Mini variants are mostly used in benchmarks; if this is a variant you want to use in benchmarks, please move it to optimization/experimentalvariants.py (this is new from this week) instead.
In the future, we want to keep only the main variants of the algorithms in optimizerlib (so FCMA, but not MiniFCMA).

Contributor:

popsize will not be in the parametrization; the parametrization only provides information on how the optimization space is structured, so I expect that popsize will always remain a configuration option of the optimizer.

Contributor Author:

Then we should also leave scaling as a configuration option of the optimizer. You wrote above
"hence already handled in the parametrization which normalizes the space",
which leaves scaling as an additional parameter not related to the optimization space.

Contributor:

Cf. the discussion above about sigma? Anyway, I'd rather have both popsize and sigma for the time being.

@@ -456,6 +456,13 @@ def minimize(
        if new_sugg:
            sleeper.start_timer()
        remaining_budget = self.budget - self.num_ask
        # kills the budget if CMA-ES or FCMA-ES found an optimum;
        # should be replaced by a general concept checking whether an optimization algo signals it wants to stop
        if hasattr(self, "_es") and self._es is not None:
Contributor:

Agreed, that's something we will need (and more generally, a way to specify a stopping criterion for the optimizer).
For now I'm not really fond of adding this specific code here; I'll give it a thought.
@teytaud any idea?

drdwo, Feb 27, 2020:

What you would really like to do, if you have let's say a budget of 100000 and CMA stops after 10000 evals, is to have a number of retries. But there is a problem with that: you shouldn't use the same guess, and without bounds you don't know how to generate different guesses.
If we drop the code above we get wrong time measurements for CMA and FCMA, since
you measure the time for evals after termination of the optimization.

Contributor:

For now we've not been investigating much the actual time needed by the optimizers, since we usually assume that this time is negligible compared to the function evaluation time.
I don't mean it does not make sense (especially given that I feel BO optimizers are a pain to benchmark because they are so sloooooow), but we don't have a solution for this so far indeed :s

Contributor Author:

I saw you have benchmarks with very high dimensionality (> 1000). Since covariance matrix adaptation has quadratic complexity, the time needed by the optimizer could become relevant.
Besides that, there are interesting optimization problems like https://www.esa.int/gsp/ACT/projects/gtop/messenger_full/ where evaluation is fast. For this problem fcmaes is four times as fast (including function evaluation) as the original cmaes.

Contributor Author:

Removed the code from the PR as requested.

Contributor:

Actually, if you want to have such a mechanism for CMA without interfering with the base optimizer class, you could change the budget right in the ask or tell method of (F)CMA.
Then again, it's hacky, and since there is no framework for this it may have weird behavior in the benchmarks, but we can always remove it if that fails. So far in the benchmarks we have only wanted to compare with the same budget, so some optimizers are asked to start all over again from a different starting point if they consider they have converged.

If you have ideas on how to deal with this correctly for all optimizers, please open an issue to discuss it (if you don't have an idea but do want to see the feature, please open it anyway :D), but it may take some time to get this right for the whole package :s

dietmarwo (Contributor Author) commented Feb 27, 2020:

Implemented the requested changes and updated recorded_recommendations.csv using pytest.
No errors on my side, but the ci/circleci checks still fail because of the missing import -
which is a soft import now but still required for fcmaes. Now there is only one _CMA into which fcmaes is
integrated. Rebased on master to sync the changes with the other new commits.

jrapin (Contributor) commented Feb 28, 2020:

In summary, you can add fcmaes in bench.txt.
Also, you may need to merge master (again?), since there seem to be some issues (the PR integrates other recent changes) which make it impossible to review :s

drdwo commented Feb 28, 2020:

After updating bench.txt,
the ci/circleci pytests are now green; the ci/circleci mypy checks are still red.

I see github showing 24 files changed, but that was after I merged master.

In reality, if you check

master...dietmarwo:patch/master/AddFcmaes

you see the real difference to nevergrad/master, which is as it should be (4 files). Do you
know how to force github to show the real diff here?

jrapin (Contributor) commented Feb 28, 2020:

"now ci/circleci: pytests are green, ci/circleci: mypy are still red"

In https://github.com/facebookresearch/nevergrad/blob/master/mypy.ini
add fcmaes to [mypy-scipy.*,pandas,gym,matplotlib.*,pytest,cma,bayes_opt.*,torch.*,mpl_toolkits.*] so that it gets ignored.

"Do you know how to force github to show the real diff here?"

Not really, I have never had this kind of problem. Maybe you need to update the master branch in your fork as well? It seems to be behind the main master branch, and the missing commits are the ones mixed with your changes.

jrapin (Contributor) left a comment:

A few details to change, but it looks quite nice.
I'll need to see the actual exact diff at some point to make sure; try updating your fork's master, hopefully that will work :s

popsize = max(self.num_workers, 4 + int(3 * np.log(self.dimension))) if self._popsize is None else self._popsize
if self._fcmaes:
    if self._diagonal:
        raise RuntimeError("fcmaes doesn't support diagonal=True, use fcmaes=False")
Contributor:

This check should be performed right in the init of ParametrizedCMA; this way we'd get the error much earlier.
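
A minimal sketch of that suggestion (the signature is approximated from this PR, not the exact nevergrad code; base.ConfiguredOptimizer and _CMA are assumed from the surrounding diff):

class ParametrizedCMA(base.ConfiguredOptimizer):
    def __init__(self, *, scale: float = 1.0, popsize: tp.Optional[int] = None,
                 diagonal: bool = False, fcmaes: bool = False) -> None:
        # fail at configuration time rather than at the first ask/tell
        if fcmaes and diagonal:
            raise RuntimeError("fcmaes doesn't support diagonal=True, use fcmaes=False")
        super().__init__(_CMA, locals())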

drdwo commented Feb 28, 2020:

Will apply your suggestions.

Did you see https://github.community/t5/How-to-use-Git-and-GitHub/GitHub-pull-requests-showing-invalid-diff-for-already-merged/td-p/3000 ?

I think this applies here, since I merged from master. My forked master is up to date; this seems
to cause the issue.

Maybe the only option is to close this PR and I create a new one based on a new branch from the current master, if comparing using master...dietmarwo:patch/master/AddFcmaes is not sufficient.

Github says:

"When you merge commits from a Pull Request into a branch, the commit SHA changes from what the original commit SHA's were.
"This means when you compare the same two branches again, the diff will show those commits again because those specific commit SHA's don't exist on the branch you're merging into.
This is a result of the type of diff we use on GitHub. We use git's three dot diff, which is the difference between the latest commit on the HEAD branch and the last common ancestor commit with the base branch."

drdwo commented Feb 28, 2020:

All green now and all your suggestions are in.

Regarding "I'll need to see the actual exact diff" :

master...dietmarwo:patch/master/AddFcmaes is the actual exact diff between your master and what is to be merged on my fork branch.
What github shows above as diff is wrong.

If you want a correct diff on the PR I fear I have to create a new one and we close this one.
Then I could branch from the actual master and everything is ok.

jrapin (Contributor) commented Feb 28, 2020:

"If you want a correct diff on the PR, I fear I have to create a new one and we close this one. Then I could branch from the actual master and everything would be ok."

I'd rather have an actual diff. Maybe I'm over-careful, but if github can't get the diff right, I'm afraid it may get the merge wrong as well, and who knows what can happen then :D

I would try this in the following order:

  • updating your master branch, which is lagging behind
  • changing the PR branches back and forth
  • creating a new diff

When I can see the actual diff in the PR it should be good to go as far as I can see from the other diff.

drdwo commented Feb 28, 2020:

Maybe it wasn't clear:

I updated my master branch already yesterday; this is the reason you see these (your) master branch commits in the diff, since the reference point is the old branching point from when the PR was created. Github now shows your commits as part of the PR, which is wrong, since it
will merge only the real difference. I have observed this behavior often in the past.

What do you mean by "changing the PR branches back and forth"? What exactly
can I do?

master...dietmarwo:patch/master/AddFcmaes

compares

  • facebookresearch/nevergrad/master with
  • dietmarwo:patch/master/AddFcmaes,

which is the only diff relevant for the merge. If there really were an issue, it would not
say "This branch has no conflicts with the base branch".

By the way, if something unexpected really happens, it's very easy to revert the change:
https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/reverting-a-pull-request

But you need to do a "squash and merge" - the squash part will combine all commits into one.
If you don't want to try this, the only option I see is that I create a new pull request from a branch
freshly branched from the actual master. No problem for me. It would be much easier if github allowed me to switch to such a new branch in this PR.

drdwo commented Feb 29, 2020:

Squashed all commits on my side and merged again with master - there was a new conflict - and now it shows the correct changes. I didn't know that squashing on my side would have this effect. Nevertheless,
I hope all issues are resolved now.

jrapin (Contributor) left a comment:

Looks good to me!

@jrapin jrapin merged commit a718b05 into facebookresearch:master Mar 1, 2020
jrapin (Contributor) commented Mar 1, 2020:

Thanks for contributing ;)

dietmarwo (Contributor Author) commented Mar 2, 2020:

Did some experiments on which setting is optimal for NGO / Shiva.
Replacing CMA there by
FCMAp31 = ParametrizedCMA(fcmaes=True, scale=0.3, popsize=31).set_name("FCMAp31", register=True)
looks promising. For Oneshot4 the overall Shiva / NGO percentage
(fight_all.png) rises by about 10%. It's interesting that scale=0.3 is better, since the parametrization
already scales the input. And it confirms that experimenting with popsize is a good idea.
By the way: scale 0.3 and a fixed popsize of 31 - see https://github.com/dietmarwo/fast-cma-es/blob/master/fcmaes/cmaes.py - are the default values for fcmaes.
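
For reference, a hypothetical usage of such a registered variant (the optimizer name comes from this thread; the registry lookup and minimize call are the standard nevergrad API):

import numpy as np
import nevergrad as ng

# look up the registered variant by name and run it on a toy quadratic
opt = ng.optimizers.registry["FCMAp31"](parametrization=10, budget=10000)
recommendation = opt.minimize(lambda x: float(np.sum((x - 0.5) ** 2)))
print(recommendation.value)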

jrapin (Contributor) commented Mar 2, 2020:

For now I think NGO/Shiva is not very good for small budgets, which may explain the impact of the popsize (pop sizes have not been tuned so far, and they most definitely should be).
Concerning the scaling, the experiments greatly predate the parametrization system, so I would expect they are not very well parametrized; this is a WIP :D
@teytaud some updates are needed for NGO?

teytaud (Contributor) commented Dec 31, 2020:

Just mentioning that I have launched a vast campaign of experiments comparing CMA and FCMAES.
In particular, we have a problem with some cases in which CMA is too slow (I mean, the internal computational cost is too big) -- let us see if FCMA solves this.

dietmarwo (Contributor Author) commented Dec 31, 2020 via email.

teytaud (Contributor) commented Dec 31, 2020:

We are preparing a vast comparison between cma, bayesian optimization and many others; you are already a significant contributor to this forthcoming work :-) You can contact me at:

Whatsapp: +44 7540 143007
Messenger: under the name "Olivier Teytaud"
Email: oteytaud@fb.withoutspamstuff.com

dietmarwo (Contributor Author) commented Jan 13, 2021 via email.
