Performance issue in optimization.utils.Pruning.__call__ #1013

Closed
dietmarwo opened this issue Jan 18, 2021 · 11 comments

dietmarwo commented Jan 18, 2021

Steps to reproduce

Benchmark used (presumably registered in nevergrad/benchmark/experiments.py, where registry, create_seed_generator, ArtificialFunction, and Experiment are in scope):

@registry.register
def illcond_prof(seed: tp.Optional[int] = None) -> tp.Iterator[Experiment]:
    """All optimizers on ill cond problems"""
    seedg = create_seed_generator(seed)
    for budget in [200, 30000]:
        for optim in ["CMA"]:            
            for rotation in [True]:
                for name in ["cigar"]:
                    function = ArtificialFunction(name=name, rotation=rotation, block_dimension=50)
                    yield Experiment(function, optim, budget=budget, seed=next(seedg))
  1. Execute the profiler code below.

  2. Modify

         new_archive.bytesdict = {
             b: v
             for b, v in archive.bytesdict.items()
             if any(v.get_estimation(n) <= quantiles[n] for n in names)
         }

     as follows:

         new_archive.bytesdict = {}
         # new_archive.bytesdict = {
         #     b: v
         #     for b, v in archive.bytesdict.items()
         #     if any(v.get_estimation(n) <= quantiles[n] for n in names)
         # }

  3. Execute the profiler again and compare the results.

Observed Results

There is a severe performance issue related to new_archive.bytesdict.
If you increase the second budget in 'for budget in [200, 30000]:', the execution time unrelated to the optimizer grows even further.

Expected Results

  • Execution time should be dominated by the cost of executing the optimizer itself.

Profiler Code

import cProfile
import io
import pstats
import nevergrad.benchmark.__main__ as main

if __name__ == '__main__':
    pr = cProfile.Profile()
    pr.enable()
    main.repeated_launch('illcond_prof', num_workers=1)
    pr.disable()
    s = io.StringIO()
    sortby = pstats.SortKey.CUMULATIVE
    ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
    ps.print_stats()
    print(s.getvalue())

jrapin commented Jan 18, 2021

Thanks for this check!

A bit of context:

  • the archive is used to keep track of past trials; some algorithms need it (albeit few of them). See the sketch after this list for the overall pattern.
  • it can grow indefinitely (even beyond available memory, which happens for large problems), hence the pruning every now and then.
  • regular pruning is supposed to prevent the pruning time from growing, and should therefore be negligible. A year ago we made it much less conservative for this very reason (Make pruning much much less conservative #571). Looks like that was not enough, so something is very wrong indeed.
  • I'd be in favor of deleting the archive altogether and using optimizer-specific code for those that require it. However, we may still need something of the sort, e.g. Avoid two candidates with same value. #1001, which I am delaying for the very fear that it may slow the computation or create memory errors :s
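
Here is a minimal standalone sketch of the quantile-based pruning pattern discussed in this thread. The names (prune, keep_ratio, "pessimistic") and the plain-dict archive are illustrative only; this is not nevergrad's actual implementation.

import numpy as np

# The archive maps encoded parameters (bytes) to loss estimations; pruning
# keeps only the entries that are good enough on at least one criterion.
def prune(bytesdict, names, keep_ratio=0.1):
    quantiles = {
        n: np.quantile([v[n] for v in bytesdict.values()], keep_ratio)
        for n in names
    }
    return {
        b: v for b, v in bytesdict.items()
        if any(v[n] <= quantiles[n] for n in names)
    }

archive = {bytes([i]): {"pessimistic": float(i)} for i in range(200)}
print(len(prune(archive, ["pessimistic"])))  # 20: only the best ~10% survive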

So right now I am not sure what is happening, but this is indeed critical :s


dietmarwo commented Jan 18, 2021

If someone is blocked by this issue, as a workaround you can simply comment out the new_archive.bytesdict code as described above.
Testing with very large dimensions (>= 4500) revealed that there may be other issues.
I need to do further testing to make sure it is not a garbage collection issue.

You could think about adding the profiling code to the repo; it would have to be adapted if it is supposed to do the same command-line parsing as __main__.py. Here is the profiler output showing the issue:

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  183.685  183.685 /home/xxx/test/nevergrad/nevergrad/benchmark/__main__.py:99(repeated_launch)
        1    0.000    0.000  183.685  183.685 /home/xxx/test/nevergrad/nevergrad/benchmark/__main__.py:18(launch)
        1    0.000    0.000  183.673  183.673 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:265(compute)
        1    0.000    0.000  183.668  183.668 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:298(<listcomp>)
        1    0.009    0.009  183.668  183.668 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:136(result)
        1    0.000    0.000  183.659  183.659 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:182(compute)
        2    0.000    0.000  183.654   91.827 /home/xxx/test/nevergrad/nevergrad/benchmark/xpbase.py:181(run)
        2    0.000    0.000  183.654   91.827 /home/xxx/test/nevergrad/nevergrad/benchmark/xpbase.py:222(_run_with_error)
        2    0.243    0.121  183.653   91.827 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:520(minimize)
    30200    0.171    0.000  177.597    0.006 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:290(tell)
    30200    0.370    0.000  174.678    0.006 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:376(_update_archive_and_bests)
    30200    0.354    0.000  173.369    0.006 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:258(__call__)
102953400   19.085    0.000  128.725    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:59(get_estimation)
    18771   11.572    0.001   97.506    0.005 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:266(<listcomp>)
 51393530   73.517    0.000   73.517    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:51(optimistic_confidence_bound)
     6257   13.702    0.002   71.771    0.011 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:284(<dictcomp>)
 25679201    3.757    0.000   55.295    0.000 {built-in method builtins.any}
 51378670   11.812    0.000   54.336    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:287(<genexpr>)
 25749735   36.123    0.000   36.123    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:55(pessimistic_confidence_bound)
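
A rough reading of these numbers: Pruning.__call__ runs on every one of the 30200 tells, and the archive rebuild (the dict comprehension at utils.py:284) fires 6257 times, i.e. on roughly every fifth tell. Each rebuild scans the full archive (thousands of entries) once per estimation name to compute the quantiles, and again to filter, which multiplies out to the roughly 10^8 get_estimation calls; nearly all of the 183 s runtime is therefore spent pruning rather than optimizing.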

The workaround achieves a huge improvement:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   10.132   10.132 /home/xxx/test/nevergrad/nevergrad/benchmark/__main__.py:99(repeated_launch)
        1    0.000    0.000   10.132   10.132 /home/xxx/test/nevergrad/nevergrad/benchmark/__main__.py:18(launch)
        1    0.000    0.000   10.130   10.130 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:265(compute)
        1    0.000    0.000   10.125   10.125 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:298(<listcomp>)
        1    0.000    0.000   10.125   10.125 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:136(result)
        1    0.000    0.000   10.125   10.125 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:182(compute)
        2    0.000    0.000   10.121    5.060 /home/xxx/test/nevergrad/nevergrad/benchmark/xpbase.py:181(run)
        2    0.000    0.000   10.121    5.060 /home/xxx/test/nevergrad/nevergrad/benchmark/xpbase.py:222(_run_with_error)
        2    0.237    0.118   10.119    5.060 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:520(minimize)
    30200    0.097    0.000    4.275    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:417(ask)
    30200    0.169    0.000    4.208    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:290(tell)
    30200    0.065    0.000    3.938    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/optimizerlib.py:420(_internal_ask_candidate)
    30200    0.046    0.000    2.280    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/optimizerlib.py:428(_internal_tell_candidate)


jrapin commented Jan 18, 2021

I think #1014 may solve it.
This is something I had been fearing but could not find for a long while; thanks for providing an actual simple case pointing it out!
The issue only arises at the very end of convergence, when values reach machine precision. At that point, pruning kept all values because all values were equal, so it was called over and over again. Since it is not efficient at all (it is not designed to be called often), this triggers a huge slowdown. A sketch of this degenerate case follows.
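
To make the degenerate case concrete, here is a hypothetical demonstration reusing the prune() sketch from earlier in the thread (again with illustrative names, not the real nevergrad code; max_len is an assumed trigger threshold): once every archived loss is identical, the quantile equals every value, the filter keeps everything, and the archive never shrinks below the size that triggers pruning, so the expensive full scan reruns on each tell.

import numpy as np

def prune(bytesdict, names, keep_ratio=0.1):
    quantiles = {
        n: np.quantile([v[n] for v in bytesdict.values()], keep_ratio)
        for n in names
    }
    return {
        b: v for b, v in bytesdict.items()
        if any(v[n] <= quantiles[n] for n in names)
    }

# All losses have converged to the same machine-precision value:
flat = {i.to_bytes(2, "big"): {"pessimistic": 1e-16} for i in range(5000)}
print(len(prune(flat, ["pessimistic"])))  # 5000: the filter keeps everything

max_len = 1000  # hypothetical archive size that triggers pruning
archive = flat
for tell in range(3):
    if len(archive) > max_len:  # stays true forever, so pruning reruns each tell
        archive = prune(archive, ["pessimistic"])
        print(f"tell {tell}: full scan ran, archive still has {len(archive)} entries")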


jrapin commented Jan 18, 2021

Testing with very large dimension (>= 4500) revealed there may be other issues.

@dietmarwo what do you mean by this?


jrapin commented Jan 19, 2021

I've cut a release including the fix; let me know if it works for you.


dietmarwo commented Jan 21, 2021 via email


dietmarwo commented Jan 21, 2021 via email


dietmarwo commented Jan 21, 2021 via email


jrapin commented Jan 22, 2021

Great to read this! Thanks again for highlighting the issue; it had really been a concern of mine for months, but I could not identify an actual case exhibiting such a slowdown.

I'll close the issue since this is solved.
The /usr/bin/nvidia-modprobe: unrecognized option: "-s" message indeed seems unrelated: there is no GPU code in the optimization subpackage. It may come from one of the testbeds, but I am not sure where, since I have never seen it before. Open a new issue if you think it needs investigation ;)

jrapin closed this as completed Jan 22, 2021

dietmarwo commented Jan 22, 2021 via email


jrapin commented Jan 22, 2021

Don't underestimate the possible performance gain of the new Python 3.8 feature

Seems interesting; where would you use it in nevergrad, though? The optimizer in itself is not supposed to require sharing anything between multiple processes (except the scipy-based optimizers, which are a bit special... and a pain :s)
