Performance issue in optimization.utils.Pruning.__call__ #1013

Closed
dietmarwo opened this issue Jan 18, 2021 · 11 comments

dietmarwo commented Jan 18, 2021

Steps to reproduce

Benchmark used (presumably registered in nevergrad/benchmark/experiments.py, where registry, create_seed_generator, ArtificialFunction, and Experiment are in scope):

@registry.register
def illcond_prof(seed: tp.Optional[int] = None) -> tp.Iterator[Experiment]:
    """All optimizers on ill cond problems"""
    seedg = create_seed_generator(seed)
    for budget in [200, 30000]:
        for optim in ["CMA"]:            
            for rotation in [True]:
                for name in ["cigar"]:
                    function = ArtificialFunction(name=name, rotation=rotation, block_dimension=50)
                    yield Experiment(function, optim, budget=budget, seed=next(seedg))
  1. Execute the profiler code below.

  2. Modify

         new_archive.bytesdict = {
             b: v
             for b, v in archive.bytesdict.items()
             if any(v.get_estimation(n) <= quantiles[n] for n in names)
         }

     as follows:

         new_archive.bytesdict = {}
         # new_archive.bytesdict = {
         #     b: v
         #     for b, v in archive.bytesdict.items()
         #     if any(v.get_estimation(n) <= quantiles[n] for n in names)
         # }

  3. Execute the profiler again and compare the results.

Observed Results

There is a severe performance issue related to new_archive.bytesdict.
If you increase the second budget in 'for budget in [200, 30000]:', the execution time unrelated to the optimizer grows even further.

Expected Results

  • Execution time should be dominated by the cost of executing the optimizer itself.

Profiler Code

import cProfile
import io
import pstats
import nevergrad.benchmark.__main__ as main

if __name__ == '__main__':
    pr = cProfile.Profile()
    pr.enable()
    main.repeated_launch('illcond_prof', num_workers=1)
    pr.disable()
    s = io.StringIO()
    sortby = pstats.SortKey.CUMULATIVE
    ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
    ps.print_stats()
    print(s.getvalue())

jrapin commented Jan 18, 2021

Thanks for this check!

A bit of context:

  • the archive is used to keep track of past trials; some algorithms need it (albeit few of them). See the sketch after this list for the overall pattern.
  • it can grow indefinitely (even beyond available memory, which happens for large problems), hence the pruning every now and then.
  • regular pruning is supposed to prevent the pruning time from growing, and should therefore be negligible. A year ago we made it much less conservative for this very reason (Make pruning much much less conservative #571). Looks like that was not enough, so something is very wrong indeed.
  • I'd be in favor of deleting the archive altogether and using optimizer-specific code for those that require it. However, we may still need something of the sort, e.g. Avoid two candidates with same value. #1001, which I am delaying for the very fear that it may slow the computation or create memory errors :s
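
Here is a minimal standalone sketch of the quantile-based pruning pattern discussed in this thread. The names (prune, keep_ratio, "pessimistic") and the plain-dict archive are illustrative only; this is not nevergrad's actual implementation.

import numpy as np

# The archive maps encoded parameters (bytes) to loss estimations; pruning
# keeps only the entries that are good enough on at least one criterion.
def prune(bytesdict, names, keep_ratio=0.1):
    quantiles = {
        n: np.quantile([v[n] for v in bytesdict.values()], keep_ratio)
        for n in names
    }
    return {
        b: v for b, v in bytesdict.items()
        if any(v[n] <= quantiles[n] for n in names)
    }

archive = {bytes([i]): {"pessimistic": float(i)} for i in range(200)}
print(len(prune(archive, ["pessimistic"])))  # 20: only the best ~10% survive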

So right now I am not sure what is happening, but this is indeed critical :s


dietmarwo commented Jan 18, 2021

If someone is blocked by this issue, as a workaround you can simply comment out the new_archive.bytesdict code as described above.
Testing with very large dimensions (>= 4500) revealed that there may be other issues.
I need to do further testing to make sure it is not a garbage collection issue.

You could think about adding the profiling code to the repo; it would have to be adapted if it is supposed to do the same command-line parsing as __main__.py. Here is the profiler output showing the issue:

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  183.685  183.685 /home/xxx/test/nevergrad/nevergrad/benchmark/__main__.py:99(repeated_launch)
        1    0.000    0.000  183.685  183.685 /home/xxx/test/nevergrad/nevergrad/benchmark/__main__.py:18(launch)
        1    0.000    0.000  183.673  183.673 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:265(compute)
        1    0.000    0.000  183.668  183.668 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:298(<listcomp>)
        1    0.009    0.009  183.668  183.668 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:136(result)
        1    0.000    0.000  183.659  183.659 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:182(compute)
        2    0.000    0.000  183.654   91.827 /home/xxx/test/nevergrad/nevergrad/benchmark/xpbase.py:181(run)
        2    0.000    0.000  183.654   91.827 /home/xxx/test/nevergrad/nevergrad/benchmark/xpbase.py:222(_run_with_error)
        2    0.243    0.121  183.653   91.827 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:520(minimize)
    30200    0.171    0.000  177.597    0.006 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:290(tell)
    30200    0.370    0.000  174.678    0.006 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:376(_update_archive_and_bests)
    30200    0.354    0.000  173.369    0.006 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:258(__call__)
102953400   19.085    0.000  128.725    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:59(get_estimation)
    18771   11.572    0.001   97.506    0.005 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:266(<listcomp>)
 51393530   73.517    0.000   73.517    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:51(optimistic_confidence_bound)
     6257   13.702    0.002   71.771    0.011 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:284(<dictcomp>)
 25679201    3.757    0.000   55.295    0.000 {built-in method builtins.any}
 51378670   11.812    0.000   54.336    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:287(<genexpr>)
 25749735   36.123    0.000   36.123    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:55(pessimistic_confidence_bound)
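
A rough reading of these numbers: Pruning.__call__ runs on every one of the 30200 tells, and the archive rebuild (the dict comprehension at utils.py:284) fires 6257 times, i.e. on roughly every fifth tell. Each rebuild scans the full archive (thousands of entries) once per estimation name to compute the quantiles, and again to filter, which multiplies out to the roughly 10^8 get_estimation calls; nearly all of the 183 s runtime is therefore spent pruning rather than optimizing.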

The workaround achieves a huge improvement:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   10.132   10.132 /home/xxx/test/nevergrad/nevergrad/benchmark/__main__.py:99(repeated_launch)
        1    0.000    0.000   10.132   10.132 /home/xxx/test/nevergrad/nevergrad/benchmark/__main__.py:18(launch)
        1    0.000    0.000   10.130   10.130 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:265(compute)
        1    0.000    0.000   10.125   10.125 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:298(<listcomp>)
        1    0.000    0.000   10.125   10.125 /home/xxx/test/nevergrad/nevergrad/optimization/utils.py:136(result)
        1    0.000    0.000   10.125   10.125 /home/xxx/test/nevergrad/nevergrad/benchmark/core.py:182(compute)
        2    0.000    0.000   10.121    5.060 /home/xxx/test/nevergrad/nevergrad/benchmark/xpbase.py:181(run)
        2    0.000    0.000   10.121    5.060 /home/xxx/test/nevergrad/nevergrad/benchmark/xpbase.py:222(_run_with_error)
        2    0.237    0.118   10.119    5.060 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:520(minimize)
    30200    0.097    0.000    4.275    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:417(ask)
    30200    0.169    0.000    4.208    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/base.py:290(tell)
    30200    0.065    0.000    3.938    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/optimizerlib.py:420(_internal_ask_candidate)
    30200    0.046    0.000    2.280    0.000 /home/xxx/test/nevergrad/nevergrad/optimization/optimizerlib.py:428(_internal_tell_candidate)


jrapin commented Jan 18, 2021

I think #1014 may solve it.
This is something I had been fearing but could not find for a long while; thanks for providing an actual simple case pointing it out!
The issue only arises at the very end of convergence, when values reach machine precision. At that point, pruning kept all values because all values were equal, so it was called over and over again. Since it is not efficient at all (it is not designed to be called often), this triggers a huge slowdown. A sketch of this degenerate case follows.
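
To make the degenerate case concrete, here is a hypothetical demonstration reusing the prune() sketch from earlier in the thread (again with illustrative names, not the real nevergrad code; max_len is an assumed trigger threshold): once every archived loss is identical, the quantile equals every value, the filter keeps everything, and the archive never shrinks below the size that triggers pruning, so the expensive full scan reruns on each tell.

import numpy as np

def prune(bytesdict, names, keep_ratio=0.1):
    quantiles = {
        n: np.quantile([v[n] for v in bytesdict.values()], keep_ratio)
        for n in names
    }
    return {
        b: v for b, v in bytesdict.items()
        if any(v[n] <= quantiles[n] for n in names)
    }

# All losses have converged to the same machine-precision value:
flat = {i.to_bytes(2, "big"): {"pessimistic": 1e-16} for i in range(5000)}
print(len(prune(flat, ["pessimistic"])))  # 5000: the filter keeps everything

max_len = 1000  # hypothetical archive size that triggers pruning
archive = flat
for tell in range(3):
    if len(archive) > max_len:  # stays true forever, so pruning reruns each tell
        archive = prune(archive, ["pessimistic"])
        print(f"tell {tell}: full scan ran, archive still has {len(archive)} entries")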


jrapin commented Jan 18, 2021

Testing with very large dimension (>= 4500) revealed there may be other issues.

@dietmarwo what do you mean by this?


jrapin commented Jan 19, 2021

I've cut a release including the fix; let me know if it works for you.


dietmarwo commented Jan 21, 2021 via email


dietmarwo commented Jan 21, 2021 via email


dietmarwo commented Jan 21, 2021 via email


jrapin commented Jan 22, 2021

Great to read this! Thanks again for highlighting the issue; it had really been a concern of mine for months, but I could not identify an actual case exhibiting such a slowdown.

I'll close the issue since this is solved.
The /usr/bin/nvidia-modprobe: unrecognized option: "-s" message indeed seems unrelated: there is no GPU code in the optimization subpackage. It may come from one of the testbeds, but I am not sure where, since I have never seen it before. Open a new issue if you think it needs investigation ;)

jrapin closed this as completed Jan 22, 2021

dietmarwo commented Jan 22, 2021 via email


jrapin commented Jan 22, 2021

Don't underestimate the possible performance gain of the new Python 3.8 feature

Seems interesting; where would you use it in nevergrad, though? The optimizer in itself is not supposed to require sharing anything between multiple processes (except the scipy-based optimizers, which are a bit special... and a pain :s)
