Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow disabling lookup of missing prop names #4100

Open
bmaranville opened this issue Mar 13, 2023 · 2 comments
Open

Allow disabling lookup of missing prop names #4100

bmaranville opened this issue Mar 13, 2023 · 2 comments
Labels
performance Issues related to performance on creating figures

Comments

@bmaranville
Copy link
Contributor

When running a large update on a plot, sometimes the time is dominated by the lookup for missing names, even when this lookup serves no purpose, such as in plotly.subplots.make_subplots where new axes are being added by the plotly library itself, and finding "nearby" names is not helpful when all that is needed is a check if the name already exists.

In plotly.basedatatypes.BaseDatatype._perform_update, for each key to be updated it calls _check_path_in_prop_tree, but then ignores the error that is generated if the plotly_obj is a BaseLayoutType and the key matches one of the _subplot_re_match keywords, and just adds the key to the plotly_obj instead.

This call to _check_path_in_prop_tree is expensive when

  • there are a large number of defined keys in the layout (such as when there are a large number of subplots being initialized) and
  • the path is not yet in the prop tree

When those conditions are met, the prop lookup check_path_in_prop_tree calls BasePlotlyType.__getitem__, which then calls BasePlotlyType._raise_on_invalid_property_error, which then calls _plotly_utils.utils.find_closest_string for each new property being added. find_closest_string uses a levenshtein lookup and this takes geometrically more time the more keys there are to compare, which is the root of the problem.

If the lookup could be disabled in _raise_on_invalid_property_error, that would help, but there is a problem in that _check_path_in_prop_tree is using an implicit __getitem__ on the plotly_obj, here

The issue can be resolved by disabling the lookup completely, as shown in the second example notebook below, where _plotly_utils.utils is monkey-patched so that find_closest_string always raises an error, but this doesn't seem like the best way to handle it.

Example one: without disabling lookup

import plotly.subplots
import cProfile
import pstats
from pstats import SortKey

cProfile.run('fig = plotly.subplots.make_subplots(rows=20, cols=20)', 'restats')
p = pstats.Stats('restats')
p.sort_stats(SortKey.CUMULATIVE).print_stats(20)
Mon Mar 13 15:24:56 2023    restats

         55179619 function calls (54968062 primitive calls) in 18.368 seconds

   Ordered by: cumulative time
   List reduced from 868 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    150/1    0.000    0.000   18.368   18.368 {built-in method builtins.exec}
        1    0.000    0.000   18.368   18.368 <string>:1(<module>)
        1    0.000    0.000   18.368   18.368 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/subplots.py:7(make_subplots)
        1    0.002    0.002   18.368   18.368 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/_subplots.py:45(make_subplots)
        1    0.000    0.000   18.043   18.043 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/graph_objs/_figure.py:736(update_layout)
        1    0.000    0.000   18.043   18.043 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:1378(update_layout)
        1    0.000    0.000   18.043   18.043 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:5096(update)
    802/2    0.010    0.000   17.536    8.768 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:3841(_perform_update)
19302/14494    0.065    0.000   17.219    0.001 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:4659(__getitem__)
4893/4891    0.018    0.000   17.209    0.004 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:159(_check_path_in_prop_tree)
14498/14494    0.015    0.000   17.096    0.001 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:5828(__getitem__)
      798    0.005    0.000   16.890    0.021 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:5047(_ret)
      798    0.001    0.000   16.879    0.021 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/_plotly_utils/utils.py:449(find_closest_string)
     1600    0.176    0.000   16.878    0.011 {built-in method builtins.sorted}
   390621    0.099    0.000   16.702    0.000 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/_plotly_utils/utils.py:450(_key)
470776/390621   11.539    0.000   16.603    0.000 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/_plotly_utils/utils.py:430(levenshtein)
 24961225    3.540    0.000    3.540    0.000 {built-in method builtins.min}
 24971843    1.419    0.000    1.419    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.507    0.507 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/contextlib.py:139(__exit__)
        2    0.000    0.000    0.507    0.253 {built-in method builtins.next}


<pstats.Stats at 0x7faae67616f0>

Example two: disabling lookup

## Patch the library to disable levenshtein lookup for missing strings.
def disable_find_closest_string(string, strings):
    raise ValueError()

import _plotly_utils.utils
_plotly_utils.utils.find_closest_string = disable_find_closest_string
import plotly.subplots
import cProfile
import pstats
from pstats import SortKey

cProfile.run('fig = plotly.subplots.make_subplots(rows=20, cols=20)', 'restats')
p = pstats.Stats('restats')
p.sort_stats(SortKey.CUMULATIVE).print_stats(20)
Mon Mar 13 15:25:05 2023    restats

         2674042 function calls (2542640 primitive calls) in 1.498 seconds

   Ordered by: cumulative time
   List reduced from 866 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    150/1    0.000    0.000    1.498    1.498 {built-in method builtins.exec}
        1    0.000    0.000    1.498    1.498 <string>:1(<module>)
        1    0.000    0.000    1.498    1.498 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/subplots.py:7(make_subplots)
        1    0.002    0.002    1.497    1.497 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/_subplots.py:45(make_subplots)
        1    0.000    0.000    1.170    1.170 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/graph_objs/_figure.py:736(update_layout)
        1    0.000    0.000    1.170    1.170 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:1378(update_layout)
        1    0.000    0.000    1.170    1.170 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:5096(update)
    802/2    0.009    0.000    0.626    0.313 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:3841(_perform_update)
        1    0.000    0.000    0.544    0.544 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/contextlib.py:139(__exit__)
        2    0.000    0.000    0.544    0.272 {built-in method builtins.next}
        2    0.000    0.000    0.544    0.272 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:2995(batch_update)
        1    0.000    0.000    0.544    0.544 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:2860(plotly_update)
     9697    0.072    0.000    0.443    0.000 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:51(_str_to_dict_path_full)
        1    0.000    0.000    0.417    0.417 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:2934(_perform_plotly_update)
        1    0.004    0.004    0.417    0.417 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:2611(_perform_plotly_relayout)
     1600    0.003    0.000    0.371    0.000 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:5842(__setitem__)
     1596    0.004    0.000    0.355    0.000 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:5726(_set_subplotid_prop)
     1598    0.010    0.000    0.353    0.000 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:5237(_set_compound_prop)
   103626    0.042    0.000    0.347    0.000 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:1811(_str_to_dict_path)
19302/14494    0.066    0.000    0.343    0.000 /home/bbm/miniforge3/envs/refl1d_py310/lib/python3.10/site-packages/plotly/basedatatypes.py:4659(__getitem__)


<pstats.Stats at 0x7ff048631750>
@nicolaskruchten
Copy link
Member

Oof, this is a complex call stack. From your comment in the other issue "it is pre-calculating an error message for a missing key" ... it seems to me like pre-calculating the messages is the problem here. If we're not actually using any invalid keys, we shouldn't be anywhere near a levenshtein calculation IMO. Since you've been in the code here, does there seem like a decent way to not precalculate things?

@bmaranville
Copy link
Contributor Author

I think the top of _perform_update could be restructured so that it doesn't first call _check_path_in_prop_tree, but instead does a simple __contains__ check for the property. If the property doesn't exist and doesn't match _subplot_re_match, then go ahead and calculate and throw the exception, but for internal methods like make_subplots that would never happen, and you'd never have to calculate the (increasingly expensive!) exception messages.

@AaronStiff AaronStiff added the performance Issues related to performance on creating figures label Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues related to performance on creating figures
Projects
None yet
Development

No branches or pull requests

3 participants