Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow loading speedup #4500

Merged
merged 10 commits into from Sep 1, 2017

Conversation

@mvdbeek
Copy link
Member

commented Aug 27, 2017

This PR introduces a number of changes to improve the speed with which we iterate over workflows and workflow steps by doing roughly 4 things (see commits for details).

  • Fetch related workflows, steps, tags and annotations when getting the StoredWorkflow instance from the database
  • Avoid accessing tool_shed_repository attributes when these are available in the Tool object
  • Avoid looking up annotations and and tags when they are attached to the StoredWorkflow instance
  • Build edam datatypes only once per datatype registry lifetime.

For a sqlite backend this decreases the time needed to download a workflow through the API using the editor style about 4 fold. When using a postgresql backend the speed of downloading a single workflow is minor, but profiling the API endpoint directly shows about a 10 fold speedup.

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2017

For completeness, this is the profiler output for downloading the same workflow 30 times using a postgresql db without the PR:

*** PROFILER RESULTS ***
_workflow_to_dict_editor (/Users/mvandenb/src/galaxy/lib/galaxy/managers/workflows.py:428)
function called 30 times

         321660 function calls (312879 primitive calls) in 0.674 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 556 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       30    0.005    0.000    0.675    0.023 workflows.py:428(_workflow_to_dict_editor)
     3090    0.004    0.000    0.423    0.000 attributes.py:229(__get__)
  840/510    0.002    0.000    0.419    0.001 attributes.py:561(get)
      330    0.001    0.000    0.409    0.001 strategies.py:492(_load_for_state)
      360    0.001    0.000    0.408    0.001 query.py:2756(__iter__)
      270    0.001    0.000    0.406    0.002 <string>:1(<lambda>)
      270    0.004    0.000    0.405    0.002 strategies.py:565(_emit_lazyload)
      270    0.004    0.000    0.368    0.001 query.py:2607(all)
      360    0.002    0.000    0.353    0.001 query.py:2770(_execute_and_instances)
      360    0.001    0.000    0.324    0.001 base.py:846(execute)
      360    0.001    0.000    0.323    0.001 elements.py:322(_execute_on_connection)
      360    0.003    0.000    0.323    0.001 base.py:975(_execute_clauseelement)
       90    0.000    0.000    0.296    0.003 modules.py:985(from_workflow_step)
       90    0.001    0.000    0.295    0.003 modules.py:593(from_workflow_step)
      360    0.004    0.000    0.197    0.001 base.py:1061(_execute_context)
       90    0.001    0.000    0.192    0.002 item_attrs.py:108(get_item_annotation_obj)
       90    0.000    0.000    0.153    0.002 query.py:2644(first)
       90    0.002    0.000    0.153    0.002 query.py:2438(__getitem__)
      360    0.001    0.000    0.139    0.000 default.py:449(do_execute)
      360    0.136    0.000    0.139    0.000 {method 'execute' of 'psycopg2.extensions.cursor' objects}
      360    0.001    0.000    0.122    0.000 elements.py:431(compile)
      360    0.001    0.000    0.121    0.000 elements.py:496(_compiler)
      360    0.003    0.000    0.120    0.000 compiler.py:332(__init__)
      360    0.001    0.000    0.116    0.000 compiler.py:167(__init__)
  450/360    0.001    0.000    0.115    0.000 compiler.py:212(process)
 4770/360    0.009    0.000    0.115    0.000 visitors.py:75(_compiler_dispatch)
      360    0.005    0.000    0.114    0.000 compiler.py:1509(visit_select)
      480    0.005    0.000    0.105    0.000 loading.py:30(instances)
      360    0.001    0.000    0.067    0.000 result.py:946(fetchall)
      360    0.001    0.000    0.064    0.000 result.py:639(_soft_close)
      360    0.001    0.000    0.063    0.000 base.py:793(close)
      360    0.001    0.000    0.061    0.000 pool.py:879(close)
      360    0.001    0.000    0.061    0.000 pool.py:756(_checkin)
      360    0.002    0.000    0.060    0.000 pool.py:615(_finalize_fairy)
      360    0.003    0.000    0.054    0.000 query.py:3204(_compile_context)
      360    0.001    0.000    0.048    0.000 pool.py:764(_reset)
     1800    0.007    0.000    0.047    0.000 compiler.py:1281(_label_select_column)
      360    0.001    0.000    0.046    0.000 default.py:419(do_rollback)
      360    0.044    0.000    0.044    0.000 {method 'rollback' of 'psycopg2.extensions.connection' objects}
     1800    0.009    0.000    0.035    0.000 compiler.py:584(visit_label)

and with the PR:

*** PROFILER RESULTS ***
_workflow_to_dict_editor (/Users/mvandenb/src/galaxy/lib/galaxy/managers/workflows.py:430)
function called 30 times

         47052 function calls (45410 primitive calls) in 0.055 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 177 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       30    0.004    0.000    0.055    0.002 workflows.py:430(_workflow_to_dict_editor)
       90    0.000    0.000    0.015    0.000 modules.py:985(from_workflow_step)
       90    0.001    0.000    0.015    0.000 modules.py:593(from_workflow_step)
      270    0.003    0.000    0.014    0.000 __init__.py:20(visit_input_values)
       90    0.000    0.000    0.014    0.000 modules.py:73(from_workflow_step)
       90    0.001    0.000    0.012    0.000 modules.py:118(recover_state)
       90    0.001    0.000    0.011    0.000 __init__.py:356(decode)
       90    0.000    0.000    0.009    0.000 modules.py:715(check_and_update_state)
       90    0.001    0.000    0.009    0.000 __init__.py:1415(check_and_update_param_values)
       90    0.000    0.000    0.008    0.000 modules.py:643(get_tooltip)
      720    0.003    0.000    0.008    0.000 __init__.py:64(callback_helper)
       90    0.002    0.000    0.006    0.000 __init__.py:155(params_from_strings)
       90    0.000    0.000    0.006    0.000 template.py:433(render)
      180    0.002    0.000    0.006    0.000 util.py:139(url_for)
       90    0.001    0.000    0.005    0.000 runtime.py:811(_render)
     3090    0.003    0.000    0.005    0.000 attributes.py:229(__get__)
      240    0.000    0.000    0.004    0.000 __init__.py:1424(validate_inputs)
       90    0.000    0.000    0.004    0.000 modules.py:655(get_data_inputs)
 1500/360    0.002    0.000    0.004    0.000 json.py:21(json_fix)
      210    0.001    0.000    0.004    0.000 __init__.py:115(check_param)
    10200    0.002    0.000    0.003    0.000 {isinstance}
      360    0.001    0.000    0.003    0.000 json.py:56(safe_loads)
      360    0.000    0.000    0.002    0.000 odict.py:71(values)
       90    0.000    0.000    0.002    0.000 modules.py:108(get_state)
       90    0.001    0.000    0.002    0.000 runtime.py:857(_render_context)
      360    0.001    0.000    0.002    0.000 {map}
       90    0.000    0.000    0.002    0.000 basic.py:833(from_json)
   120/60    0.000    0.000    0.002    0.000 attributes.py:561(get)
       90    0.000    0.000    0.002    0.000 __init__.py:347(encode)
       60    0.000    0.000    0.002    0.000 basic.py:1061(from_json)
       90    0.001    0.000    0.002    0.000 __init__.py:139(params_to_strings)
      360    0.000    0.000    0.001    0.000 __init__.py:294(loads)
      990    0.001    0.000    0.001    0.000 UserDict.py:91(get)
       60    0.000    0.000    0.001    0.000 strategies.py:492(_load_for_state)
       90    0.000    0.000    0.001    0.000 modules.py:556(__init__)
      360    0.001    0.000    0.001    0.000 decoder.py:359(decode)
       90    0.000    0.000    0.001    0.000 runtime.py:871(_exec_template)
       90    0.000    0.000    0.001    0.000 abc.py:128(__instancecheck__)
       90    0.000    0.000    0.001    0.000 modules.py:681(get_data_outputs)
      180    0.000    0.000    0.001    0.000 util.py:534(ascii_characters)

@mvdbeek mvdbeek added this to the 17.09 milestone Aug 27, 2017

@mvdbeek mvdbeek force-pushed the mvdbeek:workflow_loading_speedup branch from 98d041f to 030207a Aug 27, 2017

mvdbeek added some commits Aug 27, 2017

Use pre-existing annotations object if possible
This avoids additional requests to the database.
Optimize loading of StoredWorkflow attributes
WorkflowsManager.get_stored_workflow is used when exporting or editing
workflows. In these cases we iterate over the related workflow objects
steps as well as the related tags and annotation objects. Each iteration
will emit a new SQL statement. Joinedloading these relations here speeds
up the workflow run form and the editor. With sqlite as backend this
speeds up exporting a workflow for the workflow editor by about 5X.

@mvdbeek mvdbeek force-pushed the mvdbeek:workflow_loading_speedup branch from 030207a to 1a1232e Aug 27, 2017

@jmchilton

This comment has been minimized.

Copy link
Member

commented Aug 29, 2017

+1 from me - this looks fantastic. Thanks so much @mvdbeek! I'm going to ping @guerler about the workflow building mode changes - they seem fine to me but I'm not sure I totally understand the workflow build mode code in general yet so a review from him would be great.

raise exceptions.MessageException('History unavailable. Please specify a valid history id')
except Exception as e:
raise exceptions.MessageException('[history_id=%s] Failed to retrieve history. %s.' % (history_id, str(e)))
if not workflow_building_mode or workflow_building_mode is workflow_building_modes.USE_HISTORY:

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Aug 29, 2017

Member

I'd be more explicit (i.e. use less magic) with:

        if workflow_building_mode in (workflow_building_modes.DISABLED, workflow_building_modes.USE_HISTORY):

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 29, 2017

Author Member

True, that looks better!

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 29, 2017

Author Member

Actually that wouldn't work, because in checks equality, right ?
workflow_building_modes are

workflow_building_modes = Bunch(DISABLED=False, ENABLED=True, USE_HISTORY=1)

So something like:

workflow_building_modes.ENABLED in (workflow_building_modes.DISABLED, workflow_building_modes.USE_HISTORY)

would translate to True in (False, 1), which evaluates to True, and that isn't correct here.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 29, 2017

Author Member

But this is prone to confusion -- I didn't know workflow_building_mode could be False, 1 or True before this PR and I spent a moment figuring this out. Maybe we could change workflow_building_modes to Bunch(DISABLED='disabled', ENABLED='enabled', USE_HISTORY='use_history') ?

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 29, 2017

Author Member

@nsoranzo what do you think about 2faa56e ?

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Aug 29, 2017

Member

@mvdbeek I was going to suggest exactly this "workaround"! Sorry for the wrong inital suggestion.

@erasche erasche added this to Workflows in Performance Aug 31, 2017

@jmchilton jmchilton merged commit dd9c757 into galaxyproject:dev Sep 1, 2017

4 of 5 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
api test Build finished. 284 tests run, 0 skipped, 0 failed.
Details
framework test Build finished. 161 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 44 tests run, 0 skipped, 0 failed.
Details
lgtm analysis: JavaScript No alert changes
Details
@jmchilton

This comment has been minimized.

Copy link
Member

commented Sep 1, 2017

Awesome - thanks for this @mvdbeek !

@mvdbeek mvdbeek deleted the mvdbeek:workflow_loading_speedup branch Sep 3, 2017

@martenson martenson moved this from Workflows to Merged in Performance Feb 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.