Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't write job and job-related attributes when creating export model #11815

Merged

Conversation

mvdbeek
Copy link
Member

@mvdbeek mvdbeek commented Apr 10, 2021

These aren't used in regular jobs (should only be needed for history export currently) and looping over the implicit collection jobs for each job is quadratic in runtime and inefficient.

What did you do?

  • Skip serializing job and implicit collection job attributes when they are not needed

Why did you make this change?

@bgruening mentioned that when a user submits 10000 map-over jobs he starts seeing slow query timing logs.
When this happens all of the active job worker threads are within galaxy/model/__init__.py:1536

Thread 2747314 (idle): "CondorRunner.work_thread-3"
    do_execute (sqlalchemy/engine/default.py:593)
    _execute_context (sqlalchemy/engine/base.py:1277)
    _execute_clauseelement (sqlalchemy/engine/base.py:1130)
    _execute_on_connection (sqlalchemy/sql/elements.py:298)
    execute (sqlalchemy/engine/base.py:1011)
    _execute_and_instances (sqlalchemy/orm/query.py:3560)
    __iter__ (sqlalchemy/ext/baked.py:444)
    _load_on_pk_identity (sqlalchemy/ext/baked.py:615)
    _emit_lazyload (sqlalchemy/orm/strategies.py:850)
    <lambda> (<string>:1)
    _load_for_state (sqlalchemy/orm/strategies.py:760)
    get (sqlalchemy/orm/attributes.py:723)
    __get__ (sqlalchemy/orm/attributes.py:287)
    <listcomp> (galaxy/model/__init__.py:1536)
    serialize (galaxy/model/__init__.py:1536)
    _finalize (galaxy/model/store/__init__.py:1380)
    __exit__ (galaxy/model/store/__init__.py:1397)
    setup_external_metadata (galaxy/metadata/__init__.py:167)
    setup_external_metadata (galaxy/jobs/__init__.py:2106)
    __handle_metadata (galaxy/jobs/command_factory.py:235)
    build_command (galaxy/jobs/command_factory.py:128)
    build_command_line (galaxy/jobs/runners/__init__.py:285)
    prepare_job (galaxy/jobs/runners/__init__.py:244)
    queue_job (galaxy/jobs/runners/condor.py:65)
    run_next (galaxy/jobs/runners/__init__.py:136)
    run (threading.py:864)
    _bootstrap_inner (threading.py:916)
    _bootstrap (threading.py:884)

At this point Galaxy serializes all jobs that belong to an implicit collections jobs association.
Note that we only run through the export model by default since 21.01 (it was made the default in 7d0ec29).

How to test the changes?

(select the most appropriate option; if the latter, provide steps for testing below)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

For UI Components

  • I've included a screenshot of the changes

@mvdbeek mvdbeek force-pushed the performance_fix_setup_external_metadata branch 3 times, most recently from 5175dbf to a8310a0 Compare April 11, 2021 08:53
These aren't used in regular jobs (should only be needed for history
export currently).
@mvdbeek mvdbeek force-pushed the performance_fix_setup_external_metadata branch from a8310a0 to 0a4aa46 Compare April 11, 2021 08:56
@mvdbeek mvdbeek marked this pull request as ready for review April 11, 2021 11:26
@github-actions github-actions bot added this to the 21.05 milestone Apr 11, 2021
for icj in implicit_collection_jobs_dict.values():
icj_attrs = icj.serialize(self.security, self.serialization_options)
icjs_attrs.append(icj_attrs)
if not self.serialization_options.for_edit:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That if looks iffy right? Maybe there should a TODO here and options used more clearly? I tried to make this code pretty general and it seems to be becoming more metadata specific slowly instead moving the other direction. for_edit doesn't really naturally seem to imply serialize job data. I feel like it might be better to add options for including job data and set at a different level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think e7e3b00 looks better ?

@mvdbeek mvdbeek force-pushed the performance_fix_setup_external_metadata branch from e7e3b00 to 897590b Compare April 11, 2021 18:34
@bgruening bgruening merged commit 240829a into galaxyproject:dev Apr 11, 2021
@github-actions
Copy link

This PR was merged without a "kind/" label, please correct.

@bgruening
Copy link
Member

Thanks a lot @mvdbeek!

@bgruening
Copy link
Member

Ah shit ... you wanted to target 21.01?

@mvdbeek
Copy link
Member Author

mvdbeek commented Apr 12, 2021

Ah, my bad ... yeah, I'll open a backport.

@mvdbeek mvdbeek changed the title [21.01] Don't write job and job-related attributes when creating export model Don't write job and job-related attributes when creating export model Apr 12, 2021
bgruening added a commit that referenced this pull request Apr 12, 2021
…_metadata

[21.01] Backport #11815: Don't write job and job-related attributes when creating export model
@bgruening
Copy link
Member

Very nice, this works pretty well on EU! Thanks a lot @mvdbeek!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants