Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track workflow step input definitions in our model. #6850

Merged
merged 5 commits into from Nov 16, 2018

Conversation

Projects
None yet
5 participants
@jmchilton
Copy link
Member

commented Oct 10, 2018

We don't track workflow step inputs in any formal way in our model currently. This has resulted in some current hacks and prevents future enhancements. This commit splits WorkflowStepConnection into two models WorkflowStepInput and WorkflowStepConnection - normalizing the previous table workflow_step_connection on input step and input name.

In terms of current hacks forced on it by restricting all of tool state to be confined to a big JSON blob in the database - we have problems distinguishing keys and values when walking tool state. As we store more and more JSON blobs inside of the giant tool state blob - the worse this problem gets. Take for instance checking for runtime parameters or the rules parameter values - these both use JSON blobs that aren't simple values, so it is hard to tell looking at the tool state blob in the database or the workflow export to tell what is a key or what is a value. Tracking state as normalized inputs with default values and explicit attributes runtime values should allow much more percise state definition and construction.

This variant of the models would also potentially allow defining runtime values with non-tool default values (so default values defined for the workflow but still explicitly settable at runtime). The combinations of overriding defaults and defining runtime values were not representable before.

In terms of future enhancements, there is a lot we cannot track with the current models - such as map/reduce options for collection operations (#4623 (comment)). This should enable a lot of that. Obviously there are a lot of attributes defined here that are not yet utilized, but I'm using most (all?) of them downstream in the CWL branch. I'd rather populate this table fully realized and fill in the implementation around it as work continues to stream in from the CWL branch - to keep things simple and avoid extra database migrations. But I understand if this feels like speculative complexity we want to avoid despite the implementation being readily available for inspection downstream.

@jmchilton jmchilton force-pushed the jmchilton:workflows_track_step_inputs branch from 249fd00 to 7b33f56 Oct 11, 2018

@bgruening

This comment has been minimized.

Copy link
Member

commented Oct 11, 2018

But I understand if this feels like speculative complexity we want to avoid despite the implementation being readily available for inspection downstream.

I can not judge this and this PR feels like a good first step. Thanks John!
In regards to the json blobs I wanted to point out that we, in addition, also might want to consider to move over to (JsonTypes)[https://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.JSON]. This feature is long enough in all our supported RDMS and could make queries faster.

@jmchilton jmchilton force-pushed the jmchilton:workflows_track_step_inputs branch 3 times, most recently from d21f09b to c8b433b Oct 11, 2018

@jmchilton jmchilton force-pushed the jmchilton:workflows_track_step_inputs branch 4 times, most recently from be6f6d5 to 9ac459d Oct 23, 2018

@jmchilton jmchilton force-pushed the jmchilton:workflows_track_step_inputs branch from 9ac459d to fef06c8 Oct 30, 2018

@jmchilton jmchilton force-pushed the jmchilton:workflows_track_step_inputs branch 2 times, most recently from 5403648 to d604a30 Nov 1, 2018

@jmchilton jmchilton force-pushed the jmchilton:workflows_track_step_inputs branch from d604a30 to 8cb37e5 Nov 13, 2018

Show resolved Hide resolved lib/galaxy/model/mapping.py Outdated
Show resolved Hide resolved lib/galaxy/model/mapping.py Outdated
Show resolved Hide resolved lib/galaxy/model/mapping.py Outdated
@nsoranzo

This comment has been minimized.

Copy link
Member

commented Nov 13, 2018

The failing selenium_tests.test_workflow_run.WorkflowRunTestCase.test_subworkflow_runtime_parameters selenium test seems legit.

Show resolved Hide resolved lib/galaxy/webapps/tool_shed/model/__init__.py Outdated
Show resolved Hide resolved lib/galaxy/model/__init__.py Outdated
Show resolved Hide resolved lib/galaxy/model/__init__.py Outdated
Show resolved Hide resolved lib/galaxy/webapps/tool_shed/model/__init__.py Outdated
Show resolved Hide resolved lib/galaxy/model/__init__.py Outdated
Show resolved Hide resolved lib/galaxy/model/mapping.py Outdated
Show resolved Hide resolved lib/galaxy/workflow/modules.py Outdated
Show resolved Hide resolved lib/galaxy/workflow/modules.py
Show resolved Hide resolved lib/galaxy/model/__init__.py Outdated

@jmchilton jmchilton force-pushed the jmchilton:workflows_track_step_inputs branch from 03ba12a to 7fe4a46 Nov 15, 2018

@galaxybot galaxybot added this to the 19.01 milestone Nov 15, 2018

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Nov 15, 2018

Indeed the Selenium failure seems legit:

Traceback (most recent call last):
  File "/Users/john/workspace/galaxy/lib/galaxy/web/framework/decorators.py", line 283, in decorator
    rval = func(self, trans, *args, **kwargs)
  File "/Users/john/workspace/galaxy/lib/galaxy/webapps/galaxy/api/workflows.py", line 414, in workflow_dict
    ret_dict = self.workflow_contents_manager.workflow_to_dict(trans, stored_workflow, style=style, version=version)
  File "/Users/john/workspace/galaxy/lib/galaxy/managers/workflows.py", line 437, in workflow_to_dict
    wf_dict = self._workflow_to_dict_run(trans, stored, workflow=workflow)
  File "/Users/john/workspace/galaxy/lib/galaxy/managers/workflows.py", line 468, in _workflow_to_dict_run
    module_injector.inject(step, steps=workflow.steps, exact_tools=False)
  File "/Users/john/workspace/galaxy/lib/galaxy/workflow/modules.py", line 1425, in inject
    state, step_errors = module.compute_runtime_state(self.trans, step, step_args)
  File "/Users/john/workspace/galaxy/lib/galaxy/workflow/modules.py", line 230, in compute_runtime_state
    visit_input_values(self.get_runtime_inputs(), state.inputs, update_value, no_replacement_value=NO_REPLACEMENT)
  File "/Users/john/workspace/galaxy/lib/galaxy/tools/parameters/__init__.py", line 167, in visit_input_values
    callback_helper(input, input_values, name_prefix, label_prefix, parent_prefix=parent_prefix, context=context)
  File "/Users/john/workspace/galaxy/lib/galaxy/tools/parameters/__init__.py", line 121, in callback_helper
    'value'             : input_values.get(input.name),
AttributeError: 'InputProxy' object has no attribute 'name'

@jmchilton jmchilton changed the title Track workflow step input definitions in our model. [WIP] Track workflow step input definitions in our model. Nov 15, 2018

Track workflow step input definitions in our model.
We don't track workflow step inputs in any formal way in our model currently. This has resulted in some current hacks and prevents future enhancements. This commit splits WorkflowStepConnection into two models WorkflowStepInput and WorkflowStepConnection - normalizing the previous table workflow_step_connection on input step and input name.

In terms of current hacks forced on it by restricting all of tool state to be confined to a big JSON blob in the database - we have problems distinguishing keys and values when walking tool state. As we store more and more JSON blobs inside of the giant tool state blob - the worse this problem gets. Take for instance checking for runtime parameters or the rules parameter values - these both use JSON blobs that aren't simple values, so it is hard to tell looking at the tool state blob in the database or the workflow export to tell what is a key or what is a value. Tracking state as normalized inputs with default values and explicit attributes runtime values should allow much more percise state definition and construction.

This variant of the models would also potentially allow defining runtime values with non-tool default values (so default values defined for the workflow but still explicitly settable at runtime). The combinations of overriding defaults and defining runtime values were not representable before.

In terms of future enhancements, there is a lot we cannot track with the current models - such as map/reduce options for collection operations (#4623 (comment)). This should enable a lot of that. Obviously there are a lot of attributes defined here that are not yet utilized, but I'm using most (all?) of them downstream in the CWL branch. I'd rather populate this table fully realized and fill in the implementation around it as work continues to stream in from the CWL branch - to keep things simple and avoid extra database migrations. But I understand if this feels like speculative complexity we want to avoid despite the implementation being readily available for inspection downstream.

@jmchilton jmchilton force-pushed the jmchilton:workflows_track_step_inputs branch from 7fe4a46 to 67efd07 Nov 15, 2018

Show resolved Hide resolved lib/galaxy/model/mapping.py Outdated

@nsoranzo nsoranzo changed the title [WIP] Track workflow step input definitions in our model. Track workflow step input definitions in our model. Nov 16, 2018

@nsoranzo nsoranzo added status/review and removed status/WIP labels Nov 16, 2018

Update lib/galaxy/model/mapping.py
Co-Authored-By: jmchilton <jmchilton@gmail.com>
Column("value_from_type", TEXT),
Column("default_value", JSONType),
Column("default_value_set", Boolean, default=False),
Column("runtime_value", Boolean, default=False))

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 16, 2018

Member

What about enforcing uniqueness of an input name for a workflow step with:

    UniqueConstraint("workflow_step_id", "name", name='uix_1')

This comment has been minimized.

Copy link
@jmchilton

jmchilton Nov 16, 2018

Author Member

This is a great idea!

Column("value_from_type", TEXT),
Column("default_value", JSONType),
Column("default_value_set", Boolean, default=False),
Column("runtime_value", Boolean, default=False),

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 16, 2018

Member

UniqueConstraint("workflow_step_id", "name", name='uix_1'),

Show resolved Hide resolved lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py Outdated

# Drop new workflow invocation step and job association table and restore legacy data.
LegacyWorkflowStepConnection_table = Table("workflow_step_connection_premigrate145", metadata, autoload=True)
LegacyWorkflowStepConnection_table.rename("workflow_step_connection")

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 16, 2018

Member

This will lose any update to the workflow_step_connection table after the upgrade, is it worth to try to instead to recreate the table using the old schema but with the present content? I can work on that, if you agree.

This comment has been minimized.

Copy link
@jmchilton

jmchilton Nov 16, 2018

Author Member

I don't think it is worth the effort, but I wouldn't stop you from working on it if you wanted to.

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 16, 2018

Member

OK, thanks, I'll give it a try and push to your branch if successful.

Update lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py
Co-Authored-By: jmchilton <jmchilton@gmail.com>

nsoranzo and others added some commits Nov 16, 2018

Update lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py
Co-Authored-By: jmchilton <jmchilton@gmail.com>
Preserve updated content in migration 145 downgrade
Also add UNIQUE constraint to `("workflow_step_id", "name")` to the
new `workflow_step_input` table.

@nsoranzo nsoranzo merged commit f968545 into galaxyproject:dev Nov 16, 2018

6 checks passed

api test Build finished. 442 tests run, 1 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 190 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 269 tests run, 10 skipped, 0 failed.
Details
selenium test Build finished. 151 tests run, 3 skipped, 0 failed.
Details
toolshed test Build finished. 577 tests run, 0 skipped, 0 failed.
Details
@mvdbeek

This comment has been minimized.

Copy link
Member

commented Nov 19, 2018

Hmm, the migration fails on my dev instance:

Activating virtualenv at .venv3
WARNING: Your Python version is: 3.6.5
Galaxy is currently supported on Python 2.7 only, support for Python >= 3.4
is still in beta stage. Since you are using a virtual environment, we assume
you are developing or testing Galaxy under Python 3. If not,
install a supported Python version. If a supported version is already
installed but is not your default, https://galaxyproject.org/admin/python/
contains instructions on how to force Galaxy to use a different version.
144 -> 145...

Migration script for workflow step input table.

Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
psycopg2.IntegrityError: duplicate key value violates unique constraint "workflow_step_input_workflow_step_id_name_key"
DETAIL:  Key (workflow_step_id, name)=(45, multibam_conditional|bamfiles) already exists.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./scripts/manage_db.py", line 24, in <module>
    invoke_migrate_main()
  File "./scripts/manage_db.py", line 20, in invoke_migrate_main
    main(repository=repo, url=db_url)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/shell.py", line 209, in main
    ret = command_func(**kwargs)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/api.py", line 186, in upgrade
    return _migrate(url, repository, version, upgrade=True, err=err, **opts)
  File "<decorator-gen-15>", line 2, in _migrate
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/util/__init__.py", line 167, in with_engine
    return f(*a, **kw)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/api.py", line 366, in _migrate
    schema.runchange(ver, change, changeset.step)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/schema.py", line 93, in runchange
    change.run(self.engine, step)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/script/py.py", line 148, in run
    script_func(engine)
  File "lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py", line 69, in upgrade
    migrate_engine.execute(insert_step_inputs_cmd)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2075, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 942, in execute
    return self._execute_text(object, multiparams, params)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1104, in _execute_text
    statement, parameters
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "workflow_step_input_workflow_step_id_name_key"
DETAIL:  Key (workflow_step_id, name)=(45, multibam_conditional|bamfiles) already exists.
 [SQL: 'INSERT INTO workflow_step_input (workflow_step_id, name) SELECT input_step_id, input_name FROM workflow_step_connection_preupgrade145'] (Background on this error at: http://sqlalche.me/e/gkpj)

@nsoranzo nsoranzo deleted the jmchilton:workflows_track_step_inputs branch Nov 19, 2018

@nsoranzo

This comment has been minimized.

Copy link
Member

commented Nov 19, 2018

@mvdbeek Can you try something like the following (not tested):

diff --git a/lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py b/lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py
index d648d4b..cbbd2b9 100644
--- a/lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py
+++ b/lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py
@@ -65,7 +65,7 @@ def upgrade(migrate_engine):
 
     insert_step_inputs_cmd = \
         "INSERT INTO workflow_step_input (workflow_step_id, name) " + \
-        "SELECT input_step_id, input_name FROM workflow_step_connection_preupgrade145"
+        "SELECT DISTINCT input_step_id, input_name FROM workflow_step_connection_preupgrade145"
     migrate_engine.execute(insert_step_inputs_cmd)
 
     insert_step_connections_cmd = \
@mvdbeek

This comment has been minimized.

Copy link
Member

commented Nov 19, 2018

slightly different error now:

144 -> 145...

Migration script for workflow step input table.

Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "workflow_step_connection_preupgrade145" already exists


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./scripts/manage_db.py", line 24, in <module>
    invoke_migrate_main()
  File "./scripts/manage_db.py", line 20, in invoke_migrate_main
    main(repository=repo, url=db_url)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/shell.py", line 209, in main
    ret = command_func(**kwargs)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/api.py", line 186, in upgrade
    return _migrate(url, repository, version, upgrade=True, err=err, **opts)
  File "<decorator-gen-15>", line 2, in _migrate
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/util/__init__.py", line 167, in with_engine
    return f(*a, **kw)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/api.py", line 366, in _migrate
    schema.runchange(ver, change, changeset.step)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/schema.py", line 93, in runchange
    change.run(self.engine, step)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/versioning/script/py.py", line 148, in run
    script_func(engine)
  File "lib/galaxy/model/migrate/versions/0145_add_workflow_step_input.py", line 49, in upgrade
    OldWorkflowStepConnection_table.rename("workflow_step_connection_preupgrade145")
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/changeset/schema.py", line 507, in rename
    run_single_visitor(engine, visitorcallable, self, connection, **kwargs)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/changeset/databases/visitor.py", line 85, in run_single_visitor
    fn(element, **kwargs)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/changeset/ansisql.py", line 164, in visit_table
    self.execute()
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/migrate/changeset/ansisql.py", line 44, in execute
    return self.connection.execute(self.buffer.getvalue())
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 942, in execute
    return self._execute_text(object, multiparams, params)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1104, in _execute_text
    statement, parameters
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation "workflow_step_connection_preupgrade145" already exists
 [SQL: '\nALTER TABLE workflow_step_connection RENAME TO workflow_step_connection_preupgrade145'] (Background on this error at: http://sqlalche.me/e/f405)
@mvdbeek

This comment has been minimized.

Copy link
Member

commented Nov 19, 2018

Might just need to drop workflow_step_connection_preupgrade145 ?

@nsoranzo

This comment has been minimized.

Copy link
Member

commented Nov 19, 2018

Not drop it, rename it back to workflow_step_connection !

nsoranzo added a commit to nsoranzo/galaxy that referenced this pull request Nov 20, 2018

Only migrate distinct (input_step_id, input_name) pairs
otherwise the UNIQUE constraint on the `workflow_step_input` table is
violated.

Follow-up on galaxyproject#6850.

Reported by @mvdbeek.

jmchilton added a commit that referenced this pull request Nov 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.