Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove custom tool state handling #207

Closed
wants to merge 1 commit into from
Closed

Conversation

guerler
Copy link

@guerler guerler commented Jan 26, 2017

The json.loads here is strange for nested parameters like repeats, sections and conditionals, properly decoding the parameters would be the right thing to do. Alternatively I can make consistent adjustments such as loading or decoding the states dictionary on the side of Galaxy. The location at which all Step input states are stored is here: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/workflow/modules.py#L72. I believe that we have quite some flexibility in the storing format at this point and it would be great to talk about what the preferred format is. ping @jmchilton

@guerler guerler added the bug label Jan 26, 2017
@jmchilton
Copy link
Member

I'm fine with breaking backward compatibility I think as long as we are moving toward a more simple over the wire format. Like anywhere we have strings encoding json in json objects - that is real bummer. Am I reading this correctly that the recent Galaxy PR made something that was a string encoding JSON into just a JSON object directly? Do you have before and after examples of what was going over the wire?

@guerler
Copy link
Author

guerler commented Jan 26, 2017

Not yet, but the Galaxy PR allows us to do exactly that now that the state handling has been unified. Here is how we can store all workflow step states as json-dictionaries without stringifications while still be able to load previously stored workflow steps: https://github.com/galaxyproject/galaxy/compare/dev...guerler:store_step_states_as_dict?expand=1. I did not want to change the tool state generation process in the previous PR. Historically, we started of with a hash-encoded string of nested-json strings and moved to a regular string containing nested-json strings, now we can transform this as shown in the branch above into a regular dict-object. Either way the custom json.load of the first-layer happening in bioblend is not an option.

@jmchilton
Copy link
Member

I like https://github.com/galaxyproject/galaxy/compare/dev...guerler:store_step_states_as_dict?expand=1 I think - it seems like a good step forward at first glance. I still don't have a grasp on where these things are coming from and going to - what things look like over the wire - etc.... I have to just trust you I think or spend an afternoon figuring this out.

The fact that this change doesn't break older Galaxy releases means the encoding loads was probably always redundant?

+1 I guess.

@nsoranzo
Copy link
Member

nsoranzo commented Feb 1, 2017

This is the difference when executing:

from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(<Galaxy_API_URL>, <Galaxy_API_key>)
gi.workflows.show_workflow(wf_id)

for a very simple workflow.
Galaxy 17.01:

{u'annotation': None,
 u'deleted': False,
 u'id': u'333177a7512af1c7',
 u'inputs': {u'0': {u'label': u'Input Dataset',
   u'uuid': u'0a238de4-5f9e-4176-a92b-1e33ef8a274e',
   u'value': u''}},
 u'latest_workflow_uuid': u'ebc188a8-9bf3-47ec-b1a9-4c68d0ab57da',
 u'model_class': u'StoredWorkflow',
 u'name': u'Group',
 u'owner': u'soranzon',
 u'published': False,
 u'steps': {u'0': {u'annotation': None,
   u'id': 0,
   u'input_steps': {},
   u'tool_id': None,
   u'tool_inputs': {u'name': u'Input Dataset'},
   u'tool_version': None,
   u'type': u'data_input'},
  u'1': {u'annotation': None,
   u'id': 1,
   u'input_steps': {u'input1': {u'source_step': 0, u'step_output': u'output'}},
   u'tool_id': u'Grouping1',
   u'tool_inputs': {u'groupcol': u'null',
    u'ignorecase': u'"false"',
    u'ignorelines': u'["62", "43"]',
    u'input1': u'{"__class__": "RuntimeValue"}',
    u'operations': u'[{"opcol": "1", "__index__": 0, "optype": "mean", "opround": "no"}, {"opcol": "2", "__index__": 1, "optype": "median", "opround": "yes"}]'},
   u'tool_version': u'2.1.1',
   u'type': u'tool'}},
 u'tags': [],
 u'url': u'/api/workflows/333177a7512af1c7'}

Galaxy dev:

{u'annotation': u'',
 u'deleted': False,
 u'id': u'5969b1f7201f12ae',
 u'inputs': {u'0': {u'label': u'Input Dataset',
   u'uuid': u'0a238de4-5f9e-4176-a92b-1e33ef8a274e',
   u'value': u''}},
 u'latest_workflow_uuid': u'ebc188a8-9bf3-47ec-b1a9-4c68d0ab57da',
 u'model_class': u'StoredWorkflow',
 u'name': u'Group',
 u'owner': u'nsoranzo',
 u'published': False,
 u'steps': {u'0': {u'annotation': None,
   u'id': 0,
   u'input_steps': {},
   u'tool_id': None,
   u'tool_inputs': u'{"__page__": null, "__rerun_remap_job_id__": null}',
   u'tool_version': None,
   u'type': u'data_input'},
  u'1': {u'annotation': None,
   u'id': 1,
   u'input_steps': {u'input1': {u'source_step': 0, u'step_output': u'output'}},
   u'tool_id': u'Grouping1',
   u'tool_inputs': u'{"operations": "[{\\"opcol\\": \\"1\\", \\"__index__\\": 0, \\"optype\\": \\"mean\\", \\"opround\\": \\"no\\"}, {\\"opcol\\": \\"2\\", \\"__index__\\": 1, \\"optype\\": \\"median\\", \\"opround\\": \\"yes\\"}]", "__page__": null, "input1": "{\\"__class__\\": \\"RuntimeValue\\"}", "ignorelines": "[\\"62\\", \\"43\\"]", "groupcol": "null", "__rerun_remap_job_id__": null, "ignorecase": "\\"false\\""}',
   u'tool_version': u'2.1.1',
   u'type': u'tool'}},
 u'tags': [],
 u'url': u'/api/workflows/5969b1f7201f12ae'}

tool_inputs was a dict, now it has an additional json encoding.
Also data_input steps now seem to not have name sometimes (used to be inside 'tool_inputs), and the new label too.

@guerler
Copy link
Author

guerler commented Feb 1, 2017

Thx @nsoranzo. I am working on providing the dict for the tool state directly through Galaxy (see: galaxyproject/galaxy#3522). Its still a work in progress but if that turns out to be more difficult I will leave the existing customization in.

@nsoranzo
Copy link
Member

nsoranzo commented Feb 1, 2017

Thanks @guerler, I've added commit 54639ef which adds more tests that fails for the Galaxy dev branch even with this PR applied, as confirmed by the Travis build that I've just re-run.

@guerler
Copy link
Author

guerler commented Feb 1, 2017

Great, makes sense. Thanks.

@guerler
Copy link
Author

guerler commented Feb 7, 2017

Tests, including the new ones @nsoranzo added, pass locally, I think the failing test is not related to the change here.

@nsoranzo nsoranzo closed this Feb 8, 2017
@nsoranzo nsoranzo reopened this Feb 8, 2017
@jmchilton
Copy link
Member

@nsoranzo Are you okay with this being merged now?

@nsoranzo
Copy link
Member

Unfortunately that's going to break the experience for Galaxy <= release_17.01 (where the values in the tool_inputs are still JSON-encoded), this improved patch seem to solve the problem in my testings:

@@ -170,7 +170,13 @@ class Step(Wrapper):
             raise ValueError('Unknown step type: %r' % stype)
         if self.type == 'tool' and self.tool_inputs:
             for k, v in six.iteritems(self.tool_inputs):
-                self.tool_inputs[k] = json.loads(v)
+                # In Galaxy before release_17.05, v is a JSON-encoded string
+                if not isinstance(v, six.string_types):
+                    break
+                try:
+                    self.tool_inputs[k] = json.loads(v)
+                except ValueError:
+                    break
 
     @property
     def gi_module(self):

Any objection if I commit this instead?

@nsoranzo
Copy link
Member

Superseded by commit 0be30ac .

@nsoranzo nsoranzo closed this Feb 12, 2017
@nsoranzo nsoranzo deleted the fix_step_wrapper_000 branch May 21, 2017 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants