Skip to content
This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Validate posted schemas #69

Merged
merged 13 commits into from
Dec 18, 2014
Merged

Conversation

mkiwala
Copy link
Contributor

@mkiwala mkiwala commented Dec 15, 2014

This PR attempts to define what json is valid for posting to the workflow service. The schema I present here is based in part on some work already done by @mark-burnett, but deviates from that work somewhat. Some areas that I've already identified for discussion are:

  • minItems and minProperties values
  • whether execution parameters should share the same validation as inputs (via the definitions/inputDictionary type)
  • whether tasks and edges in the top-level should be replaced with a "workflow" property of type definitions/workflow

Existing test cases were updated to support the definition of tasks and the removal of the environment property at the top-level.

Closes #65

* inputType is defined recursively to support nested_parallel_by_operation_matrix_inputs
* these messages are here instead of in a comment because json doesn't support comments
Not sure if these are correct.  Setting them to generate a discussion about
what the correct values should be.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) when pulling c8c44ff on mkiwala:validate_posted_schemas into a8fafb1 on genome:master.

@mkiwala
Copy link
Contributor Author

mkiwala commented Dec 15, 2014

I wish the schema validation could provide assurances that the inputs, tasks, and edges in the posted json actually specify a valid graph. I discussed this with @davidlmorton who points out that this may not be an important problem to overcome and that run time validation on the endpoint can handle this sort of validation too.

I think we also agreed that readability of the posted json is perhaps more important than imposing a structure which enables a json schema to make assurances that the inputs, tasks, and edges in the posted json actually specify a valid graph, especially since the endpoint can do additional validation.

@mkiwala mkiwala self-assigned this Dec 15, 2014
return '', 201, {
'Location': url_for('workflow-detail', workflow_id=workflow_id)
}

except ValidationError as e:
LOG.exception(e)
return {'error': e.message}, 400
except exceptions.InvalidWorkflow as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the InvalidWorkflow exception class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep the InvalidWorkflow exception class for now. It is still being used in the backend to validate dag task names when a workflow is posted. We might also add additional validation constraints that we cannot constrain in the json schema, but wish to catch at the time the workflow is posted.

@mkiwala
Copy link
Contributor Author

mkiwala commented Dec 16, 2014

@mark-burnett said:

I guess a DAG must have at least one task.

Is your statement limited to DAGs specified as a method of a task or can it be taken to include the DAG at the top-level of a workflow? I might be using the terminology more loosely than you are, but if we use the same "workflow" type for the top level and for sub-DAGs, then they would have to have the same minProperties value.

If we set the minProperties for the taskDictionary to 1, then we should also set the minItems on the edgeList to be 2.

@mark-burnett
Copy link
Contributor

I agree that we should try to use the same schema for the workflow portion of the top level document, therefore that must have at least 1 task also.

I agree that it also makes sense to set minItems on edgeList to be 2.

@mark-burnett
Copy link
Contributor

A travis-ci environment update is causing the test failures. @mkiwala is going to look into updating our tests so that they are compatible with the new environment.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) when pulling d0055bb on mkiwala:validate_posted_schemas into 33decf0 on genome:master.

@mkiwala
Copy link
Contributor Author

mkiwala commented Dec 17, 2014

I merged the fixes for the travis-ci failures into this branch.

The failure was due to honcho being unable to find the Procfile, even though the correct Procfile location was correctly specified on the honcho command line. The command line parsing in honcho depends on an unsupported behavior of the pythyon argparse library. Under python 2.7.9 that unsupported behavior changed, and caused honcho's command line parsing to break under certain circumstances. The workaround is to change the ordering of arguments to honcho so that the subcommand (e.g., start) is specified first, then any other arguments may follow.

@mark-burnett
Copy link
Contributor

+1

Changing the document's top level structure can happen in a separate PR.

mkiwala added a commit that referenced this pull request Dec 18, 2014
@mkiwala mkiwala merged commit 8c4373f into genome:master Dec 18, 2014
@mkiwala mkiwala removed the in review label Dec 18, 2014
@mkiwala mkiwala deleted the validate_posted_schemas branch December 18, 2014 16:22
This was referenced Dec 23, 2014
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use jsonschema to validate posted workflows
3 participants