
Scottx611x/fix derived node attribute inheritance #2174

Merged
mccalluc merged 30 commits into release-1.6.0 from scottx611x/fix-derived-node-attributes on Sep 22, 2017

Conversation

@scottx611x (Member) commented Sep 20, 2017

  • Fix bug where Derived Data Nodes did not inherit their parent's Attributes
  • Constrain WorkflowFilesDL creation to the asterisked outputs of a Galaxy Workflow. (This is much easier than having an end user specify, by name, the outputs they want returned to Refinery: they simply check the asterisk next to the desired output files in their Workflow Editor. See the sketch after this list.)
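A rough sketch of the asterisked-outputs idea (the function and dict key names below are assumptions for illustration, not Refinery's or Galaxy's exact API): as the snippets later in this thread show, only history datasets whose creating-job output appears in a step's workflow_outputs are kept.

def get_exposed_datasets(history_datasets, workflow_steps):
    # Keep only datasets whose creating-job output was "asterisked" in the
    # Galaxy Workflow Editor, i.e. recorded under the step's workflow_outputs.
    exposed_datasets = []
    for dataset in history_datasets:
        step = workflow_steps[str(dataset["workflow_step"])]
        workflow_output_names = [
            output["output_name"] for output in step["workflow_outputs"]
        ]
        if dataset["creating_job_output_name"] in workflow_output_names:
            exposed_datasets.append(dataset)
    return exposed_datasets

workflow_steps = {"1": {"workflow_outputs": [{"output_name": "out_file1"}]}}
history_datasets = [
    {"workflow_step": 1, "creating_job_output_name": "out_file1"},  # asterisked
    {"workflow_step": 1, "creating_job_output_name": "out_file2"},  # not asterisked
]
assert get_exposed_datasets(history_datasets, workflow_steps) == history_datasets[:1]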

Tests to write:


@codecov-io commented Sep 20, 2017

Codecov Report

Merging #2174 into release-1.6.0 will increase coverage by 0.43%.
The diff coverage is 99.18%.


@@                Coverage Diff                @@
##           release-1.6.0    #2174      +/-   ##
=================================================
+ Coverage          46.35%   46.79%   +0.43%     
=================================================
  Files                412      412              
  Lines              27977    28200     +223     
  Branches            1312     1312              
=================================================
+ Hits               12970    13196     +226     
+ Misses             15007    15004       -3
Impacted Files Coverage Δ
refinery/data_set_manager/utils.py 63.45% <ø> (+0.64%) ⬆️
refinery/tool_manager/test_data/galaxy_mocks.py 100% <100%> (ø) ⬆️
refinery/data_set_manager/tests.py 99.6% <100%> (ø) ⬆️
refinery/core/tests.py 99.63% <100%> (ø) ⬆️
refinery/core/models.py 67.92% <100%> (+1.26%) ⬆️
refinery/tool_manager/tests.py 99.85% <100%> (ø) ⬆️
refinery/galaxy_connector/galaxy_workflow.py 13.09% <100%> (+0.29%) ⬆️
refinery/data_set_manager/models.py 76.62% <100%> (+0.78%) ⬆️
refinery/tool_manager/models.py 95.36% <97.14%> (-0.39%) ⬇️
... and 7 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68e8d11...4ce2052.

@scottx611x scottx611x added this to the Release 1.6.0 milestone Sep 20, 2017
@scottx611x scottx611x added this to QC in Tool APIs Sep 20, 2017
@scottx611x scottx611x changed the title Scottx611x/fix derived node attributes Scottx611x/fix derived node attribute inheritance Sep 21, 2017
)
if graph[edge[0]][edge[1]]['output_id'] == output_id:
    output_node_id = edge[0]
@scottx611x (Member, Author) commented:

NOTE: This is where things were flip-flopped before.
This code wasn't broken, just difficult to debug, since the naming didn't reflect what was actually happening:

output_node_id was really the input_node_id, etc.

@@ -563,7 +563,6 @@ def _add_annotated_nodes(

if len(bulk_list) > 0:
    AnnotatedNode.objects.bulk_create(bulk_list)
    bulk_list = []
@scottx611x (Member, Author) commented:

This was dead code

@@ -1609,15 +1609,21 @@ def rename_results(self):
if item.get_filetype() == zipfile:
    new_file_name = ''.join([root, '.zip'])
    renamed_file_store_item_uuid = rename(
@scottx611x (Member, Author) commented:

rename and rename_datafile return None in odd ways. I'll make an issue for fixing this.

if (graph[edge[0]][edge[1]]['output_id'] ==
        str(input_connection.step) + '_' +
        input_connection.filename):
    input_id = "{}_{}".format(
@scottx611x (Member, Author) commented:

TODO: add an AnalysisNodeConnection method for this `{}_{}` concatenation pattern.

# before:
analysis=self, direction=INPUT_CONNECTION)[0].node.assay
# after:
analysis=self,
direction=INPUT_CONNECTION
).first().node.assay
# 1. read workflow into graph
graph = create_expanded_workflow_graph(
ast.literal_eval(self.workflow_copy)
@scottx611x (Member, Author) commented:

Do we need literal_eval ??? Yes we do
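A quick illustration of why ast.literal_eval is the right call here, assuming workflow_copy is persisted as the string representation of a Python dict (which is what the call above implies):

import ast

# The stored string uses single quotes, so json.loads would reject it, and
# eval() would execute arbitrary code. ast.literal_eval only accepts Python
# literals (dicts, lists, strings, numbers), so it parses the dict safely.
workflow_copy = "{'steps': {'0': {'type': 'data_input'}}}"
workflow_dict = ast.literal_eval(workflow_copy)
assert workflow_dict["steps"]["0"]["type"] == "data_input"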

# get graph edge that corresponds to this output node:
# a. attach output node to source data transformation node
# b. attach output node to target data transformation node
# (if exists)
if len(graph.edges([output_connection.step])) > 0:
    for edge in graph.edges_iter([output_connection.step]):
        workflow_step = output_connection.step
@scottx611x (Member, Author) commented:

workflow_step -> output_connection.step


# return a sorted list based on the AnalysisNodeConnections step
# attribute
return sorted(
@scottx611x (Member, Author) commented:

Not convinced that it needs to be sorted. Update: it didn't need to be.

for dataset in self._get_galaxy_history_dataset_list():
    creating_job = self._get_galaxy_dataset_job(dataset)
    if "upload" not in creating_job["tool_id"]:
        workflow_step = self._get_workflow_step(dataset)
@scottx611x (Member, Author) commented:

Naming: workflow_step_index or workflow_step_key? workflow_steps_dict?

Also, wrap workflow_step_key as str().
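For context on the str() wrapping seen in the next hunk (the step dict below is illustrative): the workflow steps mapping presumably comes from a JSON response, so its keys are strings, while order_index is an int.

workflow_steps = {"0": {"tool_id": "upload1"}, "1": {"tool_id": "Grep1"}}
workflow_step = 1  # int, e.g. a step's order_index

assert workflow_steps[str(workflow_step)]["tool_id"] == "Grep1"
# workflow_steps[workflow_step] would raise KeyError: 1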

workflow_steps[str(workflow_step)]["workflow_outputs"]
]
if creating_job_output_name in workflow_output_names:
visible_datasets.append(dataset)
@scottx611x (Member, Author) commented:

exposed_datasets?

if step["job_id"] == galaxy_dataset_dict[self.CREATING_JOB]:
    workflow_steps.append(step["order_index"])

if not workflow_steps:
@scottx611x (Member, Author) commented:

Add a comment explaining why this is necessary.


if not workflow_steps:
    workflow_steps.append(0)

assert len(workflow_steps) == 1, (
@scottx611x (Member, Author) commented:

Is this even necessary? Why not return in the loop?
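A sketch of the early-return variant being suggested (the steps structure and function wiring here are assumptions for illustration; field names follow the snippets above):

INPUT_STEP_NUMBER = 0  # mirrors the class constant used further down

def get_workflow_step(invocation_steps, creating_job_id):
    # Return the matching step's order_index as soon as it is found, instead
    # of collecting matches and asserting there is exactly one afterwards.
    for step in invocation_steps:
        if step["job_id"] == creating_job_id:
            return step["order_index"]
    # No match: the dataset came from an `upload`/`input` step
    return INPUT_STEP_NUMBER

steps = [{"job_id": "abc", "order_index": 1}, {"job_id": "def", "order_index": 2}]
assert get_workflow_step(steps, "def") == 2
assert get_workflow_step(steps, "xyz") == INPUT_STEP_NUMBER

The trade-off is losing the explicit exactly-one-match assertion, which may be why the collect-and-assert form was kept.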

"""
creating_job = self._get_galaxy_dataset_job(galaxy_dataset_dict)
creating_job_outputs = creating_job["outputs"]
logger.debug("Dataset: %s", galaxy_dataset_dict)
@scottx611x (Member, Author) commented:

remove loggers

    output_connection.step,
    output_connection.filename
)
output_id = output_connection.get_output_connection_id()
Reviewer (Member) commented:

hmm: I was imagining just one method, get_id, instead of separate get_output_connection_id and get_input_connection_id?

@scottx611x (Member, Author) replied:

Good call, that's less confusing.

else:
    output_connections_to_analysis_results.append(
        (output_connection, None)
    )
Reviewer (Member) commented:

So there's always an analysis_result from the database.... but maybe it's not really an analysis_result? The if-then could be moved up to where we set it.


def get_output_connection_id(self):
    return "{}_{}".format(self.step, self.name)

Reviewer (Member) commented:

oy: I hadn't noticed this discrepancy. Maybe a comment to explain why filename for one and name for the other?
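To make the suggestion concrete, a unified accessor might look roughly like this (a sketch only; the real AnalysisNodeConnection is a Django model in refinery/core/models.py, and the direction constant values below are placeholders):

INPUT_CONNECTION = "in"     # placeholder values for the example
OUTPUT_CONNECTION = "out"

class AnalysisNodeConnectionSketch:
    def __init__(self, step, name, filename, direction):
        self.step = step
        self.name = name
        self.filename = filename
        self.direction = direction

    def get_id(self):
        # Inputs are identified by their filename and outputs by their name
        # (the discrepancy flagged above), so document that in one place.
        field = self.filename if self.direction == INPUT_CONNECTION else self.name
        return "{}_{}".format(self.step, field)

assert AnalysisNodeConnectionSketch(1, "", "input.txt", INPUT_CONNECTION).get_id() == "1_input.txt"
assert AnalysisNodeConnectionSketch(2, "out_file", "", OUTPUT_CONNECTION).get_id() == "2_out_file"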

)
analysis_group_number = (
matching_refinery_to_galaxy_file_mappings[0][self.ANALYSIS_GROUP]
assert len(list(set(analysis_groups))) == 1, (
Reviewer (Member) commented:

this was necessary after all?

@@ -861,15 +867,54 @@ def get_galaxy_file_relationships(self):
def _get_galaxy_history_dataset_list(self):
"""
Retrieve a list of Galaxy Datasets from the Galaxy History of our
Galaxy Workflow invocation.
Galaxy Workflow invocation all tool outputs in the Galaxy Workflow
editor.
Reviewer (Member) commented:

Hmm: really not sure what this is trying to say.

# If we reach this point and have no workflow_steps, this means that
# the galaxy dataset in question corresponds to an `upload` or
# `input` step i.e. `0`
return self.INPUT_STEP_NUMBER
Reviewer (Member) commented:

not sure the constant makes this more clear, but it's fine.

@mccalluc mccalluc merged commit 6003a4a into release-1.6.0 Sep 22, 2017
@mccalluc mccalluc deleted the scottx611x/fix-derived-node-attributes branch September 22, 2017 16:35
@scottx611x scottx611x moved this from QC to Done in Tool APIs Sep 22, 2017
Labels: none yet
Projects: Tool APIs (Done)
Linked issues: none
3 participants