New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented requirements defined in DM-32675 #84
Conversation
1175468
to
9caab2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things about the changes and then a couple higher level things:
- Need a file in doc/changes
- To understand new chunk function I had to look at code not in this PR. The docstring for parameter dependencies in split_dependencies_by_tasks doesn't match code.
|
||
Parameters | ||
---------- | ||
tasks_dependency_map : `dict` of dependencies dictionary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: only type goes here, description goes on next line. Also fix the Returns just below here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere either here or where dependencies first used, it would be helpful to have a brief reminder what it is. For example, I didn't understand why len(dependencies) gave you the number of jobs in the task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
chunk_id = hash(pseudo_input) % n_chunks | ||
tasks_dependency_map_chunked.setdefault(task_name + "_chunk_" | ||
+ str(chunk_id), | ||
{})[pseudo_input] \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since using task_name + "chunk" + str(chunk_id) in multiple places, please make it a variable or a function so if we need to change it we can safely do it in one place. Could use f-string. If zero-padding the chunk_id will make them show up in an order that would be helpful to someone monitoring the run, please consider doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tasks_chunked[task_name] = n_chunks | ||
else: | ||
tasks_dependency_map_chunked[task_name] = dependencies | ||
tasks_dependency_map_chunked_updated_dep = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need comment here briefly describing next block of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
self.tasks_steps[task_name + "_chunk_" + str(chunk_id)] = \ | ||
self.tasks_steps[task_name] | ||
self.tasks_cmd_lines[task_name + "_chunk_" + str(chunk_id)] = \ | ||
self.tasks_cmd_lines[task_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No updates for self.tasks_inputs? Are those actually used anywhere? If not, they should be removed from the rest of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this data structure, was an artefact
9caab2d
to
371b9e3
Compare
Checklist
doc/changes