
Allow multiple simultaneous uploads via single POST. #4563

Merged
merged 1 commit into from Oct 4, 2017

Conversation

jmchilton
Member

The upload.py tool itself already allowed this.

@guerler
Contributor

guerler commented Sep 6, 2017

Is this ready for review, and will it allow specifying separate dbkey/datatype pairs for each file?

@jmchilton
Member Author

jmchilton commented Sep 7, 2017

It lets you add different names, space_to_tab and to_posix_lines values, and such - but I guess not different dbkeys and filetypes. I'll update the PR to do this and add a test case for that this morning.

if override_file_type:
    return override_file_type
else:
    return context.get(self.file_type_name, self.default_file_type)
Contributor

@guerler guerler Sep 7, 2017

Very minor, but could we do something like the following, unless you think it's less performant:

default_type = context.get(self.file_type_name, self.default_file_type)
return context.get("override_file_type", default_type)

I also wonder if we could just name it file_type instead of override_file_type, since the two can be distinguished by being specified at the request level vs. the file-bunch level. Thanks a lot for this addition. This is awesome.

Member Author

Yeah - so I had that at first, but the fact that a value - even though it was empty - was present in the per-file (child) context meant the parent context was ignored. I should probably take another swing at this, though - because you are right, the way it is now kinda sucks.
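To illustrate the pitfall described above, here is a minimal sketch with hypothetical plain dicts standing in for Galaxy's contexts: when the per-file context contains the key with an empty value, a plain dict.get() fallback never consults the parent default.

```python
# Hypothetical dicts standing in for the parent and per-file contexts;
# the per-file field is a hidden form field submitted as an empty string.
parent_context = {"file_type": "fastq"}
file_context = {"file_type": ""}

# Naive lookup: the key is present (though empty), so the parent default is ignored.
naive = file_context.get("file_type", parent_context.get("file_type"))
# naive == ""

# Explicit emptiness check: fall back to the parent when the per-file value is empty.
file_type = file_context.get("file_type") or parent_context.get("file_type")
# file_type == "fastq"
```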

@@ -279,10 +285,18 @@ def value_from_basic(self, value, app, ignore_errors=False):
rval.append(rval_dict)
return rval

def get_file_count(self, trans, context):
    file_count = context.get("file_count", "auto")
    if file_count == "auto":
Member

Just a minor one. Can we make it a one-liner?

file_count = len(self.get_datatype(trans, context).writable_files) if file_count == "auto" else int(file_count)

thank you!
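As a sanity check, the suggested one-liner in a runnable sketch; PairedDatatype here is a hypothetical stand-in for a Galaxy datatype exposing writable_files:

```python
class PairedDatatype:
    # Hypothetical stand-in for a Galaxy datatype; real datatypes expose
    # writable_files describing the composite files they accept.
    writable_files = ["forward", "reverse"]

def get_file_count(context, datatype):
    # "auto" resolves to the number of writable files; anything else is an int.
    file_count = context.get("file_count", "auto")
    return len(datatype.writable_files) if file_count == "auto" else int(file_count)

dt = PairedDatatype()
# get_file_count({}, dt) == 2
# get_file_count({"file_count": "3"}, dt) == 3
```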

Member Author

I'll rebase with this change, thanks!

The upload.py tool itself already allowed this; update the upload dataset grouping to handle it.
@jmchilton
Member Author

I've changed the interface here so that "file_type" or "dbkey" in the parent context are taken as defaults, and in the per-file context these can be overridden with "file_type" or "dbkey", instead of requiring different variables "override_file_type" / "override_dbkey" as I had before. I added a few more tests for different combinations of override versus default for dbkey and file_type.
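A hypothetical payload sketch of the defaulting scheme described above (field names follow this comment; the exact API shape may differ):

```python
# Hypothetical upload payload: request-level values act as defaults,
# per-file entries may override them with the same field names.
payload = {
    "file_type": "fastqsanger",  # default for all files
    "dbkey": "hg19",             # default for all files
    "files": [
        # inherits both defaults
        {"name": "sample1.fq"},
        # overrides both at the per-file level
        {"name": "sample2.bed", "file_type": "bed", "dbkey": "mm10"},
    ],
}
```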

@guerler
Contributor

guerler commented Oct 3, 2017

Thanks a lot for working on this. I aligned the submission format in #4513 and it seems to work well.

if dbkey == "":
    if parent_context:
        dbkey = parent_context.get("dbkey", dbkey)
return dbkey
Contributor

I might be missing something but why are we not just returning parent_context.get("dbkey", context.get("dbkey"))?

Member Author

So the inner dbkey - the per-file one - is just a hidden field in the tool form with a default of "". The outer check consults the top-level dbkey param in the tool, which I think defaults to ?. So if the API request sends a dbkey for all files and then one for a specific file, I think we should use the base dbkey for all files that don't specify an explicit one, and use the specific keys where set. I tried the same strategy with the file extensions. This implementation is crap because we are mapping API requests onto a tool that was never really designed to do this well (these frustrations led me to create #4734).

So I implemented API tests to verify all this behavior with respect to a specific file's dbkey versus the dbkey for all files - if you can rework the implementation to be cleaner in such a way that the API calls don't break, I'm totally on board. I'm quite frustrated with a lot of grouping.py. There are more test cases in #4746 that should also help verify that cleanups to the implementation don't break FTP uploads and such.
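The resolution order described above can be sketched as a simplified stand-alone function (not Galaxy's actual grouping.py code):

```python
def resolve_dbkey(file_context, parent_context=None):
    # Per-file dbkey is a hidden field defaulting to ""; fall back to the
    # request-level dbkey, and finally to "?" (the "unspecified" marker).
    dbkey = file_context.get("dbkey", "")
    if dbkey == "" and parent_context:
        dbkey = parent_context.get("dbkey", dbkey)
    return dbkey or "?"

# per-file key wins over the request-level default:
#   resolve_dbkey({"dbkey": "hg38"}, {"dbkey": "mm10"}) -> "hg38"
# empty per-file key falls back to the request-level default:
#   resolve_dbkey({"dbkey": ""}, {"dbkey": "mm10"}) -> "mm10"
# nothing specified anywhere:
#   resolve_dbkey({}) -> "?"
```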

@guerler guerler merged commit 79a16da into galaxyproject:dev Oct 4, 2017
@jmchilton
Member Author

Thanks for the merge @guerler - let me know if there is anything else I can do to help with optimizing uploads.
