Orca: Align the data analysis method of dataloader and dataframe #5763
Now we assume any @hkvision would you mind taking a look at this?
I don't think the logic here is correct.
Sorry for the outdated description, now updated.
if inputs is XShards of dictionary, in which case features is a dict, should pass
The modification looks good to me. Do we have a test for a dataloader that returns more than 2 values (e.g. feature1, feature2, label)?
Need more tests to cover different cases (e.g., label containing multiple inputs)
I suppose at this moment we may not be able to perfectly support multi-label outputs, especially when the dataset returns all the items as a tuple (x1, x2, x3, y1, y2)?
invalidInputError(False,
                  "Features should either be tensor, list/tuple or dict, "
                  "but got {}".format(type(features)))

if isinstance(output, tuple) or isinstance(output, list):
    # Then target is also assumed to be a tuple or list.
    loss = self.criterion(*output, *target)
This should support multi-label output if target is already a list right?
that's right
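To make the multi-label case concrete, here is a minimal sketch of how `criterion(*output, *target)` behaves when both the model output and the target are sequences. It uses plain Python lists instead of tensors so it runs without PyTorch. `multi_head_loss` and `compute_loss` are hypothetical names, and the single-output branch is an assumption, not code taken from this PR.

```python
def multi_head_loss(pred_a, pred_b, target_a, target_b):
    """Hypothetical criterion: sum of the mean absolute error of each head."""
    def mae(pred, target):
        return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    return mae(pred_a, target_a) + mae(pred_b, target_b)


def compute_loss(criterion, output, target):
    if isinstance(output, (tuple, list)):
        # Multi-output model: target is assumed to be a tuple or list too,
        # so both are unpacked into positional arguments of the criterion.
        return criterion(*output, *target)
    # Assumed single-output branch: pass both through unchanged.
    return criterion(output, target)


# Two output heads and a target that is already a list of two labels.
output = ([1.0, 2.0], [0.0, 4.0])
target = [[1.0, 4.0], [1.0, 4.0]]
loss = compute_loss(multi_head_loss, output, target)
```

The criterion has to accept the outputs first and the labels after them, in the same order the model and the dataset produce them.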
No, I only found a multi-input test case for df.
Yes, we support multi-label in df/xshards and raydataset, but not in dataloader. Maybe we can enable it in another PR?
Lack of UTs for input
The question is that is it possible for us to detect
I think it may be difficult to detect automatically; only the user knows which ones are labels. But we can provide an extra argument for the user to specify label indexes like [3, 4]?
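A rough sketch of what such an argument could look like: the helper below splits one dataloader batch into features and labels from user-supplied indexes. `split_batch` and `label_indexes` are hypothetical names for illustration and are not part of the Orca API. Falling back to "the last element is the label" when no indexes are given is also an assumption.

```python
def split_batch(batch, label_indexes=None):
    """Split a flat batch tuple into (features, labels) lists."""
    items = list(batch)
    if label_indexes is None:
        # Assumed default: only the last element is a label.
        label_indexes = [len(items) - 1]
    label_set = set(label_indexes)
    features = [x for i, x in enumerate(items) if i not in label_set]
    labels = [items[i] for i in label_indexes]
    return features, labels


# A dataset that returns everything as one tuple (x1, x2, x3, y1, y2).
features, labels = split_batch(("x1", "x2", "x3", "y1", "y2"),
                               label_indexes=[3, 4])
```

Keeping the indexes explicit avoids guessing which trailing items are labels, which is the part that cannot be detected automatically.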
Do we have a multi-label unit test, particularly for Spark DataFrame input?
I mean multiple labels, not multiple inputs...
No then, may we add it in another PR?
Description
When the model has only one input, and that input is a list or tuple of tensors, we should not unpack it into separate args.
Basic Assumption:
There are only three possible types in features: torch.Tensor, list/tuple and dict
features and labels type list:
When will features be a single tensor?
When will features be a list or tuple?
When will features be a dict?
only when input is XShards of dictionary
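The three-type assumption can be sketched as a small dispatch function. This is an illustration only: `FakeTensor` is a stand-in for torch.Tensor so the snippet runs without PyTorch, `forward` is a hypothetical helper, and passing a dict as keyword arguments is an assumption about how dict features are consumed.

```python
class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""
    def __init__(self, value):
        self.value = value


def forward(model, features, tensor_type=FakeTensor):
    if isinstance(features, tensor_type):
        # A single tensor is passed through as the only argument.
        return model(features)
    if isinstance(features, (tuple, list)):
        # A list/tuple is unpacked into positional arguments.
        return model(*features)
    if isinstance(features, dict):
        # Assumption: dict features map argument names to tensors.
        return model(**features)
    raise TypeError("Features should either be tensor, list/tuple or dict, "
                    "but got {}".format(type(features)))


def add_model(a, b):
    return a.value + b.value


from_list = forward(add_model, [FakeTensor(1), FakeTensor(2)])
from_dict = forward(add_model, {"a": FakeTensor(1), "b": FakeTensor(2)})
from_tensor = forward(lambda x: x.value * 2, FakeTensor(5))
```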
1. Why the change?
#5762
In some cases, the model takes x, a list of two tensors, as a single input: code
but our torchrunner unpacks it into two separate arguments:
https://github.com/intel-analytics/BigDL/blob/affe54803c320afd4fc0631dc3fa02f8be1cfcdc/python/orca/src/bigdl/orca/learn/pytorch/training_operator.py#L279
2. User API changes
none
3. Summary of the change
before: output = self.model(*features)
after: output = self.model(*features) if not isSingleListInput else self.model(features)
If data is a pt dataloader or creator, reload_dataloader_creator will combine all elements besides labels into a list, and if features consists of only one tensor it remains the same:
and will parse features here:
This ensures that *features is consistent with the user's input.
The current df, xshard and raydataset logic is already correct, so we leave it unchanged.
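The regrouping and the before/after forward call can be sketched as follows, using plain Python values in place of tensors. `regroup_batch` and `forward` are hypothetical helpers, `is_single_list_input` mirrors the `isSingleListInput` flag from the summary, and treating the last batch element as the label is an assumption.

```python
def regroup_batch(batch):
    """Combine every element besides the label into a features list."""
    *features, label = batch  # assumption: the last element is the label
    if len(features) == 1:
        # A single feature stays as it is instead of becoming a 1-item list.
        return features[0], label
    return features, label


def forward(model, features, is_single_list_input=False):
    if is_single_list_input:
        # The model expects the whole list as one argument.
        return model(features)
    if isinstance(features, (list, tuple)):
        return model(*features)
    return model(features)


def two_arg_model(a, b):
    return a + b


def list_model(x):
    return x[0] + x[1]


# Dataset yields (x1, x2, y): the model takes two separate inputs.
feats, label = regroup_batch((1, 2, 0))
out_separate = forward(two_arg_model, feats)

# Dataset yields ([x1, x2], y): the model takes one list input.
feats_single, _ = regroup_batch(([1, 2], 0))
out_single = forward(list_model, feats_single, is_single_list_input=True)
```

After regrouping, both batches carry the same `[1, 2]` features, so the type alone cannot tell the two models apart. That is why a separate flag is needed in this sketch: calling `forward(list_model, feats_single)` without it unpacks the list and raises a `TypeError`.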
4. How to test?