[tensor] refactor colo-tensor #992
Conversation
ver217 commented May 17, 2022 (edited)
- ColoTensor inherits torch.Tensor.
- Test graph is broken and must be fixed in the next PR.
- Lazy init of ColoTensor is no longer supported.
- Remove torch.sum and torch.mean from the element-wise ops; they are not element-wise (see the snippet below).
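For the last bullet, a quick sanity check in plain PyTorch: sum and mean change the output shape, so they are reductions rather than element-wise ops.

import torch

x = torch.ones(4, 8)
print(torch.relu(x).shape)          # torch.Size([4, 8]) -> element-wise, shape preserved
print(torch.sum(x).shape)           # torch.Size([])     -> reduces over all elements
print(torch.mean(x, dim=1).shape)   # torch.Size([4])    -> reduces along dim 1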
colossalai/tensor/colo_tensor.py
Outdated
def replace_tensor_with_colo(func):
func = _COLOSSAL_OPS[func]
return super().__torch_function__(func, types, args, kwargs)
I'm confused about this line. Why not return _COLOSSAL_OPS[func](types, args, kwargs, None) as before?
super().__torch_function__ handles type conversion automatically and disables the torch function override in this context. See https://github.com/pytorch/pytorch/blob/0e351c7df9c41367bca94b824e20f68ea8f165b4/torch/_tensor.py#L1112
As ColoTensor inherits torch.Tensor, we must disable __torch_function__ inside our own implementations of torch functions. This avoids infinite recursion.
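For reference, a minimal sketch of this dispatch pattern (the registry and class names here are illustrative, not the actual Colossal-AI code):

import torch

_CUSTOM_OPS = {}    # hypothetical registry mapping torch functions to custom implementations

class MyColoTensor(torch.Tensor):

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        if func in _CUSTOM_OPS:
            # Swap in the custom (e.g. distributed-aware) implementation.
            func = _CUSTOM_OPS[func]
        # torch.Tensor.__torch_function__ runs func with the override disabled,
        # so calling it here does not recurse back into this method.
        return super().__torch_function__(func, types, args, kwargs)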
I see. You are smart.
from typing import Union, Optional
from colossalai.tensor import ColoTensor

GeneralTensor = Union[ColoTensor, torch.Tensor]
Why do we still need this Union?
I think it's a subset of torch.Tensor now.
Because the input tensor can be a plain torch.Tensor.
Yes, but their attributes and methods are slightly different. Keeping the Union helps auto-completion in the IDE.
I agree with Wesley. ColoTensor is a subclass of Tensor, so a Union of them is unnecessary. As for the auto-completion issue, confusing users for that reason may not be a good idea.
I still think we should distinguish them: a ColoTensor is a torch.Tensor, but a torch.Tensor is not a ColoTensor.
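For context, a sketch of the usage pattern under discussion (the class and helper names are simplified placeholders, not the real Colossal-AI API): op implementations may receive either kind of tensor and upgrade plain ones.

import torch
from typing import Union

class ColoTensorSketch(torch.Tensor):
    """Simplified stand-in for ColoTensor."""

GeneralTensor = Union[ColoTensorSketch, torch.Tensor]

def convert_to_colo_tensor(tensor: GeneralTensor) -> ColoTensorSketch:
    # A ColoTensor passes through unchanged; a plain torch.Tensor is upgraded.
    if isinstance(tensor, ColoTensorSketch):
        return tensor
    return tensor.as_subclass(ColoTensorSketch)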
spec: TensorSpec = TensorSpec(dist_spec.replicate())) -> 'ColoParameter':
    tensor = tensor.as_subclass(ColoParameter)
    tensor.__init__(tensor, requires_grad=requires_grad, spec=spec)
    return tensor
What about using a classmethod with as_subclass(cls) in ColoTensor?
ColoTensor.from_torch_tensor and ColoParameter.from_torch_tensor take different arguments. Although we could make from_torch_tensor a classmethod that takes *args and **kwargs, I override it to keep the signature explicit and friendly for IDE auto-completion.
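A rough sketch of the pattern being weighed here (class names and the spec attribute are simplified placeholders, not the real colossalai classes):

import torch

class MyColoTensor(torch.Tensor):

    @classmethod
    def from_torch_tensor(cls, tensor: torch.Tensor, spec=None) -> 'MyColoTensor':
        # as_subclass(cls) re-tags the tensor's Python class without copying data;
        # since cls is used, subclasses calling this classmethod get instances of themselves.
        colo = tensor.as_subclass(cls)
        colo._spec = spec
        return colo

class MyColoParameter(MyColoTensor):

    @classmethod
    def from_torch_tensor(cls, tensor: torch.Tensor, requires_grad: bool = True,
                          spec=None) -> 'MyColoParameter':
        # Overridden with an explicit signature (rather than *args/**kwargs) so the
        # extra requires_grad argument shows up in IDE auto-completion.
        param = tensor.as_subclass(cls)
        param.requires_grad = requires_grad
        param._spec = spec
        return param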
colossalai/tensor/_ops/addmm.py
Outdated
parallel_action = mat2.spec.get_action_by_compute_pattern(ComputePattern.TP1D)
# mat1:S[1] x mat2:S[0] = Output:P
# beta * input + alpha * All-Reduce(Output) = res

mat1.to_dist_spec(dist_spec.shard(mat2.spec.get_process_group(), [-1], [mat2.spec.get_process_group().size()]))
mat1 = mat1.convert_to_dist_spec(
    dist_spec.shard(mat2.spec.get_process_group(), [-1], [mat2.spec.get_process_group().size()]))
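A self-contained, single-process illustration of the sharding comments above; the cross-rank all-reduce is simulated here by summing per-shard partial products (in the real op this sum happens across ranks, e.g. via torch.distributed.all_reduce):

import torch

world_size = 2
beta, alpha = 1.0, 1.0
inp = torch.randn(4, 6)
mat1 = torch.randn(4, 8)
mat2 = torch.randn(8, 6)

# mat1 is sharded along its last dim (S[1]); mat2 along its first dim (S[0]).
mat1_shards = mat1.chunk(world_size, dim=1)
mat2_shards = mat2.chunk(world_size, dim=0)

# Each "rank" computes a partial output (P); summing the partials is the all-reduce.
partial = [m1 @ m2 for m1, m2 in zip(mat1_shards, mat2_shards)]
out = sum(partial)

res = beta * inp + alpha * out
assert torch.allclose(res, torch.addmm(inp, mat1, mat2, beta=beta, alpha=alpha), atol=1e-5)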
Why not add a new API, get_process_group_size()?
DONE
colossalai/tensor/_ops/addmm.py
Outdated
@@ -1,64 +1,60 @@
import torch
from typing import Union
from colossalai.tensor.op_wrapper import colo_op_impl
from colossalai.nn.layer.parallel_1d._utils import reduce_input, reduce_grad
from colossalai.tensor import ComputePattern, TensorSpec, ComputePattern, ParallelAction, ColoTensor
from colossalai.tensor import dist_spec
I keep confusing the package name with a variable name, because you tend to use short names with underscores for local variables. How about naming it distspec?
PEP 8 describes this clearly:
Package and Module Names
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
DONE
So do you mean module file names should have no underscores? I think there are many cases like that in the colo tensor files.
colossalai/tensor/_ops/embedding.py
Outdated
elif weight.spec.has_compute_pattern(ComputePattern.TP1D):  # Single Model Parallel Applied
    if weight.spec.is_1D_row():
        return colo_embedding_1Drow(input_tensor, weight, args, kwargs)
        return colo_embedding_1Drow(input_tensor,
The following code snippet is duplicated. Typically it is better to use a general API such as colo_tensor_1D(XXXX, type='col'|'row'); a rough sketch is below. But I know you would like to keep the style consistent with the other ops.
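A rough sketch of that suggestion (the dispatcher name is made up, and colo_embedding_1Dcol is assumed to exist alongside colo_embedding_1Drow):

def colo_embedding_1d(mode, input_tensor, weight, args, kwargs):
    # One entry point instead of duplicated row/col branches at every call site.
    assert mode in ('row', 'col')
    if mode == 'row':
        return colo_embedding_1Drow(input_tensor, weight, args, kwargs)
    return colo_embedding_1Dcol(input_tensor, weight, args, kwargs)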
DONE
ModelOutput.__post_init__ = _post_init_colotensor


class GPTLMModel(nn.Module):
Move the model definition to the commons.
The test_tensor folder has some redundant code that can be moved to utils. I will refactor it and move the GPT model to tests.components_to_test in the next PR.
attr = getattr(self._torch_tensor, name)
def convert_to_dist_spec_(self, dist_spec: _DistSpec) -> None:
    with DistSpecManager.no_grad():
I think users cannot understand why we need to call no_grad() here. The common use of no_grad is a forward pass without grad via torch.no_grad(), which is completely different from what you are doing here. Could we hide no_grad() from the user, e.g. just with DistSpecManager()?
It also means "forward without grad": in this context, the autograd Function is not applied.
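A minimal sketch of what such a context manager might look like (DistSpecManagerSketch is a simplified stand-in, not the real colossalai class):

from contextlib import contextmanager

class DistSpecManagerSketch:
    # When False, spec conversion runs the raw communication directly instead of
    # going through a custom autograd.Function, so no grad_fn is recorded.
    _use_autograd_function: bool = True

    @staticmethod
    @contextmanager
    def no_grad():
        old = DistSpecManagerSketch._use_autograd_function
        DistSpecManagerSketch._use_autograd_function = False
        try:
            yield
        finally:
            DistSpecManagerSketch._use_autograd_function = old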
OK, as a developer I understand that you do some communication inside trans_spec and do not want to trigger grad_fn or anything grad-related. But as a user, I would assume handle_trans_spec could involve grad-related ops?
colossalai/tensor/_ops/addmm.py
Outdated
@@ -1,64 +1,67 @@
from sqlalchemy import func
It seems like an auto-completion error...
removed
https://segmentfault.com/a/1190000041428523 for reference
BATCH_SIZE = 4
SEQ_LEN = 1024
VOCAB_SIZE = 50304
NUM_STEPS = 1
Upper case means global variables, not local ones.
I will refactor this file in the next PR.