
[tensor] refactor colo-tensor #992

Merged: 12 commits merged into main on May 19, 2022
Conversation

@ver217 (Member) commented May 17, 2022:

  1. ColoTensor now inherits from torch.Tensor (a minimal sketch of this subclassing pattern follows the list).
  2. The graph test is broken and must be fixed in the next PR.
  3. Lazy init of ColoTensor is not supported anymore.
  4. Removed torch.sum and torch.mean from the element-wise ops; I don't think they are element-wise.
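
A minimal sketch of the subclassing pattern described in item 1, outside the actual ColossalAI code base (the class name and the spec_info attribute below are simplified stand-ins, not the real ColoTensor API):

import torch

class ColoTensorSketch(torch.Tensor):
    """Illustrative torch.Tensor subclass; metadata handling is simplified."""

    @staticmethod
    def __new__(cls, data: torch.Tensor, requires_grad: bool = False):
        # Re-wrap the storage of an existing tensor as the subclass.
        return torch.Tensor._make_subclass(cls, data, requires_grad)

    def __init__(self, data: torch.Tensor, requires_grad: bool = False):
        # Placeholder for the distributed-spec metadata carried by the subclass.
        self.spec_info = None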


def replace_tensor_with_colo(func):
func = _COLOSSAL_OPS[func]
return super().__torch_function__(func, types, args, kwargs)

Contributor:

This line confuses me. Why not return _COLOSSAL_OPS[func](types, args, kwargs, None) as before?

Member Author:

super().__torch_function__ handles the type conversion automatically and disables torch function dispatch in this context. See https://github.com/pytorch/pytorch/blob/0e351c7df9c41367bca94b824e20f68ea8f165b4/torch/_tensor.py#L1112

Member Author:

As ColoTensor inherits from torch.Tensor, we must disable torch function dispatch inside our own implementations of torch functions; otherwise they would dispatch back into __torch_function__ and recurse infinitely.
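
A minimal sketch of this pattern, assuming a simplified dispatch table in place of the real _COLOSSAL_OPS and a stand-in class name:

import torch

_CUSTOM_OPS = {}  # hypothetical: maps a torch function to a custom implementation

class MyTensor(torch.Tensor):

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        if func in _CUSTOM_OPS:
            func = _CUSTOM_OPS[func]
        # The default torch.Tensor.__torch_function__ runs func with torch
        # function dispatch disabled and converts plain Tensor results back
        # to MyTensor, so custom ops never re-enter this hook recursively.
        return super().__torch_function__(func, types, args, kwargs)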

Contributor:

I see. You are smart.

@ver217 marked this pull request as ready for review on May 18, 2022, 08:51
from typing import Union, Optional
from colossalai.tensor import ColoTensor

GeneralTensor = Union[ColoTensor, torch.Tensor]

Contributor:

Why do we still need this Union?

Contributor:

I think it's a subset of torch.Tensor now.

Member Author:

Because the input tensor can be a plain torch.Tensor.

Member Author:

Yes, but their attributes and methods are slightly different. The Union helps with auto-completion when using an IDE.

Contributor:

I agree with Wesley. ColoTensor is a subclass of Tensor, so a Union of the two is unnecessary. As for the auto-completion issue, it may not be a good idea to confuse users for that reason.

Member Author:

I still think we should distinguish them: every ColoTensor is a torch.Tensor, but not every torch.Tensor is a ColoTensor.
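
The subtype relation being pointed at, as a tiny self-contained check (ColoLikeTensor is a hypothetical stand-in for ColoTensor):

import torch

class ColoLikeTensor(torch.Tensor):
    pass

t = torch.empty(2)
c = torch.empty(2).as_subclass(ColoLikeTensor)

print(isinstance(c, torch.Tensor))    # True: every ColoLikeTensor is a torch.Tensor
print(isinstance(t, ColoLikeTensor))  # False: a plain torch.Tensor is not a ColoLikeTensor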

spec: TensorSpec = TensorSpec(dist_spec.replicate())) -> 'ColoParameter':
    tensor = tensor.as_subclass(ColoParameter)
    tensor.__init__(tensor, requires_grad=requires_grad, spec=spec)
    return tensor

Contributor (@Wesley-Jzy) commented May 18, 2022:

What about using a classmethod with as_subclass(cls) in ColoTensor?

Member Author:

ColoTensor.from_torch_tensor and ColoParameter.from_torch_tensor take different arguments. Although we could make from_torch_tensor a classmethod that takes *args and **kwargs, I override it to keep the signature explicit and friendly for IDE auto-completion.
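
A sketch of the wrapping pattern under discussion, with simplified class names and signatures that are not the exact ColossalAI ones:

import torch

class BaseWrapper(torch.Tensor):  # stand-in for ColoTensor
    @staticmethod
    def from_torch_tensor(tensor: torch.Tensor) -> "BaseWrapper":
        # Re-view the same storage as the wrapper class.
        return tensor.as_subclass(BaseWrapper)

class ParamWrapper(BaseWrapper):  # stand-in for ColoParameter
    @staticmethod
    def from_torch_tensor(tensor: torch.Tensor, requires_grad: bool = True) -> "ParamWrapper":
        # Overridden rather than shared via *args/**kwargs, so the extra
        # argument is explicit in the signature and in IDE auto-completion.
        param = tensor.as_subclass(ParamWrapper)
        param.requires_grad_(requires_grad)
        return param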

parallel_action = mat2.spec.get_action_by_compute_pattern(ComputePattern.TP1D)
# mat1:S[1] x mat2:S[0] = Output:P
# beta * input + alpha * All-Reduce(Output) = res

mat1.to_dist_spec(dist_spec.shard(mat2.spec.get_process_group(), [-1], [mat2.spec.get_process_group().size()]))
mat1 = mat1.convert_to_dist_spec(
    dist_spec.shard(mat2.spec.get_process_group(), [-1], [mat2.spec.get_process_group().size()]))

Contributor:

Why not add a new API, get_process_group_size()?

Member Author:

DONE
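
The requested convenience method would presumably look roughly like the following sketch, written against a simplified spec object rather than the real TensorSpec:

class SpecSketch:
    """Simplified stand-in for a tensor spec that owns a process group."""

    def __init__(self, process_group):
        self._process_group = process_group

    def get_process_group(self):
        return self._process_group

    def get_process_group_size(self) -> int:
        # Thin wrapper so call sites can avoid spec.get_process_group().size().
        return self._process_group.size()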

@@ -1,64 +1,60 @@
import torch
from typing import Union
from colossalai.tensor.op_wrapper import colo_op_impl
from colossalai.nn.layer.parallel_1d._utils import reduce_input, reduce_grad
from colossalai.tensor import ComputePattern, TensorSpec, ComputePattern, ParallelAction, ColoTensor
from colossalai.tensor import dist_spec

Contributor:

I keep confusing the package name with a variable name, because short names with underscores usually read as local variables. How about naming it distspec?

As PEP 8 describes it:

Package and Module Names

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

Member Author:

DONE

Contributor:

So do you mean file names should have no underscores? I think there are many cases like this in the colo tensor files.

elif weight.spec.has_compute_pattern(ComputePattern.TP1D): # Single Model Parallel Applied
if weight.spec.is_1D_row():
return colo_embedding_1Drow(input_tensor, weight, args, kwargs)
return colo_embedding_1Drow(input_tensor,

Contributor:

The following code snippet is duplicated. Typically it is better to use a general API such as colo_tensor_1D(XXXX, type='col'|'row').
But I understand you want to keep the style consistent with the other ops.

Member Author:

DONE
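
The reviewer's suggestion amounts to a small dispatch wrapper over the row and column variants; a sketch with hypothetical function names in place of the real colo_embedding_1Drow/1Dcol implementations:

def embedding_1d_row(input_tensor, weight, *args, **kwargs):
    ...  # placeholder for the row-sharded implementation

def embedding_1d_col(input_tensor, weight, *args, **kwargs):
    ...  # placeholder for the column-sharded implementation

def colo_embedding_1d(mode, input_tensor, weight, *args, **kwargs):
    # Single entry point that dispatches on the 1D sharding mode, instead of
    # repeating the row/col branch at every call site.
    impls = {'row': embedding_1d_row, 'col': embedding_1d_col}
    if mode not in impls:
        raise ValueError(f"unknown 1D mode: {mode}")
    return impls[mode](input_tensor, weight, *args, **kwargs)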

ModelOutput.__post_init__ = _post_init_colotensor


class GPTLMModel(nn.Module):

Contributor:

Move the model definition into the commons.

Member Author:

The test_tensor folder has some redundant code that can be moved to utils. I will refactor it and move the GPT model to tests.components_to_test in the next PR.


attr = getattr(self._torch_tensor, name)
def convert_to_dist_spec_(self, dist_spec: _DistSpec) -> None:
    with DistSpecManager.no_grad():

Contributor:

I think users will not understand why we need to call no_grad() here.
The common use of no_grad is a forward pass without gradients via torch.no_grad(), which is quite different from what is happening here.
Could we hide the no_grad() from the user, e.g. just with DistSpecManager()?

Member Author:

It also means "forward without grad": in this context, the autograd Function is not applied.
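
A sketch of the kind of guard being discussed, assuming a simplified manager class rather than the real DistSpecManager:

from contextlib import contextmanager

class SpecManagerSketch:
    """Simplified stand-in for DistSpecManager."""

    _use_autograd = True  # when True, spec conversions go through autograd Functions

    @staticmethod
    @contextmanager
    def no_grad():
        # Temporarily perform spec conversions (and their communication)
        # without recording autograd history, e.g. when re-sharding
        # parameters outside of a training step.
        old = SpecManagerSketch._use_autograd
        SpecManagerSketch._use_autograd = False
        try:
            yield
        finally:
            SpecManagerSketch._use_autograd = old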

Contributor:

OK. As a developer, I know you do some communication inside trans_spec and do not want to trigger grad_fn or anything grad-related.
But as a user, I would assume handle_trans_spec leads to grad-related ops.

@@ -1,64 +1,67 @@
from sqlalchemy import func

Contributor:

This looks like an auto-completion error...

Member Author:

removed

Comment on lines +197 to +200
BATCH_SIZE = 4
SEQ_LEN = 1024
VOCAB_SIZE = 50304
NUM_STEPS = 1

Contributor:

Upper-case names mean global variables, not local ones.

Member Author:

I will refactor this file in the next PR.



@ver217 merged commit ad536e3 into main on May 19, 2022
@ver217 deleted the refactor/colo-tensor branch on May 19, 2022, 09:40