
[zero] add zero optimizer for ColoTensor #1046

Merged · ver217 merged 13 commits into main from feature/colo-tensor-optim on Jun 2, 2022

Conversation

ver217 (Member) commented on May 31, 2022:

No description provided.

ver217 marked this pull request as ready for review on June 1, 2022 07:05.
UNSCALED = 1


class ZeroOptimizer(ColossalaiOptimizer):

Contributor: Should this be put in the zero module?

ver217 (Member Author): OK.
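
For orientation, here is a minimal sketch of the fp16/fp32 master-weight pattern that a ZeRO-style optimizer wrapper like this typically follows. It is an illustration only: the class name `SketchZeroOptimizer`, its attributes, and the omission of chunk management, loss scaling, and overflow handling are simplifications, not the PR's actual implementation (which subclasses ColossalaiOptimizer and works through the module's ChunkManager).

```python
import torch


class SketchZeroOptimizer:
    """Sketch of a mixed-precision optimizer wrapper (not the PR's code)."""

    def __init__(self, optim: torch.optim.Optimizer, module: torch.nn.Module):
        self.optim = optim
        self.module = module
        # Keep an fp32 master copy for every fp16 parameter of the module.
        self.fp16_param_to_fp32_param = {
            p: p.detach().clone().float() for p in module.parameters()
        }
        # Point the inner optimizer at the fp32 masters so updates happen in
        # full precision (assumes the optimizer was built over module.parameters()).
        for group in self.optim.param_groups:
            group['params'] = [self.fp16_param_to_fp32_param[p] for p in group['params']]

    def step(self):
        # Copy fp16 gradients onto the fp32 masters (loss scaling and
        # overflow checks are omitted in this sketch).
        for p, fp32_p in self.fp16_param_to_fp32_param.items():
            if p.grad is not None:
                fp32_p.grad = p.grad.float()
        self.optim.step()
        # Write the updated fp32 values back into the fp16 parameters.
        for p, fp32_p in self.fp16_param_to_fp32_param.items():
            p.data.copy_(fp32_p.data)

    def zero_grad(self):
        self.module.zero_grad(set_to_none=True)
```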

torch_model.train()
set_seed(gpc.get_local_rank(ParallelMode.DATA))
for i, (input_ids, attn_mask) in enumerate(train_dataloader):
    if i > 1:

Contributor: Why only one pass? Will it fail if i > 5?

ver217 (Member Author): The test takes too long if i > 5.

Contributor: OK, but a more robust test would use something like i > 3.
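
A sketch of how the capped loop might look with the larger bound suggested above. It is an assumption-laden illustration, not the test file's actual code: `criterion` and the `optim.backward`/`optim.step` calls stand in for whatever the surrounding test fixture really uses.

```python
# Sketch only: cap iterations to keep the test fast while still exercising
# more than one optimizer step.
torch_model.train()
set_seed(gpc.get_local_rank(ParallelMode.DATA))
for i, (input_ids, attn_mask) in enumerate(train_dataloader):
    if i > 3:  # more robust than `i > 1`, still bounded for CI
        break
    logits = model(input_ids, attn_mask)
    loss = criterion(logits, input_ids)   # hypothetical loss for the sketch
    optim.backward(loss)                  # assumed mixed-precision backward on the ZeRO optimizer
    optim.step()
    optim.zero_grad()
```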

tests/test_tensor/test_zero_optim.py (review comment resolved)
max_scale=max_scale)
self._found_overflow: torch.Tensor = torch.zeros(1, dtype=torch.int64, device=torch.cuda.current_device())
self.dp_process_group = gpc.get_group(ParallelMode.DATA)
self.mp_process_group = gpc.get_group(ParallelMode.MODEL)

Contributor: Why don't we have a ParallelMode.Global?

ver217 (Member Author): @FrankLeeeee shall we add this mode?

Contributor: I don't think it is difficult to implement. It can be added as a patch in the future.

Contributor: We actually already have this mode.
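
To make the point concrete: with only the DATA and MODEL process groups stored, an overflow flag has to be reduced across both groups, whereas a single global group would need one collective. A hedged sketch of that pattern follows; it is an illustration, not the PR's actual overflow logic, and `self._fp16_params` is a hypothetical attribute.

```python
import torch
import torch.distributed as dist


def _check_overflow(self) -> bool:
    # Set the local flag if any gradient contains inf/NaN.
    self._found_overflow.fill_(0)
    for p in self._fp16_params:  # hypothetical list of managed fp16 parameters
        if p.grad is not None and not torch.isfinite(p.grad).all():
            self._found_overflow.fill_(1)
            break
    # Without a global group, the flag must be reduced over both the
    # data-parallel and the model-parallel dimensions.
    dist.all_reduce(self._found_overflow, op=dist.ReduceOp.MAX, group=self.dp_process_group)
    dist.all_reduce(self._found_overflow, op=dist.ReduceOp.MAX, group=self.mp_process_group)
    return self._found_overflow.item() > 0
```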

for p in group['params']:
    if not self.module.chunk_manager.is_chunk_free(p):
        fp32_p = self.fp16_param_to_fp32_param[p]
        self.module.chunk_manager.copy_tensor_to_chunk_slice(p, fp32_p)

Contributor: Copying tensor by tensor is inefficient. You should add a TODO and improve this line later.

ver217 (Member Author): OK.
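
A sketch of what the requested TODO could look like on the lines above. The chunk-level `copy_chunk_` call named in the comment is a hypothetical API, not something the ChunkManager is known to provide.

```python
for p in group['params']:
    if not self.module.chunk_manager.is_chunk_free(p):
        fp32_p = self.fp16_param_to_fp32_param[p]
        # TODO: per-tensor copies are inefficient; consider copying whole
        # chunks at once, e.g. a hypothetical
        #   self.module.chunk_manager.copy_chunk_(fp16_chunk, fp32_chunk)
        self.module.chunk_manager.copy_tensor_to_chunk_slice(p, fp32_p)
```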

@@ -68,6 +68,7 @@ def run_gpt(init_spec_func, use_ddp):
for i, (input_ids, attn_mask) in enumerate(train_dataloader):
    logits = model(input_ids, attn_mask)
    torch_logits = torch_model(input_ids, attn_mask)
    print(torch_logits, logits)

Contributor: Remove this debug print.
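
If the intent is to verify that the two models agree rather than inspect the logits by eye, the print could be replaced with an assertion along these lines (the tolerances are assumptions, not values taken from the test):

```python
# Sketch: compare the ColoTensor model's logits against the torch baseline.
assert torch.allclose(torch_logits, logits, rtol=1e-3, atol=1e-3), \
    f"logits mismatch at step {i}"
```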

ver217 merged commit 51b9a49 into main on Jun 2, 2022.
ver217 deleted the feature/colo-tensor-optim branch on June 2, 2022 04:13.

3 participants