Need more runtime hooks during a training step #250

Closed
kurisusnowdeng opened this issue Feb 23, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@kurisusnowdeng
Member

Describe the feature

In typical PyTorch fashion, we usually train a model like this:

for x, y in dataloader:
    ... # do something before forward
    out = model(x)
    loss = criterion(out, y)
    ... # do something between forward and backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ... # do something after backward

In the Colossal-AI trainer, hooks can only be added before and after a training step, so users cannot customize the behavior between fetching an input batch and the forward pass, or between the forward and backward passes.
Also, since OpHook is applied to modules recursively, it is not a good fit for this either. We may need to add at least the two extra hooks mentioned above.
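
A rough sketch of what these two extra hook points could look like (the StepHooks class and its method names are hypothetical, not existing Colossal-AI APIs; model, criterion, optimizer, and dataloader are the same as in the loop above):

# hypothetical hook points inside one training step
class StepHooks:
    def after_fetch(self, x, y):         # between data fetching and the forward pass
        return x, y
    def after_forward(self, out, loss):  # between the forward and backward passes
        pass

hooks = StepHooks()
for x, y in dataloader:
    x, y = hooks.after_fetch(x, y)       # e.g. split the batch or apply mixup here
    out = model(x)
    loss = criterion(out, y)
    hooks.after_forward(out, loss)       # e.g. inspect activations or scale the loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()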

@FrankLeeeee
Contributor

Data fetching, forward pass and back prop are implemented in the schedule. Thus, I don't think they are trainer hooks. Is there any use case for such hooks?

@kurisusnowdeng
Member Author

Data fetching, forward pass and back prop are implemented in the schedule. Thus, I don't think they are trainer hooks. Is there any use case for such hooks?

Correct, and that is why I didn't call them trainer hooks.
There are cases where this can be helpful, such as splitting the batch for tensor parallelism, applying mixup, etc.
And the main issue is that such customization is possible in plain PyTorch but currently not in Colossal-AI.
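
For instance, mixup is usually applied between fetching a batch and the forward pass; a minimal sketch in plain PyTorch (reusing the names from the loop above, with an illustrative helper):

import torch

def mixup(x, y, alpha=0.2):
    # mix each sample with a randomly permuted sample from the same batch
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

for x, y in dataloader:
    x, y_a, y_b, lam = mixup(x, y)       # between data fetching and the forward pass
    out = model(x)
    loss = lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()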

@FrankLeeeee
Contributor

I do agree that this is not supported by Colossal-AI. However, these use cases are not really related to the schedule, so I am not sure schedule hooks are the right place for them. Splitting the batch can be done in the dataset/dataloader or in the first layer of the model, and applying mixup should be done in the dataset/dataloader.
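
A sketch of doing it at the dataloader level instead, via a custom collate_fn (the helper and the train_dataset name are illustrative, not Colossal-AI APIs; samples are assumed to be (tensor, int_label) pairs):

import torch
from torch.utils.data import DataLoader

def mixup_collate(samples, alpha=0.2):
    # stack the samples into a batch, then mix the batch with a shuffled copy of itself
    x = torch.stack([s[0] for s in samples])
    y = torch.tensor([s[1] for s in samples])
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=mixup_collate)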

@kurisusnowdeng
Member Author

I do agree that this is not supported by Colossal-AI. However, these use cases are not really related to the schedule, so I am not sure schedule hooks are the right place for them. Splitting the batch can be done in the dataset/dataloader or in the first layer of the model, and applying mixup should be done in the dataset/dataloader.

I am also not sure how to implement such hooks. I just opened this issue to collect ideas.

@kurisusnowdeng kurisusnowdeng added the enhancement New feature or request label Feb 23, 2022
@FrankLeeeee
Contributor

I think if we can abstract this part, it will provide some flexibility and extensibility to the schedule class. For example, there is a batch_data_process_func parameter that allows some customization (e.g. applying mixup if a user really wants to).
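
A sketch of that kind of customization, assuming the schedule accepts a user-supplied batch_data_process_func that receives the raw batch from the dataloader and returns the (data, label) pair used for the step (the schedule constructor shown in the comment is only indicative):

import torch

def batch_data_process_func(batch):
    # receives the raw batch and returns the (data, label) pair used for the step;
    # any per-step customization (e.g. mixup, batch splitting) could live here
    x, y = batch
    return x.cuda(), y.cuda()

# the function would then be handed to the schedule, roughly like:
# schedule = NonPipelineSchedule(batch_data_process_func=batch_data_process_func)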

@binmakeswell
Member

We have updated a lot. This issue was closed due to inactivity. Thanks.
