Need more runtime hooks during a training step #250

Closed
kurisusnowdeng opened this issue Feb 23, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@kurisusnowdeng
Member

Describe the feature

In typical PyTorch fashion, we usually train a model like this:

for x, y in dataloader:
    ... # do something before forward
    out = model(x)
    loss = criterion(out, y)
    ... # do something between forward and backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ... # do something after backward

In the Colossal-AI trainer, hooks can only be added before and after a training step, so users cannot customize the behavior between fetching an input batch and the forward pass, or between the forward and backward passes.
Also, since OpHook is applied to modules recursively, it is not a good fit for this either. We may need to add at least the two extra hooks mentioned above.
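
A rough sketch of what these two extra hook points could look like (the StepHooks class and its method names are hypothetical, not existing Colossal-AI APIs; model, criterion, optimizer, and dataloader are the same as in the loop above):

# hypothetical hook points inside one training step
class StepHooks:
    def after_fetch(self, x, y):         # between data fetching and the forward pass
        return x, y
    def after_forward(self, out, loss):  # between the forward and backward passes
        pass

hooks = StepHooks()
for x, y in dataloader:
    x, y = hooks.after_fetch(x, y)       # e.g. split the batch or apply mixup here
    out = model(x)
    loss = criterion(out, y)
    hooks.after_forward(out, loss)       # e.g. inspect activations or scale the loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()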

@FrankLeeeee
Contributor

Data fetching, forward pass and back prop are implemented in the schedule. Thus, I don't think they are trainer hooks. Is there any use case for such hooks?

@kurisusnowdeng
Member Author

Data fetching, forward pass and back prop are implemented in the schedule. Thus, I don't think they are trainer hooks. Is there any use case for such hooks?

Correct, and that is why I didn't call them trainer hooks.
There are cases where this can be helpful, such as splitting the batch for tensor parallelism, applying mixup, etc.
And the main issue is that such customization is possible in plain PyTorch but currently not in Colossal-AI.
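
For instance, mixup is usually applied between fetching a batch and the forward pass; a minimal sketch in plain PyTorch (reusing the names from the loop above, with an illustrative helper):

import torch

def mixup(x, y, alpha=0.2):
    # mix each sample with a randomly permuted sample from the same batch
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

for x, y in dataloader:
    x, y_a, y_b, lam = mixup(x, y)       # between data fetching and the forward pass
    out = model(x)
    loss = lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()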

@FrankLeeeee
Contributor

I do agree that this is not supported by Colossal-AI. However, these use cases are not really related to the schedule, so I am not sure schedule hooks are the right place for them. Splitting the batch can be done in the dataset/dataloader or in the first layer of the model, and applying mixup should be done in the dataset/dataloader.
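
A sketch of doing it at the dataloader level instead, via a custom collate_fn (the helper and the train_dataset name are illustrative, not Colossal-AI APIs; samples are assumed to be (tensor, int_label) pairs):

import torch
from torch.utils.data import DataLoader

def mixup_collate(samples, alpha=0.2):
    # stack the samples into a batch, then mix the batch with a shuffled copy of itself
    x = torch.stack([s[0] for s in samples])
    y = torch.tensor([s[1] for s in samples])
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=mixup_collate)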

@kurisusnowdeng
Member Author

I do agree that this is not supported by Colossal-AI. However, these use cases are not really related to the schedule, so I am not sure schedule hooks are the right place for them. Splitting the batch can be done in the dataset/dataloader or in the first layer of the model, and applying mixup should be done in the dataset/dataloader.

I am also not sure how to implement such hooks. I just opened this issue to collect ideas.

@kurisusnowdeng kurisusnowdeng added the enhancement New feature or request label Feb 23, 2022
@FrankLeeeee
Contributor

I think if we can abstract this part, it will provide some flexibility and extensibility to the schedule class. For example, there is a batch_data_process_func parameter that allows some customization (e.g. applying mixup if a user really wants to).
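
A sketch of that kind of customization, assuming the schedule accepts a user-supplied batch_data_process_func that receives the raw batch from the dataloader and returns the (data, label) pair used for the step (the schedule constructor shown in the comment is only indicative):

import torch

def batch_data_process_func(batch):
    # receives the raw batch and returns the (data, label) pair used for the step;
    # any per-step customization (e.g. mixup, batch splitting) could live here
    x, y = batch
    return x.cuda(), y.cuda()

# the function would then be handed to the schedule, roughly like:
# schedule = NonPipelineSchedule(batch_data_process_func=batch_data_process_func)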

@binmakeswell
Member

We have updated a lot. This issue was closed due to inactivity. Thanks.
