
Conversation

@quic-swatia
Contributor

@quic-swatia quic-swatia commented Jan 21, 2025

  1. Added support for resuming fine-tuning from the checkpoints of a previous run that stopped partway.
  2. Checkpoints, both intermediate and end-of-epoch, are now saved for every epoch.
  3. There is no need to pass tokenizer_name when model_name is passed; it defaults to model_name.
    If a tokenizer_name different from model_name is required, it can still be passed separately as an argument in the command.
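Points 1 and 3 could be sketched roughly as follows. The helper names (`resolve_tokenizer_name`, `latest_checkpoint`) and the checkpoint file layout are hypothetical, illustrating the fallback and resume logic rather than the actual code in this PR:

```python
import os
from typing import Optional


def resolve_tokenizer_name(model_name: str, tokenizer_name: Optional[str] = None) -> str:
    """Point 3: tokenizer_name defaults to model_name unless given explicitly."""
    return tokenizer_name if tokenizer_name is not None else model_name


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Point 1: pick the most recently written checkpoint file (if any) to resume from.

    Assumes checkpoints are .pt files in output_dir; returns None on a fresh run.
    """
    if not os.path.isdir(output_dir):
        return None
    ckpts = [f for f in os.listdir(output_dir) if f.endswith(".pt")]
    if not ckpts:
        return None
    return max(ckpts, key=lambda f: os.path.getmtime(os.path.join(output_dir, f)))
```

With this shape, a fresh run (no checkpoint found) starts from scratch, while a rerun with the same output_dir resumes from the newest checkpoint.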

… prev run which would have stopped in between. There's no necessity to pass tokenizer_name if a model_name is passed. It will take the same name as model_name by default. If a different tokenizer_name is required than the model_name, then it can be passed separately as an argument.

Signed-off-by: Swati Allabadi <quic_sallabad@quicinc.com>
@quic-swatia quic-swatia marked this pull request as ready for review January 28, 2025 07:44
…ers into finetune

Signed-off-by: Swati Allabadi <quic_sallabad@quicinc.com>
Signed-off-by: Swati Allabadi <quic_sallabad@quicinc.com>
…ts and check for loss convergence.

Signed-off-by: Swati Allabadi <quic_sallabad@quicinc.com>
@quic-mamta
Contributor

If we don't change output_dir, then after resuming fine-tuning, will the new TensorBoard data be appended to the previous TensorBoard log files?

@quic-swatia
Contributor Author

quic-swatia commented Feb 24, 2025

Irrespective of the value of output_dir, the TensorBoard files are saved inside a directory named runs. A new subdirectory is created inside runs for each fine-tuning job, so if we run `tensorboard --logdir runs --bind_all`, the TensorBoard data from both jobs will show up together in a single plot.
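This matches the default behaviour of torch's SummaryWriter, which, when no log_dir is given, writes each run to a new timestamped subdirectory under runs. A small pure-Python mimic of that layout (the helper `new_run_dir` is hypothetical, not code from this PR):

```python
import datetime
import os
import socket


def new_run_dir(root: str = "runs") -> str:
    """Mimic SummaryWriter's default log_dir: runs/<MonDD_HH-MM-SS>_<hostname>.

    Each fine-tuning job gets its own fresh subdirectory, so pointing
    `tensorboard --logdir runs` at the root shows every job side by side
    instead of appending new events to an old run's files.
    """
    stamp = datetime.datetime.now().strftime("%b%d_%H-%M-%S")
    path = os.path.join(root, f"{stamp}_{socket.gethostname()}")
    os.makedirs(path, exist_ok=True)
    return path
```

Because every job lands in its own subdirectory, resuming never appends to the previous job's event files; the two runs simply appear as separate curves in the same plot.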

@quic-swatia quic-swatia merged commit f3d87ad into quic:main Mar 18, 2025
4 checks passed
quic-swatia pushed a commit that referenced this pull request Mar 20, 2025
…re computed (#233)

1) Adding the support to resume the fine tuning using checkpoints from a
prev run which would have stopped in between.
2) Checkpoints, both intermediate and for complete epoch, will get saved
for each epoch through these changes.
3) There's no necessity to pass tokenizer_name if a model_name is
passed. It will take the same name as model_name by default.
If a different tokenizer_name is required than the model_name, then it
can be passed separately as an argument in the command.

---------

Signed-off-by: Swati Allabadi <quic_swatia@quicinc.com>
Co-authored-by: Swati Allabadi <quic-swatia@quicinc.com>
quic-rishinr pushed a commit to quic-rishinr/efficient-transformers that referenced this pull request Mar 21, 2025
qcdipankar pushed a commit to qcdipankar/efficient-transformers that referenced this pull request Apr 1, 2025
eplatero97 pushed a commit to eplatero97/efficient-transformers that referenced this pull request Apr 29, 2025
eplatero97 pushed a commit to eplatero97/efficient-transformers that referenced this pull request Apr 29, 2025
eplatero97 pushed a commit to eplatero97/efficient-transformers that referenced this pull request Apr 29, 2025
eplatero97 pushed a commit to eplatero97/efficient-transformers that referenced this pull request Apr 29, 2025
eplatero97 pushed a commit to eplatero97/efficient-transformers that referenced this pull request Apr 29, 2025
eplatero97 pushed a commit to eplatero97/efficient-transformers that referenced this pull request Apr 29, 2025

4 participants