Runtime estimator #1991

mvpatel2000 · 2023-02-22T22:21:32Z

What does this PR do?

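Adds a callback that estimates the remaining runtime of a training run. The estimate uses a simple timer interpolation plus a correction for time spent in eval. The graph shows a "normal" run and a run with eval where the correction is applied.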
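To make the idea concrete, here is a minimal sketch of a timer interpolation with an eval correction. It is an illustration only, not the code in this PR: the function name and every parameter are hypothetical, and it assumes training progress is available as a fraction in (0, 1] and that each outstanding eval pass costs about as much as the average eval so far.

```python
from typing import Optional


def estimate_remaining_seconds(
    elapsed_train_seconds: float,
    fraction_complete: float,
    eval_seconds_so_far: float = 0.0,
    evals_done: int = 0,
    evals_total: int = 0,
) -> Optional[float]:
    """Estimate remaining wall-clock seconds by linear extrapolation.

    All names here are hypothetical; this is not the callback's API.
    """
    # Without any progress there is no rate to extrapolate from.
    if not 0.0 < fraction_complete <= 1.0:
        return None

    # Training: assume the rest of the run proceeds at the average rate so far.
    projected_total_train = elapsed_train_seconds / fraction_complete
    remaining_train = projected_total_train * (1.0 - fraction_complete)

    # Eval correction: charge each outstanding eval pass the average cost so far.
    remaining_eval = 0.0
    if evals_done > 0:
        avg_eval_seconds = eval_seconds_so_far / evals_done
        remaining_eval = avg_eval_seconds * max(evals_total - evals_done, 0)

    return remaining_train + remaining_eval
```

Under these assumptions, 100 seconds of training at 25% progress extrapolates to 300 seconds remaining, and 6 outstanding eval passes averaging 10 seconds each add another 60. A slow first epoch inflates the average rate, which is one way an estimate like this can be too pessimistic early in a run.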

Note that the estimate is not always smooth. For example, ResNet estimates are poor because the first epoch is much slower than later ones due to dataloader issues.

We will probably hit more issues like this later on, so the estimator will need further corrections.

Important Questions:

Is there a different place to log this? wall_clock/ is intuitive, but everything else there is in seconds, so we would need to report in seconds (whereas we would probably prefer to report hours).

What issue(s) does this change relate to?

CO-1664

mvpatel2000 · 2023-02-23T03:37:59Z

Note: The weird spikes in the wandb graph appear because wandb associates the last logged time with a given step, which distorts the graph when the x-axis is time. I spent a while investigating and am happy to discuss offline, since it's a little too complicated to write up. As a side effect, @eracah and I discovered that the point where the timestamp is advanced, relative to when metrics are calculated, is inconsistent between batches and epochs. Along for the ride, I reordered the epoch one to be consistent.

Before and after for metrics not changing

Glabred

say sorry to me

Glabred

jk

Let's discuss offline... we should get you through a starter task and maybe a training project before you're ready

dakinggg

LGTM with some comments

composer/trainer/trainer.py

composer/callbacks/runtime_estimator.py

dakinggg · 2023-02-24T08:31:37Z

Also, I'm willing to merge this with just manual tests, but please do add some unit tests in a follow-on PR.

tests/common/models.py

bcui19

LGTM, after addressing comments. +1 to adding tests in a separate PR

mvpatel2000 · 2023-02-24T18:46:35Z

Added a test

mvpatel2000 added 14 commits February 7, 2023 19:39

checkdown

797c50b

Merge branch 'dev' into mvpatel2000/bottom-up-runtime-estimator

4a1b452

checkdown

10ad629

add runtime estimator

cb35e15

exprot

959dbeb

add prints

f8aeaa1

tweak logs

a884447

fit start

3a958a1

fix start time

eb456e4

update

db58030

update guards

6b80706

add eval adjustment

947130a

update comments

2a04051

revert speed monitor changes

93aa9a7

mvpatel2000 requested review from dakinggg and eracah February 22, 2023 22:24

mvpatel2000 added 7 commits February 22, 2023 14:25

Merge branch 'dev' into mvpatel2000/bottom-up-runtime-estimator

40ffdc2

add logs

e4f6d1a

Merge branch 'mvpatel2000/bottom-up-runtime-estimator' of github.com:…

69b5515

…mvpatel2000/composer into mvpatel2000/bottom-up-runtime-estimator

add more logs

4c6eaee

move timestamp advance

3bd132d

revert

b9321ab

Merge branch 'dev' into mvpatel2000/bottom-up-runtime-estimator

b86e6f9

mvpatel2000 added 6 commits February 23, 2023 12:02

simplify ghost batchnorm

eaca75c

scale down image sizes

ce6abad

Merge branch 'dev' into mvpatel2000/bottom-up-runtime-estimator

e7e839d

tweak tests

a0c6d4b

add norms

a9e1187

add norms

dfdd866

mvpatel2000 mentioned this pull request Feb 24, 2023

Low precision groupnorm #1976

Merged

mvpatel2000 added 3 commits February 23, 2023 16:53

reset

4f682a9

make none

2c2645e

fix change

e4d92cb

Glabred previously requested changes Feb 24, 2023

eracah requested review from Glabred and removed request for Glabred February 24, 2023 02:28

Glabred self-requested a review February 24, 2023 02:42

Glabred reviewed Feb 24, 2023

mvpatel2000 added 3 commits February 23, 2023 20:40

add warning

b85d2f1

update filter

d0e07df

Merge branch 'dev' into mvpatel2000/bottom-up-runtime-estimator

2369238

mvpatel2000 requested a review from bandish-shah February 24, 2023 05:22

mvpatel2000 requested a review from bcui19 February 24, 2023 05:53

dakinggg approved these changes Feb 24, 2023

bcui19 reviewed Feb 24, 2023

tests/common/models.py

bcui19 approved these changes Feb 24, 2023

mvpatel2000 added 2 commits February 24, 2023 10:16

respond to comments

5715e53

update ignore warnings

1b952a9

mvpatel2000 requested a review from a team as a code owner February 24, 2023 18:46

hanlint approved these changes Feb 24, 2023

mvpatel2000 merged commit 850e34d into mosaicml:dev Feb 24, 2023

mvpatel2000 deleted the mvpatel2000/bottom-up-runtime-estimator branch February 24, 2023 19:00
