`print` fix in `lr_scheduler` #68338

OverLordGoldDragon · 2021-11-15T13:11:08Z

`{:5d}` fails for `CosineAnnealingWarmRestarts`, which passes a float epoch.
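A minimal illustration of the failure, using only standard Python string formatting. The helper name `format_epoch_as_int` and the value `1.5` are made up for this sketch and stand in for the scheduler's print call and a fractional epoch.

```python
# The 'd' format code accepts integers only, so a float epoch raises ValueError.
def format_epoch_as_int(epoch):
    return 'Epoch {:5d}'.format(epoch)

# Integer epochs format fine.
print(format_epoch_as_int(3))

# A fractional epoch, as passed by CosineAnnealingWarmRestarts, does not.
try:
    format_epoch_as_int(1.5)
except ValueError as err:
    print('float epoch failed:', err)
```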

pytorch-probot · 2021-11-15T13:11:11Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/OverLordGoldDragon/pytorch/blob/9b0b5028b3ee091e7c671f0ea640b58bc854a248/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/xla`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`	✅ triggered
linux-vulkan-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`	✅ triggered
linux-xenial-py3.6-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`	✅ triggered
linux-xenial-py3.6-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`	✅ triggered
linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3.6-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3.6-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/win`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`	🚫 skipped
docker-builds	`ciflow/all`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-arm64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-x86-64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`	🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot · 2021-11-15T13:11:13Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/68338
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 9b0b502 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

mrshenli

cc @albanD should we fix CosineAnnealingWarmRestarts to use integer epoch instead?

pytorch/torch/optim/lr_scheduler.py

Line 1300 in 94b6fa6

self.last_epoch = math.floor(epoch)

OverLordGoldDragon · 2021-11-16T12:44:31Z

@mrshenli The float is used in the compute logic, and it wouldn't make sense to print the same epoch with a different lr across iterations. Also, `epoch` rather than `self.last_epoch` is passed to `print_lr`.

Maybe override `print_lr`, or condition `:5d` vs `:5f` on `epoch.is_integer()`.
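The second option could be sketched roughly as below. This is only an illustration of the idea, not necessarily what the PR ended up doing: the helper name `format_epoch` and the `{:5.2f}` precision are assumptions chosen for the example. The `isinstance` check is there because `int.is_integer()` only exists on Python 3.12 and later, while `float.is_integer()` is available everywhere.

```python
def format_epoch(epoch):
    """Format an epoch for a log line (hypothetical helper, not PyTorch API)."""
    # Plain ints and whole-valued floats keep the original integer layout.
    if isinstance(epoch, int) or float(epoch).is_integer():
        return '{:5d}'.format(int(epoch))
    # Fractional epochs get a short fixed-point layout (precision is an assumption).
    return '{:5.2f}'.format(epoch)


for epoch in (2, 2.0, 2.25):
    print('Epoch {}: adjusting learning rate'.format(format_epoch(epoch)))
```

Keeping the integer branch unchanged means schedulers that only ever pass whole epochs print exactly what they printed before.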

albanD

> Float is used in compute logic

From checking the code, I don't see where the value of epoch actually needs to be a float. It seems that we only ever increment it by 1, no?

albanD · 2021-11-18T14:46:47Z

torch/optim/lr_scheduler.py

@@ -110,7 +110,7 @@ def print_lr(self, is_verbose, group, lr, epoch=None):
                print('Adjusting learning rate'
                      ' of group {} to {:.4e}.'.format(group, lr))
            else:
-                print('Epoch {:5d}: adjusting learning rate'
+                print('Epoch {:5f}: adjusting learning rate'


This would lead to many digits after the decimal point in all cases. I think the printed output would look odd and a bit unexpected.
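To make the concern concrete, here is what `{:5f}` produces in plain Python. The helper name `format_epoch_as_float` is made up for this sketch. The `f` code defaults to six digits after the decimal point, and the width of 5 is a minimum, so it does not shorten the result.

```python
def format_epoch_as_float(epoch):
    return 'Epoch {:5f}'.format(epoch)

# Integer epochs are accepted by 'f' but gain six trailing decimals.
print(format_epoch_as_float(3))    # Epoch 3.000000
print(format_epoch_as_float(1.5))  # Epoch 1.500000
```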

OverLordGoldDragon · 2021-11-19T07:17:40Z

Cosine LR can restart every epoch, so intermediate values need a fractional epoch (0 < epoch % 1 < 1); from the example:

scheduler.step(epoch + i / iters)

Updated code.
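A rough sketch of why the fractional part matters, written in plain Python rather than with the real scheduler. The function `cosine_warm_restart_lr` is hypothetical and simplified: it assumes a fixed restart period `t_0` (no period growth) and uses the standard cosine annealing formula, `eta_min + (base_lr - eta_min) * (1 + cos(pi * t_cur / t_0)) / 2`.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr, eta_min=0.0, t_0=1):
    """Cosine-annealed lr with a fixed restart period t_0 (simplified sketch)."""
    t_cur = epoch % t_0  # position inside the current restart cycle
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0)) / 2


iters = 4  # batches per epoch (assumed value for the example)
for epoch in range(2):
    for i in range(iters):
        frac_epoch = epoch + i / iters
        lr = cosine_warm_restart_lr(frac_epoch, base_lr=0.1)
        # Flooring the epoch would label all four distinct lr values identically.
        print('epoch {:.2f} (floor {}): lr {:.4e}'.format(
            frac_epoch, math.floor(frac_epoch), lr))
```

With a restart every epoch, the lr changes on every batch, so a log line that shows only the floored epoch would repeat the same epoch number next to different learning rates.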

albanD · 2021-11-22T14:41:30Z

torch/optim/lr_scheduler.py

@@ -929,7 +931,7 @@ def _reduce_lr(self, epoch):
            if old_lr - new_lr > self.eps:
                param_group['lr'] = new_lr
                if self.verbose:
-                    print('Epoch {:5d}: reducing learning rate'
+                    print('Epoch {:5f}: reducing learning rate'


Thanks for the fix! This one needs to be updated the same way, right?

albanD · 2021-12-13T15:14:24Z

Looks good! Can you rebase on top of master to make sure the CI runs please?

OverLordGoldDragon · 2021-12-13T15:19:42Z

I'm not familiar with quick rebasing, can reopen PR if needed.

albanD · 2021-12-13T16:02:52Z

Ok, let me do it then.

albanD · 2021-12-13T16:07:19Z

btw this is what I did:

# Get latest master
git checkout master
git pull
# Go back to your branch
git checkout OverLordGoldDragon:patch-1
# Rebase on top of master
git rebase master
# Force push the change to this PR
git push -f

OverLordGoldDragon · 2021-12-13T16:39:35Z

It's what I had in mind, but the repos I've tried it on gave me a lot of trouble; I guess PyTorch is much better structured. Thanks.

facebook-github-bot · 2021-12-13T17:41:55Z

@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

albanD · 2021-12-13T17:42:00Z

No worries. I'll land this now.

OverLordGoldDragon · 2021-12-14T17:32:25Z

To be clear, are contributions still credited?

OverLordGoldDragon requested a review from albanD as a code owner November 15, 2021 13:11

pytorch-probot bot added the ciflow/default label Nov 15, 2021

facebook-github-bot added the cla signed label Nov 15, 2021

pytorchbot added the open source label Nov 15, 2021

mrshenli reviewed Nov 15, 2021

albanD reviewed Nov 18, 2021

ngimel added the triaged label Nov 18, 2021

albanD reviewed Nov 22, 2021

albanD mentioned this pull request Dec 13, 2021

CosineAnnealingWarmRestarts should use integer epoch #69841

Open

OverLordGoldDragon and others added 3 commits December 13, 2021 11:06

print fix in lr_scheduler

80d8757

Handle int and float separately

aad2640

apply to ReduceLROnPlateau

9b0b502

albanD force-pushed the patch-1 branch from 28dd7a2 to 9b0b502 Compare December 13, 2021 16:07

albanD approved these changes Dec 13, 2021

facebook-github-bot closed this in fdcb78d Dec 14, 2021
