
[Fix] Enable logging of metrics from Callbacks to ConsoleLogging #1884

Merged: 12 commits merged into mosaicml:dev on Jan 19, 2023

Conversation

@eracah (Contributor) commented Jan 13, 2023

What does this PR do?

Enables metrics logged from Callbacks (e.g., the memory and throughput monitors) to be passed through to the console logger, so they show up in console output.

Sample output from a run of llm/main.py in the examples repo:

[batch=1/3]:
	 Train epoch: 0
	 Train trainer/global_step: 0
	 Train trainer/batch_idx: 0
	 Train memory/alloc_requests: 38873
	 Train memory/free_requests: 38760
	 Train memory/allocated_mem: 2666514032128
	 Train memory/active_mem: 15283491328
	 Train memory/inactive_mem: 2903010816
	 Train memory/reserved_mem: 31037849600
	 Train memory/alloc_retries: 5
	 Train trainer/device_train_microbatch_size: 1
	 Train loss/train/total: 12.0570
	 Train metrics/train/LanguageCrossEntropy: 12.0570
	 Train metrics/train/Perplexity: 172309.0625
	 Train throughput/batches_per_sec: 0.0503
	 Train throughput/samples_per_sec: 3.2217
	 Train throughput/device/batches_per_sec: 0.0031
	 Train throughput/device/samples_per_sec: 0.2014
	 Train throughput/tokens_per_sec: 6598.0451
	 Train throughput/device/tokens_per_sec: 412.3778
	 Train throughput/flops_per_sec: 1242426587592322.2500
	 Train throughput/device/flops_per_sec: 77651661724520.1406
	 Train throughput/device/mfu: 0.2489
	 Train wall_clock/train: 19.8653
	 Train wall_clock/val: 0.0000
	 Train wall_clock/total: 19.8653
	 Train lr-DecoupledAdamW/group0: 0.0000
/usr/lib/python3/dist-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[batch=1/3]:
	 Eval LanguageCrossEntropy: 12.0946
	 Eval Perplexity: 179090.5625
[batch=2/3]:
	 Train epoch: 0
	 Train trainer/global_step: 1
	 Train metrics/eval/LanguageCrossEntropy: 12.0946
	 Train metrics/eval/Perplexity: 179090.5625
	 Train trainer/batch_idx: 1
	 Train memory/alloc_requests: 67172
	 Train memory/free_requests: 66933
	 Train memory/allocated_mem: 5004554130432
	 Train memory/active_mem: 30329803776
	 Train memory/inactive_mem: 1221848064
	 Train memory/reserved_mem: 33632026624
	 Train memory/alloc_retries: 186
	 Train trainer/device_train_microbatch_size: 1
	 Train loss/train/total: 12.0565
	 Train metrics/train/LanguageCrossEntropy: 12.0565
	 Train metrics/train/Perplexity: 172214.4375
	 Train throughput/batches_per_sec: 0.0223
	 Train throughput/samples_per_sec: 1.4281
	 Train throughput/device/batches_per_sec: 0.0014
	 Train throughput/device/samples_per_sec: 0.0893
	 Train throughput/tokens_per_sec: 2924.7470
	 Train throughput/device/tokens_per_sec: 182.7967
	 Train throughput/flops_per_sec: 550736380511142.4375
	 Train throughput/device/flops_per_sec: 34421023781946.4023
	 Train throughput/device/mfu: 0.1103
	 Train wall_clock/train: 64.6801
	 Train wall_clock/val: 15.8078
	 Train wall_clock/total: 80.4879
	 Train lr-DecoupledAdamW/group0: 0.0000
[batch=2/3]:
	 Eval LanguageCrossEntropy: 12.0946
	 Eval Perplexity: 179090.5625
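
For reference, output along these lines can be produced by a setup like the sketch below. This is a minimal sketch, not the actual llm/main.py script (the PR's sample came from a GPT run in mosaicml/examples): TinyModel and its toy data are stand-ins, and it assumes the composer APIs as of this PR (Trainer, ConsoleLogger, SpeedMonitor, MemoryMonitor).

    # Minimal sketch (not the actual llm/main.py): run a short training job with
    # the ConsoleLogger plus monitor callbacks so their metrics print per batch.
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    from composer import Trainer
    from composer.callbacks import MemoryMonitor, SpeedMonitor
    from composer.loggers import ConsoleLogger
    from composer.models import ComposerModel


    class TinyModel(ComposerModel):
        """Stand-in model; the sample output above came from a GPT run."""

        def __init__(self):
            super().__init__()
            self.net = torch.nn.Linear(4, 2)

        def forward(self, batch):
            inputs, _ = batch
            return self.net(inputs)

        def loss(self, outputs, batch):
            _, targets = batch
            return torch.nn.functional.cross_entropy(outputs, targets)


    dataset = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))

    trainer = Trainer(
        model=TinyModel(),
        train_dataloader=DataLoader(dataset, batch_size=4),
        max_duration='3ba',
        loggers=[ConsoleLogger(log_interval='1ba')],  # print logged metrics every batch
        callbacks=[SpeedMonitor(), MemoryMonitor()],  # emit throughput/* and memory/* metrics
    )
    trainer.fit()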

What issue(s) does this change relate to?

fix CO-1636
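
Concretely, the fix means metrics that a Callback logs through the Logger now reach the console logger as well. Below is a minimal sketch of such a callback, assuming composer's Callback/Logger APIs as of this PR; BatchCounter and the my_metric/batches_seen key are hypothetical illustrations, not code from this PR.

    # Hypothetical callback sketch: metrics logged here now also appear in the
    # console output (e.g., as "Train my_metric/batches_seen: ..." lines).
    from composer.core import Callback, State
    from composer.loggers import Logger


    class BatchCounter(Callback):
        """Toy callback that logs a running batch count after every batch."""

        def __init__(self) -> None:
            self.batches_seen = 0

        def batch_end(self, state: State, logger: Logger) -> None:
            self.batches_seen += 1
            # Before this fix, metrics logged from a callback were not forwarded
            # to the console logger; after it, they print alongside trainer metrics.
            logger.log_metrics({'my_metric/batches_seen': self.batches_seen})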

@eracah eracah marked this pull request as ready for review January 19, 2023 00:01
@eracah eracah requested a review from dakinggg as a code owner January 19, 2023 00:01
@mvpatel2000 (Contributor) commented:

Discussed offline: we're going to remove the trainer arg and the corresponding arg to the ConsoleLogger.

@mvpatel2000 (Contributor) left a review:

Defer to @abhi-mosaic for approval to verify this solves his issue, but LGTM.

Review comment on composer/trainer/trainer.py (outdated, resolved)
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
@dakinggg (Contributor) left a review:

LGTM with one comment.

@eracah (Contributor, Author) commented Jan 19, 2023:

> Defer to @abhi-mosaic for approval to verify this solves his issue, but LGTM.

I sent Abhi a console log output from a mosaicml/examples GPT run, and he confirmed over DM that it looks good.

@eracah eracah merged commit d8325d5 into mosaicml:dev Jan 19, 2023