Log Epochs in Console and Loggers #680

Merged
yutong-xiang-97 merged 2 commits into main from yutong-trn-1808-log-epoch
Apr 2, 2026

Conversation

@yutong-xiang-97
Contributor

What has changed and why?

As the title suggests: the current epoch is now included in the console step logs and in the values sent to the loggers.

How has it been tested?

Manually:

[2026-04-01 13:02:26,052][INFO] Train Step 200/200 | Epoch 1 | train_loss: 28.8206 | lr: 0.00005457 | Profiling [ GPU Util 52.3% | GPU Max Mem  9760.3 MB | Step Time 0.27s | Data Time 0.03s |  121 img/s ]
[2026-04-01 13:02:26,149][INFO] Saving the last checkpoint to '/home/yutong/benchmark_logs/debug/1808_log_epoch/checkpoints/last.ckpt'
[2026-04-01 13:02:26,602][INFO] Exporting the last model to '/home/yutong/benchmark_logs/debug/1808_log_epoch/exported_models/exported_last.pt'
[2026-04-01 13:02:26,694][INFO] Validating...
[2026-04-01 13:02:28,281][INFO] Val Step   1/157 | Epoch 1 | Profiling [ GPU Util 30.0% | GPU Max Mem   874.0 MB | Step Time 1.58s | Data Time 0.79s |   20 img/s ]
Screenshot 2026-04-01 at 15 10 32

Did you update CHANGELOG.md?

  • Yes
  • Not needed (internal change)

Did you update the documentation?

  • Yes
  • Not needed (internal change without effects for user)

Copilot AI review requested due to automatic review settings April 1, 2026 13:12
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@yutong-xiang-97
Contributor Author

/review

Contributor

Copilot AI left a comment

Pull request overview

Adds 1-based epoch information to both console step logs and structured logger outputs during training/validation, derived from the training step count, train dataloader length, and gradient accumulation.

Changes:

  • Compute a current_epoch during the training loop and pass it into console logging for train/val steps.
  • Add get_training_epoch(...) helper to derive a 1-based epoch from step and batch counts.
  • Include epoch as a logged scalar in log_fabric(...) and document the addition in CHANGELOG.md.
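
For illustration, the epoch derivation described in the list above could look roughly like the sketch below. The PR diff is not shown here, so the signature, parameter names, 0-based step convention, and the floor-division handling of gradient accumulation are all assumptions rather than the merged implementation.

```python
def get_training_epoch(
    step: int,
    batches_per_epoch: int,
    gradient_accumulation_steps: int = 1,
) -> int:
    """Return a 1-based epoch for a 0-based optimizer step.

    Hypothetical sketch: the parameter names and the rounding rule are
    assumptions, not taken from the merged code.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if batches_per_epoch <= 0 or gradient_accumulation_steps <= 0:
        raise ValueError("batch and accumulation counts must be positive")
    # One optimizer step consumes `gradient_accumulation_steps` batches.
    # Assumption: an incomplete accumulation group at the end of an epoch
    # is dropped, and an epoch always has at least one step.
    steps_per_epoch = max(1, batches_per_epoch // gradient_accumulation_steps)
    # Integer division gives a 0-based epoch; add 1 for display.
    return step // steps_per_epoch + 1
```

Under these assumptions, a run with 100 batches per epoch and 2 accumulation steps has 50 optimizer steps per epoch, so steps 0 through 49 report epoch 1 and step 50 reports epoch 2.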

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/lightly_train/_commands/train_task.py | Computes current_epoch each step and plumbs it into console + fabric logging for train/val. |
| src/lightly_train/_commands/train_task_helpers.py | Adds epoch derivation helper and extends log_step / log_fabric to emit epoch. |
| CHANGELOG.md | Notes the new 1-based epoch logging behavior. |
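
As a rough sketch of how the epoch could be placed in the console line, the helper below reproduces the `Step x/N | Epoch e | ...` prefix seen in the manual test output. `format_step_line` is a hypothetical name used only for illustration; the actual `log_step` signature is not shown in this PR page, and the profiling suffix is omitted.

```python
from typing import Mapping, Optional


def format_step_line(
    phase: str,
    step: int,
    total_steps: int,
    epoch: int,
    metrics: Optional[Mapping[str, float]] = None,
) -> str:
    """Build a console line such as 'Val Step   1/157 | Epoch 1'.

    Hypothetical helper: it only mirrors the layout of the log output
    quoted in the PR description.
    """
    # Right-align the step to the width of the total, as in the sample logs.
    width = len(str(total_steps))
    parts = [f"{phase} Step {step:>{width}}/{total_steps}", f"Epoch {epoch}"]
    # Assumption: metrics are printed with four decimals; the sample log
    # prints lr with more digits, which this sketch does not reproduce.
    for name, value in (metrics or {}).items():
        parts.append(f"{name}: {value:.4f}")
    return " | ".join(parts)
```

Placing the epoch directly after the step counter keeps the two progress indicators together, ahead of the metric values.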

Comment thread src/lightly_train/_commands/train_task_helpers.py
Comment thread src/lightly_train/_commands/train_task.py
@yutong-xiang-97 yutong-xiang-97 merged commit 9697d61 into main Apr 2, 2026
20 checks passed
@yutong-xiang-97 yutong-xiang-97 deleted the yutong-trn-1808-log-epoch branch April 2, 2026 08:23
3 participants