Conversation
Walkthrough

This pull request updates a third-party subproject commit and reintroduces the `grad_norm` and `skipped_step` metrics in `flame/train.py` via torchtitan's new logging API.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Trainer as flame/train.py
    participant Logger as flame/components/metrics.py
    Trainer->>Logger: log(step, avg_loss, max_loss, {gradient_norm: value})
    Logger-->>Trainer: Acknowledge logging with extra metrics
```
Actionable comments posted: 0
🧹 Nitpick comments (1)
3rdparty/torchtitan (1)
1-1289: Address Pipeline Lint Issues in Third-party Code

The pipeline log reports multiple lint issues in this file (e.g., E501 for lines that are too long, several E203 whitespace problems, as well as F841/F401 warnings for unused variables and imports). Since these issues are within a third-party subproject, you might consider one of the following:
- Exclude the 3rdparty directory from linting checks if upstream compliance isn’t feasible.
- Coordinate with the upstream repository to resolve these style issues if they impact integration.
Please verify that the lint configuration aligns with your policy regarding third-party code.
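One way to realize the first option is an exclusion entry in the linter configuration. The snippet below is a sketch, assuming flake8 is the linter (the reported codes E203/E501/F401/F841 and the 127-character limit suggest so) and an ini-style config file; the actual file name and lint setup are not shown in this PR:

```ini
# .flake8 (illustrative; the project's real config file may differ)
[flake8]
max-line-length = 127
# Option 1: skip vendored third-party code entirely
exclude = 3rdparty
# Option 2 (alternative): keep linting but silence specific codes there
per-file-ignores =
    3rdparty/*: E203, E501, F401, F841
```

Note that E203 in particular conflicts with Black-style formatting, so many projects suppress it globally rather than per-directory.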
🧰 Tools
🪛 GitHub Actions: lint
[error] 65-65: E501 line too long (131 > 127 characters)
[error] 132-132: E203 whitespace before ':'
[error] 146-146: F821 undefined name 'n_microbatches'
[error] 71-71: E203 whitespace before ':'
[error] 39-39: E741 ambiguous variable name 'l'
[error] 59-59: E731 do not assign a lambda expression, use a def
[error] 92-92: E203 whitespace before ':'
[error] 285-285: E203 whitespace before ':'
[error] 948-948: E203 whitespace before ':'
[error] 952-952: E203 whitespace before ':'
[error] 1268-1268: E203 whitespace before ':'
[error] 1289-1289: E203 whitespace before ':'
[error] 205-205: E203 whitespace before ':'
[error] 158-158: E203 whitespace before ':'
[error] 82-82: F841 local variable 'token_indices' is assigned to but never used
[error] 262-262: E203 whitespace before ':'
[error] 306-306: E501 line too long (131 > 127 characters)
[error] 388-388: F841 local variable 'mean_prob' is assigned to but never used
[error] 11-11: F401 'functools' imported but unused
[error] 15-15: F401 'typing.Any' imported but unused
[error] 15-15: F401 'typing.Dict' imported but unused
[error] 15-15: F401 'typing.Optional' imported but unused
[error] 20-20: F401 'triton.Config as TConfig' imported but unused
[error] 21-21: F401 'triton.runtime.driver' imported but unused
[error] 25-25: F401 'tma_autotuning.ALIGN_SIZE_M' imported but unused
[error] 25-25: E402 module level import not at top of file
[error] 316-316: F841 local variable 'c_desc_ptr' is assigned to but never used
[error] 347-347: F841 local variable 'm_offset' is assigned to but never used
[error] 348-348: F841 local variable 'n_offset' is assigned to but never used
[error] 851-851: F841 local variable 'G' is assigned to but never used
[error] 885-885: F541 f-string is missing placeholders
[error] 902-902: F541 f-string is missing placeholders
[error] 1071-1071: F841 local variable 'has_tma_support' is assigned to but never used
[error] 11-11: F401 'functools' imported but unused
[error] 14-14: F401 'typing.Any' imported but unused
[error] 14-14: F401 'typing.Optional' imported but unused
[error] 14-14: F401 'typing.Tuple' imported but unused
[error] 19-19: F401 'triton.Config as TConfig' imported but unused
[error] 8-8: F401 'logging' imported but unused
[error] 13-13: F401 'torch.nn' imported but unused
[error] 45-45: E203 whitespace before ':'
[error] 347-347: E203 whitespace before ':'
[error] 296-296: E203 whitespace before ':'
[error] 338-338: E501 line too long (129 > 127 characters)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `3rdparty/torchtitan` (1 hunks)
- `flame/train.py` (1 hunks)
🔇 Additional comments (2)
3rdparty/torchtitan (1)
1-1: Subproject Commit Update Verified

The subproject commit has been updated to `5e2033c75c3c6e82882f87631942b942fde2c0d3`, which appears to incorporate the necessary changes to support the reintroduced `grad_norm` metric through the new logging API. This enables improved tracking of gradient norms during model training, as intended by the PR objectives.

flame/train.py (1)
732-737: LGTM! The grad_norm metric is successfully added back.

This change enhances the logging functionality by including the gradient norm as an extra metric, which is valuable for monitoring training dynamics and diagnosing potential gradient issues during model training.
flame/train.py (Outdated)

```python
    train_state.step,
    global_avg_loss,
    global_max_loss,
    extra_metrics={"loss_metrics/grad_norm": grad_norm.item()},
```
How about adding some rounding for grad_norm, e.g., .4f?
Actually, this is not used for printing but for logging via e.g. wandb, where the formatting is not an issue.
The logging looks good enough to me for now.

add back the grad_norm and skipped_step metric via the new log api in torchtitan
Summary by CodeRabbit

This release delivers improvements to system stability and training process monitoring:

Chores
- Updated the third-party torchtitan subproject to a newer commit.

New Features
- Reintroduced the `grad_norm` and `skipped_step` metrics via the new logging API.
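The `extra_metrics` hook discussed in this review can be illustrated with a minimal, self-contained sketch. Note that `MetricLogger`, `global_grad_norm`, and the `log` signature below are illustrative assumptions standing in for flame's actual metrics logger; in the real code, the gradient norm comes from the training step (e.g. as returned by gradient clipping) rather than being computed from raw lists.

```python
import math

class MetricLogger:
    """Minimal stand-in for a metrics logger that accepts extra metrics.

    Hypothetical class: flame's real logger forwards these records to a
    backend such as wandb, where key names like "loss_metrics/grad_norm"
    become metric panels.
    """
    def __init__(self):
        self.records = []

    def log(self, step, avg_loss, max_loss, extra_metrics=None):
        # Merge the fixed metrics with any caller-supplied extras.
        record = {"step": step, "avg_loss": avg_loss, "max_loss": max_loss}
        record.update(extra_metrics or {})
        self.records.append(record)
        return record

def global_grad_norm(grads):
    """Global L2 norm over all gradient values, the quantity a utility
    like torch.nn.utils.clip_grad_norm_ reports for the whole model."""
    return math.sqrt(sum(g * g for param_grads in grads for g in param_grads))

# Toy per-parameter gradients: norm is sqrt(3^2 + 4^2 + 0^2) = 5.0.
grads = [[3.0, 4.0], [0.0]]
norm = global_grad_norm(grads)

logger = MetricLogger()
rec = logger.log(step=1, avg_loss=0.42, max_loss=0.9,
                 extra_metrics={"loss_metrics/grad_norm": norm})
```

Because the extras travel as a plain dict, no formatting (such as the `.4f` rounding suggested above) is needed at the call site; the logging backend receives the full-precision float.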