@HosseinKaviani-H
Contributor

Fixes issue #553.

Problem

  • Calling get_or_create_metric_logger() inside actors fails with: AttributeError: NYI: attempting to get ProcMesh attribute 'slice' on object that's actually a ProcMeshRef
  • In actor contexts, this_proc() returns a ProcMeshRef (a proxy) instead of a ProcMesh (the actual mesh)
  • Loss metrics were recorded but never flushed to WandB

Solution

  • Use get_or_spawn_controller("global_logger", GlobalLoggingActor) to get a reference to the existing global logger
  • This avoids calling this_proc(), which fails in actor contexts
  • The singleton controller is retrieved by name, so no ProcMeshRef error is triggered

Changes

  • Updated setup_metric_logger() to use get_or_spawn_controller() instead of get_or_create_metric_logger()
  • Added imports: GlobalLoggingActor and get_or_spawn_controller

Hossein Kavianihamedani added 2 commits November 10, 2025 20:59
- Fixes AttributeError when calling get_or_create_metric_logger() in actors
- Use get_or_spawn_controller() to get reference to global logger
- Avoids this_proc() call that returns ProcMeshRef instead of ProcMesh
- Enables WandB metric flushing from distributed training actors
- Loss values now appear in WandB dashboard across all ranks
@meta-cla meta-cla bot added the CLA Signed label Nov 11, 2025
@felipemello1 felipemello1 merged commit a642464 into meta-pytorch:main Nov 11, 2025
10 checks passed
felipemello1 added a commit that referenced this pull request Nov 11, 2025