Get metric logger reference in distributed actors #553

@HosseinKaviani-H

Description

🐛 Describe the bug

Issue

Calling get_or_create_metric_logger() inside Forge actors fails with:

AttributeError: NYI: attempting to get ProcMesh attribute 'slice' on object that's actually a ProcMeshRef

Root Cause

get_or_create_metric_logger() internally calls this_proc(), which does not work inside actors spawned from a ProcMesh:

  • Inside actors: this_proc() returns a ProcMeshRef (a proxy object)
  • Expected: this_proc() returns a ProcMesh (the actual mesh object)
  • Result: AttributeError when ProcMesh attributes such as slice are accessed on the proxy
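
The failure mode described above can be reproduced in miniature without Monarch. The sketch below is a toy illustration only, not Monarch's code: ToyProcMesh, ToyProcMeshRef, and use_mesh are hypothetical names that mimic a proxy object rejecting an attribute the caller expects from the real mesh.

```python
class ToyProcMesh:
    """Hypothetical stand-in for the real mesh object."""

    def slice(self, **kwargs):
        return f"sliced with {kwargs}"


class ToyProcMeshRef:
    """Hypothetical stand-in for a proxy that does not implement mesh methods."""

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails,
        # so every mesh method the proxy lacks ends up here.
        raise AttributeError(
            f"NYI: attempting to get ProcMesh attribute '{name}' "
            "on object that's actually a ProcMeshRef"
        )


def use_mesh(mesh):
    # Code written against the real mesh type breaks when handed the proxy.
    return mesh.slice(gpus=0)
```

Calling use_mesh(ToyProcMesh()) succeeds, while use_mesh(ToyProcMeshRef()) raises an AttributeError of the same shape as the one in the report.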

Solution

Use get_or_spawn_controller() from Monarch to get a reference to the already-created global logger:

File: apps/sft/main.py (lines 112-120)

async def setup_metric_logger(self):
    """Retrieve the already-initialized metric logger from the main process."""
    from monarch.actor import get_or_spawn_controller
    from forge.observability.metric_actors import GlobalLoggingActor

    # Get reference to the existing global logger (don't create new one)
    mlogger = await get_or_spawn_controller("global_logger", GlobalLoggingActor)
    return mlogger

Why This Works

  1. Main process (line 322): Creates global logger with get_or_create_metric_logger(process_name="Controller")
  2. Actor setup (line 132): Gets reference using get_or_spawn_controller("global_logger", GlobalLoggingActor)
    • Looks up the existing controller by name
    • Returns a reference without calling this_proc()
    • No ProcMeshRef errors!
  3. During training (line 297): Flushes metrics with await self.mlogger.flush.call_one(global_step=self.current_step)

Verified

python -m apps.sft.main --config apps/sft/llama3_8b.yaml
# WandB now shows:
# - ForgeSFTRecipe/train_step/loss
# - ForgeSFTRecipe/train/step

Versions

No response
