Metric Logging updates 1/4 #345

felipemello1 · 2025-10-08T15:39:57Z

The env var FORGE_DISABLE_METRICS now disables spawning LocalFetcherActor when spawning processes.
Before, calling record_metrics(...) raised errors if GlobalLoggingActor.init_backends had not been called. Now it just prints a warning. This enables standalone components to be tested without metric logging.
In the mode where wandb runs are shared, wandb.init() would hang when init_backend was called AFTER initializing the other services. Calling service_token.clear_service_in_env() in shared mode fixes this issue.
Add unit tests for these changes and also general tests for metrics.

allenwang28

thanks Felipe!

src/forge/observability/metrics.py

allenwang28 · 2025-10-08T17:18:49Z

src/forge/observability/metrics.py

+        # In multiprocessing environments, WandB service tokens can become stale and point
+        # to dead service processes. This causes wandb.init() to hang indefinitely trying
+        # to connect to non-existent services. Clearing forces fresh service connection.
+        from wandb.sdk.lib.service import service_token


should this also be imported at the top?

I don't think so. My opinion is that imports for backends like wandb should be protected; otherwise the user is required to have wandb installed even if they don't use it. Extrapolate that to mlflow, scuba, etc. Let me know if you disagree.
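
For readers unfamiliar with the pattern being discussed, here is a rough sketch of a protected (lazy) import. `optional_import` and `WandbLikeBackend` are hypothetical names, not code from this PR; the point is only that the dependency is resolved when the backend is initialized rather than when the module is imported.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import `name` lazily; return None if the package is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


class WandbLikeBackend:
    """Hypothetical backend that needs its dependency only at init time."""

    def __init__(self, module_name: str = "wandb") -> None:
        self._module_name = module_name
        self._module: Optional[ModuleType] = None

    def init(self) -> None:
        # The import happens here, so users of other backends never need it.
        module = optional_import(self._module_name)
        if module is None:
            raise RuntimeError(
                f"{self._module_name} is required for this backend "
                "but is not installed."
            )
        self._module = module
```

With this shape, importing the metrics module never fails on a missing backend package; only selecting that backend does.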

src/forge/controller/provisioner.py

commit

77488cf

meta-cla bot added the CLA Signed label Oct 8, 2025

allenwang28 reviewed Oct 8, 2025


src/forge/observability/metrics.py (outdated)

Felipe Mello added 2 commits October 8, 2025 10:17

update where we check FORGE_DISABLE_METRICS

8a24e71

remove protected import

3f3bc51

allenwang28 reviewed Oct 8, 2025


protect import

4fe2611

felipemello1 commented Oct 8, 2025


src/forge/controller/provisioner.py

allenwang28 approved these changes Oct 8, 2025


felipemello1 merged commit 76371d1 into meta-pytorch:main Oct 8, 2025
8 checks passed

felipemello1 mentioned this pull request Oct 8, 2025

[metric logging ] - open TODOs #258

Open

14 tasks
