@felipemello1 felipemello1 commented Oct 8, 2025

  1. The env var FORGE_DISABLE_METRICS now also disables spawning LocalFetcherActor when spawning processes.
  2. Previously, calling record_metrics(...) before GlobalLoggingActor.init_backends had been called raised an error. It now just prints a warning, which lets standalone components be tested without metric logging.
  3. In the mode where wandb runs are shared, wandb would hang when init_backend was called after the other services had been initialized. Calling service_token.clear_service_in_env() in shared mode fixes this.
  4. Adds unit tests for these changes, plus general tests for metrics.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 8, 2025

@allenwang28 allenwang28 left a comment

thanks Felipe!

# In multiprocessing environments, WandB service tokens can become stale and point
# to dead service processes. This causes wandb.init() to hang indefinitely trying
# to connect to non-existent services. Clearing forces fresh service connection.
from wandb.sdk.lib.service import service_token

should this also be imported at the top?

@felipemello1 felipemello1 Oct 8, 2025

I don't think so. My opinion is that imports of backends like wandb should be guarded (kept local to where they are used); otherwise users are required to have wandb installed even if they don't use it. Extrapolate that to mlflow, scuba, etc. Let me know if you disagree.
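
The guarded-import idea in the reply above can be sketched roughly as follows. This is an illustration only: `load_optional_backend` and `LazyBackend` are hypothetical names, not part of this PR or of any library, and the sketch uses stand-in module names so it runs without wandb installed.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional_backend(module_name: str) -> ModuleType:
    """Import a backend module only when it is actually requested."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail with an actionable message, and only for users who asked
        # for this backend.
        raise ImportError(
            f"The '{module_name}' logging backend was requested but is not "
            "installed. Install it or remove it from the logging config."
        ) from exc


class LazyBackend:
    """Hypothetical backend wrapper that defers its import to init()."""

    def __init__(self, module_name: str) -> None:
        self.module_name = module_name
        self._module: Optional[ModuleType] = None  # nothing imported yet

    def init(self) -> ModuleType:
        # The import happens here, not at the top of the file, so merely
        # importing this wrapper never requires the backend package.
        self._module = load_optional_backend(self.module_name)
        return self._module
```

The trade-off is that a missing package surfaces at init time rather than at import time, which is usually acceptable for optional integrations.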

@felipemello1 felipemello1 merged commit 76371d1 into meta-pytorch:main Oct 8, 2025
8 checks passed
@felipemello1 felipemello1 mentioned this pull request Oct 8, 2025
14 tasks