Conversation

allenwang28
Contributor

This PR does a few things:

  • Moves Replica's create_proc_mesh, spawn_actors, and stop functionality into ForgeActor as launch and shutdown @classmethods. Why?
    • This couples definition of how an actor should be launched with the actor def itself, rather than being in a separate object.
    • This gives flexibility for more complex actors, like Policy, which spawns multiple proc meshes and actor types
  • Uses ForgeActor for everything we expect to be spawned as a service
    • GPUManager becomes a regular actor to avoid circular imports (it doesn't really need to be a ForgeActor either)
  • Modifies Policy to pick up these changes
  • GRPO example mods:
    • Dataset adds **kwargs. launch() doesn't play well with *args, and I couldn't figure out the right way to make args work correctly, so spawn_service only accepts kwargs at the moment.
    • Spawns and shuts down all services concurrently to reduce initialization time
  • vLLM example changes:
    • Uses the with policy.session context manager, plus general QoL updates
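
As a rough illustration of the first bullet, the sketch below shows how an actor definition can own its launch and shutdown logic through class methods, and how a more complex actor can override them to manage more than one proc mesh. Everything here is hypothetical: ToyProcMesh, ToyForgeActor, and ToyPolicy are stand-ins invented for this example, and the real ForgeActor signatures may differ.

```python
import asyncio
from typing import Optional


class ToyProcMesh:
    """Hypothetical stand-in for a proc mesh; not the real forge type."""

    def __init__(self, name: str):
        self.name = name
        self.stopped = False

    async def stop(self) -> None:
        self.stopped = True


class ToyForgeActor:
    """Illustrative base class: the actor definition owns its lifecycle."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self._proc_mesh: Optional[ToyProcMesh] = None

    @classmethod
    async def launch(cls, **kwargs) -> "ToyForgeActor":
        # Default behavior: one proc mesh backing one actor.
        actor = cls(**kwargs)
        actor._proc_mesh = ToyProcMesh(f"{cls.__name__}-procs")
        return actor

    @classmethod
    async def shutdown(cls, actor: "ToyForgeActor") -> None:
        if actor._proc_mesh is not None:
            await actor._proc_mesh.stop()


class ToyPolicy(ToyForgeActor):
    """A more complex actor that overrides launch/shutdown to own two meshes."""

    @classmethod
    async def launch(cls, **kwargs) -> "ToyPolicy":
        policy = cls(**kwargs)
        policy._policy_proc = ToyProcMesh("policy")
        policy._worker_procs = ToyProcMesh("workers")
        return policy

    @classmethod
    async def shutdown(cls, actor: "ToyPolicy") -> None:
        # Stop both meshes concurrently.
        await asyncio.gather(actor._policy_proc.stop(), actor._worker_procs.stop())


async def demo() -> ToyPolicy:
    policy = await ToyPolicy.launch(model="toy-model")
    await ToyPolicy.shutdown(policy)
    return policy
```

Because launch() only takes keyword arguments in this sketch, it mirrors the kwargs-only limitation mentioned in the GRPO bullet.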

@meta-cla meta-cla bot added the CLA Signed label Aug 29, 2025
@allenwang28 allenwang28 marked this pull request as ready for review August 29, 2025 19:08
@allenwang28 allenwang28 requested review from Jack-Khuu, Copilot, joecummings and pbontrager and removed request for joecummings August 29, 2025 19:12

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for custom replicas by moving process and actor lifecycle management from Replica into ForgeActor as @classmethod methods. It also updates Policy to use this new pattern and implement correct spawning within the service framework.

  • Moves replica management functionality from Replica class to ForgeActor as launch() and shutdown() class methods
  • Updates Policy to implement custom launch/shutdown logic that manages multiple process meshes and actor types
  • Modifies service spawn/shutdown to use ForgeActor pattern and removes positional arguments support

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/forge/controller/actor.py | Adds launch() and shutdown() class methods to ForgeActor for managing actor lifecycle |
| src/forge/controller/service/replica.py | Removes proc mesh management and delegates to ForgeActor.launch()/shutdown() |
| src/forge/controller/service/spawn.py | Adds type validation and shutdown_service() function, removes positional args |
| src/forge/actors/policy.py | Implements custom launch()/shutdown() to manage multiple proc meshes |
| tests/unit_tests/test_service.py | Updates test class and shutdown calls to use new service patterns |
| apps/vllm/main.py | Updates to use new service context manager and shutdown function |
| apps/grpo/main.py | Updates DatasetActor constructor and uses concurrent service spawning/shutdown |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Member

@joecummings joecummings left a comment

Few questions, but overall looks good.

return policy_config, service_config


async def run_vllm(service_config: ServiceConfig, config: PolicyConfig, prompt: str):
Member

Can't we delete this now that we have the vllm app?

Contributor Author

hmm I'm not sure I'm following, we still need a ServiceConfig for spawning a service here regardless?

Member

Is it out of scope to move the vllm processing loop into the policy instead of starting it in main.py?

Contributor Author

@allenwang28 allenwang28 Aug 29, 2025

? yeah it should have been moved in this PR



- class GpuManager(ForgeActor):
+ class GpuManager(Actor):
Member

Why don't we want this to be a ForgeActor?

Contributor Author

Circular import since proc_mesh.py wants to use GpuManager, and ForgeActor uses proc_mesh :/ can be re-arranged later


async def spawn_service(
-     service_cfg: ServiceConfig, actor_def: Type[Actor], *actor_args, **actor_kwargs
+     service_cfg: ServiceConfig, actor_def: Type[ForgeActor], **actor_kwargs
Member

Hmmm there will be things that likely need *args, not just **kwargs - is there a long term plan to make that possible?
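
One possible way to support positional arguments on top of a kwargs-only spawn path, sketched here purely as an assumption and not as the project's plan, is to bind the positional values to the constructor's parameter names before forwarding them. The helper args_to_kwargs and the ToyDataset class are hypothetical names invented for this example; only inspect.signature and Signature.bind are real standard-library calls.

```python
import inspect


def args_to_kwargs(actor_cls: type, *args, **kwargs) -> dict:
    """Map positional args onto the parameter names of actor_cls's constructor."""
    sig = inspect.signature(actor_cls)  # constructor signature, without `self`
    bound = sig.bind(*args, **kwargs)   # raises TypeError on a bad call
    named = {}
    for name, value in bound.arguments.items():
        kind = sig.parameters[name].kind
        if kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.POSITIONAL_ONLY):
            # These cannot be expressed as keyword arguments.
            raise TypeError(f"cannot pass parameter {name!r} by keyword")
        if kind is inspect.Parameter.VAR_KEYWORD:
            named.update(value)  # flatten the constructor's **kwargs
        else:
            named[name] = value
    return named


class ToyDataset:
    """Hypothetical actor-like class used only to exercise the helper."""

    def __init__(self, path, split="train", **kwargs):
        self.path = path
        self.split = split
        self.extra = kwargs


converted = args_to_kwargs(ToyDataset, "data/toy", "test", streaming=True)
```

This approach does not help for constructors that genuinely need *args, so it would only narrow the gap raised in the question above.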

Comment on lines +106 to +108
self._run_task: asyncio.Task | None = None
self._policy_proc: ProcMesh | None = None
self._worker_procs: ProcMesh | None = None
Contributor

nice

Comment on lines +359 to +366
dataloader,
policy,
trainer,
replay_buffer,
compute_advantages,
ref_model,
reward_actor,
) = await asyncio.gather(
Contributor

Nice optimization

Note that this is harder to read and more error-prone if the order changes or the services list is mutated. Not sure if there's a way to have our cake and eat it too

Contributor Author

hmm we could do something like:

dataloader_task = spawn_service(...) # don't await yet
policy_task = spawn_service(...)

then do the bulk await at the end:

dataloader, policy = await asyncio.gather(...)

but I'm not sure if it fully solves the problem
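
Another option, offered only as a sketch, is to key each pending spawn by name so the result mapping no longer depends on argument order. The helper gather_named and the stand-in fake_spawn_service are hypothetical names for this example; the only real API used is asyncio.gather, which returns results in the same order as its inputs.

```python
import asyncio


async def gather_named(**awaitables) -> dict:
    """Await all keyword awaitables concurrently; return results keyed by name."""
    names = list(awaitables)
    results = await asyncio.gather(*(awaitables[name] for name in names))
    return dict(zip(names, results))


async def fake_spawn_service(name: str, delay: float) -> str:
    # Hypothetical stand-in for spawn_service(...): sleeps, then returns a label.
    await asyncio.sleep(delay)
    return f"{name}-service"


async def demo() -> dict:
    # Coroutines are created here but only start running inside gather_named.
    return await gather_named(
        dataloader=fake_spawn_service("dataloader", 0.02),
        policy=fake_spawn_service("policy", 0.01),
    )
```

Callers would then read services["policy"] rather than unpacking a tuple, so reordering or adding services cannot silently swap handles.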

@allenwang28 allenwang28 merged commit ccd2377 into meta-pytorch:main Aug 29, 2025
4 checks passed
@allenwang28 allenwang28 deleted the policy_replica branch August 29, 2025 23:57
4 participants