pradeepfn
Contributor

$subject.

Here is the working execution. It fixes the following two errors.

(forge) [pradeepfdo@devvm2487.eag0 ~/forge_fork (main)]$ python -m apps.rl.main --config apps/rl/llama3_8b.yaml
Traceback (most recent call last):
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/pradeepfdo/forge_fork/apps/rl/main.py", line 63, in <module>
    sys.exit(recipe_main())
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/forge/cli/config.py", line 180, in wrapper
    sys.exit(recipe_main(conf))
  File "/home/pradeepfdo/forge_fork/apps/rl/main.py", line 59, in recipe_main
    asyncio.run(run(cfg))
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/pradeepfdo/forge_fork/apps/rl/main.py", line 29, in run
    trainer, buffer = await asyncio.gather(
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/forge/controller/proc_mesh.py", line 53, in spawn_actors
    mesh = await get_proc_mesh(processes)
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/forge/controller/proc_mesh.py", line 77, in get_proc_mesh
    if process_config.with_gpus:
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__
    self._format_and_raise(
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/omegaconf/base.py", line 231, in _format_and_raise
    format_and_raise(
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/omegaconf/_utils.py", line 899, in format_and_raise
    _raise(ex, cause)
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/omegaconf/_utils.py", line 797, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set env var OC_CAUSE=1 for full trace
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl
    node = self._get_child(
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/omegaconf/basecontainer.py", line 73, in _get_child
    child = self._get_node(
  File "/home/pradeepfdo/.conda/envs/forge/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 480, in _get_node
    raise ConfigKeyError(f"Missing key {key!s}")
omegaconf.errors.ConfigAttributeError: Missing key with_gpus
    full_key: trainer.processes.with_gpus
    object_type=dict
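
For anyone hitting the same trace: the last frames show `get_proc_mesh` reading `process_config.with_gpus` from a config that has no such key under `trainer.processes`. As a rough illustration only (this is not forge code and not what this PR changes), a pre-flight check along these lines could report the missing key before any actors are spawned. The helper name `missing_keys`, the plain-dict config, and the `num_procs` placeholder entry are all assumptions made for the sketch.

```python
from collections.abc import Mapping
from typing import Any


def missing_keys(config: Mapping, required: list[str]) -> list[str]:
    """Return the dotted paths in `required` that are absent from `config`."""
    missing = []
    for path in required:
        node: Any = config
        for part in path.split("."):
            # Stop at the first level that is not a mapping or lacks the key.
            if not isinstance(node, Mapping) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing


# Hypothetical config shaped like the full_key in the traceback above;
# `num_procs` and its value are placeholders, not taken from llama3_8b.yaml.
cfg = {"trainer": {"processes": {"num_procs": 1}}}

print(missing_keys(cfg, ["trainer.processes.with_gpus"]))
```

With the placeholder config above, the check reports `trainer.processes.with_gpus` as missing; once the key is present in the config, the returned list is empty.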

@meta-cla meta-cla bot added the CLA Signed label Aug 31, 2025
@pradeepfn pradeepfn requested a review from pbontrager August 31, 2025 22:56
Contributor

@pbontrager pbontrager left a comment

Thanks!

@pradeepfn pradeepfn merged commit 7c02bd7 into meta-pytorch:main Aug 31, 2025
4 checks passed