Question about RL training on waymo #53

Open
xuqingyao opened this issue Jan 28, 2024 · 4 comments

Comments

@xuqingyao

xuqingyao commented Jan 28, 2024

Hello, I'm trying to train an RL policy on Waymo by simply running python scenarionet_training/scripts/train_waymo.py --num-gpus 1. However, the program seems to stay pending after initialization. The output is below. What should the normal output look like?

Successfully initialize Ray!
Available resources:  {'memory': 132170424525.0, 'accelerator_type:G': 1.0, 'GPU': 1.0, 'object_store_memory': 60930181939.0, 'CPU': 88.0, 'node:192.168.28.129': 1.0}
== Status ==
Current time: 2024-01-26 20:49:10 (running for 00:00:00.65)
Memory usage on this node: 299.9/472.3 GiB
Using FIFO scheduling algorithm.
Resources requested: 6.999999999999997/88 CPUs, 0.5/1 GPUs, 0.0/123.09 GiB heap, 0.0/56.75 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /DB/data/qingyaoxu/scenarionet/experiment/TEST
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------------------------+----------+-------+--------+
| Trial name                               | status   | loc   |   seed |
|------------------------------------------+----------+-------+--------|
| MultiWorkerPPO_GymEnvWrapper_4c76d_00000 | RUNNING  |       |      0 |
| MultiWorkerPPO_GymEnvWrapper_4c76d_00001 | PENDING  |       |    100 |
| MultiWorkerPPO_GymEnvWrapper_4c76d_00002 | PENDING  |       |    200 |
| MultiWorkerPPO_GymEnvWrapper_4c76d_00003 | PENDING  |       |    300 |
| MultiWorkerPPO_GymEnvWrapper_4c76d_00004 | PENDING  |       |    400 |
+------------------------------------------+----------+-------+--------+
@QuanyiLi
Member

It is almost correct. You can set num_gpus to 0 to disable the GPU, or to any value less than 0.2. Then all experiments can run in parallel.
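Trials show as PENDING when the node cannot cover the resources each of them requests, so the per-trial GPU request caps how many trials can run at once. The snippet below is a simplified, back-of-the-envelope model under stated assumptions: every trial reserves the same fixed CPU and GPU share, and the scheduler packs trials until one resource runs out (Ray's real scheduler also considers memory and other resources). The helper `max_concurrent_trials` is hypothetical, and the per-trial GPU values are illustrative; the 88 CPUs, 1 GPU and roughly 7 CPUs per trial come from the log above.

```python
import math
from fractions import Fraction


def max_concurrent_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    """Hypothetical helper: how many identical trials fit on one node at once.

    Assumption: each trial reserves a fixed CPU and GPU share and trials are
    packed until either resource is exhausted. Not Ray's actual scheduler.
    """
    # Go through str() so that 0.2 becomes exactly 1/5, not a binary float.
    cpu_total, cpu_each = Fraction(str(total_cpus)), Fraction(str(cpus_per_trial))
    gpu_total, gpu_each = Fraction(str(total_gpus)), Fraction(str(gpus_per_trial))
    limits = []
    if cpu_each > 0:
        limits.append(math.floor(cpu_total / cpu_each))
    if gpu_each > 0:
        limits.append(math.floor(gpu_total / gpu_each))
    # A resource requested at 0 imposes no limit.
    return min(limits) if limits else math.inf


# 88 CPUs and 1 GPU (from the log), ~7 CPUs per trial, illustrative GPU shares.
for gpus in (1, 0.2, 0):
    print(gpus, max_concurrent_trials(88, 1, 7, gpus))
```

Under this model, a request of one whole GPU per trial lets only one trial run at a time, a request of 0.2 fits exactly five trials on one GPU, and a request of 0 leaves the CPUs (88 // 7 = 12 trials) as the only limit, which is enough for all five seeds.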

@QuanyiLi
Member

This is a feature of Ray. You can take a look at the documentation of Ray 1.2 (it is an outdated version, lol). In my experience, I usually set num_gpus=0, as CPUs are enough to handle the optimization of these MLPs. If a GPU is used, moving data between devices is costly, while the speedup brought by the GPU is not large enough to make up for it.

@xuqingyao
Author

Thank you for your advice. I did successfully run the code after setting num_gpus to 0. However, I encountered another problem. I followed python -m scenarionet.convert_waymo -d /path/to/your/database --raw_data_path ./waymo/training_20s --num_files=1000 to build the data, but I used v1.1 of the Waymo dataset. When training the RL policy, I found that the crosswalk data of scene 126 (id bec43944a9017106) consists of 3D points rather than the required 2D points. Is this problem caused by the original data? To work around it, my current approach is to keep only the first two dimensions as input. Is this correct? Will it cause any problems?

  File "/DB/data/qingyaoxu/metadrive/metadrive/envs/base_env.py", line 522, in reset
    self.engine.reset()
  File "/DB/data/qingyaoxu/metadrive/metadrive/engine/base_engine.py", line 354, in reset
    manager.reset()
  File "/DB/data/qingyaoxu/metadrive/metadrive/manager/scenario_map_manager.py", line 41, in reset
    new_map = ScenarioMap(map_index=seed, map_data=m_data)
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/map/scenario_map.py", line 19, in __init__
    super(ScenarioMap, self).__init__(dict(id=self.map_index), random_seed=random_seed)
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/map/base_map.py", line 63, in __init__
    self._generate()
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/map/scenario_map.py", line 36, in _generate
    block.construct_block(self.engine.worldNP, self.engine.physics_world, attach_to_world=True)
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/block/base_block.py", line 124, in construct_block
    self._create_in_world()
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/block/base_block.py", line 226, in _create_in_world
    self.create_in_world()
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/scenario_block/scenario_block.py", line 72, in create_in_world
    self._construct_crosswalk()
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/block/base_block.py", line 411, in _construct_crosswalk
    np = make_polygon_model(polygon, 1.5)
  File "/DB/data/qingyaoxu/metadrive/metadrive/utils/vertex.py", line 108, in make_polygon_model
    elif not is_anticlockwise(points) and auto_anticlockwise:
  File "/DB/data/qingyaoxu/metadrive/metadrive/utils/vertex.py", line 75, in is_anticlockwise
    x1, y1 = points[i]
ValueError: too many values to unpack (expected 2)
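The traceback ends in is_anticlockwise, which unpacks each point into exactly two names (x1, y1 = points[i]); a 3D point (x, y, z) therefore raises the ValueError. The sketch below illustrates the workaround described in the question: project the polygon onto its first two columns before checking orientation. It assumes z is the third column and that the crosswalk lies roughly flat on the ground, in which case dropping z is a top-down projection that keeps the winding order. The helpers `to_2d`, `signed_area` and `is_anticlockwise` here are hypothetical reimplementations (shoelace formula), not MetaDrive's code, and the sample polygon is made up.

```python
import numpy as np


def to_2d(points):
    """Hypothetical helper: keep only (x, y) from (N, 2) or (N, 3) points."""
    arr = np.asarray(points, dtype=float)
    if arr.ndim != 2 or arr.shape[1] < 2:
        raise ValueError(f"expected an (N, 2) or (N, 3) array, got shape {arr.shape}")
    return arr[:, :2]


def signed_area(points_2d):
    """Shoelace formula: positive when the vertices run anticlockwise (y-up frame)."""
    x, y = points_2d[:, 0], points_2d[:, 1]
    return 0.5 * float(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))


def is_anticlockwise(points):
    # Project first, so 3D input no longer breaks the 2D orientation test.
    return signed_area(to_2d(points)) > 0


# Made-up crosswalk with a z coordinate, listed anticlockwise when seen from above.
crosswalk_3d = [(0.0, 0.0, 0.1), (4.0, 0.0, 0.1), (4.0, 3.0, 0.2), (0.0, 3.0, 0.2)]

try:
    x1, y1 = crosswalk_3d[0]  # same pattern as in the traceback
except ValueError as err:
    print("unpacking a 3D point fails:", err)

print(is_anticlockwise(crosswalk_3d))
```

This only sidesteps the crash; it does not tell you whether the third coordinate comes from the v1.1 data or from the conversion step.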

@QuanyiLi
Member

QuanyiLi commented Feb 6, 2024

Could you try pulling the latest MetaDrive and running your script again?
