lstm+ppo #886

Open
1900360 opened this issue Oct 13, 2022 · 0 comments
1900360 commented Oct 13, 2022

While training Pendulum-v0 with LSTM+PPO in a parallel (multiprocessing) environment setup, the following error occurred:

Traceback (most recent call last):
  File "D:\desktop\lunwen_dabao\xinsuanfa0912\tian_sac_test\launch_multiprocessing_traning_cylinder.py", line 91, in <module>
    runner = Runner(
  File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\execution\runner.py", line 168, in __init__
    environment = Environment.create(
  File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\environments\environment.py", line 94, in create
    environment = MultiprocessingEnvironment(
  File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\environments\multiprocessing_environment.py", line 62, in __init__
    process.start()
  File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'overwrite_staticmethod.<locals>.overwritten'
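
My guess from the traceback is that Windows' spawn-based multiprocessing tries to pickle the already-constructed Environment objects before sending them to the worker processes, and Tensorforce's `overwrite_staticmethod` wrapper is a local closure that the pickler cannot serialize. Would something like the following be the right approach instead? It is a minimal, untested sketch based on the parallelization example in the Tensorforce docs, passing an environment spec so each worker builds its own environment:

from tensorforce import Agent, Environment, Runner

# Sketch (untested): pass an environment *spec* instead of pre-created
# Environment objects, so each worker process constructs its own
# environment and nothing un-picklable crosses the process boundary.
environment_spec = dict(
    environment='gym', level='Pendulum-v0', max_episode_timesteps=200
)

agent = Agent.create(
    agent='ppo',
    environment=Environment.create(**environment_spec),
    batch_size=20,
    parallel_interactions=10,
)

runner = Runner(
    agent=agent,
    environment=environment_spec,  # spec, not live instances
    num_parallel=10,
    remote='multiprocessing',
)
runner.run(num_episodes=300)
runner.close()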

Below is my code. What is wrong in that part? And is there a problem with my LSTM settings?

import argparse
import os

import gym  # only needed for the commented-out gym.make example below
from tensorforce import Agent, Environment, Runner
# import envobject_cylinder

parser = argparse.ArgumentParser()
shell_args = vars(parser.parse_args())
shell_args['num_episodes'] = 300
shell_args['max_episode_timesteps'] = 200

number_servers = 10
environments = []
for i in range(number_servers):
    env = Environment.create(
        environment='gym', level='Pendulum-v0', max_episode_timesteps=200
    )
    environments.append(env)

# environment = Environment.create(
#     environment='gym', level='Pendulum-v0', max_episode_timesteps=200
# )

network_spec = [
    dict(type='rnn', size=512, horizon=4, activation='tanh', cell='lstm'),
    dict(type='rnn', size=512, horizon=4, activation='tanh', cell='lstm')
]
baseline_spec = [
    dict(type='rnn', size=512, horizon=4, activation='tanh', cell='lstm'),
    dict(type='rnn', size=512, horizon=4, activation='tanh', cell='lstm')
]

# env = gym.make('Pendulum-v0')
# Instantiate a Tensorforce agent
agent = Agent.create(
    states=dict(type='float', shape=(3,)),
    actions=dict(type='float', shape=(1,), min_value=-2, max_value=2),
    max_episode_timesteps=200,
    agent='ppo',
    # num_parallel=10,
    environment=env,
    batch_size=20,
    network=network_spec,
    learning_rate=0.001,
    state_preprocessing=None,
    entropy_regularization=0.01,
    likelihood_ratio_clipping=0.2,
    subsampling_fraction=0.2,
    predict_terminal_values=True,
    discount=0.97,
    baseline=baseline_spec,
    baseline_optimizer=dict(
        type='multi_step',
        optimizer=dict(type='adam', learning_rate=1e-3),
        num_steps=5
    ),
    multi_step=25,
    parallel_interactions=number_servers,
    saver=dict(
        directory=os.path.join(os.getcwd(), 'saved_models/checkpoint'),
        frequency=1  # save checkpoint every update
    ),
    summarizer=dict(
        directory='summary',
        # list of labels, or 'all'
        summaries=['entropy', 'kl-divergence', 'loss', 'reward', 'update-norm']
    ),
)
print('Agent defined DONE!')

runner = Runner(
    agent=agent,
    num_parallel=number_servers,
    environments=environments,
    max_episode_timesteps=200,
    evaluation=False,
    remote='multiprocessing',
)
print('Runner defined DONE!')

runner.run(
    num_episodes=shell_args['num_episodes'],
    save_best_agent='best_model',
    sync_episodes=False,
)
runner.close()
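
On the LSTM side: if I read the docs right, `horizon` in the `rnn` layer is the truncated backpropagation-through-time horizon, so with `horizon=4` the two stacked 512-unit LSTM layers only receive gradients from the last 4 timesteps. An alternative I found in the docs is the 'auto' network, which appends a single LSTM cell with internal state; a sketch (the sizes here are just examples, not from my run):

# Sketch of the 'auto' network from the Tensorforce docs: two dense
# layers of size 64, then an LSTM cell whose internal state is carried
# across timesteps and unrolled over the last 4 steps during optimization.
policy_network = dict(type='auto', size=64, depth=2, rnn=4)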