
Gaes fees #21

Open
crazypythonista opened this issue Mar 6, 2022 · 3 comments

Comments

@crazypythonista

Hello, I was trying to work this out on my end from scratch. I have got it to the point where the model trains and the visualization runs, but training stops partway through the session without saving the model.

Environment:
Python: 3.8.10
TensorFlow: 2.3.1
Windows: 11
Not using IDLE; running the script from a virtual environment in Windows PowerShell.

Below is the complete traceback of the error I received:

2022-03-07 04:17:43.095316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-07 04:17:43.100610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
Traceback (most recent call last):
  File "RL-Bitcoin-trading-bot_7.py", line 501, in <module>
    train_multiprocessing(CustomEnv, agent, train_df, train_df_nomalized, num_worker = 5, training_batch_size=50, visualize=True, EPISODES=5)
  File "D:\Mine\RLCurrent\multiprocessing_env.py", line 95, in train_multiprocessing
    a_loss, c_loss = agent.replay(states[worker_id], actions[worker_id], rewards[worker_id], predictions[worker_id], dones[worker_id], next_states[worker_id])
  File "RL-Bitcoin-trading-bot_7.py", line 121, in replay
    advantages, target = self.get_gaes(rewards, dones, np.squeeze(values), np.squeeze(next_values))
  File "RL-Bitcoin-trading-bot_7.py", line 93, in get_gaes
    deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
  File "RL-Bitcoin-trading-bot_7.py", line 93, in <listcomp>
    deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
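In the failing list comprehension, the only `+` has `r` on its left, so the message indicates that at least one entry of `rewards` was `None` when the TD errors were computed. Below is a minimal sketch of Generalized Advantage Estimation that checks for this up front and reports which indices are bad, so the problem can be traced back to the environment step that produced them. It is not the script's actual `get_gaes`; the default `gamma`/`lamda` values and the `normalize` flag are assumptions for illustration.

```python
from typing import Optional, Sequence

import numpy as np


def get_gaes(rewards: Sequence[Optional[float]],
             dones: Sequence[bool],
             values: np.ndarray,
             next_values: np.ndarray,
             gamma: float = 0.99,   # assumed discount factor
             lamda: float = 0.9,    # assumed GAE smoothing factor
             normalize: bool = True):
    """Sketch of GAE with an explicit check for None rewards."""
    # Fail early with a readable message instead of the opaque TypeError.
    missing = [i for i, r in enumerate(rewards) if r is None]
    if missing:
        raise ValueError(f"rewards contain None at indices {missing}; "
                         "check what the environment's step() returns")

    # One-step TD errors: delta_t = r_t + gamma * (1 - d_t) * V(s_{t+1}) - V(s_t)
    deltas = np.array([r + gamma * (1 - d) * nv - v
                       for r, d, nv, v in zip(rewards, dones, next_values, values)],
                      dtype=np.float64)

    # Accumulate backwards: A_t = delta_t + gamma * lamda * (1 - d_t) * A_{t+1}
    gaes = deltas.copy()
    for t in reversed(range(len(deltas) - 1)):
        gaes[t] = gaes[t] + (1 - dones[t]) * gamma * lamda * gaes[t + 1]

    # Critic target = advantage + current value estimate (before normalization).
    target = gaes + np.asarray(values, dtype=np.float64)
    if normalize:
        gaes = (gaes - gaes.mean()) / (gaes.std() + 1e-8)
    return gaes, target
```

With this check in place, a batch such as `rewards=[1.0, None]` raises `ValueError: rewards contain None at indices [1]; ...` instead of failing deep inside the list comprehension.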

Any help is highly appreciated. If needed, I can post code snippets for more clarity.
Thanks.

@HoaxParagon

This is a duplicate of #18

@HoaxParagon

Also a duplicate of #9

@wanga10000

wanga10000 commented Mar 14, 2022

Hey, I think the problem originates from the output of critic_predict. My guess is that the author's original PPO implementation had the critic model also take the previous predicted value as an input ("Critic model also watched the previous predicted value"), but he removed that in this tutorial. That means the critic model no longer uses a previous-value input. Maybe you should try removing the np.zeros input in the critic_predict function.


3 participants