ValueError: could not broadcast input array from shape (2) into shape (7,3,5) #242

Closed
hn2 opened this issue Mar 24, 2019 · 28 comments
Labels
custom gym env Issue related to Custom Gym Env

Comments

@hn2

hn2 commented Mar 24, 2019

Describe the bug
I am trying to run stable_baselines algorithms such as PPO1 and DDPG, and I get this error:
ValueError: could not broadcast input array from shape (2) into shape (7,1,5)

Code example

    # action will be the portfolio weights from 0 to 1 for each asset

    self.action_space = gym.spaces.Box(-1, 1, shape=(len(instruments) + 1,), dtype=np.float32)  # include cash

    # get the observation space from the data min and max
    self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(instruments), window_length, history.shape[-1]), dtype=np.float32)

I tried obs.reshape(-1), obs.flatten() and obs.ravel(), but nothing works. I also tried CnnPolicy instead of MlpPolicy and got:

ValueError: Negative dimension size caused by subtracting 8 from 7 for 'model/c1/Conv2D' (op: 'Conv2D') with input shapes: [?,7,1,5], [8,8,5,32].

System Info
Describe the characteristic of your environment:
*library was installed using:
git clone https://github.com/hill-a/stable-baselines.git
cd stable-baselines
pip install -e .

  • GPU models and configuration: no gpu, cpu only
  • Python 3.7.2
  • tensorflow 1.12.0
  • stable-baselines 2.4.1

Additional context

tensorflow==1.13.1 cpu

@araffin araffin added the more information needed Please fill the issue template completely label Mar 24, 2019
@araffin
Collaborator

araffin commented Mar 24, 2019

Hello again,

As written in the previous topic (issue #239):

A simple solution consists in reshaping your observation to a 1D vector, so you can use MlpPolicy on it.
Otherwise, if you want to keep your observation with that exact shape, then you have to define a custom policy, as the default CnnPolicy was made for images (of shape 64x64xn) and normalization (dividing by 255.) is automatically applied in this case.
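
The Conv2D failure reported with CnnPolicy is plain arithmetic: without padding, a convolution with kernel size k and stride s over an axis of length n produces floor((n - k) / s) + 1 outputs, which is not positive when n < k. A minimal sketch, where the 8x8 kernel is taken from the error message and the stride of 4 is an assumption that does not change the conclusion:

```python
def valid_conv_output_size(n, kernel, stride):
    """Output length along one axis for a convolution without padding."""
    if n < kernel:
        raise ValueError(
            "kernel of size {} does not fit into an axis of length {}".format(kernel, n)
        )
    return (n - kernel) // stride + 1


# A 64-pixel axis is large enough for an 8-wide kernel.
print(valid_conv_output_size(64, kernel=8, stride=4))  # 15

# The observation here is 7 x 1 x 5, so neither spatial axis fits the kernel.
for axis_length in (7, 1):
    try:
        valid_conv_output_size(axis_length, kernel=8, stride=4)
    except ValueError as err:
        print(err)
```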

Again, do not forget to add more context for your problem next time by filling in the issue template COMPLETELY ;)

EDIT: Please note the word in uppercase ;)

@hn2
Author

hn2 commented Mar 24, 2019 via email

@hn2
Author

hn2 commented Mar 24, 2019 via email

@hn2
Author

hn2 commented Mar 26, 2019

I tried CnnPolicy instead of MlpPolicy and got this error:
ValueError: Negative dimension size caused by subtracting 8 from 7 for 'model/c1/Conv2D' (op: 'Conv2D') with input shapes: [?,7,1,5], [8,8,5,32].

@araffin
Collaborator

araffin commented Mar 26, 2019

For the fourth and last time, please fill in the issue template COMPLETELY, otherwise we cannot help you and I will have to close this issue.

@araffin araffin added custom gym env Issue related to Custom Gym Env and removed more information needed Please fill the issue template completely labels Mar 26, 2019
@araffin
Collaborator

araffin commented Mar 26, 2019

Hello,
The problem comes from your observation space. You should define it so that it is consistent with the observations your environment actually returns.
Currently, your observation space has a 3-dimensional shape, but you are giving a 1D vector to the agent.

@hn2
Author

hn2 commented Mar 26, 2019

It is consistent. It should be 3D: the first dimension is the list of assets, the second dimension is the historical prices (defined by window_length = 1..50), and the third dimension is open, high, low, close, volume. Where do you see that I am feeding the agent a 1D vector?

@araffin
Collaborator

araffin commented Mar 26, 2019

It is consistent. It should be 3D: the first dimension is the list of assets, the second dimension is the historical prices (defined by window_length = 1..50), and the third dimension is open, high, low, close, volume. Where do you see that I am feeding the agent a 1D vector?

Earlier in the discussion:

A simple solution consists in reshaping your observation to a 1D vector, so you can use MlpPolicy on it.

@hn2
Author

hn2 commented Mar 26, 2019

As said before, I tried that but it didn't work (with reshape, flatten, ravel).
Maybe I am not doing it correctly. Please advise.

@hill-a
Owner

hill-a commented Mar 27, 2019

File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 523, in reset
return self.venv.reset()[0]
File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 57, in reset
self._save_obs(env_idx, obs)
File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 75, in _save_obs
self.buf_obs[key][env_idx] = obs
ValueError: could not broadcast input array from shape (2) into shape (7,1,5)

This is a NumPy error, caused by buf_obs having shape=(7, 1, 5), which makes sense since you declared shape=(len(instruments), window_length, history.shape[-1]). As @araffin said, you cannot have an n-dimensional observation, except for n == 1 or (n == 3 AND shape[0:2] >= 64 AND shape[2] == 3 or 1). Here you have a 3-dimensional array with 7, 1 and 5 as its width, height and depth, which will not work with the CNN.
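
The broadcasting failure can be reproduced with NumPy alone. This is a minimal sketch, assuming a buffer laid out like buf_obs for a single environment; the names `buf` and `bad_obs` are illustrative and not taken from stable-baselines:

```python
import numpy as np

# Buffer for one environment whose declared observation shape is (7, 1, 5).
buf = np.zeros((1, 7, 1, 5), dtype=np.float32)

# An observation with the declared shape fits.
buf[0] = np.ones((7, 1, 5), dtype=np.float32)

# Anything of length 2 cannot be broadcast into (7, 1, 5).
bad_obs = np.zeros(2, dtype=np.float32)
try:
    buf[0] = bad_obs
except ValueError as err:
    print("ValueError:", err)
```

A value of length 2 arriving where an observation is expected suggests that something other than the observation array itself is being handed to the buffer.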

The solution as such is to drop to a one dimensional vector:

self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(instruments) * window_length * history.shape[-1],), dtype=np.float32)

NOTE that the sizes in shape=(len(instruments) * window_length * history.shape[-1],) are multiplied rather than separated by commas. This gives you shape=(35,), which is compatible with MlpPolicy, and you can now use obs.reshape(-1) in both your step() AND reset() functions.
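
Put together, the flattening looks roughly like this. The sketch uses NumPy only, with a made-up `FlatObsSketch` class and random data in place of the real price history; in the actual environment the same `flat_shape` would be passed to gym.spaces.Box as shown above:

```python
import numpy as np


class FlatObsSketch:
    """Hypothetical stand-in for the custom env; only the observation handling is shown."""

    def __init__(self, n_instruments=7, window_length=1, n_features=5):
        self.obs_shape_3d = (n_instruments, window_length, n_features)
        # One dimension: multiply the sizes instead of separating them with commas.
        self.flat_shape = (n_instruments * window_length * n_features,)

    def _get_obs(self):
        # Placeholder data; a real env would slice its price history here.
        obs = np.random.rand(*self.obs_shape_3d).astype(np.float32)
        return obs.reshape(-1)

    def reset(self):
        return self._get_obs()

    def step(self, action):
        reward, done, info = 0.0, False, {}
        return self._get_obs(), reward, done, info


env = FlatObsSketch()
first_obs = env.reset()
next_obs, reward, done, info = env.step(action=None)
print(env.flat_shape, first_obs.shape, next_obs.shape)
```

Both reset() and step() go through the same helper here, so the two code paths cannot drift apart in shape.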

If you need more help, we are going to need your entire environment code, as I am incapable of diagnosing NumPy broadcasting errors without knowing how the NumPy array was formed AND how it was used.

@hn2
Author

hn2 commented Mar 27, 2019

OK, I changed it as suggested and now get:
model.learn(total_timesteps=10000)
File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 230, in learn
seg = seg_gen.next()
File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/trpo_mpi/utils.py", line 35, in traj_segment_generator
observation = env.reset()
File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 523, in reset
return self.venv.reset()[0]
File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 57, in reset
self._save_obs(env_idx, obs)
File "/home/ubuntu/anaconda2/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 75, in _save_obs
self.buf_obs[key][env_idx] = obs
ValueError: cannot copy sequence with size 2 to array axis with dimension 35

@hill-a
Owner

hill-a commented Mar 27, 2019

If you need more help, we are going to need your entire environment code, as I am incapable of diagnosing NumPy broadcasting errors without knowing how the NumPy array was formed AND how it was used.

I have no idea what your environment is sending down to the DummyVecEnv, so I cannot help you without the original code. As far as I can tell, you are simply returning the wrong data from reset(), and I cannot say why without further information.

Other than that, this does not seem to be a stable-baselines issue but rather an implementation issue. Make sure you followed the guide for custom environments at https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html and that your reset() implementation correctly returns the observation.

@araffin
Collaborator

araffin commented Mar 27, 2019

I agree with @hill-a: this is apparently not an error in stable-baselines but in a custom environment. I think we have already given you enough information to help you debug your environment.

We do not do personal debugging and focus on SB issues, so I will close this one.

@araffin araffin closed this as completed Mar 27, 2019
@araffin
Collaborator

araffin commented Mar 27, 2019

Related issues: #88 and #189

@hn2
Author

hn2 commented Mar 27, 2019

It looks like it did flatten the array, as it now has dimension 35.
I did exactly what you said and changed the observation space.

@araffin
Collaborator

araffin commented Mar 27, 2019

You may have the same issue as here: #214

@hn2
Author

hn2 commented Mar 27, 2019

Here is the code of my custom env reset:

def reset(self):
	self.infos = []
	self.old_portfolio_value = self.capital_base
	self.new_portfolio_value = self.capital_base
	self.current_step = 0

	# get data for this episode, each episode might be different.
	if self.start_date is None:
		self.idx = np.random.randint(low=self.window_length, high=self.history.shape[1] - self.steps)
	else:
		# compute index corresponding to start_date for repeatable sequence
		self.idx = date_to_index(self.start_date) - self.start_idx
		assert self.idx >= self.window_length and self.idx <= self.history.shape[1] - self.steps, \
			'Invalid start date, must be window_length day after start date and simulation steps day before end date'
	# print('Start date: {}'.format(index_to_date(self.idx)))
	data = self.history[:, self.idx - self.window_length:self.idx + self.steps + 1, :4]
	# apply augmentation?
	obs = data[:, self.current_step:self.current_step+ self.window_length, :].copy()
	ground_truth_obs = data[:, self.current_step+ self.window_length:self.current_step+ self.window_length + 1, :].copy()

	cash_obs = np.ones((1, self.window_length, obs.shape[2]))
	obs = np.concatenate((cash_obs, obs), axis=0)
	cash_ground_truth = np.ones((1, 1, ground_truth_obs.shape[2]))
	ground_truth_obs = np.concatenate((cash_ground_truth, ground_truth_obs), axis=0)
	info = {}
	info['next_obs'] = ground_truth_obs
	return obs.reshape(-1), info

@araffin
Collaborator

araffin commented Mar 27, 2019

You are returning a tuple from the reset method. It should return only the observation; @hill-a was right.
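
A sketch of the corrected return value, using a hypothetical `ResetSketch` class with placeholder data rather than the real environment. Keeping the extra data on the instance is only one possible choice, shown here as an assumption:

```python
import numpy as np


class ResetSketch:
    """Hypothetical env fragment: reset() returns the observation only."""

    def __init__(self, n_assets=7, window_length=1, n_features=5):
        self.shape_3d = (n_assets, window_length, n_features)
        self.last_info = {}

    def reset(self):
        obs = np.ones(self.shape_3d, dtype=np.float32)  # placeholder data
        # Keep extra data on the instance instead of returning it.
        self.last_info = {"next_obs": obs.copy()}
        # A single array, not an (obs, info) tuple.
        return obs.reshape(-1)


env = ResetSketch()
obs = env.reset()
print(type(obs).__name__, obs.shape)
```

With the tuple removed, the value reaching the observation buffer has the declared length of 35 instead of 2.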

@hn2
Author

hn2 commented Mar 29, 2019

I finally succeeded in running the model. Do you have documentation on how to interpret the output?

(verbosity=1)
********** Iteration 0 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00148 |      -0.11349 |       0.03412 |       0.01007 |      11.34949
     -0.01255 |      -0.11346 |       0.02966 |       0.02644 |      11.34579
     -0.02320 |      -0.11343 |       0.02311 |       0.02290 |      11.34317
     -0.02648 |      -0.11338 |       0.01937 |       0.01414 |      11.33826
Evaluating losses...
     -0.03064 |      -0.11335 |       0.01735 |       0.01275 |      11.33532
----------------------------------
| EpThisIter      | 0            |
| EpisodesSoFar   | 0            |
| TimeElapsed     | 15.9         |
| TimestepsSoFar  | 256          |
| ev_tdlam_before | -3.03        |
| loss_ent        | 11.335321    |
| loss_kl         | 0.01274954   |
| loss_pol_entpen | -0.113353215 |
| loss_pol_surr   | -0.030639375 |
| loss_vf_loss    | 0.017349362  |
----------------------------------
********** Iteration 1 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00121 |      -0.11333 |       0.02765 |       0.00017 |      11.33332
     -0.00925 |      -0.11329 |       0.02208 |       0.00160 |      11.32853
     -0.01676 |      -0.11323 |       0.02083 |       0.00704 |      11.32298
     -0.02069 |      -0.11318 |       0.01745 |       0.01167 |      11.31769
Evaluating losses...
     -0.02851 |      -0.11316 |       0.01342 |       0.01097 |      11.31555
----------------------------------
| EpThisIter      | 0            |
| EpisodesSoFar   | 0            |
| TimeElapsed     | 20.9         |
| TimestepsSoFar  | 512          |
| ev_tdlam_before | -17.4        |
| loss_ent        | 11.315554    |
| loss_kl         | 0.010965543  |
| loss_pol_entpen | -0.11315554  |
| loss_pol_surr   | -0.028507909 |
| loss_vf_loss    | 0.013420929  |
----------------------------------

@hill-a
Owner

hill-a commented Mar 29, 2019

Unfortunately, there is no coordination between the logging of the different methods. I'm guessing from the losses in the log that you are using PPO1 here, so here are the descriptions:

  • EpThisIter: number of episodes that occurred during this iteration
  • EpisodesSoFar: number of episodes that occurred so far
  • TimeElapsed: the elapsed time in seconds
  • TimestepsSoFar: the number of timesteps so far
  • ev_tdlam_before: explained variance between predicted value function and TD(lambda) estimator
  • loss_ent: entropy loss
  • loss_kl: Kullback-Leibler loss
  • loss_pol_entpen: entropy multiplied by the negative entropy coefficient
  • loss_pol_surr: pessimistic surrogate loss
  • loss_vf_loss: value function loss
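
Two of these entries can be checked with a few lines of NumPy. This is a sketch using the common definition of explained variance, 1 - Var[y - y_pred] / Var[y], and an entropy coefficient of 0.01 that is an assumption inferred from the ratio of the logged values, not read from the configuration:

```python
import numpy as np


def explained_variance(y_pred, y_true):
    """1 - Var[y_true - y_pred] / Var[y_true]; NaN if y_true has zero variance."""
    var_y = np.var(y_true)
    return np.nan if var_y == 0 else 1.0 - np.var(y_true - y_pred) / var_y


returns = np.array([1.0, 2.0, 3.0, 4.0])
print(explained_variance(returns, returns))      # 1.0, perfect prediction
print(explained_variance(np.zeros(4), returns))  # 0.0, no better than a constant
print(explained_variance(-returns, returns))     # -3.0, worse than a constant

# Entropy penalty for iteration 0 of the log above.
entropy = 11.335321
ent_coef = 0.01  # assumed; inferred from the logged values
print(-ent_coef * entropy)  # close to the logged loss_pol_entpen of -0.113353215
```

Under this definition, negative ev_tdlam_before values such as the -3.03 and -17.4 in the log mean that the value function predicts the returns worse than a constant would, which is not unusual in the first iterations.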

EDIT: if you wish to measure the performance of the method, please have a glance at tensorboard and the example code given in the doc for validating the method after learning.

@hn2
Author

hn2 commented Mar 29, 2019

I am using the tensorboard integration. Is it possible to show the additional info returned from step() on the tensorboard web page?

@hn2
Author

hn2 commented Mar 29, 2019

When I changed to:

self.action_space = gym.spaces.Box(-1., 1., shape=(len(self.src.asset_names) + 1,), dtype=np.float32)
self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(self.src.asset_names) + 1, window_length, history.shape[-1],), dtype=np.float32)

and I do:

def step(self, action):

    print(action)

I always get: [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan ..

Is it a problem with my env or with the agent?

@hn2
Author

hn2 commented Mar 31, 2019

Another error that I get on Windows when trying to import stable_baselines:
Exception has occurred: AssertionError
expected file or str, got <_pydevd_bundle.pydevd_io.IORedirector object at 0x000001EE6A3464E0>

in logger.py line 56:
assert hasattr(filename_or_file, 'read'), 'expected file or str, got %s' % filename_or_file
The same code works on Ubuntu. I guess that something is wrong with my Windows Python environment, but what?

@hill-a
Owner

hill-a commented Apr 1, 2019

I am using the tensorboard integration. Is it possible to show the additional info returned from step() on the tensorboard web page?

No, that would be quite difficult to do. It would be better to use the validation part of the example code given in the doc:

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    print(info)
    env.render()

When I changed to:

self.action_space = gym.spaces.Box(-1., 1., shape=(len(self.src.asset_names) + 1,), dtype=np.float32)
self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(self.src.asset_names) + 1, window_length, history.shape[-1],), dtype=np.float32)

and I do:

def step(self, action):
   print(action)

I always get: [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan ..

Is it a problem with my env or with the agent?

As discussed earlier, this will not work. The observation space needs to be one-dimensional:

self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(instruments) * window_length * history.shape[-1],), dtype=np.float32)

Another error that I get on Windows when trying to import stable_baselines:

Exception has occurred: AssertionError
    expected file or str, got <_pydevd_bundle.pydevd_io.IORedirector object at 0x000001EE6A3464E0>
in logger.py line 56:
    assert hasattr(filename_or_file, 'read'), 'expected file or str, got %s' % filename_or_file

The same code works on Ubuntu. I guess that something is wrong with my Windows Python environment, but what?

Next time, tell me what you are using; it helps avoid guessing. I'm assuming you are using PyCharm, since it is a _pydevd_bundle object.
This error is caused by the logger taking stdout and asserting that it is a string or a readable IO object. With PyCharm under Windows, that does not seem to be the case.
A quick fix would be to comment out the assertion on line 56 of logger.py.
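
To illustrate why such an assertion can fail, here is a sketch with a hypothetical `WriteOnlyStream` class standing in for an IDE's stdout redirector. The `check_output_target` function is an illustrative alternative that tests for `write` instead, and is not the stable-baselines code:

```python
class WriteOnlyStream:
    """Hypothetical stand-in for an IDE's stdout redirector: it can write but not read."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def flush(self):
        pass


stream = WriteOnlyStream()
print(hasattr(stream, "read"))   # False, so an assert on 'read' fails
print(hasattr(stream, "write"))  # True, which is what writing log lines requires


def check_output_target(filename_or_file):
    """Illustrative alternative check, not the stable-baselines code."""
    if isinstance(filename_or_file, str):
        return "path"
    assert hasattr(filename_or_file, "write"), (
        "expected file or str, got {}".format(filename_or_file)
    )
    return "stream"


print(check_output_target(stream))  # stream
```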

@hn2
Author

hn2 commented Apr 1, 2019

I am using VS Code.

@hill-a
Owner

hill-a commented Apr 1, 2019

I am using VS Code.

OK, this is what stable-baselines is crashing on. It seems that for some reason the object does not have a read attribute. I am not sure why, as this code should work and dates from the initial commit of VS... Again, comment out the assertion on line 56 of logger.py. It is only there as a safeguard for those who are playing with the logger, which I am assuming you are not.

@hn2
Author

hn2 commented Apr 1, 2019

What is the solution then? Move to another IDE?

@hill-a
Owner

hill-a commented Apr 1, 2019

For the last time:

Again, comment out the assertion on line 56 of logger.py

In the stable-baselines folder there is a file logger.py; comment out line 56.
I will do a hotfix later. Locking the thread, as we do not do tech support.

Repository owner locked as resolved and limited conversation to collaborators Apr 1, 2019