Images as observation_space in Pendulum-v0 (or in Classic Control) #915
stable-baselines agents support taking images as inputs if you set the policy to "CnnPolicy", in which case the input is processed with a small CNN network. You need to modify those Gym envs themselves to get the image output. Seems like the solution is simply something like this:

```python
from gym import Wrapper, spaces


class RGBArrayAsObservationWrapper(Wrapper):
    """
    Use env.render("rgb_array") as observation
    rather than the observation the environment provides.
    """
    def __init__(self, env):
        super().__init__(env)
        # TODO this might not work before environment has been reset
        dummy_obs = env.render("rgb_array")
        # Update observation space
        # TODO assign correct low and high
        self.observation_space = spaces.Box(low=0, high=255,
                                            shape=dummy_obs.shape,
                                            dtype=dummy_obs.dtype)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        obs = env.render("rgb_array")
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        obs = env.render("rgb_array")
        return obs, reward, done, info
```
First, thank you very much for your quick and detailed response! I added self.reset() to the first line with the TODO and changed the env.render() calls to self.env.render() calls. Here is the complete code that I am using.
My goal is simple: to figure out the performance of the two algorithms, TD3 and SAC, when working with image inputs. My issues are listed below.
Again, any help would be greatly appreciated.
I am glad to hear you got the code to work 👍 SAC/TD3: Did you also change the policy to one from TD3, not SAC? Also regarding this: SAC/TD3 are known to be inefficient at learning from images. I suggest you try A2C/PPO, unless SAC/TD3 are the very things you want to study.
The logs are displayed after a fixed number of updates, and if training is slow this can take a long while.
I imagine the rendering code is not very fast, as it is meant for debugging/enjoyment use (being observed by humans). Also remember to resize the image to something smaller (e.g. 40x40), because the original image is probably too large (see the resize sketch at the end of this comment).
Likely unavoidable, as the rendering is done with OpenGL-based code and OpenGL requires a valid screen surface to draw on. If there are no more issues related to stable-baselines, you can close the issue.
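A minimal resize wrapper could look like the following. This is only an illustrative sketch: the wrapper name, the default 40x40 size, and the use of OpenCV's cv2.resize are my own choices, not something from this thread.

```python
import cv2
import numpy as np
from gym import ObservationWrapper, spaces


class ResizeObservationWrapper(ObservationWrapper):
    """Downscale image observations to a smaller fixed size (e.g. 40x40)."""

    def __init__(self, env, width=40, height=40):
        super().__init__(env)
        self.width = width
        self.height = height
        n_channels = env.observation_space.shape[2]
        self.observation_space = spaces.Box(
            low=0, high=255,
            shape=(height, width, n_channels),
            dtype=np.uint8)

    def observation(self, obs):
        # cv2.resize expects the target size as (width, height).
        resized = cv2.resize(obs, (self.width, self.height),
                             interpolation=cv2.INTER_AREA)
        return resized.astype(np.uint8)
```

Wrapping order would be RGBArrayAsObservationWrapper first (so observations are images), then this resize wrapper on top.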
Thanks for the fast answer yet again, I am amazed by how fast you are replying! :)
Yes, it started working, but the learning phase is even slower than TD3. I have been waiting for just one time_step for about 10 minutes now.
I am an absolute beginner when it comes to stable_baselines; can you maybe explain how I might do that?
Again, this is not a place for tech support, and I am closing the issue as there does not seem to be an issue related to stable-baselines.
You could double-check how fast the environment is with a random agent (see the timing sketch at the end of this comment).
Using a simple image resize function, like one from scikit-image. Edit:
Cheers :). I try my best.
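A quick way to measure raw environment speed with a random agent might look like this; a rough sketch, assuming the wrapped Pendulum-v0 env from earlier in the thread and the old Gym step API:

```python
import time
import gym

env = RGBArrayAsObservationWrapper(gym.make("Pendulum-v0"))

n_steps = 1000
obs = env.reset()
start = time.time()
for _ in range(n_steps):
    action = env.action_space.sample()  # random agent
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
elapsed = time.time() - start
print("FPS with random agent: {:.1f}".format(n_steps / elapsed))
```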
Hello,
I am working on the Pendulum-v0 environment and using SAC and TD3 implementations from Stable Baselines. I want to alter the observation_space so I can train the model using image inputs, instead of the float32 array it currently uses. Any help regarding how to solve this task would be greatly appreciated. Also, if there isn't already one, a framework or a wrapper to easily use image inputs as observations would be a nice feature to have.