Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] RL Common / Baselines Common Package #503

Closed
araffin opened this issue Oct 9, 2019 · 9 comments · Fixed by #540
Closed

[Proposal] RL Common / Baselines Common Package #503

araffin opened this issue Oct 9, 2019 · 9 comments · Fixed by #540
Labels
enhancement New feature or request

Comments

@araffin
Copy link
Collaborator

araffin commented Oct 9, 2019

With @hill-a , we came up to the same conclusion about things in the common folder: a lot is not specific to this library and could be separated in a stand-alone python package.

Doing so would:

  • allow people to develop/build things on that package without depending on a specific backend (i.e. this removes tensorflow dependency)
  • separate what is related to rl in general vs stable-baselines
  • ease the transition between tf 1 and tf 2 (Tensorflow 2.0 support? #366) as we could decide to have two separate versions of SB instead of maintaining a version with tf1 and tf2 support at the same time (which seems complicated especially because of the removal of tf.contrib)

As a personal example, I have now an internal (minimal) pytorch version of stable-baselines (I plan to release it at some point but it is used for my own research purposes for now) where I had to copy a lot of what was present in the common folder.

I would like to have your thoughts on that ;)
@erniejunior @AdamGleave @Miffyli

Suggestions for the name of the package are also welcomed, for now we have:

  • baselines-common (by @hill-a)
  • rl-common (my proposition)

the repo would be under the stable-baselines team organization: https://github.com/Stable-Baselines-Team

@araffin araffin added the enhancement New feature or request label Oct 9, 2019
@Miffyli
Copy link
Collaborator

Miffyli commented Oct 10, 2019

Which parts of common is this referring to, exactly? Files like distributions.py and policies.py, the other ones (not distributions/policies) or all files? Would the aim be to move most/all backend stuff to this new package, and then we can implement this package with different backends? The stable-baselines portion would then use these tools to implement the final RL algorithms?

I agree this would be a useful thing, especially now with the upcoming transitions to a new backend. It would be a lot of work and debugging to get everything right (and even then probably some clashing with interfaces), but it would help keeping current stable-baselines structure clean. At the same time we can tidy up other things, e.g.

  • Reorganize/rename some of the code under common which is somewhat sprinkled around
  • Get rid of the oddballs like layer definitions being in a2c.utils and common.distributions importing from there (but a2c then imports stuff from distributions via policies ???)

As for the name: I vote for "baselines-common". Term "Baselines" is already familiar in (D)RL scene. "rl-common" is nice too, but there is still the very active part of "non-deep reinforcement learning" which is quite different from the "deep reinforcement learning" where network-like models screw up everything :)

Edit: Cpt. Obvious here but the first thing to do is to gather solid baseline results of current stable-baselines/baselines we then should be able to replicate in the new backends or when major things change. I think results in the RLZoo would do the trick?

@araffin
Copy link
Collaborator Author

araffin commented Oct 10, 2019

Which parts of common is this referring to, exactly?

Everything that is not related to tensorflow, I see at least (not exhaustive list):

  • vec envs
  • noise
  • maths utils
  • monitor
  • debug envs
  • replay buffer

Reorganize/rename some of the code under common which is somewhat sprinkled around
Get rid of the oddballs like layer definitions being in a2c.utils

Totally agree

When using a new backend, you will need to implement those specific parts:

  • base class (at least for getting/setting the parameters)
  • policies
  • distributions
  • buffers (rollout buffer and replay buffer), the current replay buffer implementation rely on numpy only but could be improved if using a specific backend

I think results in the RLZoo would do the trick?

That is the whole point of rl zoo ;)
Note: I have successfully replicated results with the pytorch backend, so I think I now got some experience of what can go wrong ;)

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 10, 2019

To check I understood this right: In summary, this would move backend independent tools used by one or more RL methods, used by environments or generally used for debugging to a new package.

This sounds good. The lack of assumptions on the backend makes throughout optimization impossible (e.g. using TF tensors to store replay data or so), but that is not what this package is for and I agree this would be very useful for quick debugging and setting up algorithms. 👍

I think repo for such package should be quick to setup by cloning stable-baselines setup with all the CI things and how tests/docs are done, etc?

PyTorch experiments: Yay! Now at least there is one of us who knows how to do them properly ^^.

@araffin
Copy link
Collaborator Author

araffin commented Oct 10, 2019

I think repo for such package should be quick to setup by cloning stable-baselines setup with all the CI things and how tests/docs are done, etc?

It will require at least one day I think and then some iteration to refactor things nicely.
But yes, in essence, this is mostly copy-pasting.
The repo will be under stable-baselines-team organization.

@araffin araffin pinned this issue Oct 10, 2019
@Miffyli
Copy link
Collaborator

Miffyli commented Oct 10, 2019

Ah, I was talking about setting it up so we can start working on it ^^. Naturally the full refactoring / better documentation / etc will take longer, but at least would have a place to start with.

Would still like to hear comments from others, though (@erniejunior @AdamGleave)

@AdamGleave
Copy link
Collaborator

I'm weakly in favour of this. I think separating the common dependencies out would be slightly cleaner, although it might also slightly complicate setting up the development environment, which could deter some new contributors.

I suspect you'll be one of the few users of this change. So if it'll save you time to restructure this then go for it. But unless others chime up I would not do it for the good of the community. Not many people write RL frameworks from scratch, and those that do will often not want to use Stable Baselines "common" implementation. Even if we clean things up there will be lots of design decisions embedded in this. For example, the VecEnv interface works well for parallelizing at small scale on a single machine, but the synchronous API prevents it scaling to distributed RL workers.

@araffin
Copy link
Collaborator Author

araffin commented Oct 13, 2019

I suspect you'll be one of the few users of this change.

That's true... Now, you make me ensure about what to do...

slightly complicate setting up the development environment

On that part, I would expect the "common" package to be relatively stable (in term of amount of change) compared to the main repo.

Not many people write RL frameworks from scratch

True, I was mostly thinking of people implementing only one algorithm. For instance, I've already seen the replay buffer implementation copied over.

Do you think there could be an alternative, like a git-submodule instead of a package?
Better option: I think I've seen that somewhere, but I'm not sure it's possible, we create a second package using options in the setup.py while keeping everything else the same (so everything stay in the same repo, we would need maybe to reorganize things though)

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 14, 2019

The suggestions sound even more obscure than having a separate package. If goal is to also offer people a set of miscellaneous and independent utilities for RL algorithms, I do not see how git-submodules or the proposed setup.py method helps.

I have personally looked for independent implementations of some algorithms like more sophisticated return/advantage computations (e.g. GAE, V-trace), since it is easy have bugs in them. In that sense a repository of independent and common tools would be nice, but I do not think this would be easy if even possible to do in independent fashion, as the definition of the function can be closely tied to the learning algorithm itself.

@AdamGleave
Copy link
Collaborator

I agree with @Miffyli that if we're going to split it then it's better to be in a separate package. Probably best to be in its own repo too, although you can have multiple Python packages in a single repository.

@Miffyli Miffyli mentioned this issue Nov 4, 2019
4 tasks
@araffin araffin unpinned this issue Nov 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants