Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split CompilerEnv.step() into two methods for singular or lists of actions #611

Conversation

ChrisCummins
Copy link
Contributor

@ChrisCummins ChrisCummins commented Mar 6, 2022

CompilerEnv.step() currently accepts two types for the "action" argument:

(1) a scalar action:

>>> env.step(action)

(2) an iterable of actions:

>>> env.step([action1, action2])

This PR splits this overloaded behavior into two methods: CompilerEnv.step() only takes a single action, and CompilerEnv.multistep() only takes an iterable sequence of actions:

>>> env.step(action)
>>> env.multistep([action1, action2])

For now, calling CompilerEnv.step() with a list of actions still works, though with a deprecation warning. In the v0.2.4 release support for lists of actions in CompilerEnv.step() will be removed.

Benefits

Passing a list of actions to execute in a single step enables them to be executed in a single RPC invocation, significantly reducing the overhead of round trips to the backend, and removing the need to calculate observation/rewards for each individual step. We measured speedups of ~3x on typical LLVM workloads using this (more details here).

Drawbacks

Adding a new method means we probably need to refactor all step wrappers to overload the multistep() wrappers, as otherwise this could be missed:

class MyStepWrapper(Wrapper):
  def step(action):
    # .. my overload

env = MyStepWrapper(env)
env.multistep(actions)   # oh no! Not using the overload

Instead, we can require that wrappers that wish to change the behavior of an environments step function override the raw_step() method, as that method is the common denominator between step() and multistep(), and also has a tighter function signature with less room for mixing and matching different arg types.

Fixes #610.

CompilerEnv.step() currently accepts two types for the "action"
argument: a scalar action, or an iterable of actions. This kind of
type overloading does not work for list types.

This adds a new method, CompilerEnv.multistep(), that explicitly takes
takes an iterable sequence of actions. If you want to run multiple
actions in a single step, call this new method. Calling
CompilerEnv.step() with a list of actions still works, though with a
deprecation warning. In the v0.2.4 release support for lists of
actions in CompilerEnv.step() will be removed.

Fixes facebookresearch#610.
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 6, 2022
@ChrisCummins
Copy link
Contributor Author

ChrisCummins commented Mar 6, 2022

This does not subsume #606 as it does not refactor any of the existing env.step(actions) callsites. #606 also includes some other fixes for type annotations and wrappers that are not incorporated here.

Cheers,
Chris

@ChrisCummins ChrisCummins added this to the v0.2.3 milestone Mar 6, 2022
@ChrisCummins
Copy link
Contributor Author

cc @sogartar, @mostafaelhoushi, @hughleat

Still WIP but looking for feedback on the approach

@codecov-commenter
Copy link

codecov-commenter commented Mar 6, 2022

Codecov Report

Merging #611 (51452e1) into development (8cf7edd) will decrease coverage by 49.92%.
The diff coverage is 83.33%.

Impacted file tree graph

@@               Coverage Diff                @@
##           development     #611       +/-   ##
================================================
- Coverage        88.19%   38.26%   -49.93%     
================================================
  Files              114      114               
  Lines             6708     6708               
================================================
- Hits              5916     2567     -3349     
- Misses             792     4141     +3349     
Impacted Files Coverage Δ
compiler_gym/envs/compiler_env.py 56.51% <83.33%> (-35.33%) ⬇️
compiler_gym/util/logging.py 0.00% <0.00%> (-100.00%) ⬇️
compiler_gym/util/flags/seed.py 0.00% <0.00%> (-100.00%) ⬇️
compiler_gym/util/statistics.py 0.00% <0.00%> (-100.00%) ⬇️
compiler_gym/util/flags/nproc.py 0.00% <0.00%> (-100.00%) ⬇️
compiler_gym/wrappers/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
compiler_gym/util/capture_output.py 0.00% <0.00%> (-100.00%) ⬇️
compiler_gym/util/flags/episodes.py 0.00% <0.00%> (-100.00%) ⬇️
compiler_gym/leaderboard/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
compiler_gym/util/flags/output_dir.py 0.00% <0.00%> (-100.00%) ⬇️
... and 79 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8cf7edd...51452e1. Read the comment docs.

@ChrisCummins ChrisCummins force-pushed the feature/deprecate-action-list branch 4 times, most recently from aea65bf to 0a0747a Compare March 8, 2022 00:11
This makes the following changes:

- Changes env.step() `action` to accept only a single action, with a
deprecation warning if a list of actions are provided.

- Renames env.step() `observations` to `observation_spaces`. The old
parameter name is still accepted with a deprecation warning.

- Renames env.step() `rewards` to `reward_spaces`. The old parameter
name is still accepted with a deprecation warning.
@ChrisCummins ChrisCummins force-pushed the feature/deprecate-action-list branch from 0a0747a to b1f9227 Compare March 9, 2022 22:15
@sogartar
Copy link

@ChrisCummins, it looks like you have a lot on your hands. Would you like me to finish this?

@ChrisCummins
Copy link
Contributor Author

@ChrisCummins, it looks like you have a lot on your hands. Would you like me to finish this?

That would be great, thanks! 🙂 Couple of heads up: development is currently broken after I prematurely merged #620 (woops!), and if you look at the most-recent commit on this branch I expanded the scope to also rename {observation,reward}s args to the more clear {observation,reward}_spaces

Lemme know how you get on. Once this lands I'll make release v0.2.3. That'll unblock you guys from making the breaking change to env.step() in removing the action list workaround

Cheers,
Chris

@sogartar
Copy link

sogartar commented Mar 10, 2022

@ChrisCummins, isn't raw_step(...) a detail that should be used only by classes deriving from CompilerEnv? Is this really needed for all wrapper environments deriving from CompilerEnvWrapper? I understand why fork(...) is a good extension of the public interface, but raw_step seems to complicate things.
You can even have an implementation that does not derive from CompilerEnv.
I have a commit 7c6b377 that refactors out an interface from ComilerEnv, so that there is a clean separation of interface and implementation and the dependency inversion principle is followed. I have not submitted it yet as PR because it depends on this one.

@sogartar
Copy link

I may have misunderstood the intention, but isn't the main responsibility of CompilerEnv to be a gRPC bridge for a remote compiler session? I would even argue that the detail of accepting multiple actions per step is with each environment, but this will introduce a breaking change to the action spaces of the existing environments, so it is probably something you don't want to do.

@ChrisCummins
Copy link
Contributor Author

Hi @sogartar,

I may have misunderstood the intention, but isn't the main responsibility of CompilerEnv to be a gRPC bridge for a remote compiler session?

Yep that's right.

I would even argue that the detail of accepting multiple actions per step is with each environment, but this will introduce a breaking change to the action spaces of the existing environments, so it is probably something you don't want to do.

The idea behind step(actions) is for performance for cases where you have a list of actions to apply and you don't need intermediate observations/rewards. The reason you may want to avoid calling env.step() in a loop is that every op has the overhead of a gRPC round-trip, and calculating observations/rewards can be very expensive. Its not ideal as it is an implementation detail leaking into the API, but I think it has advantages. If we were starting from scratch, I could image a lazy API that would hide these optimizations from the user, kinda like:

op = Action(env)
op.add_action(1)
op.add_action(2)
op.add_action(3)
result = op.eval()

But in general, we try to make CompilerEnv as close to gym.Env as possible. Any code for gym.Env should run on CompilerEnv without modification. API extensions like env.multistep() and env.fork() can be ignored if the user doesn't need them.

isn't raw_step(...) a detail that should be used only by classes deriving from CompilerEnv? Is this really needed for all wrapper environments deriving from CompilerEnvWrapper?

The reason I refactored the wrappers to overload raw_step() is because it is the common denominator between step() and multistep():

class MyWrapper(CompilerEnvWrapper):
    def step(self, action, *args, **kwargs):
        print(action)  # do something interesting
        return self.env.step(action, *args, **kwargs)

env = compiler_gym.make("llvm-v0")
env = MyWrapper(env)

env.step(1)  # works as expected
env.multistep([1, 2, 3])  # oh dear :(

raw_step() is also called after string observation/reward space names have been converted into their underlying class representations, so IMO is easier to overload.

You can even have an implementation that does not derive from CompilerEnv. I have a commit 7c6b377 that refactors out an interface from ComilerEnv, so that there is a clean separation of interface and implementation and the dependency inversion principle is followed. I have not submitted it yet as PR because it depends on this one.

I'm not sure I follow. What is the advantage of having an additional Env base class, or an implementation that does not derive from CompilerEnv?

Cheers,
Chris

@sogartar
Copy link

I'm not sure I follow. What is the advantage of having an additional Env base class, or an implementation that does not derive from CompilerEnv?

For a user of an environment the communication with the service is a detail that may change to use another mechanism. You can have multiple such mechanism coexisting for different environments. For example running an environment in the same process thread or using some other form of inter-process communication like memory-mapped files. Another network protocol is also possible. CompilerGym's environment interface is a superset over OpenAI Gym interface and it would be good to be cleanly separated from any implementations.

@sogartar
Copy link

step is just a special case of multistep. You can have a default implementation in the wrapper

def step(action, **kwargs):
    return multistep([action], **kwargs)

Then you only have to override multistep instead of raw_step.
Either way, wrappers would have to be refactored to override something different from step. If raw_step leaks from CompilerEnv into the wrappers it adds one more method. Isn't multistep = raw_step + housekeeping? Is raw_step needed as a part of the interface?

@ChrisCummins
Copy link
Contributor Author

If raw_step leaks from CompilerEnv into the wrappers it adds one more method. Isn't multistep = raw_step + housekeeping? Is raw_step needed as a part of the interface?

You're right, raw_step is multistep with housekeeping. My initial feeling was that it would be easier to overload step after that housekeeping, but I'm not wedded to the idea. I can see that overloading multistep() has the advantage of one fewer methods.

Cheers,
Chris

@ChrisCummins
Copy link
Contributor Author

Superseded by #627.

@ChrisCummins ChrisCummins deleted the feature/deprecate-action-list branch March 17, 2022 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate passing a sequence of actions to env.step()
4 participants