Added Leaderboard Entry for PPO #580

Merged (2 commits) on Feb 23, 2022. Showing changes from 1 commit.
21 changes: 11 additions & 10 deletions README.md
@@ -127,16 +127,17 @@ This leaderboard tracks the results achieved by algorithms on the `llvm-ic-v0`
environment on the 23 benchmarks in the `cbench-v1` dataset.

| Author | Algorithm | Links | Date | Walltime (mean) | Codesize Reduction (geomean) |
| --- | --- | --- | --- | --- | --- |
| Leibniz University | PPO + Guided Search | [write-up](leaderboard/llvm_instcount/ppo/README.md), [results](leaderboard/llvm_instcount/random_search/results.csv) | 2022-02 | 69.821s | **1.070×** |
| Facebook | Random search (t=10800) | [write-up](leaderboard/llvm_instcount/random_search/README.md), [results](leaderboard/llvm_instcount/random_search/results_p125_t10800.csv) | 2021-03 | 10,512.356s | 1.062× |
| Facebook | Random search (t=3600) | [write-up](leaderboard/llvm_instcount/random_search/README.md), [results](leaderboard/llvm_instcount/random_search/results_p125_t3600.csv) | 2021-03 | 3,630.821s | 1.061× |
| Facebook | Greedy search | [write-up](leaderboard/llvm_instcount/e_greedy/README.md), [results](leaderboard/llvm_instcount/e_greedy/results_e0.csv) | 2021-03 | 169.237s | 1.055× |
| Facebook | Random search (t=60) | [write-up](leaderboard/llvm_instcount/random_search/README.md), [results](leaderboard/llvm_instcount/random_search/results_p125_t60.csv) | 2021-03 | 91.215s | 1.045× |
| Facebook | e-Greedy search (e=0.1) | [write-up](leaderboard/llvm_instcount/e_greedy/README.md), [results](leaderboard/llvm_instcount/e_greedy/results_e10.csv) | 2021-03 | 152.579s | 1.041× |
| Jiadong Guo | Tabular Q (N=5000, H=10) | [write-up](leaderboard/llvm_instcount/tabular_q/README.md), [results](leaderboard/llvm_instcount/tabular_q/results-H10-N5000.csv) | 2021-04 | 2,534.305s | 1.036× |
| Facebook | Random search (t=10) | [write-up](leaderboard/llvm_instcount/random_search/README.md), [results](leaderboard/llvm_instcount/random_search/results_p125_t10.csv) | 2021-03 | **42.939s** | 1.031× |
| Patrick Hesse | DQN (N=4000, H=10) | [write-up](leaderboard/llvm_instcount/dqn/README.md), [results](leaderboard/llvm_instcount/dqn/results-instcountnorm-H10-N4000.csv) | 2021-06 | 91.018s | 1.029× |
| Jiadong Guo | Tabular Q (N=2000, H=5) | [write-up](leaderboard/llvm_instcount/tabular_q/README.md), [results](leaderboard/llvm_instcount/tabular_q/results-H5-N2000.csv) | 2021-04 | 694.105s | 0.988× |
Reviewer comment (Contributor):

It's entirely up to you, but if you'd prefer to list your three names here rather than the institution, that's totally fine :)



## Contributing
98 changes: 98 additions & 0 deletions leaderboard/llvm_instcount/ppo/README.md
@@ -0,0 +1,98 @@
<!-- To submit a leaderboard entry please fill in this document following the
instructions in the CONTRIBUTING.md document to file a pull request. -->

Reviewer comment (Contributor): Nitpick: please strip the trailing whitespace in this file.

# Proximal Policy Optimization with Guided Random Search

**tldr;**
Proximal Policy Optimization (PPO) followed by a guided random search using the
action probabilities of the PPO model.

**Authors:** Nicolas Fröhlich, Robin Schmöcker, Yannik Mahlau
<!-- A comma separated list of authors. -->


**Publication:** Not Available
<!-- A link to a publication, if applicable. -->


**Results:** Geometric mean codesize reduction: 1.070×, [results](/results.csv)
<!-- Add one or more links to CSV files containing the raw results. -->
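
For reference, a minimal sketch of how such a geometric mean could be computed from a results CSV; the column name `codesize_reduction` is a hypothetical placeholder, not the actual schema of the linked file.

```python
# Minimal sketch: geometric mean of per-benchmark codesize reductions.
# The column name "codesize_reduction" is a placeholder; adjust it to the
# actual schema of the results CSV.
import numpy as np
import pandas as pd

df = pd.read_csv("results.csv")
reductions = df["codesize_reduction"].to_numpy(dtype=float)
geomean = float(np.exp(np.log(reductions).mean()))
print(f"Geometric mean codesize reduction: {geomean:.3f}x")
```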


**CompilerGym version:** 0.2.1
<!-- You may print the version of CompilerGym that is installed from the command
line by running:

python -c 'import compiler_gym; print(compiler_gym.__version__)'
-->



**Is the approach Open Source?:** Yes
<!-- Whether you have released the source code of your approach, yes/no. If
yes, please state the license. -->
The source code is available as open source:
https://github.com/xtremey/ppo_compiler_gym
Reviewer comment (Contributor):
For us to accept this as open source your code must have a license (further reading here). Please add a LICENSE file to the linked repository, and state the name of the license here, e.g. "Is the approach Open Source? Yes, under Apache v2. Source code is here: ...".

Also, please link to an exact commit, such as: https://github.com/xtremey/ppo_compiler_gym/tree/99a677eaefdb33eddd01299aa683f8d0c14d8643
this enables you to make changes to your code without preventing people from reproducing your results here.


**Did you modify the CompilerGym source code?:** No (apart from state space wrappers)
<!-- Whether you made any substantive changes to the CompilerGym source code,
e.g. to optimize the implementation or change the environment dynamics. yes/no.
If yes, please briefly summarize the modifications. -->

**What parameters does the approach have?**
<!-- A description of any tuning parameters. -->
| Hyperparameter | Value |
|--------------------------- |--------- |
| Number of Epochs | 80 |
| Epsilon Clip | 0.1 |
| Mean Square Error Factor | 0.5 |
| Entropy Factor | 0.01 |
| Learning Rate | 0.0005 |
| Trajectories until Update | 20 |
| Hidden Layer Size | 128 |
| Activation Function | TanH |
| Number of Layers | 4 |
| Shared Parameter Layers | First 3 |
| Optimizer | Adam |
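
The following is a minimal PyTorch sketch of an actor-critic network and PPO objective consistent with the table above (hidden size 128, tanh activations, four layers with the first three shared, clip 0.1, MSE factor 0.5, entropy factor 0.01, Adam with learning rate 0.0005). It is an illustration under these assumptions, not the authors' exact implementation; the observation and action dimensions are placeholders.

```python
# Sketch of the actor-critic network and clipped PPO objective implied by the
# hyperparameter table above; not the authors' exact code.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # First three layers are shared between the policy and value heads.
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # Fourth layer: separate policy and value heads.
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs: torch.Tensor):
        h = self.shared(obs)
        dist = torch.distributions.Categorical(logits=self.policy_head(h))
        return dist, self.value_head(h).squeeze(-1)

def ppo_loss(dist, values, actions, old_log_probs, returns, advantages,
             clip_eps=0.1, mse_factor=0.5, entropy_factor=0.01):
    # Clipped surrogate objective + value MSE - entropy bonus.
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = nn.functional.mse_loss(values, returns)
    return policy_loss + mse_factor * value_loss - entropy_factor * dist.entropy().mean()

model = ActorCritic(obs_dim=70, n_actions=124)  # placeholder dimensions
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
# Each update runs 80 epochs of this loss over a batch collected from 20 trajectories.
```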

**What range of values were considered for the above parameters?**
<!-- Briefly describe the ranges of values that were considered for each
parameter, and the metrics and dataset used to select from the values. -->
We experimented briefly with the PPO hyperparameters, but the results did not change
drastically, so we did not perform any systematic hyperparameter optimization.

**Is the policy deterministic?:** No
<!-- Whether the (state, action) policy is deterministic, yes/no. -->

## Description
Reviewer comment (Contributor):
If I understand correctly, the PPO agent was trained on the same cbench programs that are used for the test set. Is that right? If so, this is an important distinction to make, because it means that the agent is not being evaluated on its ability to generalize to an unseen program. Could you please add a few more details about the training process?

Since you are releasing your code, consider adding the hardware you used and commands needed to replicate the work. Take a look at this for an example: https://github.com/facebookresearch/CompilerGym/blob/development/leaderboard/llvm_instcount/dqn/README.md#experimental-setup


<!-- A brief summary of the approach. Please try to be sufficiently descriptive
such that someone could replicate your approach. Insert links to external sites,
publications, images, or other pages where relevant. -->

Our model uses the Proximal Policy Optimization (PPO) algorithm:
https://arxiv.org/abs/1707.06347

We used a wrapper to extend the state space such that the number of remaining steps is
an additional entry in the observation (as a plain number, not one-hot encoded). During
training, we limited the number of steps per episode to 200.
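
A minimal sketch of such a wrapper is given below, assuming the standard `gym.Wrapper` interface with the old four-tuple `step()` API; the class name and details are illustrative, not our exact code.

```python
import gym
import numpy as np

class RemainingStepsWrapper(gym.Wrapper):
    """Append the number of remaining steps to the observation as a plain scalar."""

    def __init__(self, env, max_steps: int = 200):
        super().__init__(env)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, **kwargs):
        self.steps = 0
        return self._augment(self.env.reset(**kwargs))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        if self.steps >= self.max_steps:
            done = True  # enforce the 200-step episode limit used during training
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        remaining = self.max_steps - self.steps
        return np.append(np.asarray(obs, dtype=np.float32), np.float32(remaining))
```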

In a second step, we use the model's action probabilities to perform a guided random
search (also limited to 200 steps per trajectory). We limited the search time to one
minute per environment.

In a third step, we optimized the best trajectory found during the random search by
taking 500 additional steps sampled from the model's action probabilities. This did not
yield an improvement for every environment, but sometimes improved the solution slightly
at essentially no computational cost. The maximum possible trajectory length is therefore
700, although most trajectories are much shorter.
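
A minimal sketch of steps two and three follows, assuming a `policy(obs)` callable that returns the model's action probabilities and the old four-tuple gym `step()` API; the function name and bookkeeping are illustrative, not our exact code.

```python
import time
import numpy as np

def guided_search(env, policy, search_steps=200, time_budget_s=60.0, refine_steps=500):
    """Guided random search followed by refinement of the best trajectory found."""
    best_actions, best_reward = [], float("-inf")
    deadline = time.time() + time_budget_s

    # Step two: sample trajectories from the policy's action probabilities
    # until the per-environment search budget is exhausted.
    while time.time() < deadline:
        obs, actions, total = env.reset(), [], 0.0
        for _ in range(search_steps):
            probs = policy(obs)
            action = int(np.random.choice(len(probs), p=probs))
            obs, reward, done, _ = env.step(action)
            actions.append(action)
            total += reward
            if done:
                break
        if total > best_reward:
            best_actions, best_reward = actions, total

    # Step three: replay the best trajectory, then take up to 500 additional
    # guided steps and keep them only if they improve the cumulative reward.
    obs, extra, extra_total = env.reset(), [], 0.0
    for action in best_actions:
        obs, _, _, _ = env.step(action)
    for _ in range(refine_steps):
        probs = policy(obs)
        action = int(np.random.choice(len(probs), p=probs))
        obs, reward, done, _ = env.step(action)
        extra.append(action)
        extra_total += reward
        if done:
            break
    if extra_total > 0:
        best_actions, best_reward = best_actions + extra, best_reward + extra_total
    return best_actions, best_reward
```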

We excluded the Ghostscript benchmark during training since it required a lot of
computation and became a bottleneck. We also skipped the random search and the
additional refinement steps for this benchmark, since they did not yield any
improvement and drastically increased the mean walltime.


### Credit
Credit to nikhilbarhate99
(https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO.py).
Parts of the rollout buffer and the update method are taken from this repo.