[llvm] Add API to construct a benchmark from clang invocation. #577

ChrisCummins · 2022-02-17T14:36:17Z

This adds a new API for constructing a benchmark from a command line invocation of a compiler. For example, given the command line:

$ gcc a.c src/b.c -DNDEBUG -lm -Oz -o foo.exe

The make_benchmark_from_command_line() method interprets this command line and rewrites it to one that emits an LLVM-IR file:

/path/to/compilergym/clang a.c src/b.c -DNDEBUG -lm -Oz -emit-llvm -c -o /path/to/benchmark.bc -Xclang -disable-llvm-optzns

It then loads the resulting bitcode for use as a benchmark. When the user is done experimenting with the benchmark, they call a special compile() method on the benchmark, which re-invokes the original command line, but this time using the bitcode as input:

/path/to/compilergym/clang /path/to/benchmark.bc -DNDEBUG -lm -Oz -o foo.exe
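As a rough illustration of the two rewrites described above, the following sketch builds both argument lists using only the standard library. The helper names `strip_output`, `to_bitcode_command`, and `to_link_command`, and the detection of source files by extension, are assumptions made for this example. The PR itself relies on the third-party gccinvocation parser, which understands far more flags. The sketch only constructs the command lines shown above; it does not run them.

```python
import shlex

# Assumed for illustration: file extensions treated as C/C++ sources.
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx")


def strip_output(argv):
    """Return argv without any `-o <path>` pair."""
    out, skip = [], False
    for arg in argv:
        if skip:
            skip = False  # this is the path that followed -o
        elif arg == "-o":
            skip = True
        else:
            out.append(arg)
    return out


def to_bitcode_command(argv, clang, bitcode_path):
    """Rewrite a compile command so that it emits unoptimized LLVM bitcode."""
    args = strip_output(argv[1:])  # drop the original driver and output path
    return [clang, *args, "-emit-llvm", "-c", "-o", bitcode_path,
            "-Xclang", "-disable-llvm-optzns"]


def to_link_command(argv, clang, bitcode_path):
    """Rewrite the original command to build from the bitcode, not the sources."""
    args = [a for a in argv[1:] if not a.endswith(SOURCE_SUFFIXES)]
    return [clang, bitcode_path, *args]


argv = shlex.split("gcc a.c src/b.c -DNDEBUG -lm -Oz -o foo.exe")
print(shlex.join(to_bitcode_command(argv, "clang", "benchmark.bc")))
print(shlex.join(to_link_command(argv, "clang", "benchmark.bc")))
```

Working on argv lists rather than raw strings avoids quoting problems when paths contain spaces.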

Putting this all together, here is an end-to-end example using CompilerGym as a substitute for LLVM's pipeline:

>>> import compiler_gym
>>> env = compiler_gym.make("llvm-v0")
>>> benchmark = env.make_benchmark_from_command_line(
...     "gcc a.c src/b.c -DNDEBUG -lm -Oz -o foo.exe"
... )
>>> for _ in range(100):
...     _, _, done, _ = env.step(env.action_space.sample())
...     if done:
...         env.reset()
>>> env.benchmark.compile()
# Now the file foo.exe is compiled

The idea is to make it easier to integrate CompilerGym into an existing build system, as it is normally possible to dump a verbose log of all of the compiler invocations run during a build.
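One way to picture that integration is to scan a verbose build log for lines that look like compiler invocations. The sketch below is illustrative only: the log contents are made up, `find_compiler_invocations` is a hypothetical helper, and the driver names are a subset of the `DRIVER_NAMES` tuple from the gccinvocation library reviewed in this PR.

```python
import os
import shlex

# Subset of the driver names listed in gccinvocation's DRIVER_NAMES.
DRIVER_NAMES = ("cc", "gcc", "c++", "g++", "clang", "clang++")


def find_compiler_invocations(log_text):
    """Yield argv lists for log lines whose first word is a known compiler driver."""
    for line in log_text.splitlines():
        try:
            argv = shlex.split(line)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        if argv and os.path.basename(argv[0]) in DRIVER_NAMES:
            yield argv


# A made-up verbose build log.
build_log = """\
mkdir -p build
gcc -c a.c -DNDEBUG -Oz -o build/a.o
/usr/bin/g++ -c b.cpp -o build/b.o
ar rcs build/libfoo.a build/a.o build/b.o
"""

for argv in find_compiler_invocations(build_log):
    print(argv)
```

Each argv list found this way could then be passed to `env.make_benchmark_from_command_line()`.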

codecov-commenter · 2022-02-17T14:57:20Z

Codecov Report

Merging #577 (72ef9b1) into development (63dbfac) will increase coverage by 0.17%.
The diff coverage is 95.97%.

@@               Coverage Diff               @@
##           development     #577      +/-   ##
===============================================
+ Coverage        88.36%   88.54%   +0.17%     
===============================================
  Files              125      127       +2     
  Lines             7565     7761     +196     
===============================================
+ Hits              6685     6872     +187     
- Misses             880      889       +9

Impacted Files	Coverage Δ
...piler_gym/envs/llvm/benchmark_from_command_line.py	`92.68% <92.68%> (ø)`
...ler_gym/third_party/gccinvocation/gccinvocation.py	`95.91% <95.91%> (ø)`
compiler_gym/envs/llvm/llvm_env.py	`91.61% <98.27%> (+3.32%)`	⬆️
compiler_gym/bin/validate.py	`87.09% <100.00%> (ø)`
compiler_gym/envs/llvm/__init__.py	`100.00% <100.00%> (ø)`
compiler_gym/service/connection.py	`77.92% <0.00%> (-0.98%)`	⬇️
...ompiler_gym/service/client_service_compiler_env.py	`91.26% <0.00%> (+0.41%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 63dbfac...72ef9b1.

mostafaelhoushi

Super Cool!
I am wondering whether that environment is compatible with LlvmEnv or GccEnv. I mean, can we directly call the existing observation or action functions on that environment?

mostafaelhoushi · 2022-02-17T16:54:03Z

compiler_gym/envs/llvm/benchmark_from_command_line.py

+        uri = (
+            f"benchmark://clang-v0/{urllib.parse.quote_plus(shlex.join(command_line))}"
+        )


Does the dataset or benchmark need to be named clang-v0?
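For context, the URI expression in the diff above can be tried in isolation. This is a small sketch using only the standard library. The `clang-v0` prefix is copied from the diff; the example command line and the decoding step are assumptions added here to show that the encoding is reversible, not something the PR is known to do. Note that `shlex.join()` requires Python 3.8 or newer.

```python
import shlex
import urllib.parse

# Example command line, chosen to include an argument with a space in it.
command_line = ["gcc", "-DNDEBUG", "a.c", "-o", "my prog"]

# Same expression as in the diff: join into one shell-quoted string, then URL-encode.
encoded = urllib.parse.quote_plus(shlex.join(command_line))
uri = f"benchmark://clang-v0/{encoded}"
print(uri)

# Reverse the two steps to recover the original argument list.
decoded = shlex.split(urllib.parse.unquote_plus(encoded))
assert decoded == command_line
```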

mostafaelhoushi · 2022-02-17T16:58:04Z

compiler_gym/envs/llvm/llvm_env.py

+        For example, the command line:
+
+            >>> benchmark = env.make_benchmark_from_command_line(
+            ...     ["gcc", "-DNDEBUG", "a.c", "b.c", "-o", "foo", "-lm"]


I am wondering if "gcc" works here... as down below we add -Xclang to the arguments

Yes, this function takes an argument replace_driver (default=True) that swaps out whatever the first argument is for clang. So this will work:

>>> env.make_benchmark_from_command_line(["gcc", "in.c"])

and will get translated to something like:

/path/to/compiler_gym/llvm/bin/clang in.c -c -emit-llvm -o - -Xclang -disable-llvm-optzns

but this will not:

>>> env.make_benchmark_from_command_line(["gcc", "in.c"], replace_driver=False)

The idea with replace_driver=False is that it allows you to use your own compiler to produce the input bitcode, provided your compiler produces compatible bitcode.

The downside is that it won't work with command line options that aren't the same between whatever compiler the user uses and our version of clang.

I'll extend the docstring to explain this better
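The effect of `replace_driver` on the first argument can be sketched in a few lines. This is not the PR's implementation: `resolve_driver` and its `clang_path` parameter are hypothetical names used only to show the behaviour described in this reply.

```python
def resolve_driver(argv, clang_path, replace_driver=True):
    """Return argv with the first element swapped for clang_path when requested."""
    if not argv:
        raise ValueError("Empty command line")
    if replace_driver:
        # Whatever the original driver was (gcc, cc, ...), use clang instead.
        return [clang_path, *argv[1:]]
    # Keep the user's own compiler as the driver.
    return list(argv)


print(resolve_driver(["gcc", "in.c"], "/path/to/clang"))
print(resolve_driver(["gcc", "in.c"], "/path/to/clang", replace_driver=False))
```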

mostafaelhoushi · 2022-02-17T17:05:53Z

compiler_gym/third_party/gccinvocation/gccinvocation.py

+        DRIVER_NAMES = (
+            "c89",
+            "c99",
+            "cc",
+            "gcc",
+            "c++",
+            "g++",
+            "xgcc",
+            "clang",
+            "clang++",
+        )


So is this library literally gccinvocation, or is it also a generic C compiler invocation parser?

I've hacked on it to add support for clang by adding support for a few clang-specific flags, though I'm sure I've missed some

I don't want to change the name of the third-party library, though if my edits work well then I could submit them upstream and consider renaming it to "ccinvocation" or something

hughleat

LGTM.
Quick question though. How does it deal with mixed .c and .o files? E.g.:

gcc foo.c bar.o -o foo

My guess is that bar.o will be considered a source and that clang will barf trying to bitcode-ise it. Should those be filtered out for linking?

ChrisCummins · 2022-04-27T16:33:01Z

LGTM. Quick question though. How does it deal with mixed .c and .o files?

Good question! I added support so that .o files are excluded from the bitcode environment, but are kept for linking later.

The current implementation for this will not work if the CompilerGym backend and frontend are on different machines. To fix that, we need #325.

Cheers,
Chris
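The handling of mixed inputs described in this reply can be pictured with a short sketch. It assumes that object files are recognised purely by their `.o` extension; `split_inputs` is a hypothetical helper, and the choice of `ValueError` is an assumption (the commit list only says that an error is raised when there are only object file inputs).

```python
def split_inputs(inputs):
    """Separate source files (compiled to bitcode) from object files (kept for linking)."""
    objects = [path for path in inputs if path.endswith(".o")]
    sources = [path for path in inputs if not path.endswith(".o")]
    if not sources:
        # Nothing to turn into bitcode, so there is no benchmark to build.
        raise ValueError("Command line has no source inputs, only object files")
    return sources, objects


sources, objects = split_inputs(["foo.c", "bar.o"])
print("bitcode inputs:", sources)
print("link-only inputs:", objects)
```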

This adds a new API for constructing a benchmark from a command line invocation of a compiler. For example:

>>> benchmark = env.make_benchmark_from_command_line(
...     ["gcc", "in.c", "-DNDEBUG"]
... )

This provides control over the build, equivalent to:

>>> benchmark = env.make_benchmark(["in.c"], copts=["-DNDEBUG"])

The idea is to make it easier to integrate CompilerGym into an existing build system, as it is normally possible to dump a verbose log of all of the compiler invocations run during a build.

This release adds a new compiler environment, new APIs, and a suite of backend improvements to improve the flexibility of CompilerGym environments. Many thanks to code contributors: @sogartar, @KyleHerndon, @SoumyajitKarmakar, @uduse, and @anthony0727!

Highlights of this release include:

- [mlir] Began work on a new environment for matrix multiplication using MLIR ([#652](#652), thanks @KyleHerndon and @sogartar!). Note this environment is not yet included in the pypi package and must be [compiled from source](https://github.com/facebookresearch/CompilerGym/blob/development/INSTALL.md#building-from-source-with-cmake).
- [llvm] Added a new `env.benchmark_from_clang_invocation()` method ([#577](#577)) that can be used for constructing LLVM environment automatically from C/C++ compiler invocations. This makes it much easier to integrate CompilerGym with your existing build scripts.
- Added three new wrapper classes: `Counter`, that provides op counts for analysis ([#683](#683)); `SynchronousSqliteLogger`, that provides logging of environment interactions to a relational database ([#679](#679)), and `ForkOnStep` that provides an `undo()` operation ([#682](#682)).
- Added `reward_space` and `observation_space` parameters to `env.reset()` ([#659](#659), thanks @SoumyajitKarmakar!)

This release includes a number of improvements to the backend APIs that make it easier to write new CompilerGym environments:

- Refactored the backend to make `CompilerEnv` an abstract interface, and `ClientServiceCompilerEnv` the concrete implementation of this interface. This enables new environments to be implemented without using gRPC ([#633](#633), thanks @sogartar!).
- Extended the support for different types of action and observation spaces ([#641](#641), [#643](#643), thanks @sogartar!), including new `Permutation` and `SpaceSequence` spaces ([#645](#645), thanks @sogartar!).
- Added a new `disk/` subdirectory to compiler service's working directories, which is symlinked to an on-disk location for devices which support in-memory working directories. This fixes a bug with leftover temporary directories from LLVM ([#672](#672)).

This release also includes numerous bug fixes and improvements, many of which were reported or fixed by the community. For example, fixing a bug in cache file locations ([#656](#656), thanks @uduse!), and a missing flag definition in example code ([#684](#684), thanks @anthony0727!).

**Full Changelog**: v0.2.3...v0.2.4

This release brings in deprecating changes to the core `env.step()` routine, and lays the groundwork for enabling new types of compiler optimizations to be exposed through CompilerGym. Many thanks to code contributors: @mostafaelhoushi, @sogartar, @KyleHerndon, @uduse, @parthchadha, and @xtremey!

Highlights of this release include:

- Added a new `TextSizeInBytes` observation space for LLVM ([#575](#575)).
- Added a new PPO leaderboard entry ([#580](#580)). Thanks @xtremey!
- Fixed a bug in which temporary directories created by the LLVM environment were not cleaned up ([#592](#592)).
- **[Backend]** The function `createAndRunCompilerGymService` now returns an int, which is the exit return code ([#592](#592)).
- Improvements to the examples documentation ([#548](#548)) and FAQ ([#586](#586))

Deprecations and breaking changes:

- `CompilerEnv.step` no longer accepts a list of actions ([#627](#627)). A new method, `CompilerEnv.multistep` provides this functionality. This is to provide compatibility with environments whose action spaces are lists. To update your code, replace any calls to `env.step()` which take a list of actions to use `env.multistep()`. Thanks @sogartar!
- The arguments `observations` and `rewards` to `step()` have been renamed `observation_spaces` and `reward_spaces`, respectively ([#627](#627)).
- `Reward.id` has been renamed `Reward.name` ([#565](#565), [#612](#612)). Thanks @parthchadha!
- The backend protocol buffer schema has been updated to natively support more types of observation and action, and to support nested spaces ([#531](#531)). Thanks @sogartar!

facebook-github-bot added the CLA Signed label Feb 17, 2022

ChrisCummins requested review from hughleat and mostafaelhoushi February 17, 2022 15:10

mostafaelhoushi reviewed Feb 17, 2022

mostafaelhoushi approved these changes Feb 17, 2022

ChrisCummins added this to In progress in CompilerGym roadmap Mar 6, 2022

hughleat approved these changes Mar 7, 2022

ChrisCummins mentioned this pull request Mar 24, 2022

[PTAL] [WIP] Questions regarding MLIR Environment for CompilerGym #584

Open

ChrisCummins force-pushed the feature/make-env-from-clang branch 5 times, most recently from 1018a74 to e7c6f44 Compare April 27, 2022 16:28

ChrisCummins and others added 10 commits May 11, 2022 17:31

Import third-party gccinvocation library.

6091697

Split gccinvocation tests into separate test file.

8e7413b

Fix shlex.join() backward compatibility.

351934f

Update and fix LLVM custom benchmark tests.

2e8394d

Fix cmake dependency name.

cc7fa7b

[llvm] Add support for commandlines with objectfile inputs.

f13f90b

This enables commandlines with a combination of source and objectfile inputs to be supported.

[llvm] Raise an error if only object file inputs.

ddad31b

Tiny code style fix.

5f2e3df

Cross-reference issue in TODO comment.

bc88088

ChrisCummins force-pushed the feature/make-env-from-clang branch from eb8754f to 9318df7 Compare May 12, 2022 00:32

Fix cmake dependency name.

8573d81

ChrisCummins force-pushed the feature/make-env-from-clang branch from 9318df7 to 8573d81 Compare May 12, 2022 16:42

[llvm] Improve documentation.

72ef9b1

ChrisCummins merged commit 38f2076 into facebookresearch:development May 20, 2022

CompilerGym roadmap automation moved this from In progress to Done (not yet shipped) May 20, 2022

ChrisCummins deleted the feature/make-env-from-clang branch May 20, 2022 01:31

ChrisCummins mentioned this pull request May 24, 2022

CompilerGym v0.2.4 #686

Merged

ChrisCummins mentioned this pull request May 25, 2022

CompilerGym v0.2.4 #689

Merged
