Is running CompilerGym intended to leave cache directories behind? #582
Comments
Hi @vuoristo, Thanks for the report, good question!
This sounds like something is wrong. There are three locations where CompilerGym creates files at runtime. The first is
Then there is an on-disk cache, located at
Finally, there is a "transient cache". On your linux machine this will default to
My guess is that in your case, environments are not shutting down correctly, leaving behind files in the transient cache. Can you confirm which directories are filling up? BTW, the locations of these directories can be overridden by environment variables. See source. Cheers,
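To confirm which directory is filling up, a small standard-library helper can count the entries under a cache root. This is only a sketch: `count_entries` is a hypothetical helper, not part of CompilerGym, and the grouping rule (drop the last two dash-separated fields of each name) is an assumption chosen to fit names of the form `benchmark-scratch-xxxx-xxxx`.

```python
from collections import Counter
from pathlib import Path


def count_entries(root):
    """Count the immediate children of `root`, grouped by name prefix.

    Hypothetical helper, not part of CompilerGym. The prefix is the entry
    name with its last two dash-separated fields dropped, so that
    "benchmark-scratch-4b26-fc6d" is grouped under "benchmark-scratch".
    Names with fewer than three fields are counted under their full name.
    """
    root = Path(root)
    counts = Counter()
    if not root.is_dir():
        # Nothing to count if the directory does not exist.
        return counts
    for child in root.iterdir():
        parts = child.name.split("-")
        prefix = "-".join(parts[:-2]) if len(parts) > 2 else child.name
        counts[prefix] += 1
    return counts
```

For example, `count_entries(os.environ["COMPILER_GYM_CACHE"])` (after `import os`) would summarize the directory that the `COMPILER_GYM_CACHE` environment variable points to, if it is set.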
Hi Chris, thanks for the quick response! I'm seeing many lingering directories specifically in the
$ find $COMPILER_GYM_CACHE -print
/home/risrio/cgym_test_cache2
/home/risrio/cgym_test_cache2/benchmark-scratch-4b26-fc6d
/home/risrio/cgym_test_cache2/benchmark-scratch-f112-9fd6
/home/risrio/cgym_test_cache2/benchmark-scratch-795a-911f
/home/risrio/cgym_test_cache2/benchmark-scratch-3046-29da
/home/risrio/cgym_test_cache2/benchmark-scratch-4d02-023a
/home/risrio/cgym_test_cache2/benchmark-scratch-7f82-da00
/home/risrio/cgym_test_cache2/benchmark-scratch-39e7-584e
/home/risrio/cgym_test_cache2/benchmark-scratch-b305-9218
/home/risrio/cgym_test_cache2/benchmark-scratch-9347-2109
/home/risrio/cgym_test_cache2/benchmark-scratch-6136-80fc
/home/risrio/cgym_test_cache2/benchmark-scratch-48c0-c0b4
/home/risrio/cgym_test_cache2/benchmark-scratch-825f-54f3
/home/risrio/cgym_test_cache2/benchmark-scratch-062c-0933
/home/risrio/cgym_test_cache2/benchmark-scratch-81d0-9408
...
The number of items in As far as I can tell my script exited cleanly.
Thanks for the extra details.
Yep, your code looks good. The problem is in CompilerGym, specifically, here. Either that code is not being called, or it is failing. Given that I can't repro the problem locally, it might take a bit more work to track down the cause. One thing that would be useful to me is to know if your environments are crashing. If an environment crashes, the
obs, reward, done, info = env.step(...)
if done:
    print("Error details:", info.get("error_details"))
    env.reset()
Cheers,
Right, some problem with I ran
env = make_env(env_config)
env.reset()
for i in range(100):
    for j in range(45):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            print("Error details:", info.get("error_details"))
            env.reset()
With
Thanks so much for following up @vuoristo. One last thing, could you please prepend the following to your script above and attach the full output that running it generates?
import logging
import os
os.environ["COMPILER_GYM_DEBUG"] = "4"
logging.basicConfig(level=logging.DEBUG)
Cheers,
Of course. Here's the output for otherwise the same script, except it's running for only 40 steps total and the timelimit is set to 10. The same problem persists.
Thanks! I see that you're using different benchmarks, what does your Here's my repro script:
import logging
import os
import compiler_gym
os.environ["COMPILER_GYM_DEBUG"] = "4"
logging.basicConfig(level=logging.DEBUG)
env = compiler_gym.make("llvm-v0")
env.reset()
for i in range(10):
    for j in range(10):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            print("Error details:", info.get("error_details"))
            env.reset()
Which produces:
Cheers,
Here's the whole script
import compiler_gym
import compiler_gym.wrappers
import logging
import os
os.environ["COMPILER_GYM_DEBUG"] = "4"
logging.basicConfig(level=logging.DEBUG)
def make_env(env_config):
    env = compiler_gym.make(env_config['cgym_id'])
    env = compiler_gym.wrappers.TimeLimit(env, env_config['timelimit'])
    dataset = env.datasets[env_config['dataset']]
    env = compiler_gym.wrappers.CycleOverBenchmarks(
        env, dataset.benchmarks())
    return env
env_config = {
    "cgym_id": "llvm-autophase-ic-v0",
    "timelimit": 10,
    "dataset": "benchmark://cbench-v1",
}
env = make_env(env_config)
env.reset()
for j in range(40):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        print("Error details:", info.get("error_details"))
        env.reset()
I also ran your script and on a quick inspection it looks to be producing similar results. Hmm, it also looks like your script hangs on my machine after the last line of output. My script doesn't.
This is because you are not closing
env = make_env(env_config)
try:
    ...  # use env...
finally:
    env.close()
or you can use a
import compiler_gym
import compiler_gym.wrappers
import logging
import os
os.environ["COMPILER_GYM_DEBUG"] = "4"
logging.basicConfig(level=logging.DEBUG)
def make_env(env_config):
    env = compiler_gym.make(env_config['cgym_id'])
    env = compiler_gym.wrappers.TimeLimit(env, env_config['timelimit'])
    dataset = env.datasets[env_config['dataset']]
    env = compiler_gym.wrappers.CycleOverBenchmarks(
        env, dataset.benchmarks())
    return env
env_config = {
    "cgym_id": "llvm-autophase-ic-v0",
    "timelimit": 10,
    "dataset": "benchmark://cbench-v1",
}
with make_env(env_config) as env:
    env.reset()
    for j in range(40):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            print("Error details:", info.get("error_details"))
            env.reset()
Cheers,
Ah gotcha, I missed that. Running your latest script on my machine still leaves cache directories behind though. So calling |
I think I have tracked down the problem! 🙂 The LLVM backend has a singleton BenchmarkFactory that is an in-memory cache of parsed bitcodes. It is this the
All that is to say, you can safely periodically delete those cache directories. I will work on a patch to fix this, though in the long term there is a larger refactor that I could do to the runtime to make sure that these kinds of resource leaks don't happen in other environments. Cheers,
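Until a fix is available, the periodic cleanup suggested above could be scripted along these lines. This is a minimal sketch, not part of CompilerGym: `remove_stale_scratch_dirs` is a hypothetical helper, the `benchmark-scratch-` prefix is taken from the directory listing earlier in this thread, and the age threshold is an assumption added so that directories belonging to environments that are still running are less likely to be removed.

```python
import shutil
import time
from pathlib import Path


def remove_stale_scratch_dirs(cache_root, min_age_seconds=3600.0,
                              prefix="benchmark-scratch-"):
    """Delete leftover scratch directories directly under `cache_root`.

    Hypothetical helper, not part of CompilerGym. A directory is removed
    only if its name starts with `prefix` and its modification time is at
    least `min_age_seconds` old (an assumed safety margin). Returns the
    list of paths that were actually removed.
    """
    cache_root = Path(cache_root)
    removed = []
    if not cache_root.is_dir():
        return removed
    cutoff = time.time() - min_age_seconds
    for child in sorted(cache_root.iterdir()):
        if not (child.is_dir() and child.name.startswith(prefix)):
            continue  # leave unrelated files and directories alone
        if child.stat().st_mtime > cutoff:
            continue  # recently modified, may still be in use
        shutil.rmtree(child, ignore_errors=True)
        if not child.exists():
            removed.append(child)
    return removed
```

The function takes the cache root as an argument rather than reading an environment variable itself, so it can be pointed at whichever directory is filling up.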
This changes the signature of createAndRunCompilerGymService() from [[noreturn]] void to [[nodiscard]] int. This removes the calls to exit() from inside the body of the function, allowing runtime services to insert cleanup code after the service has shut down. Issue facebookresearch#582.
This adds an explicit call to BenchmarkFactory::close() on the global singleton, since otherwise it may not be called. Fixes facebookresearch#582. Issue facebookresearch#591.
I've written a patch for this in #592. If you're building from source, you can try that branch out. Otherwise it'll be merged in the 0.2.3 release. Cheers, |
That makes sense. Do you also know why this seems to differ between the systems? Thanks for the help! Feel free to close this issue. |
That's a good question. To be honest, I don't know. In the logs you posted, it looks like the shutdown routine is running on your system, which leads me to believe it's likely a race condition. This would also explain the randomness. Cheers, |
Thanks for reporting the issue @vuoristo, it was a fun bug to hunt 🙂 Cheers, |
❓ Questions and Help
Not sure if this is a bug or not, so submitting as a question. Running a CompilerGym experiment leaves behind many cache directories. When running a large experiment, this can create problems through the sheer number of directories in COMPILER_GYM_CACHE. I expected the COMPILER_GYM_CACHE to not have anything after the experiment exited cleanly.
Is there a way to avoid the experiments leaving the directories behind?
Steps to reproduce
Running the following on my machine leaves behind about 270 cache directories.