
Investigate TVM playing nice with Docker #163

Closed
ninehusky opened this issue Jan 24, 2022 · 2 comments · Fixed by #164
Labels
continuous integration Issues of Continuous Integration (aka Github Actions)

Comments

@ninehusky
Collaborator

ninehusky commented Jan 24, 2022

docker build --tag glenside .
docker run glenside cargo test --no-default-features --features tvm

On a clean copy of the repository, running the commands above works on the first iteration.

However, subsequent runs of the test suite produce the following output:

failures:

---- codegen::tests::relay_op_softmax stdout ----
thread 'codegen::tests::relay_op_softmax' panicked at 'Running Relay code failed with code Some(1).
stdout:

stderr:
Traceback (most recent call last):
  File "/root/glenside/src/language/from_relay/run_relay.py", line 42, in <module>
    output = relay.create_executor(mod=expr, kind="graph").evaluate()(*inputs)
  File "/root/tvm/python/tvm/relay/backend/interpreter.py", line 172, in evaluate
    return self._make_executor()
  File "/root/tvm/python/tvm/relay/build_module.py", line 395, in _make_executor
    mod = build(self.mod, target=self.target)
  File "/root/tvm/python/tvm/relay/build_module.py", line 277, in build
    tophub_context = autotvm.tophub.context(list(target.values()))
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 116, in context
    if not check_backend(tophub_location, name):
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 158, in check_backend
    download_package(tophub_location, package_name)
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 184, in download_package
    os.mkdir(path)
FileExistsError: [Errno 17] File exists: '/root/.tvm'
', src/codegen.rs:1836:9

---- codegen::tests::relay_op_relu stdout ----
thread 'codegen::tests::relay_op_relu' panicked at 'Running Relay code failed with code Some(1).
stdout:

stderr:
Traceback (most recent call last):
  File "/root/glenside/src/language/from_relay/run_relay.py", line 42, in <module>
    output = relay.create_executor(mod=expr, kind="graph").evaluate()(*inputs)
  File "/root/tvm/python/tvm/relay/backend/interpreter.py", line 172, in evaluate
    return self._make_executor()
  File "/root/tvm/python/tvm/relay/build_module.py", line 395, in _make_executor
    mod = build(self.mod, target=self.target)
  File "/root/tvm/python/tvm/relay/build_module.py", line 277, in build
    tophub_context = autotvm.tophub.context(list(target.values()))
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 116, in context
    if not check_backend(tophub_location, name):
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 158, in check_backend
    download_package(tophub_location, package_name)
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 184, in download_package
    os.mkdir(path)
FileExistsError: [Errno 17] File exists: '/root/.tvm/tophub'
', src/codegen.rs:1836:9

---- codegen::tests::relay_op_batchflatten stdout ----
thread 'codegen::tests::relay_op_batchflatten' panicked at 'Running Relay code failed with code Some(1).
stdout:

stderr:
Traceback (most recent call last):
  File "/root/glenside/src/language/from_relay/run_relay.py", line 42, in <module>
    output = relay.create_executor(mod=expr, kind="graph").evaluate()(*inputs)
  File "/root/tvm/python/tvm/relay/backend/interpreter.py", line 172, in evaluate
    return self._make_executor()
  File "/root/tvm/python/tvm/relay/build_module.py", line 395, in _make_executor
    mod = build(self.mod, target=self.target)
  File "/root/tvm/python/tvm/relay/build_module.py", line 277, in build
    tophub_context = autotvm.tophub.context(list(target.values()))
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 116, in context
    if not check_backend(tophub_location, name):
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 158, in check_backend
    download_package(tophub_location, package_name)
  File "/root/tvm/python/tvm/autotvm/tophub.py", line 184, in download_package
    os.mkdir(path)
FileExistsError: [Errno 17] File exists: '/root/.tvm/tophub'
', src/codegen.rs:1836:9


failures:
    codegen::tests::relay_op_batchflatten
    codegen::tests::relay_op_relu
    codegen::tests::relay_op_softmax

test result: FAILED. 300 passed; 3 failed; 8 ignored; 0 measured; 0 filtered out; finished in 33.86s

Sometimes clearing the Docker cache and rebuilding the image fixes this, but not reliably; it's unclear why.

We should look into this!

@ninehusky ninehusky added the continuous integration Issues of Continuous Integration (aka Github Actions) label Jan 24, 2022
@gussmith23
Owner

@ninehusky can you see what happens when you run the tests on a single thread (cargo test -- --test-threads=1)? See: https://doc.rust-lang.org/book/ch11-02-running-tests.html#running-tests-in-parallel-or-consecutively

I suspect what is happening is this:
cargo test runs tests in parallel, so multiple tests that use TVM start at roughly the same time. The first use of TVM performs some one-time initialization that creates the /root/.tvm/tophub directory. When several tests trigger that initialization concurrently, they race to create the directory, and every thread but the winner fails with FileExistsError.
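The check-then-create race described above can be reproduced in a few lines of Python. This is a sketch with placeholder directory names, not TVM's actual code: the "racy" worker mimics the os.path.exists + os.mkdir pattern from the traceback, while the "safe" worker uses os.makedirs(..., exist_ok=True), the usual race-free fix for this pattern (I haven't verified that's exactly what the upstream TVM fix does).

```python
import os
import tempfile
import threading

def racy_init(path, barrier, errors):
    # Check-then-create is not atomic: all threads can see the
    # directory as missing before any of them has created it.
    exists = os.path.exists(path)
    barrier.wait()  # force every thread to act on the same stale check
    if not exists:
        try:
            os.mkdir(path)
        except FileExistsError as e:
            errors.append(e)

def safe_init(path, barrier, errors):
    # Race-free: creating an already-existing directory is a no-op.
    barrier.wait()
    try:
        os.makedirs(path, exist_ok=True)
    except FileExistsError as e:
        errors.append(e)

def run(worker, path, n=8):
    barrier = threading.Barrier(n)
    errors = []
    threads = [threading.Thread(target=worker, args=(path, barrier, errors))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors

tmp = tempfile.mkdtemp()
racy_errors = run(racy_init, os.path.join(tmp, "tophub-racy"))
safe_errors = run(safe_init, os.path.join(tmp, "tophub-safe"))
print(len(racy_errors), len(safe_errors))  # 7 of 8 racy threads fail; 0 safe failures
```

The barrier makes the race deterministic for the demo: all eight threads observe "directory missing" before any of them calls os.mkdir, so exactly one succeeds and the rest raise FileExistsError, just like the failing tests above.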

If that's the case, we'll probably need to find a way to trigger that setup before running the tests.

@gussmith23
Owner

Oh, lol, this has already been fixed:
apache/tvm@bf20107

I was looking at the tophub.py file that raises the error, and it seemed like this failure had already been anticipated and handled, so I checked the git blame and found the commit above, in which someone fixed the issue.

So fixing this issue should just be a matter of updating TVM. It may be an easy fix; I'll give it a go right now.
