CircleCI #267

Merged: 17 commits, Mar 17, 2021

@erikrose (Contributor) commented Jan 6, 2021

Get CircleCI builds going, now that Travis CI free service is going away.

`ImportError: libtinfo.so.5: cannot open shared object file: No such file or directory`
… if this helps.

`Error: The geckodriver executable could not be found on the current PATH.`
Maybe it needs Java around or something that I'm just taking for granted on my box. Also go back to the -browser image because that's in the example on the browser-tools page.
It won't run without a version number.
It was wrong in the docs.
@erikrose (Contributor, Author) commented:

JS tests fail:

61 passing (504ms)
  3 failing

  1) isVisible
       should return false when an element is hidden:
     Uncaught Error: Server terminated early with status 1
      at /home/circleci/project/fathom/node_modules/selenium-webdriver/remote/index.js:251:52
      at processTicksAndRejections (node:internal/process/task_queues:93:5)

  2) isVisible
       should return true when an element is visible:
     Uncaught Error: Server terminated early with status 1
      at /home/circleci/project/fathom/node_modules/selenium-webdriver/remote/index.js:251:52
      at processTicksAndRejections (node:internal/process/task_queues:93:5)

  3) isVisible
       "after all" hook for "should return true when an element is visible":
     Uncaught Error: Server terminated early with status 1
      at /home/circleci/project/fathom/node_modules/selenium-webdriver/remote/index.js:251:52
      at processTicksAndRejections (node:internal/process/task_queues:93:5)

It's not http_server.js. That runs and serves just fine.
It fails in isVisible.js at the .build() call, with "Server terminated early". You don't get that far if you don't make sure geckodriver is on the PATH.
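
The PATH precondition mentioned above can be checked up front, before the test run starts. This is only a minimal sketch, not something from this PR: `missing_tools` is a hypothetical helper, and the default tool names are an assumption about what the browser tests need. It does not diagnose the "Server terminated early" failure itself, which happens even when geckodriver is found.

```python
import shutil


def missing_tools(required=("geckodriver", "firefox")):
    """Return the names in `required` that cannot be found on PATH.

    The default tool names are an assumption for illustration; adjust
    them to whatever the CI image is expected to provide.
    """
    # shutil.which() returns the executable's full path, or None if absent.
    return [name for name in required if shutil.which(name) is None]
```

Calling `missing_tools()` in a CI step and failing when the returned list is non-empty would surface a missing geckodriver with a clearer message than the Selenium error.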

@erikrose (Contributor, Author) commented:

Python tests fail:

fathom_web/test/test_extract.py .......                                  [ 21%]
fathom_web/test/test_label.py .......                                    [ 43%]
fathom_web/test/test_list.py .....                                       [ 59%]
fathom_web/test/test_pick.py ..                                          [ 65%]
fathom_web/test/test_test.py ........                                    [ 90%]
fathom_web/test/test_train.py .Fatal Python error: Aborted

Thread 0x00007f860f91f700 (most recent call first):
  File "/home/circleci/.pyenv/versions/3.7.9/lib/python3.7/threading.py", line 300 in wait
  File "/home/circleci/.pyenv/versions/3.7.9/lib/python3.7/queue.py", line 179 in get
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/tensorboardX/event_file_writer.py", line 200 in run
  File "/home/circleci/.pyenv/versions/3.7.9/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/home/circleci/.pyenv/versions/3.7.9/lib/python3.7/threading.py", line 890 in _bootstrap

Current thread 0x00007f86a179d080 (most recent call first):
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132 in backward
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/torch/tensor.py", line 221 in backward
  File "/home/circleci/project/cli/fathom_web/commands/train.py", line 52 in learn
  File "/home/circleci/project/cli/fathom_web/commands/train.py", line 266 in train
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/click/core.py", line 610 in invoke
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/click/core.py", line 1066 in invoke
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/click/core.py", line 782 in main
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/click/testing.py", line 329 in invoke
  File "/home/circleci/project/cli/fathom_web/test/test_train.py", line 30 in test_auto_vectorization_smoke
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/python.py", line 170 in pytest_pyfunc_call
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/python.py", line 1423 in runtest
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/runner.py", line 117 in pytest_runtest_call
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/runner.py", line 192 in <lambda>
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/runner.py", line 220 in from_call
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/runner.py", line 192 in call_runtest_hook
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/runner.py", line 167 in call_and_report
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/runner.py", line 87 in runtestprotocol
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/runner.py", line 72 in pytest_runtest_protocol
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/main.py", line 256 in pytest_runtestloop
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/main.py", line 235 in _main
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/main.py", line 191 in wrap_session
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/main.py", line 228 in pytest_cmdline_main
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/cli/venv/lib/python3.7/site-packages/_pytest/config/__init__.py", line 78 in main
  File "./venv/bin/pytest", line 8 in <module>
Aborted (core dumped)

Pinning all the Python deps just like they are on my box (where tests pass) didn't help. I wonder if the vectorizer is spitting out nothing and making the autograd stuff divide by 0 or something. No, both training vectorizations yield the right contents! Args to learn() are identical.

The crash (whose C frames you can see under ssh but sometimes not on the web) is hipErrorNoDevice, thrown indirectly by libtorch_cpu.so, which would seem to be the appropriate lib to call. (HIP is AMD's GPU programming interface; code written against it can be built for either NVIDIA or AMD GPUs.) More directly, it's thrown by https://github.com/pytorch/pytorch/blob/22a34bcf4e5eaa348f0117c414c3dd760ec64b13/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h#L98.

@erikrose (Contributor, Author) commented Mar 4, 2021

Glenda figured out that it's pytorch/pytorch#52571 that's causing the Python crashes. Just change https://download.pytorch.org/whl/torch_stable.html to https://download.pytorch.org/whl/cu110/torch_stable.html.
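
One way to notice this kind of mix-up sooner is to look at the local version label of the installed torch package. The following is a rough sketch, not part of this PR: `build_variant` is a hypothetical helper, and it assumes the PyTorch wheels tag their version strings with a suffix such as `+rocm…`, `+cu…`, or `+cpu`. The version strings in the usage note are illustrative inputs, not values taken from these CI logs.

```python
def build_variant(version):
    """Classify a PyTorch-style version string by its local version label.

    Assumes the wheel's version carries a "+<label>" suffix naming the
    build flavor; anything else is reported as "unknown".
    """
    # Everything after the first "+" is the local version label.
    _, _, local = version.partition("+")
    if local == "cpu":
        return "cpu"
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cu"):
        return "cuda"
    return "unknown"
```

For example, `build_variant("1.8.0+rocm4.0.1")` gives `"rocm"` and `build_variant("1.7.1+cu110")` gives `"cuda"`. In a real check, the argument would be `torch.__version__`.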

We were getting the ROCm versions, as per pytorch/pytorch#52571.

Remove the repeat of the `dependency_links` info in the Makefile.
We can fix it while enjoying that the rest of CI is passing.
I set the COVERALLS_REPO_TOKEN in the server-side CircleCI config. I also updated the package for good measure, but that didn't improve things on its own.
@erikrose force-pushed the circleci branch 3 times, most recently from e740e40 to d12a7dd, March 17, 2021 00:08
@erikrose closed this in 0f02d75, Mar 17, 2021
@erikrose merged commit 0f02d75 into mozilla:master, Mar 17, 2021
@erikrose (Contributor, Author) commented:

@gleonard-m and @motin deserve the credit for getting to the bottom of some very obscure failures that got this finally running. Geckodriver/Firefox version interactions, GPU chipset versions, pip behavior changes—yikes. Bravo to you both!
