Support python3.10 #1137

Merged
merged 19 commits into from
Aug 7, 2022

Conversation

kousu
Contributor

@kousu kousu commented May 30, 2022

Checklist

GitHub

  • I've given this PR a concise, self-descriptive, and meaningful title
  • I've linked relevant issues in the PR body
  • I've applied the relevant labels to this PR
  • I've assigned a reviewer

PR contents

Description

Linked issues

@kousu added the enhancement, dependencies, installation, and priority:medium labels May 30, 2022
@kousu mentioned this pull request May 30, 2022
* bumps torch to 1.10
* bumps torchvision to latest
* uses onnxruntime 1.12 on py 3.10 (currently in prerelease:
  microsoft/onnxruntime#9782)
  which isn't ideal, so probably this isn't mergeable until
  https://github.com/microsoft/onnxruntime/projects/9 is done.
kousu added 2 commits June 6, 2022 14:57
torch doesn't support python 3.10 in older versions.
@kousu marked this pull request as ready for review June 6, 2022 18:58
@coveralls

coveralls commented Jun 19, 2022

Pull Request Test Coverage Report for Build 2813198870

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+9.8%) to 78.991%

Totals Coverage Status
Change from base Build 2813174901: 9.8%
Covered Lines: 4839
Relevant Lines: 6126

💛 - Coveralls

@konantian
Collaborator

I removed the specific version requirement for ort-nightly for now and all checks passed after this change. Once onnxruntime 1.12 is released in mid-July, we just need to replace ort-nightly with onnxruntime==1.12.0 and python3.10 support should be good at that time.
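
The swap described above could also be expressed as a per-interpreter pin. A minimal sketch, assuming the pin only needs to differ for Python 3.10 and later; the helper name `onnxruntime_requirement` and the fallback pin `1.7.0` (the version that shows up in the Python 3.9 environment later in this thread) are illustrative assumptions, not the project's actual requirements logic:

```python
import sys


def onnxruntime_requirement(version_info=sys.version_info):
    """Return a pip requirement string for onnxruntime (hypothetical helper)."""
    # onnxruntime 1.12.0 is the release this thread waits on for Python 3.10 support.
    if tuple(version_info[:2]) >= (3, 10):
        return "onnxruntime==1.12.0"
    # Assumption: older interpreters keep the previously installed pin.
    return "onnxruntime==1.7.0"


print(onnxruntime_requirement())
```

In a requirements file the same idea is usually written with a PEP 508 environment marker, for example `onnxruntime==1.12.0; python_version >= "3.10"`.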

@kousu
Contributor Author

kousu commented Jun 20, 2022

> I removed the specific version requirement for ort-nightly for now and all checks passed after this change. Once onnxruntime 1.12 is released in mid-July, we just need to replace ort-nightly with onnxruntime==1.12.0 and python3.10 support should be good at that time.

🍳 eggggggs-cellent. That's good news. Thanks for doing the leg-work @konantian :)

Now we just have to wait.

@dyt811 added the "on hold" label Jun 28, 2022
@dyt811 marked this pull request as draft June 28, 2022 17:45
@kousu removed the "on hold" label Jul 23, 2022
@kousu marked this pull request as ready for review July 23, 2022 23:21
This is the first onnxruntime to support Python 3.10.
@kousu
Contributor Author

kousu commented Jul 26, 2022

I've updated this given that the latest onnxruntime is out with py310 support. This is ready for review! 🤞

@shuaisong9

I installed ivadomed with conda (python=3.10) on NeuroPoly's romane, and 4 tests failed.
python --version: Python 3.10.5
Log for conda list: conda_list.txt
Log for pip list: pip_list.txt

The short test summary info is:

=========== short test summary info ===========
FAILED testing/functional_tests/test_automate_training.py::test_automate_training[subprocess] - assert False
FAILED testing/functional_tests/test_automate_training.py::test_automate_training_run_test_debug - RuntimeError: CUDA error: invalid device ordinal
FAILED testing/functional_tests/test_automate_training.py::test_automate_training_run_test[subprocess] - assert False
FAILED testing/unit_tests/test_utils.py::test_get_linux_system_memory - assert 503.56872940063477 < 256
=========== 4 failed, 212 passed, 2 skipped, 1 warning in 151.92s (0:02:31) ===========

I then ran the individual tests that failed:

  • pytest -v -s testing/functional_tests/test_automate_training.py::test_automate_training_run_test[subprocess]
    Log here: fail1.txt
  • pytest -v -s testing/functional_tests/test_automate_training.py::test_automate_training_run_test_debug
    Log here: fail2.txt
  • pytest -v -s testing/functional_tests/test_automate_training.py::test_automate_training_run_test[subprocess]
    Log here: fail3.txt
  • pytest -v -s testing/unit_tests/test_utils.py::test_get_linux_system_memory
    Log here: fail4.txt
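
The last failure, `assert 503.56872940063477 < 256`, reads like a fixed upper bound on the machine's memory, which a large-memory server would presumably exceed whatever the Python version. A minimal sketch of a check without such a cap, assuming the value is total RAM in GiB taken from `/proc/meminfo`; `parse_mem_total_gib` is a hypothetical helper, not ivadomed's implementation:

```python
def parse_mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and convert kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16777216 kB"
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemTotal not found")


sample = "MemTotal:       16777216 kB\nMemFree:         1234567 kB\n"
total = parse_mem_total_gib(sample)
# Check only that the value is positive instead of capping it at a fixed size.
assert total > 0
print(total)
```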

@kousu
Contributor Author

kousu commented Jul 28, 2022

Thank you for the very nice bug report @shuaisong9! It's weird that it failed there but not on GitHub. I'll have to look into it.

Then again, I'm not surprised: it has something to do with the GPUs, which we don't test regularly (because GitHub doesn't offer them):

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
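
`invalid device ordinal` is what CUDA reports when the requested device index does not exist on the machine. A minimal sketch of a pre-check that fails with a clearer message, assuming the device count is obtained elsewhere (for example from `torch.cuda.device_count()`); `check_gpu_id` is a hypothetical helper, not part of ivadomed:

```python
def check_gpu_id(gpu_id, device_count):
    """Raise a descriptive error if gpu_id is not a valid CUDA device index."""
    # Valid ordinals are 0 .. device_count - 1.
    if not 0 <= gpu_id < device_count:
        raise ValueError(
            f"GPU id {gpu_id} is out of range: {device_count} device(s) visible"
        )
    return gpu_id


# With torch installed, device_count would come from torch.cuda.device_count().
print(check_gpu_id(0, device_count=4))
```

Running such a check before selecting the device would turn the asynchronous CUDA error into an immediate, readable one.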

@kousu
Contributor Author

kousu commented Jul 30, 2022

I tried the same test on Python 3.9 with master:

(ivadomed) p115628@bireli:~/src/ivadomed$ git branch
* master
(ivadomed) p115628@bireli:~/src/ivadomed$ git show-ref HEAD
f0511a384ea1ff584c8ca136557655c2b0230ca8 refs/remotes/origin/HEAD
versions
(ivadomed) p115628@bireli:~/src/ivadomed$ python --version
Python 3.9.13
(ivadomed) p115628@bireli:~/src/ivadomed$ conda list
# packages in environment at /home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.2.0                    pypi_0    pypi
alabaster                 0.7.12                   pypi_0    pypi
apeye                     1.2.0                    pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
attrs                     22.1.0                   pypi_0    pypi
autodocsumm               0.2.8                    pypi_0    pypi
babel                     2.10.3                   pypi_0    pypi
beautifulsoup4            4.11.1                   pypi_0    pypi
bids-validator            1.9.7                    pypi_0    pypi
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2022.6.15            ha878542_0    conda-forge
cachecontrol              0.12.11                  pypi_0    pypi
cachetools                5.2.0                    pypi_0    pypi
certifi                   2022.6.15                pypi_0    pypi
cfgv                      3.3.1                    pypi_0    pypi
charset-normalizer        2.1.0                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
consolekit                1.4.1                    pypi_0    pypi
coverage                  6.4.2                    pypi_0    pypi
coveralls                 3.3.1                    pypi_0    pypi
cssutils                  2.5.1                    pypi_0    pypi
csv-diff                  1.1                      pypi_0    pypi
cycler                    0.11.0                   pypi_0    pypi
decopatch                 1.4.10                   pypi_0    pypi
deprecated                1.2.13                   pypi_0    pypi
deprecation               2.1.0                    pypi_0    pypi
deprecation-alias         0.3.1                    pypi_0    pypi
dict2css                  0.3.0                    pypi_0    pypi
dictdiffer                0.9.0                    pypi_0    pypi
distlib                   0.3.5                    pypi_0    pypi
docker-pycreds            0.4.0                    pypi_0    pypi
docopt                    0.6.2                    pypi_0    pypi
docutils                  0.16                     pypi_0    pypi
domdf-python-tools        3.3.0                    pypi_0    pypi
filelock                  3.7.1                    pypi_0    pypi
flake8                    4.0.1                    pypi_0    pypi
fonttools                 4.34.4                   pypi_0    pypi
formulaic                 0.3.4                    pypi_0    pypi
gitdb                     4.0.9                    pypi_0    pypi
gitpython                 3.1.27                   pypi_0    pypi
google-auth               2.9.1                    pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
grpcio                    1.48.0                   pypi_0    pypi
html5lib                  1.1                      pypi_0    pypi
humanize                  4.2.3                    pypi_0    pypi
identify                  2.5.2                    pypi_0    pypi
idna                      3.3                      pypi_0    pypi
imageio                   2.20.0                   pypi_0    pypi
imagesize                 1.4.1                    pypi_0    pypi
importlib-metadata        4.12.0                   pypi_0    pypi
iniconfig                 1.1.1                    pypi_0    pypi
interface-meta            1.3.0                    pypi_0    pypi
ivadomed                  2.9.6                     dev_0    <develop>
jinja2                    3.1.2                    pypi_0    pypi
joblib                    1.1.0                    pypi_0    pypi
jsonpointer               2.3                      pypi_0    pypi
kiwisolver                1.4.4                    pypi_0    pypi
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libzlib                   1.2.12               h166bdaf_2    conda-forge
lockfile                  0.12.2                   pypi_0    pypi
loguru                    0.6.0                    pypi_0    pypi
makefun                   1.14.0                   pypi_0    pypi
markdown                  3.4.1                    pypi_0    pypi
markupsafe                2.1.1                    pypi_0    pypi
matplotlib                3.5.2                    pypi_0    pypi
mccabe                    0.6.1                    pypi_0    pypi
mistletoe                 0.8.2                    pypi_0    pypi
msgpack                   1.0.4                    pypi_0    pypi
natsort                   8.1.0                    pypi_0    pypi
ncurses                   6.3                  h27087fc_1    conda-forge
networkx                  2.8.5                    pypi_0    pypi
nibabel                   3.2.2                    pypi_0    pypi
nodeenv                   1.7.0                    pypi_0    pypi
num2words                 0.5.10                   pypi_0    pypi
numpy                     1.23.1                   pypi_0    pypi
oauthlib                  3.2.0                    pypi_0    pypi
onnxruntime               1.7.0                    pypi_0    pypi
openssl                   3.0.5                h166bdaf_0    conda-forge
packaging                 21.3                     pypi_0    pypi
pandas                    1.4.3                    pypi_0    pypi
pathtools                 0.1.2                    pypi_0    pypi
pillow                    9.2.0                    pypi_0    pypi
pip                       22.2.1             pyhd8ed1ab_0    conda-forge
platformdirs              2.5.2                    pypi_0    pypi
pluggy                    1.0.0                    pypi_0    pypi
pre-commit                2.20.0                   pypi_0    pypi
promise                   2.3                      pypi_0    pypi
protobuf                  3.19.4                   pypi_0    pypi
psutil                    5.9.1                    pypi_0    pypi
py                        1.11.0                   pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pybids                    0.15.2                   pypi_0    pypi
pycodestyle               2.8.0                    pypi_0    pypi
pyflakes                  2.4.0                    pypi_0    pypi
pygments                  2.12.0                   pypi_0    pypi
pypandoc                  1.8.1                    pypi_0    pypi
pyparsing                 2.4.7                    pypi_0    pypi
pytest                    6.2.5                    pypi_0    pypi
pytest-cases              3.6.13                   pypi_0    pypi
pytest-console-scripts    1.3.1                    pypi_0    pypi
pytest-cov                3.0.0                    pypi_0    pypi
pytest-ordering           0.6                      pypi_0    pypi
python                    3.9.13          h2660328_0_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python_abi                3.9                      2_cp39    conda-forge
pytz                      2022.1                   pypi_0    pypi
pywavelets                1.3.0                    pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
ruamel-yaml               0.17.21                  pypi_0    pypi
ruamel-yaml-clib          0.2.6                    pypi_0    pypi
scikit-image              0.19.3                   pypi_0    pypi
scikit-learn              1.1.1                    pypi_0    pypi
scipy                     1.9.0                    pypi_0    pypi
seaborn                   0.11.2                   pypi_0    pypi
sentry-sdk                1.9.0                    pypi_0    pypi
setproctitle              1.3.0                    pypi_0    pypi
setuptools                63.2.0           py39hf3d152e_0    conda-forge
shortuuid                 1.0.9                    pypi_0    pypi
simpleitk                 2.1.1.2                  pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
smmap                     5.0.0                    pypi_0    pypi
snowballstemmer           2.2.0                    pypi_0    pypi
soupsieve                 2.3.2.post1              pypi_0    pypi
sphinx                    4.3.2                    pypi_0    pypi
sphinx-autodoc-typehints  1.11.1                   pypi_0    pypi
sphinx-jsonschema         1.19.1                   pypi_0    pypi
sphinx-prompt             1.5.0                    pypi_0    pypi
sphinx-rtd-theme          1.0.0                    pypi_0    pypi
sphinx-tabs               3.2.0                    pypi_0    pypi
sphinx-toolbox            2.15.2                   pypi_0    pypi
sphinxcontrib-applehelp   1.0.2                    pypi_0    pypi
sphinxcontrib-devhelp     1.0.2                    pypi_0    pypi
sphinxcontrib-htmlhelp    2.0.0                    pypi_0    pypi
sphinxcontrib-jsmath      1.0.1                    pypi_0    pypi
sphinxcontrib-qthelp      1.0.3                    pypi_0    pypi
sphinxcontrib-serializinghtml 1.1.5                    pypi_0    pypi
sqlalchemy                1.3.24                   pypi_0    pypi
sqlite                    3.39.2               h4ff8645_0    conda-forge
tabulate                  0.8.10                   pypi_0    pypi
tensorboard               2.9.1                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tifffile                  2022.7.28                pypi_0    pypi
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2                   pypi_0    pypi
tomli                     2.0.1                    pypi_0    pypi
torch                     1.8.1                    pypi_0    pypi
torchio                   0.18.83                  pypi_0    pypi
torchvision               0.9.1                    pypi_0    pypi
tqdm                      4.64.0                   pypi_0    pypi
typing-extensions         4.3.0                    pypi_0    pypi
tzdata                    2022a                h191b570_0    conda-forge
urllib3                   1.26.11                  pypi_0    pypi
virtualenv                20.16.2                  pypi_0    pypi
wandb                     0.12.21                  pypi_0    pypi
webencodings              0.5.1                    pypi_0    pypi
werkzeug                  2.2.1                    pypi_0    pypi
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
wrapt                     1.14.1                   pypi_0    pypi
xz                        5.2.5                h516909a_1    conda-forge
zipp                      3.8.1                    pypi_0    pypi
zlib                      1.2.12               h166bdaf_2    conda-forge
(ivadomed) p115628@bireli:~/src/ivadomed$ pip list
Package                       Version     Editable project location
----------------------------- ----------- --------------------------------------------
absl-py                       1.2.0
alabaster                     0.7.12
apeye                         1.2.0
astor                         0.8.1
attrs                         22.1.0
autodocsumm                   0.2.8
Babel                         2.10.3
beautifulsoup4                4.11.1
bids-validator                1.9.7
CacheControl                  0.12.11
cachetools                    5.2.0
certifi                       2022.6.15
cfgv                          3.3.1
charset-normalizer            2.1.0
click                         8.1.3
consolekit                    1.4.1
coverage                      6.4.2
coveralls                     3.3.1
cssutils                      2.5.1
csv-diff                      1.1
cycler                        0.11.0
decopatch                     1.4.10
Deprecated                    1.2.13
deprecation                   2.1.0
deprecation-alias             0.3.1
dict2css                      0.3.0
dictdiffer                    0.9.0
distlib                       0.3.5
docker-pycreds                0.4.0
docopt                        0.6.2
docutils                      0.16
domdf-python-tools            3.3.0
filelock                      3.7.1
flake8                        4.0.1
fonttools                     4.34.4
formulaic                     0.3.4
gitdb                         4.0.9
GitPython                     3.1.27
google-auth                   2.9.1
google-auth-oauthlib          0.4.6
grpcio                        1.48.0
html5lib                      1.1
humanize                      4.2.3
identify                      2.5.2
idna                          3.3
imageio                       2.20.0
imagesize                     1.4.1
importlib-metadata            4.12.0
iniconfig                     1.1.1
interface-meta                1.3.0
ivadomed                      2.9.6       /home/GRAMES.POLYMTL.CA/p115628/src/ivadomed
Jinja2                        3.1.2
joblib                        1.1.0
jsonpointer                   2.3
kiwisolver                    1.4.4
lockfile                      0.12.2
loguru                        0.6.0
makefun                       1.14.0
Markdown                      3.4.1
MarkupSafe                    2.1.1
matplotlib                    3.5.2
mccabe                        0.6.1
mistletoe                     0.8.2
msgpack                       1.0.4
natsort                       8.1.0
networkx                      2.8.5
nibabel                       3.2.2
nodeenv                       1.7.0
num2words                     0.5.10
numpy                         1.23.1
oauthlib                      3.2.0
onnxruntime                   1.7.0
packaging                     21.3
pandas                        1.4.3
pathtools                     0.1.2
Pillow                        9.2.0
pip                           22.2.1
platformdirs                  2.5.2
pluggy                        1.0.0
pre-commit                    2.20.0
promise                       2.3
protobuf                      3.19.4
psutil                        5.9.1
py                            1.11.0
pyasn1                        0.4.8
pyasn1-modules                0.2.8
pybids                        0.15.2
pycodestyle                   2.8.0
pyflakes                      2.4.0
Pygments                      2.12.0
pypandoc                      1.8.1
pyparsing                     2.4.7
pytest                        6.2.5
pytest-cases                  3.6.13
pytest-console-scripts        1.3.1
pytest-cov                    3.0.0
pytest-ordering               0.6
python-dateutil               2.8.2
pytz                          2022.1
PyWavelets                    1.3.0
PyYAML                        6.0
requests                      2.28.1
requests-oauthlib             1.3.1
rsa                           4.9
ruamel.yaml                   0.17.21
ruamel.yaml.clib              0.2.6
scikit-image                  0.19.3
scikit-learn                  1.1.1
scipy                         1.9.0
seaborn                       0.11.2
sentry-sdk                    1.9.0
setproctitle                  1.3.0
setuptools                    63.2.0
shortuuid                     1.0.9
SimpleITK                     2.1.1.2
six                           1.16.0
smmap                         5.0.0
snowballstemmer               2.2.0
soupsieve                     2.3.2.post1
Sphinx                        4.3.2
sphinx-autodoc-typehints      1.11.1
sphinx-jsonschema             1.19.1
sphinx-prompt                 1.5.0
sphinx-rtd-theme              1.0.0
sphinx-tabs                   3.2.0
sphinx-toolbox                2.15.2
sphinxcontrib-applehelp       1.0.2
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        2.0.0
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.5
SQLAlchemy                    1.3.24
tabulate                      0.8.10
tensorboard                   2.9.1
tensorboard-data-server       0.6.1
tensorboard-plugin-wit        1.8.1
threadpoolctl                 3.1.0
tifffile                      2022.7.28
toml                          0.10.2
tomli                         2.0.1
torch                         1.8.1
torchio                       0.18.83
torchvision                   0.9.1
tqdm                          4.64.0
typing_extensions             4.3.0
urllib3                       1.26.11
virtualenv                    20.16.2
wandb                         0.12.21
webencodings                  0.5.1
Werkzeug                      2.2.1
wheel                         0.37.1
wrapt                         1.14.1
zipp                          3.8.1
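
For comparing environments like the two listings above, it may be enough to print only the packages this PR touches. A minimal sketch using the standard library's `importlib.metadata`; the package selection and the helper name `installed_versions` are assumptions for illustration:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["torch", "torchvision", "onnxruntime"]).items():
    print(f"{name}: {version or 'not installed'}")
```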

I got the exact same error:

pytest -v -s testing/functional_tests/test_automate_training.py::test_automate_training_run_test[subprocess]
(ivadomed) p115628@bireli:~/src/ivadomed$ pytest -v -s testing/functional_tests/test_automate_training.py::test_automate_training_run_test[subprocess]
========================================================================= test session starts =========================================================================
platform linux -- Python 3.9.13, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/bin/python3.9
cachedir: .pytest_cache
rootdir: /home/GRAMES.POLYMTL.CA/p115628/src/ivadomed, configfile: pytest.ini
plugins: cov-3.0.0, console-scripts-1.3.1, ordering-0.6, cases-3.6.13
collecting ... 2022-07-29 19:57:35.216 | INFO     | ivadomed.utils:init_ivadomed:454 - 
ivadomed (git-master-f0511a384ea1ff584c8ca136557655c2b0230ca8)

collected 1 item                                                                                                                                                      

testing/functional_tests/test_automate_training.py::test_automate_training_run_test[subprocess] 2022-07-29 19:57:35.219 | INFO     | testing.common_testing_util:download_dataset:54 - 
Downloading testing data... to data_functional_testing
2022-07-29 19:57:35.219 | INFO     | ivadomed.utils:init_ivadomed:454 - 
ivadomed (git-master-f0511a384ea1ff584c8ca136557655c2b0230ca8)

# Running console script: ivadomed_automate_training --config /home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/data_functional_testing/automate_training_config.json --config-hyper /home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/data_functional_testing/automate_training_hyperparameter_opt.json --path-data /home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/data_functional_testing --output_dir /home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/tmp/results --run-test
# Script return code: 1
# Script stdout:
2022-07-29 19:57:46.226 | ERROR    | ivadomed.scripts.automate_training:train_worker:100 - Got exception on main handler
Traceback (most recent call last):

  File "<string>", line 1, in <module>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               │     │   └ 4
               │     └ 14
               └ <function _main at 0x7ff421070310>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/spawn.py", line 129, in _main
    return self._bootstrap(parent_sentinel)
           │    │          └ 4
           │    └ <function BaseProcess._bootstrap at 0x7ff4211f2160>
           └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7ff4211f1790>
    └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
    │    │        │    └ (<multiprocessing.queues.SimpleQueue object at 0x7ff42108f070>, <multiprocessing.queues.SimpleQueue object at 0x7ff420eb89a0>...
    │    │        └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
    │    └ <function worker at 0x7ff420eafca0>
    └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    │     │       └ {}
                    │     └ ((functools.partial(<function train_worker at 0x7ff375d03af0>, thr_incr=None), ({'command': 'train', 'gpu_ids': [7], 'path_ou...
                    └ <function mapstar at 0x7ff420ed6310>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
                     └ (functools.partial(<function train_worker at 0x7ff375d03af0>, thr_incr=None), ({'command': 'train', 'gpu_ids': [7], 'path_out...

> File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/scripts/automate_training.py", line 97, in train_worker
    ivado.run_command(config, thr_increment=thr_incr)
    │     │           │                     └ None
    │     │           └ {'command': 'train', 'gpu_ids': [7], 'path_output': 'tmp/logs-batch_size-2', 'model_name': 'unit_test', 'debugging': False, '...
    │     └ <function run_command at 0x7ff375d035e0>
    └ <module 'ivadomed.main' from '/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/main.py'>

  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/main.py", line 375, in run_command
    cuda_available, device = imed_utils.define_device(context[ConfigKW.GPU_IDS][0])
                             │          │             │       │        └ 'gpu_ids'
                             │          │             │       └ <class 'ivadomed.keywords.ConfigKW'>
                             │          │             └ {'command': 'train', 'gpu_ids': [7], 'path_output': 'tmp/logs-batch_size-2', 'model_name': 'unit_test', 'debugging': False, '...
                             │          └ <function define_device at 0x7ff41cd5baf0>
                             └ <module 'ivadomed.utils' from '/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/utils.py'>

  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/utils.py", line 165, in define_device
    torch.cuda.set_device(gpu_id)
    │     │    │          └ 7
    │     │    └ <function set_device at 0x7ff41a9cdc10>
    │     └ <module 'torch.cuda' from '/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/cuda/__init...
    └ <module 'torch' from '/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/__init__.py'>

  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/cuda/__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
    │     │  │               └ 7
    │     │  └ <built-in function _cuda_setDevice>
    │     └ <module 'torch._C' from '/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/_C.cpython-39...
    └ <module 'torch' from '/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/__init__.py'>

RuntimeError: CUDA error: invalid device ordinal
2022-07-29 19:57:46.232 | INFO     | ivadomed.scripts.automate_training:train_worker:101 - Unexpected error:
2022-07-29 19:57:46.232 | INFO     | ivadomed.main:set_output_path:212 - Creating output path: tmp/logs-batch_size-4
2022-07-29 19:57:46.288 | ERROR    | ivadomed.scripts.automate_training:train_worker:100 - Got exception on main handler
Traceback (most recent call last):

  File "<string>", line 1, in <module>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               │     │   └ 4
               │     └ 14
               └ <function _main at 0x7ff421070310>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/spawn.py", line 129, in _main
    return self._bootstrap(parent_sentinel)
           │    │          └ 4
           │    └ <function BaseProcess._bootstrap at 0x7ff4211f2160>
           └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7ff4211f1790>
    └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
    │    │        │    └ (<multiprocessing.queues.SimpleQueue object at 0x7ff42108f070>, <multiprocessing.queues.SimpleQueue object at 0x7ff420eb89a0>...
    │    │        └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
    │    └ <function worker at 0x7ff420eafca0>
    └ <SpawnProcess name='SpawnPoolWorker-1' parent=229145 started daemon>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    │     │       └ {}
                    │     └ ((functools.partial(<function train_worker at 0x7ff375d03af0>, thr_incr=None), ({'command': 'train', 'gpu_ids': [7], 'path_ou...
                    └ <function mapstar at 0x7ff420ed6310>
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
                     └ (functools.partial(<function train_worker at 0x7ff375d03af0>, thr_incr=None), ({'command': 'train', 'gpu_ids': [7], 'path_out...

> File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/scripts/automate_training.py", line 97, in train_worker
    ivado.run_command(config, thr_increment=thr_incr)
    │     │           │                     └ None
    │     │           └ {'command': 'train', 'gpu_ids': [7], 'path_output': 'tmp/logs-batch_size-4', 'model_name': 'unit_test', 'debugging': False, '...
    │     └ <function run_command at 0x7ff375d035e0>
    └ <module 'ivadomed.main' from '/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/main.py'>

  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/main.py", line 375, in run_command
    cuda_available, device = imed_utils.define_device(context[ConfigKW.GPU_IDS][0])
                             │          │             │       │        └ 'gpu_ids'
                             │          │             │       └ <class 'ivadomed.keywords.ConfigKW'>
                             │          │             └ {'command': 'train', 'gpu_ids': [7], 'path_output': 'tmp/logs-batch_size-4', 'model_name': 'unit_test', 'debugging': False, '...
                             │          └ <function define_device at 0x7ff41cd5baf0>
                             └ <module 'ivadomed.utils' from '/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/utils.py'>

  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/utils.py", line 165, in define_device
    torch.cuda.set_device(gpu_id)
    │     │    │          └ 7
    │     │    └ <function set_device at 0x7ff41a9cdc10>
    │     └ <module 'torch.cuda' from '/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/cuda/__init...
    └ <module 'torch' from '/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/__init__.py'>

  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/cuda/__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
    │     │  │               └ 7
    │     │  └ <built-in function _cuda_setDevice>
    │     └ <module 'torch._C' from '/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/_C.cpython-39...
    └ <module 'torch' from '/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/__init__.py'>

RuntimeError: CUDA error: invalid device ordinal
2022-07-29 19:57:46.292 | INFO     | ivadomed.scripts.automate_training:train_worker:101 - Unexpected error:

# Script stderr:
2022-07-29 19:57:43.189 | WARNING  | ivadomed.scripts.visualize_and_compare_testing_models:<module>:26 - No backend can be used - Visualization will fail
2022-07-29 19:57:43.516 | INFO     | ivadomed.utils:init_ivadomed:454 - 
ivadomed (git-master-f0511a384ea1ff584c8ca136557655c2b0230ca8)

2022-07-29 19:57:43.519 | INFO     | ivadomed.scripts.automate_training:automate_training:695 - [7]
2022-07-29 19:57:45.822 | WARNING  | ivadomed.scripts.visualize_and_compare_testing_models:<module>:26 - No backend can be used - Visualization will fail
2022-07-29 19:57:46.146 | INFO     | ivadomed.main:set_output_path:212 - Creating output path: tmp/logs-batch_size-2
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/scripts/automate_training.py", line 97, in train_worker
    ivado.run_command(config, thr_increment=thr_incr)
  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/main.py", line 375, in run_command
    cuda_available, device = imed_utils.define_device(context[ConfigKW.GPU_IDS][0])
  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/utils.py", line 165, in define_device
    torch.cuda.set_device(gpu_id)
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/site-packages/torch/cuda/__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/bin/ivadomed_automate_training", line 33, in <module>
    sys.exit(load_entry_point('ivadomed', 'console_scripts', 'ivadomed_automate_training')())
  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/scripts/automate_training.py", line 801, in main
    automate_training(file_config=args.config,
  File "/home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/scripts/automate_training.py", line 715, in automate_training
    validation_scores = pool.map(partial(train_worker, thr_incr=thr_increment), config_list)
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/ivadomed/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
RuntimeError: CUDA error: invalid device ordinal

FAILED

============================================================================== FAILURES ===============================================================================
_____________________________________________________________ test_automate_training_run_test[subprocess] _____________________________________________________________

download_functional_test_files = None, script_runner = <ScriptRunner subprocess>

    @pytest.mark.script_launch_mode('subprocess')
    def test_automate_training_run_test(download_functional_test_files, script_runner):
        file_config = Path(__data_testing_dir__, 'automate_training_config.json')
        file_config_hyper = Path(__data_testing_dir__, 'automate_training_hyperparameter_opt.json')
        __output_dir__ = Path(__tmp_dir__, 'results')
    
        ret = script_runner.run('ivadomed_automate_training', '--config', f'{file_config}',
                                '--config-hyper', f'{file_config_hyper}',
                                '--path-data', f'{__data_testing_dir__}',
                                '--output_dir', f'{__output_dir__}',
                                '--run-test')
        logger.debug(f"{ret.stdout}")
        logger.debug(f"{ret.stderr}")
>       assert ret.success
E       assert False
E        +  where False = <pytest_console_scripts.RunResult object at 0x7f0d97dd40d0>.success

testing/functional_tests/test_automate_training.py:83: AssertionError
========================================================================== warnings summary ===========================================================================
ivadomed/utils.py:528
  /home/GRAMES.POLYMTL.CA/p115628/src/ivadomed/ivadomed/utils.py:528: DeprecationWarning: invalid escape sequence \s
    sep = re.compile('[\s]+')

-- Docs: https://docs.pytest.org/en/stable/warnings.html
======================================================================= short test summary info =======================================================================
FAILED testing/functional_tests/test_automate_training.py::test_automate_training_run_test[subprocess] - assert False
==================================================================== 1 failed, 1 warning in 13.95s ====================================================================

So I think this is a pre-existing bug, it's only a problem when working with GPUs, and it should not hold up merging this PR.

EDIT: I've fixed them: #1189
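
The traceback above shows `torch.cuda.set_device(7)` raising "invalid device ordinal" for a config with `gpu_ids: [7]`, which is what CUDA reports when the ordinal is outside the range of visible devices. Below is a minimal sketch of a guard against that. `resolve_gpu_id` is a hypothetical helper, not part of ivadomed and not necessarily what #1189 does; the device count is passed in as a parameter so the logic runs without torch installed.

```python
def resolve_gpu_id(requested_id, device_count):
    """Return a usable GPU ordinal, or None to signal a CPU fallback.

    Hypothetical helper: in real code `device_count` would typically
    come from torch.cuda.device_count().
    """
    if device_count <= 0:
        # No CUDA devices are visible: the caller should run on CPU.
        return None
    if 0 <= requested_id < device_count:
        return requested_id
    # An out-of-range ordinal is what leads to "invalid device ordinal".
    raise ValueError(
        f"gpu_id {requested_id} requested but only {device_count} device(s) visible"
    )
```

With a check like this, a test config that hard-codes GPU 7 would fail early with a readable message instead of deep inside a pool worker.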

@kousu kousu requested a review from lifetheater57 July 30, 2022 00:06
@kousu
Contributor Author

kousu commented Jul 30, 2022

Oh and fail4 is because those tests were run on romane. The test itself is wrong:

    @pytest.mark.skipif(current_platform != "Linux", reason="Function only works for Linux, skip on all other OS")
    def test_get_linux_system_memory():
        """
        Get Windows memory size
        Returns:

        """
        # Most computers/clusters should have memory of at least 100mb and no more than 256GB RAM
>       assert 0.1 < get_linux_system_memory() < 256
E       assert 503.56872940063477 < 256
E        +  where 503.56872940063477 = get_linux_system_memory()

romane has 512GiB of RAM (it's a big boy!). This test passes on bireli, so I think we shouldn't worry about it either.
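
A sketch of the same kind of check without the hard-coded 256 GB ceiling. It assumes the figure is read from the `MemTotal` line of `/proc/meminfo`; the parser `mem_total_gib` and the sample text are hypothetical and are not ivadomed's `get_linux_system_memory`.

```python
import re

def mem_total_gib(meminfo_text):
    """Extract MemTotal (reported in kB) from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) / 1024 ** 2

# Made-up sample; on a real Linux machine the text would come from /proc/meminfo.
sample = "MemTotal:       1048576 kB\nMemFree:         524288 kB\n"
total = mem_total_gib(sample)

# Only a lower bound: large servers can legitimately exceed any fixed ceiling.
assert total > 0.1
```

Dropping the upper bound keeps the sanity check (the value is parsed and non-trivial) while letting it pass on machines as large as romane.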

Contributor

@kanishk16 kanishk16 left a comment

I did a sanity check on my end using GPU (Compute Capability 7.6). Just an observation that would be nice to confirm on the lab GPUs as well: torch<1.11.0+cu111 doesn't complete the test in an hour, but torch<1.11.0+cu102 completes it in about 2 min. I believe this is related to #1141 which seems to be fixed in torch==1.11.0 release as torch==1.11.0+cu111 works great.

kousu and others added 3 commits August 3, 2022 14:47
onnxruntime>=1.12 is needed for python310, and is available for python39, python38 and python37 _from PyPI_.
But it's not available for all those versions on ComputeCanada's repo (https://docs.alliancecan.ca/wiki/Available_Python_wheels).
So onnxruntime>=1.8 should be equally functional and more broadly compatible.
Co-authored-by: Kanishk Kalra <36276423+kanishk16@users.noreply.github.com>
@kousu
Contributor Author

kousu commented Aug 3, 2022

@kanishk16 what do you think now?

I admit I don't really understand Python versioning. There are too many operators. I don't know when it's best to use which: whether I should use ~=M.m.p or ~=M.m or >=M.m or what.

@kousu kousu mentioned this pull request Aug 3, 2022
7 tasks
Contributor

@kanishk16 kanishk16 left a comment

I admit I don't really understand Python versioning. There are too many operators. I don't know when it's best to use which: whether I should use ~=M.m.p or ~=M.m or >=M.m or what.

I couldn't agree more, but reading up a bit on this, especially https://bernat.tech/posts/version-numbers/ and https://snarky.ca/why-i-dont-like-semver/, convinced me not to have an upper cap unless absolutely necessary. IMO, it boils down to how much we would like to restrict ourselves: from most to least restrictive, the options are ~=M.m.p, then ~=M.m, then >=M.m. It also depends on the maturity and the release cycle of the library/package itself. It is highly unlikely that torch will make a major release soon, and the same goes for onnxruntime. We should be good with either ~=M.m or >=M.m, as both are pretty mature packages in this case.

Hopefully as the ecosystems evolve and our experience grows we'd be able to decide better. What do you think @kousu?
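
To make the three operators concrete, here is a simplified sketch of what `~=` and `>=` accept under PEP 440. The helpers `compatible_release` and `at_least` are hypothetical and only handle plain numeric versions (no pre-releases, epochs or local tags like `+cu111`); real tooling should use the `packaging` library instead.

```python
def _parse(version):
    """Turn 'M.m.p' into a tuple of ints (plain numeric releases only)."""
    return tuple(int(part) for part in version.split("."))

def _pad(a, b):
    """Zero-pad two release tuples to the same length, so 1.8 == 1.8.0."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))

def at_least(candidate, spec):
    """Simplified `>=spec`."""
    cand, base = _pad(_parse(candidate), _parse(spec))
    return cand >= base

def compatible_release(candidate, spec):
    """Simplified `~=spec`: >= spec, and equal on all but spec's last component."""
    base = _parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    prefix = base[:-1]
    cand = _pad(_parse(candidate), base)[0]
    return at_least(candidate, spec) and cand[:len(prefix)] == prefix

# ~=1.8.1 allows patch releases only; ~=1.8 allows any later 1.x; >=1.8 has no cap.
print(compatible_release("1.8.5", "1.8.1"))   # patch bump within 1.8
print(compatible_release("1.11.0", "1.8.1"))  # minor bump
print(compatible_release("1.11.0", "1.8"))    # minor bump within 1.x
print(at_least("2.0.0", "1.8"))               # major bump
```

In other words, `~=M.m.p` behaves like `>=M.m.p, ==M.m.*` and `~=M.m` like `>=M.m, ==M.*`, which is why the first is the tightest of the three.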

Comment on lines +24 to +25
torch>=1.8.1,<=1.11.0
torchvision>=0.9.1,<=0.12.0
Contributor

Suggested change
torch>=1.8.1,<=1.11.0
torchvision>=0.9.1,<=0.12.0
torch>=1.8.1
torchvision>=0.9.1

Member

Removing the upper bound on the version exposes us to potential issues if breaking changes are made in future versions of either package, doesn't it? I would only keep the versions that we actually tested (i.e. restrict to those with >=min, <=max bounds).

Contributor

Absolutely... In hindsight, we're already making a big transition and I guess it would be better if we have an upper bound to narrow down in case we find any unusual behaviour.

Contributor Author

But are you saying we should test every version of every dependency implied by our >=min, <=max bounds, @lifetheater57? I don't think there's Python tooling that can do that, and even if there were, the number of test cases would explode -- and our tests already take half an hour per commit.

I think the best we can do is set the max bound to the current latest torch (same with our other dependencies). If we ever miss updating our max for one of their releases we'll just have to assume it worked if the later one did. And if it doesn't we can go find the most recent version that did and pin to that.

Contributor Author

@kousu kousu Aug 5, 2022

I ran into a real-world example of lacking an upper bound causing problems last week: zalandoresearch/pytorch-ts#105

This project declares gluonts>=0.9.0. Sometime in the last few months gluonts moved their API all around without even writing a good changelog, so now torchts==0.6.0 is broken forevermore. pip install torchts==0.6.0 will always produce a broken install, so there's no way for people to go back in time and try an old version.

When I crashed into this I scrambled and looked at the release dates of both projects to figure out that gluonts==0.9.3 was the current version at the time of its release, and I was able to get it working with:

pip install 'torchts' 'gluonts<0.9.4'

So we will be a lot friendlier to the future if we put <max bounds on every one of our dependencies.
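
The manual step described above (comparing release dates to find which dependency version was current when a package was released) can be sketched as follows. The function `latest_release_before` and all versions and dates here are made up for illustration; they are not the real gluonts or torchts release history.

```python
from datetime import date

def latest_release_before(releases, cutoff):
    """Return the newest version released on or before `cutoff`, or None.

    `releases` maps version strings to release dates.
    """
    eligible = [
        (released, version)
        for version, released in releases.items()
        if released <= cutoff
    ]
    if not eligible:
        return None
    # max() picks the most recent release date among the eligible ones.
    return max(eligible)[1]

# Hypothetical release history of a dependency called "dep".
dep_releases = {
    "1.0": date(2022, 1, 10),
    "1.1": date(2022, 3, 1),
    "1.2": date(2022, 5, 20),
}

# Hypothetical release date of the package that depends on "dep".
print(latest_release_before(dep_releases, date(2022, 4, 1)))
```

The version this returns is the natural candidate for a retroactive `<max` bound, since it is the newest release the package could have been tested against.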

SCT had the same problem, by the way, but they solved it by ignoring most of python's packaging conventions; instead, making a release means generating and committing requirements-freeze.txt to the release branch, and then the (homegrown, unconventional, non-pip) install script will install all of them. I don't think that's good either because it overspecifies the dependencies and requires all this fragile shell scripting to get it right, and then it's impossible for anyone to build on SCT because SCT controls the entire environment.

Contributor Author

No, I would only test the min and max versions of the libraries we "manually" depend on, if it is possible to automate that. For example, I would do it for PyTorch, but not for a library that is automatically installed by PyTorch.

Excellent, I agree.

What are the major downsides of going the "rolling releases" way?

I'm not totally sure. So far I've thought up two: it would leave us open to your original concern of surprise breakage, and it would mean that past versions will be broken forever, instead of just what's on the master branch.

I think for the sake of this PR, I am happy to leave the upper bound. We can keep thinking about this and decide the best practice for all our dependencies later.

I think this PR is probably ready to merge.

Contributor Author

What are the major downsides of going the "rolling releases" way?

I'm not totally sure. So far I've thought up two: it would leave us open to your original concern of surprise breakage

In practice the "surprise breakage" of rolling releases means no one can install a working ivadomed, so we need to be able to issue hotfixes rapidly.

On the other hand, pinning dependencies doesn't avoid breakage, it just makes it someone else's problem: anyone who is trying to combine ivadomed with, say, a pytorch script they wrote that needs the very latest torch will be stuck. Though they can work around it with a virtualenv.

Contributor

tl;dr:

We can?

  • rolling releases: entirely remove dependency specifiers

    • document: the only ivadomed we expect to work is "right now"
    • automate: use a cronjob to test master against all the latest dependencies every day
  • pinned releases: put <max bounds on all our dependencies

    • automate: rely on Dependabot to remind us to bump + test when new releases come out

I pondered a bit upon this and I don't feel we're there yet to go with rolling releases, especially thinking about the integration of Multi-Contrast Multi-Session. I'm quite content with pinned releases as of now.
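
For the pinned-releases option, a small check could flag requirements that still lack an upper bound. This is only a sketch: `missing_upper_bound` is a hypothetical helper that does plain string matching on simple `name<specifiers>` lines, and it does not understand environment markers, URLs or `-r` includes.

```python
import re

def missing_upper_bound(requirement_lines):
    """Return names of requirements with no '<', '<=', '==' or '~=' constraint."""
    names = []
    for line in requirement_lines:
        spec = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not spec or spec.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9._-]+", spec)
        if match is None:
            continue
        name = match.group(0)
        constraints = spec[len(name):]
        # '<', '<=', '==' and '~=' all cap the version from above.
        if not re.search(r"<|==|~=", constraints):
            names.append(name)
    return names

lines = [
    "torch>=1.8.1,<=1.11.0",
    "torchvision>=0.9.1",
    "onnxruntime>=1.8  # lower bound only",
    "",
    "# a comment line",
]
print(missing_upper_bound(lines))
```

A check like this could run in CI alongside Dependabot, so a new dependency cannot be added without someone deciding on its `<max`.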

Contributor Author

Thanks for all your thoughts. I'm going to start a new issue to keep talking about all this.

Contributor Author

-> #1191

@kousu kousu mentioned this pull request Aug 7, 2022
Development

Successfully merging this pull request may close these issues.

python3.10 compatibility
8 participants