
Remove CUDA 9.2 and older references from our cmake #65065


Closed
janeyx99 wants to merge 1 commit

Conversation

janeyx99
Contributor

Removes old CUDA references in our cuda.cmake

@pytorch-probot

pytorch-probot bot commented Sep 15, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/janeyx99/pytorch/blob/61bc59389297792c08d6c222b1e9b1fd25e04ee2/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default,ciflow/cuda

| Workflows | Labels (bold enabled) | Status |
| --- | --- | --- |
| **Triggered Workflows** |  |  |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | ✅ triggered |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | ✅ triggered |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | ✅ triggered |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-bionic-py3.8-gcc9-coverage | ciflow/all, ciflow/coverage, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | ✅ triggered |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | ✅ triggered |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/win | ✅ triggered |
| win-vs2019-cuda10.2-py3 | ciflow/all, ciflow/cuda, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/win | ✅ triggered |
| **Skipped Workflows** |  |  |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| paralleltbb-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| puretorch-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Sep 15, 2021

💊 CI failures summary and remediations

As of commit 61bc593 (more details on the Dr. CI page):


  • 6/6 failures introduced in this PR

🕵️ 6 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-bionic-cuda10.2-py3.9-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (1/6)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-16T15:54:17.7149669Z RuntimeError: CUDA error: invalid device ordinal
2021-09-16T15:54:17.7139833Z   File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 204, in _run_function
2021-09-16T15:54:17.7140793Z     result = python_udf.func(*python_udf.args, **python_udf.kwargs)
2021-09-16T15:54:17.7141993Z   File "/opt/conda/lib/python3.9/site-packages/torch/distributed/nn/api/remote_module.py", line 87, in _create_module
2021-09-16T15:54:17.7142866Z     module.to(device)
2021-09-16T15:54:17.7143774Z   File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 899, in to
2021-09-16T15:54:17.7144530Z     return self._apply(convert)
2021-09-16T15:54:17.7145879Z   File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 593, in _apply
2021-09-16T15:54:17.7146659Z     param_applied = fn(param)
2021-09-16T15:54:17.7147608Z   File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 897, in convert
2021-09-16T15:54:17.7148836Z     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2021-09-16T15:54:17.7149669Z RuntimeError: CUDA error: invalid device ordinal
2021-09-16T15:54:17.7150637Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2021-09-16T15:54:17.7152081Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2021-09-16T15:54:17.7152535Z 
2021-09-16T15:54:17.7152782Z 
2021-09-16T15:54:17.8870162Z ok (1.612s)
2021-09-16T15:54:21.4016578Z   test_valid_device (__main__.TensorPipeCudaRemoteModuleTest) ... ok (3.514s)
2021-09-16T15:54:28.6308177Z   test_profiler_remote_cuda (__main__.TensorPipeCudaRpcTest) ... ok (7.229s)
2021-09-16T15:54:30.0414777Z   test_basic_gloo_ckpt_always (__main__.TensorPipePipeWithDDPTest) ... skip (1.410s)
2021-09-16T15:54:31.4521644Z   test_basic_gloo_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) ... skip (1.410s)
2021-09-16T15:54:32.8624144Z   test_basic_gloo_ckpt_never (__main__.TensorPipePipeWithDDPTest) ... skip (1.410s)
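
The `invalid device ordinal` error in this excerpt is what CUDA reports when code addresses a device index the machine does not actually expose (for example, moving a module to `cuda:3` on a runner with fewer GPUs). As a minimal, hypothetical sketch of that error class — not taken from the PyTorch test suite:

```python
import torch

# Hypothetical reproduction: request a device index beyond what the machine
# exposes. Moving a module to that device raises
# "RuntimeError: CUDA error: invalid device ordinal".
if torch.cuda.is_available():
    bad_index = torch.cuda.device_count() + 1  # guaranteed not to exist
    model = torch.nn.Linear(4, 4)
    try:
        model.to(f"cuda:{bad_index}")
    except RuntimeError as err:
        print(err)
```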

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / test (default, 1, 2, linux.2xlarge) (2/6)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-16T15:58:03.8311509Z CONTINUE_THROUGH_ERROR: false
  "cla signed",
  "ciflow/default",
  "ciflow/cuda"
]
2021-09-16T15:58:03.8308476Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:74e757e8b0cf750d2f91db6aa4c29640abce32ea
2021-09-16T15:58:03.8309628Z   JOB_BASE_NAME: linux-xenial-py3.6-gcc5.4-test
2021-09-16T15:58:03.8310126Z   TEST_CONFIG: default
2021-09-16T15:58:03.8310437Z   SHARD_NUMBER: 1
2021-09-16T15:58:03.8310741Z   NUM_TEST_SHARDS: 2
2021-09-16T15:58:03.8311102Z   PYTORCH_IGNORE_DISABLED_ISSUES: 
2021-09-16T15:58:03.8311509Z   CONTINUE_THROUGH_ERROR: false
2021-09-16T15:58:03.8311847Z   SHM_SIZE: 1g
2021-09-16T15:58:03.8312120Z   PR_NUMBER: 65065
2021-09-16T15:58:03.8312426Z ##[endgroup]
2021-09-16T15:58:17.0078756Z Processing ./dist/torch-1.10.0a0+git8a72595-cp36-cp36m-linux_x86_64.whl
2021-09-16T15:58:17.0354355Z Requirement already satisfied: dataclasses in /opt/conda/lib/python3.6/site-packages (from torch==1.10.0a0+git8a72595) (0.8)
2021-09-16T15:58:17.0358090Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.6/site-packages (from torch==1.10.0a0+git8a72595) (3.10.0.0)
2021-09-16T15:58:17.3121937Z Installing collected packages: torch
2021-09-16T15:58:22.2332101Z Successfully installed torch-1.10.0a0+git8a72595
2021-09-16T15:58:22.3066164Z ++++ dirname .jenkins/pytorch/common.sh
2021-09-16T15:58:22.3072840Z +++ cd .jenkins/pytorch

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (3/6)

Step: "Test PyTorch" (full log | diagnosis details | 🔁 rerun)

2021-09-16T16:09:54.8048811Z RuntimeError: Expe...e, but found at least two devices, cuda:0 and cpu!
2021-09-16T16:09:54.7957314Z     return x.cpu() + y.cuda()
2021-09-16T16:09:54.7958098Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2021-09-16T16:09:54.7958705Z 
2021-09-16T16:09:54.8041944Z On WorkerInfo(id=2, name=worker2):
2021-09-16T16:09:54.8043145Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!',)
2021-09-16T16:09:54.8043971Z Traceback (most recent call last):
2021-09-16T16:09:54.8044994Z   File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py", line 204, in _run_function
2021-09-16T16:09:54.8045949Z     result = python_udf.func(*python_udf.args, **python_udf.kwargs)
2021-09-16T16:09:54.8047167Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 5879, in _gpu_add_wrong_gpus
2021-09-16T16:09:54.8048024Z     return x.cpu() + y.cuda()
2021-09-16T16:09:54.8048811Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2021-09-16T16:09:54.8049416Z 
2021-09-16T16:09:55.3977469Z ok (6.227s)
2021-09-16T16:09:57.0170524Z   test_devices_option_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (1.619s)
2021-09-16T16:09:58.6359952Z   test_devices_option_mismatch_reverse (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (1.619s)
2021-09-16T16:10:05.6621108Z   test_owner_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (7.026s)
2021-09-16T16:10:14.8924536Z   test_owner_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (9.230s)
2021-09-16T16:10:23.8217400Z   test_owner_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (8.929s)
2021-09-16T16:10:30.7505210Z   test_owner_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (6.929s)
2021-09-16T16:10:46.9922111Z   test_rref_as_arg_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (16.242s)
2021-09-16T16:11:06.5469450Z   test_rref_as_arg_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (19.555s)
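
The `Expected all tensors to be on the same device` message here comes from `_gpu_add_wrong_gpus`, which mixes devices via `x.cpu() + y.cuda()`. As a hedged, standalone sketch of the same error class and the usual remedy (putting both operands on one device), independent of the RPC test:

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(3)          # CPU tensor
    y = torch.randn(3).cuda()   # CUDA tensor

    try:
        _ = x + y               # operands on different devices
    except RuntimeError as err:
        # "Expected all tensors to be on the same device, but found at least
        # two devices, cuda:0 and cpu!"
        print(err)

    # Usual fix: move both operands to the same device before the op.
    _ = x.to(y.device) + y
```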

See GitHub Actions build linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 1, 2, linux.8xlarge.nvidia.gpu) (4/6)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-16T16:25:57.5527679Z AssertionError: can only test a child process
2021-09-16T16:25:57.5510879Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-09-16T16:25:57.5511617Z AssertionError: can only test a child process
2021-09-16T16:25:57.5517382Z Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f9eb86b9700>
2021-09-16T16:25:57.5518358Z Traceback (most recent call last):
2021-09-16T16:25:57.5519416Z   File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-09-16T16:25:57.5522177Z     self._shutdown_workers()
2021-09-16T16:25:57.5523610Z   File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-09-16T16:25:57.5524801Z     if w.is_alive():
2021-09-16T16:25:57.5525787Z   File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 160, in is_alive
2021-09-16T16:25:57.5526921Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-09-16T16:25:57.5527679Z AssertionError: can only test a child process
2021-09-16T16:25:59.6019169Z ok (2.227s)
2021-09-16T16:26:02.9170510Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-09-16T16:26:02.9172473Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-09-16T16:26:02.9194519Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-09-16T16:26:06.2986296Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-09-16T16:26:06.2987814Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-09-16T16:26:06.2997508Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-09-16T16:26:12.9224705Z ok (13.320s)
2021-09-16T16:26:14.0196469Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (1.097s)
2021-09-16T16:26:14.0226163Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
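
The `can only test a child process` assertion above is raised by CPython's `multiprocessing`: `Process.is_alive()` may only be called from the process that created the `Process` object, and here it fires during DataLoader worker cleanup in `__del__`. A minimal, hypothetical, Linux-only sketch of that underlying behaviour, with no DataLoader involved:

```python
import multiprocessing as mp
import os
import time

# multiprocessing.Process.is_alive() checks
# `assert self._parent_pid == os.getpid(), 'can only test a child process'`,
# so polling a Process from any process other than its creator fails.
if __name__ == "__main__":
    p = mp.Process(target=time.sleep, args=(1,))
    p.start()

    pid = os.fork()              # Linux-only; the forked child inherits `p`
    if pid == 0:
        try:
            p.is_alive()         # called from the wrong process
        except AssertionError as err:
            print(err)           # -> can only test a child process
        os._exit(0)

    os.waitpid(pid, 0)
    p.join()
```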

See GitHub Actions build linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 1, 2, linux.8xlarge.nvidia.gpu) (5/6)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-16T15:50:25.3460372Z CONTINUE_THROUGH_ERROR: false
  "cla signed",
  "ciflow/default",
  "ciflow/cuda"
]
2021-09-16T15:50:25.3455846Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7:74e757e8b0cf750d2f91db6aa4c29640abce32ea
2021-09-16T15:50:25.3457687Z   JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-test
2021-09-16T15:50:25.3458468Z   TEST_CONFIG: default
2021-09-16T15:50:25.3458898Z   SHARD_NUMBER: 1
2021-09-16T15:50:25.3459328Z   NUM_TEST_SHARDS: 2
2021-09-16T15:50:25.3459836Z   PYTORCH_IGNORE_DISABLED_ISSUES: 
2021-09-16T15:50:25.3460372Z   CONTINUE_THROUGH_ERROR: false
2021-09-16T15:50:25.3460874Z   GPU_FLAG: --gpus all
2021-09-16T15:50:25.3461274Z   SHM_SIZE: 2g
2021-09-16T15:50:25.3461678Z   PR_NUMBER: 65065
2021-09-16T15:50:25.3462084Z ##[endgroup]
2021-09-16T15:50:51.6730210Z Processing ./dist/torch-1.10.0a0+git8a72595-cp36-cp36m-linux_x86_64.whl
2021-09-16T15:50:51.7113552Z Requirement already satisfied: dataclasses in /opt/conda/lib/python3.6/site-packages (from torch==1.10.0a0+git8a72595) (0.8)
2021-09-16T15:50:51.7119189Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.6/site-packages (from torch==1.10.0a0+git8a72595) (3.10.0.0)
2021-09-16T15:50:52.1214788Z Installing collected packages: torch
2021-09-16T15:51:01.8534353Z Successfully installed torch-1.10.0a0+git8a72595
2021-09-16T15:51:01.9559833Z ++++ dirname .jenkins/pytorch/common.sh

See GitHub Actions build linux-xenial-cuda10.2-py3.6-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu) (6/6)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-16T16:04:47.7298674Z RuntimeError: CUDA error: invalid device ordinal
2021-09-16T16:04:47.7289643Z   File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py", line 204, in _run_function
2021-09-16T16:04:47.7290598Z     result = python_udf.func(*python_udf.args, **python_udf.kwargs)
2021-09-16T16:04:47.7291776Z   File "/opt/conda/lib/python3.6/site-packages/torch/distributed/nn/api/remote_module.py", line 87, in _create_module
2021-09-16T16:04:47.7292605Z     module.to(device)
2021-09-16T16:04:47.7293474Z   File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 899, in to
2021-09-16T16:04:47.7294230Z     return self._apply(convert)
2021-09-16T16:04:47.7295145Z   File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 593, in _apply
2021-09-16T16:04:47.7295900Z     param_applied = fn(param)
2021-09-16T16:04:47.7296824Z   File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 897, in convert
2021-09-16T16:04:47.7297758Z     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2021-09-16T16:04:47.7298674Z RuntimeError: CUDA error: invalid device ordinal
2021-09-16T16:04:47.7299638Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2021-09-16T16:04:47.7300639Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2021-09-16T16:04:47.7301092Z 
2021-09-16T16:04:47.7301319Z 
2021-09-16T16:04:47.9099223Z ok (1.816s)
2021-09-16T16:04:51.7315158Z   test_valid_device (__main__.TensorPipeCudaRemoteModuleTest) ... ok (3.822s)
2021-09-16T16:04:59.5705089Z   test_profiler_remote_cuda (__main__.TensorPipeCudaRpcTest) ... ok (7.839s)
2021-09-16T16:05:06.7148909Z   test_basic_gloo_ckpt_always (__main__.TensorPipePipeWithDDPTest) ... [W logger.cpp:305] Warning: Cuda time stats are not collected for multi-device modules. (function operator())
2021-09-16T16:05:06.7150833Z [W logger.cpp:305] Warning: Cuda time stats are not collected for multi-device modules. (function operator())
2021-09-16T16:05:07.5005898Z ok (7.930s)

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


janeyx99 requested a review from a team on September 15, 2021 at 16:23
@zhouzhuojie
Contributor

@pytorchbot ciflow rerun -l ciflow/cuda

@zhouzhuojie
Contributor

@janeyx99 I'm trying out ciflow from a reviewer's view, adding ciflow/cuda btw

@codecov

codecov bot commented Sep 15, 2021

Codecov Report

Merging #65065 (61bc593) into master (8800a8b) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #65065   +/-   ##
=======================================
  Coverage   66.37%   66.38%           
=======================================
  Files         727      727           
  Lines       93571    93571           
=======================================
+ Hits        62109    62116    +7     
+ Misses      31462    31455    -7     

@facebook-github-bot
Contributor

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@janeyx99 merged this pull request in 9af6fe9.
