Seg fault with test_rnn_retain_variables on ppc64le #16953

Open
deepali-c opened this issue Feb 11, 2019 · 1 comment
Labels
module: crash - Problem manifests as a hard crash, as opposed to a RuntimeError
module: POWER - Issues specific to the POWER/ppc architecture
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@deepali-c
Contributor

🐛 Bug

Observed segfault with the test case test_rnn_retain_variables:

Other tests that crash are:

test_cuda_rnn_fused
test_rnn_initial_hidden_state
test_einsum

To Reproduce

The minimal code to reproduce is:

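import torch
import torch.nn as nn

device = "cpu"
dtype = torch.double
# Two-layer GRU with input size 10 and hidden size 20
rnn = nn.GRU(10, 20, num_layers=2).to(device, dtype)
# Input shape is (seq_len=5, batch=6, input_size=10)
input = torch.randn(5, 6, 10, device=device, dtype=dtype, requires_grad=True)
output = rnn(input)  # forward pass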

Other observations are:
a. The above code works for device="cuda"
b. The above code works for dtype=torch.float
c. The above code works if the following is used for input:
input = torch.randn(3, 3, 10, device=device, dtype=dtype, requires_grad=True)
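The observations above can be checked in one pass by running each configuration in its own Python process, so that a segfault in one case does not take down the others. The following is only a sketch: `SNIPPET`, `run_isolated`, and `describe` are hypothetical names introduced here, the configurations are the CPU cases listed above, and the "reported" comments restate this issue rather than results from this script. It relies on the POSIX convention that `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys
import textwrap

# Template for one configuration; the placeholders are filled in per run.
SNIPPET = textwrap.dedent("""
    import torch
    import torch.nn as nn
    dtype = getattr(torch, "{dtype}")
    rnn = nn.GRU(10, 20, num_layers=2).to("{device}", dtype)
    x = torch.randn({shape}, device="{device}", dtype=dtype, requires_grad=True)
    rnn(x)
""")


def run_isolated(code):
    """Run `code` in a fresh interpreter and return its exit status.

    On POSIX, a negative value -N means the child was killed by signal N.
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on a crash.
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode


def describe(returncode):
    """Turn an exit status into a short label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exit code {}".format(returncode)


if __name__ == "__main__":
    configs = [
        ("cpu", "double", (5, 6, 10)),  # reported in this issue to crash
        ("cpu", "float", (5, 6, 10)),   # reported in this issue to work
        ("cpu", "double", (3, 3, 10)),  # reported in this issue to work
    ]
    for device, dtype, shape in configs:
        code = SNIPPET.format(device=device, dtype=dtype, shape=shape)
        print(device, dtype, shape, "->", describe(run_isolated(code)))
```

To include the GPU case from observation a, a `("cuda", "double", (5, 6, 10))` entry could be added to `configs` on a machine where CUDA is available.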

Expected behavior

The tests should pass.

Environment

PyTorch version: 1.0.0a0+7998997 (with some local changes)
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Red Hat Enterprise Linux Server 7.6 (Maipo)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB

Nvidia driver version: 410.72
cuDNN version: Could not collect

Additional context

fmassa added the module: crash label Feb 19, 2019
@ezyang
Contributor

ezyang commented Jun 16, 2019

umanwizard added the module: POWER and triaged labels Jun 17, 2019