Skip to content

Reliably repeating pytorch system crash/reboot when using imagenet examples #3022

@castleguarders

Description

@castleguarders

So I have a 100% repeatable system crash (reboot) when trying to run the imagenet example (2012 dataset). resnet18 defaults. The crash seems to happen at Variable.py at torch.autograd.backward(..) (line 158).

I am able to run the basic mnist example successfully.

Setup: Ubuntu 16.04, 4.10.0-35-generic #39~16.04.1-Ubuntu SMP Wed Sep 13 09:02:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

python --version Python 3.6.2 :: Anaconda, Inc.

/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

nvidia-smi output.
Sat Oct 7 23:51:53 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 On | N/A |
| 14% 51C P8 18W / 250W | 650MiB / 11170MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1335 G /usr/lib/xorg/Xorg 499MiB |
| 0 2231 G cinnamon 55MiB |
| 0 3390 G ...-token=C6DE372B6D9D4FCD6453869AF4C6B4E5 93MiB |
+-----------------------------------------------------------------------------+

torch/vision was built locally on the machine from master. No issues at compile or install time, other than the normal compile time warnings...

Happy to help get further information..

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions