
Conversation

@zou3519 zou3519 (Contributor) commented Sep 11, 2018

Also adds two tests that check for memory leaks while the relevant graph executors are alive:

  • (minimal test) Create a ScriptModule, keep it alive, and test that it does not leak memory while it is alive
  • (large test) Do MNIST training with a traced MNIST module and test that no memory is leaked while the traced module (with graph executor) is alive
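The leak-check pattern described by these tests can be sketched with the standard-library `tracemalloc` module. This is a hedged illustration only: the actual tests measure CUDA memory via PyTorch, and `FakeModule`, `check_no_leak`, and the 64 KiB threshold here are hypothetical stand-ins.

```python
import tracemalloc

class FakeModule:
    # Hypothetical stand-in for a ScriptModule with a graph executor.
    def __call__(self, x):
        return [v * 2 for v in x]

def check_no_leak(module, inputs, iters=100):
    # Warm up so caches and allocator pools reach a steady state first.
    for _ in range(10):
        module(inputs)
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    for _ in range(iters):
        module(inputs)
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # While the module (and its executor) stays alive, repeated calls
    # should not keep accumulating memory; return the observed growth.
    return current - baseline

growth = check_no_leak(FakeModule(), [1.0, 2.0, 3.0])
print(growth < 64 * 1024)  # expect True: growth stays near zero
```

The key design point is measuring growth while the module is still alive, rather than after it is destroyed, so that leaks held by the live graph executor itself are caught.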

cc @apaszke @zdevito

@zou3519 zou3519 added the oncall: jit Add this issue/PR to JIT oncall triage queue label Sep 11, 2018
@apaszke apaszke (Contributor) left a comment

LGTM, but it would be great if we could make sure that the new tests don't take forever to run

test/test_jit.py Outdated

This comment was marked as off-topic.

@facebook-github-bot facebook-github-bot (Contributor) left a comment

zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


Also adds two tests:
- Create a ScriptModule, keep it alive, and test that it does not leak memory while it is alive
- Do MNIST training and test that no memory is leaked while the MNIST module (with graph executor) is alive

petrex pushed a commit to petrex/pytorch that referenced this pull request Sep 12, 2018
* master: (165 commits)
  Aibench for asr decoder
  Explicitly set locale on docs build. (pytorch#11595)
  Documentation for debugging JIT
  Fused weightnorm for ATen (pytorch#10842)
  Move Type, Tensor, TensorMethods to core.
  Add reminder % to the jit
  Fix reloading modules back into python (pytorch#11552)
  Add trigonometry functions to docs/source/onnx.rst
  Add EndToEndHybridModel CUDA tests (pytorch#11544)
  minor formatting error log (pytorch#11528)
  Warn that export+import module always load onto the CPU (pytorch#11485)
  caffe2::StorageImpl use at::DataPtr (pytorch#11282)
  Sync all libnccl soversions, not just libnccl.so.1 (pytorch#11575)
  Document BatchNorm and update default behavior (pytorch#11484)
  Typo fix in randomness.rst (pytorch#11571)
  Move some bmm/baddbmm to ATen (pytorch#11292)
  Make c10d test work on CPU only build (pytorch#11567)
  Clean up some C++ cruftiness in the script lexer.
  Allow setting deletion constant
  Make C10d support CPU only build (pytorch#11513)
  ...
@ssnl ssnl (Collaborator) commented Sep 12, 2018

Is it expected that test_mnist_training_leaks_no_memory_cuda gives me this warning:

test_mnist_training_leaks_no_memory_cuda (__main__.TestEndToEndHybridFrontendModels) ... test/test_jit.py:7046: TracerWarning: Trace had nondeterministic nodes. Nodes:
        %92 : Double(5, 50) = aten::dropout(%89, %90, %91), scope: MnistNet
This may cause errors in trace checking. To disable trace checking, pass check_trace=False to torch.jit.trace()
  traced_net = torch.jit.trace(net, [torch.randn(5, 1, 28, 28, device='cuda')])
test/test_jit.py:7046: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Not within tolerance rtol=1e-05 atol=1e-08 at input[4, 3] (-2.3382810849225493 vs. -2.803113561750903) and 49 other locations (100.00%)
  traced_net = torch.jit.trace(net, [torch.randn(5, 1, 28, 28, device='cuda')])
ok
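The mismatch in this warning comes from `aten::dropout` being stochastic: tracing runs the model once, trace checking re-runs it, and a nondeterministic op produces different outputs on each run, so the tolerance comparison fails (per the warning text, `check_trace=False` disables that re-check). A minimal standard-library illustration of why any stochastic op fails an output-comparison check; this `dropout` is a hypothetical stand-in for `aten::dropout`, not PyTorch code:

```python
import random

def dropout(xs, p=0.5):
    # Randomly zero elements and rescale the survivors, like inverted
    # dropout: nondeterministic by design.
    return [0.0 if random.random() < p else x / (1 - p) for x in xs]

xs = [1.0] * 50

random.seed(0)
first = dropout(xs)   # the "traced" run
random.seed(1)
second = dropout(xs)  # the re-check run

# The two runs disagree, so an elementwise tolerance comparison between
# the traced output and the re-check output reports mismatches.
print(first == second)  # False
```

This is why the warning is expected behavior for a model containing dropout, rather than a bug in the trace itself.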

@zou3519 zou3519 deleted the jit-cuda-tests3 branch September 13, 2018 15:35
@zou3519 zou3519 (Contributor, Author) commented Sep 13, 2018

@ssnl I'm fixing that in #11639

@ezyang ezyang added the merged label Jun 26, 2019



5 participants