Skip to content

Conversation

ssnl
Copy link
Collaborator

@ssnl ssnl commented Jun 29, 2019

Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking #22229.

DynamicLibrary is also moved as it is used in the stub code, and seems general enough.

@pytorchbot pytorchbot added caffe2 oncall: jit Add this issue/PR to JIT oncall triage queue module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general module: internals Related to internal abstractions in c10 and ATen labels Jun 29, 2019
@ssnl ssnl requested review from soumith and zdevito June 29, 2019 03:40
@ssnl ssnl force-pushed the thnvrtc_aten branch 6 times, most recently from e4af4de to 2b41e59 Compare June 30, 2019 01:29
@ssnl
Copy link
Collaborator Author

ssnl commented Jun 30, 2019

The test error is from #22052 , not from this PR.

Copy link
Contributor

@zdevito zdevito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look fine to me. I didn't review the build changes because I do not have the context for how they work anymore.

@vadimkantorov
Copy link
Contributor

Does this open a way to custom fast user-provided CUDA-snippets? (like element-wise fused ops)

@ssnl
Copy link
Collaborator Author

ssnl commented Jul 1, 2019

@vadimkantorov could you elaborate a bit on what you mean exactly?

@vadimkantorov
Copy link
Contributor

vadimkantorov commented Jul 1, 2019

I mean providing a wrapper that, given a user-provided CUDA expresion like x => sin(cos(x)), can in runtime generate a kernel (without saving it to disk, like cpp_extension.load_inline would do, and I think there are also some restrictions on ABI matching the compiler that was used for compiling PyTorch) that applies over a CUDA tensor (inplace or outplace). Probably binary apply functions could exist as well. Even if this would exist only in contrib, an example of NVRTC runtime compilation and creation of a function for dense contiguous float CUDA tensors can be helpful.

In principle it's JIT's job, but meanwhile it's evolving, having a CUPY-like functionality: https://docs-cupy.chainer.org/en/stable/reference/kernel.html would be cool.

@ssnl
Copy link
Collaborator Author

ssnl commented Jul 1, 2019

@vadimkantorov I see.

Let me explain a bit. This PR adds nothing new (unless you are using ATen and only ATen). The NVRTC stub currently lives in libtorch, being used by jit exactly. The PR only moves it to ATen. Technically what you described could be a feature implemented in libtorch without this PR, and using the existing stub there. But maybe moving the stub to ATen makes bindings easier, idk. :)

@zhangguanheng66 zhangguanheng66 requested a review from colesbury July 1, 2019 22:16
@zhangguanheng66 zhangguanheng66 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 1, 2019
@ssnl
Copy link
Collaborator Author

ssnl commented Jul 2, 2019

I think it would be great if this is also reviewed by a build person :)

@zou3519
Copy link
Contributor

zou3519 commented Jul 2, 2019

I'm not sure if this is relevant, but I think you can call libtorch code from ATen.

@ssnl
Copy link
Collaborator Author

ssnl commented Jul 2, 2019

@zou3519 fascinating! Are there examples on how to do that?

@zou3519
Copy link
Contributor

zou3519 commented Jul 2, 2019

You can include a torch header from any ATen file:

#include <torch/csrc/utils/memory.h>
.

This is possible because @kostmo did some build work on merging libcaffe2 and libtorch so that they are the same library. This will definitely break the ATen standalone library build though, if it still exists and if it hasn't been broken already. I don't know what our plan regarding the standalone is.

@ssnl
Copy link
Collaborator Author

ssnl commented Jul 2, 2019

@zou3519 Oh that's cool. Good to know that. Thanks! :)

@ssnl ssnl requested a review from kostmo July 3, 2019 18:15
@ssnl
Copy link
Collaborator Author

ssnl commented Jul 3, 2019

Hey @kostmo , can you review this? Thank you :)

@ssnl
Copy link
Collaborator Author

ssnl commented Jul 3, 2019

@pytorchbot merge this please :)

@pytorchbot pytorchbot added the merge-this-please Was marked for merge with @pytorchbot merge this please label Jul 3, 2019
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Contributor

ezyang commented Jul 8, 2019

It looks like this diff need some internal build updates.

@ssnl
Copy link
Collaborator Author

ssnl commented Jul 8, 2019

@ezyang Let me know if there is any thing I can help. I am more than happy to!

@ezyang
Copy link
Contributor

ezyang commented Jul 8, 2019

The internal CI error is

caffe2/aten/src/ATen/cuda/detail/CUDAHooks.cpp: included an untracked header: 
caffe2/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h

I'm waiting for my internal source copy to check out and I'll look some more.

@ezyang
Copy link
Contributor

ezyang commented Jul 8, 2019

I got a simple fix. Testing.

@ssnl ssnl deleted the thnvrtc_aten branch July 9, 2019 14:32
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 9, 2019
Summary:
Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking pytorch/pytorch#22229.

`DynamicLibrary` is also moved as it is used in the stub code, and seems general enough.
Pull Request resolved: pytorch/pytorch#22362

Differential Revision: D16131787

Pulled By: ezyang

fbshipit-source-id: add2ee8a8865229578aa00001a00d5a6671e0e73
@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 31d821e.

@ezyang
Copy link
Contributor

ezyang commented Jul 10, 2019

Internally, users are reporting this error:

Error in dlopen or dlsym: libcaffe2_nvrtc.so: cannot open shared object file: No such file or directory

@ssnl
Copy link
Collaborator Author

ssnl commented Jul 10, 2019

@ezyang hmmm I know that internal builds are not using CMake. Maybe some internal build changes are needed for it to build the libcaffe2_nvtrc?

@ezyang
Copy link
Contributor

ezyang commented Jul 10, 2019

So, what I'm kind of thinking right now, is that all I need to do is turn USE_DIRECT_NVRTC back on for the internal build. Because fbcode has its own strategy for lazy loading of CUDA libraries, so the thing you did here isn't necessary in that case. Unfortunately, I didn't actually review this PR, so I am not sure.

@ssnl
Copy link
Collaborator Author

ssnl commented Jul 10, 2019

@ezyang Ah I see. So the fbcode build just puts everything in one library, including the libcaffe2_nvrtc bits. Let me know how I can help with this!

also cc @kostmo who reviewed :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

caffe2 merge-this-please Was marked for merge with @pytorchbot merge this please Merged module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general module: internals Related to internal abstractions in c10 and ATen oncall: jit Add this issue/PR to JIT oncall triage queue open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants