Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the state structs accessible directly from C. #21

Closed
Jorghi12 opened this issue Jul 10, 2018 · 20 comments
Closed

Make the state structs accessible directly from C. #21

Jorghi12 opened this issue Jul 10, 2018 · 20 comments

Comments

@Jorghi12
Copy link

Jorghi12 commented Jul 10, 2018

You can take a look at CUDA headers and you'll see that state is designed as a struct: https://github.com/Geof23/Gklee/blob/master/Gklee/include/cuda/curand_mtgp32.h#L194

rocRAND however has its state defined as a class with member functions. This is not supported in C and thus it's not possible to obtain C style linkage.

Why is this important? Many deep learning frameworks interface with Python and C via Python C Extensions.

I suggest two possible solutions would be either

  1. Add a constructor to the mtgp32_engine class that takes in a hiprandStateMtgp32 object, allowing conversions from hiprandStateMtgp32 to mtgp32_engine.

OR

  1. Change the mtgp32_engine class to a C style struct by removing the constructors & member functions, and instead make these free functions (i.e. global scope).
@Jorghi12 Jorghi12 changed the title MTGP State is missing forward declarations Forward declarations for MTGP State are missing. Jul 10, 2018
@Jorghi12 Jorghi12 changed the title Forward declarations for MTGP State are missing. rocRAND needs forward declarations that support C name mangling Jul 11, 2018
@Jorghi12 Jorghi12 changed the title rocRAND needs forward declarations that support C name mangling rocRAND should have forward declarations that support C name mangling Jul 11, 2018
@jszuppe
Copy link
Contributor

jszuppe commented Jul 11, 2018

As far as I know name mangling applies to C++ (which has namespaces, overloading etc.), not to C, I guess you meant disabling name mangling. I think I know what is the basic requirement here. You want to be able to compile hipandMakeMTGP32KernelState and hiprandMakeMTGP32Constants functions in C, right?

Edit: You may also have a C++ function with C API which takes curandStateMtgp32 as input parameter or returns it. In that case you don't need those functions, only the struct. I guess that's what your are looking for.

Thus, you'll be able to apply extern C and obtain C style linkage.

Yes, I guess user can add extern "C" around #include curand_mtgp32_host.h.

Why is this important? Many deep learning frameworks interface with Python and C via Python C Extensions.

Can you provide test case or a link to a library which does that?

Add a constructor to the mtgp32_engine class that takes in a hiprandStateMtgp32 object, allowing conversions from hiprandStateMtgp32 to mtgp32_engine.

You can notice that in rocRAND/hipRAND there is no hiprandStateMtgp32, there is only hiprandStateMtgp32_t (and that is true for every state).

There are two reasons behind this: 1. you can't write struct hiprandStateMtgp32 if hiprandStateMtgp32 is an alias. hipRAND is a wrapper for cuRAND and rocRAND and not using aliases (typedefs) is troublesome, 2. cuRAND uses hiprandState<>_t everywhere anyway. In other words, even if we adjust it to work in C only hiprandStateMtgp32_t will be available and you won't be able to write struct hiprandStateMtgp32 (struct hiprandStateMtgp32_t would obviously be wrong too).

@jszuppe
Copy link
Contributor

jszuppe commented Jul 11, 2018

You can take a look at CUDA headers and you'll see that state is designed as a struct.

Yes, but only curandStateMtgp32 and curandStatePhilox4_32_10 can be used in C API, because other states are declared in curand_kernel.h which has C++-style functions (overloading). I guess it's more like a coincident than a deliberate decision. Anyway, I guess it's needed for porting something, so it's not like you're writing your own code. We will look into that.

btw. curandStateMtgp32[_t] and hiprandStateMtgp32_t types should be treated as opaque types. Accessing fields is incorrect and won't work as rocRAND and cuRAND states have different fields.

@Jorghi12
Copy link
Author

Jorghi12 commented Jul 11, 2018

@jszuppe From what I'm seeing, curandStateMtgp32 and curandStatePhilox4_32_10 are not the only states that can be used in the C API.

All of them can interleave nicely with the C API. Take a look below.

curandStateXORWOW: https://github.com/cheichler/RemoteRendering/blob/master/inc/curand_kernel.h#L122

curandStateMRG32k3a:
https://github.com/cheichler/RemoteRendering/blob/master/inc/curand_kernel.h#L186

curandStateSobol32:
https://github.com/cheichler/RemoteRendering/blob/master/inc/curand_kernel.h#L209

curandStateScrambledSobol32:
https://github.com/cheichler/RemoteRendering/blob/master/inc/curand_kernel.h#L227

curandStateSobol64:
https://github.com/cheichler/RemoteRendering/blob/master/inc/curand_kernel.h#L245

curandStateScrambledSobol64:
https://github.com/cheichler/RemoteRendering/blob/master/inc/curand_kernel.h#L263

Example:
Here's an example from 2017's PyTorch ROCm port. https://github.com/ROCmSoftwarePlatform/cutorch_hip/blob/7ca43279783513d1084a422d23d73c802ed825b7/lib/THC/MTGP/hiprand_mtgp32.h#L30

Notice that the old version of hiprand successfully followed CUDA by allowing forward declaring the HipRandStateMtgp32.

@jszuppe
Copy link
Contributor

jszuppe commented Jul 11, 2018

In that header you have functions like curand() that has multiple overloads. C compiler should complain about that.

@ex-rzr
Copy link
Contributor

ex-rzr commented Jul 11, 2018

This header also includes templates.

@Jorghi12
Copy link
Author

Jorghi12 commented Jul 11, 2018

@jszuppe So essentially, you'd create a header like this where only the structs are defined. https://github.com/Geof23/Gklee/blob/master/Gklee/include/cuda/curand_mtgp32.h

Hope this makes sense!

Edit:

So let me explain why the headers have C++ code :)

Check this out: https://github.com/Geof23/Gklee/blob/master/Gklee/include/cuda/curand_kernel.h#L86

#define QUALIFIERS static __forceinline__ __device__

As we both know, NVCC compiler parses out the kernel code and the host code and sends them to different compilers. All the C++ style code you both were talking about had QUALIFIERS prepended it, so they were device code. The NVCC compiler would not include such code when sending it to the host compiler (which could be a C compiler). If you pay attention, the structs don't have QUALIFIERS prepended to them, so they're host code.

In our case, we should use IF/DEFs in order to determine whether a block of code should be compiled or not. You'll see that this practice is already been performed correctly in the HIP API.

The way CUDA's API works is that you should be able to interoperate with all of their APIs (curand, e.g.) entirely through the C language, so it's not a coincidence at all that the states are defined as struct objects.

@Jorghi12 Jorghi12 changed the title rocRAND should have forward declarations that support C name mangling Make the state structs accessible directly from C. Jul 11, 2018
@jszuppe
Copy link
Contributor

jszuppe commented Jul 12, 2018

If you have something like void foo(curandStateXORWOW); in your C API in header file, then you have to include curand_kernel.h in that header, so programs that uses that header knows what it is. That you can't do, because compilation errors. And that's what I meant when I said that only 2 state structs can be used in C. Also, at that point I didn't know how you want to use struct state in C.

So let me explain why the headers have C++ code :)

You can do something like THC_API struct curandStateMtgp32 * THCRandom_generatorStates(struct THCState* state);, because struct curandStateMtgp32 in that expression is at the same time a forward declaration. From perspective of THCRandom_generatorStates's users curandStateMtgp32 is an opaque type. That's why you're right you can make it work in C API, but it has nothing to do with how nvcc works (especially because files that uses your library and C API don't have to be compiled with nvcc). btw. IIRC, nvcc treats C code like C++ code (unless that changed) when it's in .cu file.

Like I said before, it [forward declaration of hiprandStateMtgp32] won't be possible with hipRAND, because we use typedefs and you can't forward declare a typedef (when the struct is not defined). We would need different solution (either replace typedefs with real structs, or different C/C++ magic).

@jszuppe
Copy link
Contributor

jszuppe commented Jul 12, 2018

Anyway, now that I know what you need I'll look into it and try to find for the simplest solution that will enable you to write THC_API struct hiprandStateMtgp32 * THCRandom_generatorStates(struct THCState* state); in C API header. It's possible that for now it will be limited to hiprandStateMtgp32.

@Jorghi12
Copy link
Author

@jszuppe Thanks. Yeah I agree. The usage of typedefs makes forward declaring really difficult here. A good solution would require a spoonful of creativity : )

(And yeah NVCC will compile the host code using a C++ compiler, but since the header only had structs on host side, it would interleave nicely with C. This helps turn the opaque type hiprandStateMtgp32 into a real type.)

jszuppe pushed a commit that referenced this issue Jul 13, 2018
@jszuppe
Copy link
Contributor

jszuppe commented Jul 13, 2018

@Jorghi12 Can you try branch fix_21_and_test?

In test/extra you can see what works (you don't have to compile and run it if you don't want to). Anyway, rand_kernel_lib.cpp is compiled using hcc to a dynamic library, it has C API similar to what you have in pytorch. Then you can do gcc test_rand_kernel_lib.c -Wl,-rpath,../../build/test/extra/ -L../../build/test/extra/ -lrand_kernel_lib -o test_rand_kernel_lib and run ./test_rand_kernel_lib to see that it works.

It also includes fixes for other issues, and that's why now you have to add CXX=hcc before cmake ../. (see https://github.com/ROCmSoftwarePlatform/rocRAND/tree/fix_21_and_test#build-and-install).

@Jorghi12
Copy link
Author

@jszuppe Awesome! I'll give this a go.

@jszuppe
Copy link
Contributor

jszuppe commented Jul 16, 2018

When you confirm it works for you, I'll push this and other changes into develop branch. I hope to do a new release with ROCm 1.8.2.

@iotamudelta
Copy link
Contributor

@jszuppe ROCm 1.8.2 would be a great target but unlikely unless you notify releng here about this. If it works, I'd be in favor since PyTorch depends on it. Thanks!

@iotamudelta
Copy link
Contributor

@jszuppe just talked to releng. If you merge to master and let them know today that it is in, we can make the window.

@jszuppe
Copy link
Contributor

jszuppe commented Jul 16, 2018

@iotamudelta Ah, I didn't know ROCm 1.8.2 will be released so soon (this week?). I'm sure I won't be able to merge this into master today. I need to get confirmation from @Jorghi12 that it works, merge it to develop, run tests on ROCm and CUDA machines etc., it's almost 10pm in Poland.

@iotamudelta
Copy link
Contributor

@jszuppe we are in the release candidates. what is a realistic time line for landing in master from your side? I can check if we can make it work.

@jszuppe
Copy link
Contributor

jszuppe commented Jul 16, 2018

I can do that tomorrow, before 2pm CEST. But most likely without #22. We also have plans to remove some very minor warnings on CUDA environment, that would also be excluded.

@iotamudelta
Copy link
Contributor

#22 is not a blocker from what I can see. and thanks - will forward.

@iotamudelta
Copy link
Contributor

iotamudelta commented Jul 16, 2018

@jszuppe wrote this earlier but it vanished - I've done some limited testing and w/ changes to my PyTorch it compiles w/ this rocRAND branch. I should say that my PyTorch strips the struct specifier and relies on the typedef fix - see #23 . But that alone is worth releasing.

@Jorghi12
Copy link
Author

Jorghi12 commented Jul 17, 2018

@jszuppe The branch works. Feel free to merge to master 👍 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants