Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input data is real, not complex #32

Closed
mtazzari opened this issue Jul 6, 2017 · 7 comments
Closed

Input data is real, not complex #32

mtazzari opened this issue Jul 6, 2017 · 7 comments
Milestone

Comments

@mtazzari
Copy link
Owner

mtazzari commented Jul 6, 2017

Now the input image is assumed to be complex: (from libcommon.pyx)
def sample(dcomplex[:,::1] data, dRA, dDec, du, dreal[::1] u, dreal[::1] v)

We assumed this so that the input data already allocated the space needed for FFT operations. However, the image in real space is always real, so at the moment we need to cast the image to complex type before using galario:
sample(ref_complex.astype('complex128'), dRA, dDec, du, u, v)

We could save the casting and transfer time if we copy a real 2d array, rather than complex.
Can we easily implement this? E.g. by:

  • libcommon.pyx: changing to def sample(dreal[:,::1] data, dRA, dDec, du, dreal[::1] u, dreal[::1] v)
  • galario.cpp:
// nx: size of data
dcomplex *data_d;
size_t nbytes = sizeof(dcomplex)*nx*nx;
dreal *data_d_real;
size_t nbytes = sizeof(dreal)*nx*nx;
CCheck(cudaMalloc((void**)&data_d, nbytes));
CCheck(cudaMemcpy(data_d_real, data, nbytes, cudaMemcpyHostToDevice));
kernel real to complex (see stackoverflow question)

kernel unsigned int to float: https://stackoverflow.com/questions/9153861/typecasting-in-cuda-and-cublas

This change would be needed since there is no real sense to have a complex input image.

As a future step: we could switch from using complex to complex to real to complex FFTs, but we did not do that because it changes the mapping and the size of the matrices.

@mtazzari
Copy link
Owner Author

mtazzari commented Jul 6, 2017

@fredRos what do you think? Feasible?

@fredRos
Copy link
Collaborator

fredRos commented Jul 6, 2017

yes, we should do that. I'm profiling the code right now and see a number of things we can improve. This one is high-level and makes perfect sense

@fredRos
Copy link
Collaborator

fredRos commented Jul 7, 2017

It is more work and requires more decisions:

  • should fft2d accept a real image? It could not work in place anymore
  • chi2 and sample don't provide access to the Fourier space image, so here it is easy to do
  • could we fftshift the real image? We could save half of the memory transfer

@mtazzari
Copy link
Owner Author

mtazzari commented Jul 7, 2017

A realistic use-case of galario employs only the sample() and/or the chi2() functions.
The other functions are for those who want to play with galario more in detail (like us) and I think it's fine to keep data as complex for all of them.

For sample() and chi2() functions it is easy to implement the change and I would start with them. In this way, internally nothing should change since we cast from dreal* to dcomplex* at the very beginning, before starting any operation.
For the CUDA version, I would do the casting after dreal* data has been copied to the GPU and then I would feed it to the dcomplex* data_d array that has been initialized with 0 imaginary part.

@mtazzari mtazzari added this to the 1.o milestone Jul 7, 2017
@fredRos
Copy link
Collaborator

fredRos commented Jul 7, 2017

What about shifting the real image? It seems to me like we could do that and only after the shift we'd add the imaginary part, perhaps on the device

@fredRos fredRos mentioned this issue Jul 7, 2017
@fredRos
Copy link
Collaborator

fredRos commented Jul 10, 2017

kernel unsigned int to float: https://stackoverflow.com/questions/9153861/typecasting-in-cuda-and-cublas

I'm following your suggestion now. I tried to do it in place but that would not allow multiple threads to operate concurrently. But then we have to use 50 % extra memory on the GPU to have a real and complex image until the complex image is properly constructed. This may be a problem for users with high-res images and small memory GPUs

Perhaps I can do the profiling to see if it's faster to do the construction on the CPU and then transfer. On all system I have seen so far it is safe to assume that there is more memory on the host available.

@fredRos
Copy link
Collaborator

fredRos commented Jul 10, 2017

fixed by #45

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants