Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong results for inverse transformation for 2D batches #273

Closed
antholzer opened this issue Nov 27, 2019 · 9 comments
Closed

Wrong results for inverse transformation for 2D batches #273

antholzer opened this issue Nov 27, 2019 · 9 comments
Assignees

Comments

@antholzer
Copy link

What is the expected behavior

  • Inverse should be original times a constant.

What actually happens

If I transform a 4x4x4 array, where the transformation should be applied to the 2. and 3. dimension, the forward transformation is correct but the inverse is not.
I have the following parameters for rocfft_plan_description:
strides = {4, 16} and distance = 1. Are these correct or am I missing something?

However, applying the transformations to first and second dimension (with strides={1,4} and distance=16) works.

How to reproduce

  • this gist gives Maximum error: 9.13869 but should be close to 0.

Environment

Hardware description
GPU gfx803
CPU Threadripper 1950X
Software version
ROCK v2.10
ROCR v1.1
HCC v2.10
Library v0.9.8

Used the rocm docker image with kernel v5.3.12.

@antholzer antholzer changed the title Problem with inverse transformation for 2D batches Wrong results for inverse transformation for 2D batches Dec 3, 2019
@antholzer
Copy link
Author

The example with hipPlanMany works i.e. if one calculates the inverse with the same plan settings, the error is small. Changing the sample as follows results in a wrong result.

--- clients/samples/hipfft/hipfft_planmany_2d_z2z.cpp	2019-12-03 17:33:03.676271591 +0100
+++ clients/samples/hipfft/hipfft_planmany_2d_z2z.cpp	2019-12-03 17:32:05.728933650 +0100
@@ -30,21 +30,21 @@
         << "hipfft 2D double-precision complex-to-complex transform using advanced interface\n";
 
     int rank    = 2;
-    int n[2]    = {4, 5};
-    int howmany = 3;
+    int n[2]    = {4, 4};
+    int howmany = 4;
 
     // array is contiguous in memory
-    int istride = 1;
+    int istride = 4;
     // in-place transforms require istride=ostride
     int ostride = istride;
 
     // we choose to have no padding around our data:
-    int inembed[2] = {istride * n[0], istride * n[1]};
+    int inembed[2] = {n[0], n[1]};
     // in-place transforms require inembed=oneembed:
     int onembed[2] = {inembed[0], inembed[1]};
 
-    int idist = inembed[0] * inembed[1];
-    int odist = onembed[0] * onembed[1];
+    int idist = 1;
+    int odist = 1;
 
     std::cout << "n: " << n[0] << " " << n[1] << "\n"
               << "howmany: " << howmany << "\n"
@@ -54,7 +54,8 @@
               << "idist: " << idist << "\todist: " << odist << "\n"
               << std::endl;
 
-    std::vector<std::complex<double>> data(howmany * idist);
+    std::vector<std::complex<double>> data(4*4*4);
+    std::vector<std::complex<double>> cdata(4*4*4);
     const auto total_bytes = data.size() * sizeof(decltype(data)::value_type);
 
     std::cout << "input:\n";
@@ -96,9 +97,11 @@
     hipMemcpy(d_in_out, (void*)data.data(), total_bytes, hipMemcpyHostToDevice);
 
     result = hipfftExecZ2Z(hipPlan, d_in_out, d_in_out, HIPFFT_FORWARD);
+    result = hipfftExecZ2Z(hipPlan, d_in_out, d_in_out, HIPFFT_BACKWARD);
 
-    hipMemcpy((void*)data.data(), d_in_out, total_bytes, hipMemcpyDeviceToHost);
+    hipMemcpy((void*)cdata.data(), d_in_out, total_bytes, hipMemcpyDeviceToHost);
 
+    const double s = 1.0/(n[0]*n[1]);
     std::cout << "output:\n";
     for(int ibatch = 0; ibatch < howmany; ++ibatch)
     {
@@ -108,7 +111,7 @@
             for(int j = 0; j < onembed[1]; j++)
             {
                 const auto pos = ibatch * odist + i * onembed[1] + j;
-                std::cout << data[pos] << " ";
+                std::cout << s*cdata[pos] << " ";
             }
             std::cout << "\n";
         }
@@ -116,5 +119,16 @@
     }
     std::cout << std::endl;
 
+    double err = 0.0;
+    for (int i = 0; i < data.size(); i++)
+    {
+        double ierr = std::abs(s*cdata[i] - data[i]);
+        if (ierr > err)
+        {
+            err = ierr;
+        }
+    }
+    std::cout << "max error: " << err << std::endl;
+
     hipFree(d_in_out);
 }

@malcolmroberts malcolmroberts self-assigned this Dec 3, 2019
@malcolmroberts
Copy link
Contributor

It looks like you're changing a lot of the parameters; some combinations don't make sense, and can't produce correct results. Could you write down the input and output strides and distances along with your problem size, please?

@antholzer
Copy link
Author

Sure.
problem size: 4x4x4
strides: 4, 16
distance: 1

I am only trying complex transforms and thus input and outputs strides are equal.

@malcolmroberts
Copy link
Contributor

Is the transform in-place or out-of-place? Just want to make sure it's not a duplicate of #270

@antholzer
Copy link
Author

The transformation is in-place and the forward transform seems to be giving correct results.

@malcolmroberts
Copy link
Contributor

So, one thing that might be causing confusion is that hipFFT is row-major ("C-style) and rocFFT is column-major ("FORTRAN-style"). So a hipFFT transform has lengths {nx, ny}, and the equivalent rocFFT transform has lengths {ny, nx}. The strides also have to be reversed.

This may explain why hipFFT is working but rocFFT isn't.

@antholzer
Copy link
Author

Sorry, if my 2nd comment was confusing, but both are not working.
But you are right that the results are different between roc- and hipFFT. However, the inverse should be the input times a scaling factor.

To clarify:
rocFFT is working for:
strides: 1 4
distance: 16

For
strides: 4, 16
distance: 1
I get the right result for the forward transform (I tested it using FFTW in julia, which is column major), but the inverse is not equal to the original (times a constant).

@malcolmroberts
Copy link
Contributor

Thanks for the info; I am making some time to look into this issue.

This may be solved by a commit that is under review. Could you try

https://github.com/malcolmroberts/rocFFT/tree/fix_transpose_for_1D_stride

and see if this solves your problem?

@malcolmroberts
Copy link
Contributor

Closing due to inactivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants