-
Notifications
You must be signed in to change notification settings - Fork 24
WIP: Another try at 2don2d #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This hugely simplifies the building process of pfft. Since the new planners are unlikely to be included in next release of FFTW, it makes sense to just ship in a release the patched version of FFTW that is known to work with PFFT. touch fftw3/Changelog
|
Dear Yu,
I will have a look at it as soon as possible. Hopefully, I find some time
at the weekend.
Best regards
Michael
|
|
Thanks! I strong suspect it is because I don't quite know what different 'transposed' flags really mean. While the 2don2d decomposition is probably not useful dealing with 2d data, if the 2d data is from a projection of 3don2d data it can hugely simply downstream applications. |
|
Dear Yu, I did a rebase of your branch on top of PFFT master (I just skipped the FFTW-include for the moment since I have to test it separately). Have a look at the new branch rebase_2don2d. The last commit fixes the order of input and output arrays in the local transforms. This must be different to 3dto2d remap, since we skip one global remap. You also had a copy paste and planned a serial trafo twice. Hope this fixes your issues. I just did some quick tests with weird unequal blocks sizes. Feel free to test it more deliberately. |
|
We still have to check, whether all the flags are supported in the right way, e.g., DESTROY_INPUT, PRESERVE_INPUT and so on. I also think, that we do not have to use 2 local transposes like in the 3dto2d case. I will think about a simplification. n0/p0 x n1/p1 -> n0/(p0 x p1) x n1 should go directly with only one global remap. |
|
Thanks! That k += error was a shame!
I indeed suspected there must be a simpler way, but I am not sufficiently
equipped to work it out..
I'll rebuild and add this to the python binding for some testing about all
parameters in the coming days -- I believe almost all flags are tested by
the script in the python binding.
…On Mon, Jan 29, 2018 at 3:02 PM, Michael Pippig ***@***.***> wrote:
We still have to check, whether all the flags are supported in the right
way, e.g., DESTROY_INPUT, PRESERVE_INPUT and so on. I also think, that we
do not have to use 2 local transposes like in the 3dto2d case. I will think
about a simplification. n0/p0 x n1/p1 -> n0/(p0 x p1) x n1 should go
directly with only one global remap.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#31 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAIbTBjToWKcgzLvzjmW55eSZtnDO8yIks5tPk3ygaJpZM4RjWWO>
.
|
But the result is garbage.
however the round trip is correct. So looks like we are doing some transpose incorrectly but consistently.
But without it the single tranform was wrong anyways.
…it data, not part of the pfft kernel)
The old head breaks pfft-python everywhere with a crash in local_size_remap_nd_transposed
|
I added a minor fix. I can confirm that currently it writes all zeros if |
|
Dear Yu, what is the status of this issue. Did you do some more work on this? Do you need some more help? |
|
Sorry for being away so long. The FPE error is gone for whatever reason. Here is the matrix of fails and passes. It appears whenever the input is not destroyed the output is wrong; we are very close to it. |
|
Seems like we are relying on the side effect on the first argument(in) of 'sertrafo' in (does sertrafo modify the input?) If that's the case then there is no way we can preserve the input values without modifying sertrafo. |
|
The number of failures changes from run to run. Looks like the only 'safe' combination is PDFFT_DESTROY_INPUT and avoid PFFT_PADDED. |
This fixed all cases but the PADDED r2C and likely the shifted transforms. The problem with the padded r2c transforms is that somehow the last axis local_ni is not padded. It should be a trivial fix? I am lost here, because the size is actually computed outside the remap routines by partrafo.c The routines looked generic enough to support any decompositions, including 2d on 2d and 3d on 3d.
|
Some pretty big progress are made. Now the trouble is in the calculation of local_ni of padded r2c / c2r in 2d on 2d mode: |
|
Actually currently 3don3d fails on padded r2c as well. |
It appears that we shall prefer out of place global_transp as much as possible. The FFTW_PRESERVE_INPUT flag is added to signify that the transform must be preserving input; it should have been added automatically. Also expanding the code branches to make it easier to parse what is actually done in different cases.
|
OK. I think this is PR pretty much done. The 2don2d support is now as good as the 3don3d support and covers sufficient number of cases to make it useful. Here is the latest output of the roundtrip script. @mpip do you want to run more extensive test cases before merging this? |
This PR supersedes #30.
Some progress are made but I am currently stuck. @mpip Could you take a look at this?
The idea is to transpose n0 / p0 x n1 / p1 to n0 (p0 * p1) x n1. I
followed the 3dto2d example to do three steps:
It sounds easy enough, but currently the implementation is buggy, and I cannot locate the problem.
The main file that implements this is in remap_2dto1d.c. I added a simple/ugly interface in remap.c to dispatch to remap_3dto2d or remap_2dto1d depending on rnk_n. This can be improved later once we get the code working correctly.
I played with the tests simple_test_c2c_2don2d.c:
I initially suspected it was the new array interface; so I did some name clean up to clarify the new array logic. Now I think it is unlikely related.
I checked the 3dto2d appear to be consistent when I change the number of ranks and use a variety of combinations. So it is likely correct (I haven't compared with a single rank transform).
PS: I was working off my branch, where the first few commits bundles fftw; these can be removed later -- it is easier to work with on my workstation where there is not mpi enabled pfft system wide.