Specify fft ordering #8

Infinoid · 2022-08-22T17:02:37Z

Allow specifying the order that the FFTs will be performed in. This allows optimization based on which data dimensions will be more restricted than others.

Add command line parameters to bench.py to allow specifying which FFT dimensions should be performed before, and after, the distdl repartition.

For example, consider processing all of the spatial dimensions before the repartition, and the time dimension after it.
To do this, pass one of the following to bench.py:

--fft-order-before 2 3 4,
or --fft-order-after 5,
or both: --fft-order-before 2 3 4 --fft-order-after 5

For 4-dimensional data (FFT dimensions 2 through 5), these all mean the same thing. (If one side is omitted, it is calculated as the complement of the other.)

If you don't specify either of these parameters, you get the same default as before: do 4 and 5, then repartition, then do 2 and 3. Since the actual FFTs count downward, this default causes the time dimension to be processed first. If the time dimension has more modes per data element than the other dimensions, this will be inefficient.
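A rough sketch of how the two options could be resolved, including the complement rule and the default. This is not the actual bench.py code: the helper `resolve_fft_order` and the constant `FFT_DIMS` are made up for illustration, and only the two option names come from this PR. It also does not model the order in which the FFTs inside each group are actually executed.

```python
import argparse

# Assumption: the FFT'd dimensions are 2..5 (x, y, z, time); dims 0 and 1 are not transformed.
FFT_DIMS = (2, 3, 4, 5)


def resolve_fft_order(before=None, after=None, fft_dims=FFT_DIMS):
    """Return (before, after), filling in an omitted side as the complement.

    Hypothetical helper, for illustration only.
    """
    if before is None and after is None:
        # Default described above: 4 and 5, then repartition, then 2 and 3.
        return [4, 5], [2, 3]
    if before is None:
        before = [d for d in fft_dims if d not in after]
    elif after is None:
        after = [d for d in fft_dims if d not in before]
    # Each FFT dimension must appear exactly once across the two groups.
    if sorted(list(before) + list(after)) != sorted(fft_dims):
        raise ValueError(
            f"before={before} and after={after} must cover {fft_dims} exactly once"
        )
    return list(before), list(after)


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--fft-order-before", type=int, nargs="+", default=None)
    parser.add_argument("--fft-order-after", type=int, nargs="+", default=None)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(resolve_fft_order(args.fft_order_before, args.fft_order_after))
```

With this sketch, passing only `--fft-order-before 2 3 4`, only `--fft-order-after 5`, or both, all resolve to the same pair of groups.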

The fft dimension settings are added to the benchmark output filenames, so multiple experiments can be run and the results will be kept separate. (These filenames are getting pretty long, though!)

TODO:

Make sure it scales properly on a real system
Update gen_scripts.py job sizes as needed
See if it can handle workload imbalance, where all MPI ranks contribute to the xyz processing but some ranks are unneeded for the time/weight part
Remove debug messages

Infinoid · 2022-09-09T12:23:00Z

Note: this appears to be slower than the default ordering. It will remain a draft PR until we figure out why.

Infinoid · 2022-09-20T12:46:51Z

> Note: this appears to be slower than the default ordering. It will remain a draft PR until we figure out why.

It was slower because the --partition_shape also needs to be updated when you change the FFT ordering. The input partition shape (P_x) should match the P_m layout within the DFNO code; otherwise the data is repartitioned two more times: before the first FFT (R1) and after the last iFFT (R4). When the shapes match, R1 and R4 are no-ops. My mistake was changing the ordering without also updating the partition shape.

So, using this command as an example:

bench.py --input-shape 1 1 128 128 64 1 --modes 8 8 4 4 --num-timesteps 32 --fft-order-after 5

This is the size and modes we get for spatial scaling to 4 nodes on Perlmutter. The ordering parameter tells it to do x,y,z first, then repartition (R2), and then do time. But for this ordering:

--partition_shape 1 1 2 2 1 1 is wrong: P_x and P_m do not match.
--partition_shape 1 1 1 1 1 4 is correct: now R1 and R4 are no-ops.

During my earlier testing, I wasn't thinking of R1 and R4 at all, and R2 and R3 are small compared to those. The ordering parameters work, but the input partitioning needs to be adjusted at the same time.
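A quick sanity check along these lines could catch the mismatch before a run. This is only a sketch under an assumption inferred from the two examples above: that the input partition should not split any dimension that is FFT'd before the repartition. It does not reproduce how the DFNO code actually computes P_m, and the helper name `partitioned_fft_dims` is hypothetical.

```python
def partitioned_fft_dims(partition_shape, fft_before):
    """Return the 'before' FFT dims that the input partition splits across ranks.

    Hypothetical helper. Assumes an empty result means R1 and R4 can be no-ops;
    this is a heuristic inferred from the examples, not DFNO's own P_m logic.
    """
    return [d for d in fft_before if partition_shape[d] != 1]


# --fft-order-after 5 means dims 2, 3, 4 are transformed before the repartition.
fft_before = [2, 3, 4]

print(partitioned_fft_dims([1, 1, 2, 2, 1, 1], fft_before))  # dims 2 and 3 are split
print(partitioned_fft_dims([1, 1, 1, 1, 1, 4], fft_before))  # nothing is split
```

Under the same assumption, the default ordering (4 and 5 first) is compatible with `1 1 2 2 1 1`, which would be consistent with the default looking faster in the earlier tests.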

Specify fft ordering

bc47964
