Specify fft ordering #8

Draft
wants to merge 1 commit into
base: main
from


Infinoid
Contributor

Allow specifying the order in which the FFTs are performed. This makes it possible to optimize based on which data dimensions are more restricted than others.

Add command line parameters to bench.py to allow specifying which FFT dimensions should be performed before, and after, the distdl repartition.

For example, consider processing all of the spatial dimensions before the repartition and the time dimension after it.
To do this, pass one of the following to bench.py:

  • --fft-order-before 2 3 4,
  • or --fft-order-after 5,
  • or both: --fft-order-before 2 3 4 --fft-order-after 5

For 4-dimensional data, these all mean the same thing. (If one side is omitted, it will be calculated as the complement of the other.)

If you don't specify either of these parameters, you get the same default as before: do 4 and 5, then repartition, then do 2 and 3. Since the actual FFTs count downward, this default causes the time dimension to be processed first. If the time dimension has more modes per data element than the other dimensions, this will be inefficient.
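The complement rule and the default described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper names `resolve_fft_order` and `parse_args` are hypothetical, and it assumes the transformable dimensions are 2 through 5, as in the examples. Only the two flag names come from the description.

```python
import argparse

# Assumption: dimensions 2..5 are the ones that get an FFT (x, y, z, time).
FFT_DIMS = (2, 3, 4, 5)
# Default from the description: 4 and 5 before the repartition, 2 and 3 after.
DEFAULT_BEFORE = (4, 5)
DEFAULT_AFTER = (2, 3)


def resolve_fft_order(before=None, after=None, fft_dims=FFT_DIMS):
    """Return (before, after), filling in an omitted side as the complement.

    The complement is returned in ascending order; the order in which the
    FFTs inside each group are actually executed is not modeled here.
    """
    if before is None and after is None:
        return list(DEFAULT_BEFORE), list(DEFAULT_AFTER)
    if before is None:
        before = [d for d in fft_dims if d not in after]
    elif after is None:
        after = [d for d in fft_dims if d not in before]
    # Every FFT dimension must appear exactly once across the two groups.
    if sorted(list(before) + list(after)) != sorted(fft_dims):
        raise ValueError(
            f"before={list(before)} and after={list(after)} must cover {list(fft_dims)} exactly once"
        )
    return list(before), list(after)


def parse_args(argv=None):
    """Parse only the two ordering flags (hypothetical stand-in for bench.py)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--fft-order-before", type=int, nargs="+", default=None)
    parser.add_argument("--fft-order-after", type=int, nargs="+", default=None)
    return parser.parse_args(argv)
```

Under these assumptions, `--fft-order-before 2 3 4`, `--fft-order-after 5`, and both flags together resolve to the same pair of lists, and passing neither flag yields the default split.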

The FFT dimension settings are added to the benchmark output filenames, so multiple experiments can be run and their results will be kept separate. (These filenames are getting pretty long, though!)

TODO:

  • Make sure it scales properly on a real system
  • Update gen_scripts.py job sizes as needed
  • See if it can handle workload imbalance, where all MPI ranks contribute to the xyz processing but some ranks are unneeded for the time/weight part
  • Remove debug messages

@Infinoid
Contributor Author

Infinoid commented Sep 9, 2022

Note: this appears to be slower than the default ordering. It will remain a draft PR until we figure out why.

@Infinoid
Contributor Author

> Note, this appears to be slower than the default ordering, it will remain a draft PR until we figure out why.

It was slower because --partition_shape also needs to be updated when you change the FFT ordering. The input partition shape (P_x) should match the P_m layout within the DFNO code; otherwise the data is repartitioned two more times, once before the first FFT (R1) and once after the last iFFT (R4). When the shapes match, R1 and R4 are no-ops. I had left the shapes mismatched, and that's what I was doing wrong.

So, using this command as an example:

bench.py --input-shape 1 1 128 128 64 1 --modes 8 8 4 4 --num-timesteps 32 --fft-order-after 5

These are the size and modes we get for spatial scaling to 4 nodes on Perlmutter. The ordering parameter tells it to do x, y, z first, then repartition (R2), and then do time. But for this ordering:

  • --partition_shape 1 1 2 2 1 1 is wrong: P_x and P_m do not match
  • --partition_shape 1 1 1 1 1 4 is correct: now R1 and R4 are no-ops.
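One way to catch this kind of mismatch early is a quick sanity check on the arguments. The sketch below is not part of this PR, and `fft_dims_are_local` is a hypothetical helper. It rests on an assumption that is consistent with the two examples above but is not taken from the DFNO code: that P_m keeps every dimension transformed before the repartition un-partitioned (size 1), so a matching P_x must do the same.

```python
def fft_dims_are_local(partition_shape, fft_order_before):
    """Return True if no dimension transformed before the repartition is split.

    Assumption (not verified against DFNO): P_m has size 1 along every
    dimension listed in fft_order_before, so P_x can only match it if the
    input partition is also 1 along those dimensions. Dimensions are
    0-indexed, matching the 2..5 numbering used by the ordering flags.
    """
    return all(partition_shape[d] == 1 for d in fft_order_before)


# --fft-order-after 5 means x, y, z (dims 2, 3, 4) are done before the repartition.
order_before = [2, 3, 4]

wrong = (1, 1, 2, 2, 1, 1)  # splits x and y, which are transformed first
right = (1, 1, 1, 1, 1, 4)  # splits only the time dimension

print(fft_dims_are_local(wrong, order_before))  # False
print(fft_dims_are_local(right, order_before))  # True
```

This is only a necessary condition under the stated assumption; it does not check that the partition sizes along the remaining dimensions agree with P_m.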

During my earlier testing, I wasn't thinking of R1 and R4 at all, and R2 and R3 are small compared to those. The ordering parameters work, but the input partitioning needs to be adjusted at the same time.
