# Ongoing work


### Signal processing - Fast Fourier Transforms (FFT)

You might have noticed that Heat 1.4.0 has not been released yet. This section shows a new feature that is available in our custom kernel but not in the latest stable release, v1.3.1.

New in v1.4.0 is the `ht.fft` module. This is a crucial feature for many communities. In particular, we want to support the radio interferometry community in scaling existing processing pipelines to ever larger datasets.

Let's try 2-dimensional FFTs on a large data cube. If the transform axes do not include the split axis of the DNDarray, the operation relies entirely on process-local `torch.fft` operations and requires no communication.

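```python
%%px
import heat as ht

# Random 3-D cube distributed along axis 0 (split=0) and stored on the GPUs
a = ht.random.randn(4000, 400, 400, split=0, device="gpu")
# fft2 transforms the last two axes (1, 2) by default, so the split axis 0 is not involved
fft_2d = ht.fft.fft2(a)
```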
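To see why no communication is needed in this case, the NumPy-only sketch below simulates a cube split along axis 0 across a hypothetical number of processes, using `np.array_split` as a stand-in for the distribution. Each simulated process transforms its own slab over axes (1, 2), and the concatenated result matches the global `np.fft.fft2`. This models the idea only; it is not Heat's implementation, and the helper name is hypothetical.

```python
import numpy as np

def local_fft2_on_slabs(cube, nprocs):
    """Simulate fft2 over the last two axes of a cube split along axis 0.

    Each slab stands in for the local chunk of one process. Because the
    transform axes (1, 2) exclude the split axis 0, every slab can be
    transformed independently, with no data exchange between "processes".
    """
    slabs = np.array_split(cube, nprocs, axis=0)  # 1-D (slab) decomposition
    local_results = [np.fft.fft2(slab, axes=(1, 2)) for slab in slabs]
    return np.concatenate(local_results, axis=0)

rng = np.random.default_rng(0)
cube = rng.standard_normal((40, 16, 16))  # small stand-in for the 4000x400x400 cube
distributed = local_fft2_on_slabs(cube, nprocs=4)
reference = np.fft.fft2(cube)  # default axes=(-2, -1), i.e. (1, 2) here
print(np.allclose(distributed, reference))  # True
```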

The `ht.fft` module provides the functionality users expect from `numpy.fft`, powered by `torch.fft` and, when a transform involves the split axis, by Heat's communication infrastructure. This lets researchers perform Fourier transforms on data cubes that are too large to be transformed on a single node.
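When a transform axis *is* the split axis, data must move between processes. One textbook way to handle this is the transpose (all-to-all) method: transform the locally complete axis, redistribute so the other axis becomes local, then transform it. The NumPy sketch below simulates this for a 2-D array split along axis 0; the helper names are hypothetical, and this is not necessarily how `ht.fft` works internally.

```python
import numpy as np

def alltoall_resplit(row_slabs):
    """Simulate an MPI all-to-all turning row slabs (split=0) into column slabs (split=1).

    Rank r cuts its row slab into len(row_slabs) column blocks and "sends" block s
    to rank s; rank s stacks the blocks it receives in rank order.
    """
    nprocs = len(row_slabs)
    blocks = [np.array_split(slab, nprocs, axis=1) for slab in row_slabs]
    return [np.concatenate([blocks[r][s] for r in range(nprocs)], axis=0)
            for s in range(nprocs)]

def distributed_fft2(matrix, nprocs):
    """2-D FFT of a matrix split along axis 0, using the transpose method."""
    row_slabs = np.array_split(matrix, nprocs, axis=0)
    # Step 1: axis 1 is complete on every process, so transform it locally.
    row_slabs = [np.fft.fft(slab, axis=1) for slab in row_slabs]
    # Step 2: the only communication step, redistributing so axis 0 becomes local.
    col_slabs = alltoall_resplit(row_slabs)
    # Step 3: axis 0 is now complete on every process, so finish the transform locally.
    return [np.fft.fft(slab, axis=0) for slab in col_slabs]  # result is split along axis 1

rng = np.random.default_rng(1)
m = rng.standard_normal((12, 10))
result = np.concatenate(distributed_fft2(m, nprocs=3), axis=1)
print(np.allclose(result, np.fft.fft2(m)))  # True
```

The cost of this approach lies almost entirely in step 2, which is why keeping the transform axes away from the split axis, as in the example above, is the cheaper case.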

FFT-based 2D convolution will also become available in the future, extending Heat's signal processing capabilities.
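FFT-based convolution rests on the convolution theorem: a convolution in the spatial domain becomes a pointwise product in the frequency domain. The NumPy sketch below is not a Heat API (both function names are hypothetical); it checks the FFT route against a direct summation for real-valued 2-D inputs.

```python
import numpy as np

def direct_conv2d(x, k):
    """Full 2-D linear convolution by explicit summation (slow reference)."""
    out = np.zeros((x.shape[0] + k.shape[0] - 1, x.shape[1] + k.shape[1] - 1))
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            # Add a copy of x shifted by (i, j) and weighted by k[i, j].
            out[i:i + x.shape[0], j:j + x.shape[1]] += k[i, j] * x
    return out

def fft_conv2d(x, k):
    """Full 2-D linear convolution via the convolution theorem."""
    shape = (x.shape[0] + k.shape[0] - 1, x.shape[1] + k.shape[1] - 1)
    # Zero-padding both inputs to the full output size turns the FFT's
    # circular convolution into a linear one.
    X = np.fft.rfft2(x, s=shape)
    K = np.fft.rfft2(k, s=shape)
    return np.fft.irfft2(X * K, s=shape)

rng = np.random.default_rng(2)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((5, 5))
print(np.allclose(fft_conv2d(image, kernel), direct_conv2d(image, kernel)))  # True
```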

Other important updates that will be merged soon:
- Batch-parallel k-means and k-medians
- Fully distributed item selection and assignment
- Optimized QR decomposition

Coming up (hopefully) in v1.5.0:
- Optimized Dataloader for distributed deep learning 

### Known issues

- Only slab parallelism (data decomposition along a single dimension) is supported, which limits many algorithms
- Some low-level operations are slow and need refactoring
- Full exploitation of PyTorch 2 capabilities is still pending
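In a slab decomposition only one axis is partitioned. The pure-Python sketch below computes per-process chunk shapes under an assumed balanced scheme (the first `size % nprocs` ranks get one extra element, as `np.array_split` does); `slab_local_shapes` is a hypothetical helper. The second call shows one way this becomes limiting: a short split axis leaves some processes empty, while each busy process still holds the remaining axes in full.

```python
def slab_local_shapes(shape, split, nprocs):
    """Local chunk shapes for a slab (1-D) decomposition along axis `split`.

    Assumption: a balanced split where the first `size % nprocs` ranks hold one
    extra element. Only `split` is partitioned; every other axis stays whole.
    """
    base, rem = divmod(shape[split], nprocs)
    shapes = []
    for rank in range(nprocs):
        local = list(shape)
        local[split] = base + (1 if rank < rem else 0)
        shapes.append(tuple(local))
    return shapes

print(slab_local_shapes((4000, 400, 400), split=0, nprocs=3))
# [(1334, 400, 400), (1333, 400, 400), (1333, 400, 400)]
print(slab_local_shapes((3, 1_000_000), split=0, nprocs=4))
# [(1, 1000000), (1, 1000000), (1, 1000000), (0, 1000000)]
```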

### Dos and Don'ts

In this section we address a few best practices for programming with Heat. While we cannot cover every issue, these are the main pointers on how to get reasonable performance.

**Dos**

* Split large data
    * typically your input data set, along the observations/samples dimension
    * large intermediate matrices
* Use the Heat API
    * computational kernels are optimized
    * Python constructs (e.g. loops) tend to be slow
* If memory allows, keep copies of certain data with different splits
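The gap between whole-array calls and Python loops is easiest to see in a small NumPy sketch, used here as a stand-in because Heat follows the NumPy API closely (e.g. `ht.mean`, `ht.std`). Both hypothetical functions compute column-wise z-scores with the same result; only the vectorized one hands the work to optimized kernels, while the loop version issues one small operation per element.

```python
import numpy as np

def standardize_loop(x):
    """Column-wise z-score with explicit Python loops (slow pattern)."""
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        col = x[:, j]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        for i in range(x.shape[0]):
            out[i, j] = (x[i, j] - mean) / var ** 0.5
    return out

def standardize_vectorized(x):
    """Same computation as whole-array operations (fast pattern)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

rng = np.random.default_rng(3)
data = rng.standard_normal((50, 4))
print(np.allclose(standardize_loop(data), standardize_vectorized(data)))  # True
```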

**Don'ts**

* Copy data extensively, e.g. through
    * operations with operands of different splits (except `None`)
    * `reshape()` calls that actually change the array dimensions (adding extra dimensions of size 1 is fine)
* Execute everything on the GPU
    * computation-intensive operations are usually a good fit
    * memory-bound operations (e.g. sorting) are not
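A rough way to see why mixed splits and dimension-changing reshapes are expensive is to count how many elements end up on a different process. The pure-Python sketch below does this under an assumed balanced distribution (first `size % nprocs` ranks get one extra element) and row-major order; the helpers are hypothetical and model data ownership only, not Heat's actual communication.

```python
import math

def owner(index, size, nprocs):
    """Rank owning position `index` of an axis of length `size` (balanced split assumed)."""
    base, rem = divmod(size, nprocs)
    boundary = rem * (base + 1)  # first index past the longer chunks
    if index < boundary:
        return index // (base + 1)
    return rem + (index - boundary) // base

def moved_by_resplit(shape, nprocs):
    """Elements that change rank when a 2-D array goes from split=0 to split=1."""
    rows, cols = shape
    return sum(owner(i, rows, nprocs) != owner(j, cols, nprocs)
               for i in range(rows) for j in range(cols))

def moved_by_reshape(old_shape, new_shape, nprocs):
    """Elements that change rank when a split=0 array is reshaped, keeping split=0."""
    old_row = math.prod(old_shape[1:])  # elements per index along axis 0
    new_row = math.prod(new_shape[1:])
    return sum(owner(k // old_row, old_shape[0], nprocs)
               != owner(k // new_row, new_shape[0], nprocs)
               for k in range(math.prod(old_shape)))

print(moved_by_resplit((8, 8), nprocs=4))                # 48 of 64 elements move
print(moved_by_reshape((6, 4), (4, 6), nprocs=4))        # 8 elements move
print(moved_by_reshape((6, 4), (6, 1, 4), nprocs=4))     # 0: a size-1 axis moves nothing
```

In this model, switching a square array between split 0 and split 1 moves a fraction of 1 - 1/p of the elements on p processes, whereas inserting a size-1 axis leaves every element where it was.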