This is a tracking issue for problems and enhancements needed for FNet.
So far, @stefan-it and @ontocord have mentioned that:
- `--fp16` precision throws an error with the model.
- It is not possible to keep a variable sequence length during training because of the way the DFT matrices are initialized (via a config variable).
  - For this, @ontocord suggested that we keep different buffers ready, one per sequence length. A major drawback to this is that we can't possibly keep DFT matrices for all sequence lengths, and the values of the DFT matrices vary according to the total sequence length expected.
  - So, one efficient way of handling this is to create a generic matrix (not a DFT matrix) and then modify it on the fly based on the sequence length.
    - For example, the DFT matrix is defined as `W_{jk} = ω^{jk} / sqrt(N)`, where `ω = exp(-2πi / N)` and `j, k = 0, ..., N-1`.
    - We can create the matrix without the `1/sqrt(N)` multiplier, then on the fly take the portion of the matrix that is needed for the batch and multiply with the correct multiplier (see the sketch after this list). Wdyt @patrickvonplaten @sgugger?
    - Or multiply the matrix with `sqrt(N)/sqrt(seq_length)` and take `mat[:seq_length, :seq_length]` while multiplying.
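A minimal sketch of the slice-and-rescale idea above, assuming PyTorch; the helper names and the `max_seq_length` argument are hypothetical, not the actual FNet implementation:

```python
import math

import torch


def build_unscaled_dft(max_seq_length: int) -> torch.Tensor:
    """Precompute omega^{jk} (without the 1/sqrt(N) multiplier) once,
    for the largest sequence length we expect to see."""
    idx = torch.arange(max_seq_length, dtype=torch.float32)
    angles = -2.0 * math.pi * torch.outer(idx, idx) / max_seq_length
    # polar(r, theta) = r * exp(i * theta), so this gives omega^{jk}
    return torch.polar(torch.ones_like(angles), angles)


def dft_for_batch(unscaled: torch.Tensor, seq_length: int) -> torch.Tensor:
    """Slice out the sub-matrix needed for this batch and apply the
    multiplier for the current sequence length on the fly."""
    return unscaled[:seq_length, :seq_length] / math.sqrt(seq_length)


# Hypothetical usage: one precomputed buffer, reused for shorter batches.
dft_512 = build_unscaled_dft(512)
dft_128 = dft_for_batch(dft_512, 128)
```

Note that `ω` in the precomputed matrix depends on `max_seq_length`, so whether the sliced sub-matrix matches the exact `seq_length`-point DFT numerically would still need to be verified.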
- Need to verify whether moving the model to the GPU pushes everything to the device (including registered buffers); see the sketch below.
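As a quick check (the module below is a made-up example, not the actual FNet code): tensors registered via `register_buffer` should follow the module in `Module.to()`, while plain tensor attributes will not.

```python
import torch
import torch.nn as nn


class DFTHolder(nn.Module):
    """Made-up module holding a DFT-style matrix in two different ways."""

    def __init__(self, n: int = 8):
        super().__init__()
        # Registered buffer: part of the module's state, moved by .to()
        self.register_buffer("dft_buffer", torch.randn(n, n))
        # Plain attribute: not tracked by the module, stays on its device
        self.dft_attr = torch.randn(n, n)


if torch.cuda.is_available():
    holder = DFTHolder().to("cuda")
    print(holder.dft_buffer.device)  # cuda:0 -- registered buffers move
    print(holder.dft_attr.device)    # cpu   -- plain attributes do not
```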
I will be adding more issues/problems here as and when they arise.