
Issues with FNet #13853

@gchhablani

Description

This is a tracking issue for problems/enhancements needed with FNet.

So far, @stefan-it and @ontocord have mentioned that:

  • Training with --fp16 (mixed precision) throws an error with the model.

  • It is not possible to use variable sequence lengths during training because of the way the DFT matrices are initialized (their size is fixed via a config variable).

    • For this, @ontocord suggested keeping different buffers ready. A major drawback is that we can't possibly keep DFT matrices for all sequence lengths, and since the values of a DFT matrix depend on the total sequence length expected, reusing a single matrix for every length is not an option either.
    • So, one efficient way of handling this is to create a generic matrix (not a DFT matrix) and then modify it on the fly based on the sequence length.
      • For example, the DFT matrix is defined as:
        W_{jk} = (1/√N) · exp(−2πi · jk / N),  for j, k = 0, 1, …, N − 1.
        • We can create the matrix without the 1/√N multiplier, take the portion of the matrix needed for the batch on the fly, and then multiply by the correct multiplier. Wdyt @patrickvonplaten @sgugger?
        • Or multiply the matrix by sqrt(N)/sqrt(seq_length) and take mat[:seq_length, :seq_length] while multiplying (see the sketch below this list).
  • Need to verify whether pushing the model to the GPU pushes everything to the device (including the registered buffers); a quick check is sketched at the end of this description.
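
To make the slice-and-rescale idea concrete, here is a rough sketch (the helper names `build_generic_dft_buffer`/`dft_matrix_for_batch` and the `max_seq_length` argument are made up for illustration, and this is not how `modeling_fnet.py` currently works):

```python
import math

import torch


def build_generic_dft_buffer(max_seq_length: int) -> torch.Tensor:
    """Unnormalized DFT-style matrix for the largest expected sequence length.

    W[j, k] = exp(-2*pi*i * j*k / max_seq_length), stored once as a buffer,
    without the 1/sqrt(N) multiplier.
    """
    idx = torch.arange(max_seq_length, dtype=torch.float32)
    angles = -2 * math.pi * torch.outer(idx, idx) / max_seq_length
    return torch.complex(torch.cos(angles), torch.sin(angles))


def dft_matrix_for_batch(generic: torch.Tensor, seq_length: int) -> torch.Tensor:
    # Slice the top-left seq_length x seq_length block and apply the
    # 1/sqrt(seq_length) multiplier on the fly, per the suggestion above.
    return generic[:seq_length, :seq_length] / math.sqrt(seq_length)


generic = build_generic_dft_buffer(max_seq_length=512)
per_batch = dft_matrix_for_batch(generic, seq_length=128)  # matrix for a batch padded to 128
```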

I will be adding more issues/problems here as and when they arise.
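
For the last point, a quick standalone check along these lines should settle it (the `DummyWithBuffer` module is just a stand-in for illustration). Tensors registered via `register_buffer` are moved by `Module.to(...)`; plain tensor attributes are not, which is the case worth ruling out:

```python
import torch
from torch import nn


class DummyWithBuffer(nn.Module):
    def __init__(self, seq_length: int):
        super().__init__()
        # Registered buffer, standing in for a precomputed DFT matrix.
        self.register_buffer("dft_mat", torch.randn(seq_length, seq_length))
        # Plain attribute, NOT registered as a buffer -- this one stays on CPU.
        self.plain_mat = torch.randn(seq_length, seq_length)


if torch.cuda.is_available():
    model = DummyWithBuffer(seq_length=16).to("cuda")
    print(model.dft_mat.device)    # expected: cuda:0
    print(model.plain_mat.device)  # stays on cpu
```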
