This is a tracking issue for problems and enhancements needed for FNet.
So far, @stefan-it and @ontocord have mentioned that:
- `--fp16` precision throws an error with the model.
- It is not possible to keep a variable sequence length during training because of the way the DFT matrices are initialized (via a config variable).
  - For this, @ontocord suggested that we keep different buffers ready, one per sequence length. A major drawback to this is that we can't possibly keep DFT matrices for all sequence lengths, and the values of the DFT matrices vary according to the total sequence length expected.
  - So, one efficient way of handling this is to create a generic matrix (not a DFT matrix) and then modify it on the fly based on the sequence length.
    - For example, the DFT matrix is defined as `W_{jk} = ω^{jk} / sqrt(N)`, where `ω = exp(-2πi / N)` and `j, k = 0, ..., N-1`.
    - We can create the matrix without the `1/sqrt(N)` multiplier, then on the fly take the portion of the matrix that is needed for the batch and multiply with the correct multiplier (see the sketch after this list). Wdyt @patrickvonplaten @sgugger?
    - Or multiply the matrix with `sqrt(N)/sqrt(seq_length)` and take `mat[:seq_length, :seq_length]` while multiplying.
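A minimal sketch of the slice-and-rescale idea above, assuming PyTorch; the helper names and the `max_seq_length` argument are hypothetical, not the actual FNet implementation:

```python
import math

import torch


def build_unscaled_dft(max_seq_length: int) -> torch.Tensor:
    """Precompute omega^{jk} (without the 1/sqrt(N) multiplier) once,
    for the largest sequence length we expect to see."""
    idx = torch.arange(max_seq_length, dtype=torch.float32)
    angles = -2.0 * math.pi * torch.outer(idx, idx) / max_seq_length
    # polar(r, theta) = r * exp(i * theta), so this gives omega^{jk}
    return torch.polar(torch.ones_like(angles), angles)


def dft_for_batch(unscaled: torch.Tensor, seq_length: int) -> torch.Tensor:
    """Slice out the sub-matrix needed for this batch and apply the
    multiplier for the current sequence length on the fly."""
    return unscaled[:seq_length, :seq_length] / math.sqrt(seq_length)


# Hypothetical usage: one precomputed buffer, reused for shorter batches.
dft_512 = build_unscaled_dft(512)
dft_128 = dft_for_batch(dft_512, 128)
```

Note that `ω` in the precomputed matrix depends on `max_seq_length`, so whether the sliced sub-matrix matches the exact `seq_length`-point DFT numerically would still need to be verified.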
- Need to verify whether moving the model to the GPU pushes everything to the device (including registered buffers); see the sketch below.
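As a quick check (the module below is a made-up example, not the actual FNet code): tensors registered via `register_buffer` should follow the module in `Module.to()`, while plain tensor attributes will not.

```python
import torch
import torch.nn as nn


class DFTHolder(nn.Module):
    """Made-up module holding a DFT-style matrix in two different ways."""

    def __init__(self, n: int = 8):
        super().__init__()
        # Registered buffer: part of the module's state, moved by .to()
        self.register_buffer("dft_buffer", torch.randn(n, n))
        # Plain attribute: not tracked by the module, stays on its device
        self.dft_attr = torch.randn(n, n)


if torch.cuda.is_available():
    holder = DFTHolder().to("cuda")
    print(holder.dft_buffer.device)  # cuda:0 -- registered buffers move
    print(holder.dft_attr.device)    # cpu   -- plain attributes do not
```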
I will be adding more issues/problems here as and when they arise.