Attempted replication of GateLoop Transformer (https://arxiv.org/abs/2311.01927) in JAX.
I believe this is the most faithful reproduction of GateLoop's time mixer to date. Information on architecture details was obtained from discussions here and here.
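For orientation, here is a minimal sketch of the kind of gated linear recurrence a GateLoop-style time mixer computes. It is not this repository's code: the function names `gated_linear_recurrence` and `gateloop_time_mix`, the tensor shapes, and the single-head, unbatched layout are all assumptions made for illustration. It assumes the recurrence `h_t = a_t * h_{t-1} + k_t^T v_t` with output `y_t = q_t h_t`, where `a_t` is a per-channel (diagonal), input-dependent gate, and evaluates it in parallel with `jax.lax.associative_scan`.

```python
import jax
import jax.numpy as jnp


def gated_linear_recurrence(a, x):
    """Compute h_t = a_t * h_{t-1} + x_t along axis 0, starting from h = 0."""

    def combine(left, right):
        # Each pair (a, x) stands for the affine map h -> a * h + x.
        # Applying `left` first and `right` second gives another such map.
        a_l, x_l = left
        a_r, x_r = right
        return a_l * a_r, a_r * x_l + x_r

    _, h = jax.lax.associative_scan(combine, (a, x), axis=0)
    return h


def gateloop_time_mix(q, k, v, a):
    """Hypothetical single-head time mixer (illustrative only).

    q, k, a: (T, d_k); v: (T, d_v). Returns y with shape (T, d_v).
    """
    kv = k[:, :, None] * v[:, None, :]            # outer products, (T, d_k, d_v)
    dtype = jnp.result_type(a, kv)                # stays complex if the gates are complex
    gates = jnp.broadcast_to(a[:, :, None], kv.shape).astype(dtype)
    h = gated_linear_recurrence(gates, kv.astype(dtype))
    return jnp.einsum("tk,tkv->tv", q.astype(dtype), h)
```

The scan works because composing two steps of the recurrence yields another step of the same affine form, so the combine function is associative. With complex-valued gates the output is complex as well; how it is mapped back to real values is left out of this sketch.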