How do you train that far? I'm using the DeepSpeed example, and it terminates after 3k steps with seq_len 256, but at least until then the loss doesn't go to NaN.
Loss returns NaN
Some of my settings:
causal=true
blindspot_size=1
n_local_attn_heads
ff_chunks=1
reversible=false
use_axial_pos_emb=false
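
For anyone trying to reproduce this, here is a minimal sketch of a check that reports the first step at which the loss stops being finite, which may help narrow down when the NaN appears. The helper name `first_non_finite_step` is hypothetical and not part of any library; it assumes the per-step loss values have been collected as plain Python floats (for example via `loss.item()` in PyTorch).

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite.

    `losses` is assumed to be an iterable of plain Python floats,
    one per training step (hypothetical logging format).
    """
    for step, loss in enumerate(losses):
        # math.isfinite is False for NaN, +inf, and -inf
        if not math.isfinite(loss):
            return step
    return None


if __name__ == "__main__":
    # Made-up loss values purely for illustration, not real training output
    logged = [2.31, 1.97, float("nan"), 1.80]
    print(first_non_finite_step(logged))
```

Checking for infinity as well as NaN is deliberate: a loss that overflows to `inf` often turns into NaN a step or two later, so the earlier index is usually the more useful one.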