We currently have minimal docs covering training rationale, interesting optimizations, or longer-term objectives, and we don't link to the resources that already exist (e.g. the padding-free Transformers blog post).
We should collect all the resources we have, plus any brain-dump-grade information we care to share, into a single public set of docs.