truncated normal initializer #38

ruotianluo · 2018-11-19T16:35:08Z

I have a reasonable approximation of a truncated normal initializer (it is actually what TensorFlow does):
https://discuss.pytorch.org/t/implementing-truncated-normal-initializer/4778/16?u=ruotianluo
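
For readers who do not want to follow the link, here is a rough, framework-free sketch of the general idea. It is not the code from the linked post. It assumes the TensorFlow-style rule that any draw more than two standard deviations from the mean is discarded and re-drawn. The function name `truncated_normal`, its parameters, and the `std=0.02` in the usage line are illustrative choices only.

```python
import random


def truncated_normal(n, mean=0.0, std=1.0, bound=2.0, rng=None):
    """Draw n samples from N(mean, std^2), re-drawing any sample that
    falls more than `bound` standard deviations from the mean.

    Hypothetical helper for illustration; not a library API.
    """
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        z = rng.gauss(0.0, 1.0)  # standard normal draw
        if abs(z) <= bound:      # keep only draws inside the truncation window
            samples.append(mean + std * z)
    return samples


# Example: 1000 weights with an assumed std of 0.02, seeded for repeatability.
weights = truncated_normal(1000, mean=0.0, std=0.02, rng=random.Random(0))
```

With the default bound of 2, roughly 95% of standard normal draws are accepted on the first try, so the loop rarely re-draws. A tensor version would apply the same accept-or-redraw rule element-wise.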

thomwolf · 2018-11-20T09:09:23Z

We could try that. Not sure how important it is though. Did you try it?

thomwolf · 2018-11-26T09:42:42Z

OK, I think we will stick with the normal_initializer for now. Thanks for pointing out this option!

thomwolf closed this as completed Nov 26, 2018

maeotaku mentioned this issue May 23, 2019

bert->onnx ->caffe2 weird error #633

Closed

stevezheng23 added a commit to stevezheng23/transformers that referenced this issue Mar 24, 2020

Merge pull request huggingface#38 from stevezheng23/dev/zheng/coqa

67a9836

add coqa runner as basic mt-coqa runner

jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023

Merge pull request huggingface#38 from jamesthesnake/nah

b7ecfb6

Nah

lwmlyy mentioned this issue Aug 15, 2023

add util for ram efficient loading of model when using fsdp #25107

Merged

1 task
