question about number of parameters #5
Comments
Hi Huchi,
Sorry for my late reply, I was too busy in the past several weeks.
It is required to extract the SubTransformer weights from the checkpoints we shared to get the correct model size. The reason is that we finetuned a SubTransformer by always sampling that SubTransformer from the SuperTransformer. So the checkpoint contains all the weights of a SuperTransformer, but we need to only use the SubTransformer part to do testing and profiling.
Please refer to train.py lines 61 to 64 for how to get the SubTransformer model size:

```python
# Log model size
if args.train_subtransformer:
    print(f"| SubTransformer size (without embedding weights): {model.get_sampled_params_numel(utils.get_subtransformer_config(args))}")
    embed_size = args.decoder_embed_dim_subtransformer * len(task.tgt_dict)
    print(f"| Embedding layer size: {embed_size} \n")
```

Thanks!
Hanrui
Awesome, thanks!
Hi, I trained some models using the pre-defined configurations, but the number of parameters is much larger than what you reported (55.1M vs. 31.5M).
Configuration: HAT_iwslt14deen_titanxp@168.8ms_bleu@34.8.yml
Here is the code I used to calculate the number of parameters (embedding layers are excluded):
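A count along the following lines (a hypothetical sketch, not the original code; the checkpoint path, the fairseq-style "model" key, and the name-based embedding filter are assumptions) sums every tensor saved in the checkpoint and therefore reports the SuperTransformer size, which would explain seeing 55.1M rather than 31.5M:

```python
# Hypothetical sketch (not the original poster's code): summing every tensor stored in
# the checkpoint. Since HAT checkpoints hold the full SuperTransformer weights, a count
# like this is expected to be larger than the sampled SubTransformer size.
import torch

# Path and the "model" key are assumptions (fairseq-style checkpoint layout).
state = torch.load("checkpoint_best.pt", map_location="cpu")["model"]

total = 0
for name, tensor in state.items():
    if "embed_tokens" in name:  # skip embedding layers, as in the question
        continue
    total += tensor.numel()

print(f"Stored (SuperTransformer) parameters, embeddings excluded: {total / 1e6:.1f}M")
```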