question about number of parameters #5
Comments
Hi Huchi,
Sorry for my late reply, I was too busy in the past several weeks.
It is required to extract the SubTransformer weights from the checkpoints we shared to get the correct model size. The reason is that we finetuned a SubTransformer by always sampling that SubTransformer from the SuperTransformer. So the checkpoint contains all the weights of a SuperTransformer, but we need to only use the SubTransformer part to do testing and profiling.
Please refer to train.py lines 61 to 64 for how to get the SubTransformer model size:

```python
# Log model size
if args.train_subtransformer:
    print(f"| SubTransformer size (without embedding weights): {model.get_sampled_params_numel(utils.get_subtransformer_config(args))}")
    embed_size = args.decoder_embed_dim_subtransformer * len(task.tgt_dict)
    print(f"| Embedding layer size: {embed_size} \n")
```

Thanks!
Hanrui
Awesome, thanks!
Hi, I trained some models using the pre-defined configurations, but the number of parameters is much larger than what you reported (55.1M vs. 31.5M).
Configuration: HAT_iwslt14deen_titanxp@168.8ms_bleu@34.8.yml
Here is the code I used to calculate the number of parameters (embedding layers are excluded):
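A count along the following lines (a hypothetical sketch, not the original code; the checkpoint path, the fairseq-style "model" key, and the name-based embedding filter are assumptions) sums every tensor saved in the checkpoint and therefore reports the SuperTransformer size, which would explain seeing 55.1M rather than 31.5M:

```python
# Hypothetical sketch (not the original poster's code): summing every tensor stored in
# the checkpoint. Since HAT checkpoints hold the full SuperTransformer weights, a count
# like this is expected to be larger than the sampled SubTransformer size.
import torch

# Path and the "model" key are assumptions (fairseq-style checkpoint layout).
state = torch.load("checkpoint_best.pt", map_location="cpu")["model"]

total = 0
for name, tensor in state.items():
    if "embed_tokens" in name:  # skip embedding layers, as in the question
        continue
    total += tensor.numel()

print(f"Stored (SuperTransformer) parameters, embeddings excluded: {total / 1e6:.1f}M")
```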