Plans to support model trained in fairseq #11

Closed
kalyangvs opened this issue Oct 10, 2019 · 18 comments · Fixed by #480
Labels
help wanted (Extra attention is needed)

Comments

@kalyangvs

Can you please support models trained in fairseq? Otherwise, since fairseq is PyTorch-based, can such a model be imported for inference and quantization?

Also, are the model sizes those of transformer_big? If it is transformer_base, it would be around half of that.
Please also consider distilling the model into a smaller one; that would help with inference speed and model size.

@guillaumekln guillaumekln added the help wanted Extra attention is needed label Oct 10, 2019
@guillaumekln
Collaborator

Assuming fairseq trains compatible Transformer models (same computation graph), you can write a converter to extract the weights.

Also, are the model sizes those of transformer_big?

Which model are you referring to? The benchmark numbers come from a Transformer Base model.

Please also consider distilling the model into a smaller one; that would help with inference speed and model size.

Users can simply decide to apply distillation before using this project.

@aj7tesh

aj7tesh commented Mar 8, 2021

File "/home/user/.local/lib/python3.6/site-packages/ctranslate2/converters/opennmt_py.py", line 25, in _load
variables["generator.weight"] = checkpoint["generator"]["0.weight"]
KeyError: 'generator'

I am trying to convert a fairseq-based Transformer model.

@guillaumekln
Collaborator

fairseq models are not supported, see the README:

https://github.com/OpenNMT/CTranslate2#converting-models

@pttzty

pttzty commented Apr 23, 2021

I wonder if there are plans to support converting Fairseq Transformer models to CTranslate2. Fairseq is a widely used training framework and also supports some newer pretrained models like BART. @guillaumekln

@guillaumekln
Collaborator

We will probably get to that. Do you want to help? We should first determine the architecture differences with the current Transformer implementation (if any) and then add a converter for these checkpoints.

@pttzty

pttzty commented Apr 29, 2021

@guillaumekln I took a quick look at the fairseq weight names and the ones in OpenNMT-py. I noticed that OpenNMT-py models have "encoder.layer_norm.a_2" and "encoder.layer_norm.b_2", which I don't think have a corresponding mapping in fairseq models. In fairseq, there are only (using the encoder as an example):
'encoder.layers.{layer_number}.self_attn_layer_norm.weight',
'encoder.layers.{layer_number}.self_attn_layer_norm.bias',
'encoder.layers.{layer_number}.final_layer_norm.weight',
'encoder.layers.{layer_number}.final_layer_norm.bias'
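
For comparison, here is a minimal sketch of how one might list the layer-norm parameter names in a fairseq checkpoint (this assumes the parameters live under the checkpoint's "model" key, as in recent fairseq versions; the file path is a placeholder):

```python
import torch

# Load a fairseq checkpoint on CPU; the file path here is hypothetical.
checkpoint = torch.load("checkpoint_best.pt", map_location="cpu")

# fairseq checkpoints typically keep the parameter state dict under "model".
state_dict = checkpoint["model"]

# Print every layer-norm related weight name to compare with OpenNMT-py naming.
for name, tensor in state_dict.items():
    if "layer_norm" in name:
        print(name, tuple(tensor.shape))
```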

@kalyangvs
Author

Hi,

There are architectural/implementation-level changes in fairseq's Transformer.
I have a basic question about the converter script: if I export the weights to the bin format, how is the layer ordering maintained, given that the layers themselves are defined in the CTranslate2 core?
Is dropout needed during inference? Is it present in CTranslate2?

Examples of differences:
a) A final layer norm over the entire encoder or decoder is optional in fairseq. (src -> embeddings(src) -> layer(out) -> layer_norm(out) (optional))
b) Each encoder layer in fairseq ----> self_attn -> residual self_attn_(op+ip) -> ffn(x) -> residual x+ffn_inputs -> layer norm (default implementation)
Each encoder layer in OpenNMT-py ----> norm -> self_attn -> residual context+inputs -> ffn(out)
Each decoder layer in OpenNMT-py ----> norm1 -> self_attn -> self_op+inputs -> norm2 -> context_attn -> context_op+self_op -> ffn(out)

@guillaumekln How should I start writing a converter when there are this many differences?

@guillaumekln
Collaborator

guillaumekln commented May 29, 2021

if I export the weights to the bin format, how is the layer ordering maintained, given that the layers themselves are defined in the CTranslate2 core?

Each weight has a unique name. That's how the order is preserved between the bin format and the core implementation.

Is dropout needed during inference?

Dropout is not used during inference.

How should I start writing a converter when there are this many differences?

Here you are comparing Pre-Norm Transformers (OpenNMT default) and Post-Norm Transformers (fairseq default).

The converter and core implementation support both variants since this commit: 316ed26

Additionally, the V2 branch slightly improves the converter design to make it easier to implement new converters. We hope to release this version in the coming days or weeks.
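
For readers less familiar with the two variants, here is a schematic sketch (not code taken from either project) of how one encoder layer differs between the Post-Norm and Pre-Norm arrangements described above; `self_attn`, `ffn`, `norm1`, and `norm2` stand in for the layer's sub-modules:

```python
def post_norm_encoder_layer(x, self_attn, ffn, norm1, norm2):
    # fairseq default (Post-Norm): residual connection first, then layer norm,
    # for both the self-attention and feed-forward blocks.
    x = norm1(x + self_attn(x))
    x = norm2(x + ffn(x))
    return x


def pre_norm_encoder_layer(x, self_attn, ffn, norm1, norm2):
    # OpenNMT-py default (Pre-Norm): layer norm is applied to the block input
    # and the residual connection is added after the block.
    x = x + self_attn(norm1(x))
    x = x + ffn(norm2(x))
    return x

# With Pre-Norm, the encoder also needs one extra layer norm after the last
# layer, which is where OpenNMT-py's "encoder.layer_norm.a_2" / "b_2" come from.
```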

@guillaumekln
Collaborator

Let's reopen this issue for visibility. I'm currently trying to add a converter and see if we are missing anything else.

@guillaumekln guillaumekln reopened this Jun 3, 2021
@kalyangvs
Author

Hi @guillaumekln, I have used this converter with the pretrained model available here from the page, but the generations using the extracted weights are not producing correct translations. I have verified the variable names and counts, and everything seems correct. Am I missing something?

I am using the recently released v2.0.0 for this change.

@guillaumekln
Collaborator

I found that in Fairseq the first token forwarded into the decoder is </s> instead of <s>. This requires a small change in the core implementation.

I implemented a converter using the Fairseq API to support multiple checkpoint versions (see the PR above). I translated from their WMT19 model and it seems to work but more testing would be required.
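
For anyone who wants to try this once the PR is merged, here is a hedged sketch of what the conversion call might look like in Python, assuming the new converter is exposed as ctranslate2.converters.FairseqConverter in the same style as the existing converters (the class name, arguments, and paths below are assumptions; check the documentation for the exact interface):

```python
import ctranslate2

# Assumed interface: a FairseqConverter taking the checkpoint path and the
# data directory containing the fairseq dictionaries (paths are placeholders).
converter = ctranslate2.converters.FairseqConverter(
    model_path="wmt19.en-de.joined-dict.ensemble/model1.pt",
    data_dir="wmt19.en-de.joined-dict.ensemble",
)

# Write the converted model to a new directory.
converter.convert("ende_ctranslate2", force=True)
```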

@guillaumekln
Collaborator

guillaumekln commented Jun 3, 2021

@gvskalyan Can you help with testing? You can grab the latest wheels from the CI: download the python-wheels artifact, unzip it, and pip install the appropriate wheel. Thanks!

@pttzty

pttzty commented Jun 3, 2021

@guillaumekln I will be able to help test a case of my own as well.

@kalyangvs
Author

kalyangvs commented Jun 7, 2021

Hi @guillaumekln, the translations with the latest wheel from CI look correct. I tested it on the fairseq WMT19 models and observed the following; are there any other validation tests to be done?
Is the drop/variation in BLEU score expected? Can it be attributed to the architecture of these models (transformer-big)?
The test sets are from WMT14 and the scores were computed with SacreBLEU.

| Direction | Checkpoint Name | BLEU - Fairseq | BLEU - CTranslate2 | BLEU - CTranslate2 (edit 1) | Checkpoint Md5sum - Fairseq | Checkpoint Md5sum - CTranslate2 |
|---|---|---|---|---|---|---|
| en-de | wmt19.en-de.joined-dict.ensemble/model1 | 35.73 | 35.56 | 35.36 | f19aa2206748c59a0c5719386bc9a967 | 6eed02afd5be4eb949ef43db389a6806 |
| de-en | wmt19.de-en.joined-dict.ensemble/model1 | 38.18 | 37.75 | 37.91 | 2f4e81d6cfd3385d37883a85010f88f4 | 3dcf2ab1f1685e2f2794664e3fdf69e6 |
| en-ru | wmt19.en-ru.ensemble/model1 | 43.14 | 42.46 | 42.51 | 11f261a927258abf35cc466cf3ca508e | 24f3aac0dca6e65606d90d5377c34d76 |
| ru-en | wmt19.ru-en.ensemble/model1 | 41.74 | 41.06 | 41.35 | 59a52df2f09c8ddb01d4c9f5692358ee | afd846d3956bb2a43a6cd1f221f8fc3c |

Update (edit 1): the last BLEU column shows scores obtained with --normalize_scores --disable_early_exit.
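
For context, here is a minimal sketch of how such BLEU scores can be computed with the sacrebleu Python API on a pre-translated test set (the file names are placeholders, not the exact setup used above):

```python
import sacrebleu

# Hypothetical files: one detokenized hypothesis and one reference per line.
with open("hypotheses.de") as f:
    hypotheses = [line.strip() for line in f]
with open("references.de") as f:
    references = [line.strip() for line in f]

# corpus_bleu expects a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)
```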

@guillaumekln
Collaborator

Thanks for testing!

I checked further and found the output is the same with greedy search (beam_size=1) but not with beam search. There are indeed small differences in the beam search implementation, for example:

  • fairseq is sorting hypotheses by the token-level score and not the sentence-level score
  • fairseq is decoding until beam_size hypotheses are finished while CTranslate2 has a condition to exit earlier.

We can easily add new translation options to match the fairseq behavior, but we will probably not change the default behavior.

Since the model conversion is working fine, I will merge the PR and add these options in separate PRs. I will summarize here the options you can enable to close the gap with fairseq.

@guillaumekln
Collaborator

guillaumekln commented Jun 8, 2021

With the latest commit on master, you can change the following options to get almost the same output as fairseq:

  • with API: normalize_scores=True, allow_early_exit=False
  • with CLI: --normalize_scores --disable_early_exit

(Note that disabling early exit slightly reduces the translation speed.)

The few remaining differences are mostly due to small variations in the beam search implementation.
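
As a quick illustration, here is a minimal sketch of those options with the Python API (the model path and tokens are placeholders; translate_batch takes pre-tokenized input):

```python
import ctranslate2

# Hypothetical path to a converted fairseq model.
translator = ctranslate2.Translator("ende_ctranslate2", device="cpu")

# Pre-tokenized source sentence (tokens are placeholders).
batch = [["▁Hello", "▁world", "."]]

results = translator.translate_batch(
    batch,
    beam_size=5,
    normalize_scores=True,   # rank hypotheses by length-normalized score
    allow_early_exit=False,  # decode until beam_size hypotheses are finished
)
print(results[0])
```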

@guillaumekln
Collaborator

Hi @gvskalyan, I just saw your updated table in #11 (comment). We don't expect this difference. Did you use the same beam size in this comparison?

@guillaumekln
Collaborator

The few remaining differences are mostly due to small variations in the beam search implementation.

For reference, #911 should fix the remaining differences.
