This repository has been archived by the owner on Sep 4, 2023. It is now read-only.

We did not find an alpha in the model named: F0::Wemb_QuantMultA. when translating from pt to de on outbound translations. #58

Closed
andrenatal opened this issue Jan 27, 2022 · 9 comments · Fixed by #55

@andrenatal
Contributor

andrenatal commented Jan 27, 2022

I'm getting the crash below [2] in the model/engine when translating from pt to de using outbound translations (although this seems to be a more general issue). It doesn't happen from pt to the other languages, but it might happen with other combinations. There's a screen recording of the error at [1].

STR:
1 - Download Nightly pt-br
2 - Load the extension from master
3 - Navigate to: http://andrenatal.github.io/translations-playground
4 - Choose de
5 - Click on Traduzir
6 - Click on Enable translation of form Yes
7 - Click on the textarea
8 - Type something in the textarea within the outbound translations widget

[1] https://www.dropbox.com/s/7vw5vhykcnlnwjv/ptde_crash.mov?dl=1

[2]

Using fallback gemm implementation bergamot-translator-worker.js:6245:17
Wasm Runtime initialized Successfully (preRun -> onRuntimeInitialized) in 0.011 secs translationWorker.js:67:29
Creating Translation Service with config: [object Object] translationWorker.js:230:25
Translation Service created successfully translationWorker.js:232:25
Constructing model 'dept' via pivoting: 'deen' and 'enpt' translationWorker.js:252:25
Total Download time for all files of 'enpt': 0.185 secs translationWorker.js:321:21
Constructing Aligned memory. Size: 17140836 bytes, Alignment: 256 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Constructing Aligned memory. Size: 4472528 bytes, Alignment: 64 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Constructing Aligned memory. Size: 812781 bytes, Alignment: 64 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Aligned vocab memory1 size: 812781 translationWorker.js:343:23
Aligned model memory size: 17140836 translationWorker.js:345:21
Aligned shortlist memory size: 4472528 translationWorker.js:346:21
Translation Model config: 
            beam-size: 1
            normalize: 1.0
            word-penalty: 0
            max-length-break: 128
            mini-batch-words: 1024
            workspace: 128
            max-length-factor: 2.0
            skip-cost: true
            cpu-threads: 0
            quiet: true
            quiet-translation: true
            gemm-precision: int8shiftAlphaAll
            translationWorker.js:347:21
[2022-01-27 13:15:45] [data] Loading SentencePiece vocabulary from buffer bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file. bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] [memory] Extending reserved space to 128 MB (device cpu0) bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] Loaded model config bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] Loading scorer of type transformer as feature F0 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] Memory mapping model at 0x682a00 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] [memory] Reserving 31 MB, device cpu0 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] [memory] Reserving 8 MB, device cpu0 bergamot-translator-worker.js:1217:12
Total Download time for all files of 'deen': 0.803 secs translationWorker.js:321:21
Constructing Aligned memory. Size: 17140837 bytes, Alignment: 256 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Constructing Aligned memory. Size: 5047568 bytes, Alignment: 64 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Constructing Aligned memory. Size: 784269 bytes, Alignment: 64 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Aligned vocab memory1 size: 784269 translationWorker.js:343:23
Aligned model memory size: 17140837 translationWorker.js:345:21
Aligned shortlist memory size: 5047568 translationWorker.js:346:21
Translation Model config: 
            beam-size: 1
            normalize: 1.0
            word-penalty: 0
            max-length-break: 128
            mini-batch-words: 1024
            workspace: 128
            max-length-factor: 2.0
            skip-cost: true
            cpu-threads: 0
            quiet: true
            quiet-translation: true
            gemm-precision: int8shiftAlphaAll
            translationWorker.js:347:21
[2022-01-27 13:15:46] [data] Loading SentencePiece vocabulary from buffer bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file. bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] [memory] Extending reserved space to 128 MB (device cpu0) bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] Loaded model config bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] Loading scorer of type transformer as feature F0 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:46] Memory mapping model at 0xa89bf00 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:47] [memory] Reserving 31 MB, device cpu0 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:47] [memory] Reserving 8 MB, device cpu0 bergamot-translator-worker.js:1217:12
Model 'dept' successfully constructed. Time taken: 1.273 secs translationWorker.js:201:23
loadLanguageModel function complete translationWorker.js:223:21
Constructing model 'ptde' via pivoting: 'pten' and 'ende' translationWorker.js:252:25
Total Download time for all files of 'ende': 0.144 secs translationWorker.js:321:21
Constructing Aligned memory. Size: 17140498 bytes, Alignment: 256 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Constructing Aligned memory. Size: 3062492 bytes, Alignment: 64 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Constructing Aligned memory. Size: 797501 bytes, Alignment: 64 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Aligned vocab memory1 size: 797501 translationWorker.js:343:23
Aligned model memory size: 17140498 translationWorker.js:345:21
Aligned shortlist memory size: 3062492 translationWorker.js:346:21
Translation Model config: 
            beam-size: 1
            normalize: 1.0
            word-penalty: 0
            max-length-break: 128
            mini-batch-words: 1024
            workspace: 128
            max-length-factor: 2.0
            skip-cost: true
            cpu-threads: 0
            quiet: true
            quiet-translation: true
            gemm-precision: int8shiftAlphaAll
            translationWorker.js:347:21
[2022-01-27 13:15:47] [data] Loading SentencePiece vocabulary from buffer bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:47] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file. bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:47] [memory] Extending reserved space to 128 MB (device cpu0) bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] Loaded model config bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] Loading scorer of type transformer as feature F0 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] Memory mapping model at 0xd8f0f00 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] [memory] Reserving 31 MB, device cpu0 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] [memory] Reserving 8 MB, device cpu0 bergamot-translator-worker.js:1217:12
Total Download time for all files of 'pten': 0.454 secs translationWorker.js:321:21
Constructing Aligned memory. Size: 17140836 bytes, Alignment: 256 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Constructing Aligned memory. Size: 5001420 bytes, Alignment: 64 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Constructing Aligned memory. Size: 812889 bytes, Alignment: 64 translationWorker.js:481:21
Aligned memory construction done translationWorker.js:483:21
Aligned memory initialized translationWorker.js:486:21
Aligned vocab memory1 size: 812889 translationWorker.js:343:23
Aligned model memory size: 17140836 translationWorker.js:345:21
Aligned shortlist memory size: 5001420 translationWorker.js:346:21
Translation Model config: 
            beam-size: 1
            normalize: 1.0
            word-penalty: 0
            max-length-break: 128
            mini-batch-words: 1024
            workspace: 128
            max-length-factor: 2.0
            skip-cost: true
            cpu-threads: 0
            quiet: true
            quiet-translation: true
            gemm-precision: int8shiftAlphaAll
            translationWorker.js:347:21
[2022-01-27 13:15:48] [data] Loading SentencePiece vocabulary from buffer bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file. bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] [memory] Extending reserved space to 128 MB (device cpu0) bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] Loaded model config bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] Loading scorer of type transformer as feature F0 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] Memory mapping model at 0x1c560600 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] [memory] Reserving 31 MB, device cpu0 bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:48] [memory] Reserving 8 MB, device cpu0 bergamot-translator-worker.js:1217:12
Outbound Model 'ptde' successfully constructed. Time taken: 0.791 secs translationWorker.js:175:25
[2022-01-27 13:15:54] Error: We did not find an alpha in the model named: F0::Wemb_QuantMultA. bergamot-translator-worker.js:1217:12
[2022-01-27 13:15:54] Error: Aborted from auto marian::cpu::integer::fetchAlphaFromModelNodeOp::forwardOps()::(anonymous class)::operator()() const in /root/checkout/3rd_party/marian-dev/src/tensors/cpu/intgemm_interface.h:583 bergamot-translator-worker.js:1217:12
Callstacks not supported in WASM builds currently bergamot-translator-worker.js:1217:12
undefined bergamot-translator-worker.js:649:9
Translation error:  RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info. translationWorker.js:117:37
RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
@andrenatal andrenatal added bug Something isn't working outbound-translation Issues and Requests for outbound translation labels Jan 27, 2022
@jelmervdl
Contributor

I've had problems with the en-de model as well. Looks like the precomputed alphas may be missing from the model?

@kpu
Contributor

kpu commented Jan 27, 2022

I was able to reproduce the error using the files downloaded from here:
https://storage.googleapis.com/bergamot-models-sandbox/0.2.10/ende/model.ende.intgemm.alphas.bin
https://storage.googleapis.com/bergamot-models-sandbox/0.2.10/ende/lex.50.50.ende.s2t.bin
https://storage.googleapis.com/bergamot-models-sandbox/0.2.10/ende/vocab.deen.spm

The command I used to reproduce the error is

~/marian-dev/build/marian-decoder --relative-paths -m moz/model.ende.intgemm.alphas.bin -v moz/vocab.deen.spm{,} --beam-size 1 --mini-batch 32 --maxi-batch 100 --maxi-batch-sort src -w 128 --skip-cost --shortlist moz/lex.50.50.ende.s2t.bin --cpu-threads 1 --gemm-precision int8shiftAlphaAll <<<"Hello"

These files are outdated: they are version 1 of the en-de system, not version 2. Also, the model file doesn't exactly match version 1. In any case, v2 should be pulled from https://data.statmt.org/bergamot/models/deen_v2.0/ .

I tried to determine the provenance of the model file, but it doesn't match any currently hosted model, v1 or v2.

md5sum v?/*/model.intgemm.alphas.bin moz/model.ende.intgemm.alphas.bin 
c482cd68e65a3cff6b66ac1c79ad4bee  v1/ende.student.base/model.intgemm.alphas.bin
f48744b967863d4e27f68d3e7199d922  v1/ende.student.tiny11/model.intgemm.alphas.bin
f48744b967863d4e27f68d3e7199d922  v1/ende.student.tiny.for.regression.tests/model.intgemm.alphas.bin
977f09cb9781d37c61e9ca45929178c1  v2/ende.student.base/model.intgemm.alphas.bin
f74fc9b331d6fe9f395721a1717a2117  v2/ende.student.tiny11/model.intgemm.alphas.bin
17447b6ef127f5a13fc38415181f115d  moz/model.ende.intgemm.alphas.bin

The vocabulary file is from v1:

md5sum v?/*/vocab.deen.spm moz/vocab.deen.spm
bbbc1f3a2d1dd39e6e88e00f13ef2f23  v1/ende.student.base/vocab.deen.spm
bbbc1f3a2d1dd39e6e88e00f13ef2f23  v1/ende.student.tiny11/vocab.deen.spm
bbbc1f3a2d1dd39e6e88e00f13ef2f23  v1/ende.student.tiny.for.regression.tests/vocab.deen.spm
5dd2fd1c2f5f67e7d84092c5037404f9  v2/ende.student.base/vocab.deen.spm
5dd2fd1c2f5f67e7d84092c5037404f9  v2/ende.student.tiny11/vocab.deen.spm
bbbc1f3a2d1dd39e6e88e00f13ef2f23  moz/vocab.deen.spm

Once I switched the model file to version 2 from https://data.statmt.org/bergamot/models/deen_v2.0/ende.student.tiny11.tar.gz , the command completed successfully
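
For anyone repeating this provenance check without the `md5sum` tool at hand, the comparison can be sketched in Python. This is a minimal sketch, not part of the original investigation: the digests are copied from the vocabulary listing above, while `HOSTED_VOCABS`, `md5_of`, and `matching_releases` are hypothetical names introduced only for illustration.

```python
import hashlib

# MD5 digests of vocab.deen.spm, copied from the md5sum listing above.
HOSTED_VOCABS = {
    "v1/ende.student.base": "bbbc1f3a2d1dd39e6e88e00f13ef2f23",
    "v1/ende.student.tiny11": "bbbc1f3a2d1dd39e6e88e00f13ef2f23",
    "v1/ende.student.tiny.for.regression.tests": "bbbc1f3a2d1dd39e6e88e00f13ef2f23",
    "v2/ende.student.base": "5dd2fd1c2f5f67e7d84092c5037404f9",
    "v2/ende.student.tiny11": "5dd2fd1c2f5f67e7d84092c5037404f9",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matching_releases(digest, hosted):
    """List every hosted release whose recorded digest equals `digest`."""
    return sorted(name for name, known in hosted.items() if known == digest)
```

Calling `matching_releases(md5_of("moz/vocab.deen.spm"), HOSTED_VOCABS)` on the vocabulary file above would list the three v1 releases. An empty result, which is what the same comparison against the model digests gives for `moz/model.ende.intgemm.alphas.bin`, means no hosted release matches.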

@kpu kpu removed their assignment Jan 27, 2022
@kpu
Contributor

kpu commented Jan 27, 2022

Unassigning self, leaving for @abhi-agg to do a model pull (I don't have permissions to do that).

@kpu
Contributor

kpu commented Jan 28, 2022

browsermt/students#46

Sep 14, 2021

"Updated en-de models posted, thanks @kaleidoescape

    "checksum": "7f6bdcf60555fca479e014a6722729b34890e52ca8bfbffb5138f574ec91aec7",
    "url": "http://data.statmt.org/bergamot/models/deen/ende.student.base.tar.gz",

    "checksum": "5214a434a8b6d0562eb927ff5ffe42d4a60240370a0095e0c1369d960878254f",
    "url": "http://data.statmt.org/bergamot/models/deen/ende.student.tiny11.tar.gz",

cc @andrenatal @lonnen"

@kpu
Contributor

kpu commented Jan 28, 2022

I note the following agenda items from the Bergamot plenary.
30 September 2021 point 4 "Mozilla pulling in shifted alphas models and config? They’ve been available for a while…"
16 September 2021 point 2.a "en-de updated with WMT21 system"

@jerinphilip

How can we ensure that the other models we have are current and updated if no one from your team notifies us and replaces them in the modelRegistry whenever there's a new version?

We do not have visibility into your model pushing mechanisms. Here's my recommendation: create a JSON file similar to https://translatelocally.com/models.json and use it to generate your modelRegistry. The JSON can already be pulled by a Python repository mechanism, which can be used for continuous testing. Or you can bring your own custom repository in (Python) code without having to bundle everything. The merits are manifold:

  1. Mozilla's evaluations (https://github.com/mozilla/firefox-translations-evaluation) use Python. It'll be easy to pick up for data-viz and table generation over there.
  2. All active models can be tested for continuous stability. I think this might even be feasible via GitHub Actions: we can do weekly cron runs against expected output.
  3. The models become available to the general public for command-line exploration as well. There are models in the Mozilla repository that are not available in the browsermt repository.
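
A rough sketch of what generating the registry from such a JSON file could look like, in Python since that is what the evaluation tooling uses. Everything about the schema here is an assumption made for illustration: the `models` list and the `src`, `trg`, `version`, `url`, and `checksum` fields are hypothetical and are not taken from the actual `models.json` or from the extension's modelRegistry format.

```python
import json


def parse_version(text):
    """Turn a dotted version such as "2.0" into a comparable tuple (2, 0)."""
    return tuple(int(part) for part in text.split("."))


def build_registry(catalog_json):
    """Keep only the newest entry per language pair from a JSON catalog.

    The catalog layout (a "models" list with src/trg/version/url/checksum)
    is a hypothetical schema, not the real models.json format.
    """
    registry = {}
    for entry in json.loads(catalog_json)["models"]:
        pair = entry["src"] + entry["trg"]  # e.g. "en" + "de" -> "ende"
        newest = registry.get(pair)
        if newest is None or parse_version(entry["version"]) > parse_version(newest["version"]):
            registry[pair] = {
                "version": entry["version"],
                "url": entry["url"],
                "checksum": entry["checksum"],
            }
    return registry


def stale_pairs(deployed, registry):
    """Return the pairs whose deployed version is older than the catalog's."""
    return sorted(
        pair
        for pair, version in deployed.items()
        if pair in registry
        and parse_version(version) < parse_version(registry[pair]["version"])
    )
```

A weekly cron job could call `stale_pairs` with the versions the extension currently ships and fail when the result is non-empty, which would likely have flagged the outdated en-de files behind this issue.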

@abhi-agg
Collaborator

Update: I applied #61 and tested the workflow. The issue pertaining to not finding the alphas is gone.

Now I see a new error as follows (attached screenshot contains more details):

Translation error:  TypeError: message.sourceParagraph.trim is not a function translationWorker.js:120:37
    consumeTranslationQueue moz-extension://2c65be99-3a64-4361-adf9-ee979fba6c71/controller/translation/translationWorker.js:120
TypeError: message.sourceParagraph.trim is not a function
translationWorker.js:94:72
Translation error:  TypeError: message.sourceParagraph.trim is not a function translationWorker.js:120:37
    consumeTranslationQueue moz-extension://2c65be99-3a64-4361-adf9-ee979fba6c71/controller/translation/translationWorker.js:120

This error now seems to happen only for outbound translation, as in-page translation of PT <-> DE works without any issue.

Screenshot 2022-01-28 at 15 55 03

@andrenatal
Contributor Author

@abhi-agg I'm working to stabilize and fix all outbound translation issues on this patch: #55. If you apply it you'll see this issue gone.

@abhi-agg
Collaborator

abhi-agg commented Jan 28, 2022

Awesome. It means I don't need to debug this issue. Once you merge that, we will close this issue.

@andrenatal andrenatal linked a pull request Feb 9, 2022 that will close this issue
@andrenatal andrenatal added this to the W1 milestone Feb 10, 2022