Update marian with binary model loading #11

andrenatal · 2021-03-02T05:41:19Z

No description provided.

This first commit imports files from mts which was repurposed for bergamot translator from https://github.com/browsermt/mts/tree/nuke.

Modifications to SentencePiece are necessary to provide token level string_views. This commit changes marian to an alternate branch which has the feature incorporated.

@UG

CMakeLists have been modified with the necessary includes to add browsermt/mts@nuke files to the bergamot-translator library. In addition, the ssplit dependency and its corresponding includes are added. Intel MKL fails on compilation because it is unable to find libraries. To solve this, 3rd_party/CMakeLists.txt is modified with @UG's fixes to propagate variables (EXT_LIBS, etc.) at a library level.

@UG

A faster linesplitter added for benchmarks is removed in favour of @UG's ssplit-cpp. NOTE: ssplit-cpp's regex based implementation is slow for one-line parses, which ideally needs to be improved in upstream ssplit-cpp to trivially reduce to a faster newline character based split.
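As a rough illustration of the newline-based split mentioned in the note above, here is a minimal sketch. It is not the removed linesplitter and not ssplit-cpp code; `splitLinesSketch` is a hypothetical name, and skipping empty lines is an assumption made for the example.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of ssplit-cpp): splits text on '\n' and
// returns views into the original buffer, so no characters are copied.
// Empty lines are skipped; that is a choice made for this sketch.
std::vector<std::string_view> splitLinesSketch(std::string_view text) {
  std::vector<std::string_view> lines;
  std::size_t start = 0;
  while (start < text.size()) {
    std::size_t end = text.find('\n', start);
    if (end == std::string_view::npos) end = text.size();
    if (end > start) lines.emplace_back(text.substr(start, end - start));
    start = end + 1;  // step past the newline
  }
  return lines;
}
```

The returned views are only valid while the buffer behind `text` is alive.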

Commit modifies the example test-code main-mts into the app folder, updating CMakeLists accordingly.

Removed Alignments, too many questions and no concrete answers. Better off removing unused code. History is kept for now, for internal use.

Vocabs were earlier loaded in each thread and copied several times. This is modified so that they are loaded only once in Service and a reference is used consistently afterwards. This change makes Tokenizer as a class rather moot, as there's only one private member and a function, so it is moved into TextProcessor. SentenceSplitter, however, remains a separate class. utils.{h,cpp} had only a single loadVocabularies function, which at the moment is required only in Service. loadVocabularies is therefore made a function inside Service, and utils.* is removed.
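The load-once-and-share-by-reference arrangement described here could look roughly like the sketch below. All type names (`VocabSketch`, `TextProcessorSketch`, `ServiceSketch`) are hypothetical stand-ins, and the "loading" only records a path; the real classes and the marian vocabulary type are not shown.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded vocabulary.
struct VocabSketch {
  std::string path;
};

// Holds a reference to vocabularies owned elsewhere; nothing is copied.
class TextProcessorSketch {
 public:
  explicit TextProcessorSketch(const std::vector<VocabSketch>& vocabs)
      : vocabs_(vocabs) {}
  const std::vector<VocabSketch>& vocabs() const { return vocabs_; }

 private:
  const std::vector<VocabSketch>& vocabs_;
};

// Owns the vocabularies and loads them exactly once, at construction.
class ServiceSketch {
 public:
  explicit ServiceSketch(const std::vector<std::string>& paths)
      : vocabs_(loadVocabularies(paths)), textProcessor_(vocabs_) {}

  // Copying or moving would leave textProcessor_ referring to another
  // object's vocabularies, so both are disabled in this sketch.
  ServiceSketch(const ServiceSketch&) = delete;
  ServiceSketch& operator=(const ServiceSketch&) = delete;

  const std::vector<VocabSketch>& vocabs() const { return vocabs_; }
  const TextProcessorSketch& textProcessor() const { return textProcessor_; }

 private:
  // Private helper, mirroring the idea of keeping the loader inside Service.
  static std::vector<VocabSketch> loadVocabularies(
      const std::vector<std::string>& paths) {
    std::vector<VocabSketch> vocabs;
    vocabs.reserve(paths.size());
    for (const std::string& path : paths) vocabs.push_back(VocabSketch{path});
    return vocabs;
  }

  std::vector<VocabSketch> vocabs_;    // declared first: initialized first
  TextProcessorSketch textProcessor_;  // refers to vocabs_
};
```

Member declaration order matters in this sketch: `vocabs_` must be initialized before `textProcessor_` binds a reference to it.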

- Truncating long sentences into those of a specified length for faster processing is now a separate function, for improved readability.
- Changes doing push_back -> emplace_back at places to avoid copy.
- query_to_segments is renamed as process.
- Comments are added in an attempt to bring some sanity.
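A minimal sketch of the truncation idea from the first two items follows, assuming a sentence is a vector of token ids and a fixed maximum segment length. `truncateSketch` and the `int` token type are hypothetical and do not reflect the actual function in this commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: cuts one long token sequence into consecutive
// segments of at most maxLength tokens each.
std::vector<std::vector<int>> truncateSketch(const std::vector<int>& tokens,
                                             std::size_t maxLength) {
  std::vector<std::vector<int>> segments;
  if (maxLength == 0) return segments;  // nothing sensible to produce
  for (std::size_t begin = 0; begin < tokens.size(); begin += maxLength) {
    const std::size_t end = std::min(begin + maxLength, tokens.size());
    // emplace_back builds the segment in place from the iterator range,
    // avoiding a temporary vector that push_back would need.
    segments.emplace_back(
        tokens.begin() + static_cast<std::ptrdiff_t>(begin),
        tokens.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return segments;
}
```

Only the last segment can be shorter than `maxLength`.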

Only the bergamot-translator library should be linked to the main target. Any other library (marian ${MARIAN_CUDA_LIB} ${EXT_LIBS} ssplit pcrecpp.a pcre.a) should be linked to the bergamot-translator target inside the src/translator folder.

Enables Mac and Ubuntu CPU-only builds through GitHub CI. CI scripts are copied from marian-dev with the necessary changes. 3rd-party/marian-dev is modified to meet C++17 requirements, with changes to half_float.

Using std::string for config. Now capable of launching marian translator through API interface. There's a sketchy workaround to convert a string config to marian::Options, with an added note.

- Provide yaml formatted string as model configuration
- Remove redundant files

- Print original and translated text
- Just add 2 vector entries for texts

Using string_view requires that the original source string be carried all the way from the input, through the service, and back to TranslationResult. This constraint was violated in several places through use of the copy constructor. The issue is fixed by deleting the copy constructor and copy-assignment operator in marian::bergamot::TranslationResult and UnifiedAPI::TranslationResult, which exposed a few occurrences of such copies. These were replaced with move semantics. In addition, the future is currently set and retrieved using move semantics. The default move constructor didn't seem to be working, so move constructors are made explicit for TranslationResults. This commit additionally packs a few deletions and structural improvements (textops.cpp, batcher.cpp) made while inspecting and fixing the garbled outputs. In the interest of time, they are kept here rather than split into prettified atomic commits. Combines the following commits from jp/string-view-bug: [acfc92 78a588 12d91b 00a277 919e2f 9d3a46 b7e39b 18f67b bf667c]
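To make the ownership constraint concrete, here is a minimal sketch of a move-only result type whose string_views point into a source string it owns. `ResultSketch` is a hypothetical stand-in, not the real TranslationResult. Rebasing the views by offset inside the move operations is an assumption of this sketch: moving a std::string is not guaranteed to preserve pointers into it (for example with small-string optimization), which is one way a defaulted move could leave views dangling. Whether that was the cause here is not stated in the commit message.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical move-only result type that owns its source text and
// exposes token-level views into it.
class ResultSketch {
 public:
  explicit ResultSketch(std::string source) : source_(std::move(source)) {}

  // Copies would produce views pointing into another object's string.
  ResultSketch(const ResultSketch&) = delete;
  ResultSketch& operator=(const ResultSketch&) = delete;

  // Explicit moves: views are rebuilt against the new storage.
  ResultSketch(ResultSketch&& other) { takeFrom(other); }
  ResultSketch& operator=(ResultSketch&& other) {
    if (this != &other) takeFrom(other);
    return *this;
  }

  // Caller must ensure begin + length <= source().size().
  void addToken(std::size_t begin, std::size_t length) {
    tokens_.emplace_back(source_.data() + begin, length);
  }

  const std::string& source() const { return source_; }
  const std::vector<std::string_view>& tokens() const { return tokens_; }

 private:
  void takeFrom(ResultSketch& other) {
    // Record each view as (offset, length) relative to the old buffer.
    const char* oldBase = other.source_.data();
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    spans.reserve(other.tokens_.size());
    for (std::string_view token : other.tokens_) {
      spans.emplace_back(static_cast<std::size_t>(token.data() - oldBase),
                         token.size());
    }
    source_ = std::move(other.source_);
    other.tokens_.clear();
    // Rebuild the views against the buffer this object now owns.
    tokens_.clear();
    for (const auto& span : spans) {
      tokens_.emplace_back(source_.data() + span.first, span.second);
    }
  }

  std::string source_;
  std::vector<std::string_view> tokens_;
};
```

Storing offsets instead of views would avoid the rebasing step entirely; views are kept here only to mirror the string_view-based design the commit describes.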

- WORMHOLE cmake option is set to ON when compiling for WASM
- WASM module might not run on Chrome

This reverts commit 3dd7a60.

Turn off assertions and disable exception catching for wasm builds

- Includes try/catch free builds
- Has ASSERTION=0 and DISABLE_EXCEPTION_CATCHING=1 for wasm builds

- This combination gives min inference time (~ 200 WPS) on local machine

- Clears up the spaghetti of model packaging
- Usage instructions
- Formatting changes

- PACKAGE_DIR cmake option can now accept relative paths

- Now things are consistent with the top level README instructions that suggest to build in "build-wasm" folder

- Clarified that the Demo and API usage section assumes bergamot models were packaged into wasm binary
- Formatting changes

WASM CircleCI builds

Jerin Philip and others added 30 commits January 20, 2021 19:08

Import sources from mts adaptation

601bd52

This first commit imports files from mts which was repurposed for bergamot translator from https://github.com/browsermt/mts/tree/nuke.

Bumping marian with sentencepiece capable fork

d786f25

Modifications to SentencePiece are necessary to provide token level string_views. This commit changes marian to an alternate branch which has the feature incorporated.

Adding documentation and example to service.h

b3f1905

Enhancing service.h further

d3c707f

Moving main (mts) to app/

54a6c6c

Commit modifies the example test-code main-mts into the app folder, updating CMakeLists accordingly.

Removing unused timer.h

caa03e1

TranslationResult Docs

d6ec007

Removed Alignments, too many questions and no concrete answers. Better off removing unused code. History is kept for now, for internal use.

MTranslationResult, more comments

9b18bd9

Fixing compile error, need tests, CI

12e7e2c

Removing unused variable in batch_translator

80125e2

CMakeLists improvements

3714393

Only the bergamot-translator library should be linked to the main target. Any other library (marian ${MARIAN_CUDA_LIB} ${EXT_LIBS} ssplit pcrecpp.a pcre.a) should be linked to the bergamot-translator target inside the src/translator folder.

Adding vim temporary files to .gitignore

e75bd7e

Updating README.md with instructions to run service-cli

3b6b9cd

Improved 3rd party header inclusion and library linking

c8fc004

Removed a redundant directory inclusion in CMakeFile

1c3b656

Removing Exception to fix Apple compile

988e76b

CI and Associated Changes

7e2eb02

Enables Mac and Ubuntu CPU-only builds through GitHub CI. CI scripts are copied from marian-dev with the necessary changes. 3rd-party/marian-dev is modified to meet C++17 requirements, with changes to half_float.

CI scripts: master -> main

cd025e9

Changing code-style to clang-format-google

69adc7a

Integrating marian-translator through API

08a7358

Using std::string for config. Now capable of launching marian translator through API interface. There's a sketchy workaround to convert a string config to marian::Options, with an added note.

Removed redundant lines from CMakeFile

026f1af

Cleanup TranslationModelConfiguration to std::string change in API

b49f2c1

- Provide yaml formatted string as model configuration
- Remove redundant files

Improved main.cpp file

0d16b19

- Print original and translated text
- Just add 2 vector entries for texts

Removing config file printing

e76a602

CMake updates submodules

548c888

motin and others added 28 commits February 15, 2021 13:19

Formatting

1e94d78

Add 10 lines of esen benchmark sentences to test page

fcc998f

Make modelConfig an object instead of string (less likelihood of typos)

f3ff1d2

Add model config used in pr6 benchmarks

7d6346d

Use yaml for modelConfig on test page

64d57d8

Enabled simd shuffle pattern for intgemm compilation

3dd7a60

- WORMHOLE cmake option is set to ON when compiling for WASM
- WASM module might not run on Chrome

Prepend shortlist path with /

91e45cb

Turn off assertions and disable exception catching for wasm builds

9a5ae95

Revert "Enabled simd shuffle pattern for intgemm compilation"

9a5cf30

This reverts commit 3dd7a60.

Merge pull request #26 from motin/wasm-integration

fc3ab33

Turn off assertions and disable exception catching for wasm builds

Updated marian submodule

0374ac4

- Includes try/catch free builds
- Has ASSERTION=0 and DISABLE_EXCEPTION_CATCHING=1 for wasm builds

Enabled COMPILE_WITHOUT_EXCEPTIONS for marian submodule

3607523

Re-enable simd shuffle pattern for intgemm compilation

c5c5339

Updated config for min inference time

921c2ee

- This combination gives min inference time (~ 200 WPS) on local machine

Updated instructions on how to get all relevant models in place for the upcoming release

b1e72ce

Updated test page to use the model structure from bergamot-models repo

d907400

Improved README

b86f8a7

- Clears up the spaghetti of model packaging
- Usage instructions
- Formatting changes

Allow using relative paths for packaging files

9feebe5

- PACKAGE_DIR cmake option can now accept relative paths

Added more explanation for FILES_TO_PACKAGE in README

b75e72e

Replaced "build-wasm-docker" with "build-wasm"

c2371dd

- Now things are consistent with the top level README instructions that suggest to build in "build-wasm" folder

Improved wasm/README

79571ba

- Clarified that the Demo and API usage section assumes bergamot models were packaged into wasm binary
- Formatting changes

Remove Docker-based builds since they are no more reproducible than metal builds. Fixes browsermt/bergamot-translator#31

51f702e

CircleCI config, docs and badge

896df30

Remove trailing slash from intgemm submodule since it caused issues in circleci

f823c29

Increase CircleCI RAM from 4gb to 6gb

ece8240

Increase CircleCI RAM from 6gb to 8gb

826d322

Increase CircleCI RAM from 8gb to 16gb

cdd0953

Merge pull request #1 from mozilla/wasm-circle-builds

bed48e1

WASM CircleCI builds

andrenatal requested a review from abhi-agg March 2, 2021 05:41

andrenatal closed this Mar 2, 2021
