This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

Update marian with binary model loading #11

Closed
wants to merge 99 commits into from

Commits on Jan 20, 2021

  1. Import sources from mts adaptation

    This first commit imports files from mts, which was repurposed for bergamot translator,
    from https://github.com/browsermt/mts/tree/nuke.
    Jerin Philip committed Jan 20, 2021
    601bd52
  2. Bumping marian with sentencepiece capable fork

    Modifications to SentencePiece are necessary to provide token level
    string_views. This commit changes marian to an alternate branch which
    has the feature incorporated.
    Jerin Philip committed Jan 20, 2021
    d786f25
  3. Updating CMakeLists to build main

    CMakeLists have been modified with the necessary includes to add
    browsermt/mts@nuke files to the bergamot-translator library. In
    addition, adds the ssplit dependency and corresponding includes.
    
    Intel MKL fails on compilation, unable to find libraries. To solve this,
    3rd_party/CMakeLists.txt is modified with @UG's fixes to propagate
    variables (EXT_LIBS, etc) at a library level.
    Jerin Philip committed Jan 20, 2021
    bde9094
  4. Undoing LineSplitter, reverting SentenceSplitter.

    A faster linesplitter added for benchmarks is removed in favour of @UG's
    ssplit-cpp.
    NOTE: ssplit-cpp's regex based implementation is slow for one-line
    parses, which ideally needs to be improved in upstream ssplit-cpp to
    trivially reduce to a faster newline character based split.
    Jerin Philip committed Jan 20, 2021
    b25b227
  5. Adding documentation and example to service.h

    Jerin Philip committed Jan 20, 2021
    b3f1905
  6. Enhancing service.h further

    Jerin Philip committed Jan 20, 2021
    d3c707f
  7. Moving main (mts) to app/

    Commit moves the example test-code main-mts into the app folder,
    updating CMakeLists accordingly.
    Jerin Philip committed Jan 20, 2021
    54a6c6c
  8. Removing unused timer.h

    Jerin Philip committed Jan 20, 2021
    caa03e1
  9. TranslationResult Docs

    Removed Alignments, too many questions and no concrete answers. Better
    off removing unused code. History is kept for now, for internal use.
    Jerin Philip committed Jan 20, 2021
    d6ec007

Commits on Jan 21, 2021

  1. Fixes copying around vocabs

    Vocabs were earlier loaded in each thread and copied several times.
    Modified this so they are loaded only once in Service and a reference is
    used consistently later on.
    
    This change makes Tokenizer as a class rather moot, as there's only one
    private member and a function. Moved this into TextProcessor.
    SentenceSplitter, however, remains a separate class.
    
    utils.{h,cpp} had only a single loadVocabularies function, which
    is at the moment required only in Service. Making loadVocabularies a
    function inside Service and getting rid of utils.*.
    Jerin Philip committed Jan 21, 2021
    4640ae4
  2. Neaten TextProcessor, add a bit of docs.

    - Truncating long sentences into those of a specified length for faster
      processing is now a separate function, for improved readability.
    - Changes push_back -> emplace_back in places to avoid copies.
    - query_to_segments is renamed as process.
    - Comments are added in an attempt to bring some sanity.
    Jerin Philip committed Jan 21, 2021
    ea1a628
  3. MTranslationResult, more comments

    Jerin Philip committed Jan 21, 2021
    9b18bd9
  4. Fixing compile error, need tests, CI

    Jerin Philip committed Jan 21, 2021
    12e7e2c
  5. Removing unused variable in batch_translator

    Jerin Philip committed Jan 21, 2021
    80125e2
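
The "Neaten TextProcessor" commit above describes two things: splitting over-long sentences into pieces of a specified length, and preferring emplace_back over push_back to avoid copies. A minimal C++ sketch of that idea follows. The function name `splitIntoPieces`, the `Tokens` alias, and the use of plain strings as tokens are assumptions made for illustration; they are not taken from the code in this pull request.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token container; the real project may use numeric word IDs.
using Tokens = std::vector<std::string>;

// Splits `sentence` into consecutive pieces of at most `maxLength` tokens.
// Returns no pieces when `maxLength` is zero or the sentence is empty.
std::vector<Tokens> splitIntoPieces(const Tokens& sentence,
                                    std::size_t maxLength) {
  std::vector<Tokens> pieces;
  if (maxLength == 0) return pieces;
  for (std::size_t offset = 0; offset < sentence.size(); offset += maxLength) {
    const std::size_t end = std::min(offset + maxLength, sentence.size());
    const auto first = sentence.begin() + static_cast<std::ptrdiff_t>(offset);
    const auto last = sentence.begin() + static_cast<std::ptrdiff_t>(end);
    // emplace_back constructs the piece directly inside `pieces` from the
    // iterator range, so no temporary Tokens object is built and then copied.
    pieces.emplace_back(first, last);
  }
  return pieces;
}
```

With a five-token sentence and a maximum length of 2, this sketch yields three pieces holding 2, 2 and 1 tokens.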

Commits on Jan 22, 2021

  1. CMakeLists improvements

    Only the bergamot-translator library should be linked to the main target.
    Any other library (marian ${MARIAN_CUDA_LIB} ${EXT_LIBS} ssplit
    pcrecpp.a pcre.a) should be linked to bergamot-translator target inside
    src/translator folder.
    Jerin Philip committed Jan 22, 2021
    3714393
  2. Adding vim temporary files to .gitignore

    Jerin Philip committed Jan 22, 2021
    e75bd7e
  3. Updating README.md with instructions to run service-cli

    Jerin Philip committed Jan 22, 2021
    3b6b9cd
  4. c8fc004
  5. 1c3b656
  6. Removing Exception to fix Apple compile

    Jerin Philip committed Jan 22, 2021
    988e76b

Commits on Jan 23, 2021

  1. CI and Associated Changes

    Enables Mac and Ubuntu CPU only builds through GitHub CI. CI scripts are
    copied from marian-dev with necessary changes.
    
    3rd-party/marian-dev is modified to meet C++17 requirements, with
    changes for half_float.
    Jerin Philip committed Jan 23, 2021
    7e2eb02
  2. CI scripts: master -> main

    Jerin Philip committed Jan 23, 2021
    cd025e9

Commits on Jan 24, 2021

  1. Changing code-style to clang-format-google

    Jerin Philip committed Jan 24, 2021
    69adc7a

Commits on Jan 25, 2021

  1. Integrating marian-translator through API

    Using std::string for config. Now capable of launching marian translator
    through API interface. There's a sketchy workaround to convert a string
    config to marian::Options, with an added note.
    Jerin Philip committed Jan 25, 2021
    08a7358

Commits on Jan 26, 2021

  1. 026f1af
  2. Cleanup TranslationModelConfiguration to std::string change in API

     - Provide yaml formatted string as model configuration
     - Remove redundant files
    abhi-agg committed Jan 26, 2021
    b49f2c1
  3. Improved main.cpp file

     - Print original and translated text
     - Just add 2 vector entries for texts
    abhi-agg committed Jan 26, 2021
    0d16b19
  4. Fix for garbled output through cli.

    Requirement for string_view is that the original source string be
    transferred all the way from input to service and back to
    TranslationResult. This constraint was violated in several places by the
    existence of a copy-constructor. The issue is fixed by deleting the copy
    constructors and copy-assignment operators in
    marian::bergamot::TranslationResult and UnifiedAPI::TranslationResult,
    which exposed a few occurrences of the same. Replaced the same with move
    semantics. In addition, future is set and get using move semantics at the
    moment. Default move-constructor didn't seem to be working, so they're
    made explicit for TranslationResults.
    
    This commit additionally packs a few deletions and improvements made to
    improve structure (textops.cpp, batcher.cpp) along the process of
    inspecting and fixing the garbled outputs. They are chosen to be kept,
    in the interest of time, over prettified atomic commit engineering.
    
    Combinations of the following commits in jp/string-view-bug
    [acfc92 78a588 12d91b 00a277 919e2f 9d3a46 b7e39b 18f67b bf667c]
    Jerin Philip committed Jan 26, 2021
    9a17f36
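
The "Fix for garbled output through cli" commit above relies on a general property of std::string_view: a view only stays valid while it points into the string that owns the bytes. The sketch below illustrates that property with a hypothetical `Result` class; it is not the project's TranslationResult. Copying is deleted, and the move operations are written out so that the views are re-pointed at the destination string. Why the defaulted move constructor "didn't seem to be working" is not stated in the commit; one plausible cause, offered here only as an assumption, is the small-string optimization, under which moving a short std::string copies its characters to a new address.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a result type that owns its source text and
// exposes tokens as views into that text.
class Result {
 public:
  explicit Result(std::string source) : source_(std::move(source)) {
    // Split on single spaces, storing each token as a view into source_.
    std::size_t begin = 0;
    while (begin <= source_.size()) {
      std::size_t end = source_.find(' ', begin);
      if (end == std::string::npos) end = source_.size();
      if (end > begin) tokens_.emplace_back(source_.data() + begin, end - begin);
      begin = end + 1;
    }
  }

  // A copy would leave the new object's views pointing at the old string.
  Result(const Result&) = delete;
  Result& operator=(const Result&) = delete;

  Result(Result&& other) noexcept { moveFrom(other); }
  Result& operator=(Result&& other) noexcept {
    if (this != &other) moveFrom(other);
    return *this;
  }

  const std::string& source() const { return source_; }
  const std::vector<std::string_view>& tokens() const { return tokens_; }

 private:
  void moveFrom(Result& other) noexcept {
    const char* oldBase = other.source_.data();
    source_ = std::move(other.source_);
    tokens_ = std::move(other.tokens_);
    other.tokens_.clear();
    // Re-base every view onto the destination string. For heap-allocated
    // strings this is a no-op; for short (inline) strings it is required.
    for (std::string_view& token : tokens_) {
      token = std::string_view(source_.data() + (token.data() - oldBase),
                               token.size());
    }
  }

  std::string source_;
  std::vector<std::string_view> tokens_;
};

static_assert(!std::is_copy_constructible<Result>::value,
              "Result must not be copyable");
```

An alternative design, under the same assumptions, is to store byte offsets instead of views and build the views on demand, which makes both copy and move safe by construction.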

Commits on Jan 28, 2021

  1. Removing config file printing

    Jerin Philip committed Jan 28, 2021
    e76a602

Commits on Feb 2, 2021

  1. CMake updates submodules

    Jerin Philip committed Feb 2, 2021
    548c888
  2. Reordering git submodule update before includes

    Jerin Philip committed Feb 2, 2021
    2929077

Commits on Feb 8, 2021

  1. Updated marian-dev submodule

     - Switch to "wasm" branch of browsermt/marian-dev
    abhi-agg committed Feb 8, 2021
    9a54d21

Commits on Feb 9, 2021

  1. Changed encodePreservingSource -> encodeWithByteRanges

     - This change happened because marian submodule changed
       this name
    
     - Native builds are working fine
       -- bergamot-translator-app output is consistent
    abhi-agg committed Feb 9, 2021
    47b4bae

Commits on Feb 10, 2021

  1. Updated ssplit submodule to a different repository

     - Added abhi-agg/ssplit-cpp
     - Added its wasm branch in bergamot-translator
     - Native builds of bergamot-translator are successful
       -- Sentence splitting is NOT WORKING
       -- Only translation is working
    abhi-agg committed Feb 10, 2021
    5683168
  2. Changed translate() API from non-blocking to blocking

     - Can be changed back to non-blocking once blocking API
       becomes integrable via WASM port in browser
    abhi-agg committed Feb 10, 2021
    584700c
  3. Updated ssplit submodule

    abhi-agg committed Feb 10, 2021
    a2d3269

Commits on Feb 11, 2021

  1. Add cmake option to compile project on WASM

     - Set cmake option COMPILE_WASM to ON to compile the project
       on WASM
    abhi-agg committed Feb 11, 2021
    9747d9b
  2. Set cmake option to compile marian library only

     - Set COMPILE_LIBRARY_ONLY to ON for marian library
    abhi-agg committed Feb 11, 2021
    b73d4f4
  3. 838547e
  4. cmake compile option changes

     - Make native builds successful with marian decoder
     - COMPILE_DECODER_ONLY flag requires importing some
       compile definitions from marian
    abhi-agg committed Feb 11, 2021
    9b89650
  5. cmake compile option changes for wasm builds

      - Make WASM builds successful with marian decoder
      - Setting COMPILE_WASM to ON requires importing some
        compile definitions from marian
    abhi-agg committed Feb 11, 2021
    79c445a
  6. Fixed a bug in TranslationModel class

     - Using bergamot-translator as a library fails at run time because
       necessary parser options are not set
    abhi-agg committed Feb 11, 2021
    a06530e
  7. Source code changes to compile the project without threads

     - Set COMPILE_THREAD_VARIANT cmake option to ON to compile
       multithreaded variant of the project
    abhi-agg committed Feb 11, 2021
    23a9527
  8. Added code to generate proper JS bindings of translator

     - COMPILE_WASM cmake option sets WASM_BINDINGS compile
       definition that enables code for generating proper JS
       bindings
    abhi-agg committed Feb 11, 2021
    7b80003
  9. 74b06d8
  10. Added JS binding files and cmake infrastructure to build them

     - Added "wasm" folder
     - Contains README file as well
    abhi-agg committed Feb 11, 2021
    de501e8
  11. e126470
  12. ff95e37
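
The "Source code changes to compile the project without threads" commit above introduces a COMPILE_THREAD_VARIANT cmake option. How that option reaches the C++ sources is not shown on this page. One common pattern, sketched below purely as an assumption, is for the build to pass a compile definition that selects between a threaded and a single-threaded code path. The macro name `THREADED_BUILD` and the function `processAll` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#ifdef THREADED_BUILD  // hypothetical definition supplied by the build system
#include <thread>
#endif

// Wraps every input in brackets, either on worker threads or sequentially,
// depending on whether THREADED_BUILD is defined at compile time.
std::vector<std::string> processAll(const std::vector<std::string>& inputs) {
  std::vector<std::string> outputs(inputs.size());
  // Each call writes to a distinct element, so no locking is needed.
  auto work = [&](std::size_t i) { outputs[i] = "[" + inputs[i] + "]"; };
#ifdef THREADED_BUILD
  std::vector<std::thread> workers;
  workers.reserve(inputs.size());
  for (std::size_t i = 0; i < inputs.size(); ++i) workers.emplace_back(work, i);
  for (std::thread& worker : workers) worker.join();
#else
  // Single-threaded variant: no <thread> include and no thread creation,
  // which suits targets where threads are unavailable.
  for (std::size_t i = 0; i < inputs.size(); ++i) work(i);
#endif
  return outputs;
}
```

Both variants return the same result for the same input; only the execution strategy differs.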

Commits on Feb 12, 2021

  1. 28dcf55
  2. Updated marian-dev submodule

     - This fixes the issue of sentencepiece not being able to checkout
       properly
    abhi-agg committed Feb 12, 2021
    3b7673b
  3. Update README.md

    Add `--recursive` to `git clone` instructions
    andrenatal committed Feb 12, 2021
    9108d9f
  4. Merge pull request #20 from browsermt/andrenatal-patch-1

    Update README.md
    kpu committed Feb 12, 2021
    f43dc33
  5. Update README.md

    updating `--recursive` on wasm instructions too
    andrenatal committed Feb 12, 2021
    3a53a68

Commits on Feb 13, 2021

  1. Update README.md

    andrenatal committed Feb 13, 2021
    a97bf7b
  2. Update README.md

    andrenatal committed Feb 13, 2021
    47db659

Commits on Feb 14, 2021

  1. Including a more elaborated test page, a node webserver containing the proper cors headers and wasm mimetype

    andrenatal committed Feb 14, 2021
    1e413f7
  2. 0dbc861

Commits on Feb 15, 2021

  1. Updated wasm readme

    motin committed Feb 15, 2021
    d27a96f
  2. f7c8651
  3. Some cleanup

    motin committed Feb 15, 2021
    26ea5bb
  4. Add support for translating multiple sentences on the test page + report words per second metric in the log

    motin committed Feb 15, 2021
    d3969bc
  5. 28c0ab2
  6. Add instructions on how to assemble and package the set of files expected by the test page

    motin committed Feb 15, 2021
    a33b3a3
  7. 53e0b9f
  8. e50dd09
  9. 7030fa0
  10. 49ad651
  11. 77f3954
  12. dbdcdab
  13. Fix typo from when fixing typo

    motin committed Feb 15, 2021
    70bdcd4
  14. Finally found the original typo that made it appear as if loading the model in the test page was faster than elsewhere - the lexical shortlist was not being included at the right place in the model config

    motin committed Feb 15, 2021
    da56501
  15. Formatting

    motin committed Feb 15, 2021
    1e94d78
  16. fcc998f
  17. f3ff1d2
  18. 7d6346d
  19. 64d57d8
  20. Enabled simd shuffle pattern for intgemm compilation

     - WORMHOLE cmake option is set to ON when compiling for WASM
     - WASM module might not run on Chrome
    abhi-agg committed Feb 15, 2021
    3dd7a60
  21. Prepend shortlist path with /

    motin committed Feb 15, 2021
    91e45cb
  22. 9a5ae95
  23. Revert "Enabled simd shuffle pattern for intgemm compilation"

    This reverts commit 3dd7a60.
    motin committed Feb 15, 2021
    9a5cf30
  24. Merge pull request #26 from motin/wasm-integration

    Turn off assertions and disable exception catching for wasm builds
    abhi-agg committed Feb 15, 2021
    fc3ab33
  25. Updated marian submodule

     - Includes try/catch free builds
     - Has ASSERTION=0 and DISABLE_EXCEPTION_CATCHING=1 for wasm builds
    abhi-agg committed Feb 15, 2021
    0374ac4
  26. 3607523
  27. c5c5339

Commits on Feb 16, 2021

  1. Updated config for min inference time

     - This combination gives min inference time (~ 200 WPS)
       on local machine
    abhi-agg committed Feb 16, 2021
    921c2ee
  2. Updated instructions on how to get all relevant models in place for the upcoming release

    motin committed Feb 16, 2021
    b1e72ce
  3. d907400

Commits on Feb 17, 2021

  1. Improved README

     - Clears up the spaghetti of model packaging
     - Usage instructions
     - Formatting changes
    abhi-agg committed Feb 17, 2021
    b86f8a7
  2. Allow using relative paths for packaging files

     - PACKAGE_DIR cmake option can now accept relative paths
    abhi-agg committed Feb 17, 2021
    9feebe5

Commits on Feb 18, 2021

  1. b75e72e
  2. Replaced "build-wasm-docker" with "build-wasm"

     - Now things are consistent with the top level README
       instructions that suggest to build in "build-wasm"
       folder
    abhi-agg committed Feb 18, 2021
    c2371dd
  3. Improved wasm/README

     - Clarified that the Demo and API usage section assumes
       bergamot models were packaged into wasm binary
     - Formatting changes
    abhi-agg committed Feb 18, 2021
    79571ba
  4. 51f702e

Commits on Feb 19, 2021

  1. CircleCI config, docs and badge

    motin committed Feb 19, 2021
    896df30
  2. f823c29
  3. ece8240
  4. 826d322
  5. cdd0953

Commits on Feb 20, 2021

  1. Merge pull request #1 from mozilla/wasm-circle-builds

    WASM CircleCI builds
    motin committed Feb 20, 2021
    bed48e1