File-based fast training for Any2Vec models #2127

Merged on Sep 14, 2018 (133 commits)

Commits on Jul 9, 2018

  1. CythonLineSentence

    persiyanov committed Jul 9, 2018
    39a2c11
  2. fix

    persiyanov committed Jul 9, 2018
    20c22f7
  3. fix setup.py

    persiyanov committed Jul 9, 2018
    dd0e9ca
  4. fixes

    persiyanov committed Jul 9, 2018
    6203c77
  5. some refactoring

    persiyanov committed Jul 9, 2018
    03bf799

Commits on Jul 10, 2018

  1. remove printf

    persiyanov committed Jul 10, 2018
    660493f
  2. compiled

    persiyanov committed Jul 10, 2018
    1aedfe8
  3. second branch for pystreams

    persiyanov committed Jul 10, 2018
    9ff0bb1
  4. fix

    persiyanov committed Jul 10, 2018
    9e498b7

Commits on Jul 11, 2018

  1. learning rate decay in Cython + _do_train_epoch + _train_epoch_multistream methods

    persiyanov committed Jul 11, 2018
    1d4a2a8
  2. add train_epoch_sg function

    persiyanov committed Jul 11, 2018
    97bac7e
  3. call _train_epoch_multistream from train()

    persiyanov committed Jul 11, 2018
    4de3a84
  4. add word2vec_inner.cpp

    persiyanov committed Jul 11, 2018
    36d1412
  5. remove pragma from .cpp

    persiyanov committed Jul 11, 2018
    625025b

Commits on Jul 12, 2018

  1. 8173da8
  2. fix doc

    persiyanov committed Jul 12, 2018
    bd0a0e0
  3. fix pip

    persiyanov committed Jul 12, 2018
    63663fa

Commits on Jul 14, 2018

  1. 2ee2405
  2. remove printf

    persiyanov committed Jul 14, 2018
    8f8e817
  3. add 1 test for CythonLineSentence

    persiyanov committed Jul 14, 2018
    ac28bbb

Commits on Jul 18, 2018

  1. no vocab copying

    persiyanov committed Jul 18, 2018
    942a12f
  2. fixed

    persiyanov committed Jul 18, 2018
    2a44fbc

Commits on Jul 19, 2018

  1. Revert "fixed"

    This reverts commit 2a44fbc.
    persiyanov committed Jul 19, 2018
    e4a8ba0
  2. Revert "no vocab copying"

    This reverts commit 942a12f.
    persiyanov committed Jul 19, 2018
    394a417

Commits on Jul 24, 2018

  1. remove input_streams, add corpus_file

    persiyanov committed Jul 24, 2018
    9ab6b1b
  2. fix

    persiyanov committed Jul 24, 2018
    5d2e2cf
  3. 0489561

Commits on Jul 26, 2018

  1. upd .cpp

    persiyanov committed Jul 26, 2018
    901cad4
  2. add C++11 compiler flags

    persiyanov committed Jul 26, 2018
    c09035c
  3. pep8

    persiyanov committed Jul 26, 2018
    1e3c314
  4. add link args too

    persiyanov committed Jul 26, 2018
    d6755be
  5. upd FastLineSentence

    persiyanov committed Jul 26, 2018
    cc4680c
  6. 9978f6b
  7. fix flake

    persiyanov committed Jul 26, 2018
    35333dd
  8. clean up base_any2vec.py

    persiyanov committed Jul 26, 2018
    86b91ac
  9. fix

    persiyanov committed Jul 26, 2018
    fca6f50
  10. fix CythonLineSentence ctor

    persiyanov committed Jul 26, 2018
    45ca084
  11. fix py3 type error

    persiyanov committed Jul 26, 2018
    16bb386
  12. fix again

    persiyanov committed Jul 26, 2018
    c83b96f
  13. try again

    persiyanov committed Jul 26, 2018
    1a21b0b
  14. new error

    persiyanov committed Jul 26, 2018
    dd83a3e

Commits on Jul 27, 2018

  1. fix test

    persiyanov committed Jul 27, 2018
    c72f0b6

Commits on Jul 30, 2018

  1. add unordered_map wrapper

    persiyanov committed Jul 30, 2018
    74e51b3
  2. upd

    persiyanov committed Jul 30, 2018
    58fc112
  3. fix cython compiling errors

    persiyanov committed Jul 30, 2018
    5e70184
  4. upd word2vec_inner.cpp

    persiyanov committed Jul 30, 2018
    9727782

Commits on Jul 31, 2018

  1. add some tests

    persiyanov committed Jul 31, 2018
    d97ac0c
  2. more tests for corpus_file

    persiyanov committed Jul 31, 2018
    b6d7bb3
  3. fix docstrings

    persiyanov committed Jul 31, 2018
    0c1fc5f

Commits on Aug 1, 2018

  1. addressing comments

    persiyanov committed Aug 1, 2018
    fd66e34
  2. fix tests skipIf

    persiyanov committed Aug 1, 2018
    da9f3da
  3. add persistence test

    persiyanov committed Aug 1, 2018
    81329d6
  4. online learning tests

    persiyanov committed Aug 1, 2018
    f2ba633
  5. fix save_as_line_sentence

    persiyanov committed Aug 1, 2018
    51cec43
  6. fix again

    persiyanov committed Aug 1, 2018
    a72ddf1

Commits on Aug 2, 2018

  1. address new comments

    persiyanov committed Aug 2, 2018
    aba7682
  2. fix test

    persiyanov committed Aug 2, 2018
    03d44b2
  3. e4e8cb2
  4. fix tests

    persiyanov committed Aug 2, 2018
    3e989de

Commits on Aug 3, 2018

  1. add .c file

    persiyanov committed Aug 3, 2018
    d8c5cdc
  2. fix test

    persiyanov committed Aug 3, 2018
    2a42b85
  3. fix tests skipIf and setup.py

    persiyanov committed Aug 3, 2018
    002a60c
  4. fix mac os compatibility

    persiyanov committed Aug 3, 2018
    3850f49

Commits on Aug 9, 2018

  1. add tutorial on w2v multistream

    persiyanov committed Aug 9, 2018
    c1e8a9b

Commits on Aug 10, 2018

  1. 300% -> 200% in notebook

    persiyanov committed Aug 10, 2018
    7b7195b
  2. add MULTISTREAM_VERSION global constant

    persiyanov committed Aug 10, 2018
    3a8a915
  3. first move towards multistream FastText

    persiyanov committed Aug 10, 2018
    6beb96a
  4. move MULTISTREAM_VERSION

    persiyanov committed Aug 10, 2018
    a2eb5fc
  5. fix error

    persiyanov committed Aug 10, 2018
    57f7b66
  6. fix CythonVocab

    persiyanov committed Aug 10, 2018
    83ce7c2
  7. regenerated .c & .cpp files

    persiyanov committed Aug 10, 2018
    a3ede08

Commits on Aug 11, 2018

  1. resolve ambiguate fast_sentence_* declarations

    persiyanov committed Aug 11, 2018
    d38463e
  2. add test_training_multistream for fasttext

    persiyanov committed Aug 11, 2018
    ec4c677
  3. add skipif

    persiyanov committed Aug 11, 2018
    a5311d2
  4. add more tests

    persiyanov committed Aug 11, 2018
    f499d5b
  5. fix flake8

    persiyanov committed Aug 11, 2018
    645499c

Commits on Aug 12, 2018

  1. add short example

    persiyanov committed Aug 12, 2018
    dc1b98d

Commits on Aug 13, 2018

  1. upd jupyter notebook

    persiyanov committed Aug 13, 2018
    b9564e9

Commits on Aug 14, 2018

  1. fix docstrings in doc2vec

    persiyanov committed Aug 14, 2018
    eefdd65
  2. add d2v_train_epoch_dbow for from-file training

    persiyanov committed Aug 14, 2018
    f669979

Commits on Aug 15, 2018

  1. add missing parts of from-file doc2vec

    persiyanov committed Aug 15, 2018
    e80189f
  2. refactored a bit

    persiyanov committed Aug 15, 2018
    cf6b032
  3. add total_corpus_count calculation in doc2vec

    persiyanov committed Aug 15, 2018
    87d8ea7
  4. e2851b4
  5. add tests for doc2vec file-based + rename MULTISTREAM -> CORPUSFILE everywhere

    persiyanov committed Aug 15, 2018
    1fdaa43
  6. regenerated .c + .cpp files

    persiyanov committed Aug 15, 2018
    c2fa0d8
  7. 5427416
  8. make shared initialization

    persiyanov committed Aug 15, 2018
    7f7760b
  9. use init_config from word2vec_corpusfile

    persiyanov committed Aug 15, 2018
    926fd5e
  10. add FastTextConfig

    persiyanov committed Aug 15, 2018
    df47983
  11. init_config -> init_w2v_config, init_ft_config

    persiyanov committed Aug 15, 2018
    0df7f6f
  12. regenerated .c & .cpp files

    persiyanov committed Aug 15, 2018
    5fd1c99
  13. using FastTextConfig in fasttext_corpusfile.pyx

    persiyanov committed Aug 15, 2018
    d9257be
  14. fix

    persiyanov committed Aug 15, 2018
    67c572c
  15. fix

    persiyanov committed Aug 15, 2018
    8e82b9f
  16. fix next_random in w2v

    persiyanov committed Aug 15, 2018
    db2a77f

Commits on Aug 16, 2018

  1. introduce Doc2VecConfig

    persiyanov committed Aug 16, 2018
    a96bc6d
  2. fix init_d2v_config

    persiyanov committed Aug 16, 2018
    3b4da64
  3. use Doc2VecConfig in doc2vec_corpusfile.pyx

    persiyanov committed Aug 16, 2018
    53b967c
  4. removed unused vars

    persiyanov committed Aug 16, 2018
    f57d1cb
  5. fix docstrings

    persiyanov committed Aug 16, 2018
    b652afe
  6. fix more docstrings

    persiyanov committed Aug 16, 2018
    260cfb5
  7. test old model for doc2vec & fasttext

    persiyanov committed Aug 16, 2018
    a433018
  8. fix loading old models

    persiyanov committed Aug 16, 2018
    20ec49b
  9. fix fasttext model checking

    persiyanov committed Aug 16, 2018
    1ced17d
  10. merge fast_line_sentence.cpp and fast_line_sentence.h

    persiyanov committed Aug 16, 2018
    0731449
  11. fix word2vec test

    persiyanov committed Aug 16, 2018
    35f0ab4
  12. fix syntax error

    persiyanov committed Aug 16, 2018
    49905f0
  13. remove redundanta seekg call

    persiyanov committed Aug 16, 2018
    95c6ec9
  14. fix example notebook

    persiyanov committed Aug 16, 2018
    aed2b6b
  15. add initial doc_tags computation

    persiyanov committed Aug 16, 2018
    c1af621
  16. fix test

    persiyanov committed Aug 16, 2018
    33bf97a

Commits on Aug 17, 2018

  1. fix test for windows

    persiyanov committed Aug 17, 2018
    e592b6a
  2. add one more test on offsets

    persiyanov committed Aug 17, 2018
    d08e4c1
  3. get rid of subword_arrays in fasttext

    persiyanov committed Aug 17, 2018
    468a000
  4. make hanging indents everywhere

    persiyanov committed Aug 17, 2018
    f71e1f8

Commits on Aug 18, 2018

  1. open file in byte mode

    persiyanov committed Aug 18, 2018
    811388b
  2. fix pep

    persiyanov committed Aug 18, 2018
    ddd5901
  3. fix tests

    persiyanov committed Aug 18, 2018
    a3490c7
  4. fix again

    persiyanov committed Aug 18, 2018
    a28ff0d
  5. final fix?

    persiyanov committed Aug 18, 2018
    b2996f0
  6. regenerated .c & .cpp files

    persiyanov committed Aug 18, 2018
    64bb617
  7. fix test_persistence_fromfile for FastText

    persiyanov committed Aug 18, 2018
    816f63f

Commits on Aug 20, 2018

  1. add fasttext & doc2vec to notebook

    persiyanov committed Aug 20, 2018
    abad1b8
  2. add short examples

    persiyanov committed Aug 20, 2018
    0b03839

Commits on Aug 23, 2018

  1. 6217c73

Commits on Aug 25, 2018

  1. f70d159

Commits on Sep 9, 2018

  1. 9593d5f
  2. 7b714b2

Commits on Sep 12, 2018

  1. fix deprecation warning

    menshikh-iv authored Sep 12, 2018
    b833f0f

Commits on Sep 14, 2018

  1. regenerate .ipynb

    persiyanov committed Sep 14, 2018
    bcc0fb9
  2. upd plot

    persiyanov committed Sep 14, 2018
    384e0b1
  3. upd plot

    persiyanov committed Sep 14, 2018
    527266f