Branch: develop
Commits on Nov 13, 2019
  1. Add choices for --image_pull_policy and --restart_policy (#1451)

    terrytangyuan authored and LiMinghao1994 committed Nov 13, 2019
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
  2. Add launching parameters for PS (#1452)

    ywskycn authored and QiJune committed Nov 13, 2019
    * Add launching parameters for PS
    
    * fix a test
    
    * ps should parse --port parameter
Commits on Nov 12, 2019
  1. add unit test for model_handler to process subclass model (#1446)

    workingloong authored and terrytangyuan committed Nov 12, 2019
    * add unit test for model_handler to process subclass model
    
    * fix code format to satisfy flake8
  2. Cache keras dataset in Travis (#1450)

    QiJune authored and terrytangyuan committed Nov 12, 2019
    * cache keras datasets in travis
    
    * add cache volume
    
    * format code
  3. report_variable for ps init when needed (#1449)

    skydoorkai committed Nov 12, 2019
    * report_variable for ps init when needed
    
    * use tearDown in restart
    
    * add model_init_status check
    
    * reduce test time
  4. add a test for deepfm (#1442)

    QiJune committed Nov 12, 2019
    * init
    
    * fix test
    
    * add deepfm test
    
    * format code
    
    * format code
    
    * follow comments
  5. Fix the function name which misses a 's' (#1443)

    ywskycn committed Nov 12, 2019
    * Fix the function name which misses a 's'
    
    * fix
  6. Execute save model task to export SaveModel (#1426)

    workingloong committed Nov 12, 2019
    * export saved model by save_model task
    
    * move the api to model_handler to get model from ps
    
    * support config multiple volume
    
    * add multiple volumes to mount in pod
    
    * add stub to ParameterServerModelHandler
    
    * fix by comments
    
    * fix docstring style
    
    * fix by pre-commit
    
    * fix by comments
    
    * fix by pre-commit
    
    * update docstrings for
  7. push gradient rpc update embedding params (#1439)

    LiMinghao1994 authored and QiJune committed Nov 12, 2019
Commits on Nov 11, 2019
  1. support pushing embedding table info and gradients in worker (#1437)

    QiJune committed Nov 11, 2019
    * push embedding grads
    
    * refine code
    
    * add unittest
    
    * refine code
    
    * push embedding info
    
    * add comments
    
    * follow comments
    
    * fix ci
  2. Embedding layer looks up embedding from PS (#1435)

    LiMinghao1994 committed Nov 11, 2019
    * Embedding layer looks up embedding from PS
    
    * rename variable
    
    * follow comments
    
    * minor edits
    
    * rename worker_mnist_test.py
Commits on Nov 8, 2019
  1. add mnist training unit test for worker with multi-PS (#1434)

    QiJune committed Nov 8, 2019
    * init
    
    * debug
    
    * clean code
    
    * clean code
  2. Add func to optimizer wrapper (#1432)

    LiMinghao1994 committed Nov 8, 2019
    * add update and lookup func to optimizer wrapper
    
    * minor edits
    
    * add todos
    
    * modify comment
  3. fix ci (#1431)

    QiJune committed Nov 8, 2019
    Fix ci
Commits on Nov 7, 2019
  1. Check the true status for master pod when TensorBoard is enabled (#1429)

    terrytangyuan committed Nov 7, 2019
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
  2. Properly set pipeline exit option when validating job status (#1430)

    terrytangyuan committed Nov 7, 2019
    * Properly set pipeline exit option when validating job status
    
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
    
    * Address comments
    
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
  3. partition gradients in worker (#1414)

    QiJune committed Nov 7, 2019
    * implement ps version
    
    * fix ci
    
    * follow comments
    
    * add mnist test
    
    * pass worker mnist test
    
    * add magic
    
    * refine code
    
    * trigger ci
    
    * follow comments
    
    * debug ci
    
    * debug ci
    
    * clean code
    
    * add get_model before training in worker mnist test
  4. Add slot table in PS.Parameters (#1418)

    LiMinghao1994 committed Nov 7, 2019
    * add slot params
    
    * add_slot_table
    
    * minor edits
    
    * add docstring
  5. Parameter worker_pod_priority should be under common_args (#1423)

    ywskycn authored and terrytangyuan committed Nov 7, 2019
Commits on Nov 6, 2019
  1. Add services to rbac manifest (#1422)

    terrytangyuan authored and ywskycn committed Nov 6, 2019
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
  2. Automatically create a service for each newly created ps pod (#1421)

    ywskycn committed Nov 6, 2019
    * Also create service for launched PS
    
    * Add a service for each launched PS
    
    * Address comments
  3. Event handler only cares about pod events (#1419)

    ywskycn committed Nov 6, 2019
  4. Rename k8s_worker_manager and let it also launch ps pods (#1417)

    ywskycn committed Nov 6, 2019
    * Rename k8s_worker_manager and let it also launch ps pods
    
    * Sort imports
    
    * Address test failures
Commits on Nov 5, 2019
  1. Update to use stable release of black formatter (#1415)

    terrytangyuan authored and ywskycn committed Nov 5, 2019
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
  2. Add TensorBoard service in integration tests (#1416)

    terrytangyuan authored and ywskycn committed Nov 5, 2019
  3. Rename variables in k8s_worker_manager (#1413)

    ywskycn committed Nov 5, 2019
  4. add sync update logic in push_gradient of pserver servicer (#1409)

    QiJune committed Nov 5, 2019
    * add sync SGD
    
    * add check_grad
    
    * add unittest
    
    * add unittest for pserver sync update
    
    * fix unittest
    
    * follow comments
    
    * follow comments
Commits on Nov 4, 2019
  1. Move create_service to k8s_client to be reused by other components (#1412)

    ywskycn committed Nov 4, 2019
    * Move create_service to k8s_client to be reused by other components
  2. Add --prediction_outputs_processor to ElasticDL CLI (#1410)

    terrytangyuan authored and ywskycn committed Nov 4, 2019
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
  3. Remove the redundant _design suffix from design docs (#1411)

    terrytangyuan authored and ywskycn committed Nov 4, 2019
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
  4. Pass save model task parameter from client CLI to the worker execution (#1399)

    brightcoder01 committed Nov 4, 2019
    * Remove saving the last checkpoint to output path. Will export to SavedModel
    
    * Pass the saved_model_path parameter to SavedModelTask
    
    * Auto reformat the code
    
    * Add pair message. Add extended_config field in the Task message.
    
    * Pass the SaveModelTask parameter from the client arguments to the worker execution
    
    * Auto reformat
    
    * Resolve comments
    
    * Use map instead of repeated Pair in elasticdl.proto
    
    * Auto format the code
    
    * Unify the API name for invoke/add deferred callbacks
    
    * BUGFIX: If the JobType is Training Only, the worker will hang.
    
    * Only create SaveModel task when the JobType is TRAINING or TRAINING_WITH_EVALUATION
    
    * Add more comments for extended_config field
  5. Async push gradients (#1406)

    LiMinghao1994 authored and QiJune committed Nov 4, 2019
Commits on Nov 3, 2019
  1. Restore keras model and parameters to export. (#1400)

    workingloong committed Nov 3, 2019
    * rename initializer parameter to keep consistency with keras
    
    * restore keras model from trained parameters
    
    * remove empty lines
    
    * fix with pre-commit
    
    * fix docstring and delete
Commits on Nov 1, 2019
  1. SQLFlow integration design doc (#1402)

    terrytangyuan committed Nov 1, 2019
    Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>