Add TF Compute Server #3525
Commits on Jun 17, 2022
-
Add support for Tensorflow Data Service
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Co-authored-by: Terence Hernandez <t.na.m.hernandez@gmail.com>
-
Move compute_*.py into horovod.tensorflow.data, fix examples
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Make output_filename configurable in compute_worker.py
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Make worker and example work with horovodrun, move docs into tensorflow.rst
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Make spark worker work with spark-submit
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Remove tensorflow_data_service.rst from summary.rst
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Download to CWD directly, not mnist sub-directory
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Add spark-submit example to docs and CI
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Reduce run time for examples in CI
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Use default path to fetch mnist dataset, which is pre-fetched in test images
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Use --mpi instead of --gloo for MPI tests
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Run two workers to save ram, remove -H option
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Escape $ differently in test command, but only for Buildkite
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Revert "Escape $ differently in test command, but only for Buildkite"
This reverts commit bd94d87. Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Reference the worker file directly
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Initialize Horovod for Tensorflow in tf worker
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Add timeout parameter to TfDataServiceConfig
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Synchronize tests, assert batches
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Add processing_mode to send_to_data_service, improve logging
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Relax assertions, add logging, add timeout parameter to compute worker script
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Add DEBUG level to pytest tests
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Remove expected batches, skip pre tf2
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Remove port detection and address spec for worker
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Revert "Add DEBUG level to pytest tests"
This reverts commit 5ab15a1. Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Have horovod.tensorflow.data.compute_worker.py script broadcast config
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Add some words about TF data service to docs
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Shutdown dispatcher in finally clause
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Move the finished config file into place
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Fix config broadcast for MPI in GPU environment
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Add tensorflow issue to skipped test
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
-
Remove extra timeout from compute_worker_fn
Signed-off-by: Enrico Minack <github@enrico.minack.dev>