Rich #47

Merged
166 commits merged on Jan 14, 2021
Commits
c29dce1
Updates to `config.py`
saforem2 Nov 24, 2020
265cf3a
Updates to `utils/`
saforem2 Nov 24, 2020
e648fef
Updates to `dynamics/`
saforem2 Nov 24, 2020
cf5e3bb
Updates to `train.py`
saforem2 Nov 24, 2020
4668cfb
Updates to `test/test_training.py`
saforem2 Nov 24, 2020
9f66c99
Updates to `doc/*`
saforem2 Nov 25, 2020
f9390c5
Updates to `dynamics/*`
saforem2 Nov 25, 2020
9244cfe
Adds helper fn for getting `intQ, sinQ` to `GaugeLattice`
saforem2 Nov 25, 2020
1920d00
Updates to `utils/plotting_utils.py`
saforem2 Nov 25, 2020
b507e10
Clean up `utils/{training,inference}_utils.py`
saforem2 Nov 25, 2020
1912244
Adds numpy `json` Encoder to `utils/file_io.py`
saforem2 Nov 25, 2020
87c9213
Updates to `test/test_training.py`
saforem2 Nov 25, 2020
8e6bc20
Updates to `bin/test_configs.json`
saforem2 Nov 25, 2020
2e1dcd1
Updates to `utils/data_containers.py`
saforem2 Nov 26, 2020
f817c25
Adds `notebooks/inference_from_thetaGPU.ipynb`
saforem2 Nov 26, 2020
7c1c422
Replaces `tf.gather(x, idxs)` with `x[idxs]` in `BaseDynamics`
saforem2 Nov 29, 2020
0d12f25
Replaces `/` with `_` in var names in `tf.keras.Model` object
saforem2 Nov 29, 2020
49d2fc0
Updates to `tests/test_training.py`
saforem2 Nov 29, 2020
78f48b5
Adds `mplstyle.use('fast')` to `utils/plotting_utils.py`
saforem2 Nov 29, 2020
c086c2c
Wraps calls to `plot_data()` with timer
saforem2 Nov 29, 2020
cad2152
Explicitly close plots in `utils/plotting_utils.py`
saforem2 Nov 29, 2020
236dad0
Wraps `az.plot_trace()` call in `try/except` block
saforem2 Nov 29, 2020
57f54ff
Adds `bin/train_thetaGPU.sh`
saforem2 Nov 29, 2020
fdb8f6b
Updates to how `beta` is initialized
saforem2 Nov 29, 2020
6fdc4ad
Ensure consistent (lowercase) names throughout
saforem2 Nov 29, 2020
edeefad
Updates to `run_inference_from_log_dir()` in `utils/inference_utils.py`
saforem2 Nov 29, 2020
277d3cd
Updates to `run_inference_from_log_dir()` in `utils/inference_utils.py`
saforem2 Nov 29, 2020
72bcee5
Ensure directory exists before adding inference results to `.csv` file
saforem2 Nov 29, 2020
d690898
Explicitly close plots in `utils/data_containers.py`
saforem2 Nov 29, 2020
561f8ed
Updates to `utils/plotting_utils.py`
saforem2 Nov 29, 2020
5dbd173
Updates to `utils/inference_utils.py`
saforem2 Nov 29, 2020
5b83fec
Cleaning up `utils/file_io.py`
saforem2 Nov 29, 2020
e4b8cd1
General improvements, remove direction from HMC
saforem2 Nov 30, 2020
9c052f0
Updates to `tests/test_training.py`
saforem2 Nov 30, 2020
b5fd16a
Updates to `bin/train_configs.json`
saforem2 Nov 30, 2020
d58b06e
Adds `num_chains` arg to `plot_data`
saforem2 Nov 30, 2020
764aef0
Adds `inference.py`
saforem2 Nov 30, 2020
237e5e2
Fixes bug in `dynamics/gauge_dynamics.py`
saforem2 Nov 30, 2020
1675900
Changes to how networks are restored in `dynamics/gauge_dynamics.py`
saforem2 Nov 30, 2020
98a0c7c
Explicitly skip existing HMC run (search recursively!)
saforem2 Dec 1, 2020
aee9cba
Fixes minor bug in `utils/inference_utils.py`
saforem2 Dec 1, 2020
2397344
Minor changes to `hmc.py`
saforem2 Dec 1, 2020
144f906
Adds `notebooks/tunneling_rate_compare_2020_10_06.ipynb`
saforem2 Dec 1, 2020
ff6d932
Only plot subset of chains (improve performance) in training plots
saforem2 Dec 2, 2020
9aca847
Adds `beta` to CLI parameters in `inference.py`
saforem2 Dec 2, 2020
a57d460
Updates to `utils/inference_utils.py`
saforem2 Dec 3, 2020
447f762
Updates to `inference.py`
saforem2 Dec 3, 2020
4e302a9
Adds ability to specify `eps` from CLI in `inference.py`
saforem2 Dec 3, 2020
2263f12
Updates to `utils/plotting_utils.py`
saforem2 Dec 3, 2020
f0d8c12
Switch to using `dataclasses` for configs in `dynamics/`
saforem2 Dec 4, 2020
f49f2c9
Switch to using `dataclasses` for configs in `network/`
saforem2 Dec 4, 2020
9f69282
Updates to `utils/plotting_utils.py`
saforem2 Dec 4, 2020
9b3e19d
Updates to `utils/training_utils.py`
saforem2 Dec 5, 2020
7d23b3f
Cleaning up `train.py`
saforem2 Dec 5, 2020
df984ad
Tighten up ridgeplots
saforem2 Dec 5, 2020
bf45846
Cleaning up `utils/inference_utils.py`
saforem2 Dec 5, 2020
c1cd5be
Cleaning up `utils/`
saforem2 Dec 5, 2020
e0897ab
Cleaning up `lattice/utils.py`
saforem2 Dec 5, 2020
1f0aa3f
Cleaning up `config.py`
saforem2 Dec 5, 2020
63e3ee3
Changes to `bin/train_configs.json`
saforem2 Dec 5, 2020
c086b41
Updates to `hmc.py` from `thetaGPU`
saforem2 Dec 5, 2020
6188970
Updates to `notebooks/`
saforem2 Dec 5, 2020
e9a5463
Adds (dynamic) step dependent step-size, `eps`
saforem2 Dec 10, 2020
009f0b4
Updates `training_utils.py` for step-dep eps
saforem2 Dec 10, 2020
68d131d
Splits dynamic step-sizes for x, v
saforem2 Dec 11, 2020
40261f6
Updates to `utils/inference_utils.py`
saforem2 Dec 11, 2020
b9dd613
Fixes minor bug in `dynamics/`
saforem2 Dec 14, 2020
f841b12
Updates to tests
saforem2 Dec 14, 2020
4f9816c
Adds autocorr fns to `utils/data_utils.py`
saforem2 Dec 14, 2020
d206780
Updates to `utils/inference_utils.py`
saforem2 Dec 14, 2020
f276013
Updates to `utils/training_utils.py`
saforem2 Dec 14, 2020
2d0fae7
Updates to `utils/plotting_utils.py`
saforem2 Dec 14, 2020
e83911c
Adds `notebooks/autocorr_stats1.ipynb`
saforem2 Dec 16, 2020
0b18962
Adds separate networks for each of the x sub-updates
saforem2 Dec 16, 2020
a65bc7e
Fixes incorrect size bug when using single net in `GaugeDynamics`
saforem2 Dec 17, 2020
7b2e75c
Updates to `dynamics/*`
saforem2 Dec 21, 2020
42a8a1f
Removes trailing new lines
saforem2 Dec 21, 2020
e3ccb51
Updates to `utils/__init__.py`
saforem2 Dec 21, 2020
523a581
Updates to `utils/{training_utils.py,inference_utils.py}`
saforem2 Dec 21, 2020
8976d0e
Adds `batch_size` arg to `inference.py`
saforem2 Dec 21, 2020
789b31a
Updates to `utils/plotting_utils.py`
saforem2 Dec 21, 2020
367684e
Adds `Rich` formatted output to `utils/file_io.py`
saforem2 Dec 21, 2020
53c1cb4
Updates to `utils/data_containers.py`
saforem2 Dec 21, 2020
f8b8df1
Updates to `utils/data_utils.py`
saforem2 Dec 21, 2020
58ba199
Updates to `notebooks/autocorr_stats1.ipynb`
saforem2 Dec 21, 2020
12818ee
Adds check for `dynamics.xeps`
saforem2 Dec 21, 2020
d9f143f
Updates to `utils/inference_utils.py`
saforem2 Dec 21, 2020
3aeb502
Updates to `utils/inference_utils.py`
saforem2 Dec 21, 2020
56da3ad
Updates to `utils/inference_utils.py`
saforem2 Dec 21, 2020
5def622
Updates to `utils/inference_utils.py`
saforem2 Dec 21, 2020
5fd4345
Updates to `utils/{training_utils.py,inference_utils.py}`
saforem2 Dec 21, 2020
3f7de43
Updates to `train.py`
saforem2 Dec 21, 2020
08559d6
Catches uninitialized writer bug
saforem2 Dec 21, 2020
22ae3e9
Catches uninitialized writer bug
saforem2 Dec 21, 2020
779d808
Updates to `notebooks/autocorr_stats1.ipynb`
saforem2 Dec 22, 2020
8de21d5
Updates to `notebooks/`
saforem2 Dec 28, 2020
9a6dbbf
Clean up logging in `GaugeDynamics`
saforem2 Dec 29, 2020
51f8c32
Creates `rich` branch (with `Rich` formatted output)
saforem2 Dec 29, 2020
56ed691
Updates to `utils/file_io.py`, `utils/training_utils.py`
saforem2 Dec 29, 2020
90d2ffb
Updates to `utils/file_io.py`
saforem2 Dec 29, 2020
9f621c4
Updates to `notebooks/autocorr_stats1.ipynb`
saforem2 Dec 29, 2020
0c672a9
Adds `notebooks/autocorr_stats2.ipynb`
saforem2 Dec 29, 2020
8d00d4d
Updates to `utils/plotting_utils.py`
saforem2 Dec 29, 2020
21d74a7
Updates to `SKEYS` in `utils/__init__.py`
saforem2 Dec 29, 2020
4a9a581
Updates to `train.py`
saforem2 Dec 29, 2020
2642591
Updates logging to use `Rich` throughout
saforem2 Dec 29, 2020
f7965f8
Adds `inference_scripts` for distributed HMC on multiple GPUs
saforem2 Dec 30, 2020
571b339
Fixes minor bugs in `utils/{training_utils.py,plotting_utils.py}`
saforem2 Jan 1, 2021
0e1a569
Updates to `notebooks/*`
saforem2 Jan 1, 2021
e190c11
Removes deprecated arg from `bin/hmc_configs.json`
saforem2 Jan 4, 2021
61a5dc6
Fixes bug caused by incorrect metrics in `dynamics.GaugeDynamics`
saforem2 Jan 4, 2021
75f382f
Fixes minor bugs in `utils/{training_utils.py,plotting_utils.py}` (2)
saforem2 Jan 4, 2021
0d5e0e3
Fixes bug caused by incorrect metrics in `dynamics.GaugeDynamics` (3)
saforem2 Jan 4, 2021
3976677
Updates to hmc code
saforem2 Jan 4, 2021
31c2df6
Fixes hmc bugs
saforem2 Jan 4, 2021
ffabbdd
Moves `inference_scripts/` to `bin/`
saforem2 Jan 4, 2021
a1d6672
Updates to `hmc` code
saforem2 Jan 4, 2021
7fecbd5
Removes verbose metrics from HMC
saforem2 Jan 5, 2021
e6b7529
Adds `utils/autocorr.py`
saforem2 Jan 5, 2021
ea495ef
Updates to `utils/autocorr.py`
saforem2 Jan 5, 2021
1760d2c
Updates to `utils/autocorr.py` (1)
saforem2 Jan 5, 2021
7f86576
Updates to `utils/autocorr.py` (2)
saforem2 Jan 6, 2021
66270df
Updates to `utils/autocorr.py` (3)
saforem2 Jan 6, 2021
cf946e4
Updates to `utils/autocorr.py` (4)
saforem2 Jan 6, 2021
5504cc0
Updates to `utils/autocorr.py` (5)
saforem2 Jan 6, 2021
7635122
Updates to `utils/autocorr.py` (6)
saforem2 Jan 6, 2021
e8c1aa0
Updates to `utils/autocorr.py` (7)
saforem2 Jan 6, 2021
943c9eb
Changes to how `autocorr` computed
saforem2 Jan 6, 2021
69cb4cc
Refactoring `utils/file_io.py`
saforem2 Jan 7, 2021
67cd46f
Adds ability to specify optimizer in `BaseDynamics`
saforem2 Jan 7, 2021
f7598dc
Refactoring `utils/inference_utils.py`
saforem2 Jan 7, 2021
b0c3431
Updates to config files
saforem2 Jan 7, 2021
17a4763
Updates to `utils/data_containers.py`
saforem2 Jan 7, 2021
8f8c12c
Updates `utils/autocorr.py`
saforem2 Jan 7, 2021
760f80a
Removes `CBARS` from `utils/inference_utils.py`
saforem2 Jan 7, 2021
d14e914
Removes `CBARS` from `utils/inference_utils.py`
saforem2 Jan 7, 2021
713ce43
Fixes bug in `utils/inference_utils.py`
saforem2 Jan 7, 2021
4b1018c
Updates to `utils/autocorr.py` (8)
saforem2 Jan 8, 2021
85bb60e
Fixes bug in `utils/autocorr.py`
saforem2 Jan 8, 2021
2200dce
Updates to `utils/autocorr.py`
saforem2 Jan 8, 2021
d6fd77d
Enforce path requirement when loading charge data in `utils/autocorr.py`
saforem2 Jan 8, 2021
f732e1f
Adds check for dynamic step size when calculating params for autocorrs
saforem2 Jan 8, 2021
e4949f6
Removes incorrect import from `utils/training_utils.py`
saforem2 Jan 9, 2021
9c6a781
Adds notebook for running HMC via `tf-probability`
saforem2 Jan 11, 2021
ab4da9b
Prepends string passed to `io.rule` with timestamp
saforem2 Jan 11, 2021
8da0c0e
Updates to `utils/{training_utils.py,inference_utils.py,plotting_util…
saforem2 Jan 11, 2021
d18abe6
Adds `unnormalized_log_prob` method to `GaugeLattice`
saforem2 Jan 11, 2021
6fc9ac6
Updates to `doc/*`
saforem2 Jan 13, 2021
cc51120
Updates to `utils/autocorr.py` (9)
saforem2 Jan 13, 2021
41180b4
Changes to how metrics string is generated in `DataContainers`
saforem2 Jan 13, 2021
3cbe796
Changes `lattice_shape` to `x_shape` throughout `lattice/`
saforem2 Jan 13, 2021
1912cd9
Changes `lattice_shape` to `x_shape`
saforem2 Jan 13, 2021
4e6db5a
Adds `doc/autocorrs/autocorrs.tex`
saforem2 Jan 13, 2021
ae0a0fc
Updates to `.gitignore`
saforem2 Jan 14, 2021
68c6b07
Renaming `lattice_shape` to `x_shape`
saforem2 Jan 14, 2021
60f35fc
Renaming `lattice_shape` to `x_shape`
saforem2 Jan 14, 2021
fd2cc5b
Renaming `lattice_shape` to `x_shape`
saforem2 Jan 14, 2021
b60a958
Updates to `GaugeDynamics`
saforem2 Jan 14, 2021
53f3725
Updates to `GaugeDynamics`
saforem2 Jan 14, 2021
f9b8ba0
Updates to `.gitignore`
saforem2 Jan 14, 2021
e50b46b
Type annotations in `utils/file_io.py`
saforem2 Jan 14, 2021
e931399
Updates to `notebooks/autocorr_stats1.ipynb`
saforem2 Jan 14, 2021
3489b84
Fixing CodeFactor issues
saforem2 Jan 14, 2021
e90044b
Fixing CodeFactor issues
saforem2 Jan 14, 2021
9c9ec52
Fixing CodeFactor issues
saforem2 Jan 14, 2021
0c37685
Fixing CodeFactor issues
saforem2 Jan 14, 2021
13 changes: 13 additions & 0 deletions .gitignore
@@ -1,6 +1,19 @@
## --------------------------------
## USER-DEFINED .gitignore below
## --------------------------------
logs/
plots/
data/
doc/tikz/
**broken**
doc/hmc/
doc/comments.tex
doc/extra_plots.tex
tags
./pyrightconfig.json
./l2hmc-qcd/utils/rich_table_movie.py
l2hmc-qcd.code-workspace

.vscode/
bin/restore.sh

34 changes: 7 additions & 27 deletions bin/hmc_configs.json
@@ -3,15 +3,15 @@
"log_dir": null,
"profiler": false,
"md_steps": 100,
"beta_init": 1.0,
"beta_final": 1.0,
"beta_init": 5.0,
"beta_final": 5.0,
"clip_val": 0.0,
"loss_scale": 0.1,
"hmc_steps": 0,
"print_steps": 500,
"logging_steps": 1,
"save_steps": 50000,
"run_steps": 1000,
"print_steps": 100,
"logging_steps": 50,
"save_steps": 1000,
"run_steps": 5000,
"train_steps": 1,
"dynamics_config": {
"verbose": false,
@@ -30,27 +30,7 @@
"directional_updates": false,
"use_scattered_xnet_update": false,
"use_tempered_traj": false,
"use_combined_updates": false,
"gauge_eq_masks": false,
"lattice_shape": [128, 16, 16, 2]
},
"network_config": {
"units": [128, 128, 18],
"activation_fn": "relu",
"dropout_prob": 0.0
},
"lr_config": {
"lr_init": 1.0e-3,
"decay_rate": 0.8,
"decay_steps": 50000,
"warmup_steps": 0
},
"conv_config": {
"filters": [16, 32],
"sizes": [2, 2],
"pool_sizes": [2, 2],
"conv_activations": ["relu", "relu"],
"conv_paddings": ["valid", "valid"],
"use_batch_norm": true
"lattice_shape": [256, 16, 16, 2]
}
}
104 changes: 104 additions & 0 deletions bin/inference_scripts/hmc_qsub.sh
@@ -0,0 +1,104 @@
#!/bin/bash
#COBALT -A datascience

echo -e "\n"
echo "Starting cobalt job script..."

date
# source /lus/theta-fs0/software/thetagpu/conda/tf_master/2020-12-17/mconda3/setup.sh

# eval "$(lus/theta-fs0/software/thetagpu/conda/tf_master/2020-12-17/mconda3/condabin/conda
# shell.bash hook)"
# source ~/tf_hvd_env.sh
# source ~/conda_zsh_setup.sh

export OMPI_MCA_opal_cuda_support=true
export NCCL_DEBUG=INFO
export KMP_SETTINGS=TRUE
export KMP_AFFINITY='granularity=fine,verbose,compact,1,0'
export AUTOGRAPH_VERBOSITY=10
export TF_XLA_FLAGS="--tf_xla_auto_jit=2 --tf_xla_enable_xla_devices"
export TF_ENABLE_AUTO_MIXED_PRECISION=1
export PATH=$PATH:$HOME/.local/bin
export PYTHONPATH=/lus/theta-fs0/software/thetagpu/conda/tf_master/2020-12-23/mconda3/lib/python3.8/site-packages:$PYTHONPATH
echo python3: $(which python3)

RANK=0
export CUDA_VISIBLE_DEVICES="$RANK"
echo RANK: $RANK, CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
mpirun -np 1 -H localhost:1 \
--allow-run-as-root -bind-to none -map-by slot \
-x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH -x PATH -x NCCL_SOCKET_IFNAME=^docker0,lo \
python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
--run_loop --run_steps 125000 > hmc_rank$RANK.log &

RANK=1
export CUDA_VISIBLE_DEVICES="$RANK"
echo RANK: $RANK, CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
mpirun -np 1 -H localhost:1 \
--allow-run-as-root -bind-to none -map-by slot \
-x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH -x PATH -x NCCL_SOCKET_IFNAME=^docker0,lo \
python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
--run_loop --run_steps 125000 > hmc_rank$RANK.log &

RANK=2
export CUDA_VISIBLE_DEVICES="$RANK"
echo RANK: $RANK, CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
mpirun -np 1 -H localhost:1 \
--allow-run-as-root -bind-to none -map-by slot \
-x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH -x PATH -x NCCL_SOCKET_IFNAME=^docker0,lo \
python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
--run_loop --run_steps 125000 > hmc_rank$RANK.log &

RANK=3
export CUDA_VISIBLE_DEVICES="$RANK"
echo RANK: $RANK, CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
mpirun -np 1 -H localhost:1 \
--allow-run-as-root -bind-to none -map-by slot \
-x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH -x PATH -x NCCL_SOCKET_IFNAME=^docker0,lo \
python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
--run_loop --run_steps 125000 > hmc_rank$RANK.log &

RANK=4
export CUDA_VISIBLE_DEVICES="$RANK"
echo RANK: $RANK, CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
mpirun -np 1 -H localhost:1 \
--allow-run-as-root -bind-to none -map-by slot \
-x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH -x PATH -x NCCL_SOCKET_IFNAME=^docker0,lo \
python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
--run_loop --run_steps 125000 > hmc_rank$RANK.log &

RANK=5
export CUDA_VISIBLE_DEVICES="$RANK"
echo RANK: $RANK, CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
mpirun -np 1 -H localhost:1 \
--allow-run-as-root -bind-to none -map-by slot \
-x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH -x PATH -x NCCL_SOCKET_IFNAME=^docker0,lo \
python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
--run_loop --run_steps 125000 > hmc_rank$RANK.log &

RANK=6
export CUDA_VISIBLE_DEVICES="$RANK"
echo RANK: $RANK, CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
mpirun -np 1 -H localhost:1 \
--allow-run-as-root -bind-to none -map-by slot \
-x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH -x PATH -x NCCL_SOCKET_IFNAME=^docker0,lo \
python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
--run_loop --run_steps 125000 > hmc_rank$RANK.log &

RANK=7
export CUDA_VISIBLE_DEVICES="$RANK"
echo RANK: $RANK, CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
mpirun -np 1 -H localhost:1 \
--allow-run-as-root -bind-to none -map-by slot \
-x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH -x PATH -x NCCL_SOCKET_IFNAME=^docker0,lo \
python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
--run_loop --run_steps 125000 > hmc_rank$RANK.log &
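
The eight per-rank blocks in `hmc_qsub.sh` differ only in the value of `RANK`, so they could plausibly be collapsed into a loop. The following is a minimal sketch under stated assumptions, not the script merged in this PR: `NUM_GPUS`, `RUN_STEPS`, `DRY_RUN`, and the `run` helper are hypothetical names introduced here for illustration, and the trailing `wait` is an addition that keeps the script from exiting while runs are still in the background. The `mpirun` arguments are copied from the original blocks. `DRY_RUN` defaults to `1`, so by default the sketch only writes the command it would run into each log file.

```shell
#!/bin/bash
# Hypothetical loop-based sketch of the per-rank launches in hmc_qsub.sh.
# NUM_GPUS, RUN_STEPS, DRY_RUN and run() are illustrative, not from the PR.
NUM_GPUS="${NUM_GPUS:-8}"
RUN_STEPS="${RUN_STEPS:-125000}"
DRY_RUN="${DRY_RUN:-1}"

# Run the given command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rank=0
while [ "$rank" -lt "$NUM_GPUS" ]; do
    echo "RANK: $rank, CUDA_VISIBLE_DEVICES: $rank"
    (
        # Pin this run to a single GPU, as each block in the original does.
        export CUDA_VISIBLE_DEVICES="$rank"
        run mpirun -np 1 -H localhost:1 \
            --allow-run-as-root -bind-to none -map-by slot \
            -x CUDA_VISIBLE_DEVICES -x TF_XLA_FLAGS -x NCCL_DEBUG=INFO \
            -x LD_LIBRARY_PATH -x PATH -x "NCCL_SOCKET_IFNAME=^docker0,lo" \
            python3 ../hmc.py --json_file=../../bin/hmc_configs.json \
            --run_loop --run_steps "$RUN_STEPS"
    ) > "hmc_rank${rank}.log" &
    rank=$((rank + 1))
done

# Block until every background run has finished.
wait
```

Setting `DRY_RUN=0` would launch the runs for real. Each launch happens in a subshell so that the exported `CUDA_VISIBLE_DEVICES` value stays local to that run.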