diff --git a/apps/kg/README.md b/apps/kg/README.md
index a406fa34d9f0..f5b57604428e 100644
--- a/apps/kg/README.md
+++ b/apps/kg/README.md
@@ -57,62 +57,121 @@ DGL-KE provides five built-in knowledge graphs:
 Users can specify one of the datasets with `--dataset` in `train.py` and `eval.py`.
 
 ## Performance
+The 1 GPU speed is measured with 8 CPU cores and one Nvidia V100 GPU (AWS p3.2xlarge).
+The 8 GPU speed is measured with 64 CPU cores and eight Nvidia V100 GPUs (AWS p3.16xlarge).
 
-The speed is measured with 16 CPU cores and one Nvidia V100 GPU.
+The speed on FB15k (1 GPU)
 
-The speed on FB15k
+|  Models | TransE_l1 | TransE_l2 | DistMult | ComplEx | RESCAL | TransR | RotatE |
+|---------|-----------|-----------|----------|---------|--------|--------|--------|
+|MAX_STEPS| 48000     | 32000     | 40000    | 100000  | 32000  | 32000  | 20000  |
+|TIME     | 370s      | 270s      | 312s     | 282s    | 2095s  | 1556s  | 1861s  |
+
+The accuracy on FB15k
+
+|  Models   |  MR   |  MRR  | HITS@1 | HITS@3 | HITS@10 |
+|-----------|-------|-------|--------|--------|---------|
+| TransE_l1 | 44.18 | 0.675 | 0.551  | 0.774  | 0.861   |
+| TransE_l2 | 46.71 | 0.665 | 0.551  | 0.804  | 0.846   |
+| DistMult  | 61.04 | 0.725 | 0.625  | 0.837  | 0.883   |
+| ComplEx   | 64.59 | 0.785 | 0.718  | 0.835  | 0.889   |
+| RESCAL    | 122.3 | 0.669 | 0.598  | 0.711  | 0.793   |
+| TransR    | 59.86 | 0.676 | 0.591  | 0.735  | 0.814   |
+| RotatE    | 43.66 | 0.728 | 0.632  | 0.801  | 0.874   |
+
+
+The speed on FB15k (8 GPU)
 
 |  Models | TransE_l1 | TransE_l2 | DistMult | ComplEx | RESCAL | TransR | RotatE |
 |---------|-----------|-----------|----------|---------|--------|--------|--------|
-|MAX_STEPS| 20000     | 30000     |100000    | 100000  | 30000  | 100000 | 100000 |
-|TIME     | 411s      | 329s      |690s      | 806s    | 1800s  | 7627s  | 4327s  |
+|MAX_STEPS| 6000      | 4000      | 5000     | 4000    | 4000   | 4000   | 2500   |
+|TIME     | 88.93s    | 62.99s    | 72.74s   | 68.37s  | 245.9s | 203.9s | 126.7s |
 
 The accuracy on FB15k
 
 |  Models   |  MR   |  MRR  | HITS@1 | HITS@3 | HITS@10 |
 |-----------|-------|-------|--------|--------|---------|
-| TransE_l1 | 69.12 | 0.656 | 0.567  | 0.718  | 0.802   |
-| TransE_l2 | 35.86 | 0.570 | 0.400  | 0.708  | 0.834   |
-| DistMult  | 43.35 | 0.783 | 0.713  | 0.837  | 0.897   |
-| ComplEx   | 51.99 | 0.785 | 0.720  | 0.832  | 0.889   |
-| RESCAL    | 130.89| 0.668 | 0.597  | 0.720  | 0.800   |
-| TransR    | 138.7 | 0.501 | 0.274  | 0.704  | 0.801   |
-| RotatE    | 39.6  | 0.725 | 0.628  | 0.802  | 0.875   |
+| TransE_l1 | 44.25 | 0.672 | 0.547  | 0.774  | 0.860   |
+| TransE_l2 | 46.13 | 0.658 | 0.539  | 0.748  | 0.845   |
+| DistMult  | 61.72 | 0.723 | 0.626  | 0.798  | 0.881   |
+| ComplEx   | 65.84 | 0.754 | 0.676  | 0.813  | 0.880   |
+| RESCAL    | 135.6 | 0.652 | 0.580  | 0.693  | 0.779   |
+| TransR    | 65.27 | 0.676 | 0.591  | 0.736  | 0.811   |
+| RotatE    | 49.59 | 0.683 | 0.581  | 0.759  | 0.848   |
 
-In comparison, GraphVite uses 4 GPUs and takes 14 minutes. Thus, DGL-KE trains TransE on FB15k twice as fast as GraphVite while using much few resources. More performance information on GraphVite can be found [here](https://github.com/DeepGraphLearning/graphvite).
+In comparison, GraphVite uses 4 GPUs and takes 14 minutes. Thus, with 8 GPUs, DGL-KE trains TransE on FB15k about 9.5x as fast as GraphVite. More performance information on GraphVite can be found [here](https://github.com/DeepGraphLearning/graphvite).
 
-The speed on wn18
+The speed on wn18 (1 GPU)
 
 |  Models | TransE_l1 | TransE_l2 | DistMult | ComplEx | RESCAL | TransR | RotatE |
 |---------|-----------|-----------|----------|---------|--------|--------|--------|
-|MAX_STEPS| 40000     | 20000     | 10000    | 20000   | 20000  | 20000  | 20000  |
-|TIME     | 719s      | 254s      | 126s     | 266s    | 333s   | 1547s  | 786s   |
+|MAX_STEPS| 32000     | 32000     | 20000    | 20000   | 20000  | 30000  | 24000  |
+|TIME     | 531.5s    | 406.6s    | 284.1s   | 282.3s  | 443.6s | 766.2s | 829.4s |
 
 The accuracy on wn18
 
+|  Models   |  MR   |  MRR  | HITS@1 | HITS@3 | HITS@10 |
+|-----------|-------|-------|--------|--------|---------|
+| TransE_l1 | 318.4 | 0.764 | 0.602  | 0.929  | 0.949   |
+| TransE_l2 | 206.2 | 0.561 | 0.306  | 0.800  | 0.944   |
+| DistMult  | 486.0 | 0.818 | 0.711  | 0.921  | 0.948   |
+| ComplEx   | 268.6 | 0.933 | 0.916  | 0.949  | 0.961   |
+| RESCAL    | 536.6 | 0.848 | 0.790  | 0.900  | 0.927   |
+| TransR    | 452.4 | 0.620 | 0.461  | 0.758  | 0.856   |
+| RotatE    | 487.9 | 0.944 | 0.940  | 0.947  | 0.952   |
+
+The speed on wn18 (8 GPU)
+
+|  Models | TransE_l1 | TransE_l2 | DistMult | ComplEx | RESCAL | TransR | RotatE |
+|---------|-----------|-----------|----------|---------|--------|--------|--------|
+|MAX_STEPS| 4000      | 4000      | 2500     | 2500    | 2500   | 2500   | 3000   |
+|TIME     | 119.3s    | 81.1s     | 76.0s    | 58.0s   | 594.1s | 1168s  | 139.8s |
+
+The accuracy on wn18
+
+|  Models   |  MR   |  MRR  | HITS@1 | HITS@3 | HITS@10 |
+|-----------|-------|-------|--------|--------|---------|
+| TransE_l1 | 360.3 | 0.745 | 0.562  | 0.930  | 0.951   |
+| TransE_l2 | 193.8 | 0.557 | 0.301  | 0.799  | 0.942   |
+| DistMult  | 499.9 | 0.807 | 0.692  | 0.917  | 0.945   |
+| ComplEx   | 476.7 | 0.935 | 0.926  | 0.943  | 0.949   |
+| RESCAL    | 618.8 | 0.848 | 0.791  | 0.897  | 0.927   |
+| TransR    | 513.1 | 0.659 | 0.491  | 0.821  | 0.871   |
+| RotatE    | 466.2 | 0.944 | 0.940  | 0.945  | 0.951   |
+
+
+The speed on Freebase (8 GPU)
+
+|  Models | TransE_l2 | DistMult | ComplEx | TransR | RotatE |
+|---------|-----------|----------|---------|--------|--------|
+|MAX_STEPS| 320000    | 300000   | 360000  | 300000 | 300000 |
+|TIME     | 7908s     | 7425s    | 8946s   | 16816s | 12817s |
+
+The accuracy on Freebase (evaluated with 1000 negative edges sampled for each positive edge).
+
 |  Models   |  MR    |  MRR  | HITS@1 | HITS@3 | HITS@10 |
 |-----------|--------|-------|--------|--------|---------|
-| TransE_l1 | 321.35 | 0.760 | 0.652  | 0.850  | 0.940   |
-| TransE_l2 | 181.57 | 0.570 | 0.322  | 0.802  | 0.944   |
-| DistMult  | 271.09 | 0.769 | 0.639  | 0.892  | 0.949   |
-| ComplEx   | 276.37 | 0.935 | 0.916  | 0.950  | 0.960   |
-| RESCAL    | 579.54 | 0.846 | 0.791  | 0.898  | 0.931   |
-| TransR    | 615.56 | 0.606 | 0.378  | 0.826  | 0.890   |
-| RotatE    | 367.64 | 0.931 | 0.924  | 0.935  | 0.944   | 
+| TransE_l2 | 22.4   | 0.756 | 0.688  | 0.800  | 0.882   |
+| DistMult  | 45.4   | 0.833 | 0.812  | 0.843  | 0.872   |
+| ComplEx   | 48.0   | 0.830 | 0.812  | 0.838  | 0.864   |
+| TransR    | 51.2   | 0.697 | 0.656  | 0.716  | 0.771   |
+| RotatE    | 93.3   | 0.770 | 0.749  | 0.780  | 0.805   |
 
-The speed on Freebase
+The speed on Freebase (48 CPU).
+This is measured with 48 CPU cores on an AWS r5dn.24xlarge instance.
 
-|  Models | DistMult | ComplEx |
-|---------|----------|---------|
-|MAX_STEPS| 3200000  | 3200000 |
-|TIME     | 2.44h    | 2.94h   |
+|  Models | TransE_l2 | DistMult | ComplEx |
+|---------|-----------|----------|---------|
+|MAX_STEPS| 50000     | 50000    | 50000   |
+|TIME     | 7002s     | 6340s    | 8133s   |
 
-The accuracy on Freebase (it is tested when 100,000 negative edges are sampled for each positive edge).
+The accuracy on Freebase (evaluated with 1000 negative edges sampled for each positive edge).
 
-|  Models  |  MR    |  MRR  | HITS@1 | HITS@3 | HITS@10 |
-|----------|--------|-------|--------|--------|---------|
-| DistMul  | 6159.1 | 0.716 | 0.690  | 0.729  | 0.760   |
-| ComplEx  | 6888.8 | 0.716 | 0.697  | 0.728  | 0.760   |
+|  Models   |  MR    |  MRR  | HITS@1 | HITS@3 | HITS@10 |
+|-----------|--------|-------|--------|--------|---------|
+| TransE_l2 | 30.8   | 0.814 | 0.764  | 0.848  | 0.902   |
+| DistMult  | 45.1   | 0.834 | 0.815  | 0.843  | 0.871   |
+| ComplEx   | 44.9   | 0.837 | 0.819  | 0.845  | 0.870   |
 
 The configuration for reproducing the performance results can be found [here](https://github.com/dmlc/dgl/blob/master/apps/kg/config/best_config.sh).
 
@@ -162,34 +221,36 @@ Here are some examples of using the training script.
 Train KGE models with GPU.
 
 ```bash
-python3 train.py --model DistMult --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 2000 --gamma 500.0 --lr 0.1 --max_step 100000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv
+python3 train.py --model DistMult --dataset FB15k --batch_size 1024 --neg_sample_size 256 \
+    --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 16 --valid --test -adv \
+    --gpu 0 --max_step 40000
 ```
 
-Train KGE models with mixed CPUs and GPUs.
+Train KGE models with mixed CPUs and multiple GPUs.
 
 ```bash
-python3 train.py --model DistMult --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 2000 --gamma 500.0 --lr 0.1 --max_step 100000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv --mix_cpu_gpu
+python3 train.py --model DistMult --dataset FB15k --batch_size 1024 --neg_sample_size 256 \
+    --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 16 --valid --test -adv \
+    --max_step 5000 --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
+    --soft_rel_part --force_sync_interval 1000
 ```
 
 Train embeddings and verify it later.
 
 ```bash
-python3 train.py --model DistMult --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 2000 --gamma 500.0 --lr 0.1 --max_step 100000 \
-    --batch_size_eval 16 --gpu 0 --valid -adv --save_emb DistMult_FB15k_emb
+python3 train.py --model DistMult --dataset FB15k --batch_size 1024 --neg_sample_size 256 \
+    --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 16 --valid --test -adv \
+    --gpu 0 --max_step 40000 --save_emb DistMult_FB15k_emb
 
-python3 eval.py --model_name DistMult --dataset FB15k --hidden_dim 2000 \
-    --gamma 500.0 --batch_size 16 --gpu 0 --model_path DistMult_FB15k_emb/
+python3 eval.py --model_name DistMult --dataset FB15k --hidden_dim 400 \
+    --gamma 143.0 --batch_size 16 --gpu 0 --model_path DistMult_FB15k_emb/
 
 ```
 
 Train embeddings with multi-processing. This currently doesn't work in MXNet.
 ```bash
-python3 train.py --model DistMult --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 2000 --gamma 500.0 --lr 0.07 --max_step 3000 \
-    --batch_size_eval 16 --regularization_coef 0.000001 --valid --test -adv --num_proc 8
+python3 train.py --model TransE_l2 --dataset Freebase --batch_size 1000 \
+    --neg_sample_size 200 --hidden_dim 400 --gamma 10 --lr 0.1 --max_step 50000 \
+    --log_interval 100 --batch_size_eval 1000 --neg_sample_size_eval 1000 --test \
+    -adv --regularization_coef 1e-9 --num_thread 1 --num_proc 48
 ```
diff --git a/apps/kg/config/best_config.sh b/apps/kg/config/best_config.sh
index 31e917067b89..f5279c8115da 100644
--- a/apps/kg/config/best_config.sh
+++ b/apps/kg/config/best_config.sh
@@ -4,119 +4,226 @@
 # DistMult 1GPU
 DGLBACKEND=pytorch python3 train.py --model DistMult --dataset FB15k --batch_size 1024 \
     --neg_sample_size 256 --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 16 \
-    --valid --test -adv --mix_cpu_gpu --eval_interval 100000 --gpu 0 --num_thread 4 --max_step 40000
+    --valid --test -adv --gpu 0 --max_step 40000
+
 # DistMult 8GPU
 DGLBACKEND=pytorch python3 train.py --model DistMult --dataset FB15k --batch_size 1024 \
     --neg_sample_size 256 --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 16 \
-    --valid --test -adv --mix_cpu_gpu --eval_interval 100000 --num_proc 8 --gpu 0 1 2 3 4 5 6 7 \
-    --max_step 10000 --num_thread 4 --rel_part --async_update
+    --valid --test -adv --max_step 5000 --mix_cpu_gpu --num_proc 8 \
+    --gpu 0 1 2 3 4 5 6 7 --async_update --soft_rel_part --force_sync_interval 1000
 
 # ComplEx 1GPU
 DGLBACKEND=pytorch python3 train.py --model ComplEx --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 400 --gamma 143.0 --lr 0.1 --regularization_coef 2.00E-06 \
-    --batch_size_eval 16 --valid --test -adv --mix_cpu_gpu --eval_interval 100000 \
-    --gpu 0 --num_thread 4 --max_step 32000
+    --neg_sample_size 1024 --hidden_dim 400 --gamma 143.0 --lr 0.1 \
+    --regularization_coef 2.00E-06 --batch_size_eval 16 --valid --test -adv --gpu 0 \
+    --max_step 32000
+
 # ComplEx 8GPU
 DGLBACKEND=pytorch python3 train.py --model ComplEx --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 400 --gamma 143.0 --lr 0.1 --regularization_coef 2.00E-06 \
-    --batch_size_eval 16 --valid --test -adv --mix_cpu_gpu --eval_interval 100000 --num_proc 8 \
-    --gpu 0 1 2 3 4 5 6 7 --max_step 4000 --num_thread 4 --rel_part --async_update
+    --neg_sample_size 1024 --hidden_dim 400 --gamma 143.0 --lr 0.1 \
+    --regularization_coef 2.00E-06 --batch_size_eval 16 --valid --test -adv \
+    --max_step 4000 --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
+    --soft_rel_part --force_sync_interval 1000
 
 # TransE_l1 1GPU
 DGLBACKEND=pytorch python3 train.py --model TransE_l1 --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 64 --regularization_coef 1e-07 --hidden_dim 400 --gamma 16.0 --lr 0.01 \
-    --batch_size_eval 16 --valid --test -adv --mix_cpu_gpu --eval_interval 100000 \
-    --gpu 0 --num_thread 4 --max_step 48000
+    --neg_sample_size 64 --regularization_coef 1e-07 --hidden_dim 400 --gamma 16.0 \
+    --lr 0.01 --batch_size_eval 16 --valid --test -adv --gpu 0 --max_step 48000
+
 # TransE_l1 8GPU
 DGLBACKEND=pytorch python3 train.py --model TransE_l1 --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 64 --regularization_coef 1e-07 --hidden_dim 400 --gamma 16.0 --lr 0.01 \
-    --batch_size_eval 16 --valid --test -adv --mix_cpu_gpu --eval_interval 100000 --num_proc 8 \
-    --gpu 0 1 2 3 4 5 6 7 --max_step 6000 --num_thread 4 --rel_part --async_update
+    --neg_sample_size 64 --regularization_coef 1e-07 --hidden_dim 400 --gamma 16.0 \
+    --lr 0.01 --batch_size_eval 16 --valid --test -adv --max_step 6000 --mix_cpu_gpu \
+    --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update --soft_rel_part \
+    --force_sync_interval 1000
 
 # TransE_l2 1GPU
 DGLBACKEND=pytorch python3 train.py --model TransE_l2 --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 2000 --gamma 12.0 --lr 0.1 --max_step 30000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv --num_thread 4 --regularization_coef=2e-7 
+    --neg_sample_size 256 --regularization_coef=1e-9 --hidden_dim 400 --gamma 19.9 \
+    --lr 0.25 --batch_size_eval 16 --valid --test -adv --gpu 0 --max_step 32000
+
+# TransE_l2 8GPU
+DGLBACKEND=pytorch python3 train.py --model TransE_l2 --dataset FB15k --batch_size 1024 \
+    --neg_sample_size 256 --regularization_coef=1e-9 --hidden_dim 400 --gamma 19.9 \
+    --lr 0.25 --batch_size_eval 16 --valid --test -adv --max_step 4000 \
+    --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update --soft_rel_part \
+    --force_sync_interval 1000
 
 # RESCAL 1GPU
 DGLBACKEND=pytorch python3 train.py --model RESCAL --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 500 --gamma 24.0 --lr 0.03 --max_step 30000 \
-    --batch_size_eval 16 --gpu 0 --num_thread 4 --valid --test -adv
+    --neg_sample_size 256 --hidden_dim 500 --gamma 24.0 --lr 0.03 --batch_size_eval 16 \
+    --gpu 0 --valid --test -adv --max_step 30000
+
+# RESCAL 8GPU
+DGLBACKEND=pytorch python3 train.py --model RESCAL --dataset FB15k --batch_size 1024 \
+    --neg_sample_size 256 --hidden_dim 500 --gamma 24.0 --lr 0.03 --batch_size_eval 16 \
+    --valid --test -adv --max_step 4000 --mix_cpu_gpu --num_proc 8 \
+    --gpu 0 1 2 3 4 5 6 7 --async_update --soft_rel_part --force_sync_interval 1000
 
 # TransR 1GPU
 DGLBACKEND=pytorch python3 train.py --model TransR --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --regularization_coef 5e-8 --hidden_dim 200 --gamma 8.0 --lr 0.015 \
-    --batch_size_eval 16 --valid --test -adv --mix_cpu_gpu --eval_interval 100000 \
-    --gpu 0 --num_thread 4 --max_step 32000
+    --neg_sample_size 256 --regularization_coef 5e-8 --hidden_dim 200 --gamma 8.0 \
+    --lr 0.015 --batch_size_eval 16 --valid --test -adv --gpu 0 --max_step 32000
+
 # TransR 8GPU
 DGLBACKEND=pytorch python3 train.py --model TransR --dataset FB15k --batch_size 1024 \
-    --neg_sample_size 256 --regularization_coef 5e-8 --hidden_dim 200 --gamma 8.0 --lr 0.015 \
-    --batch_size_eval 16 --valid --test -adv --mix_cpu_gpu --eval_interval 100000 --num_proc 8 \
-    --gpu 0 1 2 3 4 5 6 7 --max_step 4000 --num_thread 4 --rel_part --async_update
+    --neg_sample_size 256 --regularization_coef 5e-8 --hidden_dim 200 --gamma 8.0 \
+    --lr 0.015 --batch_size_eval 16 --valid --test -adv --max_step 4000 --mix_cpu_gpu \
+    --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update --soft_rel_part \
+    --force_sync_interval 1000
 
 # RotatE 1GPU
 DGLBACKEND=pytorch python3 train.py --model RotatE --dataset FB15k --batch_size 2048 \
-    --neg_sample_size 256 --regularization_coef 1e-07 --hidden_dim 200 --gamma 12.0 --lr 0.009 \
-    --batch_size_eval 16 --valid --test -adv --mix_cpu_gpu --eval_interval 100000 -de \
-    --mix_cpu_gpu --num_thread 4 --max_step 40000 --gpu 0
+    --neg_sample_size 256 --regularization_coef 1e-07 --hidden_dim 200 --gamma 12.0 \
+    --lr 0.009 --batch_size_eval 16 --valid --test -adv -de --max_step 20000 \
+    --neg_deg_sample --gpu 0
 
 # RotatE 8GPU
-DGLBACKEND=pytorch python3 train.py --model RotatE --dataset FB15k --batch_size 2048 \
-    --neg_sample_size 256 --regularization_coef 1e-07 --hidden_dim 200 --gamma 12.0 --lr 0.009 \
-    --batch_size_eval 16 --valid --test -adv --mix_cpu_gpu --eval_interval 100000 -de \
-    --mix_cpu_gpu --max_step 5000 --num_proc 8 --gpu 0 1 2 3 4 5 6 7 \
-    --num_thread 4 --rel_part --async_update
+DGLBACKEND=pytorch python3 train.py --model RotatE --dataset FB15k --batch_size 1024 \
+    --neg_sample_size 256 --regularization_coef 1e-07 --hidden_dim 200 --gamma 12.0 \
+    --lr 0.009 --batch_size_eval 16 --valid --test -adv -de --max_step 2500 \
+    --neg_deg_sample --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
+    --soft_rel_part --force_sync_interval 1000
 
 # for wn18
-DGLBACKEND=pytorch python3 train.py --model TransE_l1 --dataset wn18 --batch_size 1024 \
-    --neg_sample_size 512 --hidden_dim 500 --gamma 12.0 --adversarial_temperature 0.5 \
-    --lr 0.01 --max_step 40000 --batch_size_eval 16 --gpu 0 --valid --test -adv \
-    --regularization_coef 0.00001
+# DistMult 1GPU
+DGLBACKEND=pytorch python3 train.py --model DistMult --dataset wn18 --batch_size 2048 \
+    --neg_sample_size 128 --regularization_coef 1e-06 --hidden_dim 512 --gamma 20.0 \
+    --lr 0.14 --batch_size_eval 16 --valid --test -adv --gpu 0 --max_step 20000
+
+# DistMult 8GPU
+DGLBACKEND=pytorch python3 train.py --model DistMult --dataset wn18 --batch_size 2048 \
+    --neg_sample_size 128 --regularization_coef 1e-06 --hidden_dim 512 --gamma 20.0 \
+    --lr 0.14 --batch_size_eval 16 --valid --test -adv --max_step 2500 \
+    --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
+    --force_sync_interval 1000
 
+# ComplEx 1GPU
+DGLBACKEND=pytorch python3 train.py --model ComplEx --dataset wn18 --batch_size 1024 \
+    --neg_sample_size 1024 --regularization_coef 0.00001 --hidden_dim 512 --gamma 200.0 \
+    --lr 0.1 --batch_size_eval 16 --valid --test -adv --gpu 0 --max_step 20000
+
+# ComplEx 8GPU
+DGLBACKEND=pytorch python3 train.py --model ComplEx --dataset wn18 --batch_size 1024 \
+    --neg_sample_size 1024 --regularization_coef 0.00001 --hidden_dim 512 --gamma 200.0 \
+    --lr 0.1 --batch_size_eval 16 --valid --test -adv --max_step 2500 \
+    --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
+    --force_sync_interval 1000
+
+# TransE_l1 1GPU
+DGLBACKEND=pytorch python3 train.py --model TransE_l1 --dataset wn18 --batch_size 2048 \
+    --neg_sample_size 128 --regularization_coef 2e-07 --hidden_dim 512 --gamma 12.0 \
+    --lr 0.007 --batch_size_eval 16 --valid --test -adv --gpu 0 --max_step 32000
+
+# TransE_l1 8GPU
+DGLBACKEND=pytorch python3 train.py --model TransE_l1 --dataset wn18 --batch_size 2048 \
+    --neg_sample_size 128 --regularization_coef 2e-07 --hidden_dim 512 --gamma 12.0 \
+    --lr 0.007 --batch_size_eval 16 --valid --test -adv --max_step 4000 \
+    --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
+    --force_sync_interval 1000
+
+# TransE_l2 1GPU
 DGLBACKEND=pytorch python3 train.py --model TransE_l2 --dataset wn18 --batch_size 1024 \
-    --neg_sample_size 512 --hidden_dim 500 --gamma 6.0 --lr 0.1 --max_step 20000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv --regularization_coef 0.0000001
+    --neg_sample_size 256 --regularization_coef 0.0000001 --hidden_dim 512 --gamma 6.0 \
+    --lr 0.1 --batch_size_eval 16 --valid --test -adv --gpu 0 --max_step 32000
 
-DGLBACKEND=pytorch python3 train.py --model DistMult --dataset wn18 --batch_size 1024 \
-    --neg_sample_size 1024 --hidden_dim 1000 --gamma 200.0 --lr 0.1 --max_step 10000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv --regularization_coef 0.00001
+# TransE_l2 8GPU
+DGLBACKEND=pytorch python3 train.py --model TransE_l2 --dataset wn18 --batch_size 1024 \
+    --neg_sample_size 256 --regularization_coef 0.0000001 --hidden_dim 512 --gamma 6.0 \
+    --lr 0.1 --batch_size_eval 16 --valid --test -adv --max_step 4000 \
+    --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
+    --force_sync_interval 1000
 
-DGLBACKEND=pytorch python3 train.py --model ComplEx --dataset wn18 --batch_size 1024 \
-    --neg_sample_size 1024 --hidden_dim 500 --gamma 200.0 --lr 0.1 --max_step 20000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv --regularization_coef 0.00001
+# RESCAL 1GPU
+DGLBACKEND=pytorch python3 train.py --model RESCAL --dataset wn18 --batch_size 1024 \
+    --neg_sample_size 256 --hidden_dim 250 --gamma 24.0 --lr 0.03 --batch_size_eval 16 \
+    --valid --test -adv --gpu 0 --max_step 20000
 
+# RESCAL 8GPU
 DGLBACKEND=pytorch python3 train.py --model RESCAL --dataset wn18 --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 250 --gamma 24.0 --lr 0.03 --max_step 20000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv
+    --neg_sample_size 256 --hidden_dim 250 --gamma 24.0 --lr 0.03 --batch_size_eval 16 \
+    --valid --test -adv --max_step 2500 --mix_cpu_gpu --num_proc 8 \
+    --gpu 0 1 2 3 4 5 6 7 --async_update --force_sync_interval 1000 --soft_rel_part
 
+# TransR 1GPU
 DGLBACKEND=pytorch python3 train.py --model TransR --dataset wn18 --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 500 --gamma 16.0 --lr 0.1 --max_step 30000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv
+    --neg_sample_size 256 --hidden_dim 250 --gamma 16.0 --lr 0.1 --batch_size_eval 16 \
+    --valid --test -adv --gpu 0 --max_step 30000
 
-DGLBACKEND=pytorch python3 train.py --model RotatE --dataset wn18 --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 400 --gamma 12.0 --lr 0.02 --max_step 20000 \
-    --batch_size_eval 16 --gpu 0 --valid --test -adv -de
+# TransR 8GPU
+DGLBACKEND=pytorch python3 train.py --model TransR --dataset wn18 --batch_size 1024 \
+    --neg_sample_size 256 --hidden_dim 250 --gamma 16.0 --lr 0.1 --batch_size_eval 16 \
+    --valid --test -adv --max_step 2500 --mix_cpu_gpu --num_proc 8 \
+    --gpu 0 1 2 3 4 5 6 7 --async_update --force_sync_interval 1000 --soft_rel_part
 
-# for Freebase
+# RotatE 1GPU
+DGLBACKEND=pytorch python3 train.py --model RotatE --dataset wn18 --batch_size 2048 \
+    --neg_sample_size 64 --regularization_coef 2e-07 --hidden_dim 256 --gamma 9.0 \
+    --lr 0.0025 -de --batch_size_eval 16 --neg_deg_sample --valid --test -adv --gpu 0 \
+    --max_step 24000
 
+# RotatE 8GPU
+DGLBACKEND=pytorch python3 train.py --model RotatE --dataset wn18 --batch_size 2048 \
+    --neg_sample_size 64 --regularization_coef 2e-07 --hidden_dim 256 --gamma 9.0 \
+    --lr 0.0025 -de --batch_size_eval 16 --neg_deg_sample --valid --test -adv \
+    --max_step 3000 --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
+    --force_sync_interval 1000
+
+# for Freebase multi-process-cpu
+# TransE_l2
+DGLBACKEND=pytorch python3 train.py --model TransE_l2 --dataset Freebase --batch_size 1000 \
+    --neg_sample_size 200 --hidden_dim 400 --gamma 10 --lr 0.1 --max_step 50000 \
+    --log_interval 100 --batch_size_eval 1000 --neg_sample_size_eval 1000 --test -adv \
+    --regularization_coef 1e-9 --num_thread 1 --num_proc 48
+
+# DistMult
+DGLBACKEND=pytorch python3 train.py --model DistMult --dataset Freebase --batch_size 1024 \
+    --neg_sample_size 256 --hidden_dim 400 --gamma 143.0 --lr 0.08 --max_step 50000 \
+    --log_interval 100 --batch_size_eval 1000 --neg_sample_size_eval 1000 --test -adv \
+    --num_thread 1 --num_proc 48
+
+# ComplEx
 DGLBACKEND=pytorch python3 train.py --model ComplEx --dataset Freebase --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 400 --gamma 500.0 --lr 0.1 --max_step 50000 \
-    --batch_size_eval 128 --test -adv --eval_interval 300000 --num_thread 1 \
-    --neg_sample_size_eval 100000 --eval_percent 0.02 --num_proc 48
+    --neg_sample_size 256 --hidden_dim 400 --gamma 143.0 --lr 0.1 --max_step 50000 \
+    --log_interval 100 --batch_size_eval 1000 --neg_sample_size_eval 1000 --test -adv \
+    --num_thread 1 --num_proc 48
 
 # Freebase multi-gpu
-# TransE_l2 8 GPU
-DGLBACKEND=pytorch python3 train.py --model TransE_l2 --dataset Freebase --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 400 --gamma 10 --lr 0.1 --batch_size_eval 1000 \
-    --valid --test -adv --mix_cpu_gpu --neg_deg_sample_eval --neg_sample_size_eval 1000 \
-    --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --num_thread 4 --regularization_coef 1e-9 \
-    --no_eval_filter --max_step 400000 --rel_part --eval_interval 100000 --log_interval 10000 \
-    --no_eval_filter --async_update --neg_deg_sample --force_sync_interval 1000
-
-# TransE_l2 16 GPU
-DGLBACKEND=pytorch python3 train.py --model TransE_l2 --dataset Freebase --batch_size 1024 \
-    --neg_sample_size 256 --hidden_dim 400 --gamma 10 --lr 0.1 --batch_size_eval 1000 \
-    --valid --test -adv --mix_cpu_gpu --neg_deg_sample_eval --neg_sample_size_eval 1000 \
-    --num_proc 16 --gpu 0 1 2 3 4 5 6 7 --num_thread 4 --regularization_coef 1e-9 \
-    --no_eval_filter --max_step 200000 --soft_rel_part --eval_interval 100000 --log_interval 10000 \
-    --no_eval_filter --async_update --neg_deg_sample --force_sync_interval 1000
+# TransE_l2 8GPU
+DGLBACKEND=pytorch python3 train.py --model TransE_l2 --dataset Freebase --batch_size 1000 \
+    --neg_sample_size 200 --hidden_dim 400 --gamma 10 --lr 0.1 --regularization_coef 1e-9 \
+    --batch_size_eval 1000 --valid --test -adv --mix_cpu_gpu --num_proc 8 \
+    --gpu 0 1 2 3 4 5 6 7 --max_step 320000 --neg_sample_size_eval 1000 \
+    --eval_interval 100000 --log_interval 10000 --async_update --soft_rel_part --force_sync_interval 10000
+
+# DistMult 8GPU
+DGLBACKEND=pytorch python3 train.py --model DistMult --dataset Freebase --batch_size 1024 \
+    --neg_sample_size 256 --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 1000 \
+    --valid --test -adv --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --max_step 300000 \
+    --neg_sample_size_eval 1000 --eval_interval 100000 --log_interval 10000 --async_update \
+    --soft_rel_part --force_sync_interval 10000
+
+# ComplEx 8GPU
+DGLBACKEND=pytorch python3 train.py --model ComplEx --dataset Freebase --batch_size 1024 \
+    --neg_sample_size 256 --hidden_dim 400 --gamma 143 --lr 0.1 \
+    --regularization_coef 2.00E-06 --batch_size_eval 1000 --valid --test -adv \
+    --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --max_step 360000 \
+    --neg_sample_size_eval 1000 --eval_interval 100000 --log_interval 10000 \
+    --async_update --soft_rel_part --force_sync_interval 10000
+
+# TransR 8GPU
+DGLBACKEND=pytorch python3 train.py --model TransR --dataset Freebase --batch_size 1024 \
+    --neg_sample_size 256 --regularization_coef 5e-8 --hidden_dim 200 --gamma 8.0 \
+    --lr 0.015 --batch_size_eval 1000 --valid --test -adv --mix_cpu_gpu --num_proc 8 \
+    --gpu 0 1 2 3 4 5 6 7 --max_step 300000 --neg_sample_size_eval 1000 \
+    --eval_interval 100000 --log_interval 10000 --async_update --soft_rel_part \
+    --force_sync_interval 10000
+
+# RotatE 8GPU
+DGLBACKEND=pytorch python3 train.py --model RotatE --dataset Freebase --batch_size 1024 \
+    --neg_sample_size 256 -de --hidden_dim 200 --gamma 12.0 --lr 0.01 \
+    --regularization_coef 1e-7 --batch_size_eval 1000 --valid --test -adv --mix_cpu_gpu \
+    --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --max_step 300000 --neg_sample_size_eval 1000 \
+    --eval_interval 100000 --log_interval 10000 --async_update --soft_rel_part \
+    --force_sync_interval 10000
+