Very high loss when finetuning #17

Open
x-zho14 opened this issue Jul 19, 2022 · 1 comment

x-zho14 commented Jul 19, 2022

Dear Author,

I ran your commands and found that pretraining looks fine, but finetuning behaves strangely: the loss stays very high. The pretraining loss is just 0.140160306.

The commands I ran are:

CUDA_VISIBLE_DEVICES=4 python src/main.py --output_dir experiments --comment "pretraining through imputation" --name BeijingPM25Quality_pretrained --records_file Imputation_records.xls --data_dir BeijingPM25Quality --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam --batch_size 32 --pos_encoding learnable --d_model 128

CUDA_VISIBLE_DEVICES=1 python src/main.py --output_dir experiments --comment "finetune for regression" --name BeijingPM25Quality_finetuned --records_file Regression_records.xls --data_dir BeijingPM25Quality --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 200 --lr 0.001 --optimizer RAdam --pos_encoding learnable --d_model 128 --load_model /home/xzhoubi/paperreading/mvts_transformer/experiments/BeijingPM25Quality_pretrained_2022-07-19_10-27-28_tlB/checkpoints/model_best.pth --task regression --change_output --batch_size 128

Could you please help check this?

2022-07-19 17:42:53,244 | INFO : Epoch 85 Training Summary: epoch: 85.000000 | loss: 1024.587302 | 
2022-07-19 17:42:53,244 | INFO : Epoch runtime: 0.0 hours, 0.0 minutes, 4.77277946472168 seconds

2022-07-19 17:42:53,244 | INFO : Avg epoch train. time: 0.0 hours, 0.0 minutes, 4.6006609103258915 seconds
2022-07-19 17:42:53,245 | INFO : Avg batch train. time: 0.048943201173679694 seconds
2022-07-19 17:42:53,245 | INFO : Avg sample train. time: 0.00038602625527151295 seconds
Training Epoch:  42%|████████████████████▊                            | 85/200 [07:40<10:05,  5.26s/it]
Training Epoch 86   0.0% | batch:         0 of        94        |       loss: 566.886
Training Epoch 86   1.1% | batch:         1 of        94        |       loss: 686.58
Training Epoch 86   2.1% | batch:         2 of        94        |       loss: 1297.63
Training Epoch 86   3.2% | batch:         3 of        94        |       loss: 976.956
Training Epoch 86   4.3% | batch:         4 of        94        |       loss: 565.19
Training Epoch 86   5.3% | batch:         5 of        94        |       loss: 809.262
Training Epoch 86   6.4% | batch:         6 of        94        |       loss: 1095.96
Training Epoch 86   7.4% | batch:         7 of        94        |       loss: 1047.49
Training Epoch 86   8.5% | batch:         8 of        94        |       loss: 782.682
Training Epoch 86   9.6% | batch:         9 of        94        |       loss: 697.767
Training Epoch 86  10.6% | batch:        10 of        94        |       loss: 900.141
Training Epoch 86  11.7% | batch:        11 of        94        |       loss: 919.351
Training Epoch 86  12.8% | batch:        12 of        94        |       loss: 782.872
Training Epoch 86  13.8% | batch:        13 of        94        |       loss: 1082.41
Training Epoch 86  14.9% | batch:        14 of        94        |       loss: 1004.29
Training Epoch 86  16.0% | batch:        15 of        94        |       loss: 960.513
Training Epoch 86  17.0% | batch:        16 of        94        |       loss: 776.499
Training Epoch 86  18.1% | batch:        17 of        94        |       loss: 995.985
Training Epoch 86  19.1% | batch:        18 of        94        |       loss: 655.607
Training Epoch 86  20.2% | batch:        19 of        94        |       loss: 733.846
Training Epoch 86  21.3% | batch:        20 of        94        |       loss: 1190.87
Training Epoch 86  22.3% | batch:        21 of        94        |       loss: 698.143
Training Epoch 86  23.4% | batch:        22 of        94        |       loss: 992.943
Training Epoch 86  24.5% | batch:        23 of        94        |       loss: 1017.47
Training Epoch 86  25.5% | batch:        24 of        94        |       loss: 696.403
Training Epoch 86  26.6% | batch:        25 of        94        |       loss: 822.942
Training Epoch 86  27.7% | batch:        26 of        94        |       loss: 935.869
Training Epoch 86  28.7% | batch:        27 of        94        |       loss: 1040.06
Training Epoch 86  29.8% | batch:        28 of        94        |       loss: 904.523
Training Epoch 86  30.9% | batch:        29 of        94        |       loss: 882.923
Training Epoch 86  31.9% | batch:        30 of        94        |       loss: 805.928
Training Epoch 86  33.0% | batch:        31 of        94        |       loss: 803.492
Training Epoch 86  34.0% | batch:        32 of        94        |       loss: 1720.69
Training Epoch 86  35.1% | batch:        33 of        94        |       loss: 778.216
Training Epoch 86  36.2% | batch:        34 of        94        |       loss: 729.644
Training Epoch 86  37.2% | batch:        35 of        94        |       loss: 1233.58
Training Epoch 86  38.3% | batch:        36 of        94        |       loss: 960.826
Training Epoch 86  39.4% | batch:        37 of        94        |       loss: 986.129
Training Epoch 86  40.4% | batch:        38 of        94        |       loss: 1316.68
Training Epoch 86  41.5% | batch:        39 of        94        |       loss: 1351.79
Training Epoch 86  42.6% | batch:        40 of        94        |       loss: 1661.48
Training Epoch 86  43.6% | batch:        41 of        94        |       loss: 956.305
Training Epoch 86  44.7% | batch:        42 of        94        |       loss: 1017.96
Training Epoch 86  45.7% | batch:        43 of        94        |       loss: 851.958
Training Epoch 86  46.8% | batch:        44 of        94        |       loss: 816.494
Training Epoch 86  47.9% | batch:        45 of        94        |       loss: 603.491
Training Epoch 86  48.9% | batch:        46 of        94        |       loss: 710.572
Training Epoch 86  50.0% | batch:        47 of        94        |       loss: 1318.47
Training Epoch 86  51.1% | batch:        48 of        94        |       loss: 905.094
Training Epoch 86  52.1% | batch:        49 of        94        |       loss: 662.117
Training Epoch 86  53.2% | batch:        50 of        94        |       loss: 850.853
Training Epoch 86  54.3% | batch:        51 of        94        |       loss: 1007.81
Training Epoch 86  55.3% | batch:        52 of        94        |       loss: 1236.99
Training Epoch 86  56.4% | batch:        53 of        94        |       loss: 809.194
Training Epoch 86  57.4% | batch:        54 of        94        |       loss: 1075.82
Training Epoch 86  58.5% | batch:        55 of        94        |       loss: 859.909
Training Epoch 86  59.6% | batch:        56 of        94        |       loss: 739.112
Training Epoch 86  60.6% | batch:        57 of        94        |       loss: 992.518
Training Epoch 86  61.7% | batch:        58 of        94        |       loss: 953.861
Training Epoch 86  62.8% | batch:        59 of        94        |       loss: 881.18
Training Epoch 86  63.8% | batch:        60 of        94        |       loss: 878.613
Training Epoch 86  64.9% | batch:        61 of        94        |       loss: 1006.92
Training Epoch 86  66.0% | batch:        62 of        94        |       loss: 728.144
Training Epoch 86  67.0% | batch:        63 of        94        |       loss: 865.157
Training Epoch 86  68.1% | batch:        64 of        94        |       loss: 895.809
Training Epoch 86  69.1% | batch:        65 of        94        |       loss: 616.984
Training Epoch 86  70.2% | batch:        66 of        94        |       loss: 893.007
Training Epoch 86  71.3% | batch:        67 of        94        |       loss: 859.431
Training Epoch 86  72.3% | batch:        68 of        94        |       loss: 1648.19
Training Epoch 86  73.4% | batch:        69 of        94        |       loss: 657.725
Training Epoch 86  74.5% | batch:        70 of        94        |       loss: 960.164
Training Epoch 86  75.5% | batch:        71 of        94        |       loss: 666.139
Training Epoch 86  76.6% | batch:        72 of        94        |       loss: 3079.8
Training Epoch 86  77.7% | batch:        73 of        94        |       loss: 802.407
Training Epoch 86  78.7% | batch:        74 of        94        |       loss: 1103.64
Training Epoch 86  79.8% | batch:        75 of        94        |       loss: 1029.07
Training Epoch 86  80.9% | batch:        76 of        94        |       loss: 1488.64
Training Epoch 86  81.9% | batch:        77 of        94        |       loss: 924.513
Training Epoch 86  83.0% | batch:        78 of        94        |       loss: 909.587
Training Epoch 86  84.0% | batch:        79 of        94        |       loss: 862.864
Training Epoch 86  85.1% | batch:        80 of        94        |       loss: 607.052
Training Epoch 86  86.2% | batch:        81 of        94        |       loss: 967.5
Training Epoch 86  87.2% | batch:        82 of        94        |       loss: 942.684
Training Epoch 86  88.3% | batch:        83 of        94        |       loss: 1217.01
Training Epoch 86  89.4% | batch:        84 of        94        |       loss: 685.092
Training Epoch 86  90.4% | batch:        85 of        94        |       loss: 949.638
Training Epoch 86  91.5% | batch:        86 of        94        |       loss: 737.985
Training Epoch 86  92.6% | batch:        87 of        94        |       loss: 1085.89
Training Epoch 86  93.6% | batch:        88 of        94        |       loss: 936.676
Training Epoch 86  94.7% | batch:        89 of        94        |       loss: 1203.51
Training Epoch 86  95.7% | batch:        90 of        94        |       loss: 677.801
Training Epoch 86  96.8% | batch:        91 of        94        |       loss: 2214.77
Training Epoch 86  97.9% | batch:        92 of        94        |       loss: 1357.56
Training Epoch 86  98.9% | batch:        93 of        94        |       loss: 1019.23

2022-07-19 17:42:57,306 | INFO : Epoch 86 Training Summary: epoch: 86.000000 | loss: 974.012262 | 
2022-07-19 17:42:57,307 | INFO : Epoch runtime: 0.0 hours, 0.0 minutes, 3.9919965267181396 seconds

2022-07-19 17:42:57,307 | INFO : Avg epoch train. time: 0.0 hours, 0.0 minutes, 4.593583417493243 seconds
2022-07-19 17:42:57,307 | INFO : Avg batch train. time: 0.04886790869673663 seconds
2022-07-19 17:42:57,307 | INFO : Avg sample train. time: 0.00038543240623370055 seconds
2022-07-19 17:42:57,307 | INFO : Evaluating on validation set ...
Evaluating Epoch 86   0.0% | batch:         0 of        40      |       loss: 7538.28
Evaluating Epoch 86   2.5% | batch:         1 of        40      |       loss: 1100.53
Evaluating Epoch 86   5.0% | batch:         2 of        40      |       loss: 2441.92
Evaluating Epoch 86   7.5% | batch:         3 of        40      |       loss: 7944.98
Evaluating Epoch 86  10.0% | batch:         4 of        40      |       loss: 2934.04
Evaluating Epoch 86  12.5% | batch:         5 of        40      |       loss: 2394.65
Evaluating Epoch 86  15.0% | batch:         6 of        40      |       loss: 8225.28
Evaluating Epoch 86  17.5% | batch:         7 of        40      |       loss: 3071.4
Evaluating Epoch 86  20.0% | batch:         8 of        40      |       loss: 3004.23
Evaluating Epoch 86  22.5% | batch:         9 of        40      |       loss: 2549.05
Evaluating Epoch 86  25.0% | batch:        10 of        40      |       loss: 5039.37
Evaluating Epoch 86  27.5% | batch:        11 of        40      |       loss: 1271.33
Evaluating Epoch 86  30.0% | batch:        12 of        40      |       loss: 7026.6
Evaluating Epoch 86  32.5% | batch:        13 of        40      |       loss: 4039.62
Evaluating Epoch 86  35.0% | batch:        14 of        40      |       loss: 1919.55
Evaluating Epoch 86  37.5% | batch:        15 of        40      |       loss: 3505.34
Evaluating Epoch 86  40.0% | batch:        16 of        40      |       loss: 5214.82
Evaluating Epoch 86  42.5% | batch:        17 of        40      |       loss: 2959.36
Evaluating Epoch 86  45.0% | batch:        18 of        40      |       loss: 2551.97
Evaluating Epoch 86  47.5% | batch:        19 of        40      |       loss: 6823
Evaluating Epoch 86  50.0% | batch:        20 of        40      |       loss: 4544.8
Evaluating Epoch 86  52.5% | batch:        21 of        40      |       loss: 1190.93
Evaluating Epoch 86  55.0% | batch:        22 of        40      |       loss: 3702.28
Evaluating Epoch 86  57.5% | batch:        23 of        40      |       loss: 3874.76
Evaluating Epoch 86  60.0% | batch:        24 of        40      |       loss: 1572.05
Evaluating Epoch 86  62.5% | batch:        25 of        40      |       loss: 3755.92
Evaluating Epoch 86  65.0% | batch:        26 of        40      |       loss: 10556.1
Evaluating Epoch 86  67.5% | batch:        27 of        40      |       loss: 3082.73
Evaluating Epoch 86  70.0% | batch:        28 of        40      |       loss: 1867.05
Evaluating Epoch 86  72.5% | batch:        29 of        40      |       loss: 10148.6
Evaluating Epoch 86  75.0% | batch:        30 of        40      |       loss: 1724.54
Evaluating Epoch 86  77.5% | batch:        31 of        40      |       loss: 1341.73
Evaluating Epoch 86  80.0% | batch:        32 of        40      |       loss: 7704.38
Evaluating Epoch 86  82.5% | batch:        33 of        40      |       loss: 7095.86
Evaluating Epoch 86  85.0% | batch:        34 of        40      |       loss: 1109.71
Evaluating Epoch 86  87.5% | batch:        35 of        40      |       loss: 5296.75
Evaluating Epoch 86  90.0% | batch:        36 of        40      |       loss: 6882.2
Evaluating Epoch 86  92.5% | batch:        37 of        40      |       loss: 2588.44
Evaluating Epoch 86  95.0% | batch:        38 of        40      |       loss: 3639.52
Evaluating Epoch 86  97.5% | batch:        39 of        40      |       loss: 11084.6
2022-07-19 17:42:58,800 | INFO : Validation runtime: 0.0 hours, 0.0 minutes, 1.4921939373016357 seconds

2022-07-19 17:42:58,800 | INFO : Avg val. time: 0.0 hours, 0.0 minutes, 1.4729443497127956 seconds
2022-07-19 17:42:58,800 | INFO : Avg batch val. time: 0.03682360874281989 seconds
2022-07-19 17:42:58,800 | INFO : Avg sample val. time: 0.0002917877079462749 seconds

gzerveas (Owner) commented Jul 20, 2022

Hi, as written in the README, these values are MSE, so you will have to take the square root to get the RMSE.
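
For example, converting the training loss logged above (a minimal Python sketch; the value is copied from the epoch 86 training summary in this issue):

import math

epoch_86_train_mse = 974.012262          # training loss logged above (MSE)
print(math.sqrt(epoch_86_train_mse))     # ~31.2, the corresponding RMSE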

Also, as I note in the README, you should consult the tables of optimal hyperparameters in the paper to achieve the best performance. For example, for this dataset, you should do the pretraining for at most 700 epochs with a batch size of 128, not 32. I have now added this value for this dataset explicitly in the README.
Finally, there is definitely variance across runs, so you would generally have to run several iterations, but in expectation you should get something like MSE = 2870.
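
For reference, the pretraining command from this issue adjusted to that hyperparameter would look like the following (a sketch based on the command posted above, with only --batch_size changed from 32 to 128; all other flags are kept as posted):

CUDA_VISIBLE_DEVICES=4 python src/main.py --output_dir experiments --comment "pretraining through imputation" --name BeijingPM25Quality_pretrained --records_file Imputation_records.xls --data_dir BeijingPM25Quality --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam --batch_size 128 --pos_encoding learnable --d_model 128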
