
Training fails on sample datasets. #3

Closed
lorrp1 opened this issue Sep 5, 2020 · 6 comments

Comments

lorrp1 commented Sep 5, 2020

Hello, first of all I want to thank you for this repo, because it is the only one I found with a recent "complex" implementation in Julia.

I'm trying the TPA-LSTM example, but it doesn't seem able to forecast local minima (after a while the forecast turns into a line moving up and down). I'm not sure if I should try with a larger dataset.
DSANet instead outputs no "pred" at all, just NaN32 values.
Any idea?

(Julia 1.5.1, with all the packages used fully updated)

sdobber commented Sep 5, 2020

Thanks for reporting, I'll have a look.

TPA-LSTM works fine for me for bigger datasets, but I've had similar issues with DSANet occasionally. The normalization layers sometimes lead to NaNs in the training loop, which makes the whole model output useless. Unfortunately, I've never gotten to the bottom of when and why exactly this happens.
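One way to narrow down where the NaNs first appear is to guard the training loop with an `isnan` check. This is a hypothetical debugging sketch, not code from the repo; `model`, `opt`, `ps`, and `data` stand in for whatever the example script defines:

```julia
using Flux

# Hypothetical debugging sketch: abort training on the first batch where
# the model output contains NaNs, so the offending input and the
# normalization layers can be inspected.
ps = Flux.params(model)
for (i, (x, y)) in enumerate(data)
    pred = model(x)
    if any(isnan, pred)
        @warn "Model output contains NaN at batch $i, check the normalization layers" 
        break
    end
    gs = gradient(() -> Flux.mse(model(x), y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```

Once the first bad batch is known, checking the inputs and the intermediate activations of the normalization layers for that batch usually reveals whether the problem is a degenerate input (e.g. zero variance) or exploding weights.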

sdobber changed the title from "Question" to "Training fails on sample datasets." on Sep 5, 2020
lorrp1 commented Sep 5, 2020

The other models give me problems as well. Could it be because of a different Flux version? (I'm using Flux 0.11.1.)
I can't properly install the version used in your manifest (errors during precompilation).

I'm not getting errors though, just straight lines or NaN32.

sdobber commented Sep 6, 2020

As far as I know, Flux 0.10 only works on Julia up to 1.4.2; that's why you can't use the version from the manifest files. When I update to Flux v0.11 and Julia 1.5.1, I can run all the files, but DSANet gives NaNs as you describe, and the training for the other models does not produce any usable results.

As for the latter, I think this might be related to some changes in recent Flux versions. There was an issue where training of recurrent neural networks was not handled properly, and there still seem to be some remaining bugs to be ironed out (see e.g. FluxML/Flux.jl#1209 or FluxML/Flux.jl#1324). I would guess that the hyperparameters in the example files (number of hidden layers etc.) are way off currently.

[Edit:] OK, now I'm having weird problems as well, with LSTnet throwing an error and DARNN running in some infinite loop. I honestly don't know yet what is causing this; I am using the same code in a bigger project where all models train fine...

lorrp1 commented Sep 7, 2020

I'm trying with Julia 1.4.2. I had some issues compiling Flux, but it should be fine now.
Using @show, LSTnet returns me this:
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.589704f30
...
Flux.mse(pred, target) = 1.589704f30 (after some minutes)
and the chart is very off.

DSANet returns NaN32.
DARNN may be working instead (I'm still running it, but it is very slow). Edit: it is turning into a line. TPA-LSTM works (it turns into a line, as written in the repo).

sdobber closed this as completed in 7ef4da2 on Sep 8, 2020
sdobber commented Sep 8, 2020

@lorrp1 This is how far I got with fixing things. DSANet is still broken, and I fear that the issue there is either rather complex or well hidden.

lorrp1 commented Sep 9, 2020

Hello @sdobber, I have tested the latest update:
DSANet works sometimes.
LSTnet/TPA-LSTM/DARNN work now.
