
Training fails on sample datasets. #3

Closed
lorrp1 opened this issue Sep 5, 2020 · 6 comments

Comments

lorrp1 commented Sep 5, 2020

Hello, first of all I want to thank you for this repo, because it is the only one I found with a recent "complex" implementation in Julia.

I'm trying the TPA-LSTM example, but it doesn't seem able to forecast local minima (after a while the forecast turns into a line moving up and down). I'm not sure if I should try with a larger dataset.
DSANet instead outputs no "pred" at all, just NaN32 values.
Any idea?

(Julia 1.5.1, with all the packages used fully updated)

sdobber commented Sep 5, 2020

Thanks for reporting, I'll have a look.

TPA-LSTM works fine for me for bigger datasets, but I've had similar issues with DSANet occasionally. The normalization layers sometimes lead to NaNs in the training loop, which makes the whole model output useless. Unfortunately, I've never gotten to the bottom of when and why exactly this happens.
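One way to narrow down where the NaNs first appear is to guard the training loop with an `isnan` check. This is a hypothetical debugging sketch, not code from the repo; `model`, `opt`, `ps`, and `data` stand in for whatever the example script defines:

```julia
using Flux

# Hypothetical debugging sketch: abort training on the first batch where
# the model output contains NaNs, so the offending input and the
# normalization layers can be inspected.
ps = Flux.params(model)
for (i, (x, y)) in enumerate(data)
    pred = model(x)
    if any(isnan, pred)
        @warn "Model output contains NaN at batch $i, check the normalization layers" 
        break
    end
    gs = gradient(() -> Flux.mse(model(x), y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```

Once the first bad batch is known, checking the inputs and the intermediate activations of the normalization layers for that batch usually reveals whether the problem is a degenerate input (e.g. zero variance) or exploding weights.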

sdobber changed the title from "Question" to "Training fails on sample datasets." on Sep 5, 2020
lorrp1 commented Sep 5, 2020

The other models give me problems as well. Could it be because of a different Flux version? (I'm using Flux 0.11.1.)
I can't properly install the version used in your manifest (errors during precompilation).

I'm not getting errors though, just straight lines or NaN32.

sdobber commented Sep 6, 2020

As far as I know, Flux 0.10 only works on Julia up to 1.4.2; that's why you can't use the version from the manifest files. When I update to Flux v0.11 and Julia 1.5.1, I can run all the files, but DSANet gives NaNs as you describe, and the training for the other models does not produce any usable results.

As for the latter, I think this might be related to some changes in recent Flux versions. There was an issue where training of recurrent neural networks was not handled properly, and there still seem to be some remaining bugs to be ironed out (see e.g. FluxML/Flux.jl#1209 or FluxML/Flux.jl#1324). I would guess that the hyperparameters in the example files (number of hidden layers etc.) are way off currently.

[Edit:] OK, now I'm having weird problems as well, with LSTnet throwing an error and DARNN running in some infinite loop. I honestly don't know yet what is causing this; I am using the same code in a bigger project where all models train fine...

lorrp1 commented Sep 7, 2020

I'm trying with Julia 1.4.2. I had some issues compiling Flux, but it should be fine now.
Using @show, LSTnet returns me this:
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.589704f30
...
Flux.mse(pred, target) = 1.589704f30 (after some minutes)
and the chart is very off.

DSANet returns NaN32.
DARNN may be working instead (I'm still running it, but it is very slow). Edit: it is turning into a line. TPA-LSTM works (it turns into a line, as written in the repo).

sdobber closed this as completed in 7ef4da2 on Sep 8, 2020
sdobber commented Sep 8, 2020

@lorrp1 This is how far I got with fixing things. DSANet is still broken, and I fear that the issue there is either rather complex or well hidden.

lorrp1 commented Sep 9, 2020

Hello @sdobber, I have tested the latest update:
DSANet works sometimes.
LSTnet/TPA-LSTM/DARNN work now.
