LSTM Learnable Hidden State #899
Conversation
- Fixes forward pass in models without learnable initial hidden state
- Fixes loading serialized models
I've fixed some small issues with the code, so the unit tests now pass. I've also run a quick experiment using learnable hidden states, but I cannot say whether it makes much of a difference.
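The idea under discussion can be sketched as a minimal PyTorch module (a hypothetical `LearnableInitLSTM`, not flair's actual implementation): instead of the default all-zero initial states, `h0` and `c0` are registered as `nn.Parameter`s so the optimizer updates them, and they are expanded over the batch dimension in the forward pass.

```python
import torch
import torch.nn as nn

class LearnableInitLSTM(nn.Module):
    """Hypothetical sketch of an LSTM with a learnable initial hidden state."""

    def __init__(self, input_size, hidden_size, learnable_init=True):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        if learnable_init:
            # Learned initial states; shape (num_layers, 1, hidden_size),
            # broadcast over the batch dimension at forward time.
            self.h0 = nn.Parameter(torch.randn(1, 1, hidden_size))
            self.c0 = nn.Parameter(torch.randn(1, 1, hidden_size))
        else:
            self.h0 = self.c0 = None

    def forward(self, x):
        if self.h0 is not None:
            batch = x.size(0)
            init = (self.h0.expand(1, batch, self.hidden_size).contiguous(),
                    self.c0.expand(1, batch, self.hidden_size).contiguous())
        else:
            init = None  # PyTorch then falls back to zero initial states
        return self.lstm(x, init)
```

Because the initial states are ordinary parameters, they receive gradients like any other weight; with `learnable_init=False` the module behaves like a plain LSTM with zero initial states.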
I think it could be one of three reasons. First, if you tried CoNLL-03, accuracy is already very high and the default network is shallow, so any benefit may be unobservable. How about trying harder tasks with larger datasets and higher-capacity networks? Second, our initialization might be the problem; it could improve with some experimentation. Third, maybe it actually just does not make much difference :)
That's true :) Would you like to experiment with initialization, or should we go ahead and merge this as it is?
Let's just merge it as it is, and maybe change the default parameter to false to avoid causing problems. Later, if I can find some time, I will experiment and post the results here. In the meantime, we can add a comment to the code so that other people can post their results about it too.
👍 |
Regarding the learnable hidden state feature, I did not thoroughly test whether accuracy for the sequence_tagger gets better or not.
@alanakbik I'm not sure how to initialize the hidden state, though; I just used rand for now.
(Forgive me if I skipped some steps in the PR process; I'd appreciate your guidance.)
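On the initialization question: `torch.rand` draws uniformly from [0, 1), which gives strictly positive starting states. A few common alternatives could be compared in later experiments; the sketch below (with an assumed `hidden_size` of 8, purely for illustration) shows three candidates, all of which remain trainable afterwards.

```python
import torch
import torch.nn as nn

hidden_size = 8  # assumed size, for illustration only

# Uniform in [0, 1), as currently used in the PR.
h0_rand = nn.Parameter(torch.rand(1, 1, hidden_size))

# Small zero-mean normal values, a common alternative
# that starts closer to the default zero state.
h0_normal = nn.Parameter(torch.randn(1, 1, hidden_size) * 0.01)

# Zeros: identical to PyTorch's default at initialization,
# but trainable from the first update onward.
h0_zero = nn.Parameter(torch.zeros(1, 1, hidden_size))
```

Starting from zeros has the nice property that the model is exactly equivalent to the non-learnable baseline at step 0, so any later difference in accuracy can be attributed to the learned state.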