HMM: calculate likelihood for data stream with/without pre-calculated emission probability #2142
Conversation
Hey there @aabghari, thanks for writing this up. Really nice to see improvements to the HMM code. I think we should discuss what the API for this should look like first, though. So let me make sure that I understand the task correctly: the goal here is to provide an interface so that a user can do streaming HMMs; e.g., pass in one time step at a time, and get the log-likelihood of the full sequence up to that point. If I understood right, the method you've added is the "user-facing" way to do that.
I think maybe we could match this a little bit better to the existing API. For instance, I think it would be cleaner if the API for using HMMs in this streaming sense were just some extra parameters added to the way users normally use `LogLikelihood()`.
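Roughly, the existing non-streaming call looks like this (a sketch assuming an existing `HMM<> hmm` and an observation matrix `observations` with one column per time step):

```c++
// Existing, non-streaming use: the whole sequence is passed at once and the
// log-likelihood of the full sequence is returned.
const double logLik = hmm.LogLikelihood(observations);
```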
and in order to use it in a streaming sense, they should do something like...
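Perhaps something along these lines (a hypothetical sketch of the proposal; the extra `stateLogProbs` parameter and its empty-means-initial convention are illustrative, not necessarily the final API):

```c++
// Hypothetical streaming use: carry the per-state log-probabilities between
// calls and sum the returned per-chunk log-likelihoods outside the API.
arma::vec stateLogProbs;  // empty on the first call => start from the initial probabilities
double logLik = hmm.LogLikelihood(firstChunk, stateLogProbs);
logLik += hmm.LogLikelihood(nextChunk, stateLogProbs);  // later data from the stream
```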
Ideally, we'd want that call to stay as close as possible to the existing one. Since all we need to predict the next time step is the previous log-probabilities of each state, we could use a signature like this:
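A hypothetical rendering of that idea (the names here are illustrative, not necessarily what was merged):

```c++
// Hypothetical signature: `data` holds one or more time steps (one observation
// per column); `stateLogProbs` carries the per-state log-probabilities from the
// previous call and is updated in place; the return value is the log-likelihood
// contribution of `data` given that carried state.
double LogLikelihood(const arma::mat& data, arma::vec& stateLogProbs) const;
```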
So, that's basically exactly the same as the existing `LogLikelihood()` signature, just with the extra state argument.
In this way we can now also take multiple steps at once, and it matches the existing API quite closely, so the HMM object feels natural to use in both streaming and non-streaming settings. Let me know what you think of the idea. 👍
I agree.
I disagree with this API suggestion. We need to keep track of the log-likelihood value somewhere: it is an accumulation over time, and we cannot store that running value inside the API. Every time the API gets called for a new time step, we need to pass in the log-likelihood accumulated up to that point and get the updated value back; that is what the extra log-likelihood argument in my version is for.
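If I'm reading this right, the design being described looks roughly like the following (a hypothetical sketch; the parameter names are mine, not the actual PR):

```c++
// Hypothetical accumulating variant: the caller passes the log-likelihood
// accumulated so far and receives the updated total back, so the API itself
// never stores the running value between calls.
double LogLikelihood(const arma::vec& observation,    // one new time step
                     const double logLikSoFar,        // total up to the previous step
                     arma::vec& stateLogProbs) const; // carried per-state log-probabilities

// Usage: logLik = hmm.LogLikelihood(obs, logLik, stateLogProbs);
```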
Hey @aabghari, sorry for the slow response.
I'm wondering if maybe I don't understand the full details of what you're hoping to do with the changes, so please point it out if I've misunderstood or overlooked anything. :) The log-likelihood of a whole sequence is just the sum of log-likelihoods of each individual observation in that sequence. The log-likelihood of each individual observation, for an HMM, also depends on probabilities for the current internal state of the HMM. (And if this is the first observation in the sequence, then the probabilities for the internal state of the HMM are the initial probabilities.) So, the idea would be that the compute the log-likelihood of a part of a sequence, the accumulation would actually happen outside of the API, like this:
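A sketch under the hypothetical streaming signature above (the chunk boundaries are arbitrary, chosen only for illustration):

```c++
// Hypothetical: the log-likelihood of a sequence is the sum of the values
// returned for its pieces, as long as the per-state log-probabilities are
// carried from one call to the next.
arma::vec stateLogProbs;  // empty => start from the initial state probabilities
double total = 0.0;
total += hmm.LogLikelihood(sequence.cols(0, 99), stateLogProbs);    // first 100 steps
total += hmm.LogLikelihood(sequence.cols(100, 199), stateLogProbs); // next 100 steps
// `total` should match a single call over all 200 steps.
```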
And for a streaming case you might do...
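For instance (with `stream` as a placeholder data source, not a real API):

```c++
// Hypothetical streaming loop: fold in one observation at a time, keeping the
// running total outside the API.
arma::vec stateLogProbs;
double total = 0.0;
while (stream.HasNext())
{
  const arma::vec obs = stream.Next();  // one new time step
  total += hmm.LogLikelihood(obs, stateLogProbs);
}
```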
Maybe I overlooked something? I think that still covers the use cases you originally proposed. 👍
My initial design was to calculate the accumulated likelihood inside the API; that's why I have that extra log-likelihood argument in the signature.
I'm sorry this has sat for so long. Everything is overwhelming...

Personally I think accumulation outside the API is a little bit more flexible, and it helps us ensure that no internal state is changed during the course of a prediction, so we don't end up with objects in weird state situations. Ideally, we should be able to mark this overload of `LogLikelihood()` as `const`. For starting a new stream, what you could do in the example I proposed above is just call it again with the carried state log-probabilities reset, so the computation starts over from the HMM's initial probabilities.

Again, sorry it took so long for this simple response; let me know if I overlooked something! 👍
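A minimal sketch of starting a new stream under the hypothetical convention used above (empty state vector means "use the initial probabilities"; this is an assumption, not documented behavior):

```c++
// Hypothetical: clearing the carried state means the next call starts again
// from the HMM's initial probabilities, i.e. a fresh stream.
stateLogProbs.reset();
double total = 0.0;  // restart the accumulated log-likelihood as well
```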
Thank you @aabghari! There is a merge conflict, but it should be easy to resolve. I found a few other small style issues, which I had hoped would be easy to add suggestions for, but I can't do multiline suggestions on a phone, so there are lots of them. Sorry. :(
If you want to merge those suggestions and fix the merge conflict, feel free; if not, I'll do it during merge (when I am not using a phone :)).
Thank you for this nice support! 💯
Second approval provided automatically after 24 hours. 👍
I went ahead and committed the suggestions and merged master; when I see that the tests pass, I'll go ahead and merge. Thanks again @aabghari! Great to finally merge this in. :)
I was going to apply the changes, but you already did. Thanks for merging.
Looks like some of the builds had problems, but they don't appear to be related to the changes here.
I am introducing new APIs to address the following scenarios:

- There are cases, such as live audio or video streams, in which one would like to calculate the HMM likelihood as the data arrives.
- The emission probabilities may already be pre-calculated, for example by a neural network trained on the individual states.
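For the second scenario, a hypothetical usage sketch (the `preCalculated` flag, `stateScorer`, and `stream` are all illustrative placeholders, not mlpack's actual API): instead of raw observations, each call receives the per-state emission log-probabilities produced by an external model.

```c++
// Hypothetical: the emission log-probabilities for each hidden state come from
// an external model (e.g. a neural network), one column per time step.
arma::vec stateLogProbs;
double total = 0.0;
while (stream.HasNext())
{
  const arma::vec emissionLogProbs = stateScorer.Predict(stream.Next());
  total += hmm.LogLikelihood(emissionLogProbs, stateLogProbs, /* preCalculated */ true);
}
```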