Model.predict() gives only minus ones #1003

Closed
mhavu opened this issue Nov 6, 2022 · 7 comments

Comments

mhavu commented Nov 6, 2022

I am new to pomegranate, and I want to build an HMM with two hidden states. I have labels and a sequence of observation series, most of which come from a multinomial distribution (k = 4, n = 60). (There is also one that comes from a wrapped normal distribution, see issue #1002, but it can be ignored for now.) Every call to predict() returns just a list of minus ones, sometimes with the warning "Sequence is impossible":

import numpy as np
from pomegranate import HiddenMarkovModel, DirichletDistribution

# label = np.array([0, 0, 0, ..., 1, 1, 1])
# data = np.array([[60, 0, 0, 0], [58, 2, 0, 0], ...])
dists = [DirichletDistribution.from_samples(data[np.where(label == 0)]),
         DirichletDistribution.from_samples(data[np.where(label == 1)])]
trans_mat = np.array([[0.999, 0.001],
                      [0.002, 0.998]])
starts = np.array([0.5, 0.5])
model = HiddenMarkovModel.from_matrix(trans_mat, dists, starts)
model.predict(data)
# [-1, -1, -1, ...]

What am I missing here? (If I create the model with HiddenMarkovModel.from_samples() instead, predict(), fit(), etc. result in a segfault, but I haven't filed a bug report yet, since I think I am just doing something wrong.)

Owner

jmschrei commented Nov 6, 2022

The issue is that my implementation of Dirichlet distributions is wrong and bad. I am currently rewriting pomegranate using torch and fixing this issue, among several others. For now, please avoid the DirichletDistribution implementation.

Author

mhavu commented Nov 7, 2022

What would you recommend for a multinomial distribution instead?

Owner

jmschrei commented Nov 7, 2022

Is your data categories or is it probability vectors over classes? If the data are categories, i.e., integers specifying class, you should use DiscreteDistribution.
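
To make the distinction concrete, here is a minimal sketch in plain Python (no pomegranate calls) built on the two example rows from the original post. The helper names `to_probabilities` and `to_categories` are invented for illustration and are not part of any library.

```python
# Two example rows from the original post: counts over k = 4 classes, n = 60 each.
counts = [[60, 0, 0, 0], [58, 2, 0, 0]]


def to_probabilities(rows):
    """Normalize each count vector so that it sums to 1 (a probability vector)."""
    return [[c / sum(row) for c in row] for row in rows]


def to_categories(rows):
    """Collapse each vector to the index of its largest entry (one integer per row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


prob_vectors = to_probabilities(counts)  # rows of floats, each summing to 1
categories = to_categories(counts)       # one integer class label per row
```

Both rows collapse to category 0 here, so the integer representation no longer records how concentrated each row is.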

Author

mhavu commented Nov 7, 2022

It's probability vectors over classes.

Author

mhavu commented Dec 11, 2022

I didn't get very far with this. I used argmax to convert the probability vectors over classes to categories. However, there are far more state transitions than the probabilities in the transition matrix should allow, and changing the transition matrix doesn't seem to have any effect. Incorporating the observations that come from a wrapped normal distribution would probably help, but I'm not quite sure how to build a mixed model where each hidden state corresponds to both a DiscreteDistribution and a VonMisesDistribution (from #1002).
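
Independently of any particular library, one common way to combine two observation types per hidden state is to assume they are conditionally independent given the state, so the joint log-likelihood is the sum of the two component log-likelihoods. The sketch below illustrates that idea in plain Python. It is not pomegranate code: the function names, the `states` dictionaries, and every parameter value are assumptions chosen for illustration, and a von Mises density stands in for the wrapped normal.

```python
import math


def multinomial_logpmf(counts, probs):
    """Log-probability of a count vector under a multinomial with class probabilities `probs`."""
    logp = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        logp -= math.lgamma(c + 1)
        if c > 0:
            if p == 0.0:
                return -math.inf  # a class with zero probability was observed
            logp += c * math.log(p)
    return logp


def bessel_i0(kappa, terms=50):
    """Modified Bessel function I0 via its power series (adequate for moderate kappa)."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((m + 1) ** 2)
    return total


def von_mises_logpdf(angle, mu, kappa):
    """Log-density of a von Mises distribution with mean `mu` and concentration `kappa`."""
    return kappa * math.cos(angle - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


def joint_loglik(counts, angle, state):
    """Joint log-likelihood, assuming counts and angle are independent given the state."""
    return (multinomial_logpmf(counts, state["probs"])
            + von_mises_logpdf(angle, state["mu"], state["kappa"]))


# Hypothetical per-state parameters; none of these values come from the issue.
states = [
    {"probs": [0.90, 0.06, 0.03, 0.01], "mu": 0.0, "kappa": 2.0},
    {"probs": [0.40, 0.30, 0.20, 0.10], "mu": math.pi, "kappa": 2.0},
]

obs_counts, obs_angle = [58, 2, 0, 0], 0.1  # the angle is a made-up value
scores = [joint_loglik(obs_counts, obs_angle, s) for s in states]
```

With these made-up parameters the first state scores higher for this observation, because both the counts and the angle are closer to what that state expects.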

Author

mhavu commented Dec 18, 2022

According to #458, it is not possible to mix DiscreteDistribution with any other type of distribution in IndependentComponentsDistribution. Is there a type of distribution in pomegranate I could use for a multinomial distribution (probability vectors over classes)?
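
For reference, the setup described in this thread (two hidden states, labelled count vectors, a fixed transition matrix) can be sketched without any library support for multinomial emissions. The plain-Python example below estimates per-state class probabilities from labelled counts and decodes with a log-space Viterbi pass. It does not use or describe pomegranate's API: `fit_state_probs` and `viterbi` are hypothetical helpers, the pseudocount is an assumed smoothing choice, and only the first two data rows, the transition matrix, and the start probabilities come from the original post.

```python
import math


def multinomial_logpmf(counts, probs):
    """Log-probability of a count vector under a multinomial with class probabilities `probs`."""
    logp = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        logp -= math.lgamma(c + 1)
        if c > 0:
            if p == 0.0:
                return -math.inf
            logp += c * math.log(p)
    return logp


def fit_state_probs(rows, pseudocount=1.0):
    """Estimate class probabilities from count vectors, with additive smoothing."""
    k = len(rows[0])
    totals = [pseudocount + sum(r[j] for r in rows) for j in range(k)]
    grand = sum(totals)
    return [t / grand for t in totals]


def viterbi(observations, emission_probs, trans, starts):
    """Most likely hidden-state path for the observations, computed in log space."""
    n_states = len(starts)
    score = [math.log(starts[s]) + multinomial_logpmf(observations[0], emission_probs[s])
             for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at this step.
            prev = max(range(n_states), key=lambda p: score[p] + math.log(trans[p][s]))
            pointers.append(prev)
            new_score.append(score[prev] + math.log(trans[prev][s])
                             + multinomial_logpmf(obs, emission_probs[s]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy data: the first two rows are from the post, the rest are invented.
data = [[60, 0, 0, 0], [58, 2, 0, 0], [59, 1, 0, 0],
        [30, 20, 8, 2], [28, 22, 7, 3], [31, 19, 9, 1]]
label = [0, 0, 0, 1, 1, 1]

emission_probs = [fit_state_probs([r for r, l in zip(data, label) if l == s])
                  for s in (0, 1)]
trans = [[0.999, 0.001], [0.002, 0.998]]
starts = [0.5, 0.5]
path = viterbi(data, emission_probs, trans, starts)
```

Working on the full count vectors rather than their argmax keeps the information about how strongly each observation favours one state, so the transition probabilities and the emissions are weighed against each other at every step.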

@jmschrei
Owner

Thank you for opening an issue. pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython (v1.0.0), and so all issues are being closed as they are likely out of date. Please re-open or start a new issue if a related issue is still present in the new codebase.
