-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AssertionError: Probabilities don't sum to 1 #79
Comments
Hi Artur, see my response in this other github issue and let me know if that doesn’t fix it! #75 |
Hi Avanti, Great insight. I've looked through all the peak files I'm passing to So it's not that. Unless maybe Thanks! Edit: I would like to note that I do not get any division by 0 errors. |
Hmm, I am not very familiar with the "interpret" function of basepairmodels. Can you link me to the relevant function definition if you have it handy? I assume your hunch is correct that there might be some manipulation of the regions under-the-hood. Technically this is easy to work around by simply renormalizing the PPM that is passed to the "information content" function so that the rows sum to 1, and I wrote a workaround to that effect (the code now prints a warning rather than throwing an error: #80), but it's worth tracking down exactly what's causing the behavior. |
@zahoorz are you able to point me to the lines of code corresponding to the "interpret" function of basepairmodels? |
@BeyondTheProof in the meantime, can you try out the workaround and see if it satisfies your needs? |
@AvantiShri The workaround works! I've narrowed down the problem. The one-hot encoding function I was using converts Ns to So this doesn't solve the problem. There may, however, be a step where the probabilities are normalized in certain peaks but not others. It's likely it's due to something involving |
Glad the workaround works - I think I'm a bit confused why you said "However, after checking, plenty of the regions have Ns in them, so I don't think it's that" - do you say "I don't think it's that" because this is the first time you are getting the error, and your previous runs didn't get the error? The error only occurs if a region containing N's winds up inside or near a motif (I write "near a motif" because tfmodico includes "flank expansion" steps where the seqlets are expanded to include potentially relevant signal in the flanks). In most cases, your motif instances won't be near the N's, and so it makes sense to me that you didn't see this error for most runs! |
I see, that makes a lot of sense. So yes, I think it only came up with the H3K9me3 because of the abundance of repeats. These are going to be much more likely to be near |
Seems like you managed to nail it down! |
Hi Avanti, I've reopened this issue to fix a small bug. You had a |
Hi Avanti, thanks so much for making TF-MoDISco available to the community. We're finding some interesting things already!
I've run it multiple times on different datasets, and never had a problem until now. Do you know how to address this error?
I tried updating to the latest version, but the error persists. If need be, I can share the importances file with you directly. Thanks so much!
Artur
The text was updated successfully, but these errors were encountered: