Casanovo predicts for invalid spectra #56
One solution would be to move peak processing from […]. A disadvantage, however, is that whenever the spectrum processing options change, the index would have to be recreated. Nevertheless, I think our spectrum processing is based on best practices and we don't really vary it, so for Casanovo at least it should be fairly fixed. Additionally, dropping invalid spectra breaks the link between the indices of the output PSMs and the input spectra, although this is a broader issue (#70). @wfondrie What do you think?
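A minimal sketch of preprocessing once at index-creation time, assuming a hypothetical `preprocess` function that returns `None` for an invalid spectrum and a hypothetical in-memory `SpectrumIndex`. None of these names are Casanovo or depthcharge APIs. Storing the processing options alongside the index makes the disadvantage concrete: the index can detect that the options changed and must be rebuilt.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Peaks = List[Tuple[float, float]]  # (m/z, intensity) pairs


@dataclass(frozen=True)
class ProcessingOptions:
    """Hypothetical preprocessing settings, not Casanovo's actual config."""
    min_mz: float = 50.0
    min_intensity: float = 0.01  # relative to the most intense peak
    min_peaks: int = 20


def preprocess(peaks: Peaks, opts: ProcessingOptions) -> Optional[Peaks]:
    """Filter and normalize peaks; return None if the spectrum is invalid."""
    max_int = max((i for _, i in peaks), default=0.0)
    if max_int <= 0:
        return None
    kept = [
        (mz, i / max_int)
        for mz, i in peaks
        if mz >= opts.min_mz and i / max_int >= opts.min_intensity
    ]
    return kept if len(kept) >= opts.min_peaks else None


@dataclass
class SpectrumIndex:
    """Hypothetical index that stores already-preprocessed spectra."""
    opts: ProcessingOptions
    spectra: List[Peaks] = field(default_factory=list)

    @classmethod
    def build(cls, raw: List[Peaks], opts: ProcessingOptions) -> "SpectrumIndex":
        index = cls(opts)
        for peaks in raw:
            processed = preprocess(peaks, opts)
            if processed is not None:  # invalid spectra never enter the index
                index.spectra.append(processed)
        return index

    def is_stale(self, opts: ProcessingOptions) -> bool:
        # Any change to the preprocessing options invalidates the stored
        # peaks, so the index would have to be rebuilt.
        return opts != self.opts
```

With `min_peaks=2` and three inputs of which the second has no peaks above `min_mz`, only two spectra enter the index, so index position 1 holds input spectrum 2. That shift is exactly the broken PSM-to-spectrum link described above.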
I'm open to this. I also think the current way of tracking the spectrum index is pretty fragile and not ideal; instead, that information should be saved in the index itself. Ayse started a PR in depthcharge to solve the spectrum tracking issue, but it was incomplete; I'll see if I can get it updated and integrated.
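One way to save that information in the index itself, sketched under assumptions: each retained entry carries a hypothetical `(source_file, scan_index)` identifier, so predictions can be reported against the original input spectrum no matter how many spectra were dropped. `IndexedSpectrum`, `build_tracked_index`, and `report_psms` are illustrative names, not depthcharge's API, and the peak-count check stands in for real preprocessing.

```python
from typing import Dict, List, NamedTuple, Tuple

Peaks = List[Tuple[float, float]]  # (m/z, intensity) pairs


class IndexedSpectrum(NamedTuple):
    source_file: str  # hypothetical: file the spectrum came from
    scan_index: int   # position of the spectrum in that file
    peaks: Peaks


def build_tracked_index(
    source_file: str, raw: List[Peaks], min_peaks: int = 2
) -> List[IndexedSpectrum]:
    """Keep only spectra with enough peaks, remembering where each came from."""
    return [
        IndexedSpectrum(source_file, i, peaks)
        for i, peaks in enumerate(raw)
        if len(peaks) >= min_peaks
    ]


def report_psms(
    index: List[IndexedSpectrum], predictions: List[str]
) -> Dict[Tuple[str, int], str]:
    """Map each prediction back to its original (file, scan index) key."""
    if len(index) != len(predictions):
        raise ValueError("expected one prediction per indexed spectrum")
    return {(s.source_file, s.scan_index): pep for s, pep in zip(index, predictions)}
```

Because the identifier lives in the index entry rather than being inferred from position, filtering invalid spectra no longer shifts which input a PSM refers to.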
For future reference: once completed, #105 will address spectrum index tracking.
Spectra that are invalid after preprocessing are replaced by a dummy spectrum, but Casanovo still predicts a peptide for them. The resulting predictions are naturally incorrect: long peptide sequences with low(ish) scores, though not obviously wrong.
Instead, invalid spectra should either be filtered out or receive no prediction. The former is probably better, because invalid spectra might be a factor during training as well. I haven't fully figured out how to skip items in the dataloader, though.
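A common pattern for skipping items is to have the dataset return `None` for an invalid spectrum instead of a dummy spectrum, and to pass a custom `collate_fn` to `torch.utils.data.DataLoader` that drops those items before batching. The sketch below is plain Python so it runs without PyTorch; `load_item`, `collate_skip_invalid`, and the `(index, peaks)` item layout are assumptions, and in practice the surviving items would be handed to whatever collate function already pads spectra into a batch.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Peaks = List[Tuple[float, float]]           # (m/z, intensity) pairs
Item = Optional[Tuple[int, Peaks]]          # (original index, peaks) or None


def load_item(index: int, peaks: Peaks, min_peaks: int = 2) -> Item:
    """Hypothetical dataset __getitem__ body: None instead of a dummy spectrum."""
    return (index, peaks) if len(peaks) >= min_peaks else None


def collate_skip_invalid(
    batch: Sequence[Item],
    collate: Callable[[List[Tuple[int, Peaks]]], Any] = list,
) -> Any:
    """Drop invalid (None) items, then collate the remaining ones.

    Intended for DataLoader(..., collate_fn=collate_skip_invalid). The
    original index travels with each item, so predictions can still be
    matched to the input spectra after filtering.
    """
    valid = [item for item in batch if item is not None]
    return collate(valid)
```

One consequence of this design is that a batch can end up smaller than `batch_size`, or even empty, so the training and prediction loops must tolerate that; during training it also slightly varies the effective batch size.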