Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ModificationWithMass hashing seems odd #219

Closed
acesnik opened this issue Aug 3, 2017 · 5 comments
Closed

ModificationWithMass hashing seems odd #219

acesnik opened this issue Aug 3, 2017 · 5 comments

Comments

@acesnik
Copy link
Collaborator

acesnik commented Aug 3, 2017

The hash codes for two ModificationWithMass objects with different IDs, different accessions, and different masses are the same.

e.g.
new ModificationWithMass("mod", new Tuple<string, string>("acc", "acc"), motif, TerminusLocalization.Any, 1, null, null, null, "type")
and
new ModificationWithMass("mod2", new Tuple<string, string>("acc2", "acc2"), motif, TerminusLocalization.Any, 10, null, null, null, "type")

@stefanks
Copy link
Collaborator

stefanks commented Aug 4, 2017

Hm, looks like a bug. Good catch! Do you have a fix in mind?

@acesnik
Copy link
Collaborator Author

acesnik commented Aug 4, 2017

I'm a little uncertain about the history here. If I remember correctly, we decided not to hash on the ID for some reason, maybe basing the uniqueness on other properties. As for the monoisotopic mass, it seems like you want a little tolerance (1e-9), so I'm not sure how to handle that in a hash code. Rounding seems like an option, but that's not the same as a tolerance.

@stefanks
Copy link
Collaborator

stefanks commented Aug 4, 2017

I looked into this, and now I'm not sure this is a concern, since hash codes do not need to be unique. Unless this poses an actual problem, I vote to close this issue.

@acesnik
Copy link
Collaborator Author

acesnik commented Aug 4, 2017

The problem I'm encountering is in merging databases. Consider 2 databases both derived from the UniProt database but with additional sets of PTMs. They will have many of the same mods at positions along proteins in the databases. In merging, it would be nice to make a hash set of modifications at each position and call it a day. Instead, the hash set behaves weirdly, getting rid of ModificationWithMass
objects that have very different masses and different IDs.

We already use HashSets of Modification objects (here and here), so we should be careful here.

@acesnik
Copy link
Collaborator Author

acesnik commented Aug 4, 2017

The test I wrote yesterday works now... I think the tests weren't building properly at the end of the day yesterday. Thanks!

@acesnik acesnik closed this as completed Aug 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants