Import MS/MS spectrum data from various sources #23

Open
2 of 3 tasks
jorainer opened this issue Jul 5, 2018 · 8 comments

Comments

@jorainer
Member

jorainer commented Jul 5, 2018

Import MS/MS spectrum data from different databases:

It seems that ChEBI, PubChem and LIPID MAPS don't provide spectra; can you confirm, @stanstrup @michaelwitting?

jorainer added a commit that referenced this issue Jul 5, 2018
- Add function to import MS/MS spectra from spectrum xml files from HMDB (issue #23).
- Add related unit tests, test file and documentation.
@stanstrup
Collaborator

stanstrup commented Jul 5, 2018

Yes, those don't have spectra.

@jorainer
Member Author

jorainer commented Aug 8, 2018

A small complication with HMDB: HMDB provides one xml file for each spectrum associated with a compound. However, the same spectrum (same values) can be associated with several compounds. In that case HMDB uses the same spectrum_id, but provides two (or more) xml files, one for each compound_id.

A more complicated solution to handle this would be to:

  • insert only unique spectra into the spectrum table.
  • add an additional table providing the mapping between the spectrum and compound tables (to handle the n:m mapping).

Disadvantage: queries are more complicated and possibly slower.
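A minimal sketch of the normalized variant, using Python's built-in sqlite3 module, to show where the extra join comes from. The table and column names (compound, spectrum, spectrum_compound) and the toy IDs are hypothetical and not the actual package schema:

```python
import sqlite3

# Hypothetical schema: each spectrum stored once, n:m mapping in a separate table.
SCHEMA = """
CREATE TABLE compound (compound_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE spectrum (spectrum_id TEXT PRIMARY KEY, peaks TEXT);
CREATE TABLE spectrum_compound (
    spectrum_id TEXT REFERENCES spectrum (spectrum_id),
    compound_id TEXT REFERENCES compound (compound_id),
    PRIMARY KEY (spectrum_id, compound_id)
);
"""


def load_spectra(con, records):
    """records: (compound_id, spectrum_id, peaks) tuples, one per HMDB xml file."""
    for compound_id, spectrum_id, peaks in records:
        # the same spectrum_id in several xml files is stored only once ...
        con.execute("INSERT OR IGNORE INTO spectrum VALUES (?, ?)", (spectrum_id, peaks))
        # ... but every compound association is kept
        con.execute("INSERT OR IGNORE INTO spectrum_compound VALUES (?, ?)", (spectrum_id, compound_id))


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany("INSERT INTO compound VALUES (?, ?)", [("cmp1", "A"), ("cmp2", "B")])
# toy data: the same spectrum "sp1" delivered in two xml files, one per compound
load_spectra(con, [("cmp1", "sp1", "100:50 150:100"), ("cmp2", "sp1", "100:50 150:100")])

# every query for the spectra of a compound now needs the extra join
rows = con.execute(
    """SELECT c.compound_id, s.spectrum_id
       FROM compound c
       JOIN spectrum_compound m ON m.compound_id = c.compound_id
       JOIN spectrum s ON s.spectrum_id = m.spectrum_id
       ORDER BY c.compound_id"""
).fetchall()
```

The spectrum table ends up with a single row, while both compounds still resolve to "sp1" through the mapping table.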

Simple solution:

  • insert each spectrum as it is provided, but assign an own, internal and unique ID to each spectrum.
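For comparison, a sketch of the simple variant: SQLite's INTEGER PRIMARY KEY assigns the internal ID, and HMDB's spectrum_id is kept as an ordinary, non-unique column. The msms_spectrum table and its column names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical flat table; INTEGER PRIMARY KEY makes SQLite assign the internal ID.
con.execute(
    """CREATE TABLE msms_spectrum (
           spectrum_id INTEGER PRIMARY KEY,  -- internal, unique ID
           original_spectrum_id TEXT,        -- HMDB's spectrum_id, may repeat
           compound_id TEXT,
           peaks TEXT)"""
)

# toy data: the same HMDB spectrum delivered twice, once per compound
records = [("cmp1", "sp1", "100:50 150:100"), ("cmp2", "sp1", "100:50 150:100")]
con.executemany(
    "INSERT INTO msms_spectrum (compound_id, original_spectrum_id, peaks) VALUES (?, ?, ?)",
    records,
)

# spectra of a compound: a single-table query, no join needed
rows = con.execute(
    "SELECT spectrum_id, original_spectrum_id, compound_id FROM msms_spectrum ORDER BY spectrum_id"
).fetchall()
```

The trade-off is that the peak data of "sp1" is stored twice; in exchange, every lookup stays a plain filter on one table.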

@michaelwitting
Collaborator

I have at least a function that can read MassBank records. That doesn't help with MoNA, but it does with all other MassBank-related records.
Check my masstrixR package later today. There is a branch called masstrixR_RaMoNA_merge. It is based on our in-house tool MassTRIX [1]. There might also be some other useful functions we can use or reuse.

[1] http://dx.plos.org/10.1371/journal.pone.0039860

@jorainer
Member Author

jorainer commented Nov 9, 2018

Cool! Thanks for your input @michaelwitting. I had a look at the MoNA SDF file and it should be straightforward to extract all relevant information (compound annotations and spectra) from it. It's just a bummer that every database/resource uses its own identifiers and nomenclature.
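A rough sketch of how the SDF data items could be pulled out. It relies only on the generic SDF layout (a molblock, then `> <FIELD>` headers with value lines ending at a blank line, records separated by `$$$$`). The field names NAME and MASS SPECTRAL PEAKS and the "m/z intensity" peak-line format are assumptions about the MoNA export and would need to be checked against the actual file:

```python
import re

# generic SDF data-item header such as "> <NAME>"
HEADER = re.compile(r"^>.*?<([^>]+)>")


def parse_sdf(text):
    """Return one dict per SDF record, mapping data-item names to their text."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue
        fields, key = {}, None
        for line in block.splitlines():
            match = HEADER.match(line)
            if match:
                key = match.group(1)
                fields[key] = []
            elif key is not None:
                if line.strip() == "":
                    key = None  # a blank line ends the current data item
                else:
                    fields[key].append(line)
            # lines before the first header (the molblock) are ignored
        records.append({k: "\n".join(v) for k, v in fields.items()})
    return records


def parse_peaks(value):
    """Assumed format: one "m/z intensity" pair per line."""
    return [tuple(float(x) for x in line.split()[:2]) for line in value.splitlines() if line.strip()]


# toy record; field names are assumptions, the molblock is a dummy
example = """toy
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <NAME>
Caffeine

> <MASS SPECTRAL PEAKS>
138.0662 100
195.0877 999

$$$$
"""
records = parse_sdf(example)
peaks = parse_peaks(records[0]["MASS SPECTRAL PEAKS"])
```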

@michaelwitting
Collaborator

Well, this is why I'm working with the MassBank record format here. It is rich in metadata and human-readable, but also easy to parse due to partially controlled variables. The functions I wrote read this format into a Spectra object and will also write a Spectra object back to a MassBank record.
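To illustrate why the format is easy to parse, here is a minimal sketch (not the masstrixR implementation) that handles only a subset of the format: `TAG: value` lines, the indented peak lines after `PK$PEAK:`, and the `//` record terminator. The record below is a toy example with a hypothetical accession:

```python
def parse_massbank(text):
    """Parse one MassBank record into (metadata, peaks).

    metadata maps each tag (e.g. "CH$NAME") to a list of values, because tags
    such as CH$NAME may occur several times. Peaks are (m/z, intensity) tuples
    read from the indented lines following "PK$PEAK:".
    """
    meta, peaks, in_peaks = {}, [], False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # "//" terminates a record
        if in_peaks and line.startswith("  "):
            mz, intensity = line.split()[:2]  # third column is relative intensity
            peaks.append((float(mz), float(intensity)))
            continue
        in_peaks = False
        if ": " in line and not line.startswith(" "):
            tag, value = line.split(": ", 1)
            if tag == "PK$PEAK":
                in_peaks = True  # header "m/z int. rel.int."; peak lines follow
            else:
                meta.setdefault(tag, []).append(value)
    return meta, peaks


record = """ACCESSION: MSBNK-TOY-000001
CH$NAME: Caffeine
CH$NAME: 1,3,7-Trimethylxanthine
CH$FORMULA: C8H10N4O2
AC$MASS_SPECTROMETRY: MS_TYPE MS2
PK$NUM_PEAK: 2
PK$PEAK: m/z int. rel.int.
  138.0662 1000 999
  195.0877 500 499
//
"""
meta, peaks = parse_massbank(record)
```

Subtags such as `MS_TYPE` stay inside the value string here; a real reader would split those further.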

By the way: I implemented two new spectra comparison methods in masstrixR. One is a standard forward score (i.e. dot product) that aligns the spectra instead of binning them; the second is a reverse score (reverse dot product), which uses only peaks that are present in the library spectrum. If the match is good, both should be quite high. If the forward score is low and the reverse score high, then either you have a lot of "contaminating" peaks in your query spectrum or it is simply the wrong hit.
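A minimal sketch of the forward/reverse idea, assuming a greedy alignment of peaks within a fixed absolute m/z tolerance and an unweighted cosine on raw intensities. The actual masstrixR scoring (alignment rule, tolerance, intensity weighting) may differ; the function names and the 0.01 Da tolerance are hypothetical:

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def align_peaks(query: List[Peak], library: List[Peak], tol: float = 0.01):
    """Greedily pair each library peak with the closest unused query peak within tol (Da).

    Returns (query_intensity, library_intensity) pairs. The first len(library)
    pairs belong to the library peaks (query intensity 0.0 if unmatched); the
    remaining pairs are unmatched query peaks (library intensity 0.0).
    """
    used = set()
    pairs = []
    for lib_mz, lib_int in library:
        best, best_diff = None, tol
        for i, (q_mz, _) in enumerate(query):
            diff = abs(q_mz - lib_mz)
            if i not in used and diff <= best_diff:
                best, best_diff = i, diff
        if best is None:
            pairs.append((0.0, lib_int))
        else:
            used.add(best)
            pairs.append((query[best][1], lib_int))
    pairs.extend((q_int, 0.0) for i, (_, q_int) in enumerate(query) if i not in used)
    return pairs


def cosine(pairs):
    num = sum(q * lb for q, lb in pairs)
    q_norm = math.sqrt(sum(q * q for q, _ in pairs))
    l_norm = math.sqrt(sum(lb * lb for _, lb in pairs))
    return num / (q_norm * l_norm) if q_norm and l_norm else 0.0


def forward_score(query, library, tol=0.01):
    # all peaks of both spectra contribute
    return cosine(align_peaks(query, library, tol))


def reverse_score(query, library, tol=0.01):
    # only pairs anchored on library peaks contribute; extra query peaks are ignored
    return cosine(align_peaks(query, library, tol)[: len(library)])


library = [(100.0, 50.0), (150.0, 100.0)]
query = [(100.001, 50.0), (150.002, 100.0), (200.0, 100.0)]  # one extra "contaminating" peak
fwd = forward_score(query, library)
rev = reverse_score(query, library)
```

With the extra peak at m/z 200, the reverse score stays at 1.0 while the forward score drops to about 0.75, which is the "forward low, reverse high" pattern described above.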

@jorainer
Member Author

jorainer commented Nov 9, 2018

When you talk about the MassBank record format, where do you get that data from? Is it from https://github.com/MassBank/MassBank-data? Apparently not from MoNA...

@michaelwitting
Collaborator

Yes, for example. We also use the MassBank format for our internal database.
Regarding the MoNA JSON: it is very inconsistent. When I read data from their web service, I have difficulties getting the entries I would like to access. Not every library they have uses exactly the same format. Maybe it is different with the JSON files...
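One way to cope with such inconsistencies is to look up metadata defensively, accepting several spellings and tolerating missing keys. This is only a sketch: the entry structure assumed here (a `metaData` list of name/value objects and a `spectrum` string of space-separated `m/z:intensity` pairs) is an assumption about MoNA's JSON and should be verified, and the alternative metadata names are made up for illustration:

```python
import json


def meta_value(meta_list, *names):
    """Return the value of the first metaData entry whose name matches any of
    `names` (case-insensitive), or None. Tolerates a missing or empty list."""
    wanted = {n.lower() for n in names}
    for item in meta_list or []:
        if str(item.get("name", "")).lower() in wanted:
            return item.get("value")
    return None


def parse_peak_string(spectrum):
    """Parse a space-separated "mz:intensity" string into (m/z, intensity) tuples."""
    peaks = []
    for token in (spectrum or "").split():
        mz, intensity = token.split(":")
        peaks.append((float(mz), float(intensity)))
    return peaks


# toy entry; key names and metadata spellings are assumptions
entry = json.loads(
    """{"id": "toy1",
        "spectrum": "138.0662:1000 195.0877:500",
        "metaData": [{"name": "Precursor M/Z", "value": 195.0877},
                     {"name": "ms level", "value": "MS2"}]}"""
)
precursor_mz = meta_value(entry.get("metaData"), "precursor m/z", "precursor_mz")
ms_level = meta_value(entry.get("metaData"), "ms level", "mslevel", "ms_level")
peaks = parse_peak_string(entry.get("spectrum"))
```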

@jorainer
Member Author

Import of open data from MassBank is discussed in issue #34.

@jorainer jorainer mentioned this issue Dec 3, 2018
3 tasks

3 participants