isolatePrec argument filtering MSMS peaks #56

Closed
drewszabo opened this issue Oct 26, 2022 · 3 comments

@drewszabo

Hey Rick,

I'm trying to reduce the complexity (and file size) of my mslists by using the isolatePrec argument in patRoon::filter(mslists, ...). However, I have found that it actually isolates the precursor in both MS and MSMS peak lists, including the averagedPeakList. I wonder if this was intended, and if there is a way to only filter the MS lists alone, leaving the complete MSMS list for further analysis. Perhaps by using getDefIsolatePrecParams()?

Perhaps you can tell me if having a large MS peak list is adding any compute time to my generateFormulasSIRIUS() or generateCompoundsSIRIUS()? It would be great to reduce my compute time too.

Cheers,

@drewszabo
Author

On the file size problem: in my project I have >9000 features in fGroups. The mslists object ends up with almost 8 million elements after filtering and a file size of 1.8 GB. For features with higher m/z the MS list is enormous, but I only require the precursor and isotopes for analysis. Saving and loading the mslists object takes a substantial amount of time; I suppose this is a Windows single-threaded file system thing. Anything to help the process would be great.
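
To make the "precursor and isotopes only" idea concrete, here is a minimal conceptual sketch in Python. It is not patRoon's algorithm: the function name, the fixed 13C spacing, and the tolerance are all assumptions chosen for illustration, and the real filter (configured through getDefIsolatePrecParams()) may apply further criteria.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

# Approximate 13C - 12C mass difference in Da (assumed isotope spacing).
C13_SPACING = 1.003355


def isolate_precursor(
    peaks: List[Peak],
    precursor_mz: float,
    max_isotopes: int = 5,   # hypothetical default, for illustration only
    mz_tol: float = 0.01,    # hypothetical absolute m/z tolerance in Da
    z: int = 1,              # assumed charge state
) -> List[Peak]:
    """Keep only peaks near the precursor m/z or one of its isotope positions."""
    kept = []
    for mz, intensity in peaks:
        for k in range(max_isotopes + 1):
            expected = precursor_mz + k * C13_SPACING / z
            if abs(mz - expected) <= mz_tol:
                kept.append((mz, intensity))
                break  # peak matched; do not add it twice
    return kept


ms_peaks = [
    (100.0000, 50.0),
    (300.1000, 1000.0),  # precursor
    (301.1034, 150.0),   # M+1
    (302.1067, 20.0),    # M+2
    (350.2000, 500.0),
]
isolated = isolate_precursor(ms_peaks, precursor_mz=300.1000)
```

In this toy list only the precursor and its M+1 and M+2 peaks survive, which is why this kind of filter shrinks MS peak lists so much for high-m/z features.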

@rickhelmus
Owner

Hi Drew,

Many thanks for bringing this up. It seems you caught a recent regression, and I just pushed out a fix so that only MS data is filtered again.

For the size of peak lists: personally I always try to prioritize the features as much as possible before going to any of the annotation steps. (You sometimes have to be a bit inventive with this, and it can be quite specific to the type of data and study you are working with.) But with almost 10k feature groups I can imagine you end up with a large object. There is of course also the possibility of applying other filter steps: I usually apply the topMost filter and perhaps a relative minimum intensity threshold. Did you already apply any of these? You could also consider applying the annotatedBy filter with formula annotation data, which may help a bit with subsequent compound annotation.
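
As a rough illustration of what a top-most filter and a relative minimum intensity filter do to a single peak list, here is a small Python sketch. It is a conceptual example only: the function and parameter names are hypothetical and are not patRoon's arguments, and the order in which the two steps are applied is an assumption.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def filter_peaks(
    peaks: List[Peak],
    top_most: Optional[int] = None,            # hypothetical: keep N most intense peaks
    rel_min_intensity: Optional[float] = None,  # hypothetical: fraction of base peak
) -> List[Peak]:
    """Apply a relative intensity threshold, then keep the N most intense peaks."""
    if not peaks:
        return []
    kept = list(peaks)
    if rel_min_intensity is not None:
        base = max(intensity for _, intensity in kept)
        kept = [(mz, i) for mz, i in kept if i >= rel_min_intensity * base]
    if top_most is not None:
        kept = sorted(kept, key=lambda p: p[1], reverse=True)[:top_most]
    # Return the surviving peaks in m/z order, as peak lists are usually stored.
    return sorted(kept, key=lambda p: p[0])


msms_peaks = [
    (100.0, 10.0),
    (150.0, 500.0),
    (200.0, 1000.0),
    (250.0, 40.0),
    (300.0, 60.0),
]
by_threshold = filter_peaks(msms_peaks, rel_min_intensity=0.05)
by_both = filter_peaks(msms_peaks, top_most=2, rel_min_intensity=0.05)
```

Both filters cap the number of elements stored per feature, so with thousands of feature groups they directly reduce the size of the saved object.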

I am not sure how much time 'rich' MS/MS data will add to SIRIUS, but my feeling is that other steps (e.g. retrieving data from CSI) may take more time.

Thanks,
Rick

@drewszabo
Author

Thanks for the fix.

And yes, I have been experimenting with different filters to reduce the number of features. I am having trouble with noisy MS peaks getting through my initial filters. Next, I am going to try running the extracted features through the MetaClean and NeatMS ML approaches (https://github.com/bihealth/NeatMS/). NeatMS has the advantage of being pre-trained and has three categories, compared to MetaClean's two.

Closing off the issue. Thanks, DS
