isolatePrec argument filtering MSMS peaks #56

drewszabo · 2022-10-26T09:21:09Z

Hey Rick,

I'm trying to reduce the complexity (and file size) of my mslists by using the isolatePrec argument in patRoon::filter(mslists, ...). However, I found that it isolates the precursor in both the MS and MSMS peak lists, including the averagedPeakList. Was this intended, and is there a way to filter only the MS lists, leaving the complete MSMS lists for further analysis? Perhaps by using getDefIsolatePrecParams()?

Can you also tell me whether a large MS peak list adds any compute time to generateFormulasSIRIUS() or generateCompoundsSIRIUS()? It would be great to reduce my compute time too.

Cheers,

drewszabo · 2022-10-26T09:43:34Z

On the file size problem: in my project, I have >9000 features in fGroups. The mslists ends up with almost 8 million elements after filtering and a file size of 1.8 GB. For features with higher m/z, the MS list is enormous, but I only require the precursor and isotopes for analysis. Saving and loading the mslists object takes a substantial amount of time; I suppose this is a Windows single-threaded file system thing. Anything to help the process would be great.

rickhelmus · 2022-10-26T13:18:04Z

Hi Drew,

Many thanks for bringing this up. It seems you caught a recent regression, and I just pushed out a fix so that only MS data is filtered again.

For the size of peak lists: personally I always try to prioritize the features as much as possible before going to any of the annotation steps. (You have to be a bit inventive sometimes with this, and it can be quite specific to the type of data and study you are working with.) But with almost 10k feature groups I can imagine you end up with a large object. There is of course also the possibility to apply other filter steps; usually I apply the topMost filter and perhaps a relative minimum intensity. Did you already apply any of these? You could also think of applying the annotatedBy filter with formula annotation data, which may help a bit with subsequent compound annotation.
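
To illustrate what a top-most filter combined with a relative minimum intensity does to a single peak list, here is a minimal sketch in base R. It is not patRoon's implementation: the `peaks` table, its values, and the helper `filter_peaks()` with its `top_most` and `rel_min_int` parameters are all hypothetical and only meant to show the idea.

```r
# Toy MS peak list with made-up values (assumption: one row per peak).
# This is NOT patRoon's internal peak list representation.
peaks <- data.frame(
    mz = c(100.05, 150.10, 200.12, 250.15, 300.20),
    intensity = c(500, 12000, 300, 80000, 4000)
)

# Hypothetical helper: drop peaks below a relative intensity threshold,
# then keep at most the `top_most` most intense of the remaining peaks.
filter_peaks <- function(pl, top_most = 3, rel_min_int = 0.01) {
    # relative threshold is taken against the most intense peak
    pl <- pl[pl$intensity >= rel_min_int * max(pl$intensity), , drop = FALSE]
    # rank by intensity and keep the top N
    pl <- pl[order(pl$intensity, decreasing = TRUE), , drop = FALSE]
    pl <- head(pl, top_most)
    # restore m/z order
    pl[order(pl$mz), , drop = FALSE]
}

filtered <- filter_peaks(peaks, top_most = 3, rel_min_int = 0.01)
print(filtered)
```

In this toy case the two low-intensity peaks fall below 1% of the base peak and are removed, so the list shrinks from five peaks to three. The same kind of reduction, applied to every feature group, is what keeps the overall object smaller.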

I am not sure how much time 'rich' MS/MS data will add to SIRIUS, but my feeling is that other steps (e.g. retrieving data from CSI) may take more time.

Thanks,
Rick

drewszabo · 2022-10-26T13:36:56Z

Thanks for the fix.

And yes, I have been experimenting with different filters to reduce the number of features. I am having trouble with noisy MS peaks getting through my initial filters. Next, I am going to try running the extracted features through the MetaClean and NeatMS ML approaches (https://github.com/bihealth/NeatMS/). NeatMS has the advantage of being pre-trained and has three categories, compared to MetaClean's two.

Closing off the issue. Thanks, DS

rickhelmus added a commit that referenced this issue Oct 26, 2022

don't do precursor isolation for MS/MS data (regression, issue #56)

bd024b0

drewszabo closed this as completed Oct 26, 2022
