# Detection of true features on Yeast2021 dataset by different software algorithms 

This notebook uses the Yeast Neg data from Chen, Li, et al. "Metabolite discovery through global annotation of untargeted metabolomics data." Nature Methods 18.11 (2021): 1377-1385.

- XCMS v3.18.0 (R 4.2.0)
- MZmine 2.53 (peak detection via centwave and local minimum) and MZmine 3.3.0
- MS-DIAL v4.90
- asari 1.10.6

SL 2023-01-20

In [1]:
!pip install --upgrade -q asari-metabolomics

In [2]:
from asari.tools import match_features as mf

In [4]:
# Load the feature lists produced by the different tools.
# Retention time is kept in seconds; tables that report minutes are multiplied by 60.

true_ = mf.get_featureList('manual_certified.txt', start_row=1, mz_col=1, rt_col=2, sep='\t')
print(len(true_), '\n', true_[3])
for x in true_:
    x['rtime'] = x['rtime']*60

asari_ = mf.get_featureList('asari1.10.6_default_full_Feature_table.tsv', start_row=1, mz_col=1, rt_col=2, sep='\t')
print(len(asari_), '\n', asari_[3])

xcms_ = mf.get_featureList('YeastNeg2021_NetID_XCMS_featureTable.csv', start_row=1, mz_col=1, rt_col=4, sep=',')
print(len(xcms_), '\n', xcms_[3])

mzmine_ = mf.get_featureList('yeast2021_MZmine2.53_wavelets_featureTable.csv', start_row=1, mz_col=1, rt_col=2, sep=',')
print(len(mzmine_), '\n', mzmine_[3])
for x in mzmine_:
    x['rtime'] = x['rtime']*60

mzmine3_ = mf.get_featureList('yeast2021_mzmine3.csv', start_row=1, mz_col=4, rt_col=1, sep=',')
print(len(mzmine3_), '\n', mzmine3_[3])
for x in mzmine3_:
    x['rtime'] = x['rtime']*60
    
mzmine_L_ = mf.get_featureList('yeast2021_MZmine2.53_localminium_featureTable.csv', start_row=1, mz_col=1, rt_col=2, sep=',')
print(len(mzmine_L_), '\n', mzmine_L_[3])
for x in mzmine_L_:
    x['rtime'] = x['rtime']*60
    
msdial_ = mf.get_featureList('yeast2021-Area_0_20231191747.txt', start_row=6, mz_col=2, rt_col=1, sep='\t')
print(len(msdial_), msdial_[3])
for x in msdial_:
    x['rtime'] = x['rtime']*60

314 
 {'id': 'row5', 'mz': 88.04032, 'rtime': 13.01224}
5341 
 {'id': 'row5', 'mz': 108.0217, 'rtime': 175.23}
6043 
 {'id': 'row5', 'mz': 71.0114441447573, 'rtime': 191.529067993164}
11290 
 {'id': 'row5', 'mz': 71.05022430419922, 'rtime': 10.932333333333334}
11256 
 {'id': 'row5', 'mz': 112.9224, 'rtime': 1.01}
18153 
 {'id': 'row5', 'mz': 71.01144409179688, 'rtime': 3.1790000000000003}
4166 {'id': 'row10', 'mz': 71.01381, 'rtime': 4.709}
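
The minutes-to-seconds loop above is repeated for every table exported in minutes. A minimal sketch of a reusable helper follows; the name `rt_minutes_to_seconds` is hypothetical and not part of asari. It assumes each feature is a dict with an `'rtime'` key, as in the printed examples.

```python
def rt_minutes_to_seconds(features):
    """Convert 'rtime' from minutes to seconds in place and return the list.

    Hypothetical helper, not part of asari. Call it once per freshly loaded
    list: a second call would multiply the values by 60 again.
    """
    for feature in features:
        feature['rtime'] = feature['rtime'] * 60
    return features


# Toy feature with the same shape as the printed examples.
demo = [{'id': 'row5', 'mz': 88.04032, 'rtime': 13.01224}]
rt_minutes_to_seconds(demo)
print(demo[0]['rtime'])
```

Because the conversion is in place, re-running only the loop without reloading the table would scale the retention times twice.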


In [6]:
def compare(list1, list2):
    '''Match list1 against list2 (5 ppm in m/z, retention time tolerance of 6)
    and print the list1 features that have no best match in list2.
    '''
    print("\n  List based inclusive comparisons:")
    dict1, dict2 = mf.bidirectional_match(list1, list2, mz_ppm=5, rt_tolerance=6)

    print("\n  Best match comparisons:")
    valid_matches, dict1, dict2 = mf.bidirectional_best_match(list1, list2, mz_ppm=5, rt_tolerance=6)

    print("Unmatched features: ")
    # Build the set of matched list1 ids once instead of once per feature.
    matched_ids = {x[0] for x in valid_matches}
    unmatched = [p for p in list1 if p['id'] not in matched_ids]
    print(unmatched)
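
To make the two tolerances concrete, here is a minimal sketch of a pairwise test using `mz_ppm=5` and `rt_tolerance=6`. It is an illustration only and not asari's implementation: the function name `within_tolerance`, the use of the first feature's m/z as the ppm reference, and the inclusive `<=` comparisons are all assumptions.

```python
def within_tolerance(f1, f2, mz_ppm=5, rt_tolerance=6):
    """Return True if two features agree in m/z (ppm) and retention time.

    Illustrative only: asari's own matching code may differ in detail,
    for example in which m/z is used as the ppm reference.
    """
    ppm_diff = abs(f1['mz'] - f2['mz']) / f1['mz'] * 1e6
    rt_diff = abs(f1['rtime'] - f2['rtime'])
    return ppm_diff <= mz_ppm and rt_diff <= rt_tolerance


# Toy features: b is about 0.9 ppm and 2.3 s away from a; c is about 11 ppm away.
a = {'id': 'a', 'mz': 88.04032, 'rtime': 780.7}
b = {'id': 'b', 'mz': 88.04040, 'rtime': 783.0}
c = {'id': 'c', 'mz': 88.04132, 'rtime': 780.7}
print(within_tolerance(a, b), within_tolerance(a, c))
```

With a test like this, several features in one list can fall within tolerance of the same feature in the other list, which is why the inclusive and best-match counts reported by `compare` can differ.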

**For detection, a unique match is not required. For asari, the best-match count happens to be 310 as well.**

In [16]:
compare(true_, asari_)


  List based inclusive comparisons:
Of 314 list1 features, number of uni-direction matched features is 310.
Of 5341 list1 features, number of uni-direction matched features is 319.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  302
Unique Number of matched features in table 2:  319
Biodirectional, unique Number of matched feature pairs:  302

  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 310.
Of 5341 list1 features, number of uni-direction matched features is 319.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 310


########################################################################
    ~~~ By best m/z matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 310.
Of 5341 list1 features, number of uni-direction matched features is 319.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     310
##########

In [17]:
compare(true_, xcms_)


  List based inclusive comparisons:
Of 314 list1 features, number of uni-direction matched features is 301.
Of 6043 list1 features, number of uni-direction matched features is 303.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  299
Unique Number of matched features in table 2:  303
Biodirectional, unique Number of matched feature pairs:  299

  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 301.
Of 6043 list1 features, number of uni-direction matched features is 303.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 301


########################################################################
    ~~~ By best m/z matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 301.
Of 6043 list1 features, number of uni-direction matched features is 303.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     301
##########

In [18]:
compare(true_, mzmine_)


  List based inclusive comparisons:
Of 314 list1 features, number of uni-direction matched features is 291.
Of 11290 list1 features, number of uni-direction matched features is 292.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  290
Unique Number of matched features in table 2:  292
Biodirectional, unique Number of matched feature pairs:  290

  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 291.
Of 11290 list1 features, number of uni-direction matched features is 292.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 291


########################################################################
    ~~~ By best m/z matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 291.
Of 11290 list1 features, number of uni-direction matched features is 292.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     291
#######

In [7]:
compare(true_, mzmine3_)


  List based inclusive comparisons:
Of 314 list1 features, number of uni-direction matched features is 291.
Of 11256 list1 features, number of uni-direction matched features is 292.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  290
Unique Number of matched features in table 2:  292
Biodirectional, unique Number of matched feature pairs:  290

  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 291.
Of 11256 list1 features, number of uni-direction matched features is 292.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 291


########################################################################
    ~~~ By best m/z matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 291.
Of 11256 list1 features, number of uni-direction matched features is 292.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     291
#######

In [19]:
compare(true_, mzmine_L_)


  List based inclusive comparisons:
Of 314 list1 features, number of uni-direction matched features is 271.
Of 18153 list1 features, number of uni-direction matched features is 274.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  268
Unique Number of matched features in table 2:  274
Biodirectional, unique Number of matched feature pairs:  268

  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 271.
Of 18153 list1 features, number of uni-direction matched features is 274.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 271


########################################################################
    ~~~ By best m/z matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 271.
Of 18153 list1 features, number of uni-direction matched features is 274.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     271
#######

In [20]:
compare(true_, msdial_)


  List based inclusive comparisons:
Of 314 list1 features, number of uni-direction matched features is 265.
Of 4166 list1 features, number of uni-direction matched features is 266.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  264
Unique Number of matched features in table 2:  266
Biodirectional, unique Number of matched feature pairs:  264

  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 265.
Of 4166 list1 features, number of uni-direction matched features is 266.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 265


########################################################################
    ~~~ By best m/z matches ~~~     

Of 314 list1 features, number of uni-direction matched features is 265.
Of 4166 list1 features, number of uni-direction matched features is 266.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     265
##########

## End

The numbers were used to construct Figure 4B.
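
As a convenience, the counts printed above can be collected into detection rates. This is a minimal sketch: the counts are copied from the "Of 314 list1 features" lines of each `compare` output and keyed by the notebook's variable names, while the `detected` and `rates` names are introduced here for illustration. How Figure 4B presents these numbers is not stated in this notebook.

```python
# Number of manually certified features.
n_true = 314

# Certified features matched by each tool, copied from the compare() outputs.
detected = {
    'asari_': 310,
    'xcms_': 301,
    'mzmine_': 291,
    'mzmine3_': 291,
    'mzmine_L_': 271,
    'msdial_': 265,
}

# Fraction of certified features that each tool recovered.
rates = {tool: n / n_true for tool, n in detected.items()}

for tool, rate in rates.items():
    print(f"{tool:10s} {detected[tool]:4d} / {n_true}  {rate:.1%}")
```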


## Extra: get matched features between two lists

In [21]:
valid_matches, dict1, dict2 = mf.bidirectional_best_match(asari_, xcms_, mz_ppm=5, rt_tolerance=6)
print(len(valid_matches), valid_matches[:5])


    ~~~ By best rtime matches ~~~     

Of 5341 list1 features, number of uni-direction matched features is 3105.
Of 6043 list1 features, number of uni-direction matched features is 3048.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 3013


########################################################################
    ~~~ By best m/z matches ~~~     

Of 5341 list1 features, number of uni-direction matched features is 3105.
Of 6043 list1 features, number of uni-direction matched features is 3048.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     3016
########################################################################


3016 [('row2', 'row3'), ('row3', 'row5'), ('row4', 'row6'), ('row19', 'row774'), ('row20', 'row1118')]
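
The printed `valid_matches` is a list of id pairs. A minimal sketch of how such pairs could be used to split a feature list into matched and unmatched features follows. It assumes the first element of each pair is an id from the first list, which is how `compare` uses it; `split_by_match` is a hypothetical helper, shown here on toy data rather than the real tables.

```python
def split_by_match(features, valid_matches):
    """Split features into (matched, unmatched) using the first id of each pair.

    Hypothetical helper, not part of asari.
    """
    matched_ids = {pair[0] for pair in valid_matches}
    matched = [f for f in features if f['id'] in matched_ids]
    unmatched = [f for f in features if f['id'] not in matched_ids]
    return matched, unmatched


# Toy data shaped like the notebook's feature dicts and match pairs.
toy_features = [
    {'id': 'row2', 'mz': 71.0114, 'rtime': 20.0},
    {'id': 'row3', 'mz': 88.0403, 'rtime': 780.7},
    {'id': 'row7', 'mz': 108.0217, 'rtime': 175.2},
]
toy_matches = [('row2', 'row3'), ('row3', 'row5')]

matched, unmatched = split_by_match(toy_features, toy_matches)
print(len(matched), [f['id'] for f in unmatched])
```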
