# Detection of true features in the HZV dataset by different software algorithms

Data from three samples only:
batch4_MT_20210729_003G, batch4_MT_20210729_003C, batch4_MT_20210729_003K

- XCMS v3.18.0 (R 4.2.0)
- MZmine 2.53 (peak detection via centwave and local minimum) and MZmine 3.3.0
- MS-DIAL v4.90
- asari 1.10.6

SL 2023-01-27

In [1]:
!pip install --upgrade -q asari-metabolomics

In [2]:
from asari.tools import match_features as mf

In [3]:
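true_ = mf.get_featureList('hzv029_manual_certified.txt', start_row=1, mz_col=1, rt_col=2, sep='\t')
print(len(true_), '\n', true_[:3])
# rtime in this file is in minutes; convert to seconds
# (the print above runs before the conversion, so it shows minutes)
for x in true_:
    x['rtime'] = x['rtime']*60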

402 
 [{'id': 'row2', 'mz': 91.05447, 'rtime': 0.592}, {'id': 'row3', 'mz': 94.06543, 'rtime': 1.85}, {'id': 'row4', 'mz': 96.04469, 'rtime': 0.537}]


In [4]:
mzmine_ = mf.get_featureList('mt01_MZmine2.53_wavelets_featureTable.csv', start_row=1, mz_col=1, rt_col=2, sep=',')
print(len(mzmine_), '\n', mzmine_[:3])
for x in mzmine_:
    x['rtime'] = x['rtime']  * 60

xcms_ = mf.get_featureList('mt01_XCMS_featureTable.csv', start_row=1, mz_col=1, rt_col=4, sep=',')
print(len(xcms_), '\n', xcms_[:3])

asari_ = mf.get_featureList('mt01_full_Feature_table.tsv', start_row=1, mz_col=1, rt_col=2, sep='\t')
print(len(asari_), asari_[:3])



10199 
 [{'id': 'row2', 'mz': 81.07012176513672, 'rtime': 0.385160884566667}, {'id': 'row3', 'mz': 82.06537755330403, 'rtime': 0.34872880143333335}, {'id': 'row4', 'mz': 82.06532287597656, 'rtime': 3.85791659573333}]
6423 
 [{'id': 'row2', 'mz': 81.0701044009376, 'rtime': 22.7766265869141}, {'id': 'row3', 'mz': 82.0653690543, 'rtime': 20.441068649292}, {'id': 'row4', 'mz': 83.0606593293675, 'rtime': 170.006408691406}]
6319 [{'id': 'row2', 'mz': 81.0701, 'rtime': 15.47}, {'id': 'row3', 'mz': 81.0701, 'rtime': 23.45}, {'id': 'row4', 'mz': 140.0711, 'rtime': 130.08}]
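
The retention times printed above are not on a common scale: the MZmine rows look like minutes, while the XCMS and asari rows look like seconds, which is presumably why only some lists are multiplied by 60. Below is a minimal sketch of a reusable conversion, assuming each feature is a dict with an `rtime` key as in the printed lists. The helper name `rtime_to_seconds` is hypothetical and not part of asari.

```python
def rtime_to_seconds(features, in_minutes=True):
    """Return a new feature list with 'rtime' expressed in seconds.

    features   : list of dicts with at least an 'rtime' key
    in_minutes : True if the input rtime is in minutes (hypothetical flag)
    """
    factor = 60.0 if in_minutes else 1.0
    # Build new dicts so the input list is left untouched
    return [{**f, 'rtime': f['rtime'] * factor} for f in features]


# Example with the first manually certified feature shown earlier
example = [{'id': 'row2', 'mz': 91.05447, 'rtime': 0.592}]
converted = rtime_to_seconds(example)
print(converted)
```

Because the function returns a copy, calling it twice on the same input does not convert the values twice.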


In [5]:
mzmine_L_ = mf.get_featureList('mt01_MZmine2.53_local_featureTable.csv', start_row=1, mz_col=1, rt_col=2, sep=',')
print(len(mzmine_L_), '\n', mzmine_L_[:3])
for x in mzmine_L_:
    x['rtime'] = x['rtime']*60



6434 
 [{'id': 'row2', 'mz': 81.07012176513672, 'rtime': 0.385160884566667}, {'id': 'row3', 'mz': 83.06066004435222, 'rtime': 2.8252457976166667}, {'id': 'row4', 'mz': 84.0446891784668, 'rtime': 2.8705494704000003}]


In [6]:
mzmine3_ = mf.get_featureList('mt01_mzmine3.csv', start_row=1, mz_col=4, rt_col=1, sep=',')
print(len(mzmine3_), '\n', mzmine3_[:3])
for x in mzmine3_:
    x['rtime'] = x['rtime']  * 60

10177 
 [{'id': 'row2', 'mz': 184.0742, 'rtime': 0.01}, {'id': 'row3', 'mz': 506.3626, 'rtime': 0.01}, {'id': 'row4', 'mz': 507.3661, 'rtime': 0.01}]


In [7]:
msdial_ = mf.get_featureList('MT01-MSDIAL-Height_0_20231271135.txt', start_row=6, mz_col=2, rt_col=1, sep='\t')
print(len(msdial_), msdial_[:3])
for x in msdial_:
    x['rtime'] = x['rtime']*60

29924 [{'id': 'row7', 'mz': 80.04777, 'rtime': 1.114}, {'id': 'row8', 'mz': 80.04943, 'rtime': 0.311}, {'id': 'row9', 'mz': 80.16291, 'rtime': 2.304}]


In [8]:
def compare(list1, list2):
    '''Match list1 against list2 and print the list1 features left unmatched.

    Runs both the best-match and the inclusive list-based comparison,
    with a 5 ppm m/z tolerance and an rt_tolerance of 10
    (same units as rtime, i.e. seconds after the conversions above).
    '''
    print("\n  Best match comparisons:")
    valid_matches, dict1, dict2 = mf.bidirectional_best_match(list1, list2, mz_ppm=5, rt_tolerance=10)

    print("\n  List based inclusive comparisons:")
    dict1, dict2 = mf.bidirectional_match(list1, list2, mz_ppm=5, rt_tolerance=10)

    # unmatched = list1 features with no entry in the inclusive match result
    unmatched = [p for p in list1 if p['id'] not in dict1]
    print("\n\nUnmatched features ****** ", len(unmatched), "*******\n")
    # [p for p in list1 if p['id'] not in [x[0] for x in valid_matches]]
    print(unmatched)

def compare2(list1, list2):
    '''Inclusive list-based comparison only, with a tighter rt_tolerance of 6.
    '''
    print("\n  List based inclusive comparisons:")
    dict1, dict2 = mf.bidirectional_match(list1, list2, mz_ppm=5, rt_tolerance=6)
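
To make the `mz_ppm` and `rt_tolerance` arguments concrete, here is a minimal sketch of a pairwise tolerance check. It is an illustration only and not asari's implementation: the function name `within_tolerance` is hypothetical, and computing the ppm difference relative to the first feature's m/z is an assumption.

```python
def within_tolerance(f1, f2, mz_ppm=5, rt_tolerance=10):
    """Return True if two features agree within the m/z and rtime windows.

    f1, f2       : dicts with 'mz' and 'rtime' keys
    mz_ppm       : allowed m/z difference in parts per million
    rt_tolerance : allowed retention-time difference, same units as 'rtime'
    """
    # ppm difference taken relative to f1 (assumption for this sketch)
    ppm_diff = abs(f1['mz'] - f2['mz']) / f1['mz'] * 1e6
    rt_diff = abs(f1['rtime'] - f2['rtime'])
    return ppm_diff <= mz_ppm and rt_diff <= rt_tolerance


# Hypothetical features, not taken from the data
ref = {'mz': 100.0000, 'rtime': 30.0}
close = {'mz': 100.0004, 'rtime': 35.0}     # about 4 ppm and 5 s away
far_mz = {'mz': 100.0010, 'rtime': 30.0}    # about 10 ppm away
far_rt = {'mz': 100.0000, 'rtime': 45.0}    # 15 s away

print(within_tolerance(ref, close))
print(within_tolerance(ref, far_mz))
print(within_tolerance(ref, far_rt))
```

At m/z 100, a 5 ppm window corresponds to 0.0005 in absolute m/z, so the window widens in absolute terms as m/z increases.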


    

In [9]:
compare(true_, asari_)


  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 386.
Of 6319 list1 features, number of uni-direction matched features is 467.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 385


########################################################################
    ~~~ By best m/z matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 386.
Of 6319 list1 features, number of uni-direction matched features is 467.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     372
########################################################################



  List based inclusive comparisons:
Of 402 list1 features, number of uni-direction matched features is 386.
Of 6319 list1 features, number of uni-direction matched features is 467.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  281
Unique Number of matched features in table 2:  

In [11]:
compare(true_, xcms_)


  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 340.
Of 6423 list1 features, number of uni-direction matched features is 360.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 332


########################################################################
    ~~~ By best m/z matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 340.
Of 6423 list1 features, number of uni-direction matched features is 360.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     329
########################################################################



  List based inclusive comparisons:
Of 402 list1 features, number of uni-direction matched features is 340.
Of 6423 list1 features, number of uni-direction matched features is 360.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  298
Unique Number of matched features in table 2:  

In [12]:
compare(true_, mzmine_)


  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 365.
Of 10199 list1 features, number of uni-direction matched features is 350.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 350


########################################################################
    ~~~ By best m/z matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 365.
Of 10199 list1 features, number of uni-direction matched features is 350.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     350
########################################################################



  List based inclusive comparisons:
Of 402 list1 features, number of uni-direction matched features is 365.
Of 10199 list1 features, number of uni-direction matched features is 350.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  365
Unique Number of matched features in table 2

In [13]:
compare(true_, mzmine3_)


  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 365.
Of 10177 list1 features, number of uni-direction matched features is 349.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 349


########################################################################
    ~~~ By best m/z matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 365.
Of 10177 list1 features, number of uni-direction matched features is 349.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     349
########################################################################



  List based inclusive comparisons:
Of 402 list1 features, number of uni-direction matched features is 365.
Of 10177 list1 features, number of uni-direction matched features is 349.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  365
Unique Number of matched features in table 2

In [14]:
compare(true_, mzmine_L_)


  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 372.
Of 6434 list1 features, number of uni-direction matched features is 403.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 364


########################################################################
    ~~~ By best m/z matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 372.
Of 6434 list1 features, number of uni-direction matched features is 403.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     357
########################################################################



  List based inclusive comparisons:
Of 402 list1 features, number of uni-direction matched features is 372.
Of 6434 list1 features, number of uni-direction matched features is 403.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  319
Unique Number of matched features in table 2:  

In [15]:
compare(true_, msdial_)


  Best match comparisons:

    ~~~ By best rtime matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 394.
Of 29924 list1 features, number of uni-direction matched features is 675.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
 392


########################################################################
    ~~~ By best m/z matches ~~~     

Of 402 list1 features, number of uni-direction matched features is 394.
Of 29924 list1 features, number of uni-direction matched features is 675.
~~~ Biodirectional, unique Number of matched feature pairs: ~~~
     386
########################################################################



  List based inclusive comparisons:
Of 402 list1 features, number of uni-direction matched features is 394.
Of 29924 list1 features, number of uni-direction matched features is 675.
    ~~~ match_numbers ~~~     

Unique Number of matched features in table 1:  187
Unique Number of matched features in table 2

## End

The numbers were used to construct Figure 4.
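
As a rough summary of the outputs above, the sketch below converts the uni-directional match counts into the fraction of the 402 manually certified features recovered by each tool. The counts are copied from the cell outputs; which of the reported numbers actually went into Figure 4 is not stated here, so using the uni-directional counts is an assumption.

```python
# Number of manually certified features (list1 in every comparison)
n_true = 402

# Uni-directional matched counts for list1, copied from the outputs above
matched = {
    'asari 1.10.6': 386,
    'XCMS 3.18.0': 340,
    'MZmine 2.53 (wavelets)': 365,
    'MZmine 3.3.0': 365,
    'MZmine 2.53 (local minimum)': 372,
    'MS-DIAL 4.90': 394,
}

# Fraction of true features matched by each tool
rates = {tool: n / n_true for tool, n in matched.items()}

# Print tools from highest to lowest fraction
for tool, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tool}: {matched[tool]}/{n_true} = {rate:.1%}")
```

These fractions ignore how many features each tool reports in total (from 6319 for asari to 29924 for MS-DIAL), so they describe recovery of true features only, not false positives.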
