In [1]:
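# numpy and pandas are used explicitly below, so import them directly
# instead of relying on the star import to provide np and pd
import numpy as np
import pandas as pd
from lcms_align import *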

First, we will load LC-MS data from example CSV files. These files contain detected features in centroid mode. The corresponding raw files are available in the PXD000484 dataset on the PRIDE repository. Each feature includes information such as the mass-to-charge ratio (m/z), retention time, and intensity, providing a concise summary of the detected compounds in each sample.

In [2]:
filename1 = '100825O2c1_MT-AU-0044-2010-08-15_038.csv' 
filename2 = '100820O2c1_MT-AU-0044-2010-08-1_030.csv'
# filename3 = '121004OTc2_TDM-AU-0324-EMC-1_011.csv'

sp1 = pd.read_csv(f'data/{filename1}')
sp2 = pd.read_csv(f'data/{filename2}')
# sp3 = pd.read_csv(f'data/{filename3}')
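The cells below assume that each CSV has the columns `m/z`, `Retention time` and `Intensity`. A loader that checks for them up front gives a clearer error than a `KeyError` later on. This is a minimal sketch: `load_feature_table` is a hypothetical helper, not part of `lcms_align`, and dropping rows with missing values is an assumption about how incomplete features should be handled.

```python
import io

import pandas as pd

# Columns the later cells read when building Spectrum objects
REQUIRED_COLUMNS = ["m/z", "Retention time", "Intensity"]


def load_feature_table(source) -> pd.DataFrame:
    """Hypothetical helper: read a feature CSV and check the expected columns.

    `source` is anything pd.read_csv accepts (a path or a file-like object).
    Rows missing any required value are dropped (an assumption: such features
    cannot be placed in m/z x RT space for matching).
    """
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"feature table is missing columns: {missing}")
    return df.dropna(subset=REQUIRED_COLUMNS).reset_index(drop=True)


# Made-up two-row table; the second feature has no retention time
demo_csv = io.StringIO(
    "m/z,Retention time,Intensity\n"
    "415.2121,11247.3,1.08e9\n"
    "421.7588,,1.07e9\n"
)
demo = load_feature_table(demo_csv)
print(demo)
```

With the files above, this would be called as `sp1 = load_feature_table(f'data/{filename1}')`.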

To perform alignment and further analysis, we need to convert the loaded dataframes (`sp1` and `sp2`) into `Spectrum` objects. The `Spectrum` class is designed to represent LC-MS features in a structured format, making it easier to process and compare the data from different samples. 

In [3]:
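# create Spectrum objects: the first argument is a 2 x N array
# (row 0 = m/z, row 1 = retention time), the second holds the N intensities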
S1 = Spectrum(np.array([sp1['m/z'].values, sp1['Retention time'].values]), np.array(sp1['Intensity'].values))
S2 = Spectrum(np.array([sp2['m/z'].values, sp2['Retention time'].values]), np.array(sp2['Intensity'].values))
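The `Spectrum` call packs m/z and retention time into a single 2 x N array and passes the intensities separately. If spectra are built from several tables, that layout can be factored into a helper. `spectrum_arrays` below is hypothetical: it only reproduces the array layout used in the cell above and does not call `lcms_align`.

```python
import numpy as np
import pandas as pd


def spectrum_arrays(df: pd.DataFrame):
    """Hypothetical helper returning (coords, intensities) for Spectrum(...).

    coords has shape (2, N): row 0 is m/z, row 1 is retention time,
    the same layout as np.array([mz, rt]) in the cell above.
    intensities has shape (N,).
    """
    coords = np.vstack([df["m/z"].to_numpy(), df["Retention time"].to_numpy()])
    intensities = df["Intensity"].to_numpy()
    return coords, intensities


# Three S1 features copied from the aligned output further down
demo = pd.DataFrame({
    "m/z": [415.212107, 421.758811, 422.293553],
    "Retention time": [11247.284257, 3469.870316, 11247.875185],
    "Intensity": [1.077727e+09, 1.072969e+09, 1.640759e+09],
})
coords, intensities = spectrum_arrays(demo)
print(coords.shape, intensities.shape)  # (2, 3) (3,)
```

Under that assumption, `S1 = Spectrum(*spectrum_arrays(sp1))` should build the same object as the explicit call above.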

Next we define the alignment parameters `max_mz_shift` and `max_rt_shift`. They set the maximum allowed difference in mass-to-charge ratio (m/z) and retention time (RT) for two features to be considered a match. Choosing them is a trade-off: thresholds that are too tight miss genuine matches, while thresholds that are too loose let unrelated features pair up by chance.
The appropriate m/z tolerance depends mainly on the resolution and mass accuracy of the mass spectrometer, while the appropriate RT tolerance depends on how reproducible the chromatography is between runs.

In [4]:
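# define parameters
max_mz_shift = 0.005  # maximum m/z difference between matched features (m/z units)
max_rt_shift = 800    # maximum RT difference, in the units of the 'Retention time' column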
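Because `max_mz_shift` is an absolute tolerance, its relative size changes with m/z. Mass accuracy is usually quoted in parts per million (ppm), so it helps to see what 0.005 corresponds to across the m/z range of this data (roughly 400 to 715 in the rows shown in the output below). The two helpers are illustrative names; this notebook passes an absolute value to `align_spectra`, so the conversion also goes from ppm back to an absolute tolerance.

```python
def mz_shift_to_ppm(max_mz_shift: float, mz: float) -> float:
    """Relative size of an absolute m/z tolerance at a given m/z, in ppm."""
    return max_mz_shift / mz * 1e6


def ppm_to_mz_shift(ppm: float, mz: float) -> float:
    """Absolute m/z tolerance corresponding to a ppm tolerance at a given m/z."""
    return ppm * mz / 1e6


for mz in (400.0, 500.0, 1000.0):
    print(mz, round(mz_shift_to_ppm(0.005, mz), 2))
# 400.0 12.5
# 500.0 10.0
# 1000.0 5.0

# A 10 ppm tolerance evaluated at m/z 700 (illustrative choice)
print(round(ppm_to_mz_shift(10, 700.0), 6))  # 0.007
```

A fixed absolute window is therefore relatively wider at low m/z than at high m/z, which is worth keeping in mind when comparing `max_mz_shift` with the instrument's stated ppm accuracy.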

Now we align the spectra. `align_spectra` matches features between `S1` and `S2` whose m/z and RT differences fall within the thresholds defined above. The result is a DataFrame with one row per matched pair: the `_S1` columns describe the feature in the first sample and the `_S2` columns the corresponding feature in the second, so the same compound can be compared directly across the samples.
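To make the matching rule concrete, here is a simplified sketch of tolerance-window matching. It is not the algorithm used inside `align_spectra`, which is not shown in this notebook; it assumes a greedy one-to-one rule in which candidate pairs inside the m/z and RT window are accepted in order of tolerance-scaled distance, with each feature used at most once. `match_features` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def match_features(mz1, rt1, mz2, rt2, max_mz_shift, max_rt_shift):
    """Illustrative greedy one-to-one matching inside an m/z x RT window.

    Returns a list of (i, j) index pairs into the first and second feature lists.
    """
    mz1, rt1, mz2, rt2 = (np.asarray(a, dtype=float) for a in (mz1, rt1, mz2, rt2))
    # Sort the second list by m/z so each m/z window is a contiguous slice
    order = np.argsort(mz2)
    mz2_sorted = mz2[order]
    lo = np.searchsorted(mz2_sorted, mz1 - max_mz_shift, side="left")
    hi = np.searchsorted(mz2_sorted, mz1 + max_mz_shift, side="right")

    candidates = []
    for i in range(len(mz1)):
        for j in order[lo[i]:hi[i]]:
            d_rt = abs(rt1[i] - rt2[j])
            if d_rt <= max_rt_shift:
                d_mz = abs(mz1[i] - mz2[j])
                # Scale by the tolerances so m/z and RT contribute comparably
                dist = (d_mz / max_mz_shift) ** 2 + (d_rt / max_rt_shift) ** 2
                candidates.append((dist, i, int(j)))

    # Accept the closest pairs first; skip any feature that is already matched
    candidates.sort()
    used1, used2, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            pairs.append((i, j))
    return pairs


# Made-up feature lists; the m/z 500 feature in the second list has no partner
mz1 = np.array([415.2121, 421.7588, 714.3460])
rt1 = np.array([11247.3, 3469.9, 5521.7])
mz2 = np.array([714.3459, 415.2119, 500.0000, 421.7588])
rt2 = np.array([5380.9, 11224.0, 100.0, 3379.5])

pairs = match_features(mz1, rt1, mz2, rt2, max_mz_shift=0.005, max_rt_shift=800)
i1, i2 = (np.array(idx) for idx in zip(*pairs))
matched = pd.DataFrame({
    "m/z_S1": mz1[i1], "Retention time_S1": rt1[i1],
    "m/z_S2": mz2[i2], "Retention time_S2": rt2[i2],
})
print(matched)
```

Sorting by m/z keeps the candidate search proportional to the number of features inside each m/z window rather than to N1 x N2, but the Python loops are still slow for tens of thousands of features, so this is meant as an explanation of the idea, not a replacement for `align_spectra`.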

In [5]:
align_spectra(S1, S2, max_mz_shift, max_rt_shift)

           m/z_S1  Retention time_S1  Intensity_S1      m/z_S2  Retention time_S2  Intensity_S2
0      415.212107       11247.284257  1.077727e+09  415.211922       11223.972534  1.212435e+09
1      421.758811        3469.870316  1.072969e+09  421.758814        3379.488346  1.818847e+09
2      422.293553       11247.875185  1.640759e+09  422.293496       11226.669059  9.239702e+08
3      714.345984        5521.682102  8.947495e+08  714.345924        5380.917067  7.991051e+08
4      710.378424        7330.143344  1.183985e+09  710.378374        7197.065563  7.264030e+08
...           ...                ...           ...         ...                ...           ...
25498  667.314245        3638.255458  5.854923e+05  667.310929        2921.298990  1.044441e+06
25499  401.290060       10640.795713  2.099917e+04  401.290974       10916.691308  3.209108e+06
25500  407.742778        3151.654479  4.165283e+05  407.744432        3870.966262  1.247250e+06
25501  505.299007        9790.396524  6.644074e+04  505.299451       10391.065797  1.417621e+04
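A quick way to judge whether the thresholds suit this pair of runs is to look at the differences within matched pairs. In the first rows shown, the RT differences are between about 20 and 140, but several of the last rows differ by roughly 600 to 720, close to `max_rt_shift = 800`. Many pairs near the limit could point either to a large RT drift between the runs or to chance matches admitted by a wide window. `pair_residuals` and `fraction_near_rt_limit` are hypothetical helpers that only assume the `_S1`/`_S2` column names seen in the output above; the 0.8 cut-off is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd


def pair_residuals(aligned: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-pair differences (S2 minus S1) for an aligned table."""
    res = pd.DataFrame(index=aligned.index)
    res["delta_mz"] = aligned["m/z_S2"] - aligned["m/z_S1"]
    res["delta_ppm"] = res["delta_mz"] / aligned["m/z_S1"] * 1e6
    res["delta_rt"] = aligned["Retention time_S2"] - aligned["Retention time_S1"]
    res["log2_intensity_ratio"] = np.log2(aligned["Intensity_S2"] / aligned["Intensity_S1"])
    return res


def fraction_near_rt_limit(res: pd.DataFrame, max_rt_shift: float, fraction: float = 0.8) -> float:
    """Share of pairs whose |delta_rt| exceeds `fraction` of the RT tolerance."""
    return float((res["delta_rt"].abs() > fraction * max_rt_shift).mean())


# Rows 0, 3 and 25500 copied from the output above
demo = pd.DataFrame(
    {
        "m/z_S1": [415.212107, 714.345984, 407.742778],
        "Retention time_S1": [11247.284257, 5521.682102, 3151.654479],
        "Intensity_S1": [1.077727e+09, 8.947495e+08, 4.165283e+05],
        "m/z_S2": [415.211922, 714.345924, 407.744432],
        "Retention time_S2": [11223.972534, 5380.917067, 3870.966262],
        "Intensity_S2": [1.212435e+09, 7.991051e+08, 1.247250e+06],
    },
    index=[0, 3, 25500],
)
res = pair_residuals(demo)
print(res.round(2))
print(fraction_near_rt_limit(res, max_rt_shift=800))
```

On the real result, the same helpers could be applied to `align_spectra(S1, S2, max_mz_shift, max_rt_shift)`; a histogram of `delta_rt` would then show whether the matches form a tight band around a systematic shift or spread out toward the tolerance.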
