# Beer dataset

The paper that first introduced the IPA method ([Del Carratore et al., 2019](https://pubs.acs.org/doi/full/10.1021/acs.analchem.9b02354)) presented an LC/MS-based untargeted metabolomics experiment on 21 different beers (7 India pale ales, 7 lagers, and 7 porters).
The new version of the IPA method was applied to the datasets (positive and negative) obtained from this experiment.

## Positive dataset
The positive dataset can be found within this library:

In [1]:
import pandas as pd
dfpos = pd.read_csv('ExampleDatasets/Beer/Beer_pos.csv')
dfpos[dfpos['rel.ids']==9]

Unnamed: 0,ids,rel.ids,mzs,RTs,Int
71,10,9,182.080964,67.367644,752484700.0
72,174,9,183.084172,67.386181,76511290.0
73,248,9,165.054445,67.381211,56638180.0
74,303,9,136.075532,67.299333,46015640.0
75,385,9,311.123492,66.831756,33858540.0
76,959,9,123.043945,67.4053,14597330.0
77,1859,9,119.049026,67.433861,6920786.0
78,1996,9,147.043864,67.365218,6399960.0
79,2204,9,473.176365,66.906002,5715022.0
80,2225,9,166.057784,67.444651,5636496.0
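All rows shown above share the same `rel.ids` value, so they can be inspected as one group. The sketch below is only an illustration: it copies the first four rows of the output into a small DataFrame, picks the most intense feature of the group, and computes the m/z difference of each feature from it. The column name `delta_mz` is introduced here for illustration and is not part of the dataset.

```python
import pandas as pd

# First four rows copied from the output above (rel.ids == 9)
cluster = pd.DataFrame({
    "ids": [10, 174, 248, 303],
    "rel.ids": [9, 9, 9, 9],
    "mzs": [182.080964, 183.084172, 165.054445, 136.075532],
    "RTs": [67.367644, 67.386181, 67.381211, 67.299333],
    "Int": [752484700.0, 76511290.0, 56638180.0, 46015640.0],
})

# Most intense feature within each rel.ids group
top = cluster.loc[cluster.groupby("rel.ids")["Int"].idxmax()]

# m/z difference of every feature from the most intense one
# ('delta_mz' is a hypothetical helper column)
base_mz = top["mzs"].iloc[0]
cluster["delta_mz"] = cluster["mzs"] - base_mz

print(top[["ids", "mzs", "Int"]])
print(cluster[["ids", "delta_mz"]])
```

The feature with id 174 lies about 1.0032 above the most intense feature (id 10), which is close to the 13C/12C mass difference of roughly 1.00336.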


To run the method on this dataset, it is necessary to load the package, the MS1 database and the adducts information.

In [2]:
from ipaPy2 import ipa
DB=pd.read_csv('DB/IPA_MS1.csv')
adducts = pd.read_csv('DB/adducts.csv')

As described in the original paper, a set of standard mixes was analysed with the same analytical settings. Everything learned from these samples was recorded in the .csv file loaded below.

In [3]:
updates = pd.read_csv('ExampleDatasets/Beer/update_based_on_standards.csv')

This information can be used in the annotation process by updating the database:

In [4]:
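# For each compound in the updates table, overwrite the matching DB entries.
# The mask is converted to a NumPy array because .iloc does not accept a boolean Series.
for k in range(len(updates.index)):
    mask = (DB['id'] == updates.iloc[k, 0]).to_numpy()
    DB.iloc[mask, 5] = updates.iloc[k, 7]
    DB.iloc[mask, 6] = updates.iloc[k, 8]
    DB.iloc[mask, 7] = updates.iloc[k, 10]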
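The same kind of update can also be written without an explicit loop. The sketch below runs on two toy tables, not on the real files: the column names `id` and `RT`, and all values in them, are hypothetical stand-ins, since the actual column layout of `IPA_MS1.csv` and `update_based_on_standards.csv` is not shown here.

```python
import pandas as pd

# Toy stand-ins for the database and the updates table
# (column names and values are hypothetical)
db = pd.DataFrame({
    "id": ["C1", "C2", "C3"],
    "RT": [None, None, "10;20"],
})
upd = pd.DataFrame({"id": ["C2"], "RT": ["50;90"]})

# Look up the new value for every database row by compound id;
# rows without an update get NaN
new_rt = db["id"].map(upd.set_index("id")["RT"])

# Keep the new value where one exists, otherwise keep the old entry
db["RT"] = new_rt.where(new_rt.notna(), db["RT"])

print(db)
```

This assumes each compound id appears at most once in the updates table; with duplicated ids the lookup would be ambiguous.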

A dataframe containing all possible biochemical connections among the metabolites in the IPA_MS1.csv database has been pre-computed and is available in the library. Loading it instead of computing the connections from scratch considerably speeds up the pipeline.

In [5]:
Bio = pd.read_csv('DB/allBIO_reactions.csv')

Finally, we can run the whole pipeline with the simpleIPA() function.

WARNING! Running the whole pipeline, including the Gibbs sampler, on such a large dataset and database takes several hours (about 8 hours in the run shown below).

In [6]:
annotationsPos = ipa.simpleIPA(df=dfpos,ionisation=1,DB=DB,adductsAll=adducts,ppm=5, Bio=Bio,
                            delta_add=0.1,delta_bio=0.5,burn=1000,noits=5000,ncores=1)

mapping isotope patterns ....
6.6 seconds elapsed
computing all adducts ....
109.3 seconds elapsed
annotating based on MS1 information....
121.6 seconds elapsed
computing posterior probabilities including biochemical and adducts connections
initialising sampler ...


Gibbs Sampler Progress Bar: 100%|██████████| 5000/5000 [8:07:35<00:00,  5.85s/it]  


parsing results ...
Done -  29270.6 seconds elapsed
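The `ppm` argument in the call above is expressed in parts per million of the m/z value. As a rough illustration, and assuming the usual definition of relative mass error, the sketch below converts 5 ppm into an absolute m/z deviation for the feature with id 10 (m/z 182.080964). How `simpleIPA()` uses this value internally is not covered here; the helper name `ppm_to_mz` is hypothetical.

```python
def ppm_to_mz(mz, ppm):
    """Absolute m/z deviation corresponding to a relative error in ppm."""
    return mz * ppm * 1e-6

mz = 182.080964   # feature id=10 in the positive dataset
ppm = 5           # value passed to simpleIPA() above

delta = ppm_to_mz(mz, ppm)
lower, upper = mz - delta, mz + delta

print(f"{ppm} ppm at m/z {mz} is about {delta:.6f}")
print(f"interval: {lower:.6f} to {upper:.6f}")
```

At this m/z, 5 ppm corresponds to a deviation of roughly 0.0009.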


The annotation of the feature used as an example in the original paper (id=10, m/z=182.080964, RT=67.37 s), shown there in [Figure 5](https://pubs.acs.org/cms/10.1021/acs.analchem.9b02354/asset/images/large/ac9b02354_0005.jpeg), is given below.

In [7]:
annotationsPos[10]

Unnamed: 0,id,name,formula,adduct,m/z,charge,RT range,ppm,isotope pattern score,fragmentation pattern score,prior,post,post Gibbs,chi-square pval
1,C00082,L-Tyrosine,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.24525,1.168653e-231
0,C06420,D-Tyrosine,C9H12NO3,M+H,182.081169,1.0,50;90,-1.126904,0.099999,,0.097486,0.121951,0.1315,1.168653e-231
4,C04368,3-Amino-3-(4-hydroxyphenyl)propanoate,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.11775,1.168653e-231
5,C19579,gamma-Hydroxy-3-pyridinebutanoate,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.07875,1.168653e-231
3,C03290,L-threo-3-Phenylserine,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.07725,1.168653e-231
9,NPA027085,2-((2-hydroxyethyl)amino)benzoic acid,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.07175,1.168653e-231
2,C01536,Tyrosine,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.0715,1.168653e-231
6,C19712,N-Hydroxy-L-phenylalanine,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.0695,1.168653e-231
8,C21308,(S)-beta-Tyrosine,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.06925,1.168653e-231
7,C20807,3-Hydroxy-L-phenylalanine,C9H12NO3,M+H,182.081169,1.0,,-1.126904,0.099999,,0.097486,0.097561,0.0675,1.168653e-231
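A quick sanity check on the `post Gibbs` column above: the values sum to 1, and each is a whole multiple of 1/4000, which matches the number of iterations kept after burn-in (`noits - burn` = 5000 - 1000). The sketch below verifies this arithmetic on the printed values. It is an observation about these numbers, not a description of how ipaPy2 computes the column.

```python
# 'post Gibbs' values copied from the table above
post_gibbs = [0.24525, 0.1315, 0.11775, 0.07875, 0.07725,
              0.07175, 0.0715, 0.0695, 0.06925, 0.0675]

burn, noits = 1000, 5000      # values passed to simpleIPA() above
retained = noits - burn       # iterations kept after burn-in

# Implied number of retained iterations per candidate
counts = [round(p * retained) for p in post_gibbs]
total = sum(post_gibbs)

print("sum of post Gibbs:", round(total, 6))
print("implied counts:", counts, "total:", sum(counts))
```

Under this reading, L-Tyrosine would correspond to 981 of the 4000 retained iterations.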


The whole annotation dictionary for this dataset can be saved as a pickle file.

In [8]:
import pickle
# The context manager closes the file even if pickle.dump() raises an error
with open("ExampleDatasets/Beer/annotationsPos.pickle", "wb") as file:
    pickle.dump(annotationsPos, file)
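A saved annotation dictionary can later be reloaded without rerunning the pipeline. The sketch below shows the round trip on a toy dictionary with the same shape as the one used here (feature id mapped to a DataFrame) and then collects the highest-ranked candidate per feature. The toy content, the temporary path and the name `top_hits` are assumptions made for illustration.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy stand-in for the annotation dictionary: feature id -> DataFrame
annotations = {
    10: pd.DataFrame({
        "name": ["D-Tyrosine", "L-Tyrosine"],
        "post Gibbs": [0.1315, 0.24525],
    }),
}

# Save to a temporary location and load it back
path = os.path.join(tempfile.mkdtemp(), "annotations.pickle")
with open(path, "wb") as fh:
    pickle.dump(annotations, fh)
with open(path, "rb") as fh:
    restored = pickle.load(fh)

# Highest 'post Gibbs' candidate for each feature
top_hits = {
    fid: table.sort_values("post Gibbs", ascending=False).iloc[0]["name"]
    for fid, table in restored.items()
}
print(top_hits)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.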

## Negative dataset
The negative dataset can also be found within this library:

In [9]:
Bio = pd.read_csv('DB/allBIO_reactions.csv')
dfneg = pd.read_csv('ExampleDatasets/Beer/Beer_neg.csv')
dfneg.head()

Unnamed: 0,ids,rel.ids,mzs,RTs,Int
0,1,0,96.958464,48.370133,1449591000.0
1,35,0,98.954139,48.395777,69215420.0
2,215,0,705.192297,50.280204,15274240.0
3,250,0,925.235146,49.086028,13603920.0
4,301,0,786.218066,50.939859,11542660.0


It can be annotated with the IPA method in the same way, here with ionisation=-1 and ppm=10.

WARNING! Running the whole pipeline, including the Gibbs sampler, on such a large dataset and database takes several hours (about 20 hours in the run shown below).

In [10]:
annotationsNeg = ipa.simpleIPA(df=dfneg,ionisation=-1,DB=DB,adductsAll=adducts,ppm=10,Bio=Bio,
                            delta_add=0.1,delta_bio=0.5,burn=1000,noits=5000,ncores=1)

mapping isotope patterns ....
12.0 seconds elapsed
computing all adducts ....
676.9 seconds elapsed
annotating based on MS1 information....
234.0 seconds elapsed
computing posterior probabilities including biochemical and adducts connections
initialising sampler ...


Gibbs Sampler Progress Bar: 100%|██████████| 5000/5000 [20:22:19<00:00, 14.67s/it]   


parsing results ...
Done -  73363.6 seconds elapsed


The whole annotation dictionary for this dataset can be saved as a pickle file.

In [11]:
# The context manager closes the file even if pickle.dump() raises an error
with open("ExampleDatasets/Beer/annotationsNeg.pickle", "wb") as file:
    pickle.dump(annotationsNeg, file)