# Integration with mzMatch and PeakMLViewerPy

The new ipaPy2 package is fully integrated with [mzMatch](https://github.com/UoMMIB/mzmatch.R).
mzMatch is a data processing pipeline for metabolomics LC/MS data. The pipeline revolves around the PeakML file format, which is designed to contain all the information obtained from the processing of an LC/MS-based untargeted metabolomics experiment.

As shown below, the new IPA implementation includes the functions needed to read a .peakml file and extract the information required to run the IPA method.

```python
from ipaPy2 import PeakMLIO
from ipaPy2 import ipa
import pandas as pd
```

Full documentation for the ReadPeakML() function is available through help():

```python
help(PeakMLIO.ReadPeakML)
```

```
Help on function ReadPeakML in module ipaPy2.PeakMLIO:

ReadPeakML(filename)
    Loading data from a PeakML file
    
    Parameters
    ----------
    filename : string with the name (including path) of the .peakml file.
                It is important to use the allpeaks file generated by the
                mzmatch anlysis, for the proper use of the IPA method.
    
    Returns
    -------
    df : pandas dataframe (necessary)
         A dataframe containing the MS1 data including the following columns:
            -ids: an unique id for each feature
            -rel.ids:   relation ids. In a previous step of the data processing
                        pipeline, features are clustered based on peak shape
                        similarity/retention time. Features in the same
                        cluster are likely to come from the same metabolite.
                        All isotope patterns must be in the same rel.id
                        cluster.
            -mzs: mass-to-cha
```

Using this function, one can read the .peakml file and obtain the dataframe needed to run the IPA pipeline.
This example uses a small .peakml file, which can be downloaded from [here](https://drive.google.com/file/d/123bDs8kMTlDbjd1gETSAEe6uCW_fgsSW/view?usp=share_link).

```python
df = PeakMLIO.ReadPeakML('Example_allpeaks.peakml')
df.head()
```

```
loading Example_allpeaks.peakml...
parsing 959 peaks...
```


|   | ids | rel.ids | mzs | RTs | Int |
|---|---|---|---|---|---|
| 0 | 1 | 0 | 116.070547 | 45.770423 | 2170017000.0 |
| 1 | 88 | 0 | 117.073691 | 45.787586 | 125652000.0 |
| 2 | 372 | 0 | 70.065197 | 45.795901 | 34876560.0 |
| 3 | 501 | 0 | 231.133686 | 46.183948 | 25192230.0 |
| 4 | 2 | 1 | 104.106842 | 40.843309 | 1889172000.0 |
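Upstream of IPA, features are clustered by peak shape similarity and retention time, and the rel.ids column records those clusters. Features that share a rel.id are likely to come from the same metabolite. The minimal pandas sketch below summarises these clusters. To stay self-contained it rebuilds the five rows printed above instead of reading the .peakml file. With real data, skip the construction step and use the df returned by PeakMLIO.ReadPeakML(). The summary column names (n_features, rt_min, rt_max, base_mz) are illustrative only.

```python
import pandas as pd

# The five rows shown by df.head() above; with real data, use the df
# returned by PeakMLIO.ReadPeakML() instead of building it by hand.
df = pd.DataFrame({
    "ids": [1, 88, 372, 501, 2],
    "rel.ids": [0, 0, 0, 0, 1],
    "mzs": [116.070547, 117.073691, 70.065197, 231.133686, 104.106842],
    "RTs": [45.770423, 45.787586, 45.795901, 46.183948, 40.843309],
    "Int": [2170017000.0, 125652000.0, 34876560.0, 25192230.0, 1889172000.0],
})

# Per rel.id cluster: number of features and retention-time spread
clusters = df.groupby("rel.ids").agg(
    n_features=("ids", "count"),
    rt_min=("RTs", "min"),
    rt_max=("RTs", "max"),
)

# Row label of the most intense feature in each cluster, then its m/z
top_idx = df.groupby("rel.ids")["Int"].idxmax()
clusters["base_mz"] = df.loc[top_idx, "mzs"].to_numpy()

print(clusters)
```

On the full dataset, clusters with many features and a narrow RT spread are the ones where IPA can exploit isotope and adduct relationships most.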


Before running the IPA method on this dataset, the required databases (MS1 reference compounds, adducts and biochemical reactions) must be loaded:

```python
DB = pd.read_csv('DB/IPA_MS1.csv')                # MS1 reference database
adducts = pd.read_csv('DB/adducts.csv')           # adduct definitions
Bio = pd.read_csv('DB/allBIO_reactions.csv')      # biochemical reactions
```

Finally, we can run the whole pipeline with the simpleIPA() function.

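```python
annotations = ipa.simpleIPA(
    df=df,
    ionisation=1,        # positive ionisation mode
    DB=DB,
    adductsAll=adducts,
    ppm=3,               # mass accuracy, in ppm
    ppmthr=5,            # ppm threshold for candidate annotations
    Bio=Bio,
    delta_add=0.1,       # weight parameter for adduct connections
    delta_bio=0.1,       # weight parameter for biochemical connections
    burn=1000,           # burn-in iterations of the Gibbs sampler
    noits=5000,          # total Gibbs sampler iterations
    ncores=70,           # CPU cores for the parallelised steps; adjust to your machine
)
```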

```
mapping isotope patterns ....
1.8 seconds elapsed
computing all adducts - Parallelized ....
36.5 seconds elapsed
annotating based on MS1 information - Parallelized ...
35.0 seconds elapsed
computing posterior probabilities including biochemical and adducts connections
initialising sampler ...

Gibbs Sampler Progress Bar: 100%|██████████| 5000/5000 [44:33<00:00,  1.87it/s]

parsing results ...
Done -  2677.8 seconds elapsed
```
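The ppm and ppmthr arguments of simpleIPA() are expressed in parts per million, which is a relative unit. The absolute m/z tolerance they correspond to therefore grows with m/z. The sketch below illustrates the standard relative mass-error formula and the m/z window a given ppm value implies. It shows the concept only and is not ipaPy2's internal code. The helper names `ppm_error` and `mz_tolerance` are hypothetical.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (hypothetical helper)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def mz_tolerance(mz: float, ppm: float) -> float:
    """Absolute m/z half-window corresponding to a ppm tolerance (hypothetical helper)."""
    return mz * ppm * 1e-6


# m/z values taken from the df.head() output above, with ppm=3 as in simpleIPA()
for mz in (70.065197, 116.070547, 231.133686):
    tol = mz_tolerance(mz, ppm=3)
    print(f"m/z {mz:.6f}: +/- {tol:.6f} ({mz - tol:.6f} to {mz + tol:.6f})")
```

At m/z 116.07, a 3 ppm tolerance is roughly ±0.00035. At m/z 1000 the same setting would allow ±0.003.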


As an example, the annotations for the mass spectrometry feature with id=9 are shown below:

```python
annotations[9]
```

|   | id | name | formula | adduct | m/z | charge | RT range | ppm | isotope pattern score | fragmentation pattern score | prior | post | post Gibbs | chi-square pval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C08307 | Hordatine A | C28H40N8O4 | M+2H | 276.158077 | 2.0 |  | -0.584944 | 0.487753 |  | 0.486508 | 0.654075 | 0.772 | 2.52786e-54 |
| 1 | NPA028941 | Microginin 550 | C27H44N4O8 | M+2H | 276.157408 | 2.0 |  | 1.836519 | 0.503315 |  | 0.248086 | 0.344176 | 0.2275 | 2.52786e-54 |
| 2 | Unknown | Unknown |  |  |  |  |  | 3.0 | 0.008931 |  | 0.071043 | 0.001749 | 0.0005 | 2.52786e-54 |
| 3 | C21776 | 2-({\[(4-Methoxyphenyl)methyl\](methyl)amino}met... | C14H23NNaO3 | M+Na | 276.157012 | 1.0 |  | 3.272514 | 0.0 |  | 0.048591 | 0.0 | 0.0 | 2.52786e-54 |
| 4 | NPA001615 | (Z)-N-(4-decenoyl)-L-homoserine lactone | C14H23NNaO3 | M+Na | 276.157012 | 1.0 |  | 3.272514 | 0.0 |  | 0.048591 | 0.0 | 0.0 | 2.52786e-54 |
| 5 | NPA031672 | Laricinin A | C14H23NNaO3 | M+Na | 276.157012 | 1.0 |  | 3.272514 | 0.0 |  | 0.048591 | 0.0 | 0.0 | 2.52786e-54 |
| 6 | ET28010x_1 | MPL-dm | C14H23NNaO3 | M+Na | 276.157012 | 1.0 |  | 3.272514 | 0.0 |  | 0.048591 | 0.0 | 0.0 | 2.52786e-54 |
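The output above suggests that `annotations` is a dictionary keyed by feature id. Each value appears to be a pandas DataFrame of candidates, with `post Gibbs` holding the posterior probability estimated by the Gibbs sampler. Assuming that structure, the sketch below picks the most probable candidate for each feature. The toy dictionary reuses three rows and a subset of the columns of the table above. With real results, pass the object returned by simpleIPA() instead. The function `best_candidates` and its `min_post` cut-off are hypothetical.

```python
import pandas as pd

# Toy stand-in for the dictionary returned by ipa.simpleIPA(): keys are
# feature ids, values are DataFrames of candidates (subset of columns).
annotations = {
    9: pd.DataFrame({
        "id": ["C08307", "NPA028941", "Unknown"],
        "name": ["Hordatine A", "Microginin 550", "Unknown"],
        "adduct": ["M+2H", "M+2H", None],
        "post Gibbs": [0.772, 0.2275, 0.0005],
    }),
}


def best_candidates(annotations: dict, min_post: float = 0.0) -> pd.DataFrame:
    """Hypothetical helper: highest-'post Gibbs' candidate for each feature id."""
    rows = []
    for feature_id, cands in annotations.items():
        if cands is None or cands.empty:
            continue  # skip features without any candidate table
        best = cands.loc[cands["post Gibbs"].idxmax()]
        if best["post Gibbs"] >= min_post:
            rows.append({"feature": feature_id, "id": best["id"],
                         "name": best["name"], "adduct": best["adduct"],
                         "post Gibbs": best["post Gibbs"]})
    return pd.DataFrame(rows)


print(best_candidates(annotations, min_post=0.5))
```

The `min_post` threshold keeps only confident assignments. Note that the "Unknown" candidate can also come out on top for some features.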


The resulting IPA annotations can be written directly into a .peakml file with the add_IPA_to_PeakML() function:

```python
help(PeakMLIO.add_IPA_to_PeakML)
```

```
Help on function add_IPA_to_PeakML in module ipaPy2.PeakMLIO:

add_IPA_to_PeakML(file, IPA_Data, out_File)
    Adding IPA annotation to PeakML file
    
    Parameters
    ----------
    file : string with the name (including path) of the .peakml file.
    IPA_Data : Dictionary containing the IPA annotation
    out_File : string with the name (including path) where the annotated
                .peakml file will be saved.
```



```python
PeakMLIO.add_IPA_to_PeakML("Example_allpeaks.peakml", annotations, "Example_allpeaks_IPA_annotated.peakml")
```

```
Adding IPA annotation to Example_allpeaks.peakml and saving it as Example_allpeaks_IPA_annotated.peakml
```


The dictionary containing all annotations can also be saved to a .pickle file.

This file can be read by [PeakMLViewerPy](https://github.com/UoMMIB/PeakMLViewerPy), which allows the .peakml file to be explored together with the IPA annotations.
PeakMLViewerPy is available on GitHub together with a detailed installation guide.
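Saving the annotation dictionary to a .pickle file needs only Python's standard `pickle` module. A minimal sketch follows. The file name `Example_allpeaks_IPA.pickle` is hypothetical, so check the PeakMLViewerPy installation guide for any naming or location it expects. The small dictionary stands in for the real output of simpleIPA().

```python
import pickle

import pandas as pd

# Stand-in for the dictionary returned by ipa.simpleIPA()
annotations = {9: pd.DataFrame({"id": ["C08307"], "post Gibbs": [0.772]})}

# Hypothetical output file name
path = "Example_allpeaks_IPA.pickle"

# Write the dictionary in binary mode
with open(path, "wb") as fh:
    pickle.dump(annotations, fh)

# Read it back to confirm the round trip
with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(restored[9])
```

Because unpickling can execute arbitrary code, only load .pickle files from sources you trust.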

![screenshot](PeakMLViewerpy_screenshot.png)