# Mssuite example workflow 1

In this notebook we explore how to use the mssuite Python package for proteomics data analysis. I will explain how to try out the different modules and how to perform differential expression analysis.
First we need to import some packages:

In [None]:

import pandas as pd  # pandas is needed to read in our data
import mssuite.mssuite as ms

Then we load our example data. Here we use a TMTpro 12-plex experiment containing different spike-ins of E. coli proteins.

In [None]:
psms = pd.read_csv('Example_data_PSMs.txt',header=0,sep='\t')
print(psms.head())

The first step is preprocessing, where we filter the dataset if needed and normalise it. For this we need to initialise the required modules:

In [None]:
defaults = ms.Defaults()
process = ms.Preprocessing()

The normalisation functions need an array of columns to use for normalisation. We can get this array by calling the `Defaults.get_channels()` method on our input data:

In [None]:
channels = defaults.get_channels(psms)
print(channels)
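
To make the idea of a channel list concrete, here is a minimal sketch of how such a list could be built by hand. It assumes that the reporter-ion columns share a common name prefix; the prefix `Abundance`, the helper `find_channels` and the example column names are hypothetical and are not taken from mssuite, whose `get_channels()` may work differently.

```python
def find_channels(columns, prefix="Abundance"):
    """Return the column names starting with `prefix`, in their original order.

    Hypothetical helper for illustration only; not part of mssuite.
    """
    return [name for name in columns if name.startswith(prefix)]


# Made-up column names standing in for the header of a PSM table
example_columns = ["Sequence", "Master Protein Accessions",
                   "Abundance 126", "Abundance 127N"]
print(find_channels(example_columns))
```

With a pandas DataFrame, the same helper could be called as `find_channels(psms.columns)`.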

Before normalising, we plot the channel intensities as boxplots:

In [None]:
psms[channels].plot.box(logy=True,showfliers=False)

Now we perform normalisation:

In [None]:
psms = process.total_intensity(psms,channels=channels)
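
The arithmetic behind a total intensity normalisation can be sketched in plain Python. This is an illustration under stated assumptions, not mssuite's implementation: each channel is scaled so that all channel totals become equal, and the mean of the totals is used as the reference here (a package may choose a different reference, such as the smallest total). The function names and the toy data are hypothetical.

```python
def total_intensity_factors(table):
    """Compute one scaling factor per channel from a {channel: [intensities]} mapping."""
    totals = {channel: sum(values) for channel, values in table.items()}
    # Assumption: the mean of all channel totals serves as the common reference
    reference = sum(totals.values()) / len(totals)
    return {channel: reference / total for channel, total in totals.items()}


def normalise(table):
    """Scale every channel by its factor so that all channel totals are equal."""
    factors = total_intensity_factors(table)
    return {channel: [value * factors[channel] for value in values]
            for channel, values in table.items()}


# Toy data: channel 127N was loaded with twice as much material as channel 126
toy = {"126": [100.0, 200.0, 300.0], "127N": [200.0, 400.0, 600.0]}
normalised = normalise(toy)
print({channel: sum(values) for channel, values in normalised.items()})
```

After scaling, both toy channels have the same total, which is the property the boxplots are meant to show for the real data.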

Now we look at the data after normalisation:

In [None]:
psms[channels].plot.box(logy=True,showfliers=False)

This looks much better now. We can now perform differential expression analysis. For this we need to specify the condition of each replicate and the pairs of conditions we want to test. If no pairs are specified, all possible combinations are tested. To simplify the downstream analysis, I add a '0' prefix to my control: statsmodels always orders the conditions alphabetically, so with the prefix every comparison is MixX versus control, which makes the interpretation more intuitive.

In [None]:
hypo = ms.HypothesisTesting()
conditions = ['0Control','0Control','0Control','Mix1','Mix1','Mix1','Mix2','Mix2','Mix2','Mix3','Mix3','Mix3']
pairs=[['0Control','Mix1'],['0Control','Mix2'],['0Control','Mix3']]
results = hypo.peptide_based_lmm(psms,conditions=conditions,norm=None,pairs=pairs)
# norm=None is specified because the data have already been normalised.
# Alternatively, a function from the Preprocessing module can be passed as norm.
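
The effect of the '0' prefix and the meaning of "all possible combinations" can be checked with the standard library alone. The sketch below is independent of mssuite: it enumerates every pair of conditions with `itertools.combinations` and then keeps only the comparisons against the control. How mssuite builds its pairs internally is not shown here, so treat this as an illustration of the idea.

```python
from itertools import combinations

conditions = ['0Control'] * 3 + ['Mix1'] * 3 + ['Mix2'] * 3 + ['Mix3'] * 3

# Sorting puts '0Control' first, because the character '0' sorts before letters
unique_conditions = sorted(set(conditions))

# Every unordered pair of conditions, i.e. what testing "all combinations" covers
all_pairs = [list(pair) for pair in combinations(unique_conditions, 2)]

# Only the comparisons against the control, matching the `pairs` argument used above
control_pairs = [pair for pair in all_pairs if pair[0] == '0Control']

print(len(all_pairs), control_pairs)
```

With four conditions there are six possible pairs, so passing `pairs` explicitly halves the number of comparisons in this example.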

After the statistical analysis, we can examine which comparisons were performed and use this information to extract significant hits from the data or to create plots for the individual comparisons.