# Quick Start

## Workflow

`msmu` processes LC-MS/MS search outputs and produces an analysis-ready protein matrix.
Each processing step is modular, and normalization, filtering, and aggregation can be applied optionally at any level, depending on your analysis design.

1. Load DB search result (read functions)
2. (optional) PSM-level filtering
3. Log2 transformation
4. (optional) PSM normalization
5. Aggregate to peptides
6. Protein inference
7. Aggregate to protein groups
8. Analyze
9. Save and load

Functions can be called from submodules:

- `pp`: preprocessing (filtering, normalization, summarization, etc.)
- `tl`: tools (PCA, UMAP, FASTA annotation, DE analysis, etc.)
- `pl`: plotting (bar plots for IDs and charges, histograms, etc.)

<br><br>

Basic usage of `msmu` is shown below:


### 0. Import msmu


In [None]:
import msmu as mm

### 1. Load DB search result

- Ingest outputs from DB search tools into a unified MuData object.


In [None]:
mdata = mm.read_sage("sage/output/dir/", label="tmt")

mdata

### 2. (optional) PSM-level filtering

- Remove low-confidence PSMs / precursors (q-value, etc.).


In [None]:
mdata = mm.pp.add_filter(mdata, modality="psm", column="q_value", keep="lt", value=0.01)
mdata = mm.pp.apply_filter(mdata, modality="psm")
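
To make the filter semantics concrete, here is a minimal plain-Python sketch of a "keep rows where the column is less than a value" rule. It assumes that `keep="lt"` means "strictly less than"; the `keep_lt` helper and the toy PSM records are hypothetical and are not part of `msmu`.

```python
# Plain-Python illustration of a threshold filter; not msmu's implementation.
# The records and the helper name `keep_lt` are made up for this example.
psms = [
    {"peptide": "PEPTIDEK", "q_value": 0.001},
    {"peptide": "SAMPLER", "q_value": 0.02},
    {"peptide": "ANOTHERK", "q_value": 0.009},
]


def keep_lt(rows, column, value):
    """Keep rows whose `column` is strictly less than `value`."""
    return [row for row in rows if row[column] < value]


filtered = keep_lt(psms, column="q_value", value=0.01)
kept_peptides = [row["peptide"] for row in filtered]
print(kept_peptides)
```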

### 3. Log2 Transformation

- Apply a log2 transformation to the quantification matrix.
- All subsequent steps assume log2-transformed values.


In [None]:
mdata = mm.pp.log2_transform(mdata, modality="psm")

mdata["psm"].to_df().T
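
As a rough illustration of what the transformation does to a single intensity column, the sketch below applies log2 in plain Python. Mapping zero, negative, or missing intensities to NaN is an assumption made for this example; how `msmu` treats such values is not stated here.

```python
import math


def log2_transform(values):
    """Return log2 of each value; zero, negative, or missing inputs become NaN.

    The NaN handling is an assumption for this sketch, not msmu's documented behavior.
    """
    out = []
    for v in values:
        if v is None or math.isnan(v) or v <= 0:
            out.append(math.nan)
        else:
            out.append(math.log2(v))
    return out


raw = [1024.0, 2048.0, 0.0, 512.0]  # toy raw intensities
logged = log2_transform(raw)
print(logged)
```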

### 4. (optional) PSM normalization

- Apply observation-wise (per-sample) normalization.


In [None]:
mdata = mm.pp.normalize(mdata, modality="psm", method="median", rescale=True)
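
The idea behind sample-wise median normalization can be sketched in plain Python as follows. This is a sketch under assumptions: each sample's median is subtracted in log2 space, and `rescale=True` is taken to mean that the mean of the sample medians is added back so values stay on the original scale. The `median_normalize` helper is hypothetical, and the behavior of `msmu`'s `rescale` option may differ.

```python
import math
import statistics


def median_normalize(samples, rescale=True):
    """Center each sample on its median (log2 space), ignoring NaN values.

    samples: dict mapping sample name -> list of log2 intensities.
    rescale: if True, add back the mean of the sample medians (assumed meaning).
    """
    medians = {
        name: statistics.median(v for v in values if not math.isnan(v))
        for name, values in samples.items()
    }
    offset = statistics.mean(list(medians.values())) if rescale else 0.0
    return {
        name: [v - medians[name] + offset for v in values]
        for name, values in samples.items()
    }


samples = {
    "s1": [10.0, 12.0, 14.0],
    "s2": [11.0, 13.0, 15.0, math.nan],  # NaN marks a missing value
}
normalized = median_normalize(samples)
print(normalized)
```

After normalization both samples share the same median, so systematic loading differences between samples no longer dominate downstream comparisons.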

### 5. Aggregate to peptides

- Summarize PSMs (or precursors) to peptide level.
- (optional) Filtering or normalization can also be applied at the peptide level.
- Peptide-level q-values are calculated from their PEPs (posterior error probabilities).


In [None]:
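# `summarization_args` is a placeholder for a dict of summarization keyword arguments; define it before running this cell.
mdata = mm.pp.to_peptide(mdata, **summarization_args)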
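
Conceptually, summarization groups the PSM rows that belong to the same peptide and collapses them into one value per sample. The sketch below uses the median as the summary statistic, which is an assumption for illustration; the `summarize_to_peptides` helper and the toy data are hypothetical, and the method `msmu` applies depends on the arguments you pass.

```python
import statistics
from collections import defaultdict

# Each PSM: (peptide sequence, log2 intensities for [sample_1, sample_2]).
psms = [
    ("PEPTIDEK", [10.0, 11.0]),
    ("PEPTIDEK", [12.0, 13.0]),
    ("PEPTIDEK", [11.0, 15.0]),
    ("SAMPLER", [8.0, 9.0]),
]


def summarize_to_peptides(rows):
    """Collapse PSM rows to one row per peptide using the per-sample median."""
    grouped = defaultdict(list)
    for peptide, intensities in rows:
        grouped[peptide].append(intensities)
    # zip(*psm_rows) transposes the PSM rows into per-sample columns.
    return {
        peptide: [statistics.median(column) for column in zip(*psm_rows)]
        for peptide, psm_rows in grouped.items()
    }


peptides = summarize_to_peptides(psms)
print(peptides)
```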

### 6. Protein inference

- Map peptides to protein groups.


In [None]:
mdata = mm.pp.infer_protein(mdata)
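
The following is a deliberately simplified sketch of the grouping idea, not `msmu`'s inference algorithm: proteins supported by exactly the same set of peptides cannot be distinguished and are merged into one group, and a peptide counts as unique when it maps to a single group. Handling of subset (subsumable) proteins is omitted. The accessions, peptide names, and the `infer_groups` helper are hypothetical.

```python
from collections import defaultdict

# Toy peptide -> protein mapping, as it might come from a search result.
peptide_to_proteins = {
    "PEPA": {"P1"},
    "PEPB": {"P1", "P2"},
    "PEPC": {"P3", "P4"},
    "PEPD": {"P3", "P4"},
}


def infer_groups(pep2prot):
    """Return peptide -> sorted list of protein groups (simplified inference)."""
    # Invert the mapping: protein -> set of supporting peptides.
    prot2pep = defaultdict(set)
    for pep, prots in pep2prot.items():
        for prot in prots:
            prot2pep[prot].add(pep)
    # Proteins with identical peptide evidence are merged into one group.
    by_evidence = defaultdict(list)
    for prot, peps in prot2pep.items():
        by_evidence[frozenset(peps)].append(prot)
    protein_to_group = {}
    for members in by_evidence.values():
        group = ";".join(sorted(members))
        for prot in members:
            protein_to_group[prot] = group
    return {
        pep: sorted({protein_to_group[prot] for prot in prots})
        for pep, prots in pep2prot.items()
    }


peptide_groups = infer_groups(peptide_to_proteins)
# A peptide is unique if it maps to exactly one protein group.
unique_peptides = [pep for pep, groups in peptide_groups.items() if len(groups) == 1]
print(peptide_groups)
print(unique_peptides)
```

Here `P3` and `P4` share all of their peptides and collapse into the group `P3;P4`, so `PEPC` and `PEPD` become unique to that group, while `PEPB` remains shared between `P1` and `P2`.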

### 7. Aggregate to protein groups

- Generate a protein group-level matrix.
- Only unique peptides are used for protein summarization.
- Protein group-level q-values are calculated from their PEPs (posterior error probabilities).


In [None]:
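# `summarization_args` is a placeholder for a dict of summarization keyword arguments; define it before running this cell.
mdata = mm.pp.to_protein(mdata, **summarization_args)

mm.pl.plot_bar(mdata, modality="protein")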
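
One common way to derive q-values from PEPs is to sort entries by ascending PEP and take the running mean: the expected false discovery rate of the accepted set is the average PEP of its members. The sketch below shows that calculation in plain Python. Whether `msmu` uses exactly this procedure is an assumption, and the `qvalues_from_pep` helper is hypothetical.

```python
def qvalues_from_pep(peps):
    """Estimate q-values as the running mean of PEPs sorted in ascending order.

    The running mean of an ascending sequence never decreases, so the
    resulting q-values are already monotone in the PEP ranking.
    """
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    qvals = [0.0] * len(peps)
    total = 0.0
    for rank, i in enumerate(order, start=1):
        total += peps[i]
        qvals[i] = total / rank  # mean PEP of all entries ranked at or above this one
    return qvals


peps = [0.25, 0.0, 0.5, 0.125]  # toy PEPs, in input order
qvals = qvalues_from_pep(peps)
print(qvals)
```

The q-values are returned in the input order, so they can be attached directly to the rows they were computed from.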

### 8. Analyze

- Perform differential expression, PCA/UMAP, QC, missingness analysis, and other statistical workflows.


In [None]:
# PCA / UMAP
mdata = mm.tl.pca(mdata, modality="protein") # mdata = mm.tl.umap(mdata, modality="protein")
mm.pl.plot_pca(mdata, modality="protein")    # mm.pl.plot_umap(mdata, modality="protein")

# DEA
de_res = mm.tl.run_de(mdata, modality="protein", ctrl="control", expr="expr")
de_res.to_df()  # show the result as a pandas DataFrame
de_res.plot_volcano()  # show the result as a volcano plot
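
For intuition about what a differential expression comparison measures, the sketch below computes a log2 fold change and a Welch t-statistic for one protein across two groups. The statistical test that `run_de` applies is not specified here, so treat this as a generic illustration; the `log2fc_and_welch_t` helper and the values are hypothetical.

```python
import math
import statistics


def log2fc_and_welch_t(ctrl, expr):
    """Return (log2 fold change, Welch t-statistic) for log2-transformed values."""
    # In log2 space the fold change is a difference of group means.
    log2fc = statistics.mean(expr) - statistics.mean(ctrl)
    # Welch's standard error allows unequal variances between the groups.
    se = math.sqrt(
        statistics.variance(ctrl) / len(ctrl) + statistics.variance(expr) / len(expr)
    )
    return log2fc, log2fc / se


ctrl = [10.0, 11.0, 12.0]  # toy log2 intensities, control group
expr = [13.0, 14.0, 15.0]  # toy log2 intensities, experimental group
log2fc, t_stat = log2fc_and_welch_t(ctrl, expr)
print(log2fc, t_stat)
```

A volcano plot then places each protein by its log2 fold change on the x-axis and a transformed significance value on the y-axis.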

### 9. Save & Load h5mu


In [None]:
mdata.write_h5mu("file/name/to/save.h5mu")

mdata = mm.read_h5mu("file/name/mudata.h5mu")