# PathIntegrate Unsupervised Extension Guide

Package Documentation: https://cwieder.github.io/PathIntegrate/ 

Unsupervised Extension: https://github.com/judepops/MultiomicsML/PathIntegrate_JP

## Dependencies

In [1]:
# Classic Dependencies

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sspa

# PathIntegrate Package
import pathintegrate_v3

## Loading the Multi-omics Datasets and Metadata

## Data Source

The dataset we will be working with is from Su et al., "Multi-Omics Resolves a Sharp Disease-State Shift between Mild and Moderate COVID-19". It comprises two omics datasets with matched samples:

* Plasma metabolomics (Metabolon UHPLC-MS/MS)
* Proteomics (Olink)

Of the 127 samples in total, 45 had 'mild' COVID-19 (WHO status 1-2) and 82 had 'moderate-severe' COVID-19 (WHO status 3-7).

Su Y, Chen D, Yuan D, Lausted C, Choi J, Dai CL, et al. Multi-Omics Resolves a Sharp Disease-State Shift between Mild and Moderate COVID-19. Cell. 2020;183: 1479-1495.e20. [doi:10.1016/j.cell.2020.10.037](https://doi.org/10.1016/j.cell.2020.10.037)
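The mild versus moderate-severe grouping described above can be encoded as a binary label vector. The snippet below is a minimal sketch on made-up values; the column name `WHO_status`, the sample IDs, and the variable `labels_binary` are hypothetical and are not taken from the dataset or the package.

```python
import pandas as pd

# Hypothetical WHO ordinal status per sample (illustrative values only)
who_status = pd.Series(
    [1, 2, 3, 5, 7, 2],
    index=["S1", "S2", "S3", "S4", "S5", "S6"],
    name="WHO_status",
)

# Mild (WHO status 1-2) -> 0, moderate-severe (WHO status 3-7) -> 1
labels_binary = (who_status >= 3).astype(int)

# Class counts, useful as a sanity check against the expected group sizes
print(labels_binary.value_counts().sort_index())
```

On the real data, the same check should report 45 samples in the mild group and 82 in the moderate-severe group.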



## Identifier harmonisation

* Feature IDs have been converted to ChEBI for metabolites and UniProt for proteins
* The `sspa` package provides a metabolite ID conversion utility
* IDs must match those used by the pathway database
  * Reactome uses ChEBI, UniProt, and ENSEMBL
  * KEGG uses KEGG compound and KEGG gene identifiers
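A quick way to confirm that identifiers match the pathway database is to measure how many feature IDs appear in at least one pathway. The sketch below uses toy tables with invented IDs; it assumes a pathway table with one row per pathway, a `Pathway_name` column, and member IDs spread across the remaining columns, which may differ from the layout you actually load.

```python
import pandas as pd

# Toy abundance table: samples x features, columns are feature IDs (illustrative)
metab_toy = pd.DataFrame(
    [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]],
    index=["S1", "S2"],
    columns=["15377", "16236", "99999"],
)

# Toy pathway table: one row per pathway, member IDs in columns, padded with None
pathways_toy = pd.DataFrame(
    {
        "Pathway_name": ["Pathway A", "Pathway B"],
        0: ["15377", "16236"],
        1: ["P12345", None],
    },
    index=["R-HSA-0001", "R-HSA-0002"],
)

# Collect every non-missing member ID in the pathway table
members = pathways_toy.drop(columns="Pathway_name").to_numpy().ravel()
pathway_ids = {str(m) for m in members if pd.notna(m)}

# Features that map to at least one pathway
measured = set(metab_toy.columns.astype(str))
mapped = measured & pathway_ids
coverage = len(mapped) / len(measured)

print(f"{len(mapped)}/{len(measured)} features map to a pathway ({coverage:.0%})")
```

A very low mapped fraction usually indicates an identifier mismatch, for example KEGG IDs being compared against a Reactome pathway table.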

## Data pre-treatment

* Outlying samples should be removed
* Missing data should be imputed
* Features should be roughly normally distributed
* Features *do not need* to be scaled; scaling is applied internally by the model using StandardScaler (mean = 0, SD = 1)
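The pre-treatment steps above could look something like the following sketch on a toy matrix. Median imputation and a log2 transform are illustrative choices, not methods prescribed by this guide. The final scaling step is shown only to illustrate what StandardScaler computes and should not be applied before passing data to the model.

```python
import numpy as np
import pandas as pd

# Toy abundance matrix (samples x features) with one missing value; values are made up
X = pd.DataFrame(
    {"f1": [10.0, 12.0, np.nan, 11.0], "f2": [100.0, 400.0, 200.0, 800.0]},
    index=["S1", "S2", "S3", "S4"],
)

# Impute missing values with the per-feature median (one of several possible choices)
X_imputed = X.fillna(X.median())

# log2-transform to reduce right skew and bring features closer to normality
X_log = np.log2(X_imputed)

# Illustration only: per-feature standardisation to mean 0 and SD 1.
# StandardScaler uses the population standard deviation (ddof=0).
X_scaled = (X_log - X_log.mean()) / X_log.std(ddof=0)

print(X_scaled.round(3))
```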

## Unsupervised Workflow

### Loading Example Data (within package)


In [None]:
metab = pathintegrate_v3.load_example_data('metabolomics')
prot = pathintegrate_v3.load_example_data('proteomics')

### Loading your own data (Optional)

In [None]:
# Replace the empty strings with the paths to your own CSV files
metab = pd.read_csv('')
prot = pd.read_csv('')

### Loading the Reactome multi-omics Pathways

In [None]:
mo_paths = sspa.process_reactome(
    organism='Homo sapiens',
    download_latest=True,
    omics_type='multiomics',
    filepath='.' # save to current directory
)

### Initiating a PathIntegrate Object (same protocol as supervised)

In [None]:
pi_model = pathintegrate_v3.PathIntegrate(
    omics_data={'Metabolomics': metab, 'Proteomics': prot.iloc[:, :-1]}, # dictionary of multi-omics DataFrames and names for each omics; the last column of prot is excluded
    metadata=metadata_binary, # class labels, one per sample; must be defined before running this cell
    pathway_source=mo_paths, # pathways DataFrame
    sspa_scoring=sspa.sspa_SVD, # ssPA method, see the sspa package for options
    min_coverage=4) # minimum number of molecules mapping per pathway to be included
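To make the effect of `min_coverage` concrete, the sketch below counts how many measured molecules map to each pathway and keeps only pathways that reach the threshold. This is one reading of the parameter on invented IDs, not PathIntegrate's internal code; `measured_ids` and `pathway_members` are hypothetical names.

```python
# IDs measured across all omics layers (illustrative values)
measured_ids = {"15377", "16236", "P01308", "P05231", "Q9Y6K9"}

# Hypothetical pathway definitions: pathway ID -> member molecule IDs
pathway_members = {
    "R-HSA-0001": ["15377", "16236", "P01308", "P05231", "O00000"],
    "R-HSA-0002": ["15377", "Q9Y6K9"],
    "R-HSA-0003": ["X1", "X2", "X3", "X4"],
}

min_coverage = 4

# Number of measured molecules that map to each pathway
coverage = {
    pid: len(measured_ids.intersection(members))
    for pid, members in pathway_members.items()
}

# Keep only pathways with at least min_coverage mapped molecules
kept = [pid for pid, n in coverage.items() if n >= min_coverage]

print(coverage)
print(kept)
```

Raising the threshold removes sparsely covered pathways, whose scores would rest on only a few molecules, at the cost of leaving fewer pathways in the model.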

### 1) Fitting a Dimensionality Reduction Unsupervised Model

In [None]:
covid_dimred = 