# Step-by-step Guide: Running Cellmaps Pipeline

### Installation

We highly recommend creating a conda virtual environment and running Jupyter from it.

`conda create -n cm4ai python=3.8`

`conda activate cm4ai`

To install Cellmaps Pipeline, run:

`pip install cellmaps_pipeline`
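Before opening the notebook, it can help to confirm that the package is visible to the active environment. The following is a minimal sketch that uses only the standard library; it assumes the importable module has the same name as the pip package (`cellmaps_pipeline`), which is not confirmed here.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the importable module shares the pip package name.
print('cellmaps_pipeline installed:', is_installed('cellmaps_pipeline'))
```

If this prints `False` inside Jupyter, the kernel is probably running from a different environment than the one where `pip install` was executed.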

### Input Data

The Cellmaps Pipeline requires the following input files to build MuSIC maps by integrating immunofluorescence (IF) images with an AP-MS interaction network:

- samples file: CSV file listing the IF images to download (see the sample samples file in the examples folder)

- unique file: CSV file of unique samples (see the sample unique file in the examples folder)

- bait list file: TSV file of the baits used in the AP-MS experiments

- edge list file: TSV file of the edges of the protein interaction network

- provenance: JSON file containing provenance information about the input files (see the sample provenance file in the examples folder, or create one directly as described above)
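A quick sanity check of these files before starting can save a failed run later. The sketch below uses the example paths that appear in the code cells of this guide. The `check_inputs` helper is hypothetical (it is not part of any Cellmaps package) and only verifies that each file exists and parses as CSV, TSV, or JSON; it does not validate column names or provenance fields.

```python
import csv
import json
import os

# Example paths as used in the steps of this guide; adjust to your own layout.
INPUTS = {
    'samples': '../examples/samples.csv',
    'unique': '../examples/unique.csv',
    'baitlist': '../examples/baitlist.tsv',
    'edgelist': '../examples/edgelist.tsv',
    'provenance': '../examples/provenance.json',
}


def check_inputs(inputs):
    """Return a dict mapping each input name to a short status string."""
    status = {}
    for name, path in inputs.items():
        if not os.path.isfile(path):
            status[name] = 'missing'
            continue
        try:
            with open(path, 'r', newline='') as f:
                if path.endswith('.json'):
                    json.load(f)
                    status[name] = 'ok'
                else:
                    delimiter = '\t' if path.endswith('.tsv') else ','
                    rows = sum(1 for _ in csv.reader(f, delimiter=delimiter))
                    status[name] = 'ok ({} rows)'.format(rows)
        except (ValueError, csv.Error) as e:
            # json.JSONDecodeError and UnicodeDecodeError are ValueError subclasses
            status[name] = 'unreadable: {}'.format(e)
    return status


for name, state in check_inputs(INPUTS).items():
    print(name, '->', state)
```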

## Step 1: Download immunofluorescence (IF) image data

Detailed documentation is available [here](https://cellmaps-imagedownloader.readthedocs.io/).

```python
import json

from cellmaps_imagedownloader.runner import CellmapsImageDownloader
from cellmaps_imagedownloader.runner import MultiProcessImageDownloader
from cellmaps_imagedownloader.gene import ImageGeneNodeAttributeGenerator as IGen
from cellmaps_imagedownloader.proteinatlas import (ProteinAtlasReader,
                                                   ProteinAtlasImageUrlReader,
                                                   ImageDownloadTupleGenerator)

# Load the unique list, the samples list, and the provenance information
u_list = IGen.get_unique_list_from_csvfile('../examples/unique.csv')
s_list = IGen.get_samples_from_csvfile('../examples/samples.csv')
with open('../examples/provenance.json', 'r') as f:
    json_prov = json.load(f)

imagegen = IGen(unique_list=u_list, samples_list=s_list)

# Output directory for this step
outdir = '1.image_download'
dloader = MultiProcessImageDownloader(poolsize=4)

# Build the generator of image download tuples from the samples list
proteinatlas_reader = ProteinAtlasReader(outdir)
proteinatlas_urlreader = ProteinAtlasImageUrlReader(reader=proteinatlas_reader)
imageurlgen = ImageDownloadTupleGenerator(reader=proteinatlas_urlreader,
                                          samples_list=imagegen.get_samples_list(),
                                          valid_image_ids=imagegen.get_samples_list_image_ids())

x = CellmapsImageDownloader(outdir=outdir, imagedownloader=dloader, imgsuffix='.jpg',
                            imagegen=imagegen, imageurlgen=imageurlgen,
                            provenance=json_prov)
x.run()
```
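Once the download finishes, a rough way to confirm that images arrived is to count the files with the chosen suffix under the output directory. This is a sketch only: the internal layout of `1.image_download` is not described here, so the hypothetical helper below simply walks the directory recursively rather than assuming specific subfolders.

```python
import os


def count_files_with_suffix(directory, suffix='.jpg'):
    """Count files under `directory` (recursively) whose names end with `suffix`."""
    total = 0
    for _root, _dirs, files in os.walk(directory):
        total += sum(1 for name in files if name.endswith(suffix))
    return total


# Prints 0 if the directory does not exist yet.
print(count_files_with_suffix('1.image_download'), 'image files found')
```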

## Step 2: Download affinity-purification mass spectrometry (AP-MS) data as a protein-protein interaction (PPI) network

Detailed documentation is available [here](https://cellmaps-ppidownloader.readthedocs.io/).

```python
import json  # repeated so this cell also runs on its own

from cellmaps_ppidownloader.runner import CellmapsPPIDownloader
from cellmaps_ppidownloader.gene import APMSGeneNodeAttributeGenerator

with open('../examples/provenance.json', 'r') as f:
    json_prov = json.load(f)

# Read the AP-MS edge list and bait list from their TSV files
apmsgen = APMSGeneNodeAttributeGenerator(
    apms_edgelist=APMSGeneNodeAttributeGenerator.get_apms_edgelist_from_tsvfile('../examples/edgelist.tsv'),
    apms_baitlist=APMSGeneNodeAttributeGenerator.get_apms_baitlist_from_tsvfile('../examples/baitlist.tsv'))

x = CellmapsPPIDownloader(outdir='2.ppi_download', apmsgen=apmsgen,
                          provenance=json_prov, input_data_dict={})
x.run()
```
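If this step fails on the inputs, it can be useful to look at the TSV files directly. The sketch below reports the column names and the number of data rows of a tab-separated file using only the standard library. It assumes the first line is a header row and makes no assumption about what the columns are called; `tsv_summary` is a hypothetical helper, not part of `cellmaps_ppidownloader`.

```python
import csv
import os


def tsv_summary(path):
    """Return (column_names, data_row_count) for a TSV file with a header row."""
    with open(path, 'r', newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        header = next(reader, [])
        n_rows = sum(1 for row in reader if row)  # skip blank lines
    return header, n_rows


for path in ('../examples/edgelist.tsv', '../examples/baitlist.tsv'):
    if os.path.isfile(path):
        columns, n_rows = tsv_summary(path)
        print(path, columns, n_rows)
```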

## Step 3: Generate embeddings from immunofluorescence (IF) image data

Detailed documentation is available [here](https://cellmaps-image-embedding.readthedocs.io/).

## Step 4: Generate embeddings from protein-protein interaction (PPI) networks

Detailed documentation is available [here](https://cellmaps-ppi-embedding.readthedocs.io/).

## Step 5: Generate co-embedding from image and protein-protein interaction (PPI) embeddings

Detailed documentation is available [here](https://cellmaps-coembedding.readthedocs.io/).

## Step 6: Generate hierarchy from co-embeddings using HiDeF

Detailed documentation is available [here](https://cellmaps-generate-hierarchy.readthedocs.io/).

## Step 7: Annotate a hierarchy by performing enrichment against three NDEx networks: HPA, CORUM, and GO-CC

Detailed documentation is available [here](https://cellmaps-hierarchyeval.readthedocs.io/).
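The seven steps do not form a strictly linear chain: the two downloads are independent, and the co-embedding needs both embeddings. The sketch below encodes the dependencies as read off the step titles in this guide (an inference from the titles, not taken from the package documentation) and lists which steps can run next given the ones already completed. `DEPENDS_ON` and `runnable_steps` are illustrative names only.

```python
# Each step mapped to the steps whose output it consumes,
# inferred from the step titles in this guide.
DEPENDS_ON = {
    1: [],       # download IF images
    2: [],       # download AP-MS data as a PPI network
    3: [1],      # image embeddings
    4: [2],      # PPI embeddings
    5: [3, 4],   # co-embedding
    6: [5],      # hierarchy generation
    7: [6],      # hierarchy annotation
}


def runnable_steps(done):
    """Return the steps not yet done whose prerequisites are all in `done`."""
    done = set(done)
    return sorted(step for step, deps in DEPENDS_ON.items()
                  if step not in done and all(d in done for d in deps))


print(runnable_steps([]))         # [1, 2]
print(runnable_steps([1, 2, 3]))  # [4]
```

Under this reading, Steps 1 and 3 can proceed in parallel with Steps 2 and 4, and everything from Step 5 onward runs sequentially.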