# Setup

This section sets up the Jupyter Notebook and initializes the experimental settings.

### Give Jupyter Notebook access to relative imports

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
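
The same idea can be wrapped in a small reusable helper. The sketch below is an illustration only: `add_to_sys_path` is a hypothetical name, not part of the GTM-decon wrapper, and it uses `pathlib` where the cell above uses `os.path`. Note that `Path.resolve()` also follows symlinks, which `os.path.abspath` does not.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Append `directory` (resolved to an absolute path) to sys.path at most once.

    Returns the absolute path string that is now on sys.path.
    """
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Expose the parent of the current working directory, as the cell above does.
parent_dir = add_to_sys_path("..")
```

Calling the helper again with the same directory leaves `sys.path` unchanged, so re-running the cell does not pile up duplicate entries.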

### Create GTMDecon object

For ease of use, we use the GTMDecon Python wrapper, which is built around the gtm-decon C executables.

In [2]:
from PythonWrapper.GTM_decon import GTM_decon

Initialize GTMDecon wrapper object:

Basic Constructor Arguments:
- **experiment_name** : str [optional]
- **n_topics** : int [optional, default=1]
    - number of topics we wish to set per celltype
- **engine_path** : str
    - path to GTM-decon C executable

Here we set only the experiment name and engine path; the n_topics parameter defaults to 1.

In [3]:
GTM = GTM_decon(
    experiment_name = "gtm-example",
    engine_path = "/home/mcb/users/slaksh1/projects/revision_gb/gtm-decon-phinorm/gtm-decon-plus-noupd-ab-phinorm"
)

Printing the wrapper shows its current parameters, including the number of topics per celltype and the engine_path (path to the C executable).

The **experiment_name**, **n_topics**, and **engine_path** attributes have been set as we intended, while the data-dependent attributes are still empty. The **genes**, **celltypes**, and **bulk_samples** attributes will be populated once we provide our input reference and bulk data.

In [4]:
print(GTM)

GTM-decon wrapper object with attributes:
  - experiment_name: gtm-example
  - n_topics: 1
  - engine_path: /home/mcb/users/slaksh1/projects/revision_gb/gtm-decon-phinorm/gtm-decon-plus-noupd-ab-phinorm
  - genes: []
  - celltypes: []
  - bulk_samples: []
  - verbose: True
  - output_intermediates: False
  - override_geneset: False



# Example Deconvolution Pipeline

To infer cell-type proportions for a bulk dataset given a single-cell reference matrix, we use the **GTMDecon.pipeline** function. It processes the input data, runs it through the gtm-decon C executables, and outputs the predicted cell-type proportions of our bulk samples.

### Loading DataFrames

In [5]:
import pandas as pd
import anndata as ad

Load our reference and bulk DataFrames from the example CSV files.

The **reference_DataFrame** should be a pandas DataFrame in which the rows are cells and the columns are genes, with one additional column named *Celltype* containing the cell-type label of each row.

The **bulk_DataFrame** should be a pandas DataFrame in which the rows are genes, with the gene names stored as the index, and the columns are the bulk samples.

In [6]:
bulk_DataFrame = pd.read_csv("../data/bulk.csv", index_col=0)
reference_DataFrame = pd.read_csv("../data/ref.csv")
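
To make the two layouts concrete, here is a minimal sketch with invented toy data. The gene names, counts, and the `check_layout` helper are assumptions made for illustration; the helper is not part of the GTM-decon wrapper, and whether the pipeline itself requires overlapping genes is not stated here.

```python
import pandas as pd

# Toy reference: one row per cell, one column per gene, plus a "Celltype" label column.
toy_reference = pd.DataFrame(
    {
        "GENE_A": [5, 0, 2],
        "GENE_B": [0, 7, 1],
        "Celltype": ["alpha cell", "beta cell", "alpha cell"],
    }
)

# Toy bulk: one row per gene (gene names as the index), one column per bulk sample.
toy_bulk = pd.DataFrame(
    {"H1": [120, 30], "D1": [80, 95]},
    index=["GENE_A", "GENE_B"],
)


def check_layout(reference, bulk):
    """Return the genes found in both tables; raise ValueError on layout problems."""
    if "Celltype" not in reference.columns:
        raise ValueError("reference needs a 'Celltype' column")
    reference_genes = reference.columns.drop("Celltype")
    shared = reference_genes.intersection(bulk.index)
    if shared.empty:
        raise ValueError("no shared genes; is the bulk index made of gene names?")
    return list(shared)


shared_genes = check_layout(toy_reference, toy_bulk)
```

Running `check_layout(reference_DataFrame, bulk_DataFrame)` on the loaded data is one way to catch a transposed table or a missing *Celltype* column before starting the pipeline.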

### Running our Pipeline

GTMDecon.pipeline arguments:
- **bulk_data** : pd.DataFrame
- **reference_data** : pd.DataFrame
- **directory** : str
    - directory where we want to save the model parameters and inferred cell-type proportions 
    - we expect the inferred proportions to end up in **gatheredResults.csv** inside this directory, e.g. **/tutorial_results/gatheredResults.csv** for the directory used below


We make a directory to store the results for this vignette.

In [12]:
!mkdir tutorial_results
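
The shell command above reports an error if the directory already exists, for example when the notebook is re-run. A pure-Python alternative is sketched below; `ensure_results_dir` is a hypothetical helper name, and the demonstration runs in a temporary directory so that it leaves nothing behind.

```python
import os
import tempfile


def ensure_results_dir(name="tutorial_results", base=None):
    """Create base/name if it is missing and return its absolute path."""
    base = os.getcwd() if base is None else base
    path = os.path.join(os.path.abspath(base), name)
    os.makedirs(path, exist_ok=True)  # no error when the directory already exists
    return path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_results_dir(base=tmp)
    second = ensure_results_dir(base=tmp)  # second call is a no-op
    created = os.path.isdir(first)
```

In the notebook, `ensure_results_dir()` with no arguments would create `tutorial_results` under the current working directory and return a path suitable for the `directory` argument.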

Here we run our pipeline, including processing data to GTM-decon format, training, and cell-type proportion inference.

To suppress the print statements, set GTM.verbose = False before calling the pipeline.

In [8]:
GTM.pipeline(
    bulk_data = bulk_DataFrame,
    reference_data = reference_DataFrame,
    directory = os.path.join(os.getcwd(), 'tutorial_results'),
)

Running GTM Deconvolution Pipeline
Writing results to /home/mcb/users/zhuang35/projects/gtm-decon/vignettes/tutorial_results
**********************************

Saving genes file to /home/mcb/users/zhuang35/projects/gtm-decon/vignettes/tutorial_results/genes.txt ...
Successfully wrote genes file to /home/mcb/users/zhuang35/projects/gtm-decon/vignettes/tutorial_results/genes.txt
Saving meta file to /home/mcb/users/zhuang35/projects/gtm-decon/vignettes/tutorial_results/meta.txt ...
Successfully wrote meta file to /home/mcb/users/zhuang35/projects/gtm-decon/vignettes/tutorial_results/meta.txt
Saving training file to /home/mcb/users/zhuang35/projects/gtm-decon/vignettes/tutorial_results/trainData.txt ...
Successfully wrote training file to /home/mcb/users/zhuang35/projects/gtm-decon/vignettes/tutorial_results/trainData.txt
Saving prior file to /home/mcb/users/zhuang35/projects/gtm-decon/vignettes/tutorial_results/priorData.txt ...
Successfully wrote prior file to /home/mcb/users/zhuang35/p

Upon completion we should be able to obtain the predicted proportions in **/tutorial_results/gatheredResults.csv**

This file contains the inferred cell-type proportions of our bulk data given the provided reference data. The sample names form the index and the celltypes form the columns of this file.

In [14]:
predicted_props = pd.read_csv("../vignettes/tutorial_results/gatheredResults.csv", index_col=0)

In [15]:
predicted_props.head()

| | MHC class II cell | PSC cell | acinar cell | alpha cell | beta cell | co-expression cell | delta cell | ductal cell | endothelial cell | epsilon cell | gamma cell | mast cell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H1 | 0.092261 | 0.008316 | 0.13631 | 0.095074 | 0.177341 | 0.011476 | 0.111319 | 0.026478 | 0.015136 | 0.08083 | 0.10441 | 0.141049 |
| D1 | 0.085596 | 0.052764 | 0.023833 | 0.187011 | 0.197139 | 0.020065 | 0.106696 | 0.066362 | 0.038764 | 0.01889 | 0.043128 | 0.159753 |
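
A quick sanity check on such output is to confirm that each sample's proportions sum to roughly 1 and to see which cell type dominates. The sketch below re-enters the two rows displayed above by hand so that it runs on its own; in the notebook, `predicted_props` would take the place of `example_props`. The tolerance of 1e-3 is an arbitrary choice, and the expectation that rows sum to 1 is an assumption based on the rows shown.

```python
import pandas as pd

celltypes = [
    "MHC class II cell", "PSC cell", "acinar cell", "alpha cell",
    "beta cell", "co-expression cell", "delta cell", "ductal cell",
    "endothelial cell", "epsilon cell", "gamma cell", "mast cell",
]

# The two rows displayed above, copied by hand.
example_props = pd.DataFrame(
    [
        [0.092261, 0.008316, 0.13631, 0.095074, 0.177341, 0.011476,
         0.111319, 0.026478, 0.015136, 0.08083, 0.10441, 0.141049],
        [0.085596, 0.052764, 0.023833, 0.187011, 0.197139, 0.020065,
         0.106696, 0.066362, 0.038764, 0.01889, 0.043128, 0.159753],
    ],
    index=["H1", "D1"],
    columns=celltypes,
)

# Each row should sum to about 1 if the values are proportions.
row_sums = example_props.sum(axis=1)
sums_to_one = bool(((row_sums - 1.0).abs() < 1e-3).all())

# Cell type with the largest inferred proportion in each sample.
dominant = example_props.idxmax(axis=1)
```

For both rows shown, the sums are within rounding error of 1 and the largest inferred proportion belongs to the beta cell column.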
