# 🐊 GATOR
## Assign phenotypes to each cell. Clustering data may not always be ideal, so we developed a cell type assignment algorithm that assigns cell types hierarchically and iteratively.
#### Please keep in mind that the sample data is for demonstration purposes only and has been simplified and reduced in size. It is intended solely to show how to execute `Gator` and will not yield any meaningful results.

#### Download the exemplar dataset and executable notebooks from HERE
#### Make sure you have completed `Build Model, Apply Model and Run Gator Algorithm` Tutorial before you try to execute this Jupyter Notebook!

In [2]:
# import packages
import gatorpy as ga
import os
import pandas as pd

### We need `two` basic inputs to run the third module of the Gator algorithm
- The Gator Object
- A Phenotyping workflow based on prior knowledge

In [3]:
# set the working directory & set paths to the example data
cwd = '/Users/aj/Desktop/gatorExampleData'
# Module specific paths
gatorObject = cwd + '/GATOR/gatorObject/exampleImage_gatorPredict.ome.h5ad'

In [4]:
# load the phenotyping workflow
phenotype = pd.read_csv(str(cwd) + '/phenotype_workflow.csv')
# view the table:
phenotype.style.format(na_rep='')

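| Unnamed: 0.1 | Unnamed: 0 | Unnamed: 1 | ECAD | CD45 | CD4 | CD3D | CD8A | KI67 |
|---|---|---|---|---|---|---|---|---|
| 0 | all | Immune | | anypos | anypos | anypos | anypos | |
| 1 | all | ECAD+ | pos | | | | | |
| 2 | ECAD+ | KI67+ ECAD+ | | | | | | pos |
| 3 | Immune | CD4+ T | | | allpos | allpos | | |
| 4 | Immune | CD8+ T | | | | allpos | allpos | |
| 5 | Immune | Non T CD4+ cells | | | pos | neg | | |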


### As can be seen from the table above,
### (1) The `first column` has to contain the group of cells that are to be classified.
### (2) The `second column` indicates the phenotype a particular cell will be assigned if it satisfies the conditions in the row.
### (3) `Column three` and onward represent protein markers. If the protein marker is known to be expressed in that cell type, it is denoted by either `pos` or `allpos`. If the protein marker is known not to be expressed in a cell type, it can be denoted by `neg` or `allneg`. If the protein marker is irrelevant, or its expression is uncertain for a cell type, the cell is left empty. `anypos` and `anyneg` apply to a set of markers: if any marker in the set is positive (or negative, respectively), the cell type is assigned accordingly.
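The same layout can also be assembled directly in pandas instead of being read from a CSV. The following is a minimal sketch that rebuilds the example table above. The names `phenotype_sketch`, `markers`, and `rows` are illustrative, the first two column headers simply mirror the unnamed headers in the output above, and whether `gatorPhenotype` depends on those header names is not confirmed here.

```python
import numpy as np
import pandas as pd

# Marker columns, in the same order as the example table.
markers = ['ECAD', 'CD45', 'CD4', 'CD3D', 'CD8A', 'KI67']

# Each entry: (group of cells to classify, phenotype to assign, {marker: rule}).
rows = [
    ('all',    'Immune',           {'CD45': 'anypos', 'CD4': 'anypos',
                                    'CD3D': 'anypos', 'CD8A': 'anypos'}),
    ('all',    'ECAD+',            {'ECAD': 'pos'}),
    ('ECAD+',  'KI67+ ECAD+',      {'KI67': 'pos'}),
    ('Immune', 'CD4+ T',           {'CD4': 'allpos', 'CD3D': 'allpos'}),
    ('Immune', 'CD8+ T',           {'CD3D': 'allpos', 'CD8A': 'allpos'}),
    ('Immune', 'Non T CD4+ cells', {'CD4': 'pos', 'CD3D': 'neg'}),
]

records = []
for parent, label, rules in rows:
    # Header names mirror the unnamed columns of the example CSV (assumption).
    record = {'Unnamed: 0': parent, 'Unnamed: 1': label}
    for marker in markers:
        # Markers without a rule stay empty (NaN), as in the example table.
        record[marker] = rules.get(marker, np.nan)
    records.append(record)

phenotype_sketch = pd.DataFrame(records)
print(phenotype_sketch.fillna(''))
```

Writing the rules as one dictionary per row keeps the empty cells implicit, which makes it harder to misalign a rule with the wrong marker column.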

### To give users maximum flexibility in identifying desired cell types, we have implemented the classification arguments described above. They include

- allpos
- allneg
- anypos
- anyneg
- pos
- neg

### `pos` : "Pos" looks for cells positive for a given marker. If multiple markers are annotated as `pos`, all must be positive to denote the cell type. For example, a Regulatory T cell can be defined as `CD3+CD4+FOXP3+` by passing `pos` to each marker. If one or more markers don't meet the criteria (e.g. CD4-), the program will classify it as `Likely-Regulatory-T cell`, pending user confirmation. This is useful in cases of technical artifacts or when cell types (such as cancer cells) are defined by marker loss (e.g. T-cell Lymphomas).

### `neg` : Same as `pos` but looks for negativity of the defined markers. 

### `allpos` : "Allpos" requires all defined markers to be positive. Unlike `pos`, it doesn't classify cells as `Likely-cellType`, but strictly annotates cells positive for all defined markers.

### `allneg` : Same as `allpos` but looks for negativity of the defined markers. 

### `anypos` : "Anypos" requires only one of the defined markers to be positive. For example, to define macrophages, a cell could be designated as such if any of `CD68`, `CD163`, or `CD206` is positive.

### `anyneg` : Same as `anypos` but looks for negativity of the defined markers. 
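To make the six keywords concrete, here is a simplified sketch of how one row of rules could be checked against a single cell's positive/negative calls. It is not the `gatorpy` implementation: the function `rule_matches` and its inputs are hypothetical, `pos` and `neg` are treated strictly, and the `Likely-cellType` fallback described above is deliberately left out.

```python
def rule_matches(positive, rules):
    """Check one workflow row against one cell (illustrative only).

    positive: dict mapping marker -> bool (True if the cell is called positive)
    rules:    dict mapping marker -> 'pos', 'neg', 'allpos', 'allneg',
              'anypos' or 'anyneg'
    """
    def group(keyword):
        # Positivity calls of all markers annotated with this keyword.
        return [positive[m] for m, rule in rules.items() if rule == keyword]

    checks = [
        all(group('pos')),                      # every `pos` marker is positive
        all(group('allpos')),                   # every `allpos` marker is positive
        all(not p for p in group('neg')),       # every `neg` marker is negative
        all(not p for p in group('allneg')),    # every `allneg` marker is negative
    ]
    anypos, anyneg = group('anypos'), group('anyneg')
    if anypos:
        checks.append(any(anypos))              # at least one is positive
    if anyneg:
        checks.append(any(not p for p in anyneg))  # at least one is negative
    return all(checks)


cell = {'CD4': True, 'CD3D': True, 'CD8A': False}
print(rule_matches(cell, {'CD4': 'allpos', 'CD3D': 'allpos'}))   # CD4+ T row
print(rule_matches(cell, {'CD3D': 'allpos', 'CD8A': 'allpos'}))  # CD8+ T row
print(rule_matches(cell, {'CD4': 'anypos', 'CD8A': 'anypos'}))
```

For this example cell, the first and third checks succeed and the second fails, because CD8A is negative.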

In [5]:
adata = ga.gatorPhenotype ( gatorObject=gatorObject,
                            phenotype=phenotype,
                            midpoint = 0.5,
                            label="phenotype",
                            imageid='imageid',
                            pheno_threshold_percent=None,
                            pheno_threshold_abs=None,
                            fileName=None,
                            outputDir=cwd)

Phenotyping Immune
Phenotyping ECAD+
-- Subsetting ECAD+
Phenotyping KI67+ ECAD+
-- Subsetting Immune
Phenotyping CD4+ T
Phenotyping CD8+ T
Phenotyping Non T CD4+ cells
Consolidating the phenotypes across all groups


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  allpos_score['score'] = allpos_score.max(axis=1)


**The same function can be run via the command line interface**
```
python gatorPhenotype.py --gatorObject /Users/aj/Desktop/gatorExampleData/GATOR/gatorObject/exampleImage_gatorPredict.ome.h5ad --phenotype /Users/aj/Desktop/gatorExampleData/phenotype_workflow.csv --outputDir /Users/aj/Desktop/gatorExampleData
```

#### If `outputDir` is provided, the object is stored in `GATOR/gatorPhenotyped/`; otherwise, the object is returned to memory.

In [6]:
# check the identified phenotypes
adata.obs['phenotype'].value_counts()

KI67+ ECAD+    6159
CD4+ T         5785
CD8+ T          816
Name: phenotype, dtype: int64
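Raw counts are easier to compare across images when expressed as percentages. The sketch below recomputes them from the counts printed above; the variable names are illustrative. On the real object, `adata.obs['phenotype'].value_counts(normalize=True)` should return the same proportions as fractions.

```python
import pandas as pd

# Counts copied from the output above.
counts = pd.Series(
    {'KI67+ ECAD+': 6159, 'CD4+ T': 5785, 'CD8+ T': 816},
    name='phenotype',
)

# Share of each phenotype among all phenotyped cells, in percent.
percent = (counts / counts.sum() * 100).round(1)
print(percent)
```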

#### We also provide some helper functions to visualize the identified positive and negative cells for each marker. 

#### The `addPredictions` function serves as a link between the `gatorpy` and `scimap` packages. It is useful for evaluating model performance. The function transfers results stored in `anndata.uns` to `anndata.obs` so they can be visualized using the `scimap` package's `sm.pl.image_viewer` function, which displays `positive` and `negative` cells overlaid on the raw image.

#### The `addPredictions` function supports two methods. `gatorOutput` displays the result of running the `gator` function, while `gatorScore` shows the raw output produced by the `gatorScore` function, which returns a probability score. When the method is set to `gatorScore`, the `midpoint` parameter (default 0.5) can be adjusted to define what is considered a `positive` result.
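The role of `midpoint` can be illustrated with a small thresholding sketch on made-up probability scores. This is not the `addPredictions` implementation: the scores are invented, treating a score equal to the midpoint as positive is an assumption, and representing the calls as booleans is also an assumption. The `p_` prefix only mirrors the column naming used in `adata.obs`.

```python
import pandas as pd

# Hypothetical probability scores for three cells and two markers.
scores = pd.DataFrame(
    {'CD45': [0.91, 0.12, 0.50], 'ECAD': [0.05, 0.88, 0.49]},
    index=['cell_1', 'cell_2', 'cell_3'],
)

midpoint = 0.5

# Scores at or above the midpoint are called positive (assumed boundary rule).
calls = (scores >= midpoint).add_prefix('p_')
print(calls)
```

Raising `midpoint` makes the positive call more conservative; lowering it makes it more permissive.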

In [10]:
# set the working directory & set paths to the example data
cwd = '/Users/aj/Desktop/gatorExampleData'
# Module specific paths
gatorObject = cwd + '/GATOR/gatorOutput/exampleImage_gatorPredict.ome.h5ad'

adata = ga.addPredictions (gatorObject, 
                    method='gatorOutput',
                    gatorOutput='gatorOutput',
                    gatorScore='gatorScore', 
                    midpoint=0.5,
                    outputDir=cwd + '/GATOR/gatorOutput/')

In [6]:
# check the results
adata.obs.columns

Index(['X_centroid', 'Y_centroid', 'Area', 'MajorAxisLength',
       'MinorAxisLength', 'Eccentricity', 'Solidity', 'Extent', 'Orientation',
       'CellID', 'imageid', 'p_CD45', 'p_CD4', 'p_CD8A', 'p_CD45R', 'p_KI67',
       'p_ECAD', 'p_CD3D'],
      dtype='object')

#### As can be seen, `p_CD45, p_CD4, p_CD8A, p_CD45R, p_KI67, p_ECAD, p_CD3D` have been added to `adata.obs`. These columns can be visualized with `scimap`. 
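To pick out the prediction columns programmatically, one option is to filter on the `p_` prefix. The sketch below uses the column index printed above so it runs on its own; with the real object, `adata.obs.columns` would take the place of `obs_columns`.

```python
import pandas as pd

# Column names copied from the output above.
obs_columns = pd.Index([
    'X_centroid', 'Y_centroid', 'Area', 'MajorAxisLength',
    'MinorAxisLength', 'Eccentricity', 'Solidity', 'Extent', 'Orientation',
    'CellID', 'imageid', 'p_CD45', 'p_CD4', 'p_CD8A', 'p_CD45R', 'p_KI67',
    'p_ECAD', 'p_CD3D',
])

# Columns holding per-marker predictions, and the markers they refer to.
prediction_columns = [c for c in obs_columns if c.startswith('p_')]
predicted_markers = [c[len('p_'):] for c in prediction_columns]
print(prediction_columns)
print(predicted_markers)
```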

## We recommend creating a new environment to install scimap

#### Download and install the `scimap` package in a new conda/python environment.

```
# create new conda env (assuming you have conda installed): executed in the conda command prompt or terminal
conda create --name scimap -y python=3.8
conda activate scimap

```

#### Install `scimap` within the conda environment.

```
pip install scimap

# install jupyter notebook if you want to simply execute this notebook.
pip install notebook

```

### Once `scimap` is installed, the following function can be used to visualize the results

In [None]:
# import
import anndata as ad
import scimap as sm

# import the gatorObject
cwd = '/Users/aj/Desktop/gatorExampleData'
gatorObject = cwd + '/GATOR/gatorOutput/exampleImage_gatorPredict.ome.h5ad'
adata = ad.read_h5ad(gatorObject)

# Path to the raw image
image_path = '/Users/aj/Desktop/gatorExampleData/image/exampleImage.tif'
# overlay the CD45 predictions on the raw image
sm.pl.image_viewer(image_path, adata, overlay='p_CD45')
