# Benign Prostate Cell Atlas

## Introduction
Given an atlas of multiple single-cell RNA-seq samples, this workflow visualizes the cell types, the distribution of cell types within each sample, and the expression of key marker genes. Together these make up the basic Figure 1 of an atlas paper.

Cell atlases should describe the basic cell types found, how they are distributed in individual samples, and the key marker genes that define each cell type. The goal is generally to help future researchers navigate the cell types and expressed genes in similar samples.

Although these plots use benign prostate data, the `.h5ad` file should be easy to swap for any number of atlases from CZI cellxgene or other sources.

## Workflow Steps
1. Visualize UMAP features of cell clustering to determine if cell types are well-separated.
2. Visualize cell type distribution within each sample to detect sample-associated differences.
3. Visualize marker genes for each cell type to validate cell type assignments.

## Workflow Input Data
* Pre-processed AnnData atlas, with:
  * UMAP embeddings in `.obsm`
  * cell type assignment for each cell in `.obs`
  * sample name for each cell in `.obs`
* Mapping of cell type --> list of marker genes. This could come from a subject matter expert, or be generated automatically by a statistics tool or a scverse tool.
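
Before plotting, it can help to confirm that the atlas actually carries these inputs. The following is a minimal sketch, not part of the original workflow. The key names `X_umap`, `cell_type`, and `sample`, the helper `check_atlas_inputs`, and the toy marker lists are all assumptions for illustration; a real atlas may use different names.

```python
def check_atlas_inputs(obs_columns, obsm_keys, var_names, marker_genes,
                       umap_key="X_umap", cell_type_col="cell_type",
                       sample_col="sample"):
    """Return a list of human-readable problems; an empty list means OK.

    All default key names are assumptions; adjust them to the atlas at hand.
    """
    problems = []
    if umap_key not in obsm_keys:
        problems.append(f"missing embedding: .obsm[{umap_key!r}]")
    for col in (cell_type_col, sample_col):
        if col not in obs_columns:
            problems.append(f"missing annotation: .obs[{col!r}]")
    # Every marker gene should be present among the measured genes.
    known_genes = set(var_names)
    for cell_type, genes in marker_genes.items():
        absent = [g for g in genes if g not in known_genes]
        if absent:
            problems.append(f"{cell_type}: markers not in .var_names: {absent}")
    return problems


# Toy stand-ins; with a real atlas these would be
# adata.obs.columns, adata.obsm.keys(), and adata.var_names.
marker_genes = {"luminal": ["KLK3", "NKX3-1"], "basal": ["KRT5", "TP63"]}
problems = check_atlas_inputs(
    obs_columns=["cell_type", "sample"],
    obsm_keys=["X_umap"],
    var_names=["KLK3", "NKX3-1", "KRT5"],
    marker_genes=marker_genes,
)
print(problems)
```

Returning a list of problems rather than raising on the first one makes it easier to see everything that needs fixing when swapping in a different `.h5ad`.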


## Figure A: UMAP of Cell Type and Sample Source

### Questions:
- How are different cell types distributed in our UMAP embedding, and do they form distinct clusters?
- Are there any unexpected mixing patterns between cell types?

### Features:
- Switch the cell point coloring between assigned cell type and sample source of each cell.

### Inputs:
- UMAP coordinates for each cell
- Cell type annotations
- Sample source identifiers
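
The three inputs above can be gathered into one tidy table so that switching the point coloring is just a matter of choosing a different column. This is a sketch under assumptions: the helper `umap_frame` and the column names `cell_type` and `sample` are hypothetical, and the toy arrays stand in for `adata.obsm[...]` and `adata.obs`.

```python
import numpy as np
import pandas as pd


def umap_frame(umap_xy, obs, color_cols=("cell_type", "sample")):
    """Combine UMAP coordinates and per-cell labels into one tidy table."""
    umap_xy = np.asarray(umap_xy)
    if umap_xy.shape != (len(obs), 2):
        raise ValueError("expected one (x, y) row per cell")
    frame = pd.DataFrame(umap_xy, columns=["UMAP1", "UMAP2"], index=obs.index)
    for col in color_cols:
        # Categorical labels map cleanly onto discrete color palettes.
        frame[col] = obs[col].astype("category")
    return frame


# Toy data: three cells, two cell types, two samples.
obs = pd.DataFrame(
    {"cell_type": ["luminal", "basal", "luminal"], "sample": ["s1", "s1", "s2"]},
    index=["cell_a", "cell_b", "cell_c"],
)
xy = np.array([[0.1, 1.2], [3.4, -0.5], [0.3, 1.0]])
frame = umap_frame(xy, obs)
print(frame)
```

Any scatter-plot library can then color by `frame["cell_type"]` or `frame["sample"]`. If the atlas is used with scanpy, `sc.pl.umap(adata, color=...)` draws a comparable view directly from the AnnData object.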

### Expected Output/Evaluation:
Well-defined cell type clusters with minimal batch effects appear as distinct color regions in the cell type view, with sample colors evenly mixed in the sample view.
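
As a rough numeric companion to the visual check, one option is to score how evenly the samples are represented within each cell type. The sketch below uses normalized Shannon entropy; this is an illustrative choice, not a metric from the original workflow, and the helper `sample_mixing` and the column names are assumptions. It looks only at labels, not at positions in UMAP space, and it reaches 1.0 only when samples contribute equally, so unequal sample sizes will lower the score even without batch effects.

```python
import numpy as np
import pandas as pd


def sample_mixing(obs, cell_type_col="cell_type", sample_col="sample"):
    """Normalized Shannon entropy of sample labels within each cell type.

    0.0 means a cell type comes from a single sample; 1.0 means all samples
    contribute equally.
    """
    counts = pd.crosstab(obs[cell_type_col], obs[sample_col])
    probs = counts.div(counts.sum(axis=1), axis=0).to_numpy()
    # Treat 0 * log(0) as 0 by only taking logs of positive probabilities.
    logs = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    entropy = -(probs * logs).sum(axis=1)
    n_samples = counts.shape[1]
    max_entropy = np.log(n_samples) if n_samples > 1 else 1.0
    return pd.Series(entropy / max_entropy, index=counts.index, name="mixing")


# Toy data: type "A" is split evenly across samples, type "B" is not.
obs = pd.DataFrame({
    "cell_type": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "sample":    ["s1", "s1", "s2", "s2", "s1", "s1", "s1", "s1"],
})
mixing = sample_mixing(obs)
print(mixing)
```

A low score flags a cell type worth inspecting in the sample-colored view; it does not by itself distinguish a batch effect from genuine biology.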

![image.png](attachment:0093b7b5-7744-478b-bfbe-2ae93c58d352.png)

## Figure B: Cell Type Distribution Per Sample

### Questions:
- What is the relative abundance of each cell type across samples?
- Are there any notable sample-specific variations?

### Features:
- Linked selection with the Figure A UMAP plot to provide context for where specific sample populations exist in UMAP feature space.
    - [TODO: Add an interaction so that clicking a sample in the bar plot highlights the corresponding UMAP points.]
- Linked selection with the Figure C dot plot to provide context for the marker gene specificity of the selected sample.
- Toggle between absolute and percentage-based cell counts so that samples can be compared without differences in sample size dominating the view.

### Inputs:
- Cell type annotations
- Sample identifiers
- Cell counts per type per sample
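
The counts per type per sample, and the percentage view used by the toggle, can both be derived from the two annotation columns. This is a minimal sketch; the helper `cell_type_counts` and the column names `cell_type` and `sample` are assumptions, and the toy table stands in for `adata.obs`.

```python
import pandas as pd


def cell_type_counts(obs, cell_type_col="cell_type", sample_col="sample",
                     normalize=False):
    """Samples x cell types table of cell counts, or row percentages."""
    counts = pd.crosstab(obs[sample_col], obs[cell_type_col])
    if normalize:
        # Each row (sample) sums to 100, removing the effect of sample size.
        return counts.div(counts.sum(axis=1), axis=0) * 100
    return counts


# Toy data: sample s1 has 4 cells, sample s2 has 2.
obs = pd.DataFrame({
    "sample":    ["s1", "s1", "s1", "s1", "s2", "s2"],
    "cell_type": ["luminal", "luminal", "luminal", "basal", "luminal", "basal"],
})
counts = cell_type_counts(obs)
percent = cell_type_counts(obs, normalize=True)
print(counts)
print(percent)
```

Either table can be passed to a stacked bar plot, for example `counts.plot.bar(stacked=True)` with pandas plotting.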

### Expected Output/Evaluation:
* Allowing for biological variation, look for samples with highly inconsistent cell type proportions.
* Mark outlier samples or cell types for more detailed investigation.
  * For example: if immune cells in particular are off, look for additional marker genes to be added to the set.
* [TODO: Maybe add a metric (chi-square test?) to quantify the consistency of cell type proportions across samples and maybe sort the samples by this metric. There may be metrics in existing atlas papers, otherwise this could be a good quick pub]
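
One way the chi-square idea in the TODO could be explored is to compute each sample's contribution to the Pearson chi-square statistic and sort by it. This is only a sketch of that idea, not an established atlas metric: the helper `chi_square_by_sample` is hypothetical, it assumes every sample and every cell type has at least one cell, and because cells within a sample are not independent observations, the values are better read as a ranking than as a formal test.

```python
import numpy as np
import pandas as pd


def chi_square_by_sample(counts):
    """Per-sample contribution to the Pearson chi-square statistic.

    counts: samples x cell types table of cell counts.
    Returns contributions sorted from most to least inconsistent.
    """
    observed = counts.to_numpy(dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts if every sample shared the pooled proportions.
    expected = row_totals * col_totals / observed.sum()
    contrib = ((observed - expected) ** 2 / expected).sum(axis=1)
    return pd.Series(contrib, index=counts.index,
                     name="chi2_contribution").sort_values(ascending=False)


# Toy data: s3 has a very different luminal/basal split from s1 and s2.
counts = pd.DataFrame(
    {"luminal": [50, 50, 90], "basal": [50, 50, 10]},
    index=["s1", "s2", "s3"],
)
by_sample = chi_square_by_sample(counts)
print(by_sample)
```

The sum of the contributions is the overall chi-square statistic for the table, so the ranking shows which samples drive any departure from the pooled proportions.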

![image.png](attachment:c40c214a-0812-49ec-b8ff-a1d2b570f2d2.png)

## Figure C: Marker Gene Distribution

### Questions:
- Do canonical markers adequately define our cell types?
- Are there any marker genes showing unexpected expression patterns?

### Features:
- Toggle to a heatmap view, which drops the fraction-of-cells-in-group encoding.
- Toggle or tab to a tracksplot, which expands the view to show the expression level of every cell in each assigned cell type cluster.

### Inputs:
- Gene expression matrix
- Cell type annotations
- Marker gene list per cell type

### Expected Output/Evaluation:
Expect high expression of markers in their assigned cell types with minimal expression elsewhere.
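
The two quantities a dot plot encodes, mean expression per group and the fraction of cells in the group expressing the gene, can be computed directly to check this expectation. The sketch below assumes a dense cells x genes table and a detection threshold of 0; the helper `dotplot_summary` and the toy values are hypothetical. With scanpy, `sc.pl.dotplot` draws the equivalent figure from an AnnData object.

```python
import numpy as np
import pandas as pd


def dotplot_summary(expr, cell_types, threshold=0.0):
    """Mean expression and fraction of expressing cells per cell type.

    expr: cells x genes DataFrame of expression values.
    cell_types: one label per cell, in the same order as expr's rows.
    """
    groups = pd.Series(np.asarray(cell_types), index=expr.index,
                       name="cell_type")
    mean_expr = expr.groupby(groups).mean()
    # A cell "expresses" a gene if its value exceeds the threshold.
    frac_expr = (expr > threshold).groupby(groups).mean()
    return mean_expr, frac_expr


# Toy data: KLK3 as a luminal marker, KRT5 as a basal marker.
expr = pd.DataFrame(
    {"KLK3": [4.0, 2.0, 0.0, 1.0], "KRT5": [0.0, 0.0, 3.0, 5.0]},
    index=["c1", "c2", "c3", "c4"],
)
cell_types = ["luminal", "luminal", "basal", "basal"]
mean_expr, frac_expr = dotplot_summary(expr, cell_types)
print(mean_expr)
print(frac_expr)
```

In this toy example KLK3 is detected in half of the basal cells, which is the kind of off-target expression the evaluation above is meant to surface.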

![image.png](attachment:9b88d114-f30f-4da4-92e7-c97d9f22d082.png)

## Putting it all together:

Make an interactive dashboard by laying out and interlinking the plots. Add a template and mark it as servable so it can be deployed as a standalone web application.

[TODO: make app]


## Next Steps

* Compare novel marker genes to expression in HCA or other large sets in cellxgene