# Evaluation of a scoring metric across cell types and cells

## Motivation

Analyses often assign each cell a scalar score for some attribute. In this example, the score is for benign prostatic hyperplasia (BPH) and is a function of the expression of a small number of genes.

The goal is to find which cell types have high scores and to examine those cells more closely.
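The section does not say how the BPH score is computed. As a rough illustration only, the sketch below scores each cell as the mean z-score of a small gene set. The gene names and values are placeholders, and this is a simplified stand-in, not the scoring used here (scanpy's `sc.tl.score_genes`, for example, also subtracts the mean expression of a reference gene set).

```python
from __future__ import annotations

import pandas as pd


def gene_set_score(expr: pd.DataFrame, genes: list[str]) -> pd.Series:
    """Score each cell as the mean z-score of a small gene set.

    expr:  cells x genes table of (e.g. log-normalized) expression.
    genes: hypothetical signature genes; must be columns of expr.
    """
    sub = expr[genes]
    # Standardize each gene across cells so no single gene dominates the score.
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=0)
    # A gene with zero variance yields NaN z-scores; treat it as uninformative.
    z = z.fillna(0.0)
    return z.mean(axis=1).rename("score")


# Toy expression values (hypothetical genes and cells).
expr = pd.DataFrame(
    {
        "GENE_A": [0.0, 1.0, 2.0, 3.0],
        "GENE_B": [1.0, 1.0, 3.0, 3.0],
        "GENE_C": [5.0, 0.0, 0.0, 0.0],
    },
    index=["cell1", "cell2", "cell3", "cell4"],
)
scores = gene_set_score(expr, ["GENE_A", "GENE_B"])
```

Because every gene is centered before averaging, the scores are relative to the cells in `expr`. Scoring a subset on its own would shift them.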

## Overview of Workflow Steps

1. Select the cell types with the strongest signal
2. Reanalyze the subset of data, looking for finer-grained cell types and associated marker genes
3. Evaluate the score across the new cell types
4. Evaluate the distribution of these new cell types across patient samples, looking for clinical factors that correlate with cell type prevalence.

## Input

* A pre-processed AnnData atlas with:
  * `.obs` columns for
    * the score of each cell (BPH in these figures)
    * the cell type of each cell
    * the sample name of each cell
  * a `.uns` entry with clinical attributes for each sample (here, prostate size)
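A quick layout check can catch a missing field before any plotting starts. This is a minimal sketch that operates on `adata.obs` and `adata.uns`. The column names (`bph_score`, `cell_type`, `sample`) and the `.uns` key (`sample_clinical`, assumed to hold a DataFrame indexed by sample name) are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical names: adjust to the columns and keys used in the atlas.
REQUIRED_OBS = ["bph_score", "cell_type", "sample"]
CLINICAL_KEY = "sample_clinical"  # assumed: DataFrame indexed by sample name


def check_inputs(obs: pd.DataFrame, uns: dict) -> list:
    """Return a list of layout problems; an empty list means the inputs look usable."""
    problems = [f"missing .obs column: {c}" for c in REQUIRED_OBS if c not in obs.columns]
    if CLINICAL_KEY not in uns:
        problems.append(f"missing .uns entry: {CLINICAL_KEY}")
    elif "sample" in obs.columns:
        clinical = pd.DataFrame(uns[CLINICAL_KEY])
        # Every sample that has cells should also have a clinical record.
        missing = sorted(set(obs["sample"]) - set(clinical.index))
        if missing:
            problems.append(f"samples without clinical data: {missing}")
    return problems


# Toy tables standing in for adata.obs and adata.uns.
obs = pd.DataFrame({
    "bph_score": [0.1, 0.9, 0.4],
    "cell_type": ["Fibroblast", "Smooth_Muscle", "Fibroblast"],
    "sample": ["S1", "S1", "S2"],
})
uns = {CLINICAL_KEY: pd.DataFrame({"prostate_size_g": [80.0, 120.0]}, index=["S1", "S2"])}

ok = check_inputs(obs, uns)                          # []
bad = check_inputs(obs.drop(columns="sample"), {})   # two problems reported
```

With a real atlas, the call would be `check_inputs(adata.obs, adata.uns)`.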

## Figure A: Violin plots of signature across cell types

### Questions:
* Which cell types show particularly high or low levels of signal?
* Does the signal distinguish cell types?

### Expected output/evaluation
* Action by the user: selection of cell types for further analysis.
* In this example, "Fibroblast" and "Smooth_Muscle" are selected.

![image.png](attachment:6fac2f60-7d81-488f-b81e-28cdfccb1066.png)
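In scanpy, the violin plot itself could be drawn with `sc.pl.violin(adata, keys="bph_score", groupby="cell_type")`. The minimal pandas sketch below computes a numeric summary to support the selection. Column names are hypothetical, and the selected types simply mirror the choice in the text. Once chosen, the subset would be taken with `adata[adata.obs["cell_type"].isin(selected)].copy()`.

```python
import pandas as pd


def summarize_score_by_type(obs, score_col="bph_score", type_col="cell_type"):
    """Per-cell-type summary of the score, highest median first."""
    return (
        obs.groupby(type_col, observed=True)[score_col]
        .agg(n_cells="size", median_score="median", mean_score="mean")
        .sort_values("median_score", ascending=False)
    )


# Toy data standing in for adata.obs.
obs = pd.DataFrame({
    "cell_type": ["Fibroblast"] * 3 + ["Smooth_Muscle"] * 2 + ["T_cell"] * 2,
    "bph_score": [0.8, 0.9, 0.7, 0.6, 0.5, 0.1, 0.2],
})
summary = summarize_score_by_type(obs)

# The selection is a user decision; this mirrors the example in the text.
selected = ["Fibroblast", "Smooth_Muscle"]
in_subset = obs["cell_type"].isin(selected)
```

The median is used for ranking because it is less sensitive than the mean to a few extreme scores.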

## Figure B: Cluster the subset, derive cluster names

### Computational steps performed outside the visualization:

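1) Subset the data to the selected cell types and recluster it
2) Run differential gene expression between the new clusters
3) Select key marker genes per cluster, manually or automatically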
2) differential gene expression of the clusters
3) manual/automatic selection of key marker genes per cluster
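In scanpy, these steps would typically use `sc.pp.neighbors`, `sc.tl.umap`, and `sc.tl.leiden` on the subset, followed by `sc.tl.rank_genes_groups` for differential expression. To show the idea behind step 3 without those dependencies, the sketch below substitutes a much simpler one-vs-rest ranking: genes are ordered by the gain in mean expression in a cluster relative to all other cells. It is not a statistical test. The gene names are real markers, but the values are invented.

```python
import pandas as pd


def top_markers(expr: pd.DataFrame, clusters: pd.Series, n: int = 2) -> dict:
    """For each cluster, the n genes with the largest mean-expression gain vs. all other cells."""
    clusters = clusters.reindex(expr.index)  # align labels to the expression rows
    markers = {}
    for cl in sorted(clusters.unique()):
        in_cl = (clusters == cl).to_numpy()
        diff = expr.loc[in_cl].mean(axis=0) - expr.loc[~in_cl].mean(axis=0)
        markers[cl] = diff.sort_values(ascending=False).index[:n].tolist()
    return markers


# Toy values; the gene names are real markers but the numbers are invented.
expr = pd.DataFrame(
    {"ACTA2": [5.0, 4.0, 0.0, 1.0], "DCN": [0.0, 1.0, 6.0, 5.0], "MYH11": [3.0, 3.0, 0.0, 0.0]},
    index=["c1", "c2", "c3", "c4"],
)
clusters = pd.Series(["0", "0", "1", "1"], index=expr.index, name="cluster")
markers = top_markers(expr, clusters, n=1)
```

A real analysis would also filter genes by effect size and adjusted p-value before naming clusters.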

### Paired visualization
* UMAP of the subset, colored by cluster
* Dot plot of the key marker genes per cluster
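The dot plot is typically drawn with `sc.pl.dotplot(adata, var_names=..., groupby=...)`. As we understand it, dot size encodes the fraction of cells in a group that express a gene, and color encodes mean expression. The sketch below computes those two quantities directly, which is useful for checking or exporting the numbers behind the plot. The cluster labels and values are toy placeholders.

```python
import pandas as pd


def dotplot_table(expr: pd.DataFrame, clusters: pd.Series, genes: list):
    """Return (fraction expressing, mean expression) as clusters x genes tables."""
    sub = expr[genes]
    # Fraction of cells in each cluster with nonzero expression (dot size).
    frac = (sub > 0).groupby(clusters).mean()
    # Mean expression over all cells in the cluster (dot color).
    mean_expr = sub.groupby(clusters).mean()
    return frac, mean_expr


expr = pd.DataFrame(
    {"ACTA2": [5.0, 4.0, 0.0, 1.0], "DCN": [0.0, 1.0, 6.0, 5.0]},
    index=["c1", "c2", "c3", "c4"],
)
clusters = pd.Series(["0", "0", "1", "1"], index=expr.index, name="cluster")
frac, mean_expr = dotplot_table(expr, clusters, ["ACTA2", "DCN"])
```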

### Questions:
* What genes define the clusters?
* What cell types contribute most to each cluster?
* What is a descriptive name for the finer cell types?
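The question of which original cell types contribute most to each cluster can be answered with a row-normalized cross-tabulation. This is a minimal sketch on toy labels; `cluster` and `cell_type` are hypothetical column names.

```python
import pandas as pd

# Toy labels standing in for adata.obs of the reclustered subset.
obs = pd.DataFrame({
    "cell_type": ["Fibroblast", "Fibroblast", "Smooth_Muscle", "Smooth_Muscle", "Fibroblast"],
    "cluster": ["0", "0", "1", "1", "1"],
})

# Row-normalized: fraction of each new cluster coming from each original cell type.
composition = pd.crosstab(obs["cluster"], obs["cell_type"], normalize="index")

# The original cell type that contributes most to each cluster.
dominant = composition.idxmax(axis=1)
```

A cluster drawn almost entirely from one original type suggests a subtype of it. A mixed cluster may instead reflect a shared state across types.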

### Expected output
* A new categorical `.obs` column holding the new cell type names
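Once the clusters have been interpreted, the names can be stored as a categorical column. In this sketch, the cluster-to-name mapping and the column name `fine_cell_type` are hypothetical placeholders, not the names chosen in this analysis. The final check flags any cluster that was left unnamed.

```python
import pandas as pd

# Toy cluster labels standing in for adata.obs["cluster"].
obs = pd.DataFrame({"cluster": ["0", "1", "1", "2"]})

# Hypothetical names assigned after reviewing marker genes.
cluster_names = {"0": "Fibroblast_1", "1": "Fibroblast_2", "2": "Smooth_Muscle_1"}

# Map cluster IDs to names and store them as a categorical factor.
obs["fine_cell_type"] = obs["cluster"].map(cluster_names).astype("category")

# Any cluster missing from the mapping would show up as NaN here.
n_unnamed = int(obs["fine_cell_type"].isna().sum())
```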

![image.png](attachment:e2b45efc-31cd-4b8c-9776-522bde9c1b8d.png)
![image.png](attachment:e69a6d5d-d7fe-46af-9ebe-0e1758b41958.png)

## Figure C: Evaluate the new cell types against the original BPH signal

### Questions:
* How much of the variance in the BPH score do the new clusters explain?
* Which cell types have particularly high or low signal?

### Features
* Swappable UMAP point color: BPH score, new clusters, or original cell types

### Evaluation/Output
* Did the new clustering add explanatory power, or were the original cell types just as useful?

![image.png](attachment:6529c28d-5297-43b0-a435-6f925c41a761.png)
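One way to quantify "explanatory power" is eta squared, $\eta^2 = SS_\text{between} / SS_\text{total}$. This is the fraction of score variance accounted for by group means. The sketch below computes it for the original and the new labels on toy data. Using $\eta^2$ is our assumption, not a method stated in the text.

```python
import pandas as pd


def eta_squared(score: pd.Series, labels: pd.Series) -> float:
    """Fraction of score variance explained by group membership (SS_between / SS_total)."""
    grand_mean = score.mean()
    # Each cell's group mean; summing its squared deviation gives SS_between.
    group_means = score.groupby(labels).transform("mean")
    ss_total = ((score - grand_mean) ** 2).sum()
    ss_between = ((group_means - grand_mean) ** 2).sum()
    return float(ss_between / ss_total)


# Toy scores and labels for four cells of the subset.
idx = ["c1", "c2", "c3", "c4"]
score = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
original = pd.Series(["Fibroblast", "Smooth_Muscle", "Fibroblast", "Smooth_Muscle"], index=idx)
new = pd.Series(["Fibroblast_1", "Fibroblast_1", "Fibroblast_2", "Fibroblast_2"], index=idx)

eta_original = eta_squared(score, original)
eta_new = eta_squared(score, new)
```

More groups can raise $\eta^2$ by themselves, so a small gain from a finer clustering may not be meaningful.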

## Figure D: Evaluate the subtypes against samples and clinical attributes of the samples

### Questions:
* In general, how does the new cell typing correlate with other clinical factors in the atlas?
* Does each new cell type appear in all samples, or only in some?
* Does the abundance of any new cell type correlate with prostate hyperplasia (prostate size > 100 g)?
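The sketch below addresses the last two questions. It computes each sample's cell type composition within the subset, counts the samples in which each type appears, and compares composition against prostate size. Large is defined as above 100 g, and the Spearman correlation is computed as the Pearson correlation of ranks. Column names and data are hypothetical. With few samples, these numbers are descriptive only.

```python
import pandas as pd


def sample_composition(obs, type_col="fine_cell_type", sample_col="sample"):
    """Fraction of each sample's subset cells that belong to each new cell type."""
    return pd.crosstab(obs[sample_col], obs[type_col], normalize="index")


def size_association(composition, clinical, size_col="prostate_size_g", threshold=100.0):
    """Per cell type: sample presence, mean fraction in large vs. small prostates, Spearman rho."""
    clin = clinical.reindex(composition.index)
    size = clin[size_col]
    large = size > threshold
    rows = {}
    for ct in composition.columns:
        frac = composition[ct]
        rows[ct] = {
            "n_samples_present": int((frac > 0).sum()),
            "mean_frac_large": frac[large].mean(),
            "mean_frac_small": frac[~large].mean(),
            # Spearman correlation = Pearson correlation of the ranks.
            "spearman_vs_size": frac.rank().corr(size.rank()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data: new cell types A and B across four samples.
obs = pd.DataFrame({
    "sample": ["S1"] * 3 + ["S2"] * 2 + ["S3"] * 4 + ["S4"] * 2,
    "fine_cell_type": ["A", "A", "B", "A", "B", "B", "B", "B", "A", "B", "B"],
})
clinical = pd.DataFrame({"prostate_size_g": [60.0, 80.0, 120.0, 150.0]},
                        index=["S1", "S2", "S3", "S4"])

composition = sample_composition(obs)
result = size_association(composition, clinical)
```

Fractions are relative to cells in the subset, not to all cells in each sample, so a shift in another lineage can change them.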

### Features:
* Todo: can interactivity improve this plot pair?

### Evaluation/Expected Output
* If the new subtypes show an interesting clinical signal, generate static figures for the manuscript
* Potentially propagate the new cell type partitioning back to the full dataset

![image.png](attachment:3df0c10d-c33f-4422-bc09-6508fd8105e0.png)
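Propagating the subset labels back to the full atlas amounts to a join on cell IDs. AnnData subsetting keeps `obs_names`, so `adata_sub.obs` can be matched against `adata.obs`. This minimal sketch keeps the original cell type for cells outside the subset. The column names are hypothetical.

```python
import pandas as pd


def propagate_labels(full_obs, sub_labels, base_col="cell_type", new_col="fine_cell_type"):
    """Write subset labels onto the full atlas; cells outside the subset keep their original type."""
    unknown = ~sub_labels.index.isin(full_obs.index)
    if unknown.any():
        raise ValueError(f"{unknown.sum()} subset cells are not in the full atlas")
    out = full_obs.copy()
    # Work in object dtype so filling from another column cannot clash with categories.
    fine = sub_labels.astype(object).reindex(out.index)
    out[new_col] = fine.fillna(out[base_col].astype(object)).astype("category")
    return out


# Toy tables standing in for adata.obs and adata_sub.obs["fine_cell_type"].
full_obs = pd.DataFrame(
    {"cell_type": ["Fibroblast", "T_cell", "Smooth_Muscle", "Epithelial"]},
    index=["c1", "c2", "c3", "c4"],
)
sub_labels = pd.Series(["Fibroblast_1", "Smooth_Muscle_1"], index=["c1", "c3"])
result = propagate_labels(full_obs, sub_labels)
```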