## Reading Flow Cytometry data into R

v1.0 (2021-03-29)  
Lucas Graybuck  

### Purpose

In this notebook, we'll see how to read gating counts and flow cytometry .fcs files into R for analysis.

This notebook was generated using the `R` language, running in the Jupyter Notebook environment on a HISE IDE instance. See the [Session Info](#Session-Info) at the end of the document for additional software version details.

<a name = "contents"></a>

### Contents

- [Importing packages](#Importing-packages)
- [Reading gating counts](#Reading-gating-counts)
    - [Counts from hise](#Counts-from-hise)
- [Reading .fcs files](#Reading-.fcs-files)
- [Session Info](#Session-Info)

### Importing packages

Gating counts are stored as .csv files, so we can process them with base R functions. We'll also load the `hise` package to demonstrate reading these results directly from HISE.

For .fcs files, we'll use the `flowCore` package to read data.

In [1]:
library(hise)
library(flowCore)

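Because gating counts are plain .csv files, base R is enough to load one from disk. The following is a minimal sketch, assuming a file with the columns `name`, `Population`, `Parent`, `Count`, and `ParentCount`, which is the layout of the gate count tables later in this notebook. The temporary file is only a stand-in for a real download.

```r
# Write a two-row stand-in for a gating counts file to a temporary location.
# The column layout is an assumption based on the gate count tables in this notebook.
example_csv <- tempfile(fileext = ".csv")
writeLines(
    c("name,Population,Parent,Count,ParentCount",
      "B006_PS1_PB00068-01_QC.fcs,/Cells,root,392441,418976",
      "B006_PS1_PB00068-01_QC.fcs,/Cells/Singlets-H,/Cells,366205,392441"),
    example_csv
)

# Read it back with base R, keeping text columns as character rather than factor
example_counts <- read.csv(example_csv, stringsAsFactors = FALSE)
str(example_counts)
```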
### Reading gating counts

As part of our automated flow cytometry pipelines, we count cells that are found in a set of predefined gates. These can be read in directly using `hise`.

#### Counts from `hise`

As shown in the notebook `01-R Retrieving data from HISE.ipynb`, we can use the `getFileDescriptors()` function to locate gating count files. Here, we filter for cohort CU1 and panel PS1:

In [2]:
cu1_filter_list <- list(
    cohort.cohortGuid = "CU1",
    file.panel = "PS1"
)

cu1_count_desc <- getFileDescriptors(
    fileType = "FlowCytometry-supervised-stats", 
    filter = cu1_filter_list)

In [3]:
length(cu1_count_desc)

We can then read in these files using `readCytometryFile()`:

In [4]:
cu1_count_list <- lapply(
    cu1_count_desc,
    function(desc) {
        readCytometryFile(desc$file$id,
                          format = "values")
    }
)

We now have a list in which every entry is a `data.frame` of gate counts for one file.

In [5]:
head(cu1_count_list[[3]])

| | name | Population | Parent | Count | ParentCount |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<int>` |
| 1 | B006_PS1_PB00068-01_QC.fcs | /Cells | root | 392441 | 418976 |
| 2 | B006_PS1_PB00068-01_QC.fcs | /Cells/Singlets-H | /Cells | 366205 | 392441 |
| 3 | B006_PS1_PB00068-01_QC.fcs | /Cells/Singlets-H/Singlets-W | /Cells/Singlets-H | 344261 | 366205 |
| 4 | B006_PS1_PB00068-01_QC.fcs | /Cells/Singlets-H/Singlets-W/Cleanup | /Cells/Singlets-H/Singlets-W | 343465 | 344261 |
| 5 | B006_PS1_PB00068-01_QC.fcs | /Cells/Singlets-H/Singlets-W/Cleanup/Non-viable | /Cells/Singlets-H/Singlets-W/Cleanup | 15480 | 343465 |
| 6 | B006_PS1_PB00068-01_QC.fcs | /Cells/Singlets-H/Singlets-W/Cleanup/Viable | /Cells/Singlets-H/Singlets-W/Cleanup | 327985 | 343465 |

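Each row pairs a gate's `Count` with the `ParentCount` of the gate it was drawn from, so the fraction of the parent population follows directly. The sketch below uses the first three rows of the table above. The `FractionOfParent` column name is a hypothetical choice for illustration.

```r
# First rows of the gate count table shown above
gate_counts <- data.frame(
    Population  = c("/Cells", "/Cells/Singlets-H", "/Cells/Singlets-H/Singlets-W"),
    Parent      = c("root", "/Cells", "/Cells/Singlets-H"),
    Count       = c(392441L, 366205L, 344261L),
    ParentCount = c(418976L, 392441L, 366205L),
    stringsAsFactors = FALSE
)

# Fraction of each parent gate's events that fall into the child gate
gate_counts$FractionOfParent <- gate_counts$Count / gate_counts$ParentCount

gate_counts[, c("Population", "Count", "ParentCount", "FractionOfParent")]
```

Every child gate is a subset of its parent, so these fractions should never exceed 1.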

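A list of per-file tables can also be stacked into one long `data.frame`, which is convenient for filtering or plotting across files. This is a minimal sketch on a toy list; `toy_count_list` and `toy_counts_long` are illustrative names, and the counts are copied from the tables in this notebook.

```r
# Toy stand-in for a list of per-file gate count tables: two files, two gates each
toy_count_list <- list(
    data.frame(name = "B006_PS1_PB00067-01_QC.fcs",
               Population = c("/Cells", "/Cells/Singlets-H"),
               Count = c(233728L, 216385L),
               stringsAsFactors = FALSE),
    data.frame(name = "B006_PS1_PB00068-01_QC.fcs",
               Population = c("/Cells", "/Cells/Singlets-H"),
               Count = c(392441L, 366205L),
               stringsAsFactors = FALSE)
)

# Stack the per-file tables into a single long data.frame
toy_counts_long <- do.call(rbind, toy_count_list)

# Split counts by file to confirm that each file contributes its own rows
split(toy_counts_long$Count, toy_counts_long$name)
```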
For downstream analysis, we may want to convert these results to a matrix:

In [6]:
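# One row per file, one column per gated population.
# This assumes every file reports the same populations in the same order.
cu1_count_matrix <- matrix(nrow = length(cu1_count_list),
                           ncol = nrow(cu1_count_list[[1]]))
# Strip parent gates from each gating path to get short column names
# (the top-level "/Cells" has no parent and is left unchanged)
colnames(cu1_count_matrix) <- sub(".+/", "", cu1_count_list[[1]]$Population)
# Use each table's .fcs file name as the row name
rownames(cu1_count_matrix) <- vapply(cu1_count_list, function(counts) { counts$name[1] }, character(1))

for (i in seq_along(cu1_count_list)) {
    cu1_count_matrix[i, ] <- cu1_count_list[[i]]$Count
}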

In [7]:
cu1_count_matrix[1:5,1:10]

| | /Cells | Singlets-H | Singlets-W | Cleanup | Non-viable | Viable | Leukocytes | CD19+ Cells | B Cells | CD11c+ B Cells |
|---|---|---|---|---|---|---|---|---|---|---|
| B006_PS1_PB00067-01_QC.fcs | 233728 | 216385 | 206407 | 205398 | 19726 | 185672 | 184902 | 27261 | 26436 | 549 |
| B006_PS1_PB00090-01_QC.fcs | 321007 | 299096 | 278908 | 278638 | 12633 | 266005 | 265576 | 14251 | 13773 | 517 |
| B006_PS1_PB00068-01_QC.fcs | 392441 | 366205 | 344261 | 343465 | 15480 | 327985 | 325936 | 16697 | 15756 | 1098 |
| B006_PS1_PB00073-01_QC.fcs | 366031 | 346260 | 318704 | 317998 | 15387 | 302611 | 301891 | 23191 | 22303 | 508 |
| B009_PS1_PB00095-01_QC.fcs | 242446 | 227653 | 217468 | 217115 | 14102 | 203013 | 201182 | 7293 | 6996 | 598 |

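Filling the matrix row by row relies on every file listing its populations in the same order. If that is not guaranteed, counts can be aligned by population name instead. The sketch below uses `match()` on a toy list in which the second file is deliberately reordered. The sample names and counts are invented for illustration.

```r
# Toy stand-in for the list of count tables; the second file lists its gates in a different order
toy_count_list <- list(
    data.frame(name = "sample_A.fcs",
               Population = c("/Cells", "/Cells/Singlets-H"),
               Count = c(100L, 90L),
               stringsAsFactors = FALSE),
    data.frame(name = "sample_B.fcs",
               Population = c("/Cells/Singlets-H", "/Cells"),
               Count = c(180L, 200L),
               stringsAsFactors = FALSE)
)

# Use the first file's populations as the reference column order
reference_pops <- toy_count_list[[1]]$Population

# For each file, look up counts by population name; gates missing from a file become NA
toy_count_matrix <- do.call(rbind, lapply(
    toy_count_list,
    function(counts) counts$Count[match(reference_pops, counts$Population)]
))
colnames(toy_count_matrix) <- sub(".+/", "", reference_pops)
rownames(toy_count_matrix) <- vapply(toy_count_list, function(counts) counts$name[1], character(1))

toy_count_matrix
```

Matching on the full gating path, and shortening the names only afterwards, avoids mixing up two gates that share the same final name under different parents.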

We can then normalize each sample's counts to a fraction of its "Leukocytes" count, and drop the six upstream QC gate columns (`/Cells` through `Viable`):

In [8]:
cu1_norm_matrix <- apply(
    cu1_count_matrix,
    1, # 1 for Row-wise analysis
    function(counts) {
        counts / counts[colnames(cu1_count_matrix) == "Leukocytes"]
    }
)
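# apply() returns one column per input row, so transpose to restore samples as rows
cu1_norm_matrix <- t(cu1_norm_matrix)
# Drop the first six QC gate columns, keeping "Leukocytes" (column 7) onward
cu1_norm_matrix <- cu1_norm_matrix[, 7:ncol(cu1_norm_matrix)]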

In [9]:
cu1_norm_matrix[1:5,1:10]

| | Leukocytes | CD19+ Cells | B Cells | CD11c+ B Cells | CD123+ B Cells | CD27-IgD- B Cells | Naive B Cells | Plasmablasts | Post Switch Memory B Cells | Pre Switch Memory B Cells |
|---|---|---|---|---|---|---|---|---|---|---|
| B006_PS1_PB00067-01_QC.fcs | 1 | 0.14743486 | 0.14297303 | 0.00296914 | 0.0027852592 | 0.002055143 | 0.12781365 | 0.0010708375 | 0.007019935 | 0.006084304 |
| B006_PS1_PB00090-01_QC.fcs | 1 | 0.05366072 | 0.05186086 | 0.001946712 | 0.0114204597 | 0.002180167 | 0.02237778 | 0.0003351206 | 0.012956743 | 0.014346176 |
| B006_PS1_PB00068-01_QC.fcs | 1 | 0.05122785 | 0.04834078 | 0.00336876 | 0.0011321241 | 0.002055618 | 0.02379608 | 0.0011229198 | 0.013005621 | 0.009483457 |
| B006_PS1_PB00073-01_QC.fcs | 1 | 0.07681912 | 0.07387766 | 0.001682727 | 0.0115372767 | 0.001835099 | 0.04389995 | 0.0007850516 | 0.022726746 | 0.005415862 |
| B009_PS1_PB00095-01_QC.fcs | 1 | 0.03625076 | 0.03477448 | 0.002972433 | 0.0005964748 | 0.001451422 | 0.01934567 | 0.0007704467 | 0.006581106 | 0.007396288 |

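The same normalization can also be written without `apply()` and `t()`. Dividing a matrix by a vector that has one value per row recycles that vector down the columns, so each row is divided by its own denominator. The sketch below is an alternative to the approach above, shown on two samples and three gates taken from the count matrix earlier in this section. `toy_counts` and `toy_norm` are illustrative names.

```r
# Two samples and three gates taken from the count matrix shown earlier
toy_counts <- matrix(
    c(185672, 184902, 27261,
      266005, 265576, 14251),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("B006_PS1_PB00067-01_QC.fcs", "B006_PS1_PB00090-01_QC.fcs"),
                    c("Viable", "Leukocytes", "CD19+ Cells"))
)

# Dividing by a vector with one value per row scales each row by its own
# Leukocytes count, because R recycles the vector down the columns
toy_norm <- toy_counts / toy_counts[, "Leukocytes"]

# Find the first column to keep by name rather than hard-coding its position
first_keep <- match("Leukocytes", colnames(toy_norm))
toy_norm <- toy_norm[, first_keep:ncol(toy_norm), drop = FALSE]

round(toy_norm, 4)
```

Looking up the "Leukocytes" column by name keeps the column selection correct even if the number of upstream QC gates changes.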

### Reading .fcs files

In the notebook `01-R Retrieving data from HISE.ipynb`, we searched for .fcs files and retrieved them to the cache.

Here, we'll load one of these files into R using `flowCore` for analysis:

In [10]:
cache_file_info <- read.csv("cache_info.csv")

In [11]:
fcs_file_ids <- cache_file_info$file.id[cache_file_info$file.fileType == "FlowCytometry"]

In [12]:
fcs_file_ids

In [13]:
fcs_file <- list.files(paste0("cache/", fcs_file_ids[1]), full.names = TRUE)
fcs_file

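`list.files()` returns every file in the directory, so the result can hold zero paths or several. The sketch below adds a guard, assuming each cache directory is expected to contain exactly one .fcs file (an assumption, not something this notebook states). `find_single_fcs()` is a hypothetical helper, and the temporary directory stands in for a cache folder.

```r
# Hypothetical helper: return the single .fcs file in a directory, or stop with an error
find_single_fcs <- function(dir) {
    fcs_paths <- list.files(dir, pattern = "\\.fcs$", full.names = TRUE, ignore.case = TRUE)
    if (length(fcs_paths) != 1) {
        stop("Expected exactly one .fcs file in ", dir, ", found ", length(fcs_paths))
    }
    fcs_paths
}

# Demonstrate on a temporary directory that stands in for a cache folder
demo_dir <- file.path(tempdir(), "demo_cache")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "example.fcs"))

basename(find_single_fcs(demo_dir))
```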
In [14]:
fcs_data <- read.FCS(fcs_file)

In [15]:
fcs_data

flowFrame object 'd967c76b-60be-434c-9f46-734f8357c7e4'
with 381484 cells and 32 observables:
                  name      desc  range minRange maxRange
$P1              FSC-A      <NA> 262144        0   262144
$P2              FSC-H      <NA> 262144        0   262144
$P3              FSC-W      <NA> 262144        0   262144
$P4              SSC-A      <NA> 262144        0   262144
$P5              SSC-H      <NA> 262144        0   262144
$P6              SSC-W      <NA> 262144        0   262144
$P7           BUV395-A       CD3 262144     -111   262144
$P8           BUV496-A      CD45 262144     -111   262144
$P9           BUV563-A      CD15 262144     -111   262144
$P10          BUV615-A    CD45RA 262144     -111   262144
$P11          BUV661-A      CD14 262144     -111   262144
$P12          BUV737-A       CD8 262144     -111   262144
$P13          BUV805-A     CD11c 262144     -111   262144
$P14           BV421-A      CD25 262144     -111   262144
$P15           BV480-A       CD4 262

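The printout above pairs each detector channel (`name`) with the marker it measures (`desc`); scatter channels such as FSC-A have no marker. The sketch below builds readable labels from such a table, using a few rows copied from the printout. It assumes the parameter table is available as a `data.frame`, which in `flowCore` can be obtained with `pData(parameters(fcs_data))`. The `label` column is a hypothetical addition.

```r
# A few rows copied from the flowFrame printout above
param_table <- data.frame(
    name = c("FSC-A", "SSC-A", "BUV395-A", "BUV496-A", "BV421-A"),
    desc = c(NA, NA, "CD3", "CD45", "CD25"),
    stringsAsFactors = FALSE
)

# Use the marker name where one is recorded, and fall back to the channel name otherwise
param_table$label <- ifelse(is.na(param_table$desc), param_table$name, param_table$desc)

param_table
```

With a real `flowFrame`, labels like these could be used as column names for the event matrix returned by `exprs(fcs_data)`.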
### Session Info

In [16]:
sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] flowCore_2.0.1 hise_1.0.3    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6          BiocGenerics_0.36.0 uuid_0.1-4         
 [4] R6_2.5.0            rlang_0.4.10        fansi_0.4.2        
 [7] httr_1.4.2          tools_4.0.2         parallel_4.0.2     
[10] Biobase_2.50.0      utf8_1.2.1      