## Selection of progenitor cells from Raj 2020 datasets

Here I load the previously downloaded datasets from Raj et al. 2020 (Raj, B. et al. Emergence of Neuronal Diversity during Vertebrate Brain Development. Neuron 1–17 (2020)), which were deposited as Seurat objects, and manually select the clusters of interest for each timepoint. I focus on the late embryonic and larval progenitors, which I want to compare with the adult radial glia, so I merge them into a single object and save it to one file here.
(I previously ran this as part of the Harmony integration; it is now saved as a separate script.)

In [None]:
library(Seurat)
library(ggplot2)
library(dplyr)
library(RColorBrewer)

### Load and update Seurat objects

In [5]:
# load Raj 2020 datasets - separate for each timepoint
# omit the early embryonic datasets, which have low overlap of cell types

larvae_2dpf_in <- readRDS(file = "/local/Nina/download_data/raj_2020/GSE158142_zf2dpf_cc_filt.cluster.rds")
larvae_3dpf_in <- readRDS(file = "/local/Nina/download_data/raj_2020/GSE158142_zf3dpf_cc_filt.cluster.rds")
larvae_5dpf_in <- readRDS(file = "/local/Nina/download_data/raj_2020/GSE158142_zf5dpf_cc_filt.cluster.rds")
larvae_8dpf_in <- readRDS(file = "/local/Nina/download_data/raj_2020/GSE158142_zf8dpf_cc_filt.cluster4.rds")
larvae_15dpf_in <- readRDS(file = "/local/Nina/download_data/raj_2020/GSE158142_zf15dpf_PCAALL.rds")

In [6]:
# datasets were deposited in old Seurat version, update to current
larvae_2dpf <- SeuratObject::UpdateSeuratObject(larvae_2dpf_in)
larvae_3dpf <- SeuratObject::UpdateSeuratObject(larvae_3dpf_in)
larvae_5dpf <- SeuratObject::UpdateSeuratObject(larvae_5dpf_in)
larvae_8dpf <- SeuratObject::UpdateSeuratObject(larvae_8dpf_in)
larvae_15dpf <- SeuratObject::UpdateSeuratObject(larvae_15dpf_in)

Updating from v2.X to v3.X

Validating object structure

Updating object slots

Ensuring keys are in the proper strucutre

Ensuring feature names don't have underscores or pipes

Object representation is consistent with the most current Seurat version

Updating from v2.X to v3.X

Validating object structure

Updating object slots

Ensuring keys are in the proper strucutre

Ensuring feature names don't have underscores or pipes

Object representation is consistent with the most current Seurat version

Updating from v2.X to v3.X

Validating object structure

Updating object slots

Ensuring keys are in the proper strucutre

Ensuring feature names don't have underscores or pipes

Object representation is consistent with the most current Seurat version

Updating from v2.X to v3.X

Validating object structure

Updating object slots

Ensuring keys are in the proper strucutre

Ensuring feature names don't have underscores or pipes

Object representation is consistent with the most current Seurat version

In [7]:
rm(larvae_2dpf_in, larvae_3dpf_in, larvae_5dpf_in,
  larvae_8dpf_in, larvae_15dpf_in)

### Add cell type annotation to metadata

Since cell type annotation is not included in the Seurat object, I add it from the information provided in the supplemental table.

In [8]:
# first add a stage label to each object's metadata; the later integration will be run across this column
larvae_2dpf$stage <- "larvae_2dpf"
larvae_3dpf$stage <- "larvae_3dpf"
larvae_5dpf$stage <- "larvae_5dpf"
larvae_8dpf$stage <- "larvae_8dpf"
larvae_15dpf$stage <- "larvae_15dpf"

In [9]:
# load file with cell type annotation for all stages & subset by stage to avoid mistakes / name overlaps when adding names

larvae_annotation <- read.csv(file = "/local/Nina/jupyterlab/larvae_adult_int/raj_2020_assigned_clusters.csv",
                             sep = ';')

In [10]:
head(larvae_annotation)

  nr  stage cluster              assigned_cell_type
1  1 12 hpf       0                   optic vesicle
2  2 12 hpf       1                   optic vesicle
3  3 12 hpf       2                        midbrain
4  4 12 hpf       4                    neural crest
5  5 12 hpf       5 mid-hind boundary/ant hindbrain
6  6 12 hpf       7             midbrain neural rod


In [11]:
table(larvae_annotation$stage)


12 hpf 14 hpf 15 dpf 16 hpf 18 hpf  2 dpf 20 hpf 24 hpf  3 dpf 36 hpf  5 dpf 
    44     41     99     54     59     69     70     59     73     65     94 
 8 dpf 
    89 

In [13]:
larvae_annotation_2dpf <- larvae_annotation[which(larvae_annotation$stage == "2 dpf"),]
larvae_annotation_3dpf <- larvae_annotation[which(larvae_annotation$stage == "3 dpf"),]
larvae_annotation_5dpf <- larvae_annotation[which(larvae_annotation$stage == "5 dpf"),]
larvae_annotation_8dpf <- larvae_annotation[which(larvae_annotation$stage == "8 dpf"),]
larvae_annotation_15dpf <- larvae_annotation[which(larvae_annotation$stage == "15 dpf"),]
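The per-stage subsetting above can also be written as a single `split()` call, which returns one data frame per stage in a named list. The sketch below uses a made-up toy table with the same column names as the annotation file; `toy_annotation`, `stages_to_keep`, and `annotation_by_stage` are illustrative names, not objects from this notebook.

```r
# Toy annotation table with the same columns as the annotation CSV
# (values are made up for illustration).
toy_annotation <- data.frame(
  stage = c("2 dpf", "2 dpf", "3 dpf", "15 dpf"),
  cluster = c(0L, 1L, 0L, 0L),
  assigned_cell_type = c("progenitors", "radial glia", "progenitors", "radial glia"),
  stringsAsFactors = FALSE
)

# Stages of interest, spelled as in the annotation table.
stages_to_keep <- c("2 dpf", "3 dpf", "5 dpf", "8 dpf", "15 dpf")

# split() returns a named list with one data frame per stage present in the data.
annotation_by_stage <- split(toy_annotation, toy_annotation$stage)

# Keep only the stages of interest that actually occur, in the order listed above.
annotation_by_stage <- annotation_by_stage[intersect(stages_to_keep, names(annotation_by_stage))]

names(annotation_by_stage)
```

Indexing the list by stage name (for example `annotation_by_stage[["2 dpf"]]`) then gives the same rows as the explicit subsetting.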

In [12]:
# extract the per-cell cluster identities (currently cluster numbers) from each object
clusters_numeric_vector_2dpf <- Idents(larvae_2dpf)
clusters_numeric_vector_3dpf <- Idents(larvae_3dpf)
clusters_numeric_vector_5dpf <- Idents(larvae_5dpf)
clusters_numeric_vector_8dpf <- Idents(larvae_8dpf)
clusters_numeric_vector_15dpf <- Idents(larvae_15dpf)

In [14]:
# use vectors of cluster numbers from each object as template for mapping corresponding cluster names
celltype_assigned_vector_2dpf <- plyr::mapvalues(clusters_numeric_vector_2dpf,
                                        from = as.vector(larvae_annotation_2dpf$cluster),
                                        to = as.vector(larvae_annotation_2dpf$assigned_cell_type))
celltype_assigned_vector_3dpf <- plyr::mapvalues(clusters_numeric_vector_3dpf,
                                        from = as.vector(larvae_annotation_3dpf$cluster),
                                        to = as.vector(larvae_annotation_3dpf$assigned_cell_type))
celltype_assigned_vector_5dpf <- plyr::mapvalues(clusters_numeric_vector_5dpf,
                                        from = as.vector(larvae_annotation_5dpf$cluster),
                                        to = as.vector(larvae_annotation_5dpf$assigned_cell_type))
celltype_assigned_vector_8dpf <- plyr::mapvalues(clusters_numeric_vector_8dpf,
                                        from = as.vector(larvae_annotation_8dpf$cluster),
                                        to = as.vector(larvae_annotation_8dpf$assigned_cell_type))
celltype_assigned_vector_15dpf <- plyr::mapvalues(clusters_numeric_vector_15dpf,
                                        from = as.vector(larvae_annotation_15dpf$cluster),
                                        to = as.vector(larvae_annotation_15dpf$assigned_cell_type))
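The mapping step can be illustrated on toy data with a named lookup vector in base R. This is a minimal sketch of the same idea, not the code used above: `toy_clusters`, `toy_stage_annotation`, and `lookup` are made-up names and values. It also shows a check for clusters that have no entry in the annotation table. That check matters because `plyr::mapvalues` leaves unmatched values unchanged, so an unannotated cluster would keep its number as its "cell type".

```r
# Toy stand-ins (made-up values): cluster identity per cell, as a factor,
# and the annotation rows for a single stage.
toy_clusters <- factor(c("0", "1", "1", "2", "0"))
toy_stage_annotation <- data.frame(
  cluster = c(0L, 1L, 2L),
  assigned_cell_type = c("progenitors", "radial glia", "progenitors"),
  stringsAsFactors = FALSE
)

# Named lookup vector: names are cluster numbers (as text), values are cell types.
lookup <- setNames(toy_stage_annotation$assigned_cell_type,
                   as.character(toy_stage_annotation$cluster))

# Index the lookup by each cell's cluster; a cluster without an entry would give NA.
toy_celltypes <- unname(lookup[as.character(toy_clusters)])

# Clusters present in the data but missing from the annotation table.
unmapped <- setdiff(levels(toy_clusters), names(lookup))

toy_celltypes
unmapped
```

Unlike `mapvalues`, the lookup returns `NA` for unannotated clusters, which makes gaps in the table easier to spot.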

In [18]:
# add metadata to seurat object
larvae_2dpf$celltype_assigned <- celltype_assigned_vector_2dpf
larvae_3dpf$celltype_assigned <- celltype_assigned_vector_3dpf
larvae_5dpf$celltype_assigned <- celltype_assigned_vector_5dpf
larvae_8dpf$celltype_assigned <- celltype_assigned_vector_8dpf
larvae_15dpf$celltype_assigned <- celltype_assigned_vector_15dpf

### Select for progenitor cell types

Select progenitor cell types and radial glia. Omit unrelated cell types (e.g. retina) that are not present in any of my adult samples. I am unsure what exactly "glial progenitors" are, but their marker genes indicate at least some overlap with radial glia, so I include them for now.
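Because the selection relies on exact string matches against the assigned cell type labels, a small check can catch labels that differ only in spelling or trailing whitespace. The sketch below uses made-up label vectors; `missing_labels` is a hypothetical helper, not a Seurat function.

```r
# Hypothetical helper: return the requested labels that do not occur
# among the available annotation labels (exact string comparison).
missing_labels <- function(requested, available) {
  setdiff(requested, unique(available))
}

# Toy annotation labels (made up); note the trailing space in the last one.
toy_available <- c("radial glia", "progenitors", "glia progenitors",
                   "progenitors/differentiating ")

# Labels we intend to select; two of them do not match exactly.
toy_requested <- c("radial glia", "glial progenitors", "progenitors/differentiating")

missing_labels(toy_requested, toy_available)
```

An empty result means every requested label exists in the annotation. A non-empty result lists labels that `%in%` would silently skip.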

In [19]:
# based on the supplemental table, select progenitor cells for each time point
larvae_2dpf_prog <- subset(larvae_2dpf, subset = celltype_assigned %in% 
                            c("progenitors", "progenitors (midbrain)", "radial glia",
                             "progenitors/neurons (differentiating)", "glial progenitors"))

In [20]:
larvae_3dpf_prog <- subset(larvae_3dpf, subset = celltype_assigned %in% 
                            c("progenitors", "progenitors/neurons (differentiating)", "radial glia",
                             "progenitors (midbrain)", "glial progenitors"))

In [21]:
larvae_5dpf_prog <- subset(larvae_5dpf, subset = celltype_assigned %in% 
                            c("progenitors", "radial glia", "progenitors/differentiating granule cells (hindbrain)",
                             "progenitors/neurons (differentiating)", "progenitors (cycling)", "glial progenitors"))

In [22]:
larvae_8dpf_prog <- subset(larvae_8dpf, subset = celltype_assigned %in% 
                            c("progenitors", "progenitors/neurons (differentiating)", "radial glia",
                             "progenitors (cycling)", "glial progenitors"))

In [23]:
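# labels must match the annotation table exactly; note the trailing space in "progenitors/differentiating "
# and "glia progenitors" here versus "glial progenitors" at the earlier stages
larvae_15dpf_prog <- subset(larvae_15dpf, subset = celltype_assigned %in% 
                            c("radial glia", "progenitors/neurons (differentiating)", 
                              "progenitors", "URL progenitors", "progenitors (ventral)", 
                              "progenitors (cycling)", "progenitors/differentiating ", "glia progenitors"))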

In [29]:
rm(larvae_2dpf, larvae_3dpf, larvae_5dpf, larvae_8dpf, larvae_15dpf)

### Load adult dataset and combine all progenitors

In [30]:
# load cleaned up version of RG dataset (doublets excluded)
rg_pool_sub <- readRDS(file = "/local/Nina/jupyterlab/brains_trans/rg_pool_reg_sub.rds")

In [31]:
dim(rg_pool_sub)

In [32]:
rg_pool_sub$stage <- "adult"

In [33]:
progenitors <- merge(larvae_2dpf_prog, c(larvae_3dpf_prog, larvae_5dpf_prog,
                                        larvae_8dpf_prog, larvae_15dpf_prog, rg_pool_sub))
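When merging objects from separate experiments, identical cell barcodes can occur in more than one dataset. Whether that happens here is not something this notebook checks, so the following is only a sketch on made-up barcodes. It shows how shared names can be detected and how a stage prefix makes them unique. Seurat's `merge()` has an `add.cell.ids` argument that applies such prefixes; the vectors below are toy stand-ins for the cell names of two objects.

```r
# Toy barcodes (made up) standing in for the cell names of two objects.
barcodes_2dpf <- c("AAAC-1", "AAAG-1", "AATC-1")
barcodes_3dpf <- c("AAAC-1", "CCTG-1")

# Cell names that occur in both datasets.
shared <- intersect(barcodes_2dpf, barcodes_3dpf)

# Prefix each barcode with its stage label so that names are unique after merging.
prefixed <- c(paste("larvae_2dpf", barcodes_2dpf, sep = "_"),
              paste("larvae_3dpf", barcodes_3dpf, sep = "_"))

shared
anyDuplicated(prefixed)  # 0 means no duplicated names remain
```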

In [34]:
dim(progenitors)

In [35]:
table(progenitors$stage)


       adult larvae_15dpf  larvae_2dpf  larvae_3dpf  larvae_5dpf  larvae_8dpf 
       15829         9782         1619         2049         2988         4181 
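To see how unevenly the stages are represented in the merged object, the counts printed above can be converted to fractions. The sketch copies the numbers from the table output; `stage_counts` is an illustrative name.

```r
# Cell counts per stage, copied from the table() output above.
stage_counts <- c(adult = 15829, larvae_15dpf = 9782, larvae_2dpf = 1619,
                  larvae_3dpf = 2049, larvae_5dpf = 2988, larvae_8dpf = 4181)

# Total number of cells and the fraction contributed by each stage.
total_cells <- sum(stage_counts)
stage_fraction <- round(stage_counts / total_cells, 3)

total_cells
stage_fraction
```

The adult dataset contributes the largest share of cells, which is worth keeping in mind for the later integration.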

In [37]:
rm(rg_pool_sub, larvae_2dpf_prog, larvae_3dpf_prog,
  larvae_5dpf_prog, larvae_8dpf_prog, larvae_15dpf_prog)

In [38]:
saveRDS(progenitors, file = "/local/Nina/jupyterlab/larvae_adult_int/prog_2dpf-adult_in.rds")