Cell atlas of the developing human brain

Data and code related to our manuscript Comprehensive cell atlas of the first-trimester developing human brain (Emelie Braun, Miri Danan-Gotthold et al. 2022, in review).

Preprint (bioRxiv)

https://www.biorxiv.org/content/10.1101/2022.10.24.513487v1

Code

We used the Shoji tensor database and the cytograph-shoji pipeline.

Code for making many of the figures is available as Jupyter notebooks.

Data

Complete dataset

Metadata per sample: table_S1.xlsx

Metadata per cluster: table_S2.xlsx

Raw data: EGAS00001004107

Complete processed dataset: HumanFetalBrainPool.h5

See further below for a description of the content of the .h5 files.

Alternative expression matrices generated with the "standard" cellranger + velocyto pipeline using cellranger GRCh38-3.0.0 annotations are available in loom and anndata formats:

human_dev_GRCh38-3.0.0.loom

human_dev-GRCh38-3.0.0.h5ad (Annotations basically follow CELLxGENE standards.)

human_dev-GRCh38-3.0.0_all_layers.h5ad (The same but including 'ambiguous', 'spliced', and 'unspliced' layers.)

These files contain exactly the same cells as the HumanFetalBrainPool.h5 file. About 8,000 cells that were filtered out by this procedure have a total UMI count of zero.

Working datasets

(coming soon)

Spatial EEL FISH datasets

Section 1 Z=970um
Section 2 Z=810um
Section 3 Z=640um
Three spatial EEL FISH datasets of a sagittally cut whole human embryo at 5 weeks post conception. The data are in the .parquet format and can be opened with FISHscale, Python pandas, or any other Parquet reader.
r_px_microscope_stitched and c_px_microscope_stitched contain the RNA molecule coordinates in pixels (pixel size of 0.18 um).
r_transformed and c_transformed contain the RNA molecule coordinates in pixels (pixel size of 0.27 um).
The Tissue and Brain columns indicate whether the detected molecules are in the tissue or in the brain, respectively.
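As a rough illustration of the column descriptions above, the following sketch converts both coordinate systems from pixels to micrometres and keeps only molecules flagged as being in the brain. It runs on a small made-up table so that it is self-contained. The output column names (r_stitched_um and so on) are hypothetical, and it assumes that the Brain column holds boolean or 0/1 values, which should be checked against the real files.

```python
import pandas as pd

# Pixel sizes stated in this README (micrometres per pixel).
PIXEL_SIZE_STITCHED_UM = 0.18
PIXEL_SIZE_TRANSFORMED_UM = 0.27


def add_micron_coordinates(df):
    """Return a copy of df with extra coordinate columns in micrometres.

    The names of the new columns are illustrative, not part of the dataset.
    """
    out = df.copy()
    out["r_stitched_um"] = out["r_px_microscope_stitched"] * PIXEL_SIZE_STITCHED_UM
    out["c_stitched_um"] = out["c_px_microscope_stitched"] * PIXEL_SIZE_STITCHED_UM
    out["r_transformed_um"] = out["r_transformed"] * PIXEL_SIZE_TRANSFORMED_UM
    out["c_transformed_um"] = out["c_transformed"] * PIXEL_SIZE_TRANSFORMED_UM
    return out


# With a downloaded section, the table would instead be loaded with:
#   molecules = pd.read_parquet("<path to a section .parquet file>")
# Here a toy table with the documented column names stands in for it.
molecules = pd.DataFrame(
    {
        "r_px_microscope_stitched": [100.0, 200.0, 300.0],
        "c_px_microscope_stitched": [50.0, 150.0, 250.0],
        "r_transformed": [10.0, 20.0, 30.0],
        "c_transformed": [5.0, 15.0, 25.0],
        "Tissue": [True, True, False],
        "Brain": [True, False, False],
    }
)

molecules_um = add_micron_coordinates(molecules)

# Assumption: Brain is boolean or 0/1, so casting to bool gives a valid mask.
in_brain = molecules_um[molecules_um["Brain"].astype(bool)]
print(in_brain[["r_stitched_um", "c_stitched_um"]])
```

For the full sections, pd.read_parquet accepts a columns argument, which can reduce memory use when only a few columns are needed.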

Description of tensors

The datasets are provided as HDF5 files containing the tensors listed below. In Python, they can be accessed using h5py (other languages have similar libraries).

The most important tensors are Expression (the expression matrix; sum of spliced and unspliced UMIs), Gene (gene names), Accession (Ensembl accessions), Clusters (cluster labels), Embedding (tSNE), Factors (PCA components), ManifoldIndices (KNN graph edges) and ManifoldWeights (KNN graph edge weights).
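A minimal sketch of reading these tensors with h5py is shown below. Because this README does not describe how the tensors are arranged into groups inside the file, the hypothetical helper find_tensor searches every group for a dataset with the requested name. The example writes and reads a tiny toy file (with a made-up group name and made-up gene names) so that it runs without the real download; for the real data, the path would point to HumanFetalBrainPool.h5. The asstr() call assumes h5py 3.0 or later.

```python
import os
import tempfile

import h5py
import numpy as np


def find_tensor(h5file, name):
    """Return the first dataset called `name`, searching all groups.

    Hypothetical helper: it avoids assuming a particular group layout.
    """

    def visitor(path, obj):
        if isinstance(obj, h5py.Dataset) and path.split("/")[-1] == name:
            return obj  # any non-None value stops the walk
        return None

    dataset = h5file.visititems(visitor)
    if dataset is None:
        raise KeyError(f"No dataset named {name!r} in {h5file.filename}")
    return dataset


# Build a toy file that mimics a few of the tensors described in this README.
toy_path = os.path.join(tempfile.mkdtemp(), "toy.h5")
with h5py.File(toy_path, "w") as f:
    group = f.create_group("tensors")  # group name is made up for the toy file
    group.create_dataset(
        "Expression", data=np.array([[0, 2, 1], [3, 0, 0]], dtype="uint16")
    )
    group.create_dataset(
        "Gene", data=["GENE_A", "GENE_B", "GENE_C"], dtype=h5py.string_dtype()
    )
    group.create_dataset("Clusters", data=np.array([1, 0], dtype="uint32"))

with h5py.File(toy_path, "r") as f:
    expression = find_tensor(f, "Expression")
    print(expression.shape, expression.dtype)

    # Slicing reads only the requested rows from disk.
    first_cell = expression[0, :]

    # String tensors are decoded to Python str with asstr().
    genes = find_tensor(f, "Gene").asstr()[:]
    clusters = find_tensor(f, "Clusters")[:]

# Genes with at least one UMI in the first cell.
expressed = [gene for gene, count in zip(genes, first_cell) if count > 0]
print(expressed)
```

Expression in the complete dataset has shape 1,665,937 ✕ 59,480, so reading slices as above is usually preferable to loading the whole matrix into memory.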

Tensor	dtype	rank	dims	shape	(values)
Accession	string	1	genes	59,480	["pCAG-DsRed2_101-650", "pCS-Cherry-DEST_101-850", "pCAG ···
Age	float32	1	cells	1,665,937	[8.0, 8.0, 8.0, 8.0, 8.0, ...]
AnnotationDefinition	string	1	annotations	51	["+MPZ", "+EYA1 +ISL1", "+NHLH1", "+MEIS2 +ISL1 +SIX3", ···
AnnotationDescription	string	1	annotations	51	["Schwann cell-like (E-SCHWL; +MPZ)", "Otic vesicle of t ···
AnnotationName	string	1	annotations	51	["E-SCHWL", "HB-OTV", "NBL", "TH-RETN", "CB-PURK", ...]
AnnotationPosterior	float32	2	clusters ✕ annotations	617 ✕ 51	[[-1.8189894e-12, 6.617445e-24, 1.0, 3.3087225e-24, 3.30 ···
CellClass	string	1	cells	1,665,937	["Erythrocyte", "Erythrocyte", "Erythrocyte", "Erythrocy ···
CellCycleFraction	float32	1	cells	1,665,937	[0.0, 0.0001071352, 0.0, 0.00095663266, 0.0, ...]
CellID	string	1	cells	1,665,937	["10X89_1:AAACGGGAGGCTACGA", "10X89_1:ACGAGGAAGAGCCTAG", ···
Chemistry	string	1	cells	1,665,937	["v2", "v2", "v2", "v2", "v2", ...]
Chromosome	string	1	genes	59,480	["chrEXTRA", "chrEXTRA", "chrEXTRA", "chrEXTRA", "chrEXT ···
Class	string	1	clusters	617	["Neuroblast", "Radial glia", "Radial glia", "Glioblast" ···
ClusterID	uint32	1	clusters	617	[0, 1, 2, 3, 4, ...]
Clusters	uint32	1	cells	1,665,937	[240, 240, 236, 240, 233, ...]
Donor	string	1	cells	1,665,937	["BRC2006", "BRC2006", "BRC2006", "BRC2006", "BRC2006", ...]
DoubletFlag	bool	1	cells	1,665,937	[False, False, False, False, False, ...]
DoubletScore	float32	1	cells	1,665,937	[0.02, 0.02, 0.03, 0.01, 0.02, ...]
DropletClass	uint8	1	cells	1,665,937	[0, 0, 0, 0, 0, ...]
Embedding	float32	2	cells ✕ 2	1,665,937 ✕ 2	[[22.061909, 11.055673], [23.594717, 10.600938], [25.339 ···
End	string	1	genes	59,480	["550", "1320", "2090", "3610", "4730", ...]
Enrichment	float32	2	clusters ✕ genes	617 ✕ 59,480	[[1.0, 1.0, 1.0, 1.0, 1.0, ...], [1.0, 1.0, 1.0, 1.0, 1. ···
Expression	uint16	2	cells ✕ genes	1,665,937 ✕ 59,480	[[0, 0, 0, 0, 0, ...], [0, 0, 0, 0, 0, ...], [0, 0, 0, 0 ···
Factors	float32	2	cells ✕ __	1,665,937 ✕ 50	[[-1.5914472, 1.524089, 0.21222332, -4.3109193, -5.85292 ···
Gene	string	1	genes	59,480	["marker-DsRed", "marker-Cherry", "marker-GFP", "marker- ···
GeneNonzeros	uint32	1	genes	59,480	[0, 0, 0, 0, 0, ...]
GeneTotalUMIs	uint32	1	genes	59,480	[0, 0, 0, 0, 0, ...]
Linkage	float32	2	__ ✕ 4	616 ✕ 4	[[238.0, 239.0, 0.0016231078, 2.0], [237.0, 617.0, 0.002 ···
Loadings	float32	2	genes ✕ __	59,480 ✕ 50	[[0.0, 0.0, 0.0, 0.0, 0.0, ...], [0.0, 0.0, 0.0, 0.0, 0. ···
ManifoldIndices	uint32	2	__ ✕ 2	40,164,783 ✕ 2	[[0, 6], [0, 106], [0, 208], [0, 225], [0, 246], ...]
ManifoldRadius	float32	0	()	()	1.0
ManifoldWeights	float32	1	__	40,164,783	[0.9746674, 0.9753966, 0.97435904, 0.9760038, 0.98073715 ···
MeanAge	float64	1	clusters	617	[10.651846331718932, 10.967863210449874, 10.768960981864 ···
MeanCellCycle	float64	1	clusters	617	[0.002357223176804402, 0.003319249633509612, 0.023186484 ···
MeanDoubletScore	float64	1	clusters	617	[0.09462042097992746, 0.11769588179965942, 0.19775236498 ···
MeanExpression	float64	2	clusters ✕ genes	617 ✕ 59,480	[[0.0, 0.0, 0.0, 0.0, 0.0, ...], [0.0, 0.0, 0.0, 0.0, 0. ···
MeanTotalUMI	float64	1	clusters	617	[5449.63220088626, 5258.164957264958, 7567.301298701311, ···
MitoFraction	float32	1	cells	1,665,937	[0.0, 0.0038568673, 0.008797339, 0.0015943878, 0.0018687 ···
NCells	uint64	1	clusters	617	[1354, 1170, 770, 1232, 1536, ...]
NGenes	uint32	1	cells	1,665,937	[121, 271, 674, 101, 113, ...]
Nonzeros	uint64	2	clusters ✕ genes	617 ✕ 59,480	[[0, 0, 0, 0, 0, ...], [0, 0, 0, 0, 0, ...], [0, 0, 0, 0 ···
OverallTotalUMIs	uint64	0	()	()	13029800607
PrevClusters	uint32	1	cells	1,665,937	[658, 658, 662, 658, 669, ...]
Recipe	string	1	__	2	["{'InitializeWorkspace': {'from_workspace': 'samples202 ···
Region	string	1	cells	1,665,937	["Telencephalon", "Telencephalon", "Telencephalon", "Tel ···
SampleID	string	1	cells	1,665,937	["10X89_1", "10X89_1", "10X89_1", "10X89_1", "10X89_1", ...]
SelectedFeatures	bool	1	genes	59,480	[False, False, False, False, False, ...]
Sex	string	1	cells	1,665,937	["", "", "", "", "", ...]
Species	string	0	()	()	"Homo sapiens"
Start	string	1	genes	59,480	["1", "571", "1341", "2111", "3631", ...]
StdevExpression	float32	1	genes	59,480	[0.0, 0.0, 0.0, 0.0, 0.0, ...]
Subdivision	string	1	cells	1,665,937	["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...]
Subregion	string	1	cells	1,665,937	["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...]
Tissue	string	1	cells	1,665,937	["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...]
TopLevelCluster	uint32	1	cells	1,665,937	[25, 25, 25, 25, 25, ...]
TotalUMIs	uint32	1	cells	1,665,937	[4630, 9334, 9321, 3136, 4281, ...]
Trinaries	float32	2	clusters ✕ genes	617 ✕ 59,480	[[-1.8189894e-12, -1.8189894e-12, -1.8189894e-12, -1.818 ···
UnsplicedFraction	float32	1	cells	1,665,937	[0.3514039, 0.33833298, 0.3174552, 0.32589287, 0.3585611 ···
ValidCells	bool	1	cells	1,665,937	[True, True, True, True, True, ...]
ValidGenes	bool	1	genes	59,480	[False, False, False, False, False, ...]
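The table distinguishes per-cell tensors (dims "cells") from per-cluster tensors (dims "clusters"). The sketch below shows one way the two might be joined, using small toy arrays in place of the real tensors: each cell's entry in Clusters is looked up in ClusterID to find the matching row of Class. It also selects the neighbours of one cell from an edge list shaped like ManifoldIndices. Several points are assumptions rather than documented facts: that Clusters refers to values of ClusterID, that NCells counts cells per cluster, and that each row of ManifoldIndices is a (source, target) pair.

```python
import numpy as np

# Toy stand-ins for per-cluster tensors (dims: clusters).
cluster_id = np.array([0, 1, 2], dtype="uint32")                      # ClusterID
cluster_class = np.array(["Neuroblast", "Radial glia", "Glioblast"])  # Class
n_cells = np.array([2, 1, 2], dtype="uint64")                         # NCells

# Toy stand-in for a per-cell tensor (dims: cells).
clusters = np.array([2, 0, 0, 1, 2], dtype="uint32")                  # Clusters

# Look up each cell's cluster label in ClusterID instead of assuming that
# cluster labels equal row positions.
order = np.argsort(cluster_id)
position = order[np.searchsorted(cluster_id, clusters, sorter=order)]
class_per_cell = cluster_class[position]
print(class_per_cell)

# Sanity check under the assumption that NCells holds per-cluster cell counts.
counts = np.bincount(position, minlength=len(cluster_id))
print(np.array_equal(counts, n_cells))

# Toy stand-ins for the KNN graph tensors.
manifold_indices = np.array([[0, 1], [0, 2], [1, 2], [3, 4]], dtype="uint32")
manifold_weights = np.array([0.9, 0.8, 0.7, 0.95], dtype="float32")

# Neighbours of cell 0, assuming the first column is the source cell.
mask = manifold_indices[:, 0] == 0
neighbours = manifold_indices[mask, 1]
neighbour_weights = manifold_weights[mask]
print(neighbours, neighbour_weights)
```

Whether each edge is stored once or in both directions is not stated here, so it is worth checking on the real data before treating the graph as symmetric.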

Genes and transcripts annotation

Our gene and transcript annotation is based on the GRCh38.p13 GENCODE V35 primary sequence assembly.

We discarded genes or transcripts that overlapped or mapped to the 3’ UTR of other genes or non-coding RNAs.

The GTF file used for read counts: gb_pri_annot_filtered.gtf

The genes and transcripts that were discarded: filtered_transcripts.txt
