- Code for filtering is available in `code/filtering/`
- The output of this sequence is the raw data file available for download from GEO
- Raw fetal cell atlas data can be downloaded from https://atlas.brotmanbaty.org/bbi/human-gene-expression-during-development/ (5,000 sampled cells from each cell type, cell annotations, and gene annotations)
- The workflow for generating fetal cell type gene signatures is at `rules/fca_annotation.py`. We explored three versions of the classifier: one with all 77 fetal cell atlas subtypes; one with a subset of 33 cell types curated to be more distinctive, which offered greater classification accuracy on held-out test data; and one augmented set combining the 33 fetal cell atlas cell types with an iPSC signature from Plurinet
  - The relevant code is found in the `code/annotation` folder
  - Note that the 33 cell type classifier was used for the annotation of cells that was ultimately used for eQTL calling across cell types
- The workflow for generating the fetal cell atlas classifier and classifying HDC cells is at `rules/fca_annotation.py`
- Code for this part of the analysis is in the folder `code/annotation`
- `rules/sc_preprocessing.py` contains the rules used to generate the UMAP embedding, with associated code in `code/sc_preprocessing/`
- UMI counts for each donor and cell type are first aggregated in `analysis/annotation/assign_cellid.ipynb`
- Initial pseudobulk normalization and generation of a few QC metrics are done in `code/static_qtl_calling/pseudobulk_tmm-basic-qc.R`, followed by manual QC (based on inspection of PC plots) in `analysis/static_qtl_calling/eb_cellid/pseudobulk_tmm/basic/pseudobulk_qc.Rmd`. After removing outlier samples based on this manual step, we preprocess the pseudobulk data in `code/static_qtl_calling/pseudobulk_tmm-basic-agg.R`
- After data wrangling (spread over several rules in `rules/static_qtl_calling.py`), we call eQTLs using tensorQTL in `code/static_qtl_calling/static_qtl_calling/tensorqtl_permutations.py`
- We control the FDR across all genes in `code/static_qtl_calling/static_qtl_calling/tensorqtl_fdr.py`
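
To illustrate what controlling the FDR across genes involves, here is a minimal sketch of the Benjamini-Hochberg adjustment applied to one p-value per gene. This is an assumption made for illustration only: the procedure actually implemented in `tensorqtl_fdr.py` is not described here and may differ (for example, a q-value approach), and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative only: assumes one gene-level p-value per gene. This is
    not necessarily the procedure used in tensorqtl_fdr.py.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted value falls below the chosen threshold (the threshold itself is another choice not specified above) would then be reported as having a significant eQTL.
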
- Snakemake rules for the multivariate (cross-celltype) mash analysis are in `rules/mash_qtl_calling.py`, with associated code in `code/mash_qtl_calling/`
- Snakemake rules for trajectory isolation and pseudotime estimation are in `rules/trajectory_inference.py`, with associated code in `code/trajectory_inference/`
- Snakemake rules for pseudobulk aggregation (by pseudotime binning) and dynamic eQTL calling are in `rules/dynamic_qtl_calling.py`, with associated code in `code/dynamic_qtl_calling/`
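
The idea of pseudobulk aggregation by pseudotime binning can be sketched as follows. This is a simplified illustration rather than the code in `code/dynamic_qtl_calling/`: it assumes equal-sized (rank-based) bins, and the function name, the `n_bins` parameter, and the input layout are all hypothetical.

```python
from collections import defaultdict


def pseudobulk_by_pseudotime(cells, n_bins=3):
    """Sum UMI counts per (donor, pseudotime bin).

    `cells` is a list of dicts with keys "donor", "pseudotime", and
    "counts" (a dict mapping gene -> UMI count). Both this layout and
    the equal-sized binning are assumptions made for illustration.
    """
    # Rank cells along the trajectory so bins hold equal numbers of cells.
    ranked = sorted(cells, key=lambda c: c["pseudotime"])
    totals = defaultdict(lambda: defaultdict(int))
    for rank, cell in enumerate(ranked):
        bin_id = rank * n_bins // len(ranked)
        for gene, umi in cell["counts"].items():
            totals[(cell["donor"], bin_id)][gene] += umi
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}
```

Each `(donor, bin)` entry then plays the role of one pseudobulk sample, so a dynamic eQTL test can ask whether a variant's effect on expression changes across bins.
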
- Snakemake rules for pseudocell aggregation, topic modeling, and the topic DE analysis are in `rules/fast_topics.py`, with associated code in `code/fast_topics/`
- Snakemake rules for topic eQTL calling with CellRegMap, including multiple testing correction and post hoc estimation of effect sizes, are in `rules/cellregmap_eqtl_calling.py`, with associated code in `code/cellregmap_eqtl_calling/`
- Snakemake rules for the comparison of cell type eQTLs to schizophrenia GWAS loci are in `rules/scz_analysis.py`
with code in `
- To run the `mash` analysis, we used `flashier`, which was not available on anaconda at the time
  - We therefore installed the package manually from https://github.com/willwerscheid/flashier into the conda environment specified at `slurmy/r-mashr.yml`