parklab/spatial_sampling_analysis

Introduction

This repository contains pipelines for the single-cell and bulk analyses presented in Kim et al. 2023, "Cell lineage analysis with somatic mutations reveals late divergence of neuronal cell types and cortical areas in human cerebral cortex" (preprint). If you use any of the code here, please cite:

Kim, Sonia Nan, Vinayak V. Viswanadham, Ryan N. Doan, Yanmei Dou, Sara Bizzotto, Sattar Khoshkoo, August Yue Huang, et al. 2023. “Cell Lineage Analysis with Somatic Mutations Reveals Late Divergence of Neuronal Cell Types and Cortical Areas in Human Cerebral Cortex.” bioRxiv. https://doi.org/10.1101/2023.11.06.565899.

Code written by Vinay Viswanadham (Park lab, HMS) with significant contributions by Sonia Kim (BCH/HHMI, now Stanford), Sara Bizzotto (BCH/HHMI, now ICM), Yanmei Dou (HMS, now Westlake University), Emre Caglayan (BCH/HHMI), and Ryan Doan (BCH).

Contents

  1. src/ contains the code used in the paper.
  2. Cell_type_annotation/ contains Seurat scripts for analyzing the single-cell RNA-seq and ATAC-seq datasets.

Contents of src/

All variant-handling code is in src/. The most important folder there is src/Snakemake, which contains various workflows and pipelines.

  1. src/Snakemake/pscmda_analysis, src/scMDA_alignment_pipelines, src/genotyping_pipeline, src/phylogenetic_tree: A workflow that does three things:
     a. src/scMDA_alignment_pipelines and src/genotyping_pipeline align scMDA panel data, conduct quality control of reads and bases, call variants, and genotype sites.
     b. src/phylogenetic_tree and src/Snakemake/pscmda_analysis reconstruct the cells' lineage tree from pscMDA data (or, in principle, any kind of single-cell genomic data).
     c. src/Snakemake/pscmda_analysis conducts coalescent analysis to estimate times of origin.
  2. single_cell_variant_enrichment_analysis: Analysis of variants in 10X single-cell/nucleus RNA/ATAC-seq data, specifically a workflow to analyze the subpopulation structure of cells using a set of input variants
  3. bulk_wgs_preprocess: Workflows for pre-processing BAM files from WGS (for input into MosaicForecast and other downstream analyses).
  4. summarize_mutation_counts: Scripts to analyze ratios of mutation counts between two regions.
  5. The src/ folders other than Snakemake contain a variety of other scripts, including simulation, benchmarking, and power analysis scripts.
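
As a rough illustration of the kind of comparison that summarize_mutation_counts describes, the sketch below computes the ratio of mutation counts between two regions. The function name, the pseudocount parameter, and the example counts are hypothetical; they are not taken from the repository's scripts, which may compute the ratios differently.

```python
import math


def mutation_count_ratio(count_a, count_b, pseudocount=0.0):
    """Return (ratio, log2_ratio) of mutation counts in region A vs. region B.

    Hypothetical helper for illustration only; not the repository's code.
    A pseudocount can be added to both counts to avoid division by zero.
    """
    if count_a < 0 or count_b < 0:
        raise ValueError("mutation counts must be non-negative")
    a = count_a + pseudocount
    b = count_b + pseudocount
    if b == 0:
        raise ZeroDivisionError("region B has no mutations; use a pseudocount")
    ratio = a / b
    # log2 of zero is undefined, so report negative infinity in that case.
    log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
    return ratio, log2_ratio


# Made-up counts, for illustration only.
ratio, log2_ratio = mutation_count_ratio(30, 15)
print(f"ratio={ratio:.2f}, log2 ratio={log2_ratio:.2f}")
```

A log2 ratio is reported alongside the raw ratio because it treats enrichment and depletion symmetrically around zero.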

Setup

The following tools are required (set them up in a conda environment):

Adapter trimming tools

  1. UMI tools (https://github.com/CGATOxford/UMI-tools)
  2. Cutadapt (https://cutadapt.readthedocs.io/en/latest/installation.html)

Alignment, QC, and variant-calling tools

  1. bwa
  2. GATK4
  3. samtools/bcftools
  4. bedtools
  5. htslib
  6. vcftools
  7. fastx
  8. Picard
  9. MosaicForecast

Other utilities

module load gcc/6.2.0
#module load samtools/1.3.1
module load samtools
module load fastx/0.0.13
module load python/2.7.12 # using python 3 now...
# module load cutadapt/1.14
module load java/jdk-1.8u112
module load vcftools
module load picard/2.8.0
module load gatk/3.7
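
Before running any workflow, it may help to confirm that the tools listed above are visible on your PATH. The following is a minimal sketch, not part of the repository. The executable names in REQUIRED_TOOLS are assumptions about how each tool is typically installed (for example, htslib is checked via bgzip and tabix), and Picard and MosaicForecast are left out because how they are invoked depends on the installation.

```python
import shutil

# Assumed executable names; adjust these to match your own installation.
REQUIRED_TOOLS = [
    "umi_tools", "cutadapt", "bwa", "gatk", "samtools",
    "bcftools", "bedtools", "bgzip", "tabix", "vcftools",
]


def find_missing_tools(tools, lookup=shutil.which):
    """Return the tools for which lookup() finds no executable on PATH."""
    return [tool for tool in tools if lookup(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The lookup argument defaults to shutil.which and exists only so the check can be exercised without the tools installed.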

Acknowledgements

  • The alignment/adapter trimming code came from Ryan Doan (BCH), whose script served as the template for the data processing portion of this code.
  • The genotyping model came from Yanmei Dou (HMS) and was first reported in Bizzotto, Dou, Ganz et al., Science 2021.
  • The Seurat code is from Emre Caglayan (BCH/HHMI).

About

This repository contains analyses of spatially resolved somatic mutation datasets.
