shishenyxx/Sperm_control_cohort_mosaicism

Sperm control cohort mosaicism

This repository collects pipelines, code, and some intermediate results for the study of mosaic SNVs/indels in sperm, blood, and other samples from a control cohort. Raw data for this study are available here and here. Both datasets can be accessed through the SRA Run Selector.

Sperm_Mosaic_Cover


1. Pipelines for processing whole-genome sequencing (WGS) data

1.1 Pipelines for WGS data processing and quality control

Pipelines for pre-processing the BAM files.

Code for depth-of-coverage and insert-size distributions.

1.2 Code for the population origin analysis

Pipeline for the population analysis, and code for plotting.

1.3 Pipelines for mosaic SNV/indel calling and variant annotations

Pipelines for MuTect2 (paired mode) and Strelka2 (somatic mode) variant calling from WGS data.

The MuTect2 (single mode) pipeline has a "Leave One Out" version for the YA cohort and a "Full Panel of Normal" version for the AA and ASD cohorts. The MuTect2 (single mode) results are then passed to MosaicForecast and to the variant annotation pipeline.
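As an illustration of the "Leave One Out" idea, the sketch below builds, for each sample, a panel made of every other sample in the cohort. This assumes that "Leave One Out" means excluding the tested sample from its own panel of normals; the function name and the sample IDs are hypothetical and are not taken from the repository's pipelines.

```python
def leave_one_out_panels(sample_ids):
    """Map each sample to the list of all other samples.

    Assumption: the panel of normals for a sample is built from every
    other sample in the cohort, so the sample never filters itself.
    """
    panels = {}
    for sample in sample_ids:
        panels[sample] = [other for other in sample_ids if other != sample]
    return panels


# Hypothetical sample IDs, for illustration only.
samples = ["sample_A", "sample_B", "sample_C"]
panels = leave_one_out_panels(samples)
for sample, panel in panels.items():
    print(sample, "->", ",".join(panel))
```

A "Full Panel of Normal" run would, by contrast, use one shared panel for every sample.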

Code to plot the calls of different methods on simulated variants.

Code and data for the different CIRCOS plots.


2. Pipelines for processing Targeted Amplicon Sequencing (TAS) data

2.1 Pipelines for TAS data alignment and processing

Pipelines for alignment, processing, and germline variant calling of TAS reads.

2.2 Pipelines for AF quantification and variant annotations

Pipelines for allele fraction (AF) quantification and variant annotations.
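To make the AF quantification step concrete, here is a minimal sketch that computes an allele fraction and an approximate 95% confidence interval from reference and alternate read counts. The Wilson score interval is used here only as a convenient stand-in; this README does not state which interval the repository's pipeline uses, and the function name is hypothetical.

```python
import math


def allele_fraction(alt_reads, ref_reads, z=1.96):
    """Return (AF, lower, upper) using a Wilson score interval.

    Assumption: AF = alt / (alt + ref). The interval method is an
    illustrative choice, not necessarily the one used in the study.
    """
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    af = alt_reads / total
    denom = 1 + z * z / total
    center = (af + z * z / (2 * total)) / denom
    half = z * math.sqrt(af * (1 - af) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return af, max(0.0, center - half), min(1.0, center + half)


# Example: 50 alternate reads and 950 reference reads at one site.
af, lower, upper = allele_fraction(50, 950)
print(f"AF={af:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")
```

Deep amplicon coverage narrows this interval, which is why TAS is well suited to validating low-fraction mosaic variants.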

Code to filter and annotate TAS data.


3. Pipelines for the data analysis, variant filtering, comprehensive annotations, and statistical analysis

3.1 Pipelines for mosaic variant determination, annotations, and plotting

After variant calling with the different strategies, a Python script annotated and filtered the variants. Positive mosaic variants were then annotated with their corresponding samples and additional information.

Code for the permutation analysis based on gnomAD, and code for plotting the permutation results.
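The general shape of such a permutation analysis can be sketched as follows: draw many random variant sets of the same size from a background pool, then ask how often a random set shows a feature at least as often as the observed set. This is a generic sketch under stated assumptions, not the repository's script. The background pool (standing in for gnomAD variants), the boolean feature flags, and the function name are all hypothetical.

```python
import random


def permutation_p_value(observed_count, background_flags, sample_size,
                        n_permutations=10000, seed=0):
    """Empirical one-sided p-value for enrichment of a feature.

    background_flags: one boolean per background variant, True if the
    variant carries the feature of interest (hypothetical input).
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        draw = rng.sample(background_flags, sample_size)
        if sum(draw) >= observed_count:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy background: 1% of 1,000 variants carry the feature.
background = [True] * 10 + [False] * 990
p = permutation_p_value(observed_count=5, background_flags=background,
                        sample_size=50, n_permutations=2000)
print("empirical p-value:", p)
```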

The UpSet plot was generated with an online tool.

3.2 Pipelines for statistical analysis and the related plotting

Code for estimating the accumulation of mutations with a stepwise exponential regression model.
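For orientation, a single exponential fit of variant count against age can be sketched as below, by ordinary least squares on the log-transformed counts. This covers only one plain exponential fit and does not reproduce the stepwise procedure used in the study. The model form count = a * exp(b * age), the function name, and the data are assumptions for illustration.

```python
import math


def fit_exponential(ages, counts):
    """Fit counts = a * exp(b * age) by least squares on log(counts).

    Assumption: all counts are positive so the log transform is defined.
    """
    if len(ages) != len(counts) or len(ages) < 2:
        raise ValueError("need at least two paired observations")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive for a log-linear fit")
    logs = [math.log(c) for c in counts]
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated from a known curve, not study data.
ages = [20, 30, 40, 50]
counts = [2.0 * math.exp(0.03 * age) for age in ages]
a, b = fit_exponential(ages, counts)
print(f"a={a:.3f}, b={b:.4f}")
```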

Code for analyzing the accuracy of the number of variants and for estimating the limit of sampling with age in different groups.


4. Contact:

📧 Xiaoxu Yang: xiy010@health.ucsd.edu, yangxiaoxu-shishen@hotmail.com

📧 Martin Breuss: martin.breuss@cuanschutz.edu

📧 Joseph Gleeson: jogleeson@health.ucsd.edu, or the Gleeson lab gleesonlab@health.ucsd.edu


5. Cite the data and code:

Yang X & Breuss MW, et al., Gleeson JG. Developmental and temporal characteristics of clonal sperm mosaicism. 2021. (Cell, DOI:10.1016/j.cell.2021.07.024, PMID:34388390)
