All scripts and code necessary to recreate analyses and figures from Goh et al 2024

harmstonlab/Goh_et_al_deaf1

deaf1_paper

All scripts and code necessary to recreate figures from Goh et al. 2024


R 4.2.2 | Quarto v1.3.340 | Status: in-progress

  • ChIP-seq: Deaf1 in C2C12 myoblasts
  • Expression: RNA-seq data from 4 GEO datasets

-----------------------------------------------------

➤ Table of Contents

-----------------------------------------------------

➤ ChIP-seq

Code for the ChIP-seq figure. Contains analysis of Deaf1 ChIP-seq data from C2C12 mouse myoblasts.

Important files

Code files

Brief description of pipeline

  1. Quick QC with fastqc.
  2. Align fastq files to the genome with bwa.
  3. Flag duplicates and unmapped reads with samblaster.
  4. Filter to remove duplicates and unmapped reads with sambamba.
  5. Check quality of filtered reads with samtools stats and sambamba.
  6. QC using ChIPQC in R.
  7. Call peaks with macs2.
  8. Download files for visualization in IGV.
  9. Keep only high-quality, reproducible peaks with idr.
  10. Download files for downstream analysis in R.
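
The command-line steps above can be sketched as a dry run that only assembles the commands, so nothing is executed. This is an illustration, not the commands used in this repository: the file names (`IP1.fastq.gz`, `input.filtered.bam`, and so on), the single-end read layout, the presence of an input control, and the exact flags are all assumptions. Steps 6, 8 and 10 (ChIPQC in R and the file downloads) are left out.

```python
import shlex

def build_pipeline(sample, genome="mm10.fa", control_bam="input.filtered.bam", q=0.1):
    """Assemble (not run) shell commands for one ChIP sample.

    File names are hypothetical; reads are assumed to be single-end.
    """
    fastq = f"{sample}.fastq.gz"
    marked = f"{sample}.marked.bam"
    filtered = f"{sample}.filtered.bam"
    # sambamba filter expression, quoted so the shell passes it as one argument
    keep = shlex.quote("not duplicate and not unmapped")
    return [
        # Step 1: quick QC of the raw reads
        f"fastqc {fastq}",
        # Steps 2-3: align, then flag duplicates while streaming
        f"bwa mem {genome} {fastq} | samblaster --ignoreUnmated"
        f" | samtools view -b -o {marked} -",
        # Step 4: drop duplicates and unmapped reads
        f"sambamba view -h -f bam -F {keep} {marked} > {filtered}",
        # Step 5: summary statistics on the filtered reads
        f"samtools stats {filtered} > {filtered}.stats",
        # Step 7: call peaks against the (assumed) control at the q-value cutoff
        f"macs2 callpeak -t {filtered} -c {control_bam} -f BAM -g mm -n {sample} -q {q}",
    ]

def idr_command(rep1="IP1", rep2="IP2", out="idr_peaks.txt"):
    """Step 9: compare the two replicates' narrowPeak files with idr."""
    return (
        f"idr --samples {rep1}_peaks.narrowPeak {rep2}_peaks.narrowPeak"
        f" --input-file-type narrowPeak --rank p.value --output-file {out}"
    )

if __name__ == "__main__":
    for sample in ("IP1", "IP2"):
        for cmd in build_pipeline(sample):
            print(cmd)
    print(idr_command())
```

Printing the commands first makes it easy to compare them against the scripts in the repository before running anything on real data.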

Peak stats

Using macs2 q = 0.1:

  • IP1: 1860 peaks
  • IP2: 3126 peaks
  • 1550 total peaks after merging with IDR
  • 608 peaks with 0 <= IDR <= 0.05
  • 595 unique genes bound by 608 IDR peaks
  • 12 genes have >1 binding site; max number of binding sites in a gene is 3.
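
As an illustration of the IDR cutoff above, here is a minimal sketch that counts the peaks passing IDR <= 0.05 in an idr output file. It assumes the standard idr output layout, in which column 12 holds the global IDR as -log10(IDR). The function name `count_idr_peaks` and the toy rows are hypothetical and are not taken from this dataset.

```python
import math

def count_idr_peaks(lines, max_idr=0.05):
    """Count rows whose global IDR (column 12, stored as -log10) is <= max_idr."""
    min_neglog = -math.log10(max_idr)  # 0.05 corresponds to about 1.30
    kept = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        if float(fields[11]) >= min_neglog:
            kept += 1
    return kept

# Toy rows with made-up values; only column 12 matters for the count.
toy = [
    "chr1\t100\t300\tp1\t1000\t.\t5.0\t20.0\t15.0\t100\t2.5\t2.1",
    "chr1\t900\t1200\tp2\t300\t.\t2.0\t5.0\t3.0\t150\t0.4\t0.6",
]
print(count_idr_peaks(toy))
```

On a real file, the same function can be called with an open file handle, for example `count_idr_peaks(open("idr_peaks.txt"))`, where the file name is again an assumption.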

MD5 checksums

  • Mus_musculus.GRCm38.102.chr.gtf: e211ecb4ee8b735630a57c32a18715d0
  • mm10.fa: 7e329b2bf419a9f5a7dc42c625c884ac
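
The digests above are 32 hex characters long, which matches MD5 output, so they can be checked before running the pipeline. A minimal sketch using Python's standard `hashlib` follows; the helper names `md5_of` and `verify`, and the assumption that the files sit in the working directory under these names, are illustrative.

```python
import hashlib

# Digests copied from the list above.
EXPECTED = {
    "Mus_musculus.GRCm38.102.chr.gtf": "e211ecb4ee8b735630a57c32a18715d0",
    "mm10.fa": "7e329b2bf419a9f5a7dc42c625c884ac",
}

def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected):
    """Return True if the file at `path` has the expected digest."""
    return md5_of(path) == expected.lower()
```

For example, `verify("mm10.fa", EXPECTED["mm10.fa"])` returns `True` only if the local genome file matches the digest listed here. Reading in chunks keeps memory use low for multi-gigabyte genome files.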

Version info

  • FastQC: v0.11.8
  • BWA: 0.7.17-r1198-dirty
  • samblaster: 0.1.24
  • sambamba: 0.7.0
  • samtools: 1.9
  • macs2: 2.1.2
  • ChIPQC: 1.32.2

-----------------------------------------------------

➤ Expression (RNA-seq)

Code for Supplementary Figure 1. Contains RNA-seq data from GEO.

Data

Counts for the three sarcoatlas (mouse) datasets were downloaded from GEO. For gse133979_pdac (human), raw data was downloaded from SRA, realigned and then analyzed.

| Title | GEO accession | Annotation file used |
| --- | --- | --- |
| sarcoatlas_agingcrrm | GSE139209 | Mus_musculus.GRCm38.94.chr.gtf.gz |
| sarcoatlas_timecourse_mouse | GSE145480 | Mus_musculus.GRCm38.94.chr.gtf.gz |
| sarcoatlas_tscmko | GSE139213 | Mus_musculus.GRCm38.94.chr.gtf.gz |
| gse133979_pdac | GSE133979 | Homo_sapiens.GRCh38.109.chr.gtf.gz |

Checksums

| Annotation file | MD5 checksum |
| --- | --- |
| Homo_sapiens.GRCh38.109.chr.gtf.gz | 4fbfbb5c5fadf35f50f8f7134d7a2412 |
| Mus_musculus.GRCm38.94.chr.gtf.gz | 9aa004b6c98fc1aec6973af98e22b822 |

Version info

  • FastQC: v0.11.8
  • STAR: 2.7.1a
  • RSEM: v1.3.1
  • MultiQC: version 1.14 (931327d)

-----------------------------------------------------

➤ Analysis

Important: ensure that the annotation files are in the expression/annotation folder before running the analysis.

If running from scratch, the analysis scripts can be found as .qmd files in the expression folder. If the repository is cloned as is and the .Rproj file is used, the scripts should run without issues.
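
A quick pre-flight check for the annotation folder requirement might look like the sketch below. It assumes the two annotation files are stored under the names listed in the Expression section and that the script is run from the repository root; the helper name `missing_annotations` is hypothetical.

```python
from pathlib import Path

# File names as listed in the Expression section (assumed to be kept compressed).
REQUIRED = [
    "Mus_musculus.GRCm38.94.chr.gtf.gz",
    "Homo_sapiens.GRCh38.109.chr.gtf.gz",
]

def missing_annotations(repo_root=".", required=REQUIRED):
    """List required annotation files absent from expression/annotation."""
    folder = Path(repo_root) / "expression" / "annotation"
    return [name for name in required if not (folder / name).is_file()]

if __name__ == "__main__":
    missing = missing_annotations()
    if missing:
        print("Missing annotation files:", ", ".join(missing))
    else:
        print("All annotation files found.")
```

Running this before rendering the .qmd files gives an early, readable message instead of a failure partway through an analysis.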

-----------------------------------------------------

➤ Acknowledgements

This table of contents was built with https://github.com/andreasbm/readme

Preprint: https://www.biorxiv.org/content/10.1101/2024.01.12.575306v1.abstract
