Please cite the following manuscript:
Liu, Y., Zou, R.S., He, S., Nihongaki, Y., Li, X., Razavi, S., Wu, B. and Ha, T., 2020. Very fast CRISPR on demand. Science, 368(6496), pp.1265-1269. https://doi.org/10.1126/science.aay8204
- python 2.7 (Anaconda's python distribution comes with the required numpy and scipy libraries)
- pysam
- bowtie2
- samtools
- Ensure that both `samtools` and `bowtie2` are added to path and can be called directly from bash
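A quick pre-flight check can confirm that both tools are reachable before any processing starts. The sketch below is not part of the repository: the helper name `missing_tools` is hypothetical, and it assumes Python 3 (for `shutil.which`), so it would be run separately from the python 2.7 environment used by the analysis scripts.

```python
import shutil

# Command-line tools that the steps in this README call directly.
REQUIRED_TOOLS = ["samtools", "bowtie2"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("samtools and bowtie2 are both callable.")
```

If either tool is reported as missing, add its install directory to `PATH` in the shell that will run `process_reads.sh`.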
- The data for MRE11 and γH2AX ChIP-seq before/after Cas9 activation with light can be downloaded from Here.
- The data from DISCOVER-seq (Wienert & Wyman et al, Science, 2019) for comparison can be downloaded from Here.
- Download sequencing reads in FASTQ format from SRA
- Download either the human or mouse prebuilt bowtie2 indices
- Human hg38
- Mouse mm10 (ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/mm10.zip)
- move to the corresponding folders named `hg38_bowtie2/` or `mm10_bowtie2/`
- Download either human (hg38) or mouse (mm10) genome assembly in FASTA format
- Generate FASTA file indices
samtools faidx hg38_bowtie2/hg38.fa
samtools faidx mm10_bowtie2/mm10.fa
- Modify the first lines of bash script `process_reads.sh`
  - Fill the list variable `filelist` with paths to each sample name for processing. Note that the actual file paths include the sample name followed by "_1.fastq" or "_2.fastq", denoting read1 or read2 of paired-end reads, respectively.
  - Fill in the path to the indexed genome denoted by variable `genomepath`.
- Run the following code snippet, where `-p` denotes the number of samples to process in parallel; modify accordingly. This performs genome alignment, filtering, sorting, removal of PCR duplicates, indexing, and sample statistics output. This step takes less than one day to complete on our Intel i7-8700K, 32GB RAM desktop, though speed appears to be bottlenecked by disk reads/writes.
bash process_reads.sh -p 6
- (optional) For fair comparison between different time points in a timeseries, we subset reads from all relevant samples to the sample with the fewest reads of the set. Run the following code snippet, where `-s` inputs the number of mapped reads to subset for each sample; modify accordingly. This step takes less than 10 minutes.
bash process_reads.sh -p 6 -s 24400000
- for MRE11 timeseries, subsetted to 24,400,000 reads for both replicates.
- for γH2AX timeseries, subsetted to 43,275,829 reads for replicate 1 and 62,271,079 reads for replicate 2.
- for MRE11 DNA-PKcs inhibitor experiments, subsetted to 11,923,070 reads for both replicates.
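The value passed to `-s` is the mapped-read count of the smallest sample in the set. A minimal sketch of that selection follows. It is not taken from the repository: the helper names and the sample names and counts are hypothetical, and counting with `samtools view -c -F 4` is an assumption about what "mapped reads" means here. The sample statistics written by the previous step may be the better source for the counts.

```python
import subprocess


def count_mapped_reads(bam_path):
    """Count alignments without the 'unmapped' flag (0x4) using samtools.

    Assumption: this may not match how process_reads.sh counts reads.
    """
    out = subprocess.check_output(["samtools", "view", "-c", "-F", "4", bam_path])
    return int(out.decode().strip())


def subset_depth(mapped_counts):
    """Return the smallest mapped-read count in a {sample: count} mapping."""
    if not mapped_counts:
        raise ValueError("no samples given")
    return min(mapped_counts.values())


# Hypothetical sample names and counts, for illustration only.
example_counts = {
    "sample_a": 31000000,
    "sample_b": 24400000,
    "sample_c": 28500000,
}

# Print the subsetting command with the smallest count filled in.
print("bash process_reads.sh -p 6 -s %d" % subset_depth(example_counts))
```

To work from real files instead of the example mapping, build the dictionary with `count_mapped_reads` for each BAM file in the timeseries.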
In addition to raw paired-end reads in FASTQ format, we have also uploaded pre-processed sequencing reads in BAM format to SRA. These are the output of the previous section. It is highly recommended to start from these BAM files.
- Download pre-processed paired-end reads in BAM format from SRA.
- Move the downloaded data for MRE11 and γH2AX ChIP-seq to the desired folder.
- If not already done, index the downloaded BAM files:
samtools index /path/to/output.bam
- Set `base` variable to be the path to the directory that holds the BAM files.
- Create a new folder to hold the output of the analysis, set its path to `base_a`.
- Ensure that all file names are correct (if the BAM files were directly downloaded from SRA, they should be), then run the script.
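The indexing and folder setup above can be scripted when there are many BAM files. This is a sketch under stated assumptions, not code from the repository: the function names are hypothetical, the arguments mirror the `base` and `base_a` variables described above, and `samtools` is assumed to be on the path.

```python
import os
import subprocess


def bams_missing_index(base):
    """List BAM files in `base` that have no accompanying .bai index."""
    missing = []
    for name in sorted(os.listdir(base)):
        if not name.endswith(".bam"):
            continue
        bam = os.path.join(base, name)
        # samtools index writes <file>.bam.bai; some tools write <file>.bai.
        has_index = os.path.exists(bam + ".bai") or os.path.exists(bam[:-4] + ".bai")
        if not has_index:
            missing.append(bam)
    return missing


def index_all(base):
    """Run `samtools index` on every BAM file in `base` that lacks an index."""
    for bam in bams_missing_index(base):
        subprocess.check_call(["samtools", "index", bam])


def prepare_output_dir(base_a):
    """Create the analysis output folder if it does not exist yet."""
    if not os.path.isdir(base_a):
        os.makedirs(base_a)
```

Calling `index_all(base)` followed by `prepare_output_dir(base_a)` leaves both folders ready for the analysis script.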
Analyze MRE11 ChIP-seq data from Wienert & Wyman et al (Science, 2019)
- FASTQ reads with the following SRA run accession codes (SRR) were downloaded from Here.
SRR8550692, SRR8550673, SRR8550703, SRR8550680, SRR8550681, SRR8550704, SRR8550684, SRR8550705, SRR8550693, SRR8550695, SRR8553800, SRR8553810, SRR8553804, SRR8553806
- Generate BAM files from raw FASTQ reads following instructions from the previous sections. These files will be used in this script.
- Ensure that the file names are correctly referenced in the script, then run the script.
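For long accession lists, the download commands can be generated rather than typed. The sketch below assumes the SRA Toolkit's `fasterq-dump` is installed, which this README does not name as a requirement; the helper names and the `fastq` output folder are hypothetical. With `--split-files`, paired-end runs are written as `<accession>_1.fastq` and `<accession>_2.fastq`, which matches the naming that `process_reads.sh` expects.

```python
def parse_accessions(text):
    """Split a comma-separated list of SRR accession codes."""
    return [item.strip() for item in text.split(",") if item.strip()]


def fasterq_dump_command(accession, outdir="."):
    """Build a fasterq-dump call that writes read1/read2 to separate files."""
    return ["fasterq-dump", "--split-files", "--outdir", outdir, accession]


def expected_fastq_pair(sample_path):
    """Return the read1/read2 FASTQ paths for a sample path without suffix."""
    return (sample_path + "_1.fastq", sample_path + "_2.fastq")


# First two accessions from the list above; extend with the remaining codes.
accessions = parse_accessions("SRR8550692, SRR8550673")

for acc in accessions:
    # Print each command so it can be reviewed before running it in bash.
    print(" ".join(fasterq_dump_command(acc, outdir="fastq")))
```

The sample paths to place in `filelist` would then be entries such as `fastq/SRR8550692`, without the `_1.fastq` or `_2.fastq` suffix.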
Analyze ChIP-seq against multiple repair factors from Wienert & Wyman et al (Science, 2019)
- FASTQ reads with the following SRA run accession codes (SRR) were downloaded from Here.
SRR8550677, SRR8550696, SRR8550679, SRR8550694, SRR8550682, SRR8550699, SRR8550678, SRR8550697, SRR8550698, SRR8550690
- Generate BAM files from raw FASTQ reads following instructions from the previous sections. These files will be used in this script.
- Ensure that the file names are correctly referenced in the script, then run the script.