Code and processed data reported in Senturk et al. (2016), Nat. Commun. [in press]
DATA
The raw sequence data for this study are available from the Sequence Read Archive under accession number SRP078612. The first 10K reads for each of the 8 samples are provided as FASTQ files in the `data_tiny/` directory. The Cas9-targeted region of p53, as well as the primers used to amplify this locus for sequencing, are given in `data/regions.txt`. The content of each of the 8 sequencing samples is described in `data/samples.txt`.
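A quick sanity check on the truncated FASTQ files in `data_tiny/` is to count records (a well-formed FASTQ file has four lines per read); each file should report 10K reads. A minimal sketch in Python — the example filename in the comment is a placeholder, not the repository's actual naming scheme:

```python
import gzip


def count_fastq_reads(path):
    """Count the reads in a (possibly gzipped) FASTQ file: one read = 4 lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    # A well-formed FASTQ file contains a multiple of 4 lines
    assert n_lines % 4 == 0, "truncated or malformed FASTQ"
    return n_lines // 4


# e.g. count_fastq_reads("data_tiny/sample1_R1.fastq")  # hypothetical filename
```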
PUBLISHED RESULTS
The published results are provided in the directory `published/`:

- `all_unique_seqs.txt` shows all of the unique reconstructed sequences for the p53 locus described in this paper.
- `all_summarized_seqs.txt` shows summary information for each unique sequence.
- `stats.txt` shows the number and percentage of sequences in each sample that could be successfully reconstructed.
- `alignment.txt` shows an alignment of the most prevalent sequences, along with their observed counts in the 8 samples.
- `rates.pdf` is the bar chart shown in Fig. 2D.
- `mutations.pdf` is the bar chart shown in Fig. 2E.
RUNNING THE PIPELINE ON SAMPLE DATA
- Download this repository and change to the top-level directory `15_sordella/`.
- Execute `$ ./run_tiny.py`, which will run the pipeline on the small sample datasets in `data_tiny/`. The results will be stored in `output/results/`.
- Execute `$ ./make_plots.py`, which will create the corresponding plots and store them in `output/results/`.
- The results of this pipeline will be in `output/results/`.
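A small helper can verify that the run completed by checking which result files have appeared in `output/results/`. The expected filenames below mirror those listed under `published/` — an assumption about the pipeline's output names, not something the pipeline guarantees:

```python
from pathlib import Path

# Assumed output filenames, mirroring the files in published/
EXPECTED = [
    "all_unique_seqs.txt",
    "all_summarized_seqs.txt",
    "stats.txt",
    "alignment.txt",
    "rates.pdf",
    "mutations.pdf",
]


def missing_outputs(results_dir="output/results"):
    """Return the expected result files not yet present in results_dir."""
    d = Path(results_dir)
    return [name for name in EXPECTED if not (d / name).exists()]


# print(missing_outputs())  # should be [] once the pipeline has completed
```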
RUNNING THE PIPELINE ON THE FULL DATASETS
- Download the eight paired-end read datasets from SRP078612. Save the resulting 16 files (8 for read1, 8 for read2) in FASTQ format in the directory `data/`.
- Edit `run.py` so that the variables `read1_file_glob`, `read2_file_glob`, and `split_file_globs` point to these FASTQ files.
- Set the variable `use_multiple_nodes = False` in `run.py` to run the analysis in single-node mode. To run the analysis on multiple nodes, set `use_multiple_nodes = True`. To get the multi-node analysis working in your cluster environment, however, you may have to edit the function `submit_and_complete_jobs()`, defined in `pipeline.py`.
- Execute `$ ./run.py`, which will run the pipeline on the full datasets in `data/`. The results will be stored in `output/results/`.
- Execute `$ ./make_plots.py`, which will create the corresponding plots and store them in `output/results/`.
- The results of this pipeline will be in `output/results/`.
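The glob variables edited in `run.py` might look like the following. The `*_R1` / `*_R2` naming pattern is only an assumption about how the downloaded files are named, not the repository's actual convention, and `split_file_globs` is omitted because its format is not described here:

```python
import glob

# Hypothetical values for the variables edited in run.py
read1_file_glob = "data/*_R1.fastq"  # should match the 8 read1 files
read2_file_glob = "data/*_R2.fastq"  # should match the 8 read2 files
use_multiple_nodes = False           # single-node mode

read1_files = sorted(glob.glob(read1_file_glob))
read2_files = sorted(glob.glob(read2_file_glob))
# Paired-end data: expect one read2 file for every read1 file
assert len(read1_files) == len(read2_files)
```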