15_sordella

Code and processed data reported in Senturk et al. (2016), Nat. Commun. [in press]

DATA

The raw sequence data for this study are available on the Sequence Read Archive under accession number SRP078612. The first 10K reads for each of the 8 samples are provided in the FASTQ files in the data_tiny/ directory. The Cas9-targeted region of p53, as well as the primers used to amplify this locus for sequencing, are given in data/regions.txt. The content of each of the 8 sequencing samples is described in data/samples.txt.
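For readers who want to build similarly small test files from their own downloads, the following is a minimal sketch of truncating a FASTQ file to its first 10,000 reads. It is not necessarily how the files in data_tiny/ were produced; the function name head_fastq and its parameters are hypothetical, and it assumes uncompressed FASTQ with exactly four lines per record.

```python
import itertools


def head_fastq(src_path, dst_path, n_reads=10000):
    """Copy the first n_reads records of a FASTQ file to dst_path.

    Assumption: the input is uncompressed and every record spans
    exactly 4 lines (header, sequence, '+', qualities).
    """
    n_lines = 4 * n_reads
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice stops early if the file has fewer records than requested
        for line in itertools.islice(src, n_lines):
            dst.write(line)
```

For paired-end data, the same call would be applied to the read1 and read2 files separately so that the truncated files stay in register.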

PUBLISHED RESULTS

The published results are provided in the directory published/.

all_unique_seqs.txt shows all of the unique reconstructed sequences for the p53 locus described in this paper.
all_summarized_seqs.txt shows summary information for each unique sequence.
stats.txt shows the number and percentage of sequences in each sample that could be successfully reconstructed.
alignment.txt shows an alignment of the most prevalent sequences, along with their observed counts in the 8 samples.
rates.pdf is the bar chart shown in Fig. 2D.
mutations.pdf is the bar chart shown in Fig. 2E.

RUNNING THE PIPELINE ON SAMPLE DATA

Download this repository and change to the top-level directory 15_sordella/
Execute $ ./run_tiny.py, which will run the pipeline on the small sample datasets in data_tiny/. The results will be stored in output/results/
Execute $ ./make_plots.py, which will create the corresponding plots and store them in output/results/
The results of this pipeline will be in output/results/
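The steps above can also be chained from a small driver script so that plotting only runs if the pipeline succeeds. This is a sketch, not part of the repository; the helper name run_steps is hypothetical, and it assumes the scripts are executable and are invoked from the top-level directory.

```python
import subprocess


def run_steps(commands):
    """Run each command in order, stopping at the first failure.

    subprocess.run(..., check=True) raises CalledProcessError when a
    command exits with a non-zero status, so later steps are skipped.
    """
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example call (assumes the current directory is 15_sordella/):
# run_steps([["./run_tiny.py"], ["./make_plots.py"]])
```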

RUNNING THE PIPELINE ON THE FULL DATASETS

Download the eight paired-end read datasets from SRP078612. Save the resulting 16 files (8 for read1, 8 for read2) in fastq format in the directory data/.
Edit run.py so that the variables read1_file_glob, read2_file_glob, and split_file_globs point to these fastq files.
Set the variable use_multiple_nodes = False in run.py to run the analysis in single-node mode. To run the analysis on multiple nodes, set use_multiple_nodes = True. To get the multi-node analysis working in your cluster environment, however, you might have to edit the function submit_and_complete_jobs(), defined in pipeline.py
Execute $ ./run.py, which will run the pipeline on the full datasets in data/. The results will be stored in output/results/
Execute $ ./make_plots.py, which will create the corresponding plots and store them in output/results/
The results of this pipeline will be in output/results/
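Before launching the full run, it can help to confirm that the glob patterns set in run.py actually match the 16 downloaded files. The sketch below is an illustration only: the function check_fastq_globs is hypothetical, and the example patterns are assumptions about how the downloaded files might be named, not names taken from the repository.

```python
import glob


def check_fastq_globs(read1_glob, read2_glob, expected=8):
    """Return sorted read1 and read2 file lists, or raise if counts differ.

    expected=8 reflects the eight paired-end samples described above.
    """
    read1 = sorted(glob.glob(read1_glob))
    read2 = sorted(glob.glob(read2_glob))
    if len(read1) != expected or len(read2) != expected:
        raise ValueError(
            "expected %d files per read, found %d read1 and %d read2"
            % (expected, len(read1), len(read2))
        )
    return read1, read2


# Example call with hypothetical file-name patterns:
# check_fastq_globs("data/*_1.fastq", "data/*_2.fastq")
```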
