gifford-lab/skipguide-analysis

SkipGuide Analysis

This repository contains the code used to produce the analyses and results described in the paper Machine learning based CRISPR gRNA design for therapeutic exon skipping.

Dependencies

The analysis was performed using Python 3.7.5 and Jupyter. The dependencies are listed in environment.yml. We recommend using the conda package manager from Anaconda Python to create an environment for running the analysis:

conda env create -f environment.yml

Activate the environment with:

conda activate skipguide_data_processing

Data Files

The provided Jupyter notebooks (see the Usage section) can produce all the results starting from the raw sequencing data. However, the computations can take a long time, on the order of hours or days depending on computational resources. The notebooks are configured to skip certain long computations when pre-computed files are available, so we recommend downloading the pre-computed files before running the notebooks.
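The skip-if-cached behavior described above can be sketched as follows. This is an illustrative assumption about the pattern, not the notebooks' actual API: the function name and the pickle format are hypothetical.

```python
import os
import pickle

def load_or_compute(cache_path, compute_fn):
    """Return a cached result if the file exists; otherwise compute and cache it.

    Hypothetical sketch of the pattern the notebooks use to skip long
    computations when pre-computed files are present.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute_fn()  # the slow step, only runs on a cache miss
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

With the pre-computed files in place, every such call becomes a fast load instead of an hours-long computation.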

Raw Data Files

If you opt not to use the pre-computed files, the raw sequencing data must be available. Download it from here (raw/archive.tar.bz2), extract it, and place the *.fastq files in the data/reads directory before running the provided notebooks. Alternatively, the same *.fastq files are available on NCBI SRA, BioProject accession PRJNA647416.

Pre-Computed Files

If you opt to use the pre-computed files, the raw sequencing data is not needed. Download the pre-computed files from here (precomputed/cache.tar.xz), extract the archive, and replace the repository's cache directory with the extracted cache directory. Running all the notebooks should then take less than half an hour.
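The extract-and-replace step can be done with tar from a shell, or sketched with the Python standard library as below. The archive name cache.tar.xz and the top-level cache/ directory come from the instructions above; the function itself is a convenience sketch, not part of the repository.

```python
import shutil
import tarfile
from pathlib import Path

def install_precomputed(archive="cache.tar.xz", repo_root="."):
    """Replace the repository's cache directory with the pre-computed one.

    Assumes the downloaded archive contains a top-level cache/ directory,
    as described above; adjust the paths to your download location.
    """
    root = Path(repo_root)
    target = root / "cache"
    if target.exists():
        shutil.rmtree(target)  # drop the existing cache directory
    with tarfile.open(archive, "r:xz") as tar:
        tar.extractall(root)   # extracts cache/ into the repository root
    return target
```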

Usage

You can open the provided Jupyter notebooks under src and view the outputs. This section details how you can run the notebooks from scratch.

See the Data Files section above to obtain the necessary data files.

If pre-computed files are not used, set the NUM_PROCESSES variable in config.py to the number of cores to use for multiprocessing.
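For example, a common default leaves one core free for the notebook kernel itself. The variable name NUM_PROCESSES comes from config.py as described above; the specific value below is only a suggestion to adapt to your machine.

```python
import multiprocessing

# In config.py: number of worker processes for the long-running steps.
# cpu_count() - 1 leaves one core free; tune this to your hardware.
NUM_PROCESSES = max(1, multiprocessing.cpu_count() - 1)
```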

Start a Jupyter notebook server, e.g.:

jupyter notebook --port=8888

Run the provided notebooks under src in the following order:

  1. Sequence_Extraction.ipynb
  2. Barcode_Sequence_Lookup_Tables.ipynb
  3. datA_Characterize_Sequences_Indels.ipynb
  4. inDelphi_Evaluation.ipynb
  5. inDelphi_Check_Data_Leakage.ipynb
  6. datB_Characterize_Skipping.ipynb
  7. SpliceAI_Predict_Skipping.ipynb
  8. MMSplice_Predict_Skipping.ipynb
  9. MetaSplice_SkipGuide_Evaluation.ipynb

Inspect the comments and markdown in the notebooks for more context.
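Instead of stepping through each notebook interactively, the same ordering can be executed headlessly with jupyter nbconvert. This driver script is a convenience sketch, not part of the repository; it assumes the notebooks live under src as described above.

```python
import subprocess

# The notebooks under src, in the dependency order listed above.
NOTEBOOKS = [
    "Sequence_Extraction.ipynb",
    "Barcode_Sequence_Lookup_Tables.ipynb",
    "datA_Characterize_Sequences_Indels.ipynb",
    "inDelphi_Evaluation.ipynb",
    "inDelphi_Check_Data_Leakage.ipynb",
    "datB_Characterize_Skipping.ipynb",
    "SpliceAI_Predict_Skipping.ipynb",
    "MMSplice_Predict_Skipping.ipynb",
    "MetaSplice_SkipGuide_Evaluation.ipynb",
]

def run_all(src_dir="src"):
    """Execute each notebook in place, in order.

    check=True stops the pipeline on the first failure, so later
    notebooks never run on incomplete inputs.
    """
    for nb in NOTEBOOKS:
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook",
             "--execute", "--inplace", nb],
            cwd=src_dir, check=True,
        )
```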
