## Google Doc with the required dependencies:


https://docs.google.com/document/d/1fUM2bguzMHy6NOR1kX_zfABbEWaU9nwBfdWSabp0PDA/edit

## Is Docker installed?

In [1]:
docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://cloud.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/



Let's check that at least 8 GB of memory is allocated to Docker (in the output below, the `total` column reads 7.8G for such an allocation):

In [2]:
docker run -it ubuntu free -h

              total        used        free      shared  buff/cache   available
Mem:           7.8G        129M        6.6G        161M        1.1G        7.3G
Swap:          1.0G        468K        1.0G
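The same check can be scripted. This is a minimal sketch, assuming the `free -h` layout shown above with a `G` suffix on the total; `mem_at_least` is a hypothetical helper name, not a Docker command. The threshold is 7 rather than 8 because the total above is reported as 7.8G.

```shell
# Hypothetical helper: reads `free -h` output on stdin and succeeds when the
# "Mem:" total is at least $1 GiB. Assumes the total carries a G suffix.
mem_at_least() {
  awk -v min="$1" '
    $1 == "Mem:" {
      total = $2
      if (total !~ /G$/) exit 2          # unexpected unit: report an error
      sub(/G$/, "", total)
      found = 1
      exit (total + 0 >= min ? 0 : 1)
    }
    END { if (!found) exit 2 }           # no "Mem:" line at all
  '
}

# The Mem: line printed above, reused so the sketch runs without Docker.
sample='Mem:           7.8G        129M        6.6G        161M        1.1G        7.3G'

if printf '%s\n' "$sample" | mem_at_least 7; then
  echo "memory OK"
else
  echo "allocate more memory to Docker"
fi
# With a live daemon: docker run ubuntu free -h | mem_at_least 7
```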


## Let's download the required Docker images

In [3]:
#docker image for crispor and crispresso
docker pull pinellolab/crispor_crispresso_nat_prot


Using default tag: latest
latest: Pulling from pinellolab/crispor_crispresso_nat_prot
Digest: sha256:5ba8c77818b949466848955f8b46aa7c507c6889803ba6f8450b829146f26ffb
Status: Image is up to date for pinellolab/crispor_crispresso_nat_prot:latest


In [4]:
#docker image for cas-offinder
docker pull lucapinello/cas-offinder

Using default tag: latest
latest: Pulling from lucapinello/cas-offinder
Digest: sha256:e211892414e053f9ed0b975f484b7120c3bf19e0357656d01bd98e70f02156f4
Status: Image is up to date for lucapinello/cas-offinder:latest


In [5]:
#let's check the current folder
pwd

/Users/luca/nanocourse_examples


In [6]:
#let's create a folder that will contain the genome files for CRISPOR and cas-offinder

In [7]:
mkdir crispor_genomes

In [8]:
ls

Nanocourse_CRISPR_2017.ipynb	crispor_genomes
README.md


In [None]:
#download the human genome assembly hg19 (this may take a long time!)
docker run -v $PWD/crispor_genomes:/crisporWebsite/genomes pinellolab/crispor_crispresso_nat_prot downloadGenome hg19 /crisporWebsite/genomes
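Because the download is slow, it can help to confirm that it finished before moving on. This is a minimal sketch with a hypothetical helper, `genome_ready`. It only checks for the two files that later cells reference (`hg19.fa` in the CRISPOR log and `hg19.2bit` in the cas-offinder input). The assumption that these two files are enough is ours: CRISPOR may need additional index files that this check does not cover.

```shell
# Hypothetical helper: succeeds when <dir>/<assembly>/ holds non-empty
# <assembly>.fa and <assembly>.2bit files.
genome_ready() {
  dir=$1 assembly=$2
  for ext in fa 2bit; do
    if [ ! -s "$dir/$assembly/$assembly.$ext" ]; then
      echo "missing: $dir/$assembly/$assembly.$ext" >&2
      return 1
    fi
  done
}

if genome_ready crispor_genomes hg19 2>/dev/null; then
  echo "hg19 is ready"
else
  echo "hg19 is not downloaded yet"
fi
```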



## 1. CRISPOR Demo

In [10]:
echo '>BCL11A' > crispor_input.fa
echo 'CCGAGCCTCTTGAAGCCATTCTTACAGATGATGAACCAGACCACGGCCCGTTGGGAGCTCCAGAAGGGGATCATGACCTCCTCACCTGTGGGCAGTGCCAGATGAACTTCCCATTGGGGGACATTCTTATTTTTATCGAGCACAAACGGAAACAATGCAATGGCAGCCTCTGCTTAGAAAAAGCTGTGGATAAGCCACCTTCCCCTTCACCAATCGAGATGAAAAAAGCATCCAATCCCGTGGAGGTTGGCATCCAGGTCACGCCAGAGGATGACGATTGTTTATCAACGTCATCTAGAGGAATTTGCCCCAAACAGGAACACATAGCAG' >> crispor_input.fa

In [11]:
cat crispor_input.fa

>BCL11A
CCGAGCCTCTTGAAGCCATTCTTACAGATGATGAACCAGACCACGGCCCGTTGGGAGCTCCAGAAGGGGATCATGACCTCCTCACCTGTGGGCAGTGCCAGATGAACTTCCCATTGGGGGACATTCTTATTTTTATCGAGCACAAACGGAAACAATGCAATGGCAGCCTCTGCTTAGAAAAAGCTGTGGATAAGCCACCTTCCCCTTCACCAATCGAGATGAAAAAAGCATCCAATCCCGTGGAGGTTGGCATCCAGGTCACGCCAGAGGATGACGATTGTTTATCAACGTCATCTAGAGGAATTTGCCCCAAACAGGAACACATAGCAG
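A quick sanity check on a FASTA file is to print the length of each record. For this input, the bwa log in the guide design step reports a single 330 bp sequence. The sketch below uses a hypothetical helper, `fasta_lengths`, and a toy record so that it runs on its own.

```shell
# Print "<record id><TAB><sequence length>" for every record in a FASTA file.
fasta_lengths() {
  awk '
    /^>/ { if (id != "") print id "\t" len; id = substr($1, 2); len = 0; next }
         { len += length($0) }
    END  { if (id != "") print id "\t" len }
  ' "$1"
}

# Toy record (not the BCL11A sequence) so the sketch is self-contained.
toy_fa=$(mktemp)
printf '>toy\nACGT\nACG\n' > "$toy_fa"
fasta_lengths "$toy_fa"
# On the real input: fasta_lengths crispor_input.fa
```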


In [12]:
#design guides
docker  run  \
-v  $PWD/crispor_genomes:/crisporWebsite/genomes \
-v $PWD/:/DATA \
-w /DATA pinellolab/crispor_crispresso_nat_prot \
crispor.py hg19 crispor_input.fa crispor_output.tsv --satMutDir=./


INFO:root:running on sequence ID 'BCL11A'
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bsw2_aln] read 1 sequences/pairs (330 bp) ...
[main] Version: 0.7.15-r1140
[main] CMD: /crisporWebsite/bin/Linux/bwa bwasw -T 20 /crisporWebsite/genomes/hg19/hg19.fa /tmp/crisporBestMatchfE_x2K.fa
[main] Real time: 32.085 sec; CPU: 15.360 sec
INFO:root:Progress ekg35N12AoucYNOFrDed - effScores - Calculating guide efficiency scores
INFO:root:Wrote eff scores to /tmp/crisporR_HNIr/ekg35N12AoucYNOFrDed.effScores.tab
INFO:root:Progress ekg35N12AoucYNOFrDed - bwa - Alignment of potential guides, mismatches <= 4
[bwa_aln_core] calculate SA coordinate... 7.13 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 49 sequences have been processed.
[main] Version: 0.7.15-r1140
[main] CMD: /crisporWebsite/bin/Linux/bwa aln -o 0 -m 1980000 -n 4 -k 4 -N -l 20 /crisporWebsite/genomes/hg19/hg19.fa /tmp/crisporR_HNIr/ekg35N12AoucYNOFrDed.fa
[main] Real time: 22.185 sec; CPU: 11.700 sec
INFO:root:Progres

In [13]:
ls

BCL11A_ontargetAmplicons.tsv	README.md
BCL11A_ontargetPrimers.tsv	crispor_genomes
BCL11A_satMutOligos.tsv		crispor_input.fa
BCL11A_targetSeqs.tsv		crispor_output.tsv
Nanocourse_CRISPR_2017.ipynb


In [14]:
#simulate filtering: here we simply keep the first 10 lines (the header plus the first 9 guides)
docker run \
-v $PWD/:/DATA \
-w /DATA pinellolab/crispor_crispresso_nat_prot \
bash -c "head -n 10 BCL11A_satMutOligos.tsv > BCL11A_satMutOligos_filtered.tsv" 


In [15]:
head BCL11A_satMutOligos_filtered.tsv

#guideId	targetSeq	mitSpecScore	offtargetCount	targetGenomeGeneLocus	Doench '16EffScore	Moreno-MateosEffScore	Oligonucleotide	AdapterHandle+PrimerFw	AdapterHandle+PrimerRev
1rev	AAGAATGGCTTCAAGAGGCTCGG	22	1525	exon:BCL11A	58	64	GGAAAGGACGAAACACCGAAGAATGGCTTCAAGAGGCTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCCGTGGTCTGGTTCATCAT	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGATGGCATGGGGTTGAGAT
6rev	TCTGTAAGAATGGCTTCAAGAGG	65	211	exon:BCL11A	65	36	GGAAAGGACGAAACACCGTCTGTAAGAATGGCTTCAAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCACAGGTGAGGAGGTCATG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTCTCCAACCACAGCCGAG
16rev	TGGTTCATCATCTGTAAGAATGG	27	1058	exon:BCL11A	51	37	GGAAAGGACGAAACACCGTGGTTCATCATCTGTAAGAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC	TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCACAGGTGAGGAGGTCATG	GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTCTCCAACCACAGCCGAG
36rev	GCTCCCAACGGGCCGTGGTCTGG	87	101	exon:BCL11A	38	37	GGAAAGGACGAAACACCGGCTCCCAACGGGCCGTGGTCGTTTTAGAGCTAGAAATAGC
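Taking the first lines only stands in for a real selection step. One possible alternative, sketched here under our own assumptions, keeps the header and every guide whose `mitSpecScore` (column 3) reaches a threshold. The helper names and the threshold of 50 are illustrative and do not come from CRISPOR. The sample rows are the first four columns of the guides printed above.

```shell
# Keep the header line plus guides whose mitSpecScore (column 3) is >= $1.
filter_by_specificity() {
  awk -F'\t' -v min="$1" 'NR == 1 || $3 + 0 >= min'
}

# First four columns of the rows shown above, so the sketch runs on its own.
sample_guides() {
  printf '%s\t%s\t%s\t%s\n' \
    '#guideId' targetSeq mitSpecScore offtargetCount \
    1rev  AAGAATGGCTTCAAGAGGCTCGG 22 1525 \
    6rev  TCTGTAAGAATGGCTTCAAGAGG 65 211 \
    16rev TGGTTCATCATCTGTAAGAATGG 27 1058 \
    36rev GCTCCCAACGGGCCGTGGTCTGG 87 101
}

sample_guides | filter_by_specificity 50
# On the real table:
#   filter_by_specificity 50 < BCL11A_satMutOligos.tsv > BCL11A_satMutOligos_filtered.tsv
```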

In [16]:
#keep only the amplicons of the filtered guides (first three columns; input for CRISPResso analysis)
docker run -v $PWD:/DATA -w /DATA pinellolab/crispor_crispresso_nat_prot bash -c "join -1 1 -2 1 BCL11A_satMutOligos_filtered.tsv BCL11A_ontargetAmplicons.tsv -o 2.1,2.2,2.3 > CRISPRessoPooled_amplicons.tsv"



In [17]:
#keep only the primers of the filtered guides (needed for the experiment)
docker run -v $PWD:/DATA -w /DATA pinellolab/crispor_crispresso_nat_prot bash -c "join -1 1 -2 1 BCL11A_satMutOligos_filtered.tsv BCL11A_ontargetPrimers.tsv -o 2.1,2.2,2.3,2.4,2.5,2.6,2.7 > BCL11A_ontargetPrimers_filtered.tsv"

In [18]:
head  BCL11A_ontargetPrimers_filtered.tsv

#guideId forwardPrimer leftPrimerTm revPrimer revPrimerTm ampliconSequence guideSequence
1rev GCCGTGGTCTGGTTCATCAT 60.393 GGATGGCATGGGGTTGAGAT 59.813 GCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTTGGAGAAACAAAAGCACAATTATTAGAGTGCCAGAGAGGACAGAAAGGGGAGAAGCACATCTCAACCCCATGCCATCC AAGAATGGCTTCAAGAGGCT
6rev CCACAGGTGAGGAGGTCATG 59.749 TTTCTCCAACCACAGCCGAG 60.250 CCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTTGGAGAAA TCTGTAAGAATGGCTTCAAG
16rev CCACAGGTGAGGAGGTCATG 59.749 TTTCTCCAACCACAGCCGAG 60.250 CCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTTGGAGAAA TGGTTCATCATCTGTAAGAA
36rev CCACAGGTGAGGAGGTCATG 59.749 TTTCTCCAACCACAGCCGAG 60.250 CCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTTGGAGAAA GCTCCCAACGGGCCGTGGTC
41rev CCACAGGTGAGGAGGTCATG 59.749 TTTCTCCAACCACAGCCGAG 60.250 CCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATC
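`join` expects both inputs to be sorted on the join field, and the guide IDs here (`1rev`, `6rev`, `16rev`, ...) are not in lexicographic order, so it may silently skip rows on other data. An order-independent alternative can be sketched with `awk`. The helper name `keep_matching_ids` is hypothetical, and the toy tables reuse the guide IDs and forward primers shown above.

```shell
# Print the rows of table $2 whose first column also appears in table $1.
# Neither file needs to be sorted; table $1 must not be empty.
keep_matching_ids() {
  awk 'NR == FNR { keep[$1]; next } $1 in keep' "$1" "$2"
}

# Toy tables built from values shown above.
ids=$(mktemp)
primers=$(mktemp)
printf '%s\n' '#guideId' 1rev 6rev > "$ids"
printf '%s\t%s\n' \
  '#guideId' forwardPrimer \
  1rev  GCCGTGGTCTGGTTCATCAT \
  6rev  CCACAGGTGAGGAGGTCATG \
  16rev CCACAGGTGAGGAGGTCATG > "$primers"

keep_matching_ids "$ids" "$primers"
```

Unlike the `join -o` calls, this keeps every column of the second table and preserves its tab separators.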

## 2. CRISPResso Demo

Let's now try to run CRISPResso on data from the last guide in this table:


52forw CCACAGGTGAGGAGGTCATG 59.749 TTTCTCCAACCACAGCCGAG 60.250 CCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTTGGAGAAA TGAACCAGACCACGGCCCGT

The amplicon used in the experiment is slightly different, since it was not designed with CRISPOR:

AMPLICON:
AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT 

GUIDE:
TGAACCAGACCACGGCCCGT
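Before running the analysis, it is worth confirming that the guide actually occurs in the amplicon, either as written or as its reverse complement. This is a minimal sketch using hypothetical helper names and the two sequences above. It assumes uppercase A/C/G/T input and that the `rev` utility is available.

```shell
# Reverse-complement an uppercase DNA string.
revcomp() {
  printf '%s\n' "$1" | rev | tr ACGT TGCA
}

# Report whether the guide matches the amplicon as given ("forward"),
# as its reverse complement ("reverse"), or not at all ("absent").
guide_orientation() {
  amp=$1 g=$2
  case $amp in
    *"$g"*)              echo forward ;;
    *"$(revcomp "$g")"*) echo reverse ;;
    *)                   echo absent ;;
  esac
}

amplicon=AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT
guide=TGAACCAGACCACGGCCCGT

guide_orientation "$amplicon" "$guide"
```

For this pair the guide is found on the reverse strand of the amplicon.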


In [12]:
#let's download the data
#the URLs are quoted so that the shell does not treat the ? as a glob character
wget 'https://github.com/lucapinello/cripsr_nanocourse_hms/blob/master/reads1.2000.fastq.gz?raw=true' -O reads1.2000.fastq.gz
wget 'https://github.com/lucapinello/cripsr_nanocourse_hms/blob/master/reads2.2000.fastq.gz?raw=true' -O reads2.2000.fastq.gz

--2017-10-17 13:35:31--  https://github.com/lucapinello/cripsr_nanocourse_hms/blob/master/reads1.2000.fastq.gz?raw=true
Resolving github.com... 192.30.253.113, 192.30.253.112
Connecting to github.com|192.30.253.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/lucapinello/cripsr_nanocourse_hms/raw/master/reads1.2000.fastq.gz [following]
--2017-10-17 13:35:31--  https://github.com/lucapinello/cripsr_nanocourse_hms/raw/master/reads1.2000.fastq.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/lucapinello/cripsr_nanocourse_hms/master/reads1.2000.fastq.gz [following]
--2017-10-17 13:35:31--  https://raw.githubusercontent.com/lucapinello/cripsr_nanocourse_hms/master/reads1.2000.fastq.gz
Resolving raw.githubusercontent.com... 151.101.116.133
Connecting to raw.githubusercontent.com|151.101.116.133|:443... connected.
HTTP request sent, awaiting respo
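If a download goes wrong, the saved file may hold an HTML page instead of compressed reads. The sketch below, with a hypothetical helper `count_reads`, checks that a file is valid gzip and then prints the number of reads, relying on the FASTQ convention of four lines per read. The toy reads are made up so that the example runs without the download.

```shell
# Succeed only if $1 is a valid gzip file, then print its read count
# (FASTQ stores one read per four lines).
count_reads() {
  if ! gzip -t < "$1" 2>/dev/null; then
    echo "not a valid gzip file: $1" >&2
    return 1
  fi
  echo $(( $(gzip -dc < "$1" | wc -l) / 4 ))
}

# Toy two-read file (made-up reads) so the sketch is self-contained.
toy_fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$toy_fq"
count_reads "$toy_fq"
# On the real data: count_reads reads1.2000.fastq.gz
```

Paired-end files should report the same count for `reads1` and `reads2`.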

In [13]:
#finally we run CRISPResso
docker  run \
-v $PWD:/DATA \
-w /DATA pinellolab/crispor_crispresso_nat_prot  \
CRISPResso -r1 reads1.2000.fastq.gz -r2 reads2.2000.fastq.gz \
-a AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT \
-g TGAACCAGACCACGGCCCGT \
-s 20 \
-q 30 \
-n BCL11A_exon2


INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Cut Points from guide seq:[76] 

	 Folder CRISPResso_on_BCL11A_exon2 already exists. 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Filtering reads with average bp quality < 30 and single bp quality < 20 ... 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Estimating average read length... 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Merging paired sequences with Flash... 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Done! 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Preparing files for the alignment... 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Done! 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Aligning sequences... 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Align sequences to reverse complement of the amplicon... 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Done! 

INFO  @ Tue, 17 Oct 2017 17:35:39:
	 Quantifying indels/substitutions... 

INFO  @ Tue, 17 Oct 2017 17:35:40:
	 Done! 

INFO  @ Tue, 17 Oct 2017 17:35:40:
	 Calculating indel distribution based on the length of the reads... 

INFO  @ Tue, 17 

## 3. Cas-OFFinder Demo

Finally, let's use cas-offinder to compute all the potential off-targets of our guide with up to 5 mismatches

In [21]:
#let's take a look at the command syntax
docker run lucapinello/cas-offinder cas-offinder --help

Cas-OFFinder v2.4 (Aug 17 2016)

Copyright (c) 2013 Jeongbin Park and Sangsu Bae
Website: http://github.com/snugel/cas-offinder

Usage: cas-offinder {input_file} {C|G|A}[device_id(s)] {output_file}
(C: using CPUs, G: using GPUs, A: using accelerators)

Example input file:
/var/chromosomes/human_hg19
NNNNNNNNNNNNNNNNNNNNNRG
GGCCGACCTGTCGCTGACGCNNN 5
CGCCAGCGTCAGCGACAGGTNNN 5
ACGGCGCCAGCGTCAGCGACNNN 5
GTCGCTGACGCTGGCGCCGTNNN 5

Available device list:
Type: CPU, ID: 0, <Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz> on <Intel(R) OpenCL>


In [22]:
#now we create the required input file

In [23]:
pwd

/Users/luca/nanocourse_examples


In [24]:
echo '/DATA/crispor_genomes/hg19/hg19.2bit' > cas_offinder_input.txt
echo 'NNNNNNNNNNNNNNNNNNNNNGG' >> cas_offinder_input.txt
echo 'TGAACCAGACCACGGCCCGTNGG 5' >> cas_offinder_input.txt

In [25]:
cat cas_offinder_input.txt

/DATA/crispor_genomes/hg19/hg19.2bit
NNNNNNNNNNNNNNNNNNNNNGG
TGAACCAGACCACGGCCCGTNGG 5
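The three-line layout (genome path, search pattern, then guide plus mismatch limit) can be generated from its parts. This sketch follows the layout used in this notebook, in which the PAM is appended to both the pattern and the guide line. `make_cas_offinder_input` is a hypothetical helper, not part of cas-offinder, and it handles a single guide only.

```shell
# Hypothetical helper: print a cas-offinder input file to stdout.
# Arguments: genome path, guide sequence, PAM (e.g. NGG), maximum mismatches.
make_cas_offinder_input() {
  genome=$1 guide=$2 pam=$3 mismatches=$4
  # One N per guide base; the PAM is appended unchanged.
  ns=$(printf '%*s' "${#guide}" '' | tr ' ' N)
  printf '%s\n%s%s\n%s%s %s\n' \
    "$genome" "$ns" "$pam" "$guide" "$pam" "$mismatches"
}

make_cas_offinder_input /DATA/crispor_genomes/hg19/hg19.2bit \
  TGAACCAGACCACGGCCCGT NGG 5
# To write the file: make_cas_offinder_input ... > cas_offinder_input.txt
```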


In [26]:
#finally we can run cas-offinder 
docker run \
-v $PWD:/DATA \
-w /DATA  \
lucapinello/cas-offinder  \
cas-offinder cas_offinder_input.txt C cas_offinder_output.txt

Total 1 device(s) found.
Loading input file...
Reading /DATA/crispor_genomes/hg19/hg19.2bit...
Sending data to devices...
Chunk load started.
1 devices selected to analyze...
Finding pattern in chunk #1...
Comparing patterns in chunk #1...
1 devices selected to analyze...
Finding pattern in chunk #2...
Comparing patterns in chunk #2...
1 devices selected to analyze...
Finding pattern in chunk #3...
Comparing patterns in chunk #3...
1 devices selected to analyze...
Finding pattern in chunk #4...
Comparing patterns in chunk #4...
1 devices selected to analyze...
Finding pattern in chunk #5...
Comparing patterns in chunk #5...
1 devices selected to analyze...
Finding pattern in chunk #6...
Comparing patterns in chunk #6...
1 devices selected to analyze...
Finding pattern in chunk #7...
Comparing patterns in chunk #7...
1 devices selected to analyze...
Finding pattern in chunk #8...
Comparing patterns in chunk #8...
1 devices selected to analyze...
Finding pattern in chunk #9...
Comparing 

In [27]:
head cas_offinder_output.txt

TGAACCAGACCACGGCCCGTNGG	chr1	36654002	TGgACCAGACCACaGgCCtaTGG	+	5
TGAACCAGACCACGGCCCGTNGG	chr1	18029332	gGgAtCAGACCcCtGCCCGTGGG	+	5
TGAACCAGACCACGGCCCGTNGG	chr1	22988021	TGAAttAGACCACtGtCCcTTGG	-	5
TGAACCAGACCACGGCCCGTNGG	chr1	17709871	TGAACCAGACCtaGaCCCcTGGG	-	4
TGAACCAGACCACGGCCCGTNGG	chr1	47695328	TGAACCAGACCgatcCCaGTTGG	-	5
TGAACCAGACCACGGCCCGTNGG	chr1	23063054	gGAACCAGACCgCGGCCtGcAGG	-	4
TGAACCAGACCACGGCCCGTNGG	chr1	37809486	TGAggCAGACCcCaGCCCtTAGG	-	5
TGAACCAGACCACGGCCCGTNGG	chr1	61253698	TGggCCAGACCAaGGCaCGTGGG	-	4
TGAACCAGACCACGGCCCGTNGG	chr1	42062459	TGAcCaAGAtCAgGGCCCtTGGG	-	5
TGAACCAGACCACGGCCCGTNGG	chr1	15102904	TGAcCCAGACCcCtGtCCtTGGG	+	5
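The last column is the number of mismatches for each hit, and the lowercase letters in the fourth column mark the mismatched positions. A short sketch for tallying hits per mismatch count follows. The helper names are hypothetical, and the sample reuses the first four hits shown above.

```shell
# Count hits per mismatch number (column 6 of the tab-separated output).
mismatch_histogram() {
  awk -F'\t' '{ n[$6]++ } END { for (m in n) print m "\t" n[m] }' | sort -n
}

# First four hits from the output above, so the sketch runs on its own.
sample_hits() {
  printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    TGAACCAGACCACGGCCCGTNGG chr1 36654002 TGgACCAGACCACaGgCCtaTGG + 5 \
    TGAACCAGACCACGGCCCGTNGG chr1 18029332 gGgAtCAGACCcCtGCCCGTGGG + 5 \
    TGAACCAGACCACGGCCCGTNGG chr1 22988021 TGAAttAGACCACtGtCCcTTGG - 5 \
    TGAACCAGACCACGGCCCGTNGG chr1 17709871 TGAACCAGACCtaGaCCCcTGGG - 4
}

sample_hits | mismatch_histogram
# On the real output: mismatch_histogram < cas_offinder_output.txt
```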
