# **Installation and Setup**

---

Run the cells below to set up the CRISPRware environment in Colab.



In [1]:
!mkdir -p /root/.mamba/pkgs
!chmod -R 777 /root/.mamba
!wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba

bin/micromamba


In [2]:
!git clone https://github.com/ericmalekos/crisprware crisprware
%cd crisprware

Cloning into 'crisprware'...
remote: Enumerating objects: 590, done.
remote: Counting objects: 100% (117/117), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 590 (delta 74), reused 75 (delta 35), pack-reused 473 (from 1)
Receiving objects: 100% (590/590), 107.75 MiB | 14.38 MiB/s, done.
Resolving deltas: 100% (275/275), done.
Updating files: 100% (121/121), done.
/content/crisprware


In [3]:
!/content/bin/micromamba env create -f environment.yml -n crisprware --root-prefix /content/micromamba --quiet -y
!/content/bin/micromamba run -n crisprware --root-prefix /content/micromamba pip install .

    Be aware that packages installed with 'pip' are managed independently from 'conda-forge' channel.
Processing /content/crisprware
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: crisprware
  Building wheel for crisprware (setup.py) ... 

In [4]:
# A helper function to run commands in the crisprware environment
# Wrap each module in this command before running
# This is only required in the Colab environment
def run_in_crisprware(command):
    !/content/bin/micromamba run -n crisprware --root-prefix /content/micromamba {command}
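
Outside a notebook the `!` shell escape is not available, so the same wrapper can be sketched with `subprocess`. This is a minimal sketch, not part of CRISPRware: the function names `build_crisprware_command` and `run_in_crisprware_subprocess` are hypothetical, and the paths and flags are simply the ones used in the cell above.

```python
import shlex
import subprocess

MICROMAMBA = "/content/bin/micromamba"  # binary path used earlier in this notebook
ROOT_PREFIX = "/content/micromamba"     # root prefix used earlier in this notebook


def build_crisprware_command(command, env="crisprware"):
    """Return the argv list that runs `command` inside the micromamba environment."""
    return [
        MICROMAMBA, "run", "-n", env,
        "--root-prefix", ROOT_PREFIX,
        *shlex.split(command),  # split the module call the way a shell would
    ]


def run_in_crisprware_subprocess(command):
    """Hypothetical plain-Python counterpart of run_in_crisprware."""
    # check=True raises CalledProcessError if the wrapped tool exits non-zero
    return subprocess.run(build_crisprware_command(command), check=True)
```

Building the argument list separately from running it makes the wrapper easy to inspect before anything is executed.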

# **Tutorial**


---

Download [osa1_r7 genome and annotation file](https://rice.uga.edu/download_osa1r7.shtml):

In [5]:
!mkdir -p osa1_r7
%cd osa1_r7
!wget -q --show-progress https://rice.uga.edu/osa1r7_download/osa1_r7.asm.repeat_masked.fa.gz
!wget -q --show-progress https://rice.uga.edu/osa1r7_download/osa1_r7.all_models.gff3.gz
!gunzip *.gz

/content/crisprware/osa1_r7


#### For each protein-coding gene, extract the longest isoform model.  
**Note:** the osa1_r7 GFF3 annotation must be updated to include the "gene_id" and "transcript_id" attributes. The `preprocess_annotation` module handles this automatically and writes an updated GTF file.
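
To see what the note means in practice, the sketch below checks whether a GFF3 attribute column carries both identifiers. It is only an illustration: the function names are hypothetical, the attribute string is made up rather than copied from the osa1_r7 file, and it does not reproduce what `preprocess_annotation` does internally.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attributes = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def has_gtf_ids(attributes):
    """True if both identifiers named in the note are present."""
    return "gene_id" in attributes and "transcript_id" in attributes


# Illustrative attribute string, NOT taken from the osa1_r7 annotation
example = "ID=gene1.1;Parent=gene1;Name=example"
print(has_gtf_ids(parse_gff3_attributes(example)))
```

A record like the illustrative one has `ID` and `Parent` but neither `gene_id` nor `transcript_id`, which is the situation the note describes.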

In [6]:
run_in_crisprware('\
                  preprocess_annotation --gtf osa1_r7.all_models.gff3 \
                  --model longest \
                  ')



Conversion successful. GTF file saved as osa1_r7.all_models.gtf

	Saving longest CDS GTF to: /content/crisprware/osa1_r7/osa1_r7.all_models/osa1_r7.all_models_longestCDS.gtf




#### Build genome index

In [7]:
run_in_crisprware('index_genome --fasta osa1_r7.asm.repeat_masked.fa')

Attempting to read raw sequence file (if constructed)...
No raw sequence file "osa1_r7.asm.repeat_masked.fa.forward.dna". Building now...
No raw sequence file "osa1_r7.asm.repeat_masked.fa.reverse.dna". Building now...
[2024-10-09 00:18:44.252] [guidescan2] [info] Constructing genomic index.
[2024-10-09 00:18:44.774] [guidescan2] [info] Constructing forward genomic index.
[2024-10-09 00:20:35.183] [guidescan2] [info] Constructing reverse genomic index.
[2024-10-09 00:22:20.528] [guidescan2] [info] Index construction complete.
	Removing file: osa1_r7.asm.repeat_masked.fa.forward.dna
	Removing file: osa1_r7.asm.repeat_masked.fa.reverse.dna


#### Find AGG-adjacent protospacers in CDS
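
As a rough illustration of what "AGG-adjacent" means, the sketch below scans the forward strand of a sequence for the PAM and reports the 20 nt immediately upstream of each match. It is a simplified stand-in, not the `generate_guides` implementation: the function name is hypothetical, the 20-nt length is inferred from the guide sequences shown in this tutorial, and reverse-strand matches and annotation filtering are ignored. The input string is the 30-nt context of the first guide in the scored output later in this tutorial.

```python
def find_forward_protospacers(sequence, pam="AGG", length=20):
    """Yield (start, protospacer) for every PAM match on the forward strand."""
    sequence = sequence.upper()
    position = sequence.find(pam)
    while position != -1:
        start = position - length
        if start >= 0:  # skip PAMs too close to the 5' end for a full protospacer
            yield start, sequence[start:position]
        # restart one base later so overlapping PAM matches are not missed
        position = sequence.find(pam, position + 1)


context = "ATGGGACATGCACTGGTAACCGAGAGGCAC"
print(list(find_forward_protospacers(context)))
# [(4, 'GACATGCACTGGTAACCGAG')]
```

The recovered protospacer matches the `sequence` column reported for that guide.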

In [8]:
run_in_crisprware('\
                  generate_guides -f osa1_r7.asm.repeat_masked.fa \
                  --locations_to_keep osa1_r7.all_models/osa1_r7.all_models_longestCDS.gtf \
                  --feature CDS \
                  --pam AGG \
                  --threads 2 \
                  --coords_as_active_site \
                  ')


	Chromosomes for which to find targets:	Chr1 Chr10 Chr11 Chr12 Chr2 Chr3 Chr4 Chr5 Chr6 Chr7 Chr8 Chr9 ChrSy ChrUn
	Processing Chr1
	Processing Chr2
	Processing Chr3
	Processing Chr4
	Processing Chr5
	Processing Chr6
	Processing Chr7
	Processing Chr8
	Processing Chr9
	Processing Chr10
	Processing Chr11
	Processing Chr12
	Processing ChrUn
	Processing ChrSy

	Saved output file to /content/crisprware/osa1_r7/osa1_r7.asm.repeat_masked_gRNA/osa1_r7.asm.repeat_masked_gRNA.bed



#### Score gRNAs

To reduce run time for this demonstration, we subset the first 10,000 guides and score only those.  
We also drop guides with an RS3 score below 0.5 (`--min_rs3 0.5`) before off-target scoring.
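
The subsetting step keeps 10,001 lines, which is consistent with one header line plus 10,000 guide records. A minimal Python sketch of the same idea follows; the function name and the tiny in-memory input are illustrative assumptions, not part of CRISPRware.

```python
from itertools import islice


def first_n_guides(lines, n=10_000):
    """Return the header line plus the first n guide records."""
    return list(islice(lines, n + 1))


# Tiny in-memory stand-in for the BED file: one header and three made-up records
demo = ["#chr\tstart\tstop\n", "Chr1\t1\t21\n", "Chr1\t5\t25\n", "Chr1\t9\t29\n"]
subset = first_n_guides(demo, n=2)
print(len(subset))  # 3: the header plus two guides
```

With a real file, passing the open file handle as `lines` reads only the lines that are kept.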

In [9]:
!head -n 10001 osa1_r7.asm.repeat_masked_gRNA/osa1_r7.asm.repeat_masked_gRNA.bed > osa1_r7.asm.repeat_masked_gRNA/10K_guides.bed

In [10]:
run_in_crisprware('\
                    score_guides -b osa1_r7.asm.repeat_masked_gRNA/10K_guides.bed \
                    --threads 2 \
                    --min_rs3 0.5 \
                    --guidescan2_indices osa1_r7.asm.repeat_masked_gscan2/osa1_r7.asm.repeat_masked_gscan2 \
                    --tracr Chen2013 \
                  ')


	Before dropping duplicates:	10000
	After dropping duplicates:	9370


	Beginning RS3 cleavage scoring
	If memory constrained reduce '--chunk_size'

Calculating sequence-based features
100% 9370/9370 [00:33<00:00, 280.38it/s]

	After dropping RS3 cleavage scores below 0.5:	1291


	Beginning Guidescan2 specificity scoring against osa1_r7.asm.repeat_masked_gscan2/osa1_r7.asm.repeat_masked_gscan2
	If memory constrained reduce '--chunk_size'

input:/content/crisprware/osa1_r7/10K_guides_scoredgRNA/tmp/osa1_r7.asm.repeat_masked_gscan2Input.1.csv

	Saved Guidescan input file to /content/crisprware/osa1_r7/10K_guides_scoredgRNA/tmp/osa1_r7.asm.repeat_masked_gscan2Input.1.csv

[2024-10-09 00:25:50.977] [guidescan2] [info] Loading genome index at "osa1_r7.asm.repeat_masked_gscan2/osa1_r7.asm.repeat_masked_gscan2".
[2024-10-09 00:25:53.028] [guidescan2] [info] Successfully loaded genome index.
[2024-10-09 00:25:53.028] [guidescan2] [info] Loading kmers.
[2024-10-09 00:25:

Let's check the scored output:


In [11]:
!head 10K_guides_scoredgRNA/10K_guides_scoredgRNA.bed

#chr	start	stop	context	strand	sequence	RS3_score_Chen2013	specificity_osa1_r7.asm.repeat_masked_gscan2
Chr1	1002474	1002494	ATGGGACATGCACTGGTAACCGAGAGGCAC	+	GACATGCACTGGTAACCGAG	0.8011	1.0
Chr1	1003076	1003096	GTACATGTGGCGGCCCATTATGGAAGGTGC	+	ATGTGGCGGCCCATTATGGA	0.905	1.0
Chr1	1005117	1005137	GCAAAAGGTCAGGAGCAGGAGTACAGGATG	+	AAGGTCAGGAGCAGGAGTAC	1.0079	1.0
Chr1	1005161	1005181	TTACACGAGACAGAGCTATTCATAAGGGTA	+	ACGAGACAGAGCTATTCATA	0.7997	1.0
Chr1	1010347	1010367	GTAAGGAGTAGTGAGACCATGGGGAGGGAT	+	GGAGTAGTGAGACCATGGGG	0.7008	0.5403
Chr1	1010381	1010401	CAGCTGCAGCAGTGATGCATGAGAAGGTGA	+	TGCAGCAGTGATGCATGAGA	0.5768	0.7756
Chr1	1010397	1010417	GCATGAGAAGGTGAAGCTGTTCATAGGAGT	+	GAGAAGGTGAAGCTGTTCAT	0.6933	1.0
Chr1	1010837	1010857	ATTGTTCAGGATAACTGCAAACCAAGGTTT	+	TTCAGGATAACTGCAAACCA	0.5856	1.0
Chr1	1011086	1011106	ATGCAGGCTTGAGCAAGTAGACCTAGGCAA	+	AGGCTTGAGCAAGTAGACCT	1.0078	1.0
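
The scored file is tab-separated with a header row, so it can be read with the standard `csv` module. The sketch below filters three rows copied from the output above by the specificity column. The function name and the 0.75 cutoff are arbitrary choices for illustration, not CRISPRware defaults.

```python
import csv
import io

# Header and three rows copied from the scored output above
scored = (
    "#chr\tstart\tstop\tcontext\tstrand\tsequence\tRS3_score_Chen2013\t"
    "specificity_osa1_r7.asm.repeat_masked_gscan2\n"
    "Chr1\t1002474\t1002494\tATGGGACATGCACTGGTAACCGAGAGGCAC\t+\tGACATGCACTGGTAACCGAG\t0.8011\t1.0\n"
    "Chr1\t1010347\t1010367\tGTAAGGAGTAGTGAGACCATGGGGAGGGAT\t+\tGGAGTAGTGAGACCATGGGG\t0.7008\t0.5403\n"
    "Chr1\t1010381\t1010401\tCAGCTGCAGCAGTGATGCATGAGAAGGTGA\t+\tTGCAGCAGTGATGCATGAGA\t0.5768\t0.7756\n"
)

SPECIFICITY = "specificity_osa1_r7.asm.repeat_masked_gscan2"


def filter_by_specificity(handle, minimum):
    """Keep rows whose specificity score is at least `minimum`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row[SPECIFICITY]) >= minimum]


kept = filter_by_specificity(io.StringIO(scored), minimum=0.75)
print([row["sequence"] for row in kept])
# ['GACATGCACTGGTAACCGAG', 'TGCAGCAGTGATGCATGAGA']
```

To run this on the real file, pass `open("10K_guides_scoredgRNA/10K_guides_scoredgRNA.bed")` in place of the `io.StringIO` object.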


#### Finally, we can rank the guides by RS3_score_Chen2013



In [12]:
run_in_crisprware('\
                rank_guides \
                --scored_guides 10K_guides_scoredgRNA/10K_guides_scoredgRNA.bed \
                --targets osa1_r7.all_models/osa1_r7.all_models_longestCDS.gtf \
                --feature CDS \
                --ranking_columns RS3_score_Chen2013 \
                ')



	Column weights not set, setting weights to 1

	osa1_r7.all_models/osa1_r7.all_models_longestCDS.gtf is GTF format
	Processing: 	osa1_r7.all_models/osa1_r7.all_models_longestCDS.gtf 
	Feature: 	CDS 
	Percentile range: 	[0, 100]

	Initial gene count:	55986


	Prior to positional filtering: 

#	Median number of gRNAs per target: 0.0
#	Number of targets with 0 gRNA guides: 55682
#
#	Calculations exluding targets with 0 counts:
#
#		Median number of gRNAs per target: 2.0
#		Minimum number of gRNAs per target: 1
#		Maximum number of gRNAs per target: 20
#
#######################################################################

	Processing: 	osa1_r7.all_models/osa1_r7.all_models_longestCDS.gtf 
	Feature: 	CDS 
	Percentile range: 	[0, 100]


	Prior to filtering: 

#	Median number of gRNAs per target: 0.0
#	Number of targets with 0 gRNA guides: 55683
#
#	Calculations exluding targets with 0 counts:
#
#		Median number of gRNAs per target: 2.0
#		Minimum number of gRNAs per target: 1
#		Maximu

In [None]:
!head -20 10K_guides_rankedgRNA/10K_guides_rankedgRNA.bed

#chr	start	stop	strand	sequence	RS3_score_Chen2013	specificity_osa1_r7.asm.repeat_masked_gscan2	target_id	RS3_score_Chen2013_normalized	combined_weighted
Chr1	5514	5534	+	TGAGTAGTACCTCAGAGTAT	0.9508	1.0	LOC_Os01g01010	0.4949	0.4949
Chr1	10165	10185	+	GAGGAAGAAGTATATTTACA	0.7991	0.9561	LOC_Os01g01010	0.3283	0.3283
Chr1	5517	5537	+	GTAGTACCTCAGAGTATAGG	0.62	1.0	LOC_Os01g01010	0.1316	0.1316
Chr1	4398	4418	+	GATAATGATGGAAAGGTCAT	0.5564	1.0	LOC_Os01g01010	0.0617	0.0617
Chr1	12907	12927	+	GGGCGGAGTGAAGAAGCAGG	0.8379	0.8203	LOC_Os01g01030	0.3709	0.3709
Chr1	14167	14187	+	ATTATCAATGGCACCTACAA	0.7421	0.9512	LOC_Os01g01030	0.2657	0.2657
Chr1	13791	13811	+	TCAACCAAGCAAGATCAATC	0.7158	0.6629	LOC_Os01g01030	0.2368	0.2368
Chr1	12904	12924	+	GCTGGGCGGAGTGAAGAAGC	0.6611	1.0	LOC_Os01g01030	0.1767	0.1767
Chr1	24555	24575	+	GTAGTGAAGAGGAAAGATAC	1.2892	1.0	LOC_Os01g01050	0.8667	0.8667
Chr1	25999	26019	+	GGACAGCAGTGGCAAAACCT	0.9228	1.0	LOC_Os01g01050	0.4642	0.4642
Chr1	26164	26184	+	CAGGTGGCAAGGAAGATAGC	0.

Notice that the `combined_weighted` and `RS3_score_Chen2013_normalized` columns hold the same values because only one column was passed to `--ranking_columns`.
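
One way to see why the two columns coincide is to sketch a weighted combination of normalized scores. The weighted-average formula below is an assumption made for illustration, not a description of how `rank_guides` is implemented, and the function name is hypothetical. With a single column and the default weight of 1 reported in the log, any such scheme returns the normalized score unchanged.

```python
def combine_weighted(normalized_scores, weights=None):
    """Weighted average of already-normalized score columns for one guide.

    ASSUMPTION: a weighted average is used here purely as an illustration.
    """
    if weights is None:
        # mirror the log message: unset weights default to 1
        weights = {column: 1.0 for column in normalized_scores}
    total = sum(weights[column] for column in normalized_scores)
    return sum(normalized_scores[c] * weights[c] for c in normalized_scores) / total


# Normalized value taken from the first ranked guide above
row = {"RS3_score_Chen2013_normalized": 0.4949}
print(combine_weighted(row))  # 0.4949
```

Passing more than one column, with or without explicit weights, is the case where the combined value would differ from any single normalized score.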