# **Installation and Setup**

---

Run these cells to set up the CRISPRware environment in Colab.



In [1]:
!mkdir -p /root/.mamba/pkgs
!chmod -R 777 /root/.mamba
!wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba

bin/micromamba


In [2]:
!git clone https://github.com/ericmalekos/crisprware crisprware
%cd crisprware

Cloning into 'crisprware'...
remote: Enumerating objects: 590, done.
remote: Counting objects: 100% (117/117), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 590 (delta 74), reused 75 (delta 35), pack-reused 473 (from 1)
Receiving objects: 100% (590/590), 107.75 MiB | 15.06 MiB/s, done.
Resolving deltas: 100% (275/275), done.
Updating files: 100% (121/121), done.
/content/crisprware


In [3]:
!/content/bin/micromamba env create -f environment.yml -n crisprware --root-prefix /content/micromamba --quiet -y
!/content/bin/micromamba run -n crisprware --root-prefix /content/micromamba pip install .

    Be aware that packages installed with 'pip' are managed independently from 'conda-forge' channel.
Processing /content/crisprware
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: crisprware
  Building wheel for crisprware (setup.py) ...

In [4]:
# A helper function to run commands in the crisprware environment
# Pass each CRISPRware module call to this function as a string
# This is only required in the Colab environment
def run_in_crisprware(command):
  !/content/bin/micromamba run -n crisprware --root-prefix /content/micromamba {command}
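The `!` shell escape only works inside IPython kernels such as Colab. If you run the tutorial commands from a plain Python script instead, a rough equivalent can be built on `subprocess`. This is a minimal sketch, assuming the same micromamba paths as the setup cells; `build_crisprware_command` and `run_in_crisprware_subprocess` are hypothetical helper names, not part of CRISPRware.

```python
import shlex
import subprocess

# Paths used by the Colab setup cells above
MICROMAMBA = "/content/bin/micromamba"
ROOT_PREFIX = "/content/micromamba"
ENV_NAME = "crisprware"


def build_crisprware_command(command: str) -> str:
    """Return the shell command line that runs `command` in the crisprware env."""
    prefix = [MICROMAMBA, "run", "-n", ENV_NAME, "--root-prefix", ROOT_PREFIX]
    # The fixed prefix is quoted; `command` is passed through unchanged so the
    # shell still expands globs such as chr19*/*quant.sf, as `!` does in Colab.
    return " ".join(shlex.quote(part) for part in prefix) + " " + command.strip()


def run_in_crisprware_subprocess(command: str) -> None:
    """Hypothetical plain-Python counterpart of run_in_crisprware()."""
    # shell=True is needed for glob expansion; check=True raises on failure.
    subprocess.run(build_crisprware_command(command), shell=True, check=True)
```

For example, `run_in_crisprware_subprocess('index_genome --fasta tests/test_data/ce11/chrIII_sequence.fasta')` would run the same command as the index example later in this tutorial. Because `shell=True` is used, only pass trusted command strings.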

# Tutorial

---



## Preprocess annotation
Use this module to reduce the complexity of genes with multiple isoforms.

There are four simplified gene model options:


*   longest
*   shortest
*   metagene
*   consensus

**longest** and **shortest** will take the isoform with the longest or shortest coding sequence (CDS).    
**metagene** creates a new gene model by taking the *union* of CDSs of all isoforms for a given gene.  
**consensus** creates a new gene model by taking the *intersection* of CDSs of all isoforms for a given gene. Unlike the other models, this can fail to produce a gene model when the isoforms share no common CDS sequence (for example, when two isoforms have non-overlapping CDSs). These genes are listed in a file ending in `genes_without_consensus_model.txt`.
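To make the four options concrete, here is a small interval-arithmetic sketch. It assumes each isoform's CDS is given as a list of half-open `(start, end)` intervals on a single chromosome and strand; the function names are illustrative, and this is not CRISPRware's implementation.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome/strand


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union: sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two sorted, non-overlapping interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def cds_length(cds: List[Interval]) -> int:
    return sum(end - start for start, end in cds)


def longest(isoforms: List[List[Interval]]) -> List[Interval]:
    return max(isoforms, key=cds_length)


def shortest(isoforms: List[List[Interval]]) -> List[Interval]:
    return min(isoforms, key=cds_length)


def metagene(isoforms: List[List[Interval]]) -> List[Interval]:
    return merge([iv for cds in isoforms for iv in cds])


def consensus(isoforms: List[List[Interval]]) -> Optional[List[Interval]]:
    shared = merge(isoforms[0])
    for cds in isoforms[1:]:
        shared = intersect(shared, merge(cds))
    # None plays the role of a gene listed in genes_without_consensus_model.txt
    return shared or None
```

For two isoforms with CDSs `[(100, 200), (300, 400)]` and `[(150, 250), (350, 450)]`, `metagene` returns `[(100, 250), (300, 450)]` and `consensus` returns `[(150, 200), (350, 400)]`; if the CDSs do not overlap at all, `consensus` returns `None`.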

In [5]:
run_in_crisprware(\
                  'preprocess_annotation --gtf ./tests/test_data/ce11/chrIII_ce11.ncbiRefSeq.gtf \
                  --model metagene consensus longest shortest'\
                  )




	Saving longest CDS GTF to: /content/crisprware/chrIII_ce11.ncbiRefSeq/chrIII_ce11.ncbiRefSeq_longestCDS.gtf
	Saving shortest CDS GTF to: /content/crisprware/chrIII_ce11.ncbiRefSeq/chrIII_ce11.ncbiRefSeq_shortestCDS.gtf
	Saving metagene GTF to: /content/crisprware/chrIII_ce11.ncbiRefSeq/chrIII_ce11.ncbiRefSeq_meta.gtf
	Saving consensus GTF to: /content/crisprware/chrIII_ce11.ncbiRefSeq/chrIII_ce11.ncbiRefSeq_consensus.gtf

	A CONSENSUS MODEL COULD NOT BE GENERATED FOR 22 GENES
	If this number is large, consider filtering by TPM expression more strictly or using a more conservative GTF.
	If this number is small, consider manually removing problematic transcripts from the quantification TSVs and rerunning this module.
	Saving genes for which there is no consensus model to:	/content/crisprware/chrIII_ce11.ncbiRefSeq/chrIII_ce11.ncbiRefSeq_genes_without_consensus_model.txt




This generates all four gene model GTFs for use:

In [6]:
%cd chrIII_ce11.ncbiRefSeq
!ls *.gtf

/content/crisprware/chrIII_ce11.ncbiRefSeq
chrIII_ce11.ncbiRefSeq_consensus.gtf   chrIII_ce11.ncbiRefSeq_meta.gtf
chrIII_ce11.ncbiRefSeq_longestCDS.gtf  chrIII_ce11.ncbiRefSeq_shortestCDS.gtf


We can check which genes did not have consensus models:

In [7]:
!head -5 chrIII_ce11.ncbiRefSeq_genes_without_consensus_model.txt

nono-1
let-805
F25B5.3
hecw-1
pqn-41


To use the consensus models downstream while falling back to the metagene model for genes without a consensus, we can combine the two GTFs:

In [8]:
!grep -Ff chrIII_ce11.ncbiRefSeq_genes_without_consensus_model.txt chrIII_ce11.ncbiRefSeq_meta.gtf | \
cat chrIII_ce11.ncbiRefSeq_consensus.gtf - > consensus_metagene.gtf
%cd ..

/content/crisprware
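`grep -F` matches each listed name as a substring anywhere on a line, so a name such as `nono-1` would also pull in records for a hypothetical gene `nono-10`. A stricter alternative compares the `gene_id` attribute exactly. This sketch assumes the GTFs use the standard `gene_id "...";` attribute syntax; the function names are hypothetical.

```python
import re
from typing import Iterable, List, Set

# gene_id at the start of GTF column 9 or after a ';' separator
GENE_ID = re.compile(r'(?:^|;)\s*gene_id "([^"]+)"')


def gene_id_of(gtf_line: str) -> str:
    """Return the gene_id of a GTF record, or "" for headers/malformed lines."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return ""
    match = GENE_ID.search(fields[8])
    return match.group(1) if match else ""


def consensus_with_metagene_fallback(consensus_lines: Iterable[str],
                                     meta_lines: Iterable[str],
                                     missing: Set[str]) -> List[str]:
    """All consensus records, plus metagene records for genes lacking a consensus."""
    merged = list(consensus_lines)
    merged.extend(line for line in meta_lines if gene_id_of(line) in missing)
    return merged


def write_consensus_metagene(prefix: str, out_path: str) -> None:
    """File-level wrapper, e.g. prefix='chrIII_ce11.ncbiRefSeq/chrIII_ce11.ncbiRefSeq'."""
    with open(f"{prefix}_genes_without_consensus_model.txt") as fh:
        missing = {line.strip() for line in fh if line.strip()}
    with open(f"{prefix}_consensus.gtf") as cons, open(f"{prefix}_meta.gtf") as meta:
        lines = consensus_with_metagene_fallback(cons, meta, missing)
    with open(out_path, "w") as out:
        out.writelines(lines)
```

Calling `write_consensus_metagene('chrIII_ce11.ncbiRefSeq/chrIII_ce11.ncbiRefSeq', 'chrIII_ce11.ncbiRefSeq/consensus_metagene.gtf')` from `/content/crisprware` would produce a file with the same intent as the `grep` pipeline above.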


### Processed RNASeq with preprocess_annotation

To further reduce complexity and restrict gRNA selection to expressed transcripts, we can use transcripts per million (TPM) values from the following tools:

*   Salmon
*   Kallisto
*   FLAIR
*   Mandalorion

With default parameters, all isoforms with TPM > 0 are retained. Use `--top_n <int>` to keep only the `<int>` most highly expressed isoforms per gene. By default, the command infers the file type from the header line of the first input file, so all input files should come from the same tool.

This command uses Salmon `quant.sf` files from three replicates to calculate the `min`, `max`, `median`, and `mean` TPM of each isoform. It then keeps the two most highly expressed isoforms per gene (`--top_n 2`) according to median expression (`--top_n_column median`) and builds a metagene model (`--model metagene`) from those two isoforms. The order of the arguments doesn't matter; the steps always execute in the order stated here.

This example uses the ncbiRefSeq GTF subset to chromosome 19 of Mm39.
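The summary statistics and the `--top_n` selection can be sketched as follows. This is an illustration with hypothetical function names, not CRISPRware's code; in particular, applying the TPM > 0 threshold to the ranking column is an assumption.

```python
import statistics
from collections import defaultdict
from typing import Dict, List


def summarize_tpm(tpm_by_tx: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """min, max, median and mean TPM across replicates for each transcript."""
    return {
        tx: {
            "min": min(values),
            "max": max(values),
            "median": statistics.median(values),
            "mean": statistics.fmean(values),
        }
        for tx, values in tpm_by_tx.items()
    }


def top_n_per_gene(stats: Dict[str, Dict[str, float]], tx2gene: Dict[str, str],
                   n: int = 2, column: str = "median") -> Dict[str, List[str]]:
    """Keep transcripts with `column` > 0, then the n highest by `column` per gene."""
    by_gene: Dict[str, List[tuple]] = defaultdict(list)
    for tx, s in stats.items():
        if s[column] > 0:  # assumption: TPM > 0 threshold applied to this column
            by_gene[tx2gene[tx]].append((s[column], tx))
    return {gene: [tx for _, tx in sorted(pairs, reverse=True)[:n]]
            for gene, pairs in by_gene.items()}
```

For example, `summarize_tpm({'NM_025423.2': [28.109, 28.358, 29.568]})` gives a median of 28.358 and a mean of about 28.678. With three replicates, the min, median, and max are the replicate values themselves, and these numbers match the `NM_025423.2` row of the filtered TSV printed later in this section.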

In [9]:
run_in_crisprware('\
                  preprocess_annotation --tpm_files ./tests/test_data/processed_rna_seq/salmon/chr19*/*quant.sf \
                  --gtf ./tests/test_data/chr19_ucsc_mm39.ncbiRefSeq.gtf --top_n 2 \
                  --top_n_column median --model metagene'\
                  )



	Processing isoform quantification files

	Removing transcripts below threshold

	Inferring file type from header line

		./tests/test_data/processed_rna_seq/salmon/chr19_ucscRefSeq.comprehensive.index_trimmed_WT_Rep1_SalmonQuant/chr19_quant.sf is a Salmon file

	Initial unique transcripts:			3920
	Transcripts after filtering by expression:	1091

	Generating transcript-gene relationships

	Saving transcript-gene relationships to:	/content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/tmp/tx2gene.tsv
	Retaining top 2 transcripts per gene
	Final unique genes:		566
	Final unique transcripts:	830
	Saving quantification file to:		/content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/tmp/filtered_chr19_ucsc_mm39.ncbiRefSeq.tsv
	Saving transcript filtered GTF to:	/content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_filtered.gtf
	Saving metagene GTF to: /content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_meta.gtf




The per-isoform TPM statistics are saved in this "filtered" TSV:



In [10]:
!head -5 /content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/tmp/filtered_chr19_ucsc_mm39.ncbiRefSeq.tsv | cut -f1,2,6,7,8,9

gene_id	transcript_id	tscript_min	tscript_max	tscript_median	tscript_mean
1110059E24Rik	NM_025423.2	28.109	29.568	28.358	28.678
1700018L02Rik	NR_028360.1	0.078	0.166	0.1	0.115
1700023D09Rik	NR_132423.1	0.302	0.745	0.703	0.583
1810009A15Rik	NM_025463.3	86.062	89.822	87.896	87.927


### Targeting gene TSS and/or TES

If you want to target gene transcription start sites (TSSs) or transcript end sites (TESs) with CRISPR, you can set a window to generate BED coordinate files for downstream use. The first value in `--tss_window <int 1> <int 2>` (and likewise `--tes_window`) is always the distance upstream of the TSS/TES, and the second is the distance downstream. Strand orientation is taken into account automatically.
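A strand-aware window can be sketched as below. The sketch assumes 0-based, half-open BED coordinates and counts the TSS/TES base itself on the downstream side; CRISPRware's exact off-by-one convention is not documented here, so treat the boundaries as an assumption. All function names are hypothetical.

```python
from typing import Tuple


def tss_position(tx_start: int, tx_end: int, strand: str) -> int:
    """0-based TSS base of a transcript spanning [tx_start, tx_end)."""
    return tx_start if strand == "+" else tx_end - 1


def tes_position(tx_start: int, tx_end: int, strand: str) -> int:
    """0-based last transcribed base of a transcript spanning [tx_start, tx_end)."""
    return tx_end - 1 if strand == "+" else tx_start


def site_window(site: int, strand: str, upstream: int, downstream: int) -> Tuple[int, int]:
    """BED interval with `upstream` bases 5' of `site` and `downstream` bases from `site` on.

    Assumption: the site base itself is counted on the downstream side.
    On the minus strand, "upstream" means higher genomic coordinates.
    """
    if strand == "+":
        start, end = site - upstream, site + downstream
    elif strand == "-":
        start, end = site - downstream + 1, site + upstream + 1
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return max(0, start), end  # clip at the chromosome start
```

With `--tss_window 200 150`, a plus-strand transcript starting at position 1000 would get the interval `(800, 1150)`; on the minus strand, the same window extends 200 bp toward higher coordinates instead.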

In [11]:
run_in_crisprware('\
                  preprocess_annotation --tpm_files ./tests/test_data/processed_rna_seq/salmon/chr19*/*quant.sf \
                  --gtf ./tests/test_data/chr19_ucsc_mm39.ncbiRefSeq.gtf --top_n 10 \
                  --tss_window 200 150 --tes_window 100 150' \
                  )



	Processing isoform quantification files

	Removing transcripts below threshold

	Inferring file type from header line

		./tests/test_data/processed_rna_seq/salmon/chr19_ucscRefSeq.comprehensive.index_trimmed_WT_Rep1_SalmonQuant/chr19_quant.sf is a Salmon file

	Initial unique transcripts:			3920
	Transcripts after filtering by expression:	1091

	Generating transcript-gene relationships

	Saving transcript-gene relationships to:	/content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/tmp/tx2gene.tsv
	Retaining top 10 transcripts per gene
	Final unique genes:		566
	Final unique transcripts:	1091
	Saving quantification file to:		/content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/tmp/filtered_chr19_ucsc_mm39.ncbiRefSeq.tsv
	Saving transcript filtered GTF to:	/content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_filtered.gtf

	Saving TSS:	/content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/TSS_filtered.bed
	Saving TES:	/content/crisprware/chr19_ucsc_mm39.ncbiRefSeq/TES_filtered.bed




In this run, TSS and TES BED files are produced for the filtered transcripts; when a gene model such as `--model metagene` is also requested, BED files are produced for that model as well. Keep in mind that the metagene model TSS is the most upstream TSS among the isoforms remaining after filtering, and consider whether this TSS is the best target. For expression knockdown, it may be better to target the filtered isoforms directly, since transcription from each of their TSSs is supported by the expression data.
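The difference between the metagene TSS and the TSSs of the filtered isoforms can be illustrated with a short sketch (hypothetical function names; 0-based half-open coordinates assumed). The metagene TSS is the lowest start on the plus strand and the highest end on the minus strand, while targeting every filtered isoform covers each distinct TSS.

```python
from typing import List, Tuple

Transcript = Tuple[int, int]  # 0-based half-open (start, end) on one strand


def tss(tx: Transcript, strand: str) -> int:
    start, end = tx
    return start if strand == "+" else end - 1


def metagene_tss(transcripts: List[Transcript], strand: str) -> int:
    """Most upstream TSS among the isoforms: lowest coordinate on '+', highest on '-'."""
    sites = [tss(tx, strand) for tx in transcripts]
    return min(sites) if strand == "+" else max(sites)


def isoform_tss_sites(transcripts: List[Transcript], strand: str) -> List[int]:
    """Distinct TSSs covered when every filtered isoform is targeted."""
    return sorted({tss(tx, strand) for tx in transcripts})
```

If the most upstream TSS belongs to a weakly expressed isoform, a window around `metagene_tss` alone may miss the promoter used by the more highly expressed isoforms.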

## Index genome

Use this module to build the Guidescan2 off-target index. We can create a small index using only chromosome III from the ce11 genome.

In [12]:
run_in_crisprware('index_genome --fasta tests/test_data/ce11/chrIII_sequence.fasta')

Attempting to read raw sequence file (if constructed)...
No raw sequence file "tests/test_data/ce11/chrIII_sequence.fasta.forward.dna". Building now...
No raw sequence file "tests/test_data/ce11/chrIII_sequence.fasta.reverse.dna". Building now...
[2024-10-09 00:30:11.858] [guidescan2] [info] Constructing genomic index.
[2024-10-09 00:30:11.875] [guidescan2] [info] Constructing forward genomic index.
[2024-10-09 00:30:14.862] [guidescan2] [info] Constructing reverse genomic index.
[2024-10-09 00:30:17.857] [guidescan2] [info] Index construction complete.
	Removing file: tests/test_data/ce11/chrIII_sequence.fasta.forward.dna
	Removing file: tests/test_data/ce11/chrIII_sequence.fasta.reverse.dna


In [13]:
!ls ./chrIII_sequence_gscan2

chrIII_sequence_gscan2.forward	chrIII_sequence_gscan2.gs  chrIII_sequence_gscan2.reverse


For larger genomes, building the index requires substantial RAM, which may be limiting on some computers. Precompiled indices for some species are available on the [Guidescan2 website](https://guidescan.com/downloads) and can be downloaded, e.g.  
`wget https://guidescan.com/indices/ce11.zip && unzip ce11.zip`

### Active genome indexing

Creating an index of the "active genome" is simple: it limits the off-target search space to predefined portions of the genome, which increases the relative number of gRNAs considered uniquely targeting. The genome is subset using BED and/or GTF files. Any number of "active genomes" can be generated and used in later ranking steps. For example, we will now create two indices: one from the coding sequences of expressed genes, and the other from the entire gene body of our preprocessed metagene annotations:
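The log of the command below mentions a merged interval BED and a subset FASTA, which suggests the active genome is built by merging the kept intervals and extracting their sequences before indexing. A minimal sketch of that idea, assuming 0-based half-open BED coordinates; the function names and the record-naming scheme are hypothetical:

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def merge_regions(regions: List[Region]) -> List[Region]:
    """Sort regions and merge any that overlap or touch on the same chromosome."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def subset_genome(genome: Dict[str, str], regions: List[Region]) -> Dict[str, str]:
    """One sequence per merged region, keyed like 'chr19:100-200' (hypothetical naming)."""
    return {f"{c}:{s}-{e}": genome[c][s:e] for c, s, e in merge_regions(regions)}
```

Because only the merged regions are indexed, an off-target site outside them is never seen during scoring, which is why the same gRNA can look more specific against an active-genome index.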

In [14]:
run_in_crisprware('index_genome -f tests/test_data/Mm39_chr19/chr19_GRCm39.primary_assembly.genome.fa.gz \
                  --locations_to_keep ./chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_filtered.gtf \
                  --feature CDS')
# rename CDS files
%cd chr19_GRCm39.primary_assembly.genome_gscan2/
!for file in chr19_GRCm39.primary_assembly.genome_*; do newfile=$(echo "$file" | sed 's/chr19_GRCm39\.primary_assembly\.genome/CDS/'); mv $file $newfile; done
%cd ..

run_in_crisprware('index_genome -f tests/test_data/Mm39_chr19/chr19_GRCm39.primary_assembly.genome.fa.gz \
                  --locations_to_keep ./chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_meta.gtf \
                  --feature transcript')
# rename metagene files
%cd chr19_GRCm39.primary_assembly.genome_gscan2/
!for file in chr19_GRCm39.primary_assembly.genome_*; do newfile=$(echo "$file" | sed 's/chr19_GRCm39\.primary_assembly\.genome/meta/'); mv $file $newfile; done
%cd ..

	Unzipping tests/test_data/Mm39_chr19/chr19_GRCm39.primary_assembly.genome.fa.gz
	Unzipped file saved as tests/test_data/Mm39_chr19/chr19_GRCm39.primary_assembly.genome.fa

	Saving merged interval bed to /content/crisprware/chr19_GRCm39.primary_assembly.genome_gscan2/chr19_GRCm39.primary_assembly.genome_gscan2_merged.bed

	Saving subset fasta to /content/crisprware/chr19_GRCm39.primary_assembly.genome_gscan2/chr19_GRCm39.primary_assembly.genome_gscan2_subset.fasta
	Building Index from /content/crisprware/chr19_GRCm39.primary_assembly.genome_gscan2/chr19_GRCm39.primary_assembly.genome_gscan2_subset.fasta
	Saving Index to /content/crisprware/chr19_GRCm39.primary_assembly.genome_gscan2/chr19_GRCm39.primary_assembly.genome_gscan2
Attempting to read raw sequence file (if constructed)...
No raw sequence file "/content/crisprware/chr19_GRCm39.primary_assembly.genome_gscan2/chr19_GRCm39.primary_assembly.genome_gscan2_subset.fasta.forward.dna". Building now...
No raw sequence file "/content/cri

Now we have two off-target indices: one built from CDS entries, the other from metagene transcripts (exons and introns of all metagene models). We will use these in the Score gRNAs section.

In [15]:
%cd chr19_GRCm39.primary_assembly.genome_gscan2/
!ls
%cd ..

/content/crisprware/chr19_GRCm39.primary_assembly.genome_gscan2
CDS_gscan2.forward  CDS_gscan2_merged.bed  meta_gscan2.forward	meta_gscan2_merged.bed
CDS_gscan2.gs	    CDS_gscan2.reverse	   meta_gscan2.gs	meta_gscan2.reverse
/content/crisprware


## Generate gRNAs

Continuing with mouse chromosome 19 and its annotation, let's find all NGG gRNAs in coding sequences:

In [16]:
run_in_crisprware('\
                  generate_guides --fasta ./tests/test_data/Mm39_chr19/chr19_GRCm39.primary_assembly.genome.fa.gz \
                  --locations_to_keep ./tests/test_data/chr19_ucsc_mm39.ncbiRefSeq.gtf \
                  --feature CDS \
                  ')

	Unzipping ./tests/test_data/Mm39_chr19/chr19_GRCm39.primary_assembly.genome.fa.gz
	Unzipped file saved as ./tests/test_data/Mm39_chr19/chr19_GRCm39.primary_assembly.genome.fa

	Chromosomes for which to find targets:	chr19
	Processing chr19

	Saved output file to /content/crisprware/chr19_GRCm39.primary_assembly.genome_gRNA/chr19_GRCm39.primary_assembly.genome_gRNA.bed

	Removing file: ./tests/test_data/Mm39_chr19/chr19_GRCm39.primary_assembly.genome.fa


This finds all NGG protospacers and writes them to a BED/TSV file. The fourth column is the input for Guidescan2 scoring. By default, the "context" column is a 30-nt sequence (4 nt upstream of the protospacer, the 20-nt protospacer, the 3-nt PAM, and 3 nt downstream), formatted for Rule Set 3 scoring.
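The layout of the context column can be checked against the first row printed by the next cell. This sketch splits the 30-mer and names the fields of the fourth column using the header labels; the function names are hypothetical, and the split has only been checked here against plus-strand rows.

```python
from typing import Dict

GUIDESCAN_FIELDS = ["id", "sequence", "pam", "chromosome", "position", "sense"]


def split_context(context: str) -> Dict[str, str]:
    """Split a 30-nt context into 5' flank, protospacer, PAM and 3' flank."""
    if len(context) != 30:
        raise ValueError(f"expected 30 nt, got {len(context)}")
    return {
        "upstream": context[:4],
        "protospacer": context[4:24],
        "pam": context[24:27],
        "downstream": context[27:],
    }


def parse_guidescan_field(field: str) -> Dict[str, str]:
    """Name the comma-separated values of the 4th column using the header labels."""
    values = field.split(",")
    if len(values) != len(GUIDESCAN_FIELDS):
        raise ValueError(f"expected {len(GUIDESCAN_FIELDS)} values, got {len(values)}")
    return dict(zip(GUIDESCAN_FIELDS, values))
```

For the first row, `split_context("AGCTGGGCCCTCTTGGCCGGGTCCAGGGCC")` yields the protospacer `GGGCCCTCTTGGCCGGGTCC`, which equals the `sequence` field of the fourth column, followed by the PAM `AGG`.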

In [17]:
!head -6 chr19_GRCm39.primary_assembly.genome_gRNA/chr19_GRCm39.primary_assembly.genome_gRNA.bed

#chr	start	stop	id,sequence,pam,chromosome,position,sense	context	strand
chr19	3311533	3311533	chr19:3311518:+,GGGCCCTCTTGGCCGGGTCC,NGG,chr19,3311518,+	AGCTGGGCCCTCTTGGCCGGGTCCAGGGCC	+
chr19	3311989	3311989	chr19:3311974:+,CTTCAGGCAGATGGTGGCTG,NGG,chr19,3311974,+	CTCACTTCAGGCAGATGGTGGCTGAGGCAG	+
chr19	3312065	3312065	chr19:3312050:+,CGAGCACTTGGAGAAGCTAC,NGG,chr19,3312050,+	TGGCCGAGCACTTGGAGAAGCTACAGGTGT	+
chr19	3312097	3312097	chr19:3312082:+,CCTTCACCACAGCAGACACC,NGG,chr19,3312082,+	TCAGCCTTCACCACAGCAGACACCAGGGCG	+
chr19	3312119	3312119	chr19:3312104:+,GGCGTCGAAGTCCTCCTCAC,NGG,chr19,3312104,+	CCAGGGCGTCGAAGTCCTCCTCACAGGGCA	+


## Score gRNAs

Now we can score the gRNAs. To save time, we take only the first 1,000 gRNAs in the file and score them with both Rule Set 3 tracrRNA options (`--tracr both`), which produces the `RS3_score_Hsu2013` and `RS3_score_Chen2013` columns. We then calculate specificity scores against the two Guidescan2 indices we created earlier.
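The `specificity_*` columns come from Guidescan2. As commonly described for Guidescan and Guidescan2, specificity is 1 / (1 + the sum of CFD scores over all off-target sites found), so 1.0 means no off-target site was found within the search limits. The sketch below encodes that definition as an assumption, with hypothetical function names; consult the Guidescan2 documentation for the exact mismatch and scoring settings.

```python
from typing import Iterable


def specificity(off_target_cfd: Iterable[float]) -> float:
    """1 / (1 + sum of CFD scores over off-target sites).

    Assumption: Guidescan-style definition; the perfect on-target match is
    represented by the leading 1 and is not part of the input.
    """
    return 1.0 / (1.0 + sum(off_target_cfd))


def implied_off_target_cfd(score: float) -> float:
    """Invert the formula: summed off-target CFD implied by a specificity score."""
    return 1.0 / score - 1.0
```

Under this definition, the score of 0.8922 in the results below corresponds to a summed off-target CFD of about 0.12.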

In [18]:
!head -1001 chr19_GRCm39.primary_assembly.genome_gRNA/chr19_GRCm39.primary_assembly.genome_gRNA.bed > 1k_chr19.bed
run_in_crisprware('\
                  score_guides --grna_bed 1k_chr19.bed \
                  --guidescan2_indices \
                  chr19_GRCm39.primary_assembly.genome_gscan2/CDS_gscan2 \
                  chr19_GRCm39.primary_assembly.genome_gscan2/meta_gscan2 \
                  --tracr both \
                  --threads 2 \
                  ')


	Before dropping duplicates:	1000
	After dropping duplicates:	930


	Beginning RS3 cleavage scoring
	If memory constrained reduce '--chunk_size'

Calculating sequence-based features
100% 930/930 [00:04<00:00, 232.04it/s]
Calculating sequence-based features
100% 930/930 [00:04<00:00, 210.02it/s]

	After dropping RS3 cleavage scores below -inf:	930


	Beginning Guidescan2 specificity scoring against chr19_GRCm39.primary_assembly.genome_gscan2/CDS_gscan2
	If memory constrained reduce '--chunk_size'

input:/content/crisprware/1k_chr19_scoredgRNA/tmp/CDS_gscan2Input.1.csv

	Saved Guidescan input file to /content/crisprware/1k_chr19_scoredgRNA/tmp/CDS_gscan2Input.1.csv

[2024-10-09 00:32:46.690] [guidescan2] [info] Loading genome index at "chr19_GRCm39.primary_assembly.genome_gscan2/CDS_gscan2".
[2024-10-09 00:32:46.697] [guidescan2] [info] Successfully loaded genome index.
[2024-10-09 00:32:46.697] [guidescan2] [info] Loading kmers.
[2024-10-09 00:32:46.697] [guides

We can check the score results:

In [19]:
!head -6 1k_chr19_scoredgRNA/1k_chr19_scoredgRNA.bed

#chr	start	stop	context	strand	sequence	RS3_score_Hsu2013	RS3_score_Chen2013	specificity_CDS_gscan2	specificity_meta_gscan2
chr19	3311533	3311533	AGCTGGGCCCTCTTGGCCGGGTCCAGGGCC	+	GGGCCCTCTTGGCCGGGTCC	-1.9767	-2.0397	1.0	1.0
chr19	3311989	3311989	CTCACTTCAGGCAGATGGTGGCTGAGGCAG	+	CTTCAGGCAGATGGTGGCTG	0.1902	-0.3443	1.0	1.0
chr19	3312065	3312065	TGGCCGAGCACTTGGAGAAGCTACAGGTGT	+	CGAGCACTTGGAGAAGCTAC	-0.2774	0.0063	1.0	1.0
chr19	3312097	3312097	TCAGCCTTCACCACAGCAGACACCAGGGCG	+	CCTTCACCACAGCAGACACC	0.41	0.3476	1.0	0.8922
chr19	3312119	3312119	CCAGGGCGTCGAAGTCCTCCTCACAGGGCA	+	GGCGTCGAAGTCCTCCTCAC	0.2062	0.4987	1.0	1.0


## Rank gRNAs

The final step is filtering and ranking gRNAs and matching them to their gene or transcript target.  
 - First, we filter out gRNAs that do not fall within the 5th-65th percentile of the coding sequence of a gene model in the metagene GTF: `--targets chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_meta.gtf --feature CDS --percentile_range 5 65`  
 - Then, we filter out any gRNAs with a score below 0.0 for either RS3 scoring method: `--filtering_columns RS3_score_Hsu2013 RS3_score_Chen2013 --minimum_values 0.0 0.0`.  
 - Finally, for each gene, we rank the remaining gRNAs by their specificity score against the metagene Guidescan2 index (`--ranking_columns specificity_meta_gscan2`) and select the top two (`--number_of_guides 2`).

In [20]:
run_in_crisprware('\
                rank_guides \
                --scored_guides 1k_chr19_scoredgRNA/1k_chr19_scoredgRNA.bed \
                --targets chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_meta.gtf \
                --feature CDS \
                --percentile_range 5 65 \
                --filtering_columns RS3_score_Hsu2013 RS3_score_Chen2013 \
                --minimum_values 0.0 0.0 \
                --ranking_columns specificity_meta_gscan2 \
                --number_of_guides 2 \
                --output_all \
                ')



	Column weights not set, setting weights to 1

	chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_meta.gtf is GTF format
	Processing: 	chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_meta.gtf 
	Feature: 	CDS 
	Percentile range: 	[0, 100]

	Initial gene count:	442


	Prior to positional filtering: 

#	Median number of gRNAs per target: 0.0
#	Number of targets with 0 gRNA guides: 424
#
#	Calculations exluding targets with 0 counts:
#
#		Median number of gRNAs per target: 33.5
#		Minimum number of gRNAs per target: 2
#		Maximum number of gRNAs per target: 130
#
#######################################################################

	Processing: 	chr19_ucsc_mm39.ncbiRefSeq/chr19_ucsc_mm39.ncbiRefSeq_meta.gtf 
	Feature: 	CDS 
	Percentile range: 	[5, 65]

	Number of CDS entries before processing: 4650

	Number of CDS entries after processing: 3039


	Prior to filtering: 

#	Median number of gRNAs per target: 0.0
#	Number of targets with 0 gRNA guides: 425
#
#	Calculations exludi
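The `--percentile_range 5 65` step reduces the CDS entries (4650 to 3039 in the log above) to the portions lying between the 5th and 65th percentile of each coding sequence. A sketch of one way to compute those portions, assuming percentiles are measured along the spliced CDS from its 5' end and rounded down; the rounding, the spliced-length choice, and the function name are assumptions, not CRISPRware's documented behavior:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based half-open genomic coordinates


def cds_percentile_intervals(exons: List[Interval], strand: str,
                             lo_pct: float, hi_pct: float) -> List[Interval]:
    """Genomic pieces of a spliced CDS lying between lo_pct and hi_pct of its length.

    Positions are measured from the 5' end of the CDS, so on the minus strand
    the exons are walked from the highest to the lowest coordinate.
    """
    total = sum(end - start for start, end in exons)
    lo = int(total * lo_pct / 100)
    hi = int(total * hi_pct / 100)
    ordered = sorted(exons, reverse=(strand == "-"))
    pieces: List[Interval] = []
    offset = 0  # spliced CDS bases already walked
    for start, end in ordered:
        length = end - start
        a, b = max(lo, offset), min(hi, offset + length)  # overlap in spliced coords
        if a < b:
            if strand == "+":
                pieces.append((start + (a - offset), start + (b - offset)))
            else:
                pieces.append((end - (b - offset), end - (a - offset)))
        offset += length
    return sorted(pieces)
```

For a 200-nt CDS split across exons `(0, 100)` and `(200, 300)`, the 5th-65th percentile on the plus strand is `[(10, 100), (200, 230)]`; on the minus strand it is `[(70, 100), (200, 290)]`. A gRNA would then be kept only if it falls within one of these pieces.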

Any score column, or any combination of score columns, can be used for filtering and ranking.  
The final output:

In [21]:
!head -5 1k_chr19_rankedgRNA/1k_chr19_rankedgRNA.bed

#chr	start	stop	strand	sequence	RS3_score_Hsu2013	RS3_score_Chen2013	specificity_CDS_gscan2	specificity_meta_gscan2	target_id	specificity_meta_gscan2_normalized	combined_weighted
chr19	3766551	3766551	+	TGCTGAGGGACACAGTTCCA	0.2244	0.3005	1.0	1.0	1810055G02Rik	1.0	1.0
chr19	3766705	3766705	+	GAGCACTCCAACAACCAGAG	0.9921	0.4434	1.0	1.0	1810055G02Rik	1.0	1.0
chr19	3968702	3968702	+	TCTGGTTGATGATTCTGCCC	0.5537	0.3106	1.0	1.0	Aldh3b1	1.0	1.0
chr19	3968756	3968756	+	AACGTGTGATGGCATTCTGT	0.5134	0.7794	1.0	1.0	Aldh3b1	1.0	1.0
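The `specificity_meta_gscan2_normalized` and `combined_weighted` columns, together with the log line "Column weights not set, setting weights to 1", indicate that ranking columns are rescaled and combined using per-column weights. The sketch below assumes min-max normalization and a weighted average; `rank_guides_sketch` is a hypothetical function, and the formulas `rank_guides` actually uses may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def min_max(values: List[float]) -> List[float]:
    """Rescale to [0, 1]; all-equal values map to 1.0 (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_guides_sketch(guides: List[Dict], filter_cols: Sequence[str],
                       min_values: Sequence[float], rank_cols: Sequence[str],
                       weights: Sequence[float] = (), n: int = 2) -> Dict[str, List[Dict]]:
    """Filter by minimum scores, combine normalized ranking columns, keep top n per target."""
    weights = list(weights) or [1.0] * len(rank_cols)  # default weight of 1 per column
    kept = [dict(g) for g in guides
            if all(g[col] >= m for col, m in zip(filter_cols, min_values))]
    if not kept:
        return {}
    for col in rank_cols:
        normalized = min_max([x[col] for x in kept])
        for g, norm in zip(kept, normalized):
            g[f"{col}_normalized"] = norm
    for g in kept:
        g["combined_weighted"] = sum(
            w * g[f"{col}_normalized"] for col, w in zip(rank_cols, weights)
        ) / sum(weights)
    by_target: Dict[str, List[Dict]] = defaultdict(list)
    for g in kept:
        by_target[g["target_id"]].append(g)
    return {t: sorted(gs, key=lambda g: g["combined_weighted"], reverse=True)[:n]
            for t, gs in by_target.items()}
```

With a single ranking column and weight 1, `combined_weighted` equals the normalized score, which is consistent with the identical values in the last two columns above.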
