In [3]:
import guido

# Create guido genome
genome = guido.Genome(genome_name='AgamP4_VB', 
                      genome_file_abspath='/Users/nkranjc/imperial/guide_tool/guido/tests/data/AgamP4.fa',
                      annotation_file_abspath='/Users/nkranjc/imperial/guide_tool/guido/tests/data/AgamP4.12.gtf')


In [3]:
genome.build(bowtie_path='/Users/nkranjc/imperial/guide_tool/guido/bin/bowtie/')

Indexing genome annotation.
Building Bowtie index
Done: /Users/nkranjc/imperial/guide_tool/guido/tests/data/AgamP4_VB
AgamP4_VB genome data can now be used by Guido: /Users/nkranjc/imperial/guide_tool/guido/tests/data/AgamP4_VB.guido


The Guido genome file is created in the same directory as the referenced genome FASTA file.
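Judging from the build output above, the `.guido` file appears to be named after `genome_name` and placed next to the FASTA file. A minimal sketch of that path logic, using only `pathlib`; the helper `expected_guido_path` is hypothetical and not part of Guido, and the example paths are placeholders:

```python
from pathlib import Path


def expected_guido_path(genome_file_abspath, genome_name):
    """Hypothetical helper: guess where the .guido file ends up after a build."""
    # Assumption: same directory as the FASTA file, named after the genome.
    return Path(genome_file_abspath).parent / f"{genome_name}.guido"


# Placeholder paths, for illustration only.
print(expected_guido_path("/data/ref/AgamP4.fa", "AgamP4_VB"))
```

This can be handy for checking whether a genome has already been built before calling `build()` again.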

In [4]:
genome.__dict__

{'genome_name': 'AgamP4_VB',
 '_bowtie_ignore': False,
 'bowtie_index': None,
 'genome_file_abspath': PosixPath('/Users/nkranjc/imperial/guide_tool/guido/tests/data/AgamP4.fa'),
 'annotation_file_abspath': PosixPath('/Users/nkranjc/imperial/guide_tool/guido/tests/data/AgamP4.12.gtf')}

`genome` can now be used to search for gRNAs. It can also be reused later without rebuilding the genome by calling the `load_genome_from_file()` function.

In [5]:
import guido
genome = guido.load_genome_from_file(guido_file='/Users/nkranjc/imperial/ref/new/AgamP4.guido')

To search for gRNAs, a locus must first be defined, either by chromosomal coordinates or by gene name.

In [6]:
l = guido.locus_from_coordinates(genome, 'AgamP4_2R', 48714541, 48714666)
l.find_guides()

[gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|),
 gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|),
 gRNA-3(AGTTTATCATCCACTCTGACGGG|AgamP4_2R:48714551-48714573|+|),
 gRNA-4(TTATCATCCACTCTGACGGGTGG|AgamP4_2R:48714554-48714576|+|),
 gRNA-5(TCTGAACATGTTTGATGGCGTGG|AgamP4_2R:48714589-48714611|-|),
 gRNA-6(CATAATCTGAACATGTTTGATGG|AgamP4_2R:48714594-48714616|-|),
 gRNA-7(GTTTAACACAGGTCAAGCGGTGG|AgamP4_2R:48714637-48714659|-|),
 gRNA-8(TATGTTTAACACAGGTCAAGCGG|AgamP4_2R:48714640-48714662|-|)]
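To make the output above easier to interpret, here is a minimal sketch of how a search for Cas9 targets (20-nt protospacer followed by an NGG PAM) on both strands could look. It is not Guido's implementation: the function names are hypothetical, and it assumes 1-based inclusive coordinates and that reverse-strand guides are reported as their reverse complement. The test fragment is assembled from the overlapping gRNA-1 and gRNA-2 sequences shown above.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_guides(seq, offset=1):
    """Return (guide_with_pam, start, end, strand) tuples.

    Coordinates are inclusive; `offset` is the coordinate of the first base.
    """
    seq = seq.upper()
    guides = []
    # Forward strand: 20-nt protospacer + NGG. The lookahead allows overlaps.
    for m in re.finditer(r"(?=([ACGT]{21}GG))", seq):
        start = m.start() + offset
        guides.append((m.group(1), start, start + 22, "+"))
    # Reverse strand: the same site reads as CCN + 20 nt on the forward strand.
    for m in re.finditer(r"(?=(CC[ACGT]{21}))", seq):
        start = m.start() + offset
        guides.append((reverse_complement(m.group(1)), start, start + 22, "-"))
    return guides


# 34-nt fragment spanning AgamP4_2R:48714550-48714583 (from gRNA-1 and gRNA-2).
fragment = "AAGTTTATCATCCACTCTGACGGGTGGTATTGCG"
for guide in find_ngg_guides(fragment, offset=48714550):
    print(guide)
```

On this fragment the sketch recovers the sequences and coordinates listed above for gRNA-1 to gRNA-4. Guido may apply additional constraints, such as a minimum flanking length, that are not modelled here.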

In [7]:
l.find_guides(selected_features='exon')

[gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|),
 gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|),
 gRNA-3(AGTTTATCATCCACTCTGACGGG|AgamP4_2R:48714551-48714573|+|),
 gRNA-4(TTATCATCCACTCTGACGGGTGG|AgamP4_2R:48714554-48714576|+|),
 gRNA-5(TCTGAACATGTTTGATGGCGTGG|AgamP4_2R:48714589-48714611|-|),
 gRNA-6(CATAATCTGAACATGTTTGATGG|AgamP4_2R:48714594-48714616|-|)]
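With `selected_features='exon'`, gRNA-7 and gRNA-8 are no longer reported. One plausible way to think about this is as an interval-overlap filter against annotated features. The sketch below assumes exactly that; the exon interval is made up for illustration, and Guido's actual criterion (for example, whether the cut site rather than the whole guide must fall in the feature) is not confirmed by this notebook.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def filter_by_features(guides, features):
    """Keep (name, start, end) guides that overlap any (start, end) feature."""
    return [
        g for g in guides
        if any(overlaps(g[1], g[2], f_start, f_end) for f_start, f_end in features)
    ]


guides = [("gRNA-6", 48714594, 48714616), ("gRNA-7", 48714637, 48714659)]
exons = [(48714500, 48714620)]  # hypothetical exon, for illustration only
print(filter_by_features(guides, exons))
```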

In [None]:
l.find_guides()

To define a locus by gene name, use the `guido.locus_from_gene()` function.

In [9]:
l = guido.locus_from_gene(genome, 'AGAP005958')

l.sequence

>AgamP4_2L:24049834-24051517
TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGAGCCACGCGTGGGCACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCCCCGATGAGAGATGCTTAGTAACAATCATCGTCACACTCCTTGCAAACGTACTTCAACCGGGACGAGGCGCCACCAACGCCGCACGTGTTGTTGAGAAAATCGCTCAAGTTCGACGACAGCGACAGTAGATGCTCGTTCTTGCCCACGACCACGTGCGTCGACAGCCGGCACTCGTCCGCGTTCTGAATCAGCTCAAAGTTACACTCCCTGTACGGCTGATCGCTGTCCCAGAGGTACAGCTCGGCCTCGCCACGGTACCGCAGCACCACCCCCTTGCCACCGTCCGTGCCGAGCAGGACGGAGTGTTTGCGCTTGGTGCCGAGCTCGACGATCGCGCTCGAACCCTTGCCGCTGCGCAGATGTTCCGACTTGATCGAGTAAACCTTCTGGCCGCACAGATAGCTGAAGATCACCAGGCTCTGGCCCTTCGGTTTCGGGATGAGCAGCAGGTAGAGCACGTCCGAATCGCCGCAGCCAGCCGAGATGGCACGCGGTAGCACCACGCGGAAGCTCTTGTTGTTGTTCACGTCCAGCACAATGATTGCACCATCGCCATCGGAAATGTAGCTGCAAGTTGGGAGATCGTTAGCATCACGCACGTGATTTTACTTTGCATGCTTTACCCAACACAATACTTACACAAACGGATGGCCGAGCTCGTTGTAATCAGTGACGAGATACTGGAGCCGCGAGCTGGATTTCACAATCTCCGACAGGTCGATCGTCTTGACCGTCTTGTCGTTGCTCAAGTTGAAGGCATACACCTTCGGCGGGCAGCGCTTGATCGGTTGCTCCAGGAAGTTGGTGATGCCCGAGTCCAGCACCCACGCGATGCGCTGCAAGGA

In [10]:
l.sequence.seq

'TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGAGCCACGCGTGGGCACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCCCCGATGAGAGATGCTTAGTAACAATCATCGTCACACTCCTTGCAAACGTACTTCAACCGGGACGAGGCGCCACCAACGCCGCACGTGTTGTTGAGAAAATCGCTCAAGTTCGACGACAGCGACAGTAGATGCTCGTTCTTGCCCACGACCACGTGCGTCGACAGCCGGCACTCGTCCGCGTTCTGAATCAGCTCAAAGTTACACTCCCTGTACGGCTGATCGCTGTCCCAGAGGTACAGCTCGGCCTCGCCACGGTACCGCAGCACCACCCCCTTGCCACCGTCCGTGCCGAGCAGGACGGAGTGTTTGCGCTTGGTGCCGAGCTCGACGATCGCGCTCGAACCCTTGCCGCTGCGCAGATGTTCCGACTTGATCGAGTAAACCTTCTGGCCGCACAGATAGCTGAAGATCACCAGGCTCTGGCCCTTCGGTTTCGGGATGAGCAGCAGGTAGAGCACGTCCGAATCGCCGCAGCCAGCCGAGATGGCACGCGGTAGCACCACGCGGAAGCTCTTGTTGTTGTTCACGTCCAGCACAATGATTGCACCATCGCCATCGGAAATGTAGCTGCAAGTTGGGAGATCGTTAGCATCACGCACGTGATTTTACTTTGCATGCTTTACCCAACACAATACTTACACAAACGGATGGCCGAGCTCGTTGTAATCAGTGACGAGATACTGGAGCCGCGAGCTGGATTTCACAATCTCCGACAGGTCGATCGTCTTGACCGTCTTGTCGTTGCTCAAGTTGAAGGCATACACCTTCGGCGGGCAGCGCTTGATCGGTTGCTCCAGGAAGTTGGTGATGCCCGAGTCCAGCACCCACGCGATGCGCTGCAAGGAAAGAGACGACACGTTAGCATGTTTTCAT

Guido can simulate end-joining and predict microhomology-mediated end joining (MMEJ) deletion profiles. These can be used when planning a CRISPR-Cas9 experiment to avoid either in-frame or out-of-frame deletions.

MMEJ can be predicted for each gRNA in a locus by using `l.simulate_end_joining()`.

In [13]:
l.simulate_end_joining()

The MMEJ patterns of an individual gRNA can be accessed through the `mmej_patterns` property of a `Guide` object in `l.guides`.

In [14]:
l.guide(5).mmej_patterns

[{'left': 'TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGA-----',
  'left_seq': 'TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGA',
  'left_seq_position': 46,
  'right': 'GCGTGGGCACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC',
  'right_seq': 'GCGTGGGCACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC',
  'right_seq_position': 5,
  'pattern': 'GC',
  'pattern_len': 2,
  'pattern_score': 311.6,
  'deletion_seq': 'GCCAC',
  'frame_shift': '+'},
 {'left': 'TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGAGC---',
  'left_seq': 'TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGAGC',
  'left_seq_position': 48,
  'right': '+++++++CACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC',
  'right_seq': 'CACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC',
  'right_seq_position': 17,
  'pattern': 'CAC',
  'pattern_len': 3,
  'pattern_score': 303.5,
  'deletion_seq': 'CACGCGTGGG',
  'frame_shift': '+'},
 {'left': 'TCCAGTCCAAGGTAGTCAGTATC

Each predicted MMEJ deletion has a score based on the scoring of Bae et al. 2014 (https://doi.org/10.1038/nmeth.3015), which indicates the propensity for that deletion in vivo.
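As a rough illustration of this kind of scoring, the sketch below weights G/C matches twice as much as A/T matches and penalises longer deletions exponentially. The function name, the `length_weight=20` constant and the rounding steps are assumptions rather than Guido's confirmed code, although with these choices the sketch reproduces the `pattern_score` values shown in the output above.

```python
import math


def microhomology_score(pattern, deletion_length, length_weight=20.0):
    """Score one microhomology pattern (sketch in the style of Bae et al. 2014)."""
    pattern = pattern.upper()
    gc = sum(base in "GC" for base in pattern)
    at = len(pattern) - gc
    # Longer deletions are penalised exponentially.
    length_factor = round(1 / math.exp(deletion_length / length_weight), 3)
    # G/C matches count double compared with A/T matches.
    return round(100 * length_factor * (at + 2 * gc), 1)


print(microhomology_score("GC", 5))    # 311.6
print(microhomology_score("CAC", 10))  # 303.5
```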

We can visualise the predicted MMEJ deletion profiles for gRNA-6, together with the microhomology pattern, the score, and whether the deletion produces a frame shift (+) or not (-) based on its length.

In [15]:
print(l.guide(5).id)

for mp in l.guide(5).mmej_patterns:
    deletion = f"{mp['left']}{mp['right']}\t\t{mp['pattern']}\t\t{round(mp['pattern_score'])}\t\t{mp['frame_shift']} ({len(mp['deletion_seq'])})"
    print(deletion)

gRNA-6
TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGA-----GCGTGGGCACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC		GC		312		+ (5)
TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGAGC---+++++++CACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC		CAC		304		+ (10)
TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCT---------++GTGGGCACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC		GTG		288		+ (11)
TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCAGGCTCTGTGA-----++++++GCACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC		GC		231		+ (11)
TCCAGTCCAAGGTAGTCAGTATCACAGAATCCACCA---------------+++++GGCACGGGGTGGTCAGGTTTGTGTCTCAGTGTCAGTGTAGTCGTGTAGTCAGTAGTGCGTCAGTCCCTCC		GGC		221		+ (20)
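The frame-shift flag in the last column follows from the deletion length: a deletion whose length is not a multiple of three shifts the reading frame. A minimal sketch of that rule (the function name is hypothetical):

```python
def frame_shift_flag(deletion_seq):
    """Return '+' if the deletion shifts the reading frame, otherwise '-'."""
    # Only deletions whose length is a multiple of 3 keep the frame intact.
    return "-" if len(deletion_seq) % 3 == 0 else "+"


print(frame_shift_flag("GCCAC"))       # 5 nt deleted
print(frame_shift_flag("CACGCGTGGG"))  # 10 nt deleted
```

All five deletions listed above (5, 10, 11, 11 and 20 nt) are out of frame under this rule, which agrees with the `+` flags in the output.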


To find potential off-targets for each gRNA in silico, use the `l.find_off_targets()` function.

In [None]:
l.find_off_targets(bowtie_path='/Users/nkranjc/imperial/guide_tool/guido/bin/bowtie/')

A dictionary with a list of off-targets is returned for each gRNA. Each off-target contains information about mismatches between the off-target and gRNA sequence and the genomic location of the off-target.
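The exact structure of the returned dictionary is not shown here, so the sketch below does not rely on it. It only illustrates what mismatch information between a gRNA and an off-target site means: the positions at which two equal-length sequences differ. The function name is hypothetical, and the off-target sequence is invented by altering two bases of gRNA-1.

```python
def mismatch_positions(guide, off_target):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(guide) != len(off_target):
        raise ValueError("sequences must have the same length")
    return [
        i for i, (a, b) in enumerate(zip(guide.upper(), off_target.upper()))
        if a != b
    ]


guide = "AAGTTTATCATCCACTCTGACGG"       # gRNA-1 from the example above
off_target = "AACTTTATCAACCACTCTGACGG"  # invented site with two substitutions
print(mismatch_positions(guide, off_target))
```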

To run an example of the full analysis from start to finish, use the script below. You first need to download the conservation score from its repository, [https://github.com/nkran/AgamP4_conservation_score](https://github.com/nkran/AgamP4_conservation_score), and set the correct path to the H5 file in the script. The same applies to the path of your local `bowtie` dependency.


In [23]:
import guido
import h5py
import numpy as np

# load genome
G = guido.load_genome_from_file('/Users/nkranjc/imperial/ref/new/AgamP4.guido')

# create locus
l = guido.locus_from_gene(G, "AGAP011377")

# find guides in the exons of the gene
l.find_guides(min_flanking_length=0, selected_features={'exon'})

# simulate end joining
l.simulate_end_joining()

# get azimuth score
l.add_azimuth_score()

# get off-targets
off_targets = l.find_off_targets(bowtie_path='/Users/nkranjc/imperial/guide_tool/guido/bin/bowtie/')

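# get conservation score (the file is only read, so open it read-only)
with h5py.File('/Users/nkranjc/imperial/conservation/data/AgamP4_conservation.h5', mode='r') as data_h5:
    # the dataset is keyed by chromosome arm, e.g. 'AgamP4_2R' -> '2R'
    cs = data_h5[l.chromosome.split('_')[1]]['Cs'][0, l.start - 1:l.end]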

# add conservation score to a layer of the locus
l.add_layer('cs', layer_data=np.array(cs))

# rank guides; layer_is_benefit marks whether a higher value is better for each layer
rank = l.rank_guides(
    layer_names=['mmej_sum_score', 'mmej_oof_score', 'azimuth_score', 'ot_sum_score',
                 'ot_cfd_score_mean', 'ot_cfd_score_max', 'ot_cfd_score_sum', 'cs'],
    layer_is_benefit=[True, True, True, False, True, True, True, True])
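To illustrate what the benefit flags express, here is a simple multi-criteria ranking sketch: each layer is min-max normalised, cost layers are inverted, and the normalised values are summed. This is only one possible scheme and is not necessarily the method `rank_guides()` uses; the function name and all values are made up.

```python
def rank_by_layers(scores, is_benefit):
    """Rank guides best-first from {guide: [value per layer]}."""
    names = list(scores)
    totals = dict.fromkeys(names, 0.0)
    for j, benefit in enumerate(is_benefit):
        column = [scores[name][j] for name in names]
        low, span = min(column), max(column) - min(column)
        for name in names:
            # A constant layer contributes equally to every guide.
            norm = (scores[name][j] - low) / span if span else 0.0
            # For cost layers, lower raw values should score higher.
            totals[name] += norm if benefit else 1.0 - norm
    return sorted(names, key=lambda name: totals[name], reverse=True)


# Made-up values for three layers: two benefit layers and one cost layer.
scores = {"g1": [300.0, 0.7, 5], "g2": [200.0, 0.5, 1], "g3": [250.0, 0.2, 0]}
print(rank_by_layers(scores, [True, True, False]))
```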


Export the guides to a BED file, and export the ranked list of guides with their data to a CSV file.

In [26]:
l.guides_to_bed('/Users/nkranjc/imperial/guide_tool/output/AGAP011377-exons-guides.bed')
l.guides_to_dataframe().sort_values('rank').to_csv('/Users/nkranjc/imperial/guide_tool/output/AGAP011377-exons-guides.csv')
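For readers unfamiliar with BED, the format uses 0-based, half-open intervals. The sketch below shows how one guide could be written as a six-column BED line, assuming the guide coordinates printed earlier are 1-based and inclusive. The function name is hypothetical, and the columns that `guides_to_bed()` actually writes may differ.

```python
def guide_to_bed_line(chrom, start, end, name, strand, score=0):
    """Format one guide as a 6-column BED line.

    Assumes `start` and `end` are 1-based, inclusive coordinates.
    """
    # BED is 0-based and half-open, so only the start coordinate shifts.
    fields = [chrom, start - 1, end, name, score, strand]
    return "\t".join(str(field) for field in fields)


print(guide_to_bed_line("AgamP4_2R", 48714550, 48714572, "gRNA-1", "+"))
```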