Minimum Non-Reference gRNA finder
- Finds the minimum gRNA set required to target multiple alignable genes in multiple non-reference genomes
- Dependencies:
  - BLAST+ suite
  - Python 3
  - Biopython
  - Bash
  - bedtools
  - mafft
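You can confirm the dependencies are available before a run with a quick check like the one sketched below. This is a minimal sketch, not part of the tool. The function name `missing_dependencies` is hypothetical. The list of executables (`bash`, `blastn`, `rpsblast`, `bedtools`, `mafft`) is an assumption about which binaries the pipeline calls, inferred from the workflow steps. `Bio` is Biopython's import name.

```python
import importlib.util
import shutil

# Assumption: binaries the pipeline invokes, inferred from the workflow (not confirmed by the tool).
EXECUTABLES = ["bash", "blastn", "rpsblast", "bedtools", "mafft"]
PYTHON_MODULES = ["Bio"]  # Biopython is imported as "Bio"


def missing_dependencies(executables=EXECUTABLES, modules=PYTHON_MODULES):
    """Return executables not found on PATH and Python modules that cannot be imported."""
    missing = [exe for exe in executables if shutil.which(exe) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    gone = missing_dependencies()
    print("All dependencies found." if not gone else "Missing: " + ", ".join(gone))
```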
- Identify candidate targets in the non-reference genome(s)
  - Extract the user-specified reference gene(s) from a reference genome (.fasta) using its GFF3 annotation (.gff)
    - The extracted sequence(s) include introns
    - Optional: the user may specify a protein domain (by CDD PSSM-ID) to restrict the search
      - The CDS-only regions of the user-specified reference gene(s) are extracted from the reference genome (.fasta) using the GFF3 annotation and translated
      - RPS-BLAST the protein sequence(s) against the domain database to identify the domain range(s)
      - Extract the user-specified reference gene(s) from the reference genome (.fasta) using the GFF3 annotation (.gff), restricted to the genomic coordinates corresponding to the domain(s)
  - BLASTn the reference gene(s) against the non-reference genome(s) (.fasta)
  - Filter hits by minimum % identity (optional)
  - Merge hits that overlap or lie within a specified distance of each other (to accommodate introns/insertions)
  - Filter the merged hits by minimum length and % identity
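The merge-and-filter part of this step can be sketched in pure Python, comparable in spirit to `bedtools merge -d`. This is a minimal illustration under stated assumptions, not the tool's code:

- Hits are assumed to be already parsed from BLASTn output into 0-based, half-open coordinates.
- `Hit`, `merge_hits` and `filter_merged` are hypothetical names.
- Hits are merged only when they share the same non-reference sequence and strand.
- The identity of a merged region is taken as the alignment-length-weighted mean of its member hits. This is an assumption.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Hit:
    """A BLASTn hit (or merged region) on a non-reference sequence, 0-based half-open."""
    subject: str
    start: int
    end: int
    strand: str    # "+" or "-"
    pident: float  # percent identity
    length: int    # aligned length (summed over members for a merged region)


def _combine(subject, strand, start, end, members):
    # Assumption: merged identity = alignment-length-weighted mean of member identities.
    total = sum(h.length for h in members)
    pident = sum(h.pident * h.length for h in members) / total
    return Hit(subject, start, end, strand, pident, total)


def merge_hits(hits, buffer=100, minid=None):
    """Merge hits on the same subject and strand that overlap or lie within `buffer` bp."""
    if minid is not None:  # optional pre-merge identity filter
        hits = [h for h in hits if h.pident >= minid]
    ordered = sorted(hits, key=lambda h: (h.subject, h.strand, h.start))
    merged = []
    for (subject, strand), group in groupby(ordered, key=lambda h: (h.subject, h.strand)):
        members, start, end = [], None, None
        for h in group:
            if members and h.start <= end + buffer:
                end = max(end, h.end)  # extend the current merged region
            else:
                if members:  # close the previous region before starting a new one
                    merged.append(_combine(subject, strand, start, end, members))
                members, start, end = [], h.start, h.end
            members.append(h)
        merged.append(_combine(subject, strand, start, end, members))
    return merged


def filter_merged(merged, minlen=0, minid=0.0):
    """Keep merged regions spanning at least `minlen` bp with identity of at least `minid`."""
    return [m for m in merged if m.end - m.start >= minlen and m.pident >= minid]
```

With a 100 bp buffer, two hits separated by 50 bp collapse into one candidate target. A hit 250 bp away stays separate, as does any hit on the opposite strand.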
- Identify candidate gRNA in the non-reference targets
  - Candidates are restricted by the user-specified PAM and gRNA length
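Enumerating candidates for a given PAM and gRNA length can be sketched with a regular expression. This is a minimal sketch under these assumptions:

- The PAM is written in IUPAC codes (for example `NGG`).
- The PAM lies immediately 3' of the spacer, as for SpCas9.
- Both strands of each non-reference target are scanned.
- `find_grnas` and `revcomp` are hypothetical names.
- Positions are reported as 0-based forward-strand starts.

A 5' PAM (as used by Cas12a) would need the pattern reversed.

```python
import re

# IUPAC nucleotide codes -> regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_grnas(target, pam="NGG", length=20):
    """Return (spacer, strand, start) for every `length`-nt spacer immediately 5' of the PAM.

    Assumes a 3' PAM (SpCas9-style). `start` is 0-based on the forward strand of `target`.
    """
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # Zero-width lookahead so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}}){pam_regex})")
    fwd = target.upper()
    candidates = []
    for strand, seq in (("+", fwd), ("-", revcomp(fwd))):
        for match in pattern.finditer(seq):
            pos = match.start(1)
            if strand == "-":
                pos = len(fwd) - (pos + length)  # map back to forward-strand coordinates
            candidates.append((match.group(1), strand, pos))
    return candidates
```

For example, `find_grnas("CCTAAAAA", pam="NGG", length=5)` finds one candidate on the minus strand. The spacer `TTTTT` is followed by `AGG`, and it corresponds to forward positions 3 to 7.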
- Screen candidate gRNA
  - Eliminate candidate gRNA with off-target hits
    - Mask the non-reference targets in the non-reference genome(s) (.fasta)
      - Only regions that match a target over its full length with 100% identity are masked
      - All provided non-reference genomes are screened simultaneously, so candidate gRNA that pass this screen should have no off-targets in any of them
      - Users may also provide additional background sequences to screen against (--background)
    - BLASTn the candidate gRNA against the masked non-reference genome(s)
      - Optional: also screen the reference genome
    - Eliminate candidate gRNA whose alignments to the masked non-reference genome(s) fail the off-target mismatch/gap criteria (--mismatch, --gaps)
  - Eliminate candidate gRNA that do not align within the CDS of the reference genes
    - Extract the CDS-only regions of the user-specified reference gene(s) from the reference genome (.fasta) using the GFF3 annotation (.gff)
      - If the user specified a domain, the range is restricted accordingly
    - Align the non-reference target sequences (output of step 1.5) with the reference sequences from steps 1.1 (or 1.1.3 if a domain is specified) and 2.1
    - For each candidate gRNA, locate its position in the alignment (based on where it originates in each non-reference target) and eliminate any gRNA that does not fall within the reference CDS regions in AT LEAST ONE alignment
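Both screens can be sketched in a few lines of Python. This is a minimal sketch; the function names are hypothetical.

The off-target filter reads BLAST+ tabular output (`-outfmt 6`, whose default columns are `qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`). Its thresholds rest on three assumptions:

- Query bases outside the local alignment count as mismatches.
- The `gapopen` column stands in for the gap count.
- A hit is tolerated only if it meets both the `--mismatch` and `--gaps` minimums. With the defaults, this means exact matches are rejected.

The CDS check assumes the CDS-only reference sequence is a row of the same multiple alignment, so its non-gap columns mark CDS positions. A gRNA counts as "within the CDS" only if every one of its bases falls in such a column.

```python
def offtarget_failures(blast_tab_lines, query_lengths, min_mismatch=1, min_gaps=0):
    """Return IDs of gRNA with at least one hit that is too close a match.

    Expects BLAST+ -outfmt 6 lines with the default 12 columns.
    """
    failed = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid = fields[0]
        mismatch, gapopen = int(fields[4]), int(fields[5])
        qstart, qend = int(fields[6]), int(fields[7])
        # Assumption: query bases outside the local alignment count as mismatches.
        unaligned = query_lengths[qseqid] - (qend - qstart + 1)
        # Assumption: a hit is tolerated only if it meets BOTH minimums.
        if mismatch + unaligned < min_mismatch or gapopen < min_gaps:
            failed.add(qseqid)
    return failed


def residue_columns(aligned):
    """Alignment column of each residue in a gapped sequence ('-' marks gaps)."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def grna_in_cds(target_aligned, cds_aligned, grna_start, grna_len):
    """True if every gRNA base sits in a column where the aligned reference CDS has a residue.

    grna_start is the 0-based forward-strand position of the gRNA in the ungapped target.
    """
    cds_cols = set(residue_columns(cds_aligned))
    span = residue_columns(target_aligned)[grna_start:grna_start + grna_len]
    return len(span) == grna_len and all(col in cds_cols for col in span)
```

Under the "at least one alignment" rule, a gRNA would be kept if `grna_in_cds` is true for any of the non-reference targets it occurs in.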
- Find the minimum gRNA set that covers all target sequences
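Choosing the gRNA set is a set-cover problem: each gRNA covers the targets it hits. The tool's own algorithm option (`--algo LAR`) is not described here, so the sketch below swaps in two standard techniques, under hypothetical function names:

- Greedy set cover: fast, but not guaranteed to find the true minimum.
- Brute-force exact search: only practical for small candidate pools.

```python
from itertools import combinations


def greedy_cover(coverage, targets):
    """Greedy set cover: repeatedly pick the gRNA hitting the most still-uncovered targets.

    coverage maps gRNA ID -> set of target IDs. Returns (chosen gRNA, targets left uncovered).
    """
    uncovered = set(targets)
    chosen = []
    names = sorted(coverage)  # fixed order so ties are broken deterministically
    while uncovered and names:
        best = max(names, key=lambda g: len(coverage[g] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break  # remaining targets are hit by no candidate gRNA
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


def exact_cover(coverage, targets):
    """Smallest gRNA set covering all targets, by brute force (small inputs only)."""
    targets = set(targets)
    if not targets:
        return []
    names = sorted(coverage)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            if targets <= set().union(*(coverage[g] for g in combo)):
                return list(combo)
    return None  # some target cannot be covered by any candidate
```

Consider `coverage = {"gA": {1, 2, 3, 4}, "gB": {1, 2, 5}, "gC": {3, 4, 6}}` with targets 1 to 6. The greedy choice takes gA first and ends with three gRNA, while the exact search finds that gB and gC alone suffice. This gap is why the choice of algorithm matters.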
- Step 1
  - Data:
    - Reference genome (--ref xxx.fasta)
    - Reference GFF3 annotation (--gff xxx.gff)
    - Non-reference sequences/genome(s) (--nonref xxx.fasta)
  - Parameters:
    - Gene IDs (--gene)
  - Optional parameters:
    - Minimum hit identity in % (--minid 85)
    - Minimum candidate target length in bp (--minlen 0)
    - Maximum merge buffer in bp (--buffer 100)
  - Optional, for domain restriction:
    - PSSM-ID and rpsblast+ database
- Step 2
  - Parameters:
    - PAM (--pam)
    - gRNA length (--length)
- Step 3
  - Optional parameters:
    - Minimum off-target gaps (--gaps 0)
    - Minimum off-target mismatches in bp (--mismatch 1)
  - Optional data:
    - Background sequences (--background xxx.fasta)
- Step 4
  - Optional parameters:
    - Minimum set algorithm (--algo LAR)