CLdb is a toolset for organizing and analyzing large amounts of CRISPR data.
Existing webtools are fairly easy to use but suffer from some limitations:
- They can require A LOT of tedious clicking/typing for large datasets.
  - i.e., they don't SCALE well.
- They are not very flexible.
  - Adding the tools to existing analysis workflows can be challenging.
  - For instance, it is usually hard to incorporate such tools into an IPython notebook unless the webtool has a good API.
- They often limit transparency and reproducibility of the analysis.
  - Lack of transparency and reproducibility is a major issue in bioinformatics.
- Flexible and scalable spacer Blasting
  - Filter out spacer blast hits to other CRISPR arrays
  - Get the protospacer of each blast hit
    - This includes the adjacent PAM region
  - Get the crRNA DNA template (crDNA) for each blast query
  - Make protospacer-crDNA alignments
  - Get summaries on protospacer-crDNA mismatches for the SEED sequence and entire protospacer
  - Get the PAM regions for each hit
- Make detailed comparative plots of CRISPR systems
  - The plots can include information on:
    - CAS gene conservation among CRISPRs
    - Spacer conservation among CRISPRs
    - Location of the leader region
- Summarize your dataset quickly
  - Get the number of spacers shared among:
    - CRISPR loci
    - CRISPR subtypes
    - taxa
- Make repeat consensus sequences
  - Use for making weblogos or trees
- Organize and query subsets of your CRISPR dataset
  - Select by subtype, taxa, or individual CRISPR loci
- Make gff3 files of the CRISPR features
NOTE: Currently, only *nix systems are supported.
git clone https://github.com/nyoungb2/CLdb.git
cd CLdb
echo 'source '`pwd`'/sourceMe' >> ~/.profile
source ~/.profile
CLdb should now be in your $PATH.
See this for more info on the $PATH variable.
Also, bash command line completion should now be set up (see below).
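To confirm that the install steps above took effect, a quick check along these lines may help. This is only a sketch: it uses the standard shell builtin `command -v`, and the variable name `cldb_status` is made up for illustration.

```shell
# Check whether the CLdb executable is reachable through $PATH.
# cldb_status is a hypothetical variable name used only in this sketch.
if command -v CLdb >/dev/null 2>&1; then
    cldb_status="found"
    echo "CLdb found at: $(command -v CLdb)"
else
    cldb_status="missing"
    echo "CLdb not found in \$PATH; check that ~/.profile sources the sourceMe file"
fi
```

If the check reports that CLdb is missing, opening a new login shell or re-running `source ~/.profile` is a reasonable first thing to try.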
CLdb is set up as a command-subcommand app, much like git.
Like git, tab completion can be used to view subcommands of the main command.
Bash tab-completion will allow you to list the subcommands
or sub-subcommands of CLdb. Subcommands will be listed
upon double-tabbing after '--'. For example, CLdb -- <tab><tab>
will bring up all of the CLdb subcommands.
Example command-subcommand: CLdb -- makeDB -h
Some subcommands (e.g., arrayBlast) have their own subcommands
(sub-subcommands). Tab completion should work similarly.
Example command-sub-subcommand: CLdb -- arrayBlast -- run -h
- Remember: tab completion for (sub)subcommands will only work after
the "--". So, with sub-subcommands:
CLdb -- arrayBlast -- <tab><tab>
Depending on what you plan to do with CLdb, you may not need every dependency. The perl modules can be downloaded easily (hopefully) with cpanminus.
- Perl modules:
See the "conda install" line in perlpackage.yml. Make sure to activate the conda env that you create.
- Help for CLdb command:
CLdb -h
- Help for CLdb subcommands:
CLdb -- subcommand -h
OR CLdb --perldoc -- subcommand
- Example:
CLdb --perldoc -- array2fasta
- Help for CLdb subsubcommands:
CLdb -- subcommand -- subsubcommand -h
OR CLdb -- subcommand --perldoc -- subsubcommand
- NOTE: the --perldoc flag is used after the subcommand
- Example:
CLdb -- arrayBlast --perldoc -- run
- See the Jupyter notebooks in ./doc/
WARNING: this is very out-of-date
- See the wiki.
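The help patterns listed above (`CLdb -- subcommand -h` and `CLdb -- subcommand -- subsubcommand -h`) can be wrapped in a small shell function so the `--` separators are easier to remember. The function name `cldb_help_cmd` is hypothetical and not part of CLdb; this sketch only prints the command to run, so it works even before CLdb is installed.

```shell
# cldb_help_cmd is a hypothetical helper, not part of CLdb.
# It prints the help command for a subcommand or a sub-subcommand,
# following the "CLdb -- subcommand [-- subsubcommand] -h" pattern.
cldb_help_cmd() {
    if [ "$#" -eq 1 ]; then
        echo "CLdb -- $1 -h"
    elif [ "$#" -eq 2 ]; then
        echo "CLdb -- $1 -- $2 -h"
    else
        echo "usage: cldb_help_cmd subcommand [subsubcommand]" >&2
        return 1
    fi
}

# Subcommand names taken from the examples in this README.
cldb_help_cmd makeDB
cldb_help_cmd arrayBlast run
```

To run the help command rather than print it, the output can be passed to `eval`, for example `eval "$(cldb_help_cmd makeDB)"`.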
All feedback is welcome, except for bug reports... OK fine, ALL feedback is welcome.
Please provide it via Issues on GitHub.
Copyright (C) 2015 Nick Youngblut
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.