cubar is a package for codon usage bias analysis in R. Its main features are as follows:

- Codon level analyses
    - Calculate codon weights based on gene expression, tRNA availability, and mRNA stability;
    - Calculate relative synonymous codon usage (RSCU);
    - Infer optimal codons with machine learning;
    - Visualize codon-anticodon pairing relationships;
- Gene level analyses
    - Tabulate the codon frequencies of each coding sequence;
    - Measure codon usage similarity to highly expressed genes with the Codon Adaptation Index (CAI);
    - Quantify the influence of codon usage on mRNA stability with the Mean Codon Stabilization Coefficients (CSCg);
    - Measure codon usage bias with the Effective Number of Codons (ENC), a nonparametric index;
    - Measure the fraction of pre-determined optimal codons (Fop) in each sequence;
    - Calculate overall GC content (GC), or the GC content of synonymous third positions (GC3s) or 4-fold degenerate sites (GC4d);
    - Quantify how well codon usage matches tRNA availability with the tRNA Adaptation Index (tAI);
    - Measure the deviation from proportionality (Dp) of viral synonymous codon usage from host tRNA supply;
- Utilities
    - Perform sliding window analysis of codon usage along a coding sequence;
    - Optimize codon usage for heterologous expression based on optimal codons;
    - Test for differential codon usage between two sets of sequences;
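
To make the RSCU and CAI definitions above concrete, here is a minimal base-R sketch. It is not cubar's implementation or API: the function names (`count_codons_sketch`, `rscu_sketch`, `weights_sketch`, `cai_sketch`) are hypothetical, sequences are plain character strings rather than Biostrings objects, and the 0.5 pseudocount for codons unobserved in the reference set is an assumed convention. RSCU divides each codon's count by the mean count of its synonymous family; CAI is the geometric mean of relative adaptiveness values w (a codon's count divided by the largest count in its family, taken from a reference set such as highly expressed genes), skipping stop codons and single-codon amino acids.

```r
# Minimal base-R sketch of codon counting, RSCU, and CAI (hypothetical names, not cubar's API)

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" marks stops
bases <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16), rep(rep(bases, each = 4), 4), rep(bases, 16))
std_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  codons
)

# Count all 64 codons of an in-frame coding sequence
# (assumes the length is a multiple of 3 and at least 3)
count_codons_sketch <- function(cds, code = std_code) {
  cds <- toupper(cds)
  starts <- seq(1, nchar(cds) - 2, by = 3)
  cods <- substring(cds, starts, starts + 2)
  counts <- table(factor(cods, levels = names(code)))
  setNames(as.numeric(counts), names(code))
}

# RSCU: observed count divided by the mean count of its synonymous family
# (families with no observed codons give NaN)
rscu_sketch <- function(counts, code = std_code) {
  aa <- code[names(counts)]
  counts / ave(counts, aa, FUN = mean)
}

# Relative adaptiveness w from reference counts;
# the pseudocount for unobserved codons is an assumption
weights_sketch <- function(ref_counts, code = std_code, pseudo = 0.5) {
  aa <- code[names(ref_counts)]
  x <- ref_counts + pseudo
  x / ave(x, aa, FUN = max)
}

# CAI: geometric mean of w over all codons in a sequence,
# excluding stop codons and single-codon amino acids (Met, Trp)
cai_sketch <- function(counts, w, code = std_code) {
  aa <- code[names(counts)]
  fam_size <- as.vector(table(aa)[aa])
  keep <- aa != "*" & fam_size > 1
  n <- counts[keep]
  exp(sum(n * log(w[names(n)])) / sum(n))
}

ref <- count_codons_sketch("CTGCTGCTGCTA")  # reference: CTG preferred for Leu
w <- weights_sketch(ref)
cai_sketch(count_codons_sketch("CTACTG"), w)
```

With this reference, CTG gets w = 1 and CTA gets w = 1.5 / 3.5, so a sequence built only from CTG codons scores CAI = 1, and each less-preferred codon lowers the score.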

Main advantages of cubar are as follows:

- Process large datasets (>100,000 sequences) efficiently using the Biostrings and data.table backends;
- Support genetic codes cataloged by NCBI as well as custom ones;
- Integrate with other data analysis or bioinformatics packages in the R ecosystem;
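
Support for custom genetic codes matters because the code defines the synonymous codon families over which indices such as RSCU and ENC are computed. The base-R sketch below is not cubar's API (`std_code`, `custom_code`, and `family_sizes` are hypothetical names): it represents a genetic code as a named codon-to-amino-acid vector, starting from the NCBI standard code (table 1), and derives a custom variant in which TGA encodes Trp. That reassignment occurs, for example, in the vertebrate mitochondrial code (NCBI table 2), which also differs at other codons.

```r
# Sketch only: a genetic code as a named vector mapping codon -> amino acid
bases <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16), rep(rep(bases, each = 4), 4), rep(bases, 16))

# NCBI standard code (table 1); "*" marks stop codons
std_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  codons
)

# Hypothetical custom code: the standard code with TGA reassigned from stop to Trp
custom_code <- std_code
custom_code[["TGA"]] <- "W"

# Number of codons assigned to each amino acid (synonymous family size)
family_sizes <- function(code) table(code)

family_sizes(std_code)[["W"]]     # 1
family_sizes(custom_code)[["W"]]  # 2
```

Under the custom code, Trp becomes a two-codon family, so its codons start contributing to synonymous usage measures, while the stop family shrinks from three codons to two.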

Depends: R (>= 4.1.0)

Imports: Biostrings (>= 2.60.0), IRanges (>= 2.34.0), data.table (>= 1.14.0), ggplot2 (>= 3.3.5), rlang (>= 0.4.11)

The latest release of cubar can be installed with:

install.packages("cubar")

The latest development version of cubar can be installed with:
devtools::install_github("mt1022/cubar", dependencies = TRUE)
Documentation can be found within R (by typing ?function_name). The following tutorials are available from our website:

- Get Started: A brief introduction demonstrating the basic usage of cubar;
- Non-standard Genetic Code: How to use cubar with non-standard genetic codes;
- Theories behind cubar: The mathematical details behind the core functions in cubar;

Please use GitHub issues for bug reports, questions, and feature requests.

Related R packages:

- Biostrings for sequence input/output and manipulation;
- Peptides for peptide- or protein-related indices;

GitHub Copilot was used to suggest code snippets during the development of this package. We thank the GitHub Education teacher program for providing free access to GitHub Copilot.