Symmetric Group of the Genetic-Code Cubes
Department of Biology. Eberly College of Science.
Pennsylvania State University, University Park, PA 16802
Figure 1. Graphical summary of the subjects covered by this work. A, the development of the symmetric group of the genetic-code cubes is presented. B, amino-acid PC-scales from codon norms are derived from subsets of the genetic-code cubes and optimized on a set of homologous proteins. It is shown that the amino-acid PC-scales are correlated with the physicochemical indexes reported by studies on protein folding and protein interactions. C, a Weibull probability distribution model based on the thermodynamics of the mutational process on gene populations is estimated on experimental datasets of aligned mutational variants of protein sequences. D, a feasible application of this result to de novo vaccine design is provided.
This material is supporting information for the paper "Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process" (Sanchez 2018). The derivation of the algebraic structure of the symmetric group of the genetic-code cubes (G**C, ∘) is given in the manuscript. A deep complexity of the quantitative relationships between codons and their encoded amino acids is unveiled by group (G**C, ∘). These quantitative relationships expressed by group (G**C, ∘), its subgroups and cosets were quantitatively manifested in the amino-acid PC-scales derived from codon norms. These scales are strongly correlated with the physicochemical indexes reported by studies on protein folding and protein interactions. The effect of the genetic code architecture on the evolutionary process was exposed by a Weibull distribution model inferred for the mutational process. For a set of homologous protein, different amino PC-scales can be estimated in different subsets of genetic code-cubes through the application of an optimization algorithm. The size of the set of all possible amino-acid PC-scales is large enough to reflect the huge diversity of evolutionary strategies found in natural protein-encoded genes. The result presented here would be particularly relevant to predict immunoescape epitope variants originated in populations of pathogenic microorganisms and viruses. This knowledge would improve the lifespan of de novo vaccines as well as the neutralization of potential superbugs. Current results indicate that, on thermodynamic basis, a stochastic deterministic mutational process is constrained by the genetic code architecture.
- Requirements ===============
The documents available here are Computable Document Format (CDFs), which permit the user interact with the document. That is, a CDF is something similar to a PDF with the fundamental difference that the readers can perform by himself/herself the computation discussed in the text. A CDF Player is required to interact with a CDF, in the same way that, for example, Adobe Reader is required to open a PDF. The CDF Player can be downloaded from http://www.wolfram.com/cdf/. The installation is straightforward.
- Introduction to Z5-Genetic-Code vector space ============================================================
An interactive introduction to Z5-Genetic-Code vector space is given in the CDF: IntroductionToZ5GeneticCodeVectorSpace.cdf. This CDF would be useful for the undergraduated students cursing Abstract Algebra, since several basic abstract concepts and mathematical operations are now visualized in the concret scenario of the genetic code cubes. However, no specialized knowledge is required to read it, and those concepts not explained in the document have external links to wikipedia, mathwork or Groupprops (group property wiki). So, a student can study its content in a self-taught way. The theoretical background for Z5-Genetic-Code vector space is given in (Sánchez and Grau 2009).
- Genetic-Code-Scales of Aminoacids ====================================
The application of the theory developed in the paper (Sanchez 2018) is illustrated in the CDF: Genetic-Code-Scales_of_Amino-Acids.cdf. This is a CDF containing an interactive graphical user interface tool to generate genetic code based PC-scales. File GeneticCodePC-scales&Weibull-fit_snapshots.pdf on how to use the CDF and file GeneticCodeScales.wl is required to run Genetic-Code-Scales_of_Amino-Acids.cdf and both files must be in the same folder.
The subjacent sets from the subgroups of the symmetric group of genetic-code cubes are given to explore different options to generate PC scales of amino acids correlated with physicochemical properties found in AAindex database (Kawashima et al. 2008). The analysis for six protein sequence alignments is provided as well:
- Repeat domain of breast cancer type 2 susceptibility protein
- Oxaloacetate decarboxylase, gamma chain
- p53 DNA binding domain
- Photosynthesis system II assembly factor YCF48 (PSII BNR repeat protein)
- Influenza HA protein
- HIV-1 ENV protein
- HIV-1 GAG protein.
For each scale created by the user, the CDF will estimate the cumulative distribution function to estimate probability of fixation of a given mutation in the population of selected protein.
Kawashima, Shuichi, Piotr Pokarowski, Maria Pokarowska, Andrzej Kolinski, Toshiaki Katayama, and Minoru Kanehisa. 2008. “AAindex: amino acid index database, progress report 2008.” Nucleic Acids Research 36 (suppl 1): D202–D205. doi:10.1093/nar/gkm998.
Sanchez, Robersy. 2018. “Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process.” MATCH Commun. Math. Comput. Chem. 79: 527–60. http://match.pmf.kg.ac.rs/electronic_versions/Match79/n3/match79n3_527-560.pdf.
Sánchez, Robersy, and Ricardo Grau. 2009. “An Algebraic Hypothesis About the Primeval Genetic Code Architecture.” Mathematical Biosciences 221 (1): 60–76. doi:https://doi.org/10.1016/j.mbs.2009.07.001.