This application was written in response to a blog post here, in which the author describes writing a program that would search for the text of the Bible within the DNA of a human being. I commented to the author that, since the character encoding being applied to the raw data was arbitrary and human-chosen rather than inherent in the data, one could use a carefully designed character encoding scheme to produce any desired textual output. The author, in my view, concedes this in the article by stating an intention to search the output for various versions of the Bible, whether Hebrew, Greek, KJV, etc. When I explained my point to the author in those terms, it was dismissed. Oh well.
I offer this code not as an argument for or against the existence of God, nor to malign the author of the original article. The stated intent of the original project was to see what kind of text could be discerned from a DNA sample. My goal was to show that there is no inherent textual representation in a DNA sequence, and that arbitrary textual output can be produced using the same principles as a cryptographic one-time pad. In fact, that is exactly how I am treating the DNA data: as the pad. The "character_encoding.dat" file is nothing more than the encrypted version of the complete works of William Shakespeare, obtained via Project Gutenberg and used with permission. The DNA source file is the same one used by the original author, Homo_sapiens.GRCh38.dna.chromosome.5.fa, which can be retrieved from ftp://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/dna/.
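The one-time pad idea can be illustrated with a minimal sketch. This is not the repository's PHP code. It assumes that XOR is the combining operation and uses a short made-up byte string in place of the real chromosome data. The helper name `xor_bytes` is hypothetical.

```python
def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR each byte of data with the corresponding byte of pad."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as data")
    return bytes(d ^ p for d, p in zip(data, pad))


# Stand-in for raw sequence bytes (assumption: the real input is a FASTA file).
dna = b"ACGTTGCANNACGT"
# The text we want to "find" in the DNA.
message = b"To be, or not"

# Plays the role of character_encoding.dat: the message combined with the pad.
encoding = xor_bytes(message, dna)

# Applying the same pad again recovers the message, whatever it was.
recovered = xor_bytes(encoding, dna)
assert recovered == message
```

Because `encoding` is derived from both the DNA and the chosen message, any message of suitable length can be "found" this way, which is the point being made above.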
It was an interesting exercise. I hope someone else may find it useful. It's released under the MIT license, as described elsewhere in this repository.
php genome_anything.php Homo_sapiens.GRCh38.dna.chromosome.5.fa 5858792
The value 5858792 is the size in bytes of the character_encoding.dat file and has no significance beyond that. To produce your own ciphertext, place your message inside character_encoding.dat, run the command as above with an appropriate character limit, and save the output to a new file. Once that completes, rename the new file to character_encoding.dat. Decryption is symmetric: running the same command again reproduces the original message.
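The encrypt, rename, decrypt workflow described above might look like the following sketch. It is an illustration under stated assumptions, not the behavior of genome_anything.php: it assumes a byte-wise XOR against the raw bytes of the DNA file (header line included). The names `apply_pad` and `round_trip`, and the file names `sample.fa` and `output.dat`, are hypothetical.

```python
import os
import tempfile


def apply_pad(dna_path: str, encoding_path: str, limit: int) -> bytes:
    """XOR up to `limit` bytes of the encoding file with the DNA file's bytes."""
    with open(dna_path, "rb") as dna_file, open(encoding_path, "rb") as enc_file:
        dna = dna_file.read(limit)
        enc = enc_file.read(limit)
    # zip stops at the shorter input, so output never exceeds either file.
    return bytes(d ^ e for d, e in zip(dna, enc))


def round_trip(message: bytes, dna: bytes) -> bytes:
    """Encrypt a message against the DNA bytes, swap files, then decrypt."""
    with tempfile.TemporaryDirectory() as work:
        dna_path = os.path.join(work, "sample.fa")
        enc_path = os.path.join(work, "character_encoding.dat")
        new_path = os.path.join(work, "output.dat")
        with open(dna_path, "wb") as f:
            f.write(dna)
        # Step 1: place the plain message in character_encoding.dat.
        with open(enc_path, "wb") as f:
            f.write(message)
        # Step 2: run once and save the output to a new file.
        with open(new_path, "wb") as f:
            f.write(apply_pad(dna_path, enc_path, len(message)))
        # Step 3: rename the new file to character_encoding.dat.
        os.replace(new_path, enc_path)
        # Step 4: the same process again yields the original message.
        return apply_pad(dna_path, enc_path, len(message))
```

Calling `round_trip(b"Hello, world", dna_bytes)` with any `dna_bytes` at least as long as the message returns the message unchanged.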