# Hacking Nature

_TLDR; A strand of protein is a sequence of amino acids. As is, it's a dead molecule, but once folded from a simple strand into a three-dimensional structure it comes alive and serves a purpose - this procedure is called protein folding, and it starts as the chain is assembled by ribosomes._

_Proteins are the driving force behind a whole lot of bodily mechanisms, so when shit hits the fan during the folding process you can pick from a huge pile of dangerous side effects._

_This notebook seeks to investigate the genes responsible for diseases related to protein misfolding. There's no real agenda - I'm just trying to learn something interesting._

### 14.11.2021
Reading about allergies, I stumbled upon the term `protein folding`. As I dug into the mechanism of misfolded proteins, it soon became apparent that misfolding is the root cause of a whole variety of diseases affecting everything from our brain and nervous system to our heart, kidneys, liver and digestive tract.

It seems different forms of a class of proteins called [Amyloids](https://en.wikipedia.org/wiki/Amyloid) are the common denominator for a lot of these diseases, being implicated in things like Alzheimer's, Parkinson's, and several variants of allergies.

_What generally happens as a consequence of misfolded proteins (in this case amyloids) is that they form deposits around cells, which disrupt the function of tissue and organs, e.g. the brain and heart._

### [Protein Folding](https://en.wikipedia.org/wiki/Protein_folding)
Protein folding is the physical process by which a protein chain is translated to its native three-dimensional structure, typically a "folded" conformation by which the protein becomes biologically functional.

Many allergies are said to be caused by the incorrect folding of proteins, which causes our immune system not to produce antibodies for certain protein structures. _I have no clue as to what that means,_ but I wonder if it has any relation to the mechanisms of the spike protein found in coronaviruses.

Alzheimer's is another example of a condition caused by protein misfolding.

### How does protein misfolding relate to allergies?
I don't yet know exactly how allergies relate to misfolded proteins.

An allergic reaction happens when an antibody (immunoglobulin) binds to an antigen (antibody generator, in this case an allergen) and triggers the release of histamine, an inflammatory chemical.

A person breathes in an allergen. Immune cells (dendritic cells) transport the allergen molecule to the lymph nodes and present it to a naive T-cell. Here one of two things happens: it's ignored, or we start making specific immunoglobulin, which causes a type 1 allergic reaction.

In many ways the dendritic cell is responsible for whether the person is allergic or not. It either presents the allergen to the T-cell with a costimulatory molecule, triggering the production of specific immunoglobulin, which in turn leads to dilated blood vessels (making the tissue around them more likely to swell) and constricted bronchi, making it harder to breathe. Or the dendritic cell presents the allergen with a tolerogenic molecule, which results in mostly nothing.

### [Diseases related to incorrect protein folding](https://www.news-medical.net/life-sciences/Protein-Folding.aspx)
Alzheimer's disease is an example of a neurodegenerative condition caused by protein misfolding. This disease is characterized by dense plaques in the brain caused by misfolding of the secondary β-sheets of the fibrillar β-amyloid proteins present in brain matter. Huntington's disease and Parkinson's disease are other examples of neurodegenerative diseases associated with protein misfolding.

Cystic fibrosis (CF) is a fatal disease caused by misfolding of the cystic fibrosis transmembrane conductance regulator (CFTR) protein. In most cases of CF, the phenylalanine at position 508 of the CFTR is deleted, causing misfolding of the regulator protein. Several allergies have also been shown to be caused by incorrect protein folding.

**A protein is considered to be misfolded if it cannot achieve its normal native state.**


### Some Papers and Wikipedia articles

* [Modulation of β-Amyloid Fibril Formation in Alzheimer’s Disease by Microglia and Infection](https://www.frontiersin.org/articles/10.3389/fnmol.2020.609073/full)
* [A Cell Surface Receptor Complex for Fibrillar β-Amyloid Mediates Microglial Activation](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742111/)
* [Amyloidosis](https://en.wikipedia.org/wiki/Amyloidosis)
* [Prions and the Potential Transmissibility of Protein Misfolding Diseases](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4784231/)
* [Allergen-Specific Antibodies Regulate Secondary Allergen-Specific Immune Responses](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6344431/)
* [IgE cross-linking critically impairs human monocyte function by blocking phagocytosis](https://pubmed.ncbi.nlm.nih.gov/23374271/)
* [Diseases of the immune system](https://books.google.no/books?id=chs_lilPFLwC&pg=PA159&redir_esc=y#v=onepage&q&f=false)
* [Costimulatory molecules on immunogenic versus tolerogenic human dendritic cells](https://www.frontiersin.org/articles/10.3389/fimmu.2013.00082/full)

### NOTES
- Denaturation is the opposite of protein folding: it turns the folded protein back into its unfolded state. In other words, the protein is dead. This generally happens when you burn or cook something. I wonder what consequences this has (if any), and if it's connected to processed foods [causing cancer](https://www.bmj.com/content/365/bmj.l2289).


In [70]:
# Amyloid precursor protein (APP) gene sequence
# https://www.ncbi.nlm.nih.gov/nuccore/NC_000021.9?report=genbank&from=25880550&to=26171128&strand=true
with open("amyloid_precursor_protein.txt") as f:
    APP = f.read()

In [73]:
genome = ""

# strip the position numbers and spaces from each line of the sequence file
for line in APP.split("\n"):
    for c in "0123456789 ":
        line = line.replace(c, "")
    genome += line

# number of base pairs
genome = genome.upper()
len(genome)

290579
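
The cleanup above works when the file holds nothing but numbered sequence lines. A slightly more defensive sketch, assuming a GenBank-style layout (a position number followed by blocks of bases), only keeps lines that look like sequence lines. `clean_sequence`, `SEQ_LINE` and the sample text are made up for illustration and are not part of the notebook.

```python
import re

# A sequence line: optional indent, a position number, then blocks of bases.
SEQ_LINE = re.compile(r"^\s*\d+((?:\s+[acgtnACGTN]+)+)\s*$")

def clean_sequence(text):
    """Join the bases from every numbered sequence line, upper-cased."""
    parts = []
    for line in text.splitlines():
        match = SEQ_LINE.match(line)
        if match:  # skip headers such as ORIGIN and the closing //
            parts.append(re.sub(r"\s+", "", match.group(1)))
    return "".join(parts).upper()

# Hypothetical toy input, not taken from the real APP file
sample = "ORIGIN\n        1 gatcacaggt ctatcaccct\n       21 attaaccact\n//\n"
print(clean_sequence(sample))
```

Lines that don't match the pattern are silently dropped, which may or may not be what you want for a differently formatted download.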

In [74]:
import zlib
a = zlib.compress(genome.encode("utf-8"))
len(a)

82576
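
To put the two numbers above in perspective: each base is one of four letters, so it fits in 2 bits even without compression. A quick back-of-the-envelope check, using the lengths printed above (the variable names are just for illustration):

```python
import math

n_bases = 290579       # len(genome) from the cell above
n_compressed = 82576   # len(zlib.compress(...)) from the cell above

ratio = n_compressed / n_bases               # compressed bytes per original byte
bits_per_base = n_compressed * 8 / n_bases   # bits zlib spends per base
two_bit_bytes = math.ceil(n_bases * 2 / 8)   # size with plain 2-bit packing

print(round(ratio, 3), round(bits_per_base, 2), two_bit_bytes)
```

zlib ends up at a bit over 2 bits per base, so it does slightly worse than naive 2-bit packing. That suggests there is little repetition in this sequence that a general-purpose compressor can exploit.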

In [77]:
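def DNA_to_mRNA(DNA):
    # Complement each base (RNA uses U where DNA has T).
    # Note: this treats the input as the template strand and does not reverse it.
    complement = {
        "C": "G",
        "G": "C",
        "A": "U",
        "T": "A"
    }
    ret_val = ""
    for base in DNA:
        ret_val += complement[base]
    return ret_val

mRNA = DNA_to_mRNA(genome)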
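
One thing to keep in mind with the function above: mRNA has the same sequence as the coding (sense) strand of the DNA with T swapped for U, and it is the reverse complement of the template strand. Which strand the downloaded file holds isn't verified here, so this is only a sketch of both conversions. `transcribe_coding_strand` and `transcribe_template_strand` are hypothetical helper names.

```python
# Lookup table: DNA base -> complementary RNA base
_COMPLEMENT = str.maketrans("ACGT", "UGCA")

def transcribe_coding_strand(dna):
    """mRNA from the coding strand: same sequence, T replaced by U."""
    return dna.upper().replace("T", "U")

def transcribe_template_strand(dna):
    """mRNA from the template strand (written 5'->3'): reverse complement."""
    return dna.upper().translate(_COMPLEMENT)[::-1]

# The two toy strands below are reverse complements of each other,
# so both calls give the same mRNA.
print(transcribe_coding_strand("ATGGCC"))
print(transcribe_template_strand("GGCCAT"))
```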

In [92]:
# see if we have a start codon
mRNA.find("AUG")

616
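
`str.find` only returns the first match. Here is a sketch for listing every `AUG` position, grouped by reading frame (position modulo 3). The function name `start_codon_positions` and the toy sequence are made up for this example.

```python
import re

def start_codon_positions(mrna):
    """Return {frame: [positions]} for every AUG in the sequence."""
    frames = {0: [], 1: [], 2: []}
    for match in re.finditer("AUG", mrna):
        frames[match.start() % 3].append(match.start())
    return frames

print(start_codon_positions("AUGCAUGGAAUG"))
```

Only start codons in the same frame can belong to the same reading of the sequence, which is why the grouping can be handy when comparing candidates.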

In [99]:
# find index of start codon
start_codon = mRNA.find("AUG")
mRNA[start_codon:]

'AUGGGGGGUUCUGGAAUUGGGUUCAGAAAUUACGUCUCUUCGGCCCCCAGGCAGUUACCCUGGGGAGAGGAGAGGCGGGGGCGAACGCCUGCAGGUCGCGUAGGGGCGAAAGCCGGGUCGGGACGGGGUCCCUCAGCGCGAGGCCGGGCGACUCUCCCUCGCCCGCUCCGCGACCAGAGGGACCAAGGCGCGGUCGGGCCCCGCUCUUCCCAUCCCCCGCUGGGACUCGGGUCUGGGGCUGAAUCAGGGACGGAACCUUCGCCCCCAGCCCCCUCCGCUCUCUGUAAGUCUGUCCCCCCUUCCCCCUUCCUCGUGCACCCCUUUUGGCUUUUGCGUCGCAGGGAUUUCGGUCAGGAAGCGAGAACUUAGCAACGGGAAAGGAAAGAGAACCCGGGACCCCUCCUCCUCCUCCUCAGCCCUGUCGGGUCCUUCGAUCCUCGGGGAGCGAAAAGGGACGCAAGAGGCGGAGCAAAGUGGAAGGAGAGAGUGGGGGUAGGGGGAAGGAACGAGGUUGGAGAAAAGCGCCGACGACCGCCGCCGUCGGACCCACGCUGGGAUCGGGCUCGGCCGCAGCCCCAGCGCGUGGCGGAAGGUGUCCGUUUGAAACACGUACAGGCGCAGAGGGAAAACACAUUGAACGCUCUUUACCCUCCCCAGCCUCUGGGUAUCGGUGGGGCGCGGAAGGGGUUCAGACCUGCAGGGCCCUCAAGCGAGCGUGGGAUUCGAGACAGACUCUCCGUCUUCCAGGCGCCCUUGUUUUCGGCGCUGUGGGGCGGGAGCGGCACCCGGCUCGGCGCAAUCAGAGUCACGCGGUGGCUCCGAGGGCGGGCCGAGGGAAGGGCCCCGACGCGCAGGCGGACCCUCCCCCGUCUCUGCGACGGCCCCGAAGACCGCCUCGGAGCCCCGGGGAGACGAAGGGUGGGAGCCUAGUGGAGGCUCCGAAUUAAGACACGACCCGACAUUUCAGGAACCGCCCCUCCUUUAUUCCGCCUCUCACCC

In [179]:
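def find_codons(mRNA):
    # Split the mRNA into codons (e.g. GGG GGU UCU), beginning at the
    # first start codon and ending at the first in-frame stop codon.
    codons = []
    stops = ["UAA", "UAG", "UGA"]
    start_i = mRNA.find("AUG")
    if start_i == -1:
        # no start codon, nothing to read
        return codons
    n_mRNA = mRNA[start_i:]

    for i in range(0, len(n_mRNA), 3):
        codon = n_mRNA[i:i+3]
        codons.append(codon)
        if codon in stops:
            break

    return codons

find_codons(mRNA)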

['AUG',
 'GGG',
 'GGU',
 'UCU',
 'GGA',
 'AUU',
 'GGG',
 'UUC',
 'AGA',
 'AAU',
 'UAC',
 'GUC',
 'UCU',
 'UCG',
 'GCC',
 'CCC',
 'AGG',
 'CAG',
 'UUA',
 'CCC',
 'UGG',
 'GGA',
 'GAG',
 'GAG',
 'AGG',
 'CGG',
 'GGG',
 'CGA',
 'ACG',
 'CCU',
 'GCA',
 'GGU',
 'CGC',
 'GUA',
 'GGG',
 'GCG',
 'AAA',
 'GCC',
 'GGG',
 'UCG',
 'GGA',
 'CGG',
 'GGU',
 'CCC',
 'UCA',
 'GCG',
 'CGA',
 'GGC',
 'CGG',
 'GCG',
 'ACU',
 'CUC',
 'CCU',
 'CGC',
 'CCG',
 'CUC',
 'CGC',
 'GAC',
 'CAG',
 'AGG',
 'GAC',
 'CAA',
 'GGC',
 'GCG',
 'GUC',
 'GGG',
 'CCC',
 'CGC',
 'UCU',
 'UCC',
 'CAU',
 'CCC',
 'CCG',
 'CUG',
 'GGA',
 'CUC',
 'GGG',
 'UCU',
 'GGG',
 'GCU',
 'GAA',
 'UCA',
 'GGG',
 'ACG',
 'GAA',
 'CCU',
 'UCG',
 'CCC',
 'CCA',
 'GCC',
 'CCC',
 'UCC',
 'GCU',
 'CUC',
 'UGU',
 'AAG',
 'UCU',
 'GUC',
 'CCC',
 'CCU',
 'UCC',
 'CCC',
 'UUC',
 'CUC',
 'GUG',
 'CAC',
 'CCC',
 'UUU',
 'UGG',
 'CUU',
 'UUG',
 'CGU',
 'CGC',
 'AGG',
 'GAU',
 'UUC',
 'GGU',
 'CAG',
 'GAA',
 'GCG',
 'AGA',
 'ACU',
 'UAG']
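
The natural next step is to turn the codons into amino acids. Below is a sketch using the standard genetic code, built from the conventional UCAG ordering of the codon table. `translate` is a hypothetical helper, not part of the notebook. Because the input here is raw genomic sequence (which includes introns and untranslated regions), the resulting peptide should not be expected to match the real APP protein.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(codons):
    """Translate codons to one-letter amino acids, stopping at a stop codon."""
    peptide = []
    for codon in codons:
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":  # incomplete/unknown codon or stop
            break
        peptide.append(aa)
    return "".join(peptide)

# First few codons from the output above, plus the final stop codon
print(translate(["AUG", "GGG", "GGU", "UCU", "GGA", "UAG"]))  # MGGSG
```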