## A) Get data
Go to UniProt and download the FASTA protein sequence for "G-protein coupled receptor 183" (accession "P32249").
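Once the FASTA file is downloaded, the sequence has to be separated from the header line. A minimal sketch of that step, assuming a single-record FASTA file; the function name `parse_fasta` and the file name in the usage comment are illustrative, not part of the exercise:

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; only the first is returned
            header = line[1:]
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped over several lines
    return header, "".join(chunks)


# Usage (hypothetical file name):
# from pathlib import Path
# name, sequence = parse_fasta(Path("P32249.fasta").read_text())
```

The header can serve as the protein name and the joined lines as the sequence string passed to the class in part B.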

## B) Create a protein class

The class can be initialized with a name (or ID), a sequence, and a metrics dictionary, like:

In [71]:
metrics = {
    "hydropathy": {"A" : "..."},
    "pI": {"A": "..."},
}

In [72]:
import pandas as pd
from pathlib import Path

In [73]:
import plotly.graph_objects as go

In [74]:
amino_acids = pd.read_csv(Path("../../data/amino_acid_properties.csv"))
amino_acids = amino_acids.rename(columns={"hydropathy index (Kyte-Doolittle method)": "hydropathy"})
amino_acids = amino_acids.set_index("1-letter code")
metrics = amino_acids.to_dict("dict")

metrics

{'Name': {'A': 'Alanine',
  'R': 'Arginine',
  'N': 'Asparagine',
  'D': 'Aspartic acid',
  'C': 'Cysteine',
  'E': 'Glutamic acid',
  'Q': 'Glutamine',
  'G': 'Glycine',
  'H': 'Histidine',
  'I': 'Isoleucine',
  'L': 'Leucine',
  'K': 'Lysine',
  'M': 'Methionine',
  'F': 'Phenylalanine',
  'P': 'Proline',
  'S': 'Serine',
  'T': 'Threonine',
  'W': 'Tryptophan',
  'Y': 'Tyrosine',
  'V': 'Valine'},
 '3-letter code': {'A': 'Ala',
  'R': 'Arg',
  'N': 'Asn',
  'D': 'Asp',
  'C': 'Cys',
  'E': 'Glu',
  'Q': 'Gln',
  'G': 'Gly',
  'H': 'His',
  'I': 'Ile',
  'L': 'Leu',
  'K': 'Lys',
  'M': 'Met',
  'F': 'Phe',
  'P': 'Pro',
  'S': 'Ser',
  'T': 'Thr',
  'W': 'Trp',
  'Y': 'Tyr',
  'V': 'Val'},
 'Molecular Weight': {'A': 89.1,
  'R': 174.2,
  'N': 132.12,
  'D': 133.11,
  'C': 121.16,
  'E': 147.13,
  'Q': 146.15,
  'G': 75.07,
  'H': 155.16,
  'I': 131.18,
  'L': 131.18,
  'K': 146.19,
  'M': 149.21,
  'F': 165.19,
  'P': 115.13,
  'S': 105.09,
  'T': 119.12,
  'W': 204.23,
  'Y': 181.

In [75]:
metrics["hydropathy"]["A"]

1.8

In [76]:
from collections import deque

import numpy as np

In [81]:
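class Protein:
    """Protein with a name, a sequence and per-residue metric lookups."""

    def __init__(self, name, sequence, dictionary):
        self.name = name
        self.sequence = sequence
        # Nested dict: metric name -> {1-letter code -> value}
        self.dictionary = dictionary

    def plot(self, metric="hydropathy", window_size=10):
        # Look up the metric value for every residue in the sequence
        y_data = [self.dictionary[metric][aa] for aa in self.sequence]

        # Trailing sliding window: the deque drops the oldest value once it
        # holds window_size items, so the first window_size - 1 means are
        # computed over fewer values
        sliding_window = deque([], maxlen=window_size)
        mean_list = []
        for y in y_data:
            sliding_window.append(y)
            mean_list.append(np.mean(list(sliding_window)))

        data = [
            go.Bar(
                y=mean_list,
                x=list(range(len(self.sequence))),  # 0-based residue positions
            )
        ]
        fig = go.Figure(data=data)
        fig.update_layout(template="plotly", title=self.name)
        return fig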

In [82]:
sequence = "MDIQMANNFTPPSATPQGNDCDLYAHHSTARIVMPLHYSLVFIIGLVGNLLALVVIVQNRKKINSTTLYSTNLVISDILFTTALPTRIAYYAMGFDWRIGDALCRITALVFYINTYAGVNFMTCLSIDRFIAVVHPLRYNKIKRIEHAKGVCIFVWILVFAQTLPLLINPMSKQEAERITCMEYPNFEETKSLPWILLGACFIGYVLPLIIILICYSQICCKLFRTAKQNPLTEKSGVNKKALNTIILIIVVFVLCFTPYHVAIIQHMIKKLRFSNFLECSQRHSFQISLHFTVCLMNFNCCMDPFIYFFACKGYKRKVMRMLKRQVSVSISSAVKSAPEENSREMTETQMMIHSKSSNGK"

name = "GP183_HUMAN G-protein coupled receptor 183"

In [83]:
test = Protein(name, sequence, metrics)

fig = test.plot()

fig.show()

## C) Plot

Create a method that has the following signature:

In [80]:
def plot(self, metric="hydropathy", window_size=1):
    """Create plotly fig object.

    The title of the fig contains the protein name,
    the x axis is the amino acid position (int) and
    the y axis shows the metric at each given position.
    A window size can be specified to average the metric using a sliding window.

    Args:
        metric (str, optional): Key of the metrics dictionary the class was initialized with.
            Defaults to "hydropathy".
        window_size (int, optional): Size of the sliding window. Defaults to 1.
    """

    return fig
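The sliding-window averaging described in the docstring can be done in several ways. The sketch below is one option that only averages over complete windows, so it returns `len(values) - window_size + 1` values; the helper name `sliding_mean` is an assumption and not part of the required signature:

```python
def sliding_mean(values, window_size=1):
    """Average `values` over every complete window of length `window_size`."""
    if not 1 <= window_size <= len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    return [
        sum(values[i:i + window_size]) / window_size
        for i in range(len(values) - window_size + 1)
    ]
```

With `window_size=1` the values are returned unchanged (as floats). Because partial windows are skipped, the result is shorter than the input, which has to be accounted for when choosing the x positions of the plot.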


## E) What do you observe?
Describe the pattern you see. What could be the reason for the pattern?

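There are patches in the protein with similar hydropathy, and these patches relate to the 3D structure and folding of the protein. A plausible reason for the pattern: as a G-protein coupled receptor, the protein is expected to have hydrophobic stretches that span the membrane, separated by more hydrophilic stretches exposed to the aqueous environment.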

# Optional

### Retrieve sequence programmatically
* Write a function that takes the UniProt identifier as a kwarg and returns the sequence using
    * the buffered sequence from e2
    * using urllib
    * using an alternative?
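For the `urllib` variant, a minimal sketch could look like the following. The URL pattern is an assumption based on the UniProt REST interface and should be checked against the current UniProt documentation; the `opener` parameter is a hypothetical addition that allows testing without network access:

```python
import urllib.request

# Assumed URL pattern for a single-entry FASTA download from UniProt
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_sequence(uniprot_id="P32249", opener=urllib.request.urlopen):
    """Download the FASTA record for `uniprot_id` and return only the sequence."""
    url = UNIPROT_FASTA_URL.format(accession=uniprot_id)
    with opener(url) as response:
        text = response.read().decode("utf-8")
    # Drop the header line (starts with ">") and join the wrapped sequence lines
    return "".join(
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


# Usage (requires network access):
# sequence = fetch_sequence(uniprot_id="P32249")
```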

### Plot annotations into the hydropathy plot
* Plot protein topology features into the same plot 
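One way to overlay topology features is to draw shaded rectangles behind the bars using plotly layout shapes. The sketch below only builds the shape dictionaries; the helper name `topology_shapes`, the feature tuple format, and the coordinates in the usage comment are placeholders, not real annotations for this protein:

```python
def topology_shapes(features, color="orange", opacity=0.3):
    """Build plotly layout shapes for (label, start, end) features.

    `start` and `end` are assumed to be 1-based, inclusive residue positions.
    """
    shapes = []
    for _label, start, end in features:
        shapes.append({
            "type": "rect",
            "xref": "x",
            "yref": "paper",     # span the full height of the plot area
            "x0": start - 1.5,   # shift to the 0-based x axis, pad half a bar
            "x1": end - 0.5,
            "y0": 0,
            "y1": 1,
            "fillcolor": color,
            "opacity": opacity,
            "line": {"width": 0},
            "layer": "below",    # draw behind the bars
        })
    return shapes


# Usage with placeholder coordinates (not real topology data):
# fig.update_layout(shapes=topology_shapes([("example region", 10, 20)]))
```

The half-bar padding assumes the 0-based integer x positions used by the `plot` method above; with 1-based positions the offsets would change accordingly.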
