# Gene Study 

----


Search the online databases at NCBI (e.g., PubMed, MeSH, OMIM) to identify a gene that interests you. We will use this gene throughout the course, so take time to find something that you would like to analyze in different ways. You do not need any previous knowledge about the gene; you will learn everything you need to analyze it as part of the class.


## Let's Hear About Your Gene
In a few sentences, share some basic information about your gene, including its function (if there is one), which organism it is found in, and what you found interesting about it. Post it to Slack in `#gene`. For example:
> "Bioluminescence refers to the production of light by living organisms. The luxA gene encodes the α-subunit of the enzyme luciferase, which produces the light-emitting species. Bioluminescent bacteria have been identified in _Vibrionaceae_, _Shewanellaceae_ and _Enterobacteriaceae_ and are mainly found in marine habitats."


## Let's Hear About Your Gene...Literally
We now see that DNA encodes a tremendous amount of information, some of which we understand and some of which we do not. Many approaches have been tried to manipulate and translate these sequences of characters in order to identify obscured patterns. One of these is to convert a given DNA sequence into a musical composition.

Given a sequence, map each nucleotide base to a musical note and generate a melody from the DNA sequence provided. Use the translated amino acid to generate a complementary note for the codon. This note should play for the duration of all three bases that make up the codon.

```
♩♩♩ ♩♩♩ 
TAA CAT
 ♬   ♬     
ILE VAL
```
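
One way to make the codon note sound across its three bases is to add the two waveforms sample by sample. The following is a minimal sketch, assuming NumPy and an arbitrary choice of note frequencies; the helper names `tone` and `codon_segment`, the 0.4 amplitude, and the 250 ms base length are illustrative assumptions, not part of the assignment.

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second (assumed)

# Illustrative base-to-frequency mapping in Hz (an assumption for this sketch).
BASE_FREQUENCIES = {"A": 440, "T": 494, "G": 523, "C": 587}


def tone(frequency, n_samples, amplitude=0.4):
    """Return n_samples of a sine wave; a frequency of 0 yields silence."""
    t = np.arange(n_samples) / SAMPLE_RATE
    return amplitude * np.sin(2 * np.pi * frequency * t)


def codon_segment(codon, aa_frequency, base_frequencies=BASE_FREQUENCIES,
                  samples_per_base=11025):
    """Overlay one amino-acid note on the three base notes of a codon."""
    # Three short base notes played one after another; unknown bases are silent.
    bases = np.concatenate(
        [tone(base_frequencies.get(b, 0), samples_per_base) for b in codon.upper()]
    )
    # One long note covering the same number of samples as the three bases.
    codon_note = tone(aa_frequency, len(bases))
    # Adding the arrays plays both at once; 0.4 + 0.4 stays below 1.0.
    return bases + codon_note


segment = codon_segment("ATG", 880.0)
```

Keeping each amplitude at 0.4 means the sum never exceeds 0.8, so the mixed signal should not clip when written to a 16-bit file. Segments for successive codons can be joined with `np.concatenate`.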

There are many approaches to accomplishing this, but the `sounddevice` Python library is a great place to start. A simple example of using the library is shown in `sounds.py`. Expand on that example to play the song of your gene and save it as an audio file. Post the audio file along with your gene description.

Commit your music composition code and the generated audio file to the repository. Submit the URL for your repository.

In [None]:
import sounddevice as sd
import numpy as np
import soundfile as sf  # to save the generated sound file

# Mapping the DNA bases to frequency
base_to_frequency = {
    'A': 440, 'T': 494, 'G': 523, 'C': 587
}

# Mapping the amino acid to frequencies (for codons)
aa_to_frequency = {
    'I': 659, 'V': 698, 'F': 784, 'M': 880, 'L': 988,
    'C': 1047, 'A': 1175, 'G': 1319, 'T': 1397,
    'S': 1568, 'W': 1760, 'Y': 1976, 'P': 2093,
    'H': 2349, 'Q': 2637, 'N': 2794, 'K': 3136,
    'D': 3520, 'E': 3951, 'R': 4186
}

# Mapping codons to amino acids
codon_table = {
    'ATA': 'I', 'ATC': 'I', 'ATT': 'I', 'ATG': 'M',
    'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T',
    'AAC': 'N', 'AAT': 'N', 'AAA': 'K', 'AAG': 'K',
    'AGC': 'S', 'AGT': 'S', 'AGA': 'R', 'AGG': 'R',
    'CTA': 'L', 'CTC': 'L', 'CTG': 'L', 'CTT': 'L',
    'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P',
    'CAC': 'H', 'CAT': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R',
    'GTA': 'V', 'GTC': 'V', 'GTG': 'V', 'GTT': 'V',
    'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCT': 'A',
    'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'GAG': 'E',
    'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGT': 'G',
    'TCA': 'S', 'TCC': 'S', 'TCG': 'S', 'TCT': 'S',
    'TTC': 'F', 'TTT': 'F', 'TTA': 'L', 'TTG': 'L',
    'TAC': 'Y', 'TAT': 'Y', 'TAA': '*', 'TAG': '*',
    'TGC': 'C', 'TGT': 'C', 'TGA': '*', 'TGG': 'W'
}
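
The hard-coded values in `base_to_frequency` and `aa_to_frequency` agree, after rounding, with the equal-tempered pitches of the natural notes from A4 (440 Hz) up to C8 (4186 Hz). As a hedged sketch, such values can be computed from MIDI note numbers instead of typed by hand; the function name `note_frequency` and the `NOTE_NUMBERS` dictionary are hypothetical helpers introduced here.

```python
def note_frequency(midi_number, a4_hz=440.0):
    """Equal-tempered frequency in Hz for a MIDI note number (A4 is 69)."""
    # Each semitone multiplies the frequency by 2 ** (1 / 12).
    return a4_hz * 2 ** ((midi_number - 69) / 12)


# MIDI numbers for the four notes used for the bases (assumed naming).
NOTE_NUMBERS = {"A4": 69, "B4": 71, "C5": 72, "D5": 74}

for name, midi in NOTE_NUMBERS.items():
    print(name, round(note_frequency(midi)))
# A4 440
# B4 494
# C5 523
# D5 587
```

Computing the frequencies this way makes it easy to try a different scale or octave without retyping the table.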


In [None]:
def generate_tone(frequency, duration_ms=500, sampling_freq=44100):
    '''
    Generate a sine-wave tone at a given frequency (from sounds.py).

    Inputs:
      frequency (int or float): tone frequency in Hz
      duration_ms (int): tone length in milliseconds
      sampling_freq (int): number of samples per second

    Returns:
      tone (np.ndarray): samples of the tone, with an amplitude of 0.5
    '''
    t = np.linspace(0, duration_ms / 1000, int(sampling_freq * duration_ms / 1000), False)
    tone = 0.5 * np.sin(2 * np.pi * frequency * t)
    return tone

def read_fasta(fasta_file):
    '''
    Read the DNA sequence from a given FASTA file.

    Inputs:
      fasta_file (str): path to the FASTA file of your gene

    Returns:
      sequence (str): DNA sequence with header lines removed
    '''
    with open(fasta_file, 'r') as file:
        lines = file.readlines()
    
    # Skip every header line (lines starting with ">") and join the rest
    sequence = ''
    for line in lines:
        if not line.startswith('>'):
            sequence += line.strip()
    return sequence
    
def translate_dna_to_aa(dna_sequence):
    '''
    Translate a given DNA sequence into amino acids, codon by codon.

    Inputs:
      dna_sequence (str): the DNA sequence of interest

    Returns:
      amino_acids (str): one-letter amino acid codes, with '*' for stop
        codons and 'X' for codons not found in codon_table
    '''
    amino_acids = ""
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        amino_acids += codon_table.get(codon.upper(), 'X')  # 'X' for unknown codons
    return amino_acids

def dna_to_melody(dna_sequence, sampling_freq=44100):
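    '''
    Convert a DNA sequence to a melody.

    The melody holds one 250 ms tone per recognized base, followed by one
    750 ms tone per translated amino acid. Bases and amino acids without a
    frequency mapping (for example 'N' or stop codons) are skipped.

    Inputs:
      dna_sequence (str): the DNA sequence of interest
      sampling_freq (int): number of samples per second

    Returns:
      melody (np.ndarray): all tones concatenated into a single array
    '''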
    melody = []
    
    # Iterate through DNA bases and generate tones
    for base in dna_sequence:
        freq = base_to_frequency.get(base.upper(), 0)
        if freq > 0:
            base_tone = generate_tone(freq, duration_ms=250, sampling_freq=sampling_freq)
            melody.append(base_tone)
        #else: 
            #print(f"Base '{base}' not found in the dictionary.")
    
    # Translate to amino acids and generate codon tones
    amino_acids = translate_dna_to_aa(dna_sequence)
    for aa in amino_acids:
        freq = aa_to_frequency.get(str(aa), 0)
        if freq > 0:
            codon_tone = generate_tone(freq, duration_ms=750, sampling_freq=sampling_freq)
            melody.append(codon_tone)
        #else: 
            #print(f"Amino acid '{aa}' not found in the dictionary.")

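    # np.concatenate fails on an empty list, so report the cause clearly
    if not melody:
        raise ValueError("No recognized bases or codons found in the sequence")
    return np.concatenate(melody)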

def save_melody(melody, filename="melody.wav", sampling_freq=44100):
    '''
    Save the melody as a .wav file.
    Source: https://python-soundfile.readthedocs.io/en/0.11.0/

    Inputs:
      melody (np.ndarray): the audio samples to write
      filename (str): the name you want your sound file to have
      sampling_freq (int): the sampling frequency in samples per second

    Returns:
      None. Writes the sound file to disk.
    '''
    sf.write(filename, melody, sampling_freq)
    print(f"Melody saved as {filename}")


if __name__ == "__main__":
    # Melody for SLC6A4 sequence 
    fasta_file = "gene.fna" # the FASTA file downloaded from NCBI
    slc6a4_seq = read_fasta(fasta_file) # get DNA sequence from FASTA file
    melody = dna_to_melody(slc6a4_seq) # convert sequence to melody

    # save the melody
    save_melody(melody, "SLC6A4_melody_sonia.wav")
    
    # Play the melody
    #sd.play(melody, samplerate=44100)
    #sd.wait()
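
Note that `read_fasta` joins every non-header line, so a FASTA file holding several records would be merged into one long sequence. If a download contains more than one record, a parser that keeps records apart may help. The sketch below is one possible approach using only the standard library; `parse_fasta` and the example headers are hypothetical.

```python
def parse_fasta(lines):
    """Return {header: sequence} for every record in an iterable of FASTA lines."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]  # start a new record
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence line of the current record
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up two-record example, for illustration only.
example = """>record_1 hypothetical header
ATGGCC
TTTTAA
>record_2 hypothetical header
ATGAAA
""".splitlines()

records = parse_fasta(example)
```

Because an open file is also an iterable of lines, the same function could be called as `parse_fasta(open_file)` inside a `with open(...)` block, and one record chosen before building the melody.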

#### References:
- Role of Serotonin-Associated Genes: https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.601868/full
- SLC6A4 gene from NCBI: https://www.ncbi.nlm.nih.gov/gene/6532