# Problem

The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
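As a quick check of the definition, the sketch below recomputes the 37.5% example and confirms that the reverse complement gives the same value. The helper names `gc_content` and `reverse_complement` are illustrative and not part of the problem statement; the code assumes the input uses only the uppercase symbols A, C, G and T.

```python
def gc_content(dna):
    """Return the percentage of symbols in `dna` that are 'G' or 'C'."""
    if not dna:
        raise ValueError("GC-content is undefined for an empty string")
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


def reverse_complement(dna):
    """Swap A<->T and C<->G, then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


example = "AGCTATAG"
print(gc_content(example))                      # 37.5
print(gc_content(reverse_complement(example)))  # 37.5
```

Complementing swaps every G with a C and vice versa, so the number of G/C symbols is unchanged, and reversing does not alter counts at all.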

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, each string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself, possibly split across several lines; the next line that begins with '>' marks the label of the following string.

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
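The FASTA layout described above can be read with a small parser. This is a minimal sketch, not a full FASTA implementation: the function name `parse_fasta` and the two sample records are made up for illustration, and the code assumes each label line contains only the ID.

```python
def parse_fasta(text):
    """Map each FASTA label (without the leading '>') to its joined sequence."""
    chunks = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current_id = line[1:]       # a new record starts here
            chunks[current_id] = []
        elif current_id is not None:
            chunks[current_id].append(line)  # sequence may span several lines
    return {label: "".join(parts) for label, parts in chunks.items()}


# Made-up sample data in the "Rosalind_xxxx" labeling style
sample = """>Rosalind_0001
AGCT
ATAG
>Rosalind_0002
GGCCAT
"""

print(parse_fasta(sample))
# {'Rosalind_0001': 'AGCTATAG', 'Rosalind_0002': 'GGCCAT'}
```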

**Given**: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

**Return**: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
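One way to produce the requested answer is to keep each ID next to its sequence and take the maximum by GC-content. The sketch below assumes the records are already available as a dictionary from ID to sequence; the function name `highest_gc` and the two records are hypothetical. Printing six decimal places keeps the value well inside the 0.001 tolerance.

```python
import math


def highest_gc(records):
    """Return (id, gc_percent) for the record with the highest GC-content."""
    def gc(dna):
        return 100.0 * sum(base in "GC" for base in dna) / len(dna)

    best_id = max(records, key=lambda record_id: gc(records[record_id]))
    return best_id, gc(records[best_id])


# Made-up records for illustration only
records = {"Rosalind_0001": "AGCTATAG", "Rosalind_0002": "GGCCAT"}

best_id, best_gc = highest_gc(records)
print(best_id)            # Rosalind_0002
print(f"{best_gc:.6f}")   # 66.666667

# An answer within 0.001 of the true value counts as correct
print(math.isclose(best_gc, 66.667, abs_tol=0.001))  # True
```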

In [12]:
import numpy as np


def calculate_gc_content(sequence_array=None):
    """Return an (n, 1) array with the GC-content, in percent, of each sequence."""
    if sequence_array is None:  # avoid a mutable default argument
        sequence_array = []

    gc_counts = np.zeros((len(sequence_array), 1))  # one row per sequence

    for i, curr_seq in enumerate(sequence_array):
        # count the 'G' and 'C' symbols in the current sequence
        curr_gc_count = curr_seq.count('G') + curr_seq.count('C')
        # convert the count to a percentage and store it
        gc_counts[i] = curr_gc_count / len(curr_seq) * 100

    return gc_counts


In [16]:
sequences = [
'GATAGTAACAGCGGCTCAGCATGCAGCCAGAGCTATATGTAGTTGACTGGGACAAGGCAA\
TCGTGAAGCCGCCGTCAGCGTTGTTGCATAGTCGTTCTAAGCGGGCTTCATTTATACGGG\
ACTTGGACAATCTTTACGGGGAAGCGTGGAACACCCCTCCAACAGGCTCCTATCGGATAT\
AGTCAAGGGGGACCGAGGATGGTCTCGCCATCACTAGTTGTAAAAACTGCACGGACGGAA\
TTAGTGACTCACACCTACCCGTAAATCTTGCATGTGCATTGCATGTCCTCAACAGGCGGG\
TGAGGGGGAGAACCGTCTGGTGTCTGCCGGAGGCGCGATGCCGGTCGTTAACCGCAAACT\
CTATCAACCTTCGACGCGCCCTTTCACTCACCTTCGTTTTTCCATGCCGTTGTTTCCAGC\
CTCTTCTAGCCTGCACAATCACGACCGTCTTGTCCTTCTCGCTTAAGTAGTGAGTGGTAT\
ACGGCCATCGCATACATTTGCTGGTGATTGGAGAGTCGCTCGAGGCCACCGCTCCTGAAC\
CACGATTGAATTCCCCGAACGGCCCGTAACTACTAAGTGAAAGATCAGGCCTAATTCTCG\
GAGAACATGTGTTCATCTCATCATTCCTAGTCGAGTGACCGGTACAACGATTACGATTAC\
GGGACCAACGCTGTGGCGAAATTGTGCCACCGTGCGCTCATAAGATATGTTAGGGAAAAT\
GTGAACTACTCCCATTCGGTTCGCAATCGAACACCTGCAATGAAATGCTAGTTTGGCGTC\
GCGTTAGTACGTGGCCTGCAGGGTTGTGTACAATTTACGCGTTGACCATCGGATCCTTTC\
ATCACTATCATACTAAAGAACCCATATTTTAGTAAAGCACTTATTAAGTCCGCAGGGGTC\
GGCGTCTGACGAGGAATTTC',
'ACTAGCCAAGCCCGGCCAATTCCAATAAGGGATCAGAATGATATTCGCAATATTTTGCGT\
ACAGAGCCTGATCCTGCTTCCGAGCAAAAAGTCCGTCCATTTTGAAAAACCAACTGAGGG\
AATTCCGTTAGGACGCCAGAGAATAGATTTACTGTGTAGACGGTAAGGGGCCGGGGATTT\
TTCATCGAGCTCAGCTGCTCCAGTTGACTTAATTATGAAACCGTGGTCTCGATACTACCC\
GTGCGTCCTGCCCTTCTGATCTCTACCGAAGCCAGCACGAACATACCGTGACACCCTTTC\
TATAGTATCTACTACGCCCCCACTTGCTATCCACCAAGTAACCGCCCAGGATTGCGTACT\
CTGCCACTCAAACACCGCTACCCGGCACAACTACCAGACAACACGAAAGATTCCCCCCTA\
ATTCCCTTGCCGCGCATGGGGCTGAAAAAATATCCGCGCTGTGCCTTGTTGGTTGTGACT\
TCAGTCCCTGCTTGTTTTCACGTGTATGGCGTGTACGGCATTACCCCGTATCAATTCATT\
ATTCGATATATATGATACACGGCTACCATGCTTGCCTTTCTTGATATTGATTGGGGCCGC\
TCGCTATCAAAGCTCCTACCCGCTATTATAGCGAGCACCAACCTTACTTCTGGTCGCTAA\
CCCATCTAGTTAGATCGACCTACTACCCCATACCAGTCACGATATGGGGATACTATGCCT\
TTTTCTCGGAGGGGGAGACGTGAATGAGAAGCCCTGTCTTTTAATTCATGTACAGAGAGC\
TCTGAGAAACAGCTATAGCGGACGAAGTGGTCAGCGAGGCTCCAGCAGTAGCGTTGGTAG\
GGTCATGGATTCATTTGCTCCCTCGTACGGAAGAAGTATGCTTTTAGCATACCACCGCAC\
CCCGAAACACCGAGTCTAGAGTCA',
'GCGCTCTCAGTATAAAAAGCGGCGACATCCGCATTGCGTCTCTTACTATTATATTCCGGG\
AGGAGGACAGGGCCAGTGGATGAGGCATCTTCCTCAAACGAAACTGGCTTACGAGTCCGG\
ATATGTCATCGGCCGGTACTGCAAAGCGCGGATGTCCAGTGGGATTTGGGCATAAATCGG\
TACATTCCTAATTACCCCAAAGTGGTGAAACGCTAGTTGCCTTACCGCCAGCAAAGATCC\
ACGATTAGGCAGTAGCTTAACTGTTGTGCAGCTTAGTCCAAGACTACGAGGCTGTACGGA\
TACCACCAGTACTCGTGAACTGCTCTAATTGACGGTCCACTTCTTCTGTGGTCTAGGCGC\
CACGTCGGACCAGGAAAAGCGCAGACACGCCGGGTACATTACCATTATCCGACGGCCTTG\
TGGTCACACTCGCATACACCCAATATATCTACAGGCATATCAGAATCTCTTTCTACGCAA\
ATGTGCGGGACGTAGATGCTTATAATAATCCCTAACTGATCGGTAGTGGTGTGGGCGGCT\
CCTTTAGTACTCAGAGTTAGTGAACCAGCGGGAGCCCTTGGACATATCGCGCCGTTGATG\
CATAAGCATGCAGCACAGGATAAAAAAGGCGCTCAACAACCTTTTTTGTCATAAGACACG\
GCAATTGGCTAGTAAAAGCAGATGACCGGTGCCCTTAAAGGTGTAGTCAAGAGCTACTGC\
ACTAGCGCAGCAAAGCATCCTGTCGAGGTATAAACATTCTCGTGAAGTGCTGGGGACCAT\
CCTACGTACCGAACAACAATTGGGATACGTCGGATGACGATAATAGAATGCGGCTACGAG\
AATTCATCCGGCGGCGGTAAGAGATGAACGACATCATATGACACTGAATGGCATGTCC',
'TAGATAGACTCCCAACCGCTTTCTTATCCTTGTGTCACTTTCGCGATTGCAAGCATTCCT\
AAAGGATCTCTTCCATTTCACGGATTAGGGGGAGACGCCCTGGGCTATTCGAAGACAATA\
CGGCCAGAGGTTTGCTAACATTCACACGCGTCTCATACGAGTTGATATACACTATGCGGC\
TTGAGACTTCGCGCGTCCTAGTATCTAATAGGGATTAGCGTCGGTCACACAACCGACACT\
GGGCATGGCGTCGTACCCTGAAAACTCTCAACATGAATACGTCGTAGTGGCAGTTGCGTG\
TAGAACATCTGCCGAAGCGGTGCACTAGACCGAAGATGAGCCCCGCAGTCATCGAAGAAT\
GATCACTGCATTGTAACATACGAAGGGTTGTCTCATGAGCTGAATAAAGTGTTCGAAAAT\
CTTAGGCCACCGGGCATCTGACTCTCCTAAGATGGTCCTGAATGGTGAGTGAAGCTTTTA\
GCTCCGTTTATGTTTTGGTAGAGGGGACTTTCGTACCTATGCCGAACGTGGAATAATGAT\
TCGTCTTTTTTGGCCGGGATAGCATTGAATAGACCCAATGACACCGTCAATTGGTCGGGC\
AGATATATCGCTGACGCCCGAAGCCAAGAACCGGATGCTGATCAATTTATGCTCCCTAGT\
ATGCGAGCGGCCGGCCTTACATGATCTCTCATTGATTTGCCAGGTGCACAGCTCTTAGTT\
TCAGTGTAGGTAACATATTAGCAGAAAGGTCCTTGGGTATGCACACATGTGCAATGATAG\
TTCGTCCATATGCTGTTATCTACGTCAAGATGCAAAGGTCTGATTAGCTAGCCTTCTGCG\
CGCATGTGACAACAGCCCGTGGTTTAATTTACCT',
'CAAGTTAAGGAGAGACGTCTTACTCGTCAAACGAAATGAATGGCAGTCACGCACGAGTTA\
GGTTTGGAGGTTGATCTAGTGCGCCTCGCTCTTCTTGTGTTCGTCTCATAAGGTAAGTCC\
ACGCCCGTTAAAGCTATTAAGCTCCATCTAAACACTAATGCAAATTGTCTACTGGCTGGC\
TATAATCCTTGACGAGGAGCGAGGTCTGGTCCTCTCTCGAAATAAGTGCTTCGATTGCGA\
CTTGGGCTTAGCAGTTGCTTATAATGGATAGTAGATAGTTTGTTAACCGAGAGCTGAGTC\
CTGTAAATATTATTCCATTATGGACCGCGAAATAGAGAAGGGTATAACAGTCGAGCTGTA\
GCTCGGGACGAGGTATGTAGTCATGATGGGACATCGAGCGACTCGTCGAGTGGGCAGAAC\
ACACTAATTGGAAACGTATGATGACTATAAGCTGCACGTGTTAGTAGAAGCGCAGTTTTC\
TCGGGCCATAATTCTCGGACGAGCCTGGAGTCTAGTATTCTGGCTTGTGGGCGGCACGCG\
CAAATGGACTAATACGAGGATAAACAAGGTCCCGGACCGTACCATTTGGTAGGATTACAT\
GCTAAAGGAGGCTTTGCAGTATCGCTCACATGTTTTCTCCTGTTGAGCCAATGCGATTAT\
TCATGCTATGCGTACCCAAAACGGTGAAATGCGCAGCAAACCTGGGCCTGCCGACAAGTT\
TTATTCAGTGAATCTTCTGAAAGTATATTAGATAAGGGCAGATAACTGAGCTCGTATCAT\
GTCGGGCAAGGGGCCACTAAGCAGTGTGATTATGATATGTTTGTCAACTGTCGTGTGGGA\
AAATGGTTCAGATATAACTACGGTGCGTT',
'CTGGAAGAAGAGACTTTGCGCCACAGTGTAAGCGCAGGGTCGTATGTATCAATCGTGAAG\
CTCGGGTGAGTGGTGGAAGTACCGACACGCTAAGGTCAGCCAAACGCACATCTCACCATT\
CGTATGGTCCGAGTTTTTAGACACCCCAAACGACCCAAGACTGAGGTATGTAGGTCACTC\
TCAATAGCACCCGTTGGTCGGCTCAAATTAAATCAAGTTGGAACGACTGGCAAAACCCTG\
GCGACCGAAAAATTCGAGGTGAACTGCTACGGGTACGCGTCTATCATGAGTACGCCGGTA\
TGGTCACTGTCGAGTGTAATAGCGGAGCCCCCGCAGCTGCCATGGTTAGGGCAGCAGAAA\
ACATTTCTTAACCTAGCACTCTGAACTTGACGCGTGTCTTACTTCATTCGGGCTAGAAAA\
CTATTCCTCGAGAGAGCCCAATAGTTATCACCAGCGGTATTATTCTTCGATTCGTGACAT\
TCCTCCAACCGGAAATTTCATTAGGACAGAGCTGACTTTACTTCGGGGTGGCGGATCGAG\
AGTACAATGGTGACTCTCACTGGTTGAAATACCAATGTTCGGCCTGTATCGACTCGGGGT\
ATGCCTCCCGGCCCGGGTCCTCAGCACCCCCAGAAGTCGACCAGTATCGGTGGGTATGGT\
GCAACCTGAAGCAAGATACGTGGTACTGGCCCCGACTATATGGAACATGGAGTGACATTT\
ACCGCATGCGTCACTAGGCTAACTCTGTAATTCCATGAACGCAGTTGAGCGTTATGCAAG\
TGACTATAAGTATTCATGCTGGTACGCACTGGTCGTTACGTACGTAACCCTCCTTTGATA\
GCTGAAGGGCAGTAGACGACTAGCACGTGTCACAGGCTTGTTTGAAAGTAGAGGACAGGA\
TTTACGTTCCGTCGTGGTTACTTGAATGCACAACTTAGGTCCAGCCCTACTTGGATGACC\
ATTTCGATATGGACAGACATAGTCAGTCCTTCGG'
]
calculate_gc_content(sequences)

array([[50.65217391],
       [49.35064935],
       [49.66592428],
       [47.82608696],
       [46.14499425],
       [50.        ]])
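The array above holds only the percentages, so the last step is to pick the largest one. The sketch below copies the six values from that output into a plain list and finds the position of the maximum. Because the `sequences` list carries no FASTA labels, the zero-based index stands in for the ID here; the variable names are illustrative.

```python
# GC-content values copied from the output above, in input order
gc_percentages = [50.65217391, 49.35064935, 49.66592428,
                  47.82608696, 46.14499425, 50.0]

# Index of the sequence with the highest GC-content
best_index = max(range(len(gc_percentages)), key=gc_percentages.__getitem__)
best_gc = gc_percentages[best_index]

print(best_index)         # 0
print(f"{best_gc:.6f}")   # 50.652174
```

With the NumPy array returned by `calculate_gc_content`, `np.argmax(gc_counts)` should give the same index.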