Problem
The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
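
The definition above can be checked directly on the quoted example. The following is a minimal sketch, not part of the Rosalind statement; the helper names `gc_content` and `reverse_complement` are chosen here for illustration only.

```python
def gc_content(dna: str) -> float:
    """Return the GC-content of a DNA string as a percentage."""
    if not dna:
        raise ValueError("empty DNA string")
    gc = sum(base in "GC" for base in dna)  # count every 'G' or 'C'
    return 100.0 * gc / len(dna)


def reverse_complement(dna: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


s = "AGCTATAG"
print(gc_content(s))                      # 37.5
print(gc_content(reverse_complement(s)))  # 37.5
```

Complementing swaps C with G and A with T, so the number of C/G symbols is unchanged, and reversing does not change any counts. That is why both calls print the same value.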

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the first line to begin with '>' indicates the label of the next string.

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
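
The format described above can be read with a small loop. This is a sketch under the assumption that everything after '>' on a header line is the ID; `parse_fasta` and the two short records (`Rosalind_0001`, `Rosalind_0002`) are made up for illustration and do not come from a Rosalind dataset.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA ID to its sequence, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()           # drop the newline and stray whitespace
        if not line:
            continue                 # ignore blank lines
        if line.startswith(">"):
            current_id = line[1:]    # header line: start a new record
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line  # sequence line: append to current record
    return records


# Hypothetical two-record input, for illustration only.
example = """>Rosalind_0001
ACGT
GGCC
>Rosalind_0002
TTAA
""".splitlines()

records = parse_fasta(example)
print(records)
```

Stripping each line before storing it keeps the newline out of both the ID and the sequence, which matters when the ID is printed later.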

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
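
To make the 0.001 tolerance concrete, here is a sketch of an absolute-error comparison. How Rosalind's checker actually compares answers is not stated here, so `within_tolerance` is only an assumed illustration of "absolute error at most 0.001".

```python
import math

TOLERANCE = 0.001  # default absolute error quoted in the problem statement


def within_tolerance(answer: float, expected: float, tol: float = TOLERANCE) -> bool:
    """True if |answer - expected| <= tol (absolute error only, no relative term)."""
    return math.isclose(answer, expected, rel_tol=0.0, abs_tol=tol)


print(within_tolerance(60.9195, 60.919540))  # True: off by 0.00004
print(within_tolerance(60.9, 60.919540))     # False: off by about 0.0195
```

In practice this means printing the percentage with at least three or four decimal places; rounding to one decimal place can fall outside the allowed error.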

Sample Dataset
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
Sample Output
Rosalind_0808
60.919540
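
The sample answer can be reproduced by hand for Rosalind_0808. The sketch below joins the record's two sequence lines and counts symbols with `str.count`; it is only an arithmetic check on the sample, not a full solution.

```python
# Rosalind_0808 from the sample dataset, with its two lines joined.
seq = (
    "CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC"
    "TGGGAACCTGCGGGCAGTAGGTGGAAT"
)

gc = seq.count("G") + seq.count("C")  # number of G/C symbols
print(gc, len(seq))                   # 53 87
print(f"{100 * gc / len(seq):.6f}")   # 60.919540
```

So 53 of the 87 bases are G or C, and 53/87 is about 60.919540%, which matches the sample output.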

In [51]:
with open('rosalind_gc.txt') as f:
    lines = f.readlines()

In [52]:
for line in lines:
    print(line, end='')  # each line already ends with a newline

>Rosalind_7766
ATTCATCACCTGAGGGCTATAAGTCGAGTAAGTGCAGGTCGTCGCTGCAGCTTCTTCGCA
CTGATCGAGTATGTAGGCCAACGCGAATGTCTGGCTGGTAACGATTCCAAAGTCTGGCTC
AAGGCCTCAAGGTACATCAAGGTACGCGGCATTGAGCTCTGGATAATCTCTAGAGGTCTC
CATAAGGATCGTATATGAAACGTAAATGCTCAGTTGGTAGGTTAGCGGGATTTAGGAGAA
ATTTTGCACACGAGCCCGTTTCAGCCCGGATATTCGGATAGGTCAGGCCTTGCAGGCTAT
TAGGTCAAACGAGTATGGGCAGAAGCGGCAACAACTTAAAATCGTCGCACCGCCTGGCTC
GGAGTCTAATTTTCATTTGGTATAACTGTGCCCGGGTTCCCAACAAGAAAGAGAGACCCT
CTATAACTGAATAGGGACATTATTTCAAGATCGATAGACCAAGTCACACTCCAACGCCCC
AACTTGCTACTTTCCGAGCAGTGTGGATGTAAGTTAACGCTAGGTTTGCTGGGAGAAAAT
GAACGTCGATAAGTATTGACCACTGACGCCACTACCCAGGTAAGTTATAATCCTCCTCAC
CGCTAACCCAGTGCAGTTCGTTGGACATTGCGCGATCATGGAGGCTCTTTGATAGTTTGA
CTTCCCTAGTTTATTGGACTGCCTAGTAGAACTAGTTCCGAGCTCCCGTTGCACGACTAC
TTCAACGCTTTGCATTAAGGCGCGGTTAGAGAACTAGAAACGTTATCCGATCTGGTTGCG
TCAGAGAATCACTGGCCGTGGGGCCCACAAATTCTCAATCCAAGGCAATAGCTCCGAATT
AACCACTGGTTCTCCAG
>Rosalind_9276
GGAAAGTAGTGCAGGTTAGCCTAGGTCTTTACTGCAGTCGATCGGGGCAGCCGCTGCGAA
CACGGAGTATAATCTAAGG

In [53]:
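best_id = ''    # ID of the record with the highest GC-content so far
best_gc = 0.0   # GC fraction (0 to 1) of that record
cur_id = ''     # ID of the record currently being read
cur_gc = 0      # number of G/C bases in the current record
cur_bases = 0   # number of A/C/G/T bases in the current record
for line in lines:
    line = line.strip()  # drop the newline so it does not end up in the ID
    if line.startswith('>'):
        # A new header closes the previous record, so score it before resetting.
        if cur_bases > 0 and cur_gc / cur_bases > best_gc:
            best_gc = cur_gc / cur_bases
            best_id = cur_id
        cur_id = line[1:]
        cur_gc = 0
        cur_bases = 0
    else:
        for base in line:
            if base in 'CG':
                cur_gc += 1
            if base in 'ACGT':
                cur_bases += 1
# The last record has no header after it, so score it here.
if cur_bases > 0 and cur_gc / cur_bases > best_gc:
    best_gc = cur_gc / cur_bases
    best_id = cur_id
print(best_id)
print(f'{best_gc * 100:.6f}')

Rosalind_1584
50.793651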
