**Computing GC Content**
--

**Problem** 

The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
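The definition above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `gc_content` and `reverse_complement` are hypothetical (they are not part of the problem statement), and the input is assumed to contain only the symbols A, C, G and T.

```python
def gc_content(dna: str) -> float:
    """Return the percentage of symbols in `dna` that are 'G' or 'C'."""
    if not dna:
        raise ValueError("GC-content is undefined for an empty string")
    dna = dna.upper()
    return 100 * (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    # Assumption: the string contains only A, C, G and T.
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


example = "AGCTATAG"
print(gc_content(example))                      # 37.5
print(gc_content(reverse_complement(example)))  # 37.5
```

Complementing swaps every G with a C and vice versa, and reversing only changes the order, so the combined count of G and C is unchanged.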

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the next line that begins with '>' indicates the label of the next string.

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
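To make the FASTA layout concrete, here is a minimal pure-Python sketch of a parser. The function name `parse_fasta` and the two short records are hypothetical and chosen only for illustration; the sketch assumes the ID is the first whitespace-separated token after '>' and that a sequence may span several lines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID to its sequence, joining multi-line sequences."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A header line starts a new record; keep the first token as the ID.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Any other line continues the sequence of the current record.
            records[current_id] += line
    return records


# Hypothetical two-record input, not taken from the sample dataset.
sample = """>Rosalind_0001
ACGT
ACGT
>Rosalind_0002
GGCC
"""
print(parse_fasta(sample.splitlines()))
```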

**Given:** At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

**Return:** The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
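As a rough illustration of the error allowance, the sketch below assumes that "error" means the absolute difference between the submitted value and the expected one. The helper name `within_tolerance` is hypothetical, and the computed value is the one printed by the solution cell in this notebook.

```python
def within_tolerance(computed: float, expected: float, tol: float = 0.001) -> bool:
    """Return True if `computed` differs from `expected` by at most `tol`."""
    # Assumption: the allowance is an absolute error, not a relative one.
    return abs(computed - expected) <= tol


computed = 60.91954022988506   # value printed by the solution cell
print(f"{computed:.6f}")       # 60.919540
print(within_tolerance(computed, 60.919540))  # True
```

Under this reading, printing the unrounded float is acceptable, and rounding to six decimals only matters for matching the sample output exactly.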

**Sample Dataset**

\>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG

\>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC

\>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

**Sample Output**

Rosalind_0808
60.919540


In [None]:
# Biopython
!pip3 install biopython

Collecting biopython
  Downloading biopython-1.79-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (2.3 MB)
Installing collected packages: biopython
Successfully installed biopython-1.79


In [None]:
from Bio import SeqIO

max_total = 0  # highest GC-content (%) seen so far
max_id = ""    # ID of the record with that GC-content

# Read every record in the FASTA file and compute its GC-content
for record in SeqIO.parse("example.fasta", "fasta"):
  seq = str(record.seq)
  conta_G = seq.count("G")  # number of G symbols
  conta_C = seq.count("C")  # number of C symbols
  tamanho_seq = len(seq)    # sequence length
  seq_id = record.id

  # GC-content as a percentage of the sequence length
  total = ((conta_G + conta_C) / tamanho_seq) * 100

  # Keep the record with the highest GC-content (on a tie, the later record wins)
  if total >= max_total:
    max_id = seq_id
    max_total = total

print(f"{max_id}\n{max_total}")


Rosalind_0808
60.91954022988506


In [None]:
# Desk check: count G and C by hand for one sequence
seq_Rosalind_6404 = "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG"
seq_Rosalind_5959 = "CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC"
seq_Rosalind_0808 = "CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT"

conta_G = seq_Rosalind_6404.count("G")
conta_C = seq_Rosalind_6404.count("C")
tamanho_seq = len(seq_Rosalind_6404)
print(conta_G, conta_C, tamanho_seq)


18 25 80
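
The desk check above covers only the first sequence. As a sketch, the same count can be repeated for all three sample sequences without Biopython; the dictionary `sequences` and the helper `gc_percent` are hypothetical names introduced here, and the strings are the sample records with their lines joined.

```python
# Sample sequences from the problem statement, with line breaks removed.
sequences = {
    "Rosalind_6404": "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG",
    "Rosalind_5959": "CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC",
    "Rosalind_0808": "CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT",
}


def gc_percent(seq: str) -> float:
    """Return the GC-content of `seq` as a percentage."""
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


# Print the ID, length and GC-content of every sequence.
for seq_id, seq in sequences.items():
    print(seq_id, len(seq), gc_percent(seq))

# Pick the ID with the highest GC-content.
best_id = max(sequences, key=lambda k: gc_percent(sequences[k]))
print(best_id)  # Rosalind_0808
```

For Rosalind_6404 this reproduces the counts above: 18 + 25 = 43 of 80 symbols, which is 53.75%.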
