In [1]:
import time

In [2]:
start = time.perf_counter()

In [3]:
with open('data/rosalind_gc.txt') as f:
    data = f.read()

### Problem

The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
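
The definition above can be checked directly. The following is a minimal sketch, not part of the original notebook; the helper names `gc_content` and `reverse_complement` are assumptions introduced for illustration.

```python
def gc_content(dna):
    """Return the percentage of symbols in dna that are 'G' or 'C'."""
    if not dna:
        raise ValueError("GC-content is undefined for an empty string")
    return (dna.count("G") + dna.count("C")) / len(dna) * 100


def reverse_complement(dna):
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


example = "AGCTATAG"
print(gc_content(example))                      # 37.5
print(reverse_complement(example))              # CTATAGCT
print(gc_content(reverse_complement(example)))  # 37.5
```

Complementing swaps every 'G' with a 'C' and vice versa, so the combined count of the two symbols is unchanged, and reversing does not alter any counts.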

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, each string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself, which may span several lines; the next line that begins with '>' marks the label of the next string.

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
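
One way to read this format is to walk the text line by line, starting a new record at every '>' line and joining the lines that follow it. The sketch below assumes this approach; `parse_fasta` is a hypothetical helper and the two records are made up for illustration.

```python
def parse_fasta(text):
    """Map each label (the text after '>') to its sequence, joined across lines."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            label = line[1:]      # a new record starts here
            records[label] = []
        elif label is not None:
            records[label].append(line)  # sequence line of the current record
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record input, not taken from the problem
text = """>Rosalind_0001
ACGT
GGCC
>Rosalind_0002
TTAA
"""
print(parse_fasta(text))
# {'Rosalind_0001': 'ACGTGGCC', 'Rosalind_0002': 'TTAA'}
```

Parsing by line rather than by a fixed label width keeps the code working if a label is longer or shorter than "Rosalind_xxxx".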

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default absolute error of 0.001 in all decimal answers unless otherwise stated.
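
The error allowance can be pictured as an absolute-tolerance comparison. This is only a sketch of the idea using `math.isclose`; how Rosalind actually compares answers is not stated here, and `within_tolerance` and the numbers below are hypothetical.

```python
import math


def within_tolerance(answer, expected, abs_tol=0.001):
    """True if answer differs from expected by at most abs_tol (absolute error)."""
    return math.isclose(answer, expected, rel_tol=0.0, abs_tol=abs_tol)


print(within_tolerance(37.5004, 37.5))  # True: off by about 0.0004
print(within_tolerance(37.51, 37.5))    # False: off by about 0.01
```

Printing six decimal places introduces a rounding error of at most 0.0000005, well inside this allowance.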

### Sample Dataset
```
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
```

### Sample Output
```
Rosalind_0808
60.919540
```
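
The sample can be used as a quick self-check before running on the real dataset. The sketch below is one possible approach, not the notebook's own solution: it builds a dictionary of GC-contents and picks the maximum. The names `sample` and `gc_by_id` are introduced here for illustration.

```python
sample = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""

gc_by_id = {}
# Everything before the first '>' is empty, so skip the first piece
for record in sample.split(">")[1:]:
    label, *lines = record.splitlines()  # first line is the ID
    seq = "".join(lines)                 # remaining lines form the sequence
    gc_by_id[label] = (seq.count("G") + seq.count("C")) / len(seq) * 100

best_id = max(gc_by_id, key=gc_by_id.get)
print(best_id)                     # Rosalind_0808
print(f"{gc_by_id[best_id]:.6f}")  # 60.919540
```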

In [10]:
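def function(data):
    """Return (ID, GC-content in percent) of the FASTA record with the highest GC-content."""
    best_id = ''
    best_pc = 0.0
    # Text before the first '>' is not a record, so skip the first entry
    for record in data.split('>')[1:]:
        # The first line is the label; the remaining lines hold the sequence
        header, _, body = record.partition('\n')
        bases = ''.join(body.split())
        if not bases:
            continue  # skip empty records to avoid dividing by zero
        gc = bases.count('G') + bases.count('C')
        pci = gc / len(bases) * 100
        if pci > best_pc:
            best_pc = pci
            best_id = header.strip()
    return best_id, best_pc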

In [11]:
res = function(data)
print(f'{res[0]}\n{res[1]:.6f}')

Rosalind_1869
54.526316


In [7]:
print(f'Finished in {time.perf_counter() - start:0.4f} seconds')

Finished in 0.1986 seconds
