# Problem

The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
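A minimal Python sketch of the definition, checking the 37.5% example and the reverse-complement claim. The helper names `gc_percent` and `reverse_complement` are illustrative and not part of the original notebook. Complementing swaps G with C and A with T, so the number of G/C bases (and the length) stays the same.

```python
def gc_percent(seq):
    """GC-content of seq as a percentage (0-100)."""
    if not seq:
        return 0.0
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


# Watson-Crick pairing: A<->T, C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


dna = "AGCTATAG"
print(gc_percent(dna))                      # 37.5
print(reverse_complement(dna))              # CTATAGCT
print(gc_percent(reverse_complement(dna)))  # 37.5
```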

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the next line to begin with '>' indicates the label of the next string.
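The format rules above can be sketched as a line-by-line parser: a header line starts a new record, and every following line up to the next header is appended to the current sequence. This is an illustrative sketch, not the notebook's own parser; `iter_fasta` is a hypothetical name, the records in the example are toy data, and sequence lines that appear before the first header are silently dropped.

```python
def iter_fasta(lines):
    """Yield (label, sequence) pairs from an iterable of FASTA lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if label is not None:
        yield label, "".join(chunks)  # flush the last record


text = """>Rosalind_0001
ACGT
GGCC
>Rosalind_0002
ATAT"""
print(dict(iter_fasta(text.splitlines())))
# {'Rosalind_0001': 'ACGTGGCC', 'Rosalind_0002': 'ATAT'}
```

Because it only needs an iterable of lines, the same function works on an open file handle without reading the whole file into memory.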

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
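If you want to sanity-check labels, the ID scheme can be expressed as a regular expression. This is a hypothetical helper (`is_rosalind_id` is not part of the original solution) and is not needed to solve the problem; it uses `[0-9]` rather than `\d` so that only ASCII digits match.

```python
import re

# "Rosalind_" followed by exactly four ASCII digits (0000-9999)
ROSALIND_ID = re.compile(r"Rosalind_[0-9]{4}")


def is_rosalind_id(label):
    """True if label is exactly a Rosalind-style ID."""
    return ROSALIND_ID.fullmatch(label) is not None


print(is_rosalind_id("Rosalind_0808"))  # True
print(is_rosalind_id("Rosalind_808"))   # False
```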

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default absolute error of 0.001 in all decimal answers unless otherwise stated.
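Note that the answer is expected as a percentage, not a fraction. A small sketch of formatting the answer and checking it against the 0.001 tolerance; both helper names are hypothetical. The sample record Rosalind_0808 has 53 G/C bases out of 87, which gives the sample output.

```python
def format_answer(label, gc_fraction):
    """Two lines: the ID, then the GC-content as a percentage."""
    return f"{label}\n{gc_fraction * 100:.6f}"


def within_tolerance(submitted, expected, tol=0.001):
    """Check a decimal answer against Rosalind's default absolute error."""
    return abs(submitted - expected) <= tol


print(format_answer("Rosalind_0808", 53 / 87))
# Rosalind_0808
# 60.919540
print(within_tolerance(60.9195, 60.919540))  # True
```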

### Sample Dataset
```
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
```

### Sample Output
```
Rosalind_0808
60.919540
```
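An end-to-end sketch on the sample dataset, kept self-contained so it runs without the notebook's data files. The helper names are illustrative; `parse_fasta` splits each record on whitespace, which assumes labels contain no spaces (true for Rosalind IDs).

```python
SAMPLE = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""


def parse_fasta(text):
    """Map each label to its concatenated (unwrapped) sequence."""
    records = {}
    for chunk in text.split(">"):
        lines = chunk.split()
        if lines:  # skip the empty chunk before the first '>'
            records[lines[0]] = "".join(lines[1:])
    return records


def gc_percent(seq):
    """GC-content as a percentage; assumes a non-empty sequence."""
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


records = parse_fasta(SAMPLE)
best = max(records, key=lambda label: gc_percent(records[label]))
print(best)                              # Rosalind_0808
print(f"{gc_percent(records[best]):.6f}")  # 60.919540
```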

```python
def read_fasta(file):
    """
    Read a FASTA file and return a dictionary mapping each sequence name to its
    nucleotide sequence.

    Parameters
    ----------
    file : str
        Path to the FASTA file.

    Returns
    -------
    fasta : dict
        Dictionary with the sequence name as key and the nucleotide sequence as value.

    Examples
    --------
    >>> read_fasta("data/sequence.fasta")
    {'seq1': 'ACGTGAGCTAGC', 'seq2': 'ACGTGAGCTAGC'}
    """
    fasta = {}
    with open(file, "r") as f:
        # Splitting on ">" leaves an empty chunk before the first record
        records = f.read().split(">")
    for record in records:
        lines = record.strip().splitlines()
        if not lines:
            continue  # skip the empty leading chunk (and any blank records)
        name = lines[0].strip()
        # Sequence lines may be wrapped; join them and drop stray whitespace
        fasta[name] = "".join(line.strip() for line in lines[1:])
    return fasta
```
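Why the empty chunk matters: because a FASTA file starts with '>', splitting the whole text on '>' yields an empty string first, and a loop that does not skip it stores an empty-string key. A toy illustration with made-up records:

```python
text = ">Rosalind_0001\nACGT\n>Rosalind_0002\nGGCC\n"
chunks = text.split(">")
print(chunks)
# ['', 'Rosalind_0001\nACGT\n', 'Rosalind_0002\nGGCC\n']

# A naive name/sequence split keeps the empty first chunk as a record
naive = {c.split("\n")[0]: "".join(c.split("\n")[1:]) for c in chunks}
print(naive)
# {'': '', 'Rosalind_0001': 'ACGT', 'Rosalind_0002': 'GGCC'}
```

Splitting on `"\n"` also leaves a trailing `"\r"` on each line if the file has Windows line endings, which would inflate the sequence length; stripping each line avoids both problems.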

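```python
def gc_content(seq):
    """Return the fraction (0 to 1) of bases in seq that are 'G' or 'C'."""
    if len(seq) == 0:
        return 0.0  # avoid division by zero on an empty sequence
    return (seq.count("G") + seq.count("C")) / len(seq)
```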

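```python
def max_gc(fasta):
    """Return (name, GC fraction) for the record with the highest GC-content."""
    best_name, best_gc = "", -1.0  # start below 0 so even an all-A/T record is selected
    for name, seq in fasta.items():
        gc = gc_content(seq)
        if gc > best_gc:
            best_gc = gc
            best_name = name
    return best_name, best_gc
```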
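The same search can be written with the built-in `max` and a `key` function. This is an alternative sketch, not the notebook's method; `max_gc_compact` is a hypothetical name. Like the loop with a strict `>`, `max` keeps the first record when GC-contents tie, but it raises `ValueError` on an empty dictionary.

```python
def max_gc_compact(fasta):
    """Return (name, GC fraction) of the record with the highest GC-content."""
    def gc(seq):
        return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

    # One pass over the items; ties keep the first record encountered
    name, seq = max(fasta.items(), key=lambda item: gc(item[1]))
    return name, gc(seq)


print(max_gc_compact({"a": "ATAT", "b": "GCGA", "c": "GGCC"}))
# ('c', 1.0)
```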

```python
test = read_fasta("../files/GC_train.txt")
max_gc(test)
```

```
('Rosalind_0808', 0.6091954022988506)
```

## Solution

```python
real = read_fasta("../files/GC_real.txt")
name, gc = max_gc(real)

print(name)
# Rosalind expects the GC-content as a percentage, not a fraction
print(f"{gc * 100:.6f}")
```

```
Rosalind_3678
53.054990
```
